
Dogfooding DevSpark: Building the Plane While Flying It

April 17, 2026 · 9 min read

An exploration of what happens when you use DevSpark to build DevSpark — the meta-level traps, the override paradoxes, and the practical decisions that emerged when the prompts driving the work were the same prompts under revision.

DevSpark Series — 24 articles
  1. DevSpark: Constitution-Driven AI for Software Development
  2. Getting Started with DevSpark: Requirements Quality Matters
  3. DevSpark: Constitution-Based Pull Request Reviews
  4. Why I Built DevSpark
  5. Taking DevSpark to the Next Level
  6. From Oracle CASE to Spec-Driven AI Development
  7. Fork Management: Automating Upstream Integration
  8. DevSpark: The Evolution of AI-Assisted Software Development
  9. DevSpark: Months Later, Lessons Learned
  10. DevSpark in Practice: A NuGet Package Case Study
  11. DevSpark: From Fork to Framework — What the Commits Reveal
  12. DevSpark v0.1.0: Agent-Agnostic, Multi-User, and Built for Teams
  13. DevSpark Monorepo Support: Governing Multiple Apps in One Repository
  14. The DevSpark Tiered Prompt Model: Resolving Context at Scale
  15. A Governed Contribution Model for DevSpark Prompts
  16. Prompt Metadata: Enforcing the DevSpark Constitution
  17. Bring Your Own AI: DevSpark Unlocks Multi-Agent Collaboration
  18. Workflows as First-Class Artifacts: Defining Operations for AI
  19. Observability in AI Workflows: Exposing the Black Box
  20. Autonomy Guardrails: Bounding Agent Action Safely
  21. Dogfooding DevSpark: Building the Plane While Flying It
  22. Closing the Loop: Automating Feedback with Suggest-Improvement
  23. Designing the DevSpark CLI UX: Commands vs Prompts
  24. The Alias Layer: Masking Complexity in Agent Invocations

A Question Without a Clean Answer

There's a moment that comes up early in any tooling project, and it's worth examining before you write a line of framework code: when you go to use the thing you're building, which version are you actually running — the one you shipped, or the one you're changing right now?

For most software, the question barely registers. Microsoft writes Windows on Windows, but the build system, the editor, and the operating system occupy distinct layers. Slack runs on Slack, but typing a message doesn't recompile the client. The product and the workbench live in different rooms of the same house.

DevSpark doesn't have that separation. The prompts are the product. When I invoke @devspark.specify to draft a spec for a DevSpark feature, the prompt that runs is the same file I might be editing in the next pane. Change a word in the template, and the next invocation has the new behavior. No build step. No deploy. No lag between edit and experience. That tightness is part of what makes DevSpark feel useful — and it's also exactly what makes dogfooding it interesting.

The Old Term, the Old Commercial

The phrase "eating your own dog food" has been knocking around the industry since at least 1988, when a Microsoft manager named Paul Maritz sent an internal email titled "Eating our own Dogfood" pushing the company to use its own products more aggressively. The folk story credits a Kal Kan executive who supposedly ate his company's pet food at shareholder meetings to prove its quality. Whether that's history or marketing legend, the metaphor stuck.

It's worth knowing the variants, because each one carries a slightly different attitude:

  • Eating your own dog food — the original. Blunt.
  • Drinking your own champagne — the European rebrand. Same idea, better optics.
  • Icecreaming — what Google used for a while, presumably because nobody wants to think about kibble.
  • Eating your own cooking — the version that survives an executive presentation.
  • Self-hosting — what compiler people say. Building a compiler with itself is a rite of passage.
  • Bootstrapping — adjacent but distinct. A system that can build itself from scratch.

The core idea never changes: if you won't use what you're shipping, why should anyone else?

There's also that old EDS commercial that's hard to shake — a crew assembling an airplane mid-flight, riveting on the wings as the fuselage hurtles through the sky. I worked at EDS early in my career, back when those Fallon-produced spots were still echoing around the company, and I loved them. They captured something true about the work: the absurdity, the stakes, and the weird pride that comes from doing serious engineering on a moving target. (If you've never seen them — or you have and you want the full backstory of the trilogy that put EDS on the cultural map — I wrote about the whole campaign in A Full History of the EDS Super Bowl Commercials.)

The "Airplane" spot has stuck with me for twenty-some years, and it keeps coming back when I'm working on tools that change while you're using them. The tagline was some variation of "We solve complex problems." It was theatrical. It was also a surprisingly accurate metaphor for what happens when the tool you're using is the tool you're changing.

Where the Recursion Bites

Once you commit to using DevSpark to build DevSpark, a handful of meta-level problems show up that don't appear in normal projects. They're worth naming, because each one shapes a design decision.

The Stale Copy Trap: Reaching for the Wrong Wrench

DevSpark's normal install copies stock prompts into .devspark/defaults/commands/. For a consumer repository, that's exactly what you want — a stable snapshot of the framework that doesn't shift under you. In the source repo, those same copies become a trap.

Edit templates/commands/specify.md to fix a phrasing problem. Forget to re-copy it to .devspark/defaults/commands/devspark.specify.md. The next time you invoke the command, you're running yesterday's prompt while assuming you're running today's. I know this because I did it. Thirty minutes tracing behavior in .devspark/defaults/commands/, watching the agent produce responses that didn't match what I'd just changed — and the fix had been sitting in templates/commands/ the whole time. I'd edited the right file and never synchronized it. No error. No 404. Just a subtly wrong response and the slow realization that I'd been running stale code since morning.

It's the mid-flight equivalent of reaching for the right wrench and grabbing last week's replacement — identical in your hand, different in what it'll fix. This kind of failure is universal — every developer has lived through some version of "did you rebuild?" or "did you restart the server?" — but in a prompt-driven system there's no compiler error to flag it.
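Since there's no compiler to flag the drift, the obvious countermeasure is a mechanical one: a check that compares each source prompt against its installed copy. Here's a minimal sketch of such a staleness detector — the paths and the `devspark.` filename prefix come from the article, but the check itself is my own illustration, not a DevSpark feature:

```python
# Hypothetical staleness check for a DevSpark-style source repo: compare each
# stock prompt in templates/commands/ against its installed copy under
# .devspark/defaults/commands/ and report any that have drifted apart.
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    # Content hash, so a byte-identical copy counts as in sync.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_stale_copies(repo_root: Path) -> list[str]:
    source_dir = repo_root / "templates" / "commands"
    copy_dir = repo_root / ".devspark" / "defaults" / "commands"
    stale = []
    for src in sorted(source_dir.glob("*.md")):
        # Installed copies are assumed to carry the "devspark." prefix,
        # e.g. specify.md -> devspark.specify.md.
        copy = copy_dir / f"devspark.{src.name}"
        if not copy.exists() or file_digest(copy) != file_digest(src):
            stale.append(src.name)
    return stale
```

Run as a pre-commit hook or a CI step, a check like this turns "did you re-copy?" from a memory exercise into a loud failure — though, as the next sections show, the better fix is to remove the copies entirely.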

The Override Paradox: Autopilot Fighting the Manual Controls

DevSpark has a 3-tier override system: personal overrides shadow team overrides, which shadow stock defaults. It's a clever pattern for customization, and a hostile one for dogfooding.

If a contributor creates a personal override to test a prompt variation, they're no longer testing the source. Worse, they may not realize it. The override system is doing exactly what it was designed to do — quietly preferring the most specific version it can find. It just happens to do that at the worst possible moment, which is when you need to be sure you're exercising the canonical behavior. Think of it as the autopilot silently fighting the manual controls: you think you're steering the system, but a background process has been steering for you the whole time.
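The shadowing behavior is easy to see in a sketch of the resolution order. The tier directory names and the `source_mode` bypass below are illustrative assumptions, not DevSpark's actual implementation — the point is only that "first hit wins" is exactly the autopilot behavior described above:

```python
# Illustrative 3-tier override chain: personal shadows team, team shadows
# stock defaults. Directory names are assumptions for the sketch.
from pathlib import Path

TIERS = [
    ".devspark/personal/commands",   # most specific: silently wins
    ".devspark/team/commands",
    ".devspark/defaults/commands",   # stock snapshot
]

def resolve_prompt(repo_root: Path, name: str, source_mode: bool = False):
    # Hypothetical escape hatch for the source repo: skip the chain
    # entirely and resolve straight to the canonical source file.
    if source_mode:
        candidate = repo_root / "templates" / "commands" / name
        return candidate if candidate.exists() else None
    for tier in TIERS:
        candidate = repo_root / tier / name
        if candidate.exists():
            return candidate  # first hit wins -- the "autopilot" behavior
    return None
```

A contributor with a forgotten file in the personal tier never touches the stock prompt, and nothing in the happy path tells them so.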

The Upgrade Impossibility: You Are the Destination

There's an upgrade command that checks your installed version against the latest release and refreshes framework files. In the source repo, this command is philosophically incoherent. You can't upgrade to the latest version when you are the latest version. Running it would either do nothing — confusing — or overwrite source files with their own copies, which is destructive in addition to pointless.

The Chicken-and-Egg Spec Problem: Writing the Manual Mid-Flight

This is the one that genuinely amuses me. You can absolutely use /devspark.specify to write a spec for improving /devspark.specify. It's valid. It's also recursive in a way that matters: the spec you produce is shaped by the current version of the specify prompt. If that prompt has the flaw you're trying to fix, the spec inherits it. You're using a broken tool to write the repair manual for the same broken tool.

It's not theoretical. It happens. The rest of this article solves recursion traps with mechanical architecture — agent shims, direct source resolution, guard clauses. The chicken-and-egg problem deserves the same treatment.

Two procedural circuit breakers apply here:

Clean room practice. When writing a spec for the specify command itself, step outside DevSpark entirely. Open a vanilla, unprompted LLM chat — no system context, no constitution rules injected in the background. Draft the foundational spec there, then bring it back. By physically leaving the environment, you mechanically break the recursive loop rather than hoping fresh eyes catch what the tool's priors buried.

Manual bootstrapping mode. A harder but more elegant solution is a --bootstrap flag on specify that temporarily disables all prompt augmentations and constitution injections. Run it with the raw model, produce the seed spec, then re-enter the framework to refine it. You sever the loop procedurally rather than relying on the developer's awareness to notice when the tool's confidence has outrun the underlying logic.

Human judgment remains the final backstop — but you don't build a plane mid-flight by telling the crew to be really aware of gravity today. You build harnesses.
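The bootstrap idea reduces to a single branch in how the final prompt is assembled. This is a minimal sketch under stated assumptions — the layer names and the flag itself are hypothetical, since the article proposes `--bootstrap` rather than describes a shipped feature:

```python
# Hypothetical --bootstrap behavior for the specify command: when set,
# skip every augmentation layer and send only the raw template plus the
# user's input, severing the recursive loop procedurally.
def build_specify_prompt(template: str, user_input: str,
                         constitution: str, bootstrap: bool = False) -> str:
    if bootstrap:
        # Clean room: raw template + input, nothing injected in the background.
        return f"{template}\n\nUser input: {user_input}"
    # Normal path: constitution (and any other augmentations) injected first.
    return f"{constitution}\n\n{template}\n\nUser input: {user_input}"
```

The mechanical point is that the clean-room guarantee lives in the assembly code, not in the developer's discipline: with the flag set, there is no code path through which the constitution can leak into the seed spec.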

Cutting the Indirection

The fix turned out to be smaller than I expected: stop pretending the source repo is a consumer repo.

In a consumer install, every command resolves through the override chain — personal, then team, then the copy in .devspark/defaults/commands/. In the source repo, every agent shim points directly at the source file:

Read and follow the instructions in `templates/commands/specify.md` exactly.

That's the entire GitHub Copilot shim for devspark.specify. The Claude Code variant is the same with `User input: $ARGUMENTS` appended. No override chain. No copied file. No indirection. Edit templates/commands/specify.md, invoke the command, and you're running the code you just wrote. The feedback loop is as tight as a prompt-driven system can be.

Only .devspark/VERSION and .devspark/schemas/ survived — metadata that doesn't duplicate source content. The directories that would normally hold copied defaults don't exist here, because the shims don't need them. They point at the source.
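Because every shim is a one-liner derived from the command name, the whole set can be generated mechanically. This sketch follows the two shim variants quoted above; the agent keys and the idea of generating them at all are my assumptions, not DevSpark's documented tooling:

```python
# Illustrative generator for the one-line agent shims described above:
# each shim points directly at the canonical source file, with the
# Claude Code variant appending the $ARGUMENTS placeholder.
def make_shims(command: str) -> dict[str, str]:
    source = f"templates/commands/{command}.md"
    copilot = f"Read and follow the instructions in `{source}` exactly."
    claude = f"{copilot}\nUser input: $ARGUMENTS"
    return {"github-copilot": copilot, "claude-code": claude}
```

Generating shims rather than hand-writing them keeps the "no indirection" property honest: there's no opportunity for a shim to quietly point somewhere other than source.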

Commands That Earned a Bouncer

The software industry is obsessed with frictionless tools. Hide the mechanics, anticipate the workflow, never force the user to think about the underlying system. That instinct produces excellent consumer software.

When you're dogfooding, it produces a trap.

Using DevSpark to build DevSpark means constantly shifting between two incompatible identities: consumer and contributor. The consumer expects the tool to serve them. The contributor is responsible for the tool's underlying logic. Drifting between those two mindsets mid-session is cognitively expensive — and the drift happens silently, by default. One minute I'm using the framework to draft a spec; the next I'm supposed to remember that I am the framework's canonical behavior. Those aren't the same job.

The blocked commands are the mechanical response to that drift. Think about the difference between an automatic and a manual transmission. A consumer-grade tool is the automatic: it shifts gears invisibly, hides the RPMs, lets you focus entirely on the road. When you're dogfooding, you aren't just driving — you're tuning the engine while it's moving. That context demands a manual transmission. The friction of pressing the clutch before shifting isn't a defect. It forces you to feel what the engine is doing.

Consider upgrade. A normal user invokes it to fetch the latest framework release and refresh their install. Running it inside the source repository would overwrite unreleased code with the published version — destroying work in progress. The blocked command doesn't just prevent an error; it snaps me out of consumer mode with a message that explains why I'm in the wrong room. That's documentation arriving exactly when it's needed.

personalize works the same way. A consumer creates personal overrides to tune the tool for their machine. Running it in the source repo would hard-code personal tweaks into the global repository that everyone else will eventually download. The bouncer blocks it and redirects to editing the source directly — the right action, explained at the moment of attempted wrong action.

Failing loudly, rather than failing silently, violently snaps the developer out of consumer mindset and back into contributor mindset. The bouncer isn't there to make the workflow harder. It's there to stop you from walking into the wrong room and burning the building down.

| Command | Why it's blocked | What to do instead |
| --- | --- | --- |
| upgrade | You ARE the latest version | Edit CHANGELOG.md and .devspark/VERSION |
| personalize | Overrides would shadow source | Edit templates/commands/{name}.md directly |
| add-application | Not a multi-app monorepo | Test with tests/fixtures/ or examples/todo-app/ |
| list-applications | Not a multi-app monorepo | Test with tests/fixtures/ or examples/todo-app/ |
| discover-constitution | Constitution already exists as the source | Use evolve-constitution or edit directly |
| archive | Deprecated | Use harvest |

The other 21 commands work normally and resolve straight to source. Every time a contributor hits one of those STOP messages, they learn something concrete about how the source repo differs from a consumer repo. That distinction matters more than I anticipated.
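A bouncer like this is just a guard clause that runs before any blocked command does work. In this sketch, the source repo is detected by the presence of templates/commands/ — a heuristic I'm assuming for illustration, along with the message wording; the command names and redirects come from the table above:

```python
# Hedged sketch of the "bouncer": detect the source repo and fail loudly,
# with a message that redirects to the contributor-mode action.
from pathlib import Path

BLOCKED = {
    "upgrade": "You ARE the latest version. Edit CHANGELOG.md and .devspark/VERSION.",
    "personalize": "Overrides would shadow source. Edit templates/commands/{name}.md directly.",
    "archive": "Deprecated. Use harvest.",
}

def guard(command: str, repo_root: Path) -> None:
    # Assumed detection heuristic: only the source repo ships its own templates.
    in_source_repo = (repo_root / "templates" / "commands").is_dir()
    if in_source_repo and command in BLOCKED:
        # Fail loudly: explain why, and what to do instead, at the moment
        # of attempted wrong action.
        raise SystemExit(f"STOP: '{command}' is blocked in the source repo. {BLOCKED[command]}")
```

In a consumer install the guard is a no-op; in the source repo it's the message that snaps you back into contributor mode.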

What Actually Gets Better

It would be easy to frame this whole exercise as overhead — extra rules, extra guard clauses, an architecture that exists to manage its own recursion. That's part of it. But the more interesting effect is how dogfooding changes what you notice, and I mean that concretely.

The first time I invoked /devspark.specify against a DevSpark feature and watched the agent hedge around a phrase I'd written myself, I understood the problem before I finished reading the output. The word was "appropriate" — I'd used it in the template because it sounded professional. But following the prompt, I realized it meant nothing. Appropriate for what context? What audience? What constraint? The agent was as lost as any reasonable reader would be. I'd been writing prompts that worked well enough when I trusted myself to fill in the gaps. Following them as their own first user, I had to make those gaps explicit — and the gaps stopped working. No review process would have caught that. It took being the one trying to comply.

The constitution does the same thing at a higher level. DevSpark's constitution defines non-negotiable principles — they're supposed to be load-bearing. Most days they live in the background, referenced at commit time. When I'm using the tool to build itself, constitutional violations stop being review-checklist concerns and become immediate friction. I feel a principle when I'm about to break it, because there's no distance between "the framework says" and "I'm doing."

The bouncers teach something harder to document: that the source repo and a consumer install are categorically different environments, not just structurally different ones. Every time I hit an upgrade block or a personalize redirect, I feel the boundary that the architecture drew. That's not learning from a README. That's learning from being stopped.

Which brings me back to the EDS spot. The crew on the wing wasn't being reckless — they were doing the only kind of engineering the situation allowed, paying close attention to which bolt was which. I've been building this plane while flying it, and what I've learned is that dogfooding a tool teaches you things the tool can't know about itself. The architecture is the answer, but using it is the question — and the question doesn't stop after the first session.
