Dogfooding DevSpark: Building the Plane While Flying It
A first-person look at what it's actually like to dogfood DevSpark — using a prompt tool to refine a prompt tool — anchored in an old EDS Super Bowl commercial about building a plane mid-flight.
DevSpark Series — 25 articles
- DevSpark: Constitution-Driven AI for Software Development
- Getting Started with DevSpark: Requirements Quality Matters
- DevSpark: Constitution-Based Pull Request Reviews
- Why I Built DevSpark
- Taking DevSpark to the Next Level
- From Oracle CASE to Spec-Driven AI Development
- Fork Management: Automating Upstream Integration
- DevSpark: The Evolution of AI-Assisted Software Development
- DevSpark: Months Later, Lessons Learned
- DevSpark in Practice: A NuGet Package Case Study
- DevSpark: From Fork to Framework — What the Commits Reveal
- DevSpark v0.1.0: Agent-Agnostic, Multi-User, and Built for Teams
- DevSpark Monorepo Support: Governing Multiple Apps in One Repository
- The DevSpark Tiered Prompt Model: Resolving Context at Scale
- A Governed Contribution Model for DevSpark Prompts
- Prompt Metadata: Enforcing the DevSpark Constitution
- Bring Your Own AI: DevSpark Unlocks Multi-Agent Collaboration
- Workflows as First-Class Artifacts: Defining Operations for AI
- Observability in AI Workflows: Exposing the Black Box
- Autonomy Guardrails: Bounding Agent Action Safely
- Dogfooding DevSpark: Building the Plane While Flying It
- Closing the Loop: Automating Feedback with Suggest-Improvement
- Designing the DevSpark CLI UX: Commands vs Prompts
- The Alias Layer: Masking Complexity in Agent Invocations
- DevSpark Blogging Workflow: How I Built Better Articles
The Word That Wasn't Doing Any Work
The agent was hedging around the word appropriate — and I had written that word.
I'd dropped it into the specify template a week earlier because it sounded professional. Use the appropriate level of detail. That's the kind of phrase that survives a draft because it sounds like it means something. As my own prompt's first user, watching the model try to comply, I could see exactly how empty it was. Appropriate for what? What audience? What constraint? The agent was as lost as any honest reader would be, because I had written the prompt the way I write everything — assuming I'd be there to fill in the gaps.
That's the moment dogfooding stopped being a slogan and became the actual job.
Microsoft has called this eating your own dog food since 1988; compiler people call it self-hosting; whatever the label, the question is the same — if you won't run what you ship, why should anyone else? But the version that matters here is narrower and stranger than the slogan: I'm using a prompt tool to refine a prompt tool. The thing being changed is the same thing doing the changing. There is no compiler error when the wrench in my hand is last week's wrench. There's only a subtly wrong answer and the slow recognition that I'm the one who put it there.
Building the Plane While Flying It
I worked at EDS early in my career, back when the Fallon-produced Super Bowl spots were still echoing around the hallways. The one that's stayed with me for twenty-some years is Airplane — a crew assembling an aircraft mid-flight, riveting wings onto a fuselage already in the sky, leaning into the wind to drive the next bolt. The tagline was some variation of We solve complex problems. It was theatrical. It was also the most honest metaphor for systems work I've ever seen on television. (I wrote about the whole campaign in A Full History of the EDS Super Bowl Commercials if you want the full backstory.)
I loved that commercial then because it captured the absurdity of the work — the stakes, the visible engineering, the weird pride of doing something serious on a moving target. I love it now for a different reason: it's the cleanest picture I have of what it's actually like to use DevSpark to build DevSpark.
The crew on the wing aren't doing reckless engineering. They're doing the only kind the situation allows. They can't land the plane to fix the plane. They can't simulate the wind. They can't ask the engine to please stop turning while they replace a turbine blade. Every change happens on the same surface they're standing on. Every tool they reach for has to keep working the moment after they modify it. And the wind never stops.
That is the recursion. Not a clever architectural puzzle. A physical condition.
For most software, the condition doesn't apply. Microsoft writes Windows on Windows, but the editor and the OS occupy distinct layers. Slack runs on Slack, but typing a message doesn't recompile the client. The product and the workbench live in different rooms. DevSpark doesn't have rooms. The prompts are the product. When I invoke /devspark.specify to draft a spec for a DevSpark feature, the prompt that runs is the same file I might be editing in the next pane. Change a word in the template, and the next invocation has the new behavior. No build step. No deploy. The tightness of that loop is what makes the tool feel useful — and it's also what puts me on the wing.
What Goes Wrong When the Tool Is the Workbench
The first time I lost thirty minutes to my own framework, the symptom was banal. I'd edited a template to fix a phrasing problem. The agent kept producing the old phrasing. I checked the file. Edited. Saved. Re-checked. Edited. Saved. Re-ran. Old phrasing again.
What I'd missed was that the install copies stock prompts into a defaults directory — a sensible choice for a consumer repository where you want a stable snapshot that doesn't shift under you. In the source repo, that same copy is a trap. I'd edited the right file. I'd just never told the cached one. There was no error. No 404. The tool was working perfectly. It was running yesterday's prompt because that's what was actually on disk where it was looking. I was the only thing that was confused.
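The trap above has a cheap diagnostic: compare the file you're editing against the cached copy the tool actually reads. A minimal sketch, assuming prompts are plain files on disk (the function name and the idea of a separate cached default are from the story above; the hashing approach is mine):

```python
import hashlib
from pathlib import Path

def is_stale(source: Path, cached: Path) -> bool:
    """True when the cached default no longer matches the edited source prompt.

    A missing cache isn't stale -- there's simply nothing shadowing the source.
    """
    digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    return cached.exists() and digest(cached) != digest(source)
```

Running a check like this before each invocation turns the silent "yesterday's prompt" failure into a loud one.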
That's the first lesson the wing teaches: when you change a tool while standing on it, every assumption about which version is running is suspect. Compilers throw errors when their inputs disagree. Prompt systems just produce slightly wrong answers and let you sit with them.
The second lesson is sneakier. DevSpark has a three-tier override system — personal overrides shadow team overrides, which shadow stock defaults. For a consumer customizing the tool for their machine, that's exactly the right behavior. For someone trying to confirm that the canonical prompt does what they think it does, it's the autopilot quietly fighting the manual controls. You think you're steering the system. A background process has been steering for you the whole time, doing exactly what it was designed to do, at the worst possible moment.
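The override chain is just a first-match lookup down the tiers. A sketch, with hypothetical directory names and file layout (DevSpark's actual paths may differ):

```python
from pathlib import Path
from typing import Optional

# Highest precedence first: personal shadows team, team shadows stock defaults.
# These directory names are illustrative, not DevSpark's real layout.
TIERS = [
    Path(".devspark/personal"),
    Path(".devspark/team"),
    Path(".devspark/defaults"),  # stock prompts cached at install time
]

def resolve_prompt(name: str, tiers=TIERS) -> Optional[Path]:
    """Return the first tier that contains the named prompt, or None."""
    for tier in tiers:
        candidate = tier / f"{name}.md"
        if candidate.exists():
            return candidate
    return None
```

The sneakiness falls out of the code: a forgotten file in the personal tier wins every lookup, silently, exactly as designed.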
The third lesson is the funniest, and I keep coming back to it. You can absolutely use /devspark.specify to write a spec for improving /devspark.specify. It's valid. It's also recursive in a way that matters: the spec you produce is shaped by the current version of the prompt. If that prompt has the flaw you're trying to fix, the spec inherits it. You're using a broken wrench to write the repair manual for the same broken wrench, and the manual reads beautifully because the wrench was confident.
The way I've come to handle that one is mechanical. When I'm specifying a change to the specify command itself, I open a different chat window — no constitution, no system prompt, no DevSpark context — and draft the seed there. Physically leaving the tool is the only reliable way I've found to stop the tool from quietly editing my thinking. I've sketched out a --bootstrap flag that would disable all augmentations from inside DevSpark, but I'm not sure I trust myself to remember to use it. The other window is dumber, and dumber is what I need.
Human judgment is the final backstop. But you don't build a plane mid-flight by telling the crew to be really aware of gravity today. You build harnesses.
The Harnesses I've Bolted On
The fix for most of this turned out to be smaller than I expected: stop pretending the source repo is a consumer repo.
In a consumer install, every command resolves through the override chain — personal, then team, then the cached default. In the source repo, the agent shim points straight at the source file. No override chain. No copied file. No indirection. Edit the source, invoke the command, and you're running the code you just wrote. The feedback loop is as tight as a prompt-driven system can be, which is the only loop honest enough to dogfood with.
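The two modes can share one resolver that branches on where it's running. A sketch under stated assumptions: the `.devspark-source` marker file, the `prompts/` directory, and the exact detection mechanism are all hypothetical stand-ins for however DevSpark actually distinguishes the source repo:

```python
from pathlib import Path

def resolve(name: str, repo_root: Path, tiers) -> Path:
    """Contributor mode: point straight at the canonical source file.
    Consumer mode: walk the override chain as usual.
    """
    if (repo_root / ".devspark-source").exists():    # hypothetical marker
        return repo_root / "prompts" / f"{name}.md"  # the file you just edited
    for tier in tiers:
        candidate = tier / f"{name}.md"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(name)
```

The point of the branch is that contributor mode has no fallthrough: if the source file is wrong, you feel it on the very next invocation, which is the whole reason to dogfood.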
The other harness is louder. A small set of commands now refuse to run inside the source repo and explain why. upgrade is the obvious one: a consumer invokes it to fetch the latest release, but inside the source repo it would overwrite unreleased work with the published version — destroying changes in progress. personalize is the same shape: a consumer creates overrides to tune the tool for their machine, but inside the source repo those overrides would shadow the canonical files everyone else will eventually download. The blocked commands don't just prevent an error. They snap me out of consumer mode with a message that explains why I'm in the wrong room. Documentation arriving exactly when it's needed.
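A guard like that is small to build. This is a minimal sketch, not DevSpark's implementation: the marker-file check and the exact wording are assumptions, and only `upgrade` and `personalize` being blocked (with those reasons) comes from the text above:

```python
import sys
from pathlib import Path

# Commands that are destructive inside the source repo, with the reason
# shown to the user -- failing loudly instead of silently.
BLOCKED_IN_SOURCE = {
    "upgrade": "it would overwrite unreleased work with the published release",
    "personalize": "its overrides would shadow the canonical prompt files",
}

def guard(command: str, repo_root: Path) -> None:
    """Abort with an explanation when a blocked command runs in the source repo."""
    in_source_repo = (repo_root / ".devspark-source").exists()  # hypothetical marker
    if in_source_repo and command in BLOCKED_IN_SOURCE:
        sys.exit(f"'{command}' is disabled in the source repo: "
                 f"{BLOCKED_IN_SOURCE[command]}.")
```

The message is the product here: the refusal itself is the documentation that snaps you back into contributor mode.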
That distinction — between consumer mode and contributor mode — turns out to be the cognitive load I most consistently underestimate. A consumer-grade tool is an automatic transmission. It shifts gears invisibly, hides the RPMs, lets you focus on the road. When you're dogfooding, you aren't just driving. You're tuning the engine while it's moving. That context demands a manual transmission. The friction of pressing the clutch before shifting isn't a defect. It's the only way to feel what the engine is doing. Failing loudly, rather than failing silently, is the bouncer that stops me from walking into the wrong room and burning the building down.
What the Wing Teaches
It would be easy to frame all of this as overhead — extra rules, extra guards, an architecture that exists to manage its own recursion. Some of it is. The more interesting effect is what dogfooding changes about what I notice.
The constitution is the clearest example. DevSpark's constitution defines principles that are supposed to be load-bearing. Most days they live in the background, referenced at commit time. When I'm using the tool to build itself, constitutional violations stop being review-checklist concerns and become immediate friction. I feel a principle when I'm about to break it, because there's no distance between "the framework says" and "I'm doing." The same applies, smaller, every time a prompt of mine produces a hedge instead of an answer. The hedge is a message from a previous version of me telling the current version of me to write more clearly.
This is also why the work is exhausting in a way I didn't expect. Building a plane in a hangar lets you put down the wrench. Building it in flight doesn't. Every session is simultaneously use and maintenance, and the line between them keeps moving. I sit down to draft a spec and end up rewriting the prompt that drafts the spec. I open a command and find myself fixing the command. The tool keeps handing me back work I forgot I owed it.
The crew on the wing isn't being reckless. They're paying close attention to which bolt is which, because the wind doesn't care about their intentions. I think about them every time I catch a word like appropriate sitting in one of my own templates, doing nothing, waiting to be noticed.
Explore More
- A Full History of the EDS Super Bowl Commercials — the origin of the airplane-mid-flight metaphor, from someone who was there
- Why I Built DevSpark — the origin story and the constitution-powered review model
- DevSpark: The Evolution of AI-Assisted Software Development — how the framework has shifted as the tooling matured
- DevSpark: Months Later, Lessons Learned — what held up and what didn't after sustained use
- Bring Your Own AI: DevSpark Unlocks Multi-Agent Collaboration — agent-agnostic design and why it matters here

