Closing the Loop: Automating Feedback with Suggest-Improvement

April 18, 2026 · 7 min read

How the suggest-improvement workflow alias captures developer friction in context and closes the loop between daily use and framework evolution.

The moment an internal tool stops feeling useful, most developers don't file a ticket. They work around it — a local script here, a manual step there — and the tool quietly loses the trust it takes months to build. I've watched this happen with tooling I've maintained, and I've done it myself with tools that weren't mine. The friction of reporting an issue consistently outweighs the perceived benefit of doing so.

That's the problem the suggest-improvement workflow exists to solve.

The Anatomy of the Problem

When an AI agent produces a subpar result in DevSpark — it misreads the project structure, applies the wrong architectural pattern, or hallucinates a codebase convention — the instinct is to re-prompt and move on. The problem is that re-prompting fixes the immediate output without capturing anything about why the original prompt failed. The next session hits the same issue. The session after that does too.

What makes this particularly tricky is that AI prompt failures are context-sensitive. The failure isn't just "the agent got it wrong" — it's "the agent got it wrong given this specific constitution, this specific spec format, this specific project structure, and this specific command version." A generic bug report loses all of that context. An in-context feedback mechanism captures it.

The Suggest-Improvement Workflow

In DevSpark v2.1.0, suggest-improvement ships as a first-class workflow alias alongside create-spec and execute-plan. When a command produces the wrong output, I run the workflow from the same context window where the failure occurred:

devspark run suggest-improvement

The workflow captures the execution context — which command was invoked, which agent adapter handled it, the assembled prompt, the run artifacts — and scaffolds an improvement proposal. I add my qualitative annotation: what I expected versus what happened, and what specific behavior should change. The result is a structured proposal that includes enough technical context to be actionable, not just a description of what went wrong.
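To make the shape of that capture concrete, here is a minimal Python sketch of the scaffolding step. The record and proposal fields are hypothetical stand-ins for what the article describes; DevSpark's actual artifact schema isn't shown here.

```python
from dataclasses import dataclass

# Hypothetical shapes -- illustrative only, not DevSpark's real schema.
@dataclass
class RunRecord:
    command: str            # which command was invoked
    adapter: str            # which agent adapter handled it
    prompt: str             # the fully assembled prompt
    artifacts: list[str]    # paths to the run artifacts

@dataclass
class ImprovementProposal:
    run: RunRecord
    expected: str = ""      # developer annotation: what should have happened
    observed: str = ""      # developer annotation: what actually happened
    proposed_change: str = ""

def scaffold_proposal(run: RunRecord) -> ImprovementProposal:
    """Pre-populate a proposal from the captured execution context,
    leaving only the qualitative annotation for the developer."""
    return ImprovementProposal(run=run)

proposal = scaffold_proposal(RunRecord(
    command="create-spec",
    adapter="claude-adapter",
    prompt="...assembled prompt...",
    artifacts=["runs/2026-04-18/trace.json"],
))
proposal.expected = "Spec sections follow the project template"
proposal.observed = "Agent invented a section not in the template"
```

The point of the split is visible in the code: everything in `RunRecord` comes for free from the execution context; only the two annotation fields require human effort.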

The routing logic matters too. The framework distinguishes between three tiers of ownership: improvements to the framework baseline (.devspark/ tier), improvements to the project-level override (.documentation/ tier), and improvements to my user-scoped personalization. The workflow checks which tier owns the prompt that failed and routes the proposal accordingly. A framework-tier issue becomes a PR proposal against the DevSpark repository. A project-tier issue goes into the project's own improvement backlog. A user-scoped issue is just a file edit I make myself.
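The tier check above amounts to a small dispatch on who owns the failing prompt. A sketch, assuming ownership can be decided from the prompt's path prefix (the lookup and destination strings are illustrative, not DevSpark's actual API):

```python
# Illustrative destinations for each ownership tier.
TIER_DESTINATIONS = {
    "framework": "PR proposal against the DevSpark repository",
    "project": "entry in the project's .documentation/ improvement backlog",
    "user": "direct edit to the user-scoped prompt file",
}

def owning_tier(prompt_path: str) -> str:
    """Decide which tier owns a failing prompt by its path prefix
    (assumption: tier ownership maps cleanly onto directory layout)."""
    if prompt_path.startswith(".devspark/"):
        return "framework"
    if prompt_path.startswith(".documentation/"):
        return "project"
    return "user"

def route(prompt_path: str) -> str:
    """Route a proposal to the destination owned by the failing tier."""
    return TIER_DESTINATIONS[owning_tier(prompt_path)]
```

For example, `route(".devspark/prompts/critic.md")` resolves to the framework tier, while a prompt under `.documentation/` stays inside the project.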

Why Context Is the Entire Point

The observability article covers how DevSpark's run artifacts capture every action a workflow takes. The suggest-improvement workflow draws on those same artifacts. When I invoke it after a failure, it doesn't ask me to reconstruct what happened — it reads the run record and pre-populates the proposal with the execution state.

This changes the quality of the feedback significantly. Instead of "the critic was too aggressive," the proposal says: "In command version X, when the constitution includes principle Y, the critic classified Z at SHOWSTOPPER severity. Expected classification: MEDIUM. Proposed change: add an exemption condition for situations where..." That's actionable. The vague version isn't.
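The gap between vague and structured feedback is really a question of which fields get captured. A sketch of the structured shape as a render step (field names and values are hypothetical examples, not DevSpark's schema):

```python
def render_proposal(p: dict) -> str:
    """Render a structured improvement proposal. Every field below is
    drawn from the run record except proposed_change, which is the
    developer's annotation. Values here are made up for illustration."""
    return (
        f"Command: {p['command']} (version {p['version']})\n"
        f"Context: constitution includes principle {p['principle']}\n"
        f"Observed: {p['finding']} classified as {p['observed_severity']}\n"
        f"Expected: {p['expected_severity']}\n"
        f"Proposed change: {p['proposed_change']}"
    )

report = render_proposal({
    "command": "critic",
    "version": "1.3.2",
    "principle": "P-07",
    "finding": "missing retry budget",
    "observed_severity": "SHOWSTOPPER",
    "expected_severity": "MEDIUM",
    "proposed_change": "add an exemption condition for idempotent reads",
})
```

Nothing in the render is clever; the value is that four of the five fields cost the developer nothing to produce.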

It also removes the context-switch penalty. I stay in my workflow. I annotate what went wrong. The system handles packaging it for the appropriate tier. The friction reduction matters because it changes the economics of reporting: when the cost is low enough, developers will actually do it.

Closing the Loop Back to the Framework

The improvement mechanism is only half the loop. The other half is what happens to proposals once they're filed. For framework-tier proposals, the DevSpark contributing guide documents a PR-based process where improvements are tested against the regression suite before merging. The suggest-improvement workflow scaffolds that PR draft as part of its output — it's not just a report; it's the beginning of the fix.

For project-tier improvements, the proposal lands in the project's .documentation/ directory as a tracked artifact. It doesn't go anywhere automatically, but it's visible, versioned, and ready to act on during the next project retrospective or constitution review cycle.

What I've found is that the act of running suggest-improvement — even before the proposal goes anywhere — is itself valuable. It forces me to articulate what I expected the tool to do. More than once, that articulation revealed that the tool was behaving correctly and my mental model of the command was wrong. The loop closes in a different direction: I update my constitution instead of the command.

That's the honest version of a feedback loop. Sometimes the tool needs to change. Sometimes I do.