Autonomy Guardrails: Bounding Agent Action Safely
How DevSpark's act/plan execution modes and per-step tool scoping let me expand agent autonomy incrementally — starting with review, earning toward execution.
DevSpark Series — 24 articles
- DevSpark: Constitution-Driven AI for Software Development
- Getting Started with DevSpark: Requirements Quality Matters
- DevSpark: Constitution-Based Pull Request Reviews
- Why I Built DevSpark
- Taking DevSpark to the Next Level
- From Oracle CASE to Spec-Driven AI Development
- Fork Management: Automating Upstream Integration
- DevSpark: The Evolution of AI-Assisted Software Development
- DevSpark: Months Later, Lessons Learned
- DevSpark in Practice: A NuGet Package Case Study
- DevSpark: From Fork to Framework — What the Commits Reveal
- DevSpark v0.1.0: Agent-Agnostic, Multi-User, and Built for Teams
- DevSpark Monorepo Support: Governing Multiple Apps in One Repository
- The DevSpark Tiered Prompt Model: Resolving Context at Scale
- A Governed Contribution Model for DevSpark Prompts
- Prompt Metadata: Enforcing the DevSpark Constitution
- Bring Your Own AI: DevSpark Unlocks Multi-Agent Collaboration
- Workflows as First-Class Artifacts: Defining Operations for AI
- Observability in AI Workflows: Exposing the Black Box
- Autonomy Guardrails: Bounding Agent Action Safely
- Dogfooding DevSpark: Building the Plane While Flying It
- Closing the Loop: Automating Feedback with Suggest-Improvement
- Designing the DevSpark CLI UX: Commands vs Prompts
- The Alias Layer: Masking Complexity in Agent Invocations
Autonomy is not a binary state. That's the insight I kept coming back to as I designed the execution model for DevSpark's Harness Runtime. The question isn't "should this agent act autonomously?" — it's "how much autonomy, for which operations, under which conditions?" The answer changes by workflow, by step, by the specific action being taken.
Getting this wrong in either direction has a cost. An agent that can only propose and never execute produces recommendations I have to manually implement, defeating the purpose of automation. An agent with unrestricted execution authority is genuinely dangerous — one misread prompt or misunderstood project structure can overwrite critical files, execute deployment scripts, or modify infrastructure configuration in ways that are painful to undo.
The guardrail system in DevSpark v2.0.0 was designed around that spectrum rather than a single on/off switch.
Two Execution Modes
The simplest guardrail is at the workflow level. Every Harness run accepts an execution mode flag:
--mode plan runs the workflow in read-only mode. Steps that would write files, commit to git, or execute terminal commands are skipped. Prompts are still assembled and sent to the agent, but the agent receives an instruction prefix indicating it's operating in plan mode — it should describe what it would do rather than do it. The output shows the intended sequence of changes without making them.
--mode act is the default — full execution, all steps, all writes, all commands.
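The gate between the two modes can be sketched in a few lines. This is an illustrative reconstruction, not DevSpark's actual implementation: the step-kind names, the prefix text, and `prepare_step` are all my assumptions; only the plan/act behavior comes from the description above.

```python
# Minimal sketch of plan/act mode gating (names are illustrative,
# not DevSpark's actual API).
PLAN_PREFIX = (
    "PLAN MODE: describe the changes you would make. "
    "Do not write files or execute commands.\n\n"
)

# Step kinds with side effects; these are skipped entirely in plan mode.
MUTATING_KINDS = {"write_files", "git_commit", "run_command"}

def prepare_step(step: dict, mode: str):
    """Return the step to execute, or None if it is skipped in plan mode."""
    if mode == "plan":
        if step["kind"] in MUTATING_KINDS:
            return None  # read-only run: skip anything that mutates state
        # Prompt-bearing steps still run, but the agent is told to only plan.
        prepared = dict(step)
        prepared["prompt"] = PLAN_PREFIX + step["prompt"]
        return prepared
    return step  # act mode: full execution
```

The key property is that plan mode still assembles and sends the prompt, so the review covers the real context the agent would see, not a summary of it.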
The practical use: before I run a workflow that touches deployment configuration or modifies shared infrastructure, I run --mode plan first. I review the assembled context, verify the agent's stated intent, and confirm the step sequence is what I expected. Ten seconds of friction that has caught mismatches between my mental model and the workflow's actual behavior more than once.
This isn't a novel concept — it's the same principle behind terraform plan before terraform apply. The AI-specific version just needs to be explicit because the agent's decision-making is less transparent than a declarative infrastructure diff.
Per-Step Tool Scoping
The Harness Runtime adds a second layer of guardrails at the step level: each step in a workflow spec can declare its own tool access scope. A documentation-generation step that should only read source files and write markdown can be constrained to exactly that — read access to the source directory, write access to the docs directory, no terminal execution.
```yaml
- id: generate-docs
  adapter: claude_code
  prompt_ref: .devspark/templates/prompts/docs.md
  tools:
    allowed:
      - read_file
      - create_file
    restricted_paths:
      - .github/workflows/**
      - infrastructure/**
```

If the agent attempts to access a path outside its declared scope, the runtime blocks the tool call before it executes. The agent doesn't get an error message that reveals the restriction — it simply doesn't have access. This limits the blast radius of a hallucination or an adversarial prompt that somehow reaches the agent through injected content.
The restricted_paths field is particularly useful for protecting CI/CD configuration and infrastructure definitions. These are the files where an unintended modification has the most potential for operational damage. Keeping them outside the scope of all but specifically authorized steps means a workflow focused on application code genuinely cannot touch them, regardless of what the agent decides to do.
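The enforcement check itself is small. A sketch of how it might work, assuming glob-style patterns as in the spec above (the function name and dict shape are my invention; the `allowed` and `restricted_paths` fields come from the example):

```python
# Sketch of per-step tool scoping: the runtime checks every tool call
# against the step's declared scope before executing it.
from fnmatch import fnmatch

def tool_call_allowed(step_tools: dict, tool: str, path: str) -> bool:
    """Return True only if the tool and the target path are both in scope."""
    if tool not in step_tools.get("allowed", []):
        return False  # tool itself is out of scope for this step
    for pattern in step_tools.get("restricted_paths", []):
        if fnmatch(path, pattern):
            return False  # path falls under a protected subtree
    return True
```

Because the check runs before the tool call rather than after, a blocked access never reaches the filesystem at all; the deny is a non-event from the agent's perspective.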
The Manual Adapter as a Guardrail
One of the five Harness adapters — manual — is itself a guardrail. When a step uses the manual adapter, the runtime pauses execution and waits for a human to complete the step. The agent assembles the context and the prompt, but a person reviews and applies the result.
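The dispatch logic this implies can be sketched as follows. Only the adapter names come from DevSpark; the function signature and the pause mechanism are my assumptions about how such a runtime could work:

```python
# Sketch of adapter dispatch where the manual adapter pauses the run.
# send_to_agent and wait_for_human stand in for the real adapter backends.
def run_step(step: dict, send_to_agent, wait_for_human) -> str:
    context = step["prompt"]  # context assembly elided for brevity
    if step["adapter"] == "manual":
        # Hand the assembled prompt to a person and block until they
        # review and apply the result themselves.
        return wait_for_human(context)
    return send_to_agent(context)
```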
This is useful for steps that require judgment I'm not yet comfortable delegating to automation: reviewing generated database migrations, approving changes to authentication logic, or validating that a generated API contract matches the business requirements. The manual step keeps those decisions human without requiring me to run the rest of the workflow manually too.
Over time, as I gain confidence in specific workflows producing consistent results, I migrate manual steps to automated ones. Trust is earned through observation, not granted upfront. The manual adapter makes that progression explicit rather than requiring a binary choice at workflow design time.
Retry Limits and Human Escalation
One pattern that surprised me in practice: agents can get stuck in failure loops. A step fails, the retry prompt attempts a repair, the repair produces an output that also fails, and the cycle continues consuming tokens without making progress.
The Harness Runtime addresses this through configurable retry limits and a requireHumanAfter threshold. After a specified number of automatic retries, the workflow pauses and surfaces for human review rather than continuing to retry. The run record captures the full failure sequence — every retry attempt, every repair prompt, every validation failure — so I can see exactly what the agent tried before escalating.
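A minimal sketch of that loop, assuming a `requireHumanAfter`-style threshold (the function, the record shape, and the status values are illustrative, not DevSpark's actual types):

```python
# Sketch of bounded retries with human escalation. Every attempt is
# recorded so the escalated run carries its full failure sequence.
def run_with_retries(attempt_step, validate, require_human_after: int) -> dict:
    """Retry a failing step up to the threshold, then surface for a human."""
    record = []  # full failure sequence, kept for the run record
    for attempt in range(require_human_after + 1):
        output = attempt_step(attempt)
        ok, failure = validate(output)
        record.append({"attempt": attempt, "output": output, "failure": failure})
        if ok:
            return {"status": "success", "record": record}
    # Threshold exhausted: pause and escalate instead of burning more
    # tokens on the same failure loop.
    return {"status": "needs_human", "record": record}
```

The record is the important part: escalation without the attempt history just moves the debugging burden onto the human, while escalation with it turns three failed retries into a diagnosis.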
This is the same principle that governs the economics of spec-driven development: catching failures early is cheaper than catching them late. An agent that fails three times on the same validation rule is signaling that something in the spec, the prompt, or the validation definition needs to change. Escalating to human review at that point is the right call — and building that escalation into the framework rather than leaving it to willpower means it actually happens.
Autonomy as a Dial
The architecture I've landed on treats autonomy as a property you configure per workflow and per step, not a global setting you flip once. New workflows start more restricted — --mode plan by default, manual review steps for high-risk operations, narrow tool scopes. As the workflow proves itself through observed behavior, I expand its autonomy incrementally.
This mirrors how I'd onboard a new team member. Full code access comes after the first few PRs prove judgment. Deployment authority comes after familiarity with the production environment. The underlying insight — that trust is earned through consistent, observable behavior — applies equally to AI agents.
The observability infrastructure in DevSpark provides the "observable" part. Every run is recorded. Every tool call is logged. Every file change is tracked. That record is what makes informed trust decisions possible, rather than hoping the agent behaves as expected because it was told to.
