Autonomy Guardrails: Bounding Agent Action Safely
How DevSpark's act/plan execution modes and per-step tool scoping let me expand agent autonomy incrementally — starting with review, earning toward execution.
DevSpark Series — 24 articles
- DevSpark: Constitution-Driven AI for Software Development
- Getting Started with DevSpark: Requirements Quality Matters
- DevSpark: Constitution-Based Pull Request Reviews
- Why I Built DevSpark
- Taking DevSpark to the Next Level
- From Oracle CASE to Spec-Driven AI Development
- Fork Management: Automating Upstream Integration
- DevSpark: The Evolution of AI-Assisted Software Development
- DevSpark: Months Later, Lessons Learned
- DevSpark in Practice: A NuGet Package Case Study
- DevSpark: From Fork to Framework — What the Commits Reveal
- DevSpark v0.1.0: Agent-Agnostic, Multi-User, and Built for Teams
- DevSpark Monorepo Support: Governing Multiple Apps in One Repository
- The DevSpark Tiered Prompt Model: Resolving Context at Scale
- A Governed Contribution Model for DevSpark Prompts
- Prompt Metadata: Enforcing the DevSpark Constitution
- Bring Your Own AI: DevSpark Unlocks Multi-Agent Collaboration
- Workflows as First-Class Artifacts: Defining Operations for AI
- Observability in AI Workflows: Exposing the Black Box
- Autonomy Guardrails: Bounding Agent Action Safely
- Dogfooding DevSpark: Building the Plane While Flying It
- Closing the Loop: Automating Feedback with Suggest-Improvement
- Designing the DevSpark CLI UX: Commands vs Prompts
- The Alias Layer: Masking Complexity in Agent Invocations
Autonomy is not a binary state. That's the insight I kept coming back to as I designed the execution model for DevSpark's Harness Runtime. The question isn't "should this agent act autonomously?" — it's "how much autonomy, for which operations, under which conditions?" The answer changes by workflow, by step, by the specific action being taken.
Getting this wrong in either direction has a cost. An agent that can only propose and never execute produces recommendations I have to manually implement, defeating the purpose of automation. An agent with unrestricted execution authority is genuinely dangerous — one misread prompt or misunderstood project structure can overwrite critical files, execute deployment scripts, or modify infrastructure configuration in ways that are painful to undo.
The guardrail system in DevSpark v2.0.0 was designed around that spectrum rather than a single on/off switch.
Two Execution Modes
The simplest guardrail is at the workflow level. Every Harness run accepts an execution mode flag:
--mode plan runs the workflow in read-only mode. Steps that would write files, commit to git, or execute terminal commands are skipped. Prompts are still assembled and sent to the agent, but the agent receives an instruction prefix indicating it's operating in plan mode — it should describe what it would do rather than do it. The output shows the intended sequence of changes without making them.
--mode act is the default — full execution, all steps, all writes, all commands.
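The gate between the two modes can be sketched in a few lines. This is an illustrative reconstruction, not DevSpark's actual implementation: the step-kind names, the prefix text, and `prepare_step` are all my assumptions; only the plan/act behavior comes from the description above.

```python
# Minimal sketch of plan/act mode gating (names are illustrative,
# not DevSpark's actual API).
PLAN_PREFIX = (
    "PLAN MODE: describe the changes you would make. "
    "Do not write files or execute commands.\n\n"
)

# Step kinds with side effects; these are skipped entirely in plan mode.
MUTATING_KINDS = {"write_files", "git_commit", "run_command"}

def prepare_step(step: dict, mode: str):
    """Return the step to execute, or None if it is skipped in plan mode."""
    if mode == "plan":
        if step["kind"] in MUTATING_KINDS:
            return None  # read-only run: skip anything that mutates state
        # Prompt-bearing steps still run, but the agent is told to only plan.
        prepared = dict(step)
        prepared["prompt"] = PLAN_PREFIX + step["prompt"]
        return prepared
    return step  # act mode: full execution
```

The key property is that plan mode still assembles and sends the prompt, so the review covers the real context the agent would see, not a summary of it.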
The practical use: before I run a workflow that touches deployment configuration or modifies shared infrastructure, I run --mode plan first. I review the assembled context, verify the agent's stated intent, and confirm the step sequence is what I expected. Ten seconds of friction that has caught mismatches between my mental model and the workflow's actual behavior more than once.
This isn't a novel concept — it's the same principle behind terraform plan before terraform apply. The AI-specific version just needs to be explicit because the agent's decision-making is less transparent than a declarative infrastructure diff.
Per-Step Tool Scoping
The Harness Runtime adds a second layer of guardrails at the step level: each step in a workflow spec can declare its own tool access scope. A documentation-generation step that should only read source files and write markdown can be constrained to exactly that — read access to the source directory, write access to the docs directory, no terminal execution.
```yaml
- id: generate-docs
  adapter: claude_code
  prompt_ref: .devspark/templates/prompts/docs.md
  tools:
    allowed:
      - read_file
      - create_file
    restricted_paths:
      - .github/workflows/**
      - infrastructure/**
```

If the agent attempts to access a path outside its declared scope, the runtime blocks the tool call before it executes. The agent doesn't get an error message that reveals the restriction — it simply doesn't have access. This limits the blast radius of a hallucination or an adversarial prompt that somehow reaches the agent through injected content.
The restricted_paths field is particularly useful for protecting CI/CD configuration and infrastructure definitions. These are the files where an unintended modification has the most potential for operational damage. Keeping them outside the scope of all but specifically authorized steps means a workflow focused on application code genuinely cannot touch them, regardless of what the agent decides to do.
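The enforcement check itself is small. A sketch of how it might work, assuming glob-style patterns as in the spec above (the function name and dict shape are my invention; the `allowed` and `restricted_paths` fields come from the example):

```python
# Sketch of per-step tool scoping: the runtime checks every tool call
# against the step's declared scope before executing it.
from fnmatch import fnmatch

def tool_call_allowed(step_tools: dict, tool: str, path: str) -> bool:
    """Return True only if the tool and the target path are both in scope."""
    if tool not in step_tools.get("allowed", []):
        return False  # tool itself is out of scope for this step
    for pattern in step_tools.get("restricted_paths", []):
        if fnmatch(path, pattern):
            return False  # path falls under a protected subtree
    return True
```

Because the check runs before the tool call rather than after, a blocked access never reaches the filesystem at all; the deny is a non-event from the agent's perspective.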
The Manual Adapter as a Guardrail
One of the five Harness adapters — manual — is itself a guardrail. When a step uses the manual adapter, the runtime pauses execution and waits for a human to complete the step. The agent assembles the context and the prompt, but a person reviews and applies the result.
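The dispatch logic this implies can be sketched as follows. Only the adapter names come from DevSpark; the function signature and the pause mechanism are my assumptions about how such a runtime could work:

```python
# Sketch of adapter dispatch where the manual adapter pauses the run.
# send_to_agent and wait_for_human stand in for the real adapter backends.
def run_step(step: dict, send_to_agent, wait_for_human) -> str:
    context = step["prompt"]  # context assembly elided for brevity
    if step["adapter"] == "manual":
        # Hand the assembled prompt to a person and block until they
        # review and apply the result themselves.
        return wait_for_human(context)
    return send_to_agent(context)
```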
This is useful for steps that require judgment I'm not yet comfortable delegating to automation: reviewing generated database migrations, approving changes to authentication logic, or validating that a generated API contract matches the business requirements. The manual step keeps those decisions human without requiring me to run the rest of the workflow manually too.
Over time, as I gain confidence in specific workflows producing consistent results, I migrate manual steps to automated ones. Trust is earned through observation, not granted upfront. The manual adapter makes that progression explicit rather than requiring a binary choice at workflow design time.
Retry Limits and Human Escalation
One pattern that surprised me in practice: agents can get stuck in failure loops. A step fails, the retry prompt attempts a repair, the repair produces an output that also fails, and the cycle continues consuming tokens without making progress.
The Harness Runtime addresses this through configurable retry limits and a requireHumanAfter threshold. After a specified number of automatic retries, the workflow pauses and surfaces for human review rather than continuing to retry. The run record captures the full failure sequence — every retry attempt, every repair prompt, every validation failure — so I can see exactly what the agent tried before escalating.
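A minimal sketch of that loop, assuming a `requireHumanAfter`-style threshold (the function, the record shape, and the status values are illustrative, not DevSpark's actual types):

```python
# Sketch of bounded retries with human escalation. Every attempt is
# recorded so the escalated run carries its full failure sequence.
def run_with_retries(attempt_step, validate, require_human_after: int) -> dict:
    """Retry a failing step up to the threshold, then surface for a human."""
    record = []  # full failure sequence, kept for the run record
    for attempt in range(require_human_after + 1):
        output = attempt_step(attempt)
        ok, failure = validate(output)
        record.append({"attempt": attempt, "output": output, "failure": failure})
        if ok:
            return {"status": "success", "record": record}
    # Threshold exhausted: pause and escalate instead of burning more
    # tokens on the same failure loop.
    return {"status": "needs_human", "record": record}
```

The record is the important part: escalation without the attempt history just moves the debugging burden onto the human, while escalation with it turns three failed retries into a diagnosis.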
This is the same principle that governs the economics of spec-driven development: catching failures early is cheaper than catching them late. An agent that fails three times on the same validation rule is signaling that something in the spec, the prompt, or the validation definition needs to change. Escalating to human review at that point is the right call — and building that escalation into the framework rather than leaving it to willpower means it actually happens.
Autonomy as a Dial
The architecture I've landed on treats autonomy as a property you configure per workflow and per step, not a global setting you flip once. New workflows start more restricted — --mode plan by default, manual review steps for high-risk operations, narrow tool scopes. As the workflow proves itself through observed behavior, I expand its autonomy incrementally.
This mirrors how I'd onboard a new team member. Full code access comes after the first few PRs prove judgment. Deployment authority comes after familiarity with the production environment. The underlying insight — that trust is earned through consistent, observable behavior — applies equally to AI agents.
The observability infrastructure in DevSpark provides the "observable" part. Every run is recorded. Every tool call is logged. Every file change is tracked. That record is what makes informed trust decisions possible, rather than hoping the agent behaves as expected because it was told to.
