Workflows as First-Class Artifacts: Defining Operations for AI
How DevSpark's Harness Runtime turns ad-hoc AI interactions into version-controlled, validated, reproducible workflow specs — and what changed.
DevSpark Series — 24 articles
- DevSpark: Constitution-Driven AI for Software Development
- Getting Started with DevSpark: Requirements Quality Matters
- DevSpark: Constitution-Based Pull Request Reviews
- Why I Built DevSpark
- Taking DevSpark to the Next Level
- From Oracle CASE to Spec-Driven AI Development
- Fork Management: Automating Upstream Integration
- DevSpark: The Evolution of AI-Assisted Software Development
- DevSpark: Months Later, Lessons Learned
- DevSpark in Practice: A NuGet Package Case Study
- DevSpark: From Fork to Framework — What the Commits Reveal
- DevSpark v0.1.0: Agent-Agnostic, Multi-User, and Built for Teams
- DevSpark Monorepo Support: Governing Multiple Apps in One Repository
- The DevSpark Tiered Prompt Model: Resolving Context at Scale
- A Governed Contribution Model for DevSpark Prompts
- Prompt Metadata: Enforcing the DevSpark Constitution
- Bring Your Own AI: DevSpark Unlocks Multi-Agent Collaboration
- Workflows as First-Class Artifacts: Defining Operations for AI
- Observability in AI Workflows: Exposing the Black Box
- Autonomy Guardrails: Bounding Agent Action Safely
- Dogfooding DevSpark: Building the Plane While Flying It
- Closing the Loop: Automating Feedback with Suggest-Improvement
- Designing the DevSpark CLI UX: Commands vs Prompts
- The Alias Layer: Masking Complexity in Agent Invocations
Imagine managing a production database by manually typing SQL statements into a terminal every time a schema change is needed. We abandoned that practice in favor of migrations, version control, and CI/CD pipelines. The discipline holds: if it matters, it gets a file, a diff, and a review.
I kept noticing that I wasn't applying the same discipline to my AI interactions. Every specification session, every critic run, every PR review was a transient conversation — typed once, lost when the window closed. When I wanted to re-run a workflow that had worked well, I was reconstructing it from memory. When a colleague asked how I'd approached a complex refactoring, I couldn't share the interaction. The approach existed only in my chat history.
The work to fix this became DevSpark's Harness Runtime, shipped in v2.0.0.
What a Workflow Artifact Actually Is
A DevSpark workflow is a YAML file with a defined schema (apiVersion: devspark.ai/v1). It declares a sequence of operations — what context to gather, which agent to invoke, what to validate, how to handle failures — in a format that the devspark harness run command can execute end-to-end.
The sample.harness.yaml in the DevSpark repository shows the structure at its simplest:
apiVersion: devspark.ai/v1
kind: HarnessSpec
metadata:
  name: scaffold-api-endpoint
  description: Generates a new API endpoint with validation and tests
steps:
  - id: gather-models
    action: read_files
    target: src/domain/models/**/*.cs
  - id: generate-controller
    adapter: claude_code
    prompt_ref: .devspark/templates/prompts/api-controller.md
    depends_on: [gather-models]
    validate:
      - rule: file.exists
        path: src/api/controllers/
  - id: generate-tests
    adapter: claude_code
    prompt_ref: .devspark/templates/prompts/api-tests.md
    depends_on: [generate-controller]
    validate:
      - rule: command.exit_code
        command: dotnet test
        expected: 0

Every step has a purpose, explicit dependencies, and validation rules. If the controller step fails, the test step doesn't run. If validation fails, the run exits with a structured error, not silently wrong output.
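The dependency-and-validation behavior described above can be sketched in a few lines. This is not DevSpark's actual runtime code (which isn't shown here) — just a minimal illustration of how a harness could order steps by their depends_on edges and stop a chain when a step fails, so generate-tests never runs if generate-controller fails validation:

```python
# Hypothetical sketch of dependency-ordered step execution with fail-fast
# skipping; step/adapter behavior is passed in as callables for illustration.

def run_steps(steps, execute, validate):
    """steps: list of dicts with 'id' and an optional 'depends_on' list."""
    done, failed = set(), set()
    remaining = list(steps)
    while remaining:
        progressed = False
        for step in list(remaining):
            deps = step.get("depends_on", [])
            if any(d in failed for d in deps):
                failed.add(step["id"])  # an upstream step failed: skip this one
                remaining.remove(step)
                progressed = True
            elif all(d in done for d in deps):
                ok = execute(step) and validate(step)
                (done if ok else failed).add(step["id"])
                remaining.remove(step)
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unsatisfiable depends_on graph")
    return done, failed
```

With the three steps from the sample spec, failing generate-controller leaves generate-tests in the failed set without ever executing it — the behavior the spec guarantees.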
The Two Execution Modes
One of the decisions I made early in the Harness design: there had to be a way to preview a workflow without executing it. --mode plan runs the workflow in read-only mode. Steps that would write files or invoke terminal commands are skipped, but the prompts are still assembled and the agent receives a prefixed instruction indicating it's operating in plan mode. The output shows what would happen without the side effects.
--mode act is the default — full execution, all steps, all writes, all commands.
The distinction matters for workflows touching infrastructure or making irreversible changes. Running --mode plan first to review the assembled context and the proposed steps adds ten seconds, and it has stopped me more than once from running the wrong spec against the wrong project.
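The plan/act split can be pictured as a gate in front of each step. The flags (--mode plan, --mode act) are real; everything inside this sketch — the action names, the prefix wording — is an assumption for illustration, not DevSpark's implementation:

```python
# Hypothetical sketch of plan-mode gating: side-effecting actions are skipped,
# while prompts still go out carrying a plan-mode prefix so the agent knows
# it's a dry run. Action names and prefix text are invented for this example.

PLAN_PREFIX = "[PLAN MODE] Describe what you would do; make no changes.\n\n"
SIDE_EFFECTS = {"write_file", "run_command"}

def dispatch(step, mode, send_prompt):
    action = step.get("action", "invoke_agent")
    if mode == "plan" and action in SIDE_EFFECTS:
        # Writes and commands never happen in plan mode.
        return {"id": step["id"], "status": "skipped (plan mode)"}
    prompt = step.get("prompt", "")
    if mode == "plan":
        prompt = PLAN_PREFIX + prompt  # agent is told it's operating read-only
    return {"id": step["id"], "status": "ran", "response": send_prompt(prompt)}
```

In act mode the same dispatch runs every step unmodified, which is why act can safely be the default: plan is strictly a restriction, never a different workflow.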
Adapters and Portability
The Harness Runtime supports five built-in adapters: copilot, claude_code, cursor, manual, and noop. The adapter determines how a step sends its prompt to an AI agent. The manual adapter pauses execution and waits for a human to complete the step — useful for steps that require judgment the AI shouldn't be trusted to make autonomously. The noop adapter skips the LLM call entirely and returns a placeholder, useful for testing the harness spec structure without consuming API tokens.
What this means practically: a workflow I wrote for Claude Code works for a colleague using Copilot by changing the adapter field. The context gathering, the validation rules, the artifact tracking — all of that is adapter-agnostic. The spec is the portable artifact. The adapter is an implementation detail.
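Adapter dispatch is conceptually just a lookup on the step's adapter field. The five adapter names come from the runtime; the callables below are illustrative stand-ins, not DevSpark's actual adapter implementations:

```python
# Hypothetical sketch of adapter dispatch. Only the adapter names are from the
# article; these function bodies are invented placeholders.

def noop_adapter(prompt):
    return "[noop] placeholder response"  # no LLM call, no tokens spent

def manual_adapter(prompt):
    # Pauses the run and hands the step to a human.
    print(prompt)
    return input("Complete this step manually, then paste the result: ")

ADAPTERS = {
    "noop": noop_adapter,
    "manual": manual_adapter,
    # "copilot", "claude_code", "cursor" would wrap the respective agent tools
}

def run_step(step, prompt):
    adapter = ADAPTERS[step["adapter"]]  # the spec is unchanged; only this key varies
    return adapter(prompt)
```

This is why swapping claude_code for copilot is a one-line diff in the spec: the step's context, dependencies, and validation rules never mention the adapter at all.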
Run Artifacts and Telemetry
Every Harness run produces artifacts under .documentation/devspark/runs/. Each run gets a run.json with the structured execution record and a JSONL event log with a timestamped entry for every action the runtime took. Artifact delta tracking records which files were created, modified, or deleted in each step.
The practical value: when a workflow produces unexpected output, I don't have to reconstruct what happened from memory. The run record shows exactly what context was gathered, what prompt was sent, what validation ran, and what files changed. Debugging a non-deterministic AI process becomes close to debugging a deterministic one — there's a trail.
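Because the event log is JSONL, reconstructing a run is a matter of grouping timestamped lines by step. The directory layout and file kinds (run.json, JSONL events) are from the article; the field names used here (step, event, ts) are assumptions for illustration:

```python
# Hypothetical sketch: parse a Harness JSONL event log and group events by
# step to see what the run actually did. Field names are assumed, not
# DevSpark's documented schema.
import json

def summarize_events(jsonl_text):
    """Return {step_id: [event, ...]} in log order."""
    by_step = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        by_step.setdefault(event["step"], []).append(event["event"])
    return by_step
```

A few lines like this are usually enough to answer the debugging questions above — which step sent which prompt, and where validation stopped the run.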
What This Changed
The most concrete change is that useful workflows are now shareable. If I develop a workflow for scaffolding a specific pattern in a .NET codebase — with the right context gathering, the right validation, the right retry logic — I commit the YAML file. A colleague clones the repo, runs devspark harness run scaffold-api-endpoint.yaml, and gets the same result I did. Not approximately the same result. The same result, because the same spec controls what context the agent sees and what validation the output must pass.
The more subtle change is in how I think about AI-assisted development. When I'm refining a prompt that almost works but not quite, I'm now editing a file in version control, not re-typing a chat message. The diff shows exactly what changed. The next run shows whether it helped. That discipline — iterate on the artifact, track the changes — is the same discipline that makes software maintainable over time. It turns out it works just as well on AI workflows.
The repo history knows what I built. That's not a small thing.
