GitHub Spec Kit

Test Driving GitHub Spec Kit: From Three-Ring Binders to Living Documentation

Using structured specifications to guide AI code generation with measurable results.

Executive Summary

Every developer knows the pattern: Design document says one thing, code does another, six months later nobody knows which is correct. Waterfall specs died when code changed. Agile threw out specs entirely. Both failed for the same reason—humans won't maintain documentation when it's divorced from implementation.

GitHub Spec Kit closes the feedback loop: AI agents update documentation when implementation diverges, so specs become living artifacts instead of shelf-ware. The ROI isn't faster development—it's specs that still accurately describe the codebase three months later.

What You'll Learn
  • Solving documentation drift: How AI agents close the feedback loop humans never could—specs stay synchronized with implementation
  • Institutional knowledge persistence: Path resolution patterns, warning remediation strategies, architectural decisions captured in markdown that survives team turnover
  • The feedback cycle: AI generates → human fixes → human tells AI to update specs → knowledge persists forever
  • When it matters: Libraries, APIs, multi-year projects where tribal knowledge creates single points of failure
  • Real metrics: Implementation: 7 hours (same as always). Documentation sync: 20 minutes (vs. never). Result: Zero documentation debt.
Who This Is For
  • Solutions Architects: Translate business requirements into technology with precision
  • Development Teams: Escape the prompt-generate-debug cycle with structured workflows
  • Engineering Leaders: Build institutional knowledge that scales beyond individual contributors
  • .NET Developers: Practical patterns for NuGet packages, documentation, and quality enforcement

The Documentation Drift Problem

Every codebase has specs that lie. They said one thing at design time, developers changed it during implementation, and nobody updated the docs. Waterfall tried to solve this with upfront perfection—specs froze before coding started. Agile gave up entirely—"working software over comprehensive documentation."

Both failed for the same reason: humans won't maintain documentation when it's divorced from implementation. The feedback loop is too expensive.

I spent 8 years at EDS maintaining three-ring binders of waterfall specs. The specs were beautiful at handoff. Three months later, they were fiction. The human cost of keeping specs synchronized with code was unsustainable.

What If AI Could Close the Loop?

GitHub Spec Kit offers something different: AI agents that update documentation when implementation changes.

The cycle: AI generates code from spec → human fixes what's wrong → human tells AI "update the specs to match reality" → specs evolve instead of ossifying.

I tested this on a production NuGet package. Two features, 7 hours of work, 136 files changed. Every deviation from the original plan became a permanent improvement to the specs—not tribal knowledge that disappears when I leave. Here's what happened.

What Is GitHub Spec Kit?

GitHub Spec Kit structures AI development into phases that generate persistent documentation. Instead of prompting until code works, you run commands that create markdown artifacts: /speckit.specify captures requirements in SPEC.md, /speckit.plan determines technical approach in PLAN.md, /speckit.tasks breaks work into TASKS.md, and /speckit.implement generates code.

These files live alongside your code in specifications/001-feature-name/. The structure forces you to think before coding—but more importantly, the files become living documentation. When implementation deviates from the plan (and it always does), you tell the AI to update the specs to match reality.

A /speckit.constitution file defines project-wide standards once, guiding all generation. Optional commands like /speckit.clarify catch ambiguities before implementation. The full toolkit is open-source (MIT) at github/spec-kit.

The Post-Implementation Feedback Loop: The Core Innovation

Here's what makes GitHub Spec Kit fundamentally different from traditional documentation: it's designed for AI agents to update specs when implementation diverges from the plan. This isn't about generating code faster—it's about maintaining knowledge that survives team turnover.

Traditional Approach
  1. Write specs during planning phase
  2. Implement feature with inevitable deviations
  3. Ship code without updating specs
  4. Six months later: specs are fiction
  5. New developer wastes hours reconciling
GitHub Spec Kit Cycle
  1. AI generates code from SPEC.md
  2. Human fixes what doesn't work
  3. Human tells AI: "Update specs to match reality"
  4. Specs evolve to document what actually works
  5. Knowledge persists in repository

Why This Matters: Path Resolution Example

AI generated code using pathPrefix configuration (standard Eleventy approach). It failed in GitHub Pages subdirectory deployment. I fixed it with a custom relativePath filter. Then I spent 20 minutes having AI update SPEC.md and PLAN.md to document why pathPrefix failed and what works instead.

Result: Next developer doesn't try pathPrefix because the spec explains the failure. That's institutional knowledge that survives my departure—not tribal knowledge that walks out the door.

Now let's see this pattern applied to a real production project.

Case Study: WebSpark.HttpClientUtility — A Production .NET NuGet Package

I applied Spec Kit to WebSpark.HttpClientUtility, a .NET 8/9 NuGet package with decorator-pattern HTTP client utilities. Results from two specs:

Spec 001: Static Documentation Site

Built complete Eleventy-based documentation website

  • 6 pages with responsive design
  • Live NuGet API integration
  • GitHub Pages deployment
  • Build time: 0.4 seconds
Spec 002: Zero Compiler Warnings

Achieved professional quality baseline

  • 0 warnings, 0 errors
  • 520/520 tests passing
  • TreatWarningsAsErrors enabled
  • XML docs for all public APIs

I'll walk through both specs, showing how Spec Kit moved me from ambiguous goals to shipped releases (v1.5.0 and v1.5.1).

Repository Layout

Here's the repository structure that emerged from the spec-driven process. Note the distinction between .specify/ (Spec Kit framework and templates) and specifications/ (your actual project specs). Each spec directory contains the complete spec-plan-tasks workflow artifacts:

Understanding the Structure
  • .specify/: Framework files (constitution, templates, scripts)
  • specifications/: Your actual project specs with full workflow artifacts
  • Why 1,648 lines of tasks.md? Documentation site spec broke down into granular, testable tasks across HTML, CSS, JS, CI/CD
  • copilot-instructions.md (244 lines): Project-specific AI guidance including coding standards, architecture decisions, and common patterns
WebSpark.HttpClientUtility/
├─ .specify/                      # Spec Kit framework
│  ├─ memory/constitution.md      # Project principles
│  ├─ templates/                  # Spec templates
│  └─ scripts/                    # Automation scripts
├─ specifications/
│  ├─ 001-static-documentation-site/
│  │  ├─ spec.md                  # 724 lines
│  │  ├─ plan.md                  # 835 lines
│  │  ├─ tasks.md                 # 1,648 lines
│  │  └─ data-model.md
│  └─ 002-clean-compiler-warnings/
│     ├─ spec.md                  # 120 lines
│     ├─ plan.md                  # 258 lines
│     └─ tasks.md                 # 332 lines
├─ src/                           # Library source
│  └─ WebSpark.HttpClientUtility/
│     ├─ ClientService/
│     ├─ Crawler/
│     ├─ MemoryCache/
│     └─ Streaming/
├─ test/                          # 520 tests (×2 frameworks)
│  └─ WebSpark.HttpClientUtility.Test/
├─ docs/                          # Generated documentation
│  ├─ index.html
│  ├─ getting-started/
│  ├─ api/
│  └─ examples/
├─ .github/
│  ├─ workflows/
│  │  ├─ dotnet.yml              # CI/CD pipeline
│  │  └─ publish-docs.yml        # Doc deployment
│  └─ copilot-instructions.md    # 244 lines
└─ Directory.Build.props         # Solution-wide config

The Spec: Spec 002 - Clean Compiler Warnings

Below is the actual spec that drove me from "unknown number of warnings" to zero warnings with enforcement enabled. It's intentionally explicit and measurable.

# Spec 002: Clean Compiler Warnings

## Summary
Achieve zero compiler warnings across all three projects in the WebSpark.HttpClientUtility solution and enable TreatWarningsAsErrors for CI/CD enforcement.

Target frameworks: net8.0, net9.0
Projects: Library, Test, Web App

## Goals
- Zero compiler warnings in Release and Debug configurations
- Enable TreatWarningsAsErrors solution-wide
- Maintain 100% test pass rate (520 test runs: 260 tests × 2 frameworks)
- Comprehensive XML documentation for all public APIs
- Professional quality baseline for NuGet package

## Non-Goals
- Suppress warnings without fixing root causes
- Compromise API design to avoid warnings
- Skip test documentation (treat tests as product)
- Delay enforcement—enable immediately after cleanup

## Constraints
- Cannot break existing public API contracts
- Cannot reduce test coverage
- Must target both net8.0 and net9.0
- Must pass all 520 existing tests
- Changes must be backward compatible

## Current State Analysis Required
1. Run `dotnet build -c Release -v detailed > build_warnings.txt`
2. Categorize warnings by type (CS1591, CS8602, CA2007, etc.)
3. Prioritize: Public API docs > Null safety > Code analysis
4. Document baseline count per category

## Acceptance Criteria
- `dotnet build` produces 0 warnings and 0 errors
- All 520 tests pass on net8.0 and net9.0
- `TreatWarningsAsErrors` enabled in Directory.Build.props
- XML docs for all public classes, methods, properties
- Null reference warnings resolved (not suppressed)
- Code analysis rules properly configured in .editorconfig

## File Plan
- src/WebSpark.HttpClientUtility/**/*.cs (add XML docs, null checks)
- test/WebSpark.HttpClientUtility.Test/**/*.cs (document test intent)
- Directory.Build.props (enable TreatWarningsAsErrors)
- .editorconfig (configure analyzer severities)
- Build verification scripts

## Done Definition
- Build log shows "0 Warning(s)"
- Test output shows "520 passed"
- CI/CD pipeline passes with TreatWarningsAsErrors
- No #pragma warning disable directives added
- Documentation complete for all public surface area

Notice how the spec avoids prescribing HOW to fix warnings—it defines the target state and constraints, letting the implementer (human or AI) determine the optimal approach.
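
The one prescriptive step, the Current State Analysis audit, is also easy to script. Here's a minimal sketch of such an audit (a hypothetical helper, not code from the repository) that reads the build_warnings.txt log produced in step 1 and reports the baseline count per warning category:

```csharp
// Hypothetical audit helper (not part of the repository). Groups the warnings
// captured by `dotnet build -c Release -v detailed > build_warnings.txt`
// by diagnostic code (CS1591, CS8602, CA2007, ...) to produce the baseline
// counts the spec's Current State Analysis asks for.
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class WarningAudit
{
    static void Main(string[] args)
    {
        string logPath = args.Length > 0 ? args[0] : "build_warnings.txt";

        // Matches MSBuild diagnostic lines such as:
        //   Foo.cs(12,5): warning CS1591: Missing XML comment for publicly visible type...
        var pattern = new Regex(@"warning\s+(?<code>[A-Z]{2}\d{4})", RegexOptions.Compiled);

        var counts = File.ReadLines(logPath)
            .Distinct()                              // the same warning repeats per target framework
            .Select(line => pattern.Match(line))
            .Where(match => match.Success)
            .GroupBy(match => match.Groups["code"].Value)
            .OrderByDescending(group => group.Count());

        foreach (var group in counts)
            Console.WriteLine($"{group.Key}: {group.Count()}");
    }
}
```

Running it once before any fixes gives the documented baseline the spec calls for; running it again after each task shows progress toward the "0 Warning(s)" done definition.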

What I Learned: Spec Weaknesses

This spec worked well, but I made mistakes that cost time:

  • Too vague on "unknown baseline": Should have run the audit FIRST and documented exact warning counts by category
  • Missing priority order: AI tried to fix everything simultaneously. Should have specified: "Fix CS1591 docs first, then null safety, then code analysis"
  • No time estimate: Without "Target: 4 hours" in the spec, I lost focus during implementation
  • Lesson: Even good specs have gaps. The feedback loop caught these issues, and I updated the spec after implementation to reflect what actually worked.

Driving Implementation with the Spec

With SPEC.md in place, Copilot had a clear target and generated the implementation according to the spec's requirements. The actual code changes involved adding XML documentation, implementing null guards, and configuring analyzer rules.
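
To make that concrete, here is a hedged sketch of the remediation pattern—illustrative only, not the actual library source (the class name is invented). XML documentation on every public member clears CS1591, ArgumentNullException.ThrowIfNull replaces pragma suppressions for null-safety warnings, and ConfigureAwait(false) addresses the CA2007 analyzer rule named in the spec.

```csharp
// Illustrative only -- not the actual WebSpark.HttpClientUtility source.
using System;
using System.Net.Http;
using System.Threading.Tasks;

/// <summary>
/// Sends HTTP requests and returns the response body as a string.
/// </summary>
public sealed class HttpRequestSender
{
    private readonly HttpClient _httpClient;

    /// <summary>
    /// Initializes a new instance of the <see cref="HttpRequestSender"/> class.
    /// </summary>
    /// <param name="httpClient">The underlying <see cref="HttpClient"/>. Must not be null.</param>
    public HttpRequestSender(HttpClient httpClient)
    {
        // Null guard instead of a #pragma suppression (spec: fix root causes).
        ArgumentNullException.ThrowIfNull(httpClient);
        _httpClient = httpClient;
    }

    /// <summary>
    /// Gets the response body for <paramref name="requestUri"/>.
    /// </summary>
    /// <param name="requestUri">The absolute URI to request. Must not be null.</param>
    /// <returns>The response content as a string.</returns>
    public async Task<string> GetStringAsync(Uri requestUri)
    {
        ArgumentNullException.ThrowIfNull(requestUri);

        // ConfigureAwait(false) keeps library code free of the CA2007 analyzer warning.
        return await _httpClient.GetStringAsync(requestUri).ConfigureAwait(false);
    }
}
```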

View Implementation Details

For complete before/after code changes, see the WebSpark.HttpClientUtility repository and review the commit history for Spec 001 (documentation) and Spec 002 (zero warnings).

Tests as Acceptance Criteria

Writing tests from the spec gave Copilot unambiguous targets. The existing 520-test suite (260 tests × 2 frameworks) served as acceptance criteria, ensuring no regressions while adding XML documentation and null guards.

Test Coverage
  • 520 tests passing (260 tests across .NET 8.0 and 9.0)
  • Zero test failures throughout implementation
  • XML documentation tests validate all public APIs documented
  • Parameter validation tests ensure ArgumentNullException coverage

The spec's constraint "Must pass all 520 existing tests" made the test suite non-negotiable, preventing shortcuts that might have broken existing functionality.
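
For the test side of that constraint, here is a hypothetical example (not from the repository, shown with xUnit and exercising the HttpRequestSender sketch above) of what "tests as product documentation" looks like: the XML comment states WHAT the test proves and WHY it matters, and the test itself serves as the ArgumentNullException acceptance criterion.

```csharp
// Hypothetical tests -- not taken from the repository. Assumes xUnit and the
// HttpRequestSender sketch shown earlier.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Xunit;

public class HttpRequestSenderTests
{
    /// <summary>
    /// WHAT: Constructing the sender with a null HttpClient throws ArgumentNullException.
    /// WHY: The spec requires null warnings to be fixed with guards, not suppressed,
    /// so the guard itself is part of the documented public contract.
    /// </summary>
    [Fact]
    public void Constructor_NullHttpClient_ThrowsArgumentNullException()
    {
        Assert.Throws<ArgumentNullException>(() => { _ = new HttpRequestSender(null!); });
    }

    /// <summary>
    /// WHAT: GetStringAsync rejects a null URI before any network call is made.
    /// WHY: Fail-fast argument validation is easier to diagnose than a downstream
    /// HttpRequestException and keeps the XML-documented contract honest.
    /// </summary>
    [Fact]
    public async Task GetStringAsync_NullUri_ThrowsArgumentNullException()
    {
        var sender = new HttpRequestSender(new HttpClient());
        await Assert.ThrowsAsync<ArgumentNullException>(() => sender.GetStringAsync(null!));
    }
}
```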

Packaging for NuGet

One of the most valuable outcomes of the Spec Kit approach was integrating CI/CD directly into the "Done Definition." GitHub Actions workflows became gatekeepers that validated every change before allowing new versions to ship.

CI Pipeline (Continuous Integration)

Runs on every push and pull request to validate code quality:

  • Restore dependencies - Ensure all packages resolve correctly
  • Build solution - Compile with TreatWarningsAsErrors enabled
  • Run 520 tests - Execute full test suite across net8.0 and net9.0
  • Code coverage - Track test coverage metrics
  • Fail fast - Block PRs if any step fails
Publish Pipeline (Release)

Triggered by version tags (e.g., v1.5.0) to deploy to NuGet.org:

  • Build Release configuration - Full optimization enabled
  • Pack NuGet package - Generate .nupkg with metadata
  • Run final tests - Last validation before publish
  • Push to NuGet.org - Automated deployment with API key
  • Skip duplicates - Prevent accidental republish
Quality Gates in Action

The CI/CD pipeline enforces the spec's constraints automatically:

  • Zero warnings requirement - Build fails if warnings appear (TreatWarningsAsErrors)
  • Test coverage mandate - 520 tests must pass before any merge
  • API contract validation - Tests prevent breaking changes
  • Manual release control - No NuGet publish without explicit version tag

This automation simplified the release process dramatically. Instead of manually running tests, checking warnings, packing, and publishing, a single git tag v1.5.1 command triggers the entire validated pipeline. The GitHub Spec Kit "Done Definition" became executable infrastructure, not just documentation. For complete workflow details, see the .github/workflows directory in the repository (https://github.com/markhazleton/WebSpark.HttpClientUtility/tree/main/.github/workflows).

Results: Zero Documentation Debt

Over two specifications on WebSpark.HttpClientUtility, I achieved the real win: specs that still accurately describe the codebase three months later. Implementation took the same 7 hours it always does. Documentation sync took an additional 20 minutes. That 20 minutes is what traditional approaches skip—and why documentation always becomes outdated.

Quantitative Outcomes: What Actually Matters
  • 20 minutes to sync specs (vs. never updating them in traditional approach = zero documentation debt)
  • Path resolution pattern documented (relativePath filter replaces pathPrefix—next developer won't repeat the mistake)
  • Warning remediation strategy captured (XML docs + ArgumentNullException.ThrowIfNull() pattern now team standard)
  • Test documentation philosophy formalized ("Tests are product documentation" principle added to constitution)
  • 136 files changed (29,141 insertions, 3,167 deletions across 2 production releases)
  • 520/520 tests passing (260 tests × 2 frameworks: net8.0 + net9.0 with zero regressions)

The Long-Term Value: What Happens After You Ship

| Metric | With Spec Kit | Typical Ad Hoc | Impact Over 12 Months |
|---|---|---|---|
| Implementation time | 7 hours | 7 hours | No difference initially |
| Documentation sync | 20 minutes (AI-assisted) | Never happens | Zero technical debt accumulation |
| Spec accuracy after 6 months | Matches implementation | Fiction | New developers trust docs |
| Knowledge persistence | Survives team turnover | Walks out the door | No single points of failure |
| Next feature cost | AI reads accurate patterns | Developer reinvents solutions | Compounds with each feature |
| Onboarding time | Read specs that match code | Reverse-engineer from codebase | 3 hours saved per developer |

The spec-driven approach doesn't make you faster initially—it prevents knowledge decay. When you return to the codebase a year later, or when a new team member joins, the specs accurately describe what was built and why. That's institutional knowledge, not tribal knowledge.

The Feedback Loop in Practice

Each time AI generated wrong code, I fixed it and had AI update the specs. Here's why that matters: these lessons are now permanent documentation that future developers (and AI agents) will read before making changes.

Three Implementation Lessons That Became Institutional Knowledge

Path Resolution: Spec Said One Thing, Reality Required Another
  • AI generated: Absolute paths using pathPrefix config (standard Eleventy approach)
  • What broke: GitHub Pages subdirectory deployment
  • I fixed it: Custom relativePath filter that calculates paths dynamically
  • Then I closed the loop: "Update SPEC.md and PLAN.md to document why pathPrefix failed and what works instead"
  • Result: SPEC.md now says "No environment-specific configuration." PLAN.md shows pathPrefix crossed out with the working alternative. Next developer won't try pathPrefix because the spec explains why it doesn't work.
Warning Suppression: Spec Was Too Vague
  • AI generated: #pragma warning disable directives (fastest solution)
  • Spec said: "No suppressions" but didn't say HOW to fix properly
  • I fixed it: 200+ XML docs, null guards with ArgumentNullException.ThrowIfNull()
  • Then I closed the loop: "Update SPEC.md with specific examples of acceptable vs. unacceptable fixes"
  • Result: SPEC.md now has a "✅ DO / ❌ DON'T" section. PLAN.md has a 5-step remediation strategy. TASKS.md breaks it into auditable chunks. Future features inherit this standard.
Test Documentation: Spec Didn't Ask, AI Didn't Deliver
  • AI generated: Documented library code, skipped test methods entirely
  • Spec said: "520 tests passing" but not "tests need documentation"
  • I fixed it: Added XML docs to 260 test methods explaining WHAT and WHY
  • Then I closed the loop: "Update SPEC.md to require test documentation. Add principle to CONSTITUTION.md: 'Tests are product documentation.'"
  • Result: Every future spec inherits "tests need docs" standard. AI reads the constitution before generating code. The team's quality bar persists beyond individual developers.

Why This Solves a 40-Year-Old Problem

In waterfall, specs froze at design and diverged immediately. In agile, we stopped writing specs because maintaining them was humanly impossible. GitHub Spec Kit closes the loop: when implementation teaches you something, you spend 20 minutes having AI update the specs. The path resolution lesson, the warning fix patterns, the test documentation standard—all permanent institutional knowledge that AI agents read before generating the next feature. That's what survives team turnover.

Frequently Asked Questions

Does Spec Kit only work with GitHub Copilot?
No. The pattern is model-agnostic. Any LLM benefits from structured specs and tests.

How does Spec Kit compare to test-driven development (TDD)?
It's complementary. Spec Kit codifies requirements and examples up front, then TDD validates them. The twist is that you're writing for humans and an AI partner simultaneously.

What if a feature is too big or too uncertain to specify up front?
Break it into spec-able slices. Use research spikes to learn, then spec the actionable parts.

What if the AI keeps generating code that doesn't match the spec?
Tighten the spec, add failing tests for the misbehavior, and iterate. Avoid changing code and spec in opposite directions.

Following the Spec Kit Flow

Ready to try it yourself? Here's the recommended workflow with the slash commands that guide Copilot through each phase. Each command has a specific purpose and builds on the previous steps.

| Step | Command | Purpose | When to Use |
|---|---|---|---|
| 1 | /speckit.constitution | Define your quality principles and coding standards | Once per project - sets foundational guidelines |
| 2 | /speckit.specify | Declare WHAT to build and WHY it matters | Start of each feature/spec - defines the goal |
| 3 | /speckit.clarify | Answer ambiguities and edge cases | After specify - usually just once to refine requirements |
| 4 | /speckit.plan | Determine HOW to implement (tools, versions, approach) | Before coding - establishes technical strategy |
| 5 | /speckit.tasks | Break work into discrete, testable chunks | After planning - creates actionable task list |
| 6 | /speckit.analyze | Optional sanity check before execution | For complex specs - validates approach before commit |
| 7 | /speckit.implement | Execute the plan with Copilot | Final step - let AI generate the code |
Example Flow: Warning Cleanup Spec

For a spec like "eliminate all compiler warnings", you'd follow this sequence:

  1. /speckit.constitution: "We treat warnings as errors. No pragma suppressions without justification."
  2. /speckit.specify: "WHAT: Zero warnings with TreatWarningsAsErrors enabled. WHY: Professional code quality and prevent tech debt."
  3. /speckit.clarify: "Do we need XML docs on internal classes? No, public APIs only."
  4. /speckit.plan: ".NET 8.0/9.0 multi-targeting. Use ArgumentNullException.ThrowIfNull(). Configure .editorconfig."
  5. /speckit.tasks: "1) Baseline audit, 2) XML docs, 3) Null guards, 4) Analyzer config, 5) Verify build."
  6. /speckit.analyze: "Spot-check: Will this break any public APIs? No - only adding docs and guards."
  7. /speckit.implement: "Execute tasks, run tests after each phase, commit incrementally."

The power of this flow is that each step constrains Copilot's focus. Instead of trying to solve everything at once, you guide it through a logical progression that produces reliable, well-documented results.

The Critical Step: Close the Documentation Loop

Here's the reality: after /speckit.implement completes, you'll tweak edge cases, adjust UX, and fix bugs the AI missed. This iteration is expected and normal. What's different is what you do next.

THIS is Where the Value Lives

When you finally get it right, tell the agent to update SPEC.md, PLAN.md, and TASKS.md to reflect what you actually built. This 20-minute step is what traditional approaches skip—and why documentation always becomes outdated.

Example: "I fixed the GitHub Pages path resolution by implementing a custom relativePath filter. Please update SPEC.md and PLAN.md to reflect this solution instead of the original pathPrefix approach, and explain why pathPrefix failed."

Without Feedback Loop
  • Specs describe what you planned, not what you built
  • Future developers follow outdated documentation
  • Institutional knowledge lives only in your head
  • Next feature repeats the same mistakes
With Feedback Loop
  • Specs evolve to match reality (living documentation)
  • Future developers see what actually works
  • Team learns from real-world implementation
  • Each spec becomes more accurate over time

The feedback loop keeps specs synchronized with reality. Your specs document what you built and what you learned—useful for your future self and your team.

Conclusion: Institutional Knowledge That Survives Team Turnover

The win wasn't the 7-hour implementation—it was specs that still accurately describe the codebase three months later. Path resolution decisions, warning remediation strategies, test documentation standards—all captured in markdown that AI reads before generating new features. That's institutional knowledge that survives team turnover, not tribal knowledge that walks out the door.

Use Spec Kit When:

  • Institutional knowledge matters: Libraries, APIs, multi-year projects where team turnover is inevitable
  • Multiple developers: When onboarding cost of inaccurate documentation compounds over time
  • Long-term maintenance: Projects that will be maintained beyond the original author
  • Compliance requirements: When you need audit trails and documented decision rationale
  • Complex domain logic: When tribal knowledge creates single points of failure

Skip Spec Kit When:

  • Throwaway prototypes: POCs you'll rewrite from scratch if successful
  • Solo projects with short lifespans: Personal tools you'll maintain alone for 6 months
  • Exploratory work: Research spikes where requirements are genuinely unknown
  • Trivial features: Single-file utilities that don't need coordination
  • Time-critical emergencies: Production fires where documentation can wait
What This Project Delivered: Persistent Knowledge
  • Path resolution pattern: Custom relativePath filter documented in specs—future developers won't repeat the pathPrefix mistake
  • Warning remediation strategy: XML docs + ArgumentNullException.ThrowIfNull() pattern captured in PLAN.md—team inherits the standard
  • Test documentation philosophy: "Tests are product documentation" principle added to CONSTITUTION.md—applies to all future features
  • Specs that match reality: Documentation updated after implementation reflects what actually works, not what was initially planned
  • Zero documentation debt: 20 minutes to sync specs vs. never updating them = institutional knowledge that survives team turnover

Start small: one .specify/ directory, one SPEC.md with clear acceptance criteria. After implementation, spend 20 minutes having the agent update the spec to match what you actually built. Six months later, you'll thank yourself for the accurate documentation.