DevSpark in Practice: A NuGet Package Case Study
The DevSpark series describes the methodology. This article shows it. Four consecutive feature specifications on WebSpark.HttpClientUtility — a production .NET NuGet package — cover a documentation site, compiler warning cleanup, a package split, and a new batch execution feature. Each spec illuminated something different about what spec-driven development costs, what it saves, and what it preserves.
The Problem with Methodology Articles
The rest of this series describes the methodology. What it is, why it matters, how it evolved, what it costs, what it saves. That's the right place to start. But describing a process and showing it are two different things.
This article shows it.
WebSpark.HttpClientUtility is a production .NET NuGet package I maintain — utilities for managing HttpClient with resilience, caching, telemetry, concurrent execution, and web crawling. It has real consumers, runs against multiple target frameworks (.NET 8, 9, and 10), and has years of accumulated decisions baked into it. A genuinely brownfield codebase, not a demo project.
Over the past several months, I applied DevSpark to four consecutive features in this package. Each spec is in the repository under /specs/. Each solved a different type of problem — a documentation site, a quality cleanup, an architectural split, and a new feature. Each one illuminated something different about what the methodology costs, what it saves, and what it preserves.
This is what it actually looked like.
The Codebase Before Specs
Before the first spec was written, HttpClientUtility was a working, tested, useful library with a real problem: its institutional knowledge lived entirely in my head.
Why did the FireAndForget implementation use a specific exception-handling pattern? Commit history told you what changed, not why. Why did the resilience policies distinguish transient from non-transient errors in that particular way? There was a reason — a production incident that taught it — but it was undocumented.
This is the standard state for solo-maintained open-source libraries. The author knows everything. The documentation covers the happy path. The edge cases and architectural decisions that shaped the code exist nowhere except memory that fades.
The four specs changed that.
Spec 001: The Documentation Site
Type of problem: Greenfield within brownfield — a new artifact for an existing project
The initial requirement was approximately: "Build a documentation website for the NuGet package." That's a vague enough prompt that an AI assistant would generate something functional and wrong — a generic static site that breaks immediately on GitHub Pages because subdirectory deployment requires relative paths, and "standard" static site configurations use absolute paths.
The spec forced a different conversation first.
By the time the clarification loop was done, the requirement had become specific: Eleventy 3.0 (over Hugo, Jekyll, and Astro — with the evaluation criteria documented), GitHub Pages deployment at a specific subdirectory path, all paths relative and dynamically calculated, NuGet API integration with cache fallback for build resilience, Prism.js with exactly four target languages (C#, JavaScript, JSON, PowerShell — no kitchen sink), and performance targets of 90+ Lighthouse scores across all categories.
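The cache-fallback requirement is worth making concrete. The site itself builds with Eleventy, but the pattern translates directly into C# terms. A minimal sketch — the endpoint is NuGet's public registration API, while the cache path and names are assumptions for illustration, not the site's actual build code:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class NuGetMetadataFetcher
{
    // Assumed cache location; the real build keeps its own.
    private const string CachePath = "cache/nuget-metadata.json";

    public static async Task<string> GetMetadataAsync(HttpClient http, string packageId)
    {
        // NuGet's public registration endpoint expects a lower-case package id.
        string url = $"https://api.nuget.org/v3/registration5-semver1/{packageId.ToLowerInvariant()}/index.json";
        try
        {
            string json = await http.GetStringAsync(url);
            Directory.CreateDirectory(Path.GetDirectoryName(CachePath)!);
            await File.WriteAllTextAsync(CachePath, json); // refresh the cache on every successful fetch
            return json;
        }
        catch (HttpRequestException) when (File.Exists(CachePath))
        {
            // API unavailable: build from the last known-good response instead of failing.
            return await File.ReadAllTextAsync(CachePath);
        }
    }
}
```

The point of the pattern is that a NuGet.org outage degrades the build to slightly stale metadata rather than breaking it.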
The specification document alone ran to over 37KB. That sounds like a lot until you consider what it replaced: multiple iterations of generate-test-debug-regenerate trying to make a static site deploy correctly to a subdirectory.
What survived in the spec artifacts that wouldn't have survived otherwise: the rationale for choosing Eleventy over the alternatives. When I revisit this in a year, I won't have to reconstruct why Astro was considered and rejected, or why Jekyll was ruled out. The decision is documented where the decision lives.
What the spec caught: The path resolution requirement — every path must be relative, calculated dynamically — would have been invisible until the first deployment attempt. Catching it at spec time cost an hour of thinking. Catching it at deployment time would have cost debugging time plus the confidence tax of watching generated code fail in production.
Spec 002: Zero Compiler Warnings
Type of problem: Quality remediation — cleaning up what had accumulated
"Fix the compiler warnings" is a deceptively vague requirement. It sounds obvious. What it doesn't specify: what constitutes a proper fix. The obvious answer — #pragma warning disable — silences the warning. The right answer depends on a quality standard that has to be stated explicitly, not assumed.
The spec for this feature had to answer eight clarification questions before planning could begin. Among them:
- What counts as an acceptable fix for a nullable reference type warning?
- Are `#pragma` suppressions ever acceptable, and if so, under what conditions and with what documentation?
- Do test methods require XML documentation? If so, what level of detail?
- What is the sequencing when one warning fix reveals another?
These aren't edge cases. They're the substance of the work. Without explicit answers, an AI assistant will make reasonable-sounding choices that may or may not match your actual standards — and you won't know until you're reviewing code.
The spec answered them with specificity. The fix strategy: `ArgumentNullException.ThrowIfNull()` for null guards (not conditional checks), full XML documentation on all public APIs (including test methods that document what and why, not just what), `TreatWarningsAsErrors` enabled in `Directory.Build.props`, and suppressions limited to a maximum of five with documented justification for each.
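As a minimal sketch of what that standard looks like in code — the class and method names here are illustrative, not the package's actual API:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpClientExtensions
{
    /// <summary>
    /// Sends <paramref name="request"/> and returns the response body as a string.
    /// </summary>
    /// <param name="client">The client used to send the request.</param>
    /// <param name="request">The request message to send.</param>
    /// <returns>The response body, after a success status check.</returns>
    /// <exception cref="ArgumentNullException">
    /// Thrown when <paramref name="client"/> or <paramref name="request"/> is null.
    /// </exception>
    public static async Task<string> SendForStringAsync(this HttpClient client, HttpRequestMessage request)
    {
        // Spec 002's fix strategy: BCL guard clauses, not conditional checks,
        // and not a #pragma that merely hides the nullable warning.
        ArgumentNullException.ThrowIfNull(client);
        ArgumentNullException.ThrowIfNull(request);

        using HttpResponseMessage response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

With `TreatWarningsAsErrors` set in `Directory.Build.props`, code that falls short of this pattern fails the build instead of accumulating as warnings.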
The result — a zero-warning build across net8.0 and net9.0, with 711 tests passing — reflected those standards exactly. More importantly: the standards are now in the spec artifacts. The next feature that touches public APIs inherits the documentation requirement. The quality bar compounds.
What the spec caught: The instinct to silence warnings rather than fix them. That instinct is fast and feels productive. The spec created a gate before that shortcut could be taken — not by prohibiting it, but by forcing explicit articulation of the alternative. When you have to write down "no pragma suppressions except with documented justification," it changes what feels like a reasonable default.
Spec 003: Splitting the Package
Type of problem: Architectural — changes with real consequences for real consumers
This is where spec-driven development most clearly earned back its overhead.
The core question: WebSpark.HttpClientUtility had grown into a monolithic package. The SiteCrawler and related web crawling components pulled in five additional specialized dependencies that core HTTP utility users didn't need. Every consumer was carrying those dependencies regardless of whether they ever crawled a site.
The solution — split into a base package and a crawler extension package — was obvious in principle and genuinely complex in execution. Architectural splits affect consumers. Breaking changes in NuGet packages are painful for everyone downstream. Package versioning, strong-name signing, GitHub Actions atomic publishing, and consumer migration all had to be designed before the first line changed.
The spec surfaced the decisions that had to be made:
- Lockstep versioning: Both packages would always release at the same version number, eliminating the consumer confusion of mismatched versions
- Zero breaking changes for core users: Consumers who never imported crawling features would upgrade without a single code change
- Explicit breaking change communication for crawler users: They would need to add the new package and register the crawler service separately — documented in the migration guide before implementation began (see the registration sketch after this list)
- Atomic publishing: The GitHub Actions workflow would publish both packages in a single atomic operation or roll back both, eliminating the split-brain state where a consumer could install mismatched versions
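To make the consumer impact concrete, here is a hypothetical sketch of the post-split registration surface. The extension method names are placeholders, not the package's confirmed API — the migration guide in the repository has the real ones — and the stubs exist only so the sketch compiles:

```csharp
using Microsoft.Extensions.DependencyInjection;

public static class CoreRegistrationStub     // stands in for the base package
{
    public static IServiceCollection AddHttpClientUtility(this IServiceCollection s) => s;
}

public static class CrawlerRegistrationStub  // stands in for the crawler extension package
{
    public static IServiceCollection AddHttpClientCrawler(this IServiceCollection s) => s;
}

public static class MigrationSketch
{
    public static void Configure(IServiceCollection services)
    {
        services.AddHttpClientUtility();   // core consumers: unchanged by the split

        services.AddHttpClientCrawler();   // crawler consumers: one added call,
                                           // plus the new package reference
    }
}
```

Core consumers compile and run with zero changes; crawler consumers make one explicit, visible addition — which is exactly the breaking-change boundary the spec defined.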
The implementation-complete note records the outcome: base package at 9 dependencies (reduced from 13), crawler package with 5 additional specialized dependencies, a package size reduction of over 40%, and zero unexpected breaking changes.
What would have happened without a spec? The split would have happened — the problem was real enough that it couldn't be ignored. But the decisions about versioning, breaking change communication, and atomic publishing would have been made ad hoc during implementation, under pressure, when the cost of getting them wrong was highest. The spec moved those decisions to the cheapest possible point: before any code changed.
What the spec caught: The atomic publishing requirement. It's not obvious until you're staring at a scenario where package A is published and package B fails — and you've just shipped a state to NuGet.org that consumers can encounter. The spec forced thinking through the failure modes before they could occur.
| Decision | Where It Was Made | Cost of Revising It There |
|---|---|---|
| Lockstep versioning | Spec time | Documentation update |
| Zero breaking changes for core | Spec time | Spec revision |
| Atomic publish or rollback | Spec time | GitHub Actions configuration |
| Migration guide for crawler users | Spec time | Writing the guide |
| Any of the above | Post-release | Consumer complaints, hotfix, comms, apology |
Spec 004: Batch Execution Orchestration
Type of problem: New feature — capability that didn't exist before
The batch execution spec is the most recent, and the most instructive about how specs evolve during implementation.
The feature: template-based batch HTTP request execution with combinatorial parameter substitution, configurable concurrency, and percentile statistics collection (P50, P95, P99). Think load-testing tooling that runs inside the library itself — useful for developers characterizing API behavior.
The spec captured seven design decisions that would have been made silently otherwise:
- The orchestrator is independent from the existing decorator chain (not an extension of the resilience or caching layers, but a separate service above them)
- Template substitution uses `{placeholder}` tokens with a special `{{encoded_user_name}}` token for URL-safe encoding — a single-pass renderer with no recursive processing (sketched after this list)
- Statistics use snapshot-based percentile calculation — samples collected into a sorted array at completion rather than streaming approximations (also sketched below)
- Hashing uses SHA-256 from the BCL (no external dependency added for response body comparison)
- Progress reporting uses REST-friendly polling for the demo UI, not a SignalR dependency — keeping the core library free of real-time infrastructure concerns
- The batch runner is opt-in via `HttpClientUtilityOptions`, not registered by default
- Demo runs are capped at 50 requests to prevent the sample application from being used as an unintentional load-testing tool
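The single-pass substitution decision is small enough to sketch. Assuming `{name}` inserts a raw value and `{{name}}` inserts the URL-encoded value, a minimal renderer might look like this — an illustration of the rule, not the package's implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TemplateRenderer
{
    // One left-to-right scan; substituted values are never re-scanned,
    // so a value containing "{" cannot trigger recursive expansion.
    public static string Render(string template, IReadOnlyDictionary<string, string> values)
    {
        ArgumentNullException.ThrowIfNull(template);
        ArgumentNullException.ThrowIfNull(values);

        var sb = new StringBuilder(template.Length);
        int i = 0;
        while (i < template.Length)
        {
            if (template[i] != '{') { sb.Append(template[i++]); continue; }

            bool encoded = i + 1 < template.Length && template[i + 1] == '{';
            int start = i + (encoded ? 2 : 1);
            int end = encoded
                ? template.IndexOf("}}", start, StringComparison.Ordinal)
                : template.IndexOf('}', start);
            if (end < 0) { sb.Append(template[i++]); continue; }  // unterminated brace: emit literally

            string key = template[start..end];
            int next = end + (encoded ? 2 : 1);
            if (values.TryGetValue(key, out var value))
                sb.Append(encoded ? Uri.EscapeDataString(value) : value);
            else
                sb.Append(template, i, next - i);                 // unknown token kept verbatim
            i = next;
        }
        return sb.ToString();
    }
}
```

So `Render("/users/{{encoded_user_name}}?page={page}", ...)` encodes the user name for the URL, substitutes the page number raw, and never re-processes its own output.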
Each of these decisions has a reason. The spec recorded the reasons at the moment they were made.
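Two of those reasons fit in a few lines of illustrative code: the snapshot percentile model and the BCL-only hashing decision. Again a sketch under the same caveat — this shows the approach, not the shipped implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

static class BatchStatistics
{
    // Snapshot model: collect every sample, sort once at batch completion,
    // then read percentiles by rank — no streaming approximation.
    public static double Percentile(IReadOnlyList<double> samplesMs, double p)
    {
        ArgumentNullException.ThrowIfNull(samplesMs);
        if (samplesMs.Count == 0)
            throw new InvalidOperationException("No samples collected.");

        var sorted = new List<double>(samplesMs);
        sorted.Sort();

        // Nearest-rank method: for P95 over 100 samples, rank 95 -> index 94.
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Count);
        return sorted[Math.Clamp(rank - 1, 0, sorted.Count - 1)];
    }

    // Response-body comparison with the BCL's SHA-256 — the "no new dependency"
    // decision means no external hashing package gets pulled in.
    public static string BodyFingerprint(byte[] body) =>
        Convert.ToHexString(SHA256.HashData(body));
}

// Usage: var p95 = BatchStatistics.Percentile(durations, 95);
//        var p99 = BatchStatistics.Percentile(durations, 99);
```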
The implementation deviated from the original plan in one significant area: the SignalR progress reporting, initially scoped as a core feature, was moved to the demo application only. That deviation was fed back into the spec artifacts — the implementation notes document the why. Future work on the batch execution feature inherits that boundary without having to rediscover it.
What the spec caught: The scope creep tendency. Batch execution is adjacent to load testing, which is adjacent to reporting, which is adjacent to dashboards. Without a spec defining the boundaries explicitly, each of those adjacencies is an invitation to expand scope. The spec's explicit opt-in registration requirement was the clearest signal that this was a focused utility feature, not a platform.
What Four Specs Built
Looking across the four specifications, something more durable than the features themselves accumulated: a set of documented decisions that form the beginning of a real constitution for this package.
Not a formal constitution yet. But the raw material:
- Path resolution standard: All links and asset references must be relative, calculated dynamically. No environment-specific configuration in templates. (From Spec 001)
- Documentation standard: All public APIs require XML documentation. Test methods require documentation explaining what and why, not just what. (From Spec 002)
- Warning standard: Zero warnings in all target frameworks. `TreatWarningsAsErrors` enabled. Suppressions require documented justification with a maximum of five. (From Spec 002)
- Package boundaries: Core utilities ship without crawling dependencies. Crawler extensions ship separately. Both packages version in lockstep. (From Spec 003)
- Dependency discipline: No external dependencies added for functionality available in the BCL. New dependencies require explicit justification in the spec. (From Spec 004)
- Scope control: New features default to opt-in registration. Demo-specific functionality stays in the demo project, not the library. (From Spec 004)
These aren't aspirational principles. They're documented decisions that shaped real code — with the rationale preserved in the spec artifacts. The next contributor who opens this repository won't have to rediscover them. The next AI session that reads the specs won't assume generic patterns where specific decisions already exist.
That's what four specs built, beyond the four features.
The Honest Accounting
The overhead is real. Each spec took several hours to write and refine before any implementation began. The documentation site spec alone runs to 37KB before the plan or tasks — that's a substantial investment for a feature that a competent developer could prototype in an afternoon.
But the accounting has to include both sides:
| Feature | What the Spec Prevented | Estimated Cost of Discovery Without Spec |
|---|---|---|
| Documentation site | Wrong deployment path strategy | 2-4 hours debugging post-deployment |
| Compiler warnings | Suppression-first instinct | Quality debt that compounds silently |
| Package split | Ad-hoc versioning decisions under pressure | Consumer-facing breaking changes, hotfix cycle |
| Batch execution | Scope drift into load-testing platform | Weeks of scope creep, architectural coupling |
The documentation site spec paid for itself the first time the GitHub Pages deployment succeeded without iteration. The package split spec paid for itself by moving the consumer-impact decisions to the cheapest possible moment.
The compiler warnings spec paid for itself differently — not in prevented debugging, but in accumulated quality standards that compound across every subsequent feature. That's the payoff that's hardest to measure but most durable.
Resources
The full specification artifacts for all four features are available in the WebSpark.HttpClientUtility repository — including spec, research, data model, plan, tasks, and quickstart documents for each feature. Reading the actual artifacts alongside this article shows what spec-driven development produces at each stage of the pipeline.
Explore More
- DevSpark: Months Later, Lessons Learned -- What months of real spec-driven development actually taught me about AI
- Why I Built DevSpark -- Building the tool I needed to survive the reality of brownfield development
- DevSpark: Constitution-Driven AI for Software Development -- DevSpark aligns AI coding agents with project architecture and governance
- Taking DevSpark to the Next Level -- A practitioner's guide to bridging three decades of enterprise development
- DevSpark: The Evolution of AI-Assisted Software Development -- From requirements discipline to continuous governance — a complete framework
