Extending GitHub Spec Kit for Constitution-Based Pull Request Reviews

January 24, 2026 | 10 min read

Every mature codebase accumulates institutional knowledge that lives in scattered places. This article explores how to extend a SpecKit-based development workflow to perform AI-powered pull request reviews that validate changes against a project constitution—a living document capturing architectural principles, anti-patterns, and non-negotiable standards.

Part Two: Constitution-Based PR Reviews

Series: Part 1 | Part 2 | Part 3 | Part 4

If a project constitution is valuable during feature development (as explored in Part 1), it should also govern code reviews.

This article shows how to extend Spec Kit to perform AI-powered PR reviews that validate changes against your constitution—turning subjective code review disagreements into objective governance.

The Problem: Code Reviews Without Context

Every codebase has rules. Most are implicit, scattered, contradictory, or forgotten. This knowledge traditionally lives in problematic places:

Location	Problem
Senior developers' heads	Single point of failure
Scattered code comments	Nobody reads them
Wiki pages	Always outdated
PR review comments	Lost after merge

The result? Reviewers rediscover the same issues repeatedly. Junior developers unknowingly break conventions. Everyone wastes time explaining context that should be obvious.

The Solution: A Living Constitution

A GitHub Spec Kit constitution is a governing document for a repo that captures the fundamental rules your project must always follow. It encodes things like coding standards, testing requirements, security and privacy constraints, UX and accessibility expectations, and any architectural “musts” or “must nots.” The idea is that every spec, plan, and change generated with Spec Kit has to respect this constitution, so your system stays coherent over time instead of drifting with each new feature.

Because it acts as the project’s “source of truth” for intent, a good constitution is concrete and enforceable, not aspirational fluff. Rather than saying “we care about performance,” it might say “API p95 latency must stay under 300 ms for core operations” and “every new endpoint requires contract tests.” The clearer and more testable the rules are, the easier it is for both humans and AI to make consistent decisions without having to re-negotiate standards on every change.
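To make "testable" concrete, here is a minimal Python sketch that turns the example latency rule into an executable check. The 300 ms threshold comes from the example above. The function names and the use of `statistics.quantiles` (default "exclusive" method) to compute p95 are assumptions; monitoring tools may compute percentiles slightly differently.

```python
import statistics

# Threshold from the example rule "API p95 latency must stay under 300 ms".
P95_LIMIT_MS = 300.0


def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency; needs at least two samples."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(latencies_ms, n=100)[94]


def meets_latency_rule(latencies_ms: list[float], limit_ms: float = P95_LIMIT_MS) -> bool:
    """True when the sample satisfies the p95 rule."""
    return p95(latencies_ms) < limit_ms


# Example: 90 fast requests and 10 slow ones push p95 over the limit.
samples = [120.0] * 90 + [450.0] * 10
print(f"p95={p95(samples):.0f} ms, compliant={meets_latency_rule(samples)}")
```

A rule phrased this way leaves nothing to negotiate in review: either the measured sample passes or it does not.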

In practice, you usually create the constitution early, then evolve it carefully as the system and organization mature. Many teams treat it as “slow to change but not frozen”: updates go through review, are versioned, and are communicated explicitly. That way, when you adjust something big—like adopting a new auth model, enforcing WCAG AA, or mandating feature flags for risky changes—you do it once in the constitution, and every subsequent spec generated by Spec Kit naturally follows the new rules.

The constitution includes:

Architectural Principles: What patterns are required and why
Anti-Patterns: What mistakes we've made before (with code examples)
Enforcement Mechanisms: How violations are detected
Amendment Process: How to change the rules (requires approval)

The distinction from ordinary documentation matters: documentation is written for people to read, while a constitution is structured for machine parsing and automated enforcement.

Connecting the Constitution to PR Reviews

If you're already using SpecKit for specification-driven development (the /speckit.specify, /speckit.plan, and /speckit.implement workflows), you can extend the same agentic approach to code reviews.

The Architecture

A SpecKit PR review is implemented in three layers:

+-------------------------------------+
|  /speckit.pr-review 99999           |  <- GitHub Copilot prompt file
+-------------------------------------+
|  pr-review.ps1                      |  <- PowerShell script (Azure DevOps API)
+-------------------------------------+
|  constitution.md                    |  <- Source of truth for review criteria
+-------------------------------------+

Layer 1: The Prompt (speckit.pr-review.prompt.md)

This structured prompt transforms your AI assistant from a generic code reviewer into a constitutional lawyer for your codebase:

### Goal
Perform a comprehensive code review of an Azure DevOps pull request,
validating all changes against the project constitution
(`.specify/memory/constitution.md`) and coding standards.

#### 1. Load Constitutional Framework
Read the project constitution and extract:
- Architectural Principles
- Development Workflow Standards
- Quality & Consistency Standards
- Anti-Patterns to Detect (Appendix A)
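Step 1 of the prompt ("Load Constitutional Framework") is something the AI does by reading the file, but the same extraction can be sketched deterministically. This minimal Python version assumes the constitution uses `##` headings whose titles contain the section names listed in the prompt (for example "## Appendix A: Anti-Patterns to Detect"); the function names are hypothetical.

```python
import re

# Section titles the prompt asks the reviewer to extract.
REQUIRED_SECTIONS = [
    "Architectural Principles",
    "Development Workflow Standards",
    "Quality & Consistency Standards",
    "Anti-Patterns to Detect",
]

HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def split_sections(markdown: str, level: int = 2) -> dict[str, str]:
    """Map each heading at `level` to the text beneath it, including sub-headings."""
    sections: dict[str, list[str]] = {}
    current = None
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '# comments' inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= level:
            # A heading at this level (or higher) starts or closes a section.
            current = match.group(2) if len(match.group(1)) == level else None
            if current is not None:
                sections[current] = []
            continue
        if current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


def load_framework(markdown: str) -> dict[str, str]:
    """Return the required sections, matching titles case-insensitively by substring."""
    sections = split_sections(markdown)
    found = {}
    for wanted in REQUIRED_SECTIONS:
        for title, body in sections.items():
            if wanted.lower() in title.lower():
                found[wanted] = body
                break
    return found
```

Sections that are missing simply do not appear in the result, which is itself a useful signal that the constitution has drifted from the structure the prompt expects.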

Layer 2: The API Integration (pr-review.ps1)

A PowerShell script fetches PR metadata from Azure DevOps:

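function Get-PullRequestMetadata {
    param(
        [string]$PrId,
        [hashtable]$Headers
    )

    # Relative endpoint. Invoke-AzureDevOpsApi is a helper elsewhere in
    # pr-review.ps1 (not shown) that performs the authenticated REST call.
    $endpoint = "git/pullRequests/$PrId"
    return Invoke-AzureDevOpsApi -Endpoint $endpoint -Headers $Headers
}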
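The helper that performs the actual HTTP call is not shown. For orientation, here is a minimal Python sketch of the request it would need to make, assuming Azure DevOps Services (`dev.azure.com`), the project-scoped "Get Pull Request By Id" endpoint, and a bearer token obtained from `az account get-access-token` for the Azure DevOps resource. The organization and project names are placeholders, and the pinned API version is an assumption.

```python
import urllib.parse
import urllib.request

API_VERSION = "7.1"  # assumed REST API version; pin what your organization supports


def pr_metadata_url(organization: str, project: str, pr_id: int) -> str:
    """Project-scoped 'Get Pull Request By Id' URL for Azure DevOps Services."""
    org = urllib.parse.quote(organization, safe="")
    proj = urllib.parse.quote(project, safe="")
    query = urllib.parse.urlencode({"api-version": API_VERSION})
    return f"https://dev.azure.com/{org}/{proj}/_apis/git/pullrequests/{pr_id}?{query}"


def build_request(url: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request with a bearer token."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


# Usage (network call commented out; "contoso" and "Web App" are placeholders):
# token = <output of `az account get-access-token` for Azure DevOps>
# req = build_request(pr_metadata_url("contoso", "Web App", 99999), token)
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```

Building the URL and headers separately from sending them keeps the pieces easy to test without network access.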

The script authenticates via Azure CLI, fetches PR details, retrieves changed files, and classifies files by code area. That classification triggers different constitutional principles:

function Classify-FileByArea {
    param([string]$FilePath)

    # First matching pattern wins: 'return' exits the function immediately.
    switch -Regex ($FilePath) {
        'app/api/routes/' { return 'API Routes' }        # -> Principle VI (Security)
        'app/core/azure/' { return 'Azure Integration' } # -> Principle IV
        'app/tests/' { return 'Tests' }                  # -> Testing Requirements
        'specs/' { return 'Specifications' }             # -> Spec Workflow
        default { return 'Other' }                       # unclassified files
    }
}
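The same classification, extended to produce a per-file review checklist, can be sketched in Python. The area-to-criteria mapping is taken from the comments in the PowerShell script and the principle names in the compliance matrix; the function names are hypothetical. This version falls back to "Other" for unmatched paths and normalizes Windows-style backslashes before matching.

```python
import re

# (pattern, area) pairs mirroring Classify-FileByArea; the first match wins.
AREA_PATTERNS = [
    (re.compile(r"app/api/routes/"), "API Routes"),
    (re.compile(r"app/core/azure/"), "Azure Integration"),
    (re.compile(r"app/tests/"), "Tests"),
    (re.compile(r"specs/"), "Specifications"),
]

# Review criteria each area triggers, taken from the script's comments.
AREA_CRITERIA = {
    "API Routes": ["Principle VI (API Security)"],
    "Azure Integration": ["Principle IV (Azure-First)"],
    "Tests": ["Testing Requirements"],
    "Specifications": ["Spec Workflow"],
    "Other": [],
}


def classify(path: str) -> str:
    """Return the code area for a changed file, or 'Other' if nothing matches."""
    normalized = path.replace("\\", "/")  # tolerate Windows-style separators
    for pattern, area in AREA_PATTERNS:
        if pattern.search(normalized):
            return area
    return "Other"


def review_checklist(paths: list[str]) -> dict[str, list[str]]:
    """Map each changed file to the criteria the reviewer should apply."""
    return {path: AREA_CRITERIA[classify(path)] for path in paths}
```

Order matters: a path such as `app/tests/specs/...` is classified as Tests because that pattern is checked before `specs/`.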

Layer 3: The Constitution

The constitution is the living document that drives the review, organized for machine parsing:

Principles with clear MUST/SHOULD/PROHIBITED language
Implementation patterns with code examples
Anti-patterns in appendices with explicit wrong/correct comparisons
Enforcement mechanisms that map to reviewable checks
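Normative language is what makes this structure machine-readable. As a minimal sketch, assuming each principle is a `###` heading and each rule sits on its own line or bullet, the following Python pulls out every MUST/SHOULD/PROHIBITED statement together with the principle it belongs to. The `Rule` type and function name are hypothetical.

```python
import re
from dataclasses import dataclass

# Longer keywords first so "MUST NOT" is matched before "MUST".
KEYWORDS = ("MUST NOT", "PROHIBITED", "MUST", "SHOULD NOT", "SHOULD")
PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")


@dataclass
class Rule:
    principle: str  # the ### heading the rule appears under
    keyword: str    # normative keyword found in the line
    text: str       # the rule text, without list markers


def extract_rules(markdown: str) -> list[Rule]:
    """Collect lines containing an uppercase normative keyword."""
    rules = []
    principle = "(preamble)"
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            principle = line[4:].strip()
            continue
        match = PATTERN.search(line)
        if match:
            rules.append(Rule(principle, match.group(1), line.lstrip("-* ").strip()))
    return rules
```

Because the match is case-sensitive, prose such as "you must" is ignored; only deliberately capitalized keywords count as rules.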

The Review Framework

When the AI processes a PR, it applies a multi-dimensional review framework.

Constitutional Compliance Matrix

Principle	What It Checks
I. Template-Driven	Logic externalized to JSON? No hardcoded workflows?
II. Schema Stability	No new top-level fields?
III. API Versioning	Breaking changes in new version? Deprecation warnings?
IV. Azure-First	Sync Cosmos calls without await? Proper parameters?
V. Fire-and-Forget	Try/except/finally pattern? Single log entry?
VI. API Security	Depends(check_api_key)? Type normalization?

File-Specific Checks

For a route file, the review verifies:

Depends(check_api_key) used (not Header-only)
Type normalization before Pydantic instantiation
Module aliasing when parameter shadows import
Proper HTTP status codes (401 for auth, not 422)
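The first and last checks are related. In FastAPI, a missing required `Header(...)` parameter fails request validation and yields 422, whereas a dependency can raise `HTTPException(status_code=401)`. A reviewer (human or AI) can be backed by a purely syntactic check. This minimal sketch uses Python's `ast` module and assumes routes are declared with decorators such as `@router.get(...)` and that the dependency appears as `Depends(check_api_key)` in a parameter default, an `Annotated[...]` hint, or the decorator's `dependencies=[...]`. It is a heuristic, not a substitute for the review.

```python
import ast

ROUTE_METHODS = {"get", "post", "put", "patch", "delete"}


def is_api_key_dependency(node: ast.AST) -> bool:
    """True for a call of the form Depends(check_api_key)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "Depends"
        and len(node.args) > 0
        and isinstance(node.args[0], ast.Name)
        and node.args[0].id == "check_api_key"
    )


def is_route(func: ast.AST) -> bool:
    """True if the function carries a decorator like @router.get(...)."""
    return any(
        isinstance(dec, ast.Call)
        and isinstance(dec.func, ast.Attribute)
        and dec.func.attr in ROUTE_METHODS
        for dec in func.decorator_list
    )


def routes_missing_api_key(source: str) -> list[str]:
    """Names of route functions with no Depends(check_api_key) in signature or decorators."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) or not is_route(node):
            continue
        params = node.args.args + node.args.kwonlyargs
        # Places the dependency may appear: decorators (dependencies=[...]),
        # parameter defaults, and Annotated[...] type hints.
        places = (
            list(node.decorator_list)
            + list(node.args.defaults)
            + [d for d in node.args.kw_defaults if d is not None]
            + [p.annotation for p in params if p.annotation is not None]
        )
        if not any(is_api_key_dependency(sub) for place in places for sub in ast.walk(place)):
            missing.append(node.name)
    return missing
```

The source is only parsed, never executed, so the check runs safely on any changed route file.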

Severity Classification

Issues are classified using constitutional language:

CRITICAL: Violates MUST requirement (blocks merge)
HIGH: Violates SHOULD requirement (should fix before merge)
MEDIUM: Style inconsistency (consider addressing)
LOW: Suggestion for improvement (optional)
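Because severity follows the normative keyword of the violated rule, the mapping and the merge decision can be made explicit. This is a minimal Python sketch. Treating MUST NOT and PROHIBITED like MUST (and SHOULD NOT like SHOULD) is an assumption, as are the names.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Severity(IntEnum):
    LOW = 1       # suggestion for improvement (optional)
    MEDIUM = 2    # style inconsistency (consider addressing)
    HIGH = 3      # violates a SHOULD requirement (fix before merge)
    CRITICAL = 4  # violates a MUST requirement (blocks merge)


# Keyword of the violated rule -> severity. Treating MUST NOT and PROHIBITED
# like MUST (and SHOULD NOT like SHOULD) is an assumption.
KEYWORD_SEVERITY = {
    "MUST": Severity.CRITICAL,
    "MUST NOT": Severity.CRITICAL,
    "PROHIBITED": Severity.CRITICAL,
    "SHOULD": Severity.HIGH,
    "SHOULD NOT": Severity.HIGH,
}


def classify_finding(keyword: Optional[str], is_style_issue: bool = False) -> Severity:
    """Severity for a finding, given the normative keyword of the rule it violates (if any)."""
    if keyword is not None and keyword.upper() in KEYWORD_SEVERITY:
        return KEYWORD_SEVERITY[keyword.upper()]
    return Severity.MEDIUM if is_style_issue else Severity.LOW


def blocks_merge(findings: Iterable[Severity]) -> bool:
    """Only CRITICAL findings block the merge."""
    return any(s is Severity.CRITICAL for s in findings)
```

Using an `IntEnum` keeps severities ordered, so findings can be sorted or filtered by threshold.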

Why This Approach Works

1. Consistency Without Memorization

Human reviewers forget edge cases. They have good days and bad days. They might catch an anti-pattern on Monday but miss it on Friday. A constitution-based review checks every PR against the same written standards.

2. Knowledge Transfer at Scale

New team members don't need to absorb months of tribal knowledge. They see constitutional violations in their first PR, with explanations and references. The review process teaches the architecture.

3. Reduced Review Friction

Instead of "I don't like this pattern," the feedback becomes "This violates Principle VI." It's not personal—it's constitutional. This makes code review conversations more productive and less adversarial.

4. Governance Documentation That Gets Read

Most architecture documents rot on the vine. Nobody reads them because they're disconnected from daily work. A constitution that powers PR reviews gets read every time a PR is submitted. Developers encounter the principles when they matter most.

Implementation Tips

If you want to implement something similar:

1. Start with your actual pain points

Your constitution should evolve from real bugs. The schema stability principle exists because someone added a top-level field and broke the analytics pipeline. The API security pattern exists because endpoints were returning 422 instead of 401. Document what you've learned the hard way.
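A principle born from a real incident usually maps to a small, mechanical check. For the schema stability example, this minimal Python sketch flags top-level fields that are not on an allow-list. The field names are hypothetical placeholders, and only top-level keys are checked.

```python
import json

# Hypothetical allow-list: the top-level fields downstream consumers
# (e.g. an analytics pipeline) are known to handle.
ALLOWED_TOP_LEVEL = frozenset({"id", "type", "createdAt", "payload"})


def new_top_level_fields(raw_json: str, allowed: frozenset = ALLOWED_TOP_LEVEL) -> list[str]:
    """Return top-level keys not in the allow-list (nested keys are not checked)."""
    document = json.loads(raw_json)
    if not isinstance(document, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(set(document) - allowed)
```

New data then belongs inside an existing container field such as `payload`, which is exactly the kind of guidance a constitution can state alongside the check.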

2. Use normative language

Not "we generally prefer": use normative keywords in the spirit of RFC 2119, such as MUST, SHOULD, and PROHIBITED. Make violations objectively detectable.

3. Include code examples for anti-patterns

Abstract principles are hard to review against. Concrete examples make violations much easier to detect:

## WRONG: awaiting a synchronous call (see Principle IV)
document = await get_document(...)

## CORRECT: call the synchronous function directly
document = get_document(...)
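A WRONG/CORRECT pair like the one above translates directly into a detector. If `get_document` is synchronous, awaiting its plain return value raises a `TypeError` at runtime. This minimal Python sketch uses `ast` to find `await` applied to calls of functions known to be synchronous. The `SYNC_FUNCTIONS` set is a hypothetical configuration that a team would maintain alongside the constitution.

```python
import ast

# Hypothetical names of functions known to be synchronous
# (e.g. wrappers around a synchronous Cosmos DB client).
SYNC_FUNCTIONS = frozenset({"get_document"})


def awaited_sync_calls(source: str, sync_functions: frozenset = SYNC_FUNCTIONS) -> list[tuple[int, str]]:
    """Return (line, name) for each `await f(...)` where f is in sync_functions."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Await) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Name):
                name = func.id          # await get_document(...)
            elif isinstance(func, ast.Attribute):
                name = func.attr        # await repo.get_document(...)
            else:
                continue
            if name in sync_functions:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Matching on the bare name is deliberately coarse; it can produce false positives when an unrelated async method shares the name, which is acceptable for a review hint.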

4. Organize for machine parsing

Section headers, consistent formatting, categorized appendices. The AI needs to extract patterns—make that easy.

5. Keep it living

Your constitution should have a version history. Changes require documented approval. It evolves with the codebase.
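Version history can be lightweight. This minimal Python sketch assumes the constitution carries a version line such as `**Version**: 1.4.2` and applies one possible semantic-versioning convention: major for removing or redefining a principle, minor for adding one, patch for wording clarifications. Both the line format and the convention are assumptions.

```python
import re

# Assumed version line format, e.g. "**Version**: 1.4.2 | **Ratified**: 2025-06-01"
VERSION_LINE = re.compile(r"\*\*Version\*\*:\s*(\d+)\.(\d+)\.(\d+)")


def bump_version(text: str, change: str) -> str:
    """Return the constitution text with its version bumped for a 'major', 'minor', or 'patch' amendment."""
    match = VERSION_LINE.search(text)
    if match is None:
        raise ValueError("no **Version**: line found")
    major, minor, patch = (int(g) for g in match.groups())
    if change == "major":      # principle removed or redefined
        major, minor, patch = major + 1, 0, 0
    elif change == "minor":    # principle or section added
        minor, patch = minor + 1, 0
    elif change == "patch":    # wording clarification only
        patch += 1
    else:
        raise ValueError(f"unknown change type: {change}")
    new_line = f"**Version**: {major}.{minor}.{patch}"
    return text[:match.start()] + new_line + text[match.end():]
```

Pairing each bump with the approval record makes it easy to tell which rules a given PR review was judged against.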

Reflections on Constitutional Review

This approach reflects a shift in how AI coding assistants can be most effective. The best results come not from asking an AI to "review my code," but from providing context about what matters.

A constitution provides that context, formalized.

Making architectural rules explicit and using them to power automated reviews transforms governance from an afterthought into a systematic practice. Your codebase has rules—writing them down and making your AI assistant enforce them creates consistency that scales.

This approach works with any AI that can process structured prompts and read files—GitHub Copilot, Claude, GPT-4, or others. The key insight is treating your architectural principles as a reviewable specification, not just documentation.

Where This Approach Shines

What makes constitutional PR reviews genuinely useful isn't the AI component — it's the act of writing the principles down in the first place. Once your architectural rules exist as a formal specification with normative language, they become enforceable, discussable, and evolvable. The AI review is just the mechanism that ensures they stay visible during every code change.

The shift from subjective feedback to objective governance is the real payoff. When a reviewer can point to a specific principle rather than a personal preference, the conversation changes. And because the constitution is encountered during every PR, it stays alive in a way that documentation wikis never do.

What's Missing

PR reviews work for new changes—but what about existing codebases? How do you discover the implicit constitution in legacy code? How do you audit codebase-wide compliance?

Part 3 addresses these gaps with brownfield discovery and site auditing capabilities.

Continue: Part 3: Building Spec Kit Spark →

References

GitHub Copilot: https://github.com/features/copilot
SpecKit Repository: https://github.com/Villamor-bot/specify
RFC 2119 (Key words for RFCs): https://www.rfc-editor.org/rfc/rfc2119