Engineering Metrics: The Git Spark Story

Traditional performance metrics weren't designed for the age of AI-assisted development. They track visible outputs like commits and tickets closed, but miss the work that AI tools do behind the scenes. This is the story of trying to measure AI contributions through Git data—and why it's harder than it looks.

Mark Hazleton · October 2025 · Engineering Metrics, Git Analytics

When "Objective Data" Leads You Astray

After writing about measuring AI's contribution to code, I was frustrated. I couldn't quantify how much AI tools were actually helping my development process. Git history seemed like the obvious answer—objective, comprehensive, and already being tracked. Every commit, every change, every contributor permanently recorded. Surely, I could extract meaningful insights from this data, right?

Wrong. Very wrong. That weekend, instead of just analyzing Git data, I decided to build something better. I wanted to create a tool that could honestly report what Git history actually contains—without the misleading "health scores" and fake productivity metrics that plague existing tools. This became git-spark, my first npm package.

I thought building the tool would give me answers. Instead, it taught me about all the questions Git history simply cannot answer. But that journey—from frustrated developer to package publisher—revealed something more valuable: the fundamental limitations of measuring software development through commit logs.

This is the real story of creating git-spark: a weekend project born from frustration, shaped by AI tools, refined through trial and error, and ultimately designed to admit what it doesn't know. If you're trying to measure developer productivity or AI contributions through Git data, here's what I learned about what you can and cannot reliably discover.

Building Git Spark: My First npm Package

That weekend, I decided to build what I wished existed: an honest Git analytics tool. Having used npm packages for years, I'd always wanted to create my own. Git-spark became that opportunity—and my crash course in package development.

The Weekend Build: Reality Check

Saturday Morning Goals

  • Use AI tools to generate initial code
  • Parse Git history and extract metrics
  • Create visualizations and reports
  • Publish to npm by Sunday evening

Sunday Night Results

  • AI tools gave me a great first build
  • Had to revise formulas multiple times
  • Realized most metrics were misleading
  • Still couldn't measure AI contributions
  • Learned what honest reporting means

The AI tools I used (ironically, the same ones I was trying to measure) gave me an impressive initial implementation. The code worked, the visualizations looked professional, and the metrics seemed authoritative. That's when I made a critical mistake: I actually read the formulas behind the numbers.

The Problem with "Health Scores"

My first version calculated a "Repository Health Score" based on commit frequency, author distribution, and code churn. It looked scientific. It generated impressive charts. And it was complete nonsense.

The formula assigned arbitrary weights to metrics we couldn't meaningfully interpret. A score of 87% health? What does that even mean? Healthy compared to what? Based on whose definition of healthy?
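To make the problem concrete, here is a sketch of the kind of formula these scores typically hide behind. The field names, weights, and thresholds below are illustrative, not git-spark's actual code, and that is exactly the issue: nothing justifies any of them.

// A sketch of a typical "repository health score". Every weight and
// threshold here is arbitrary, which is the whole problem.
interface RepoStats {
  commitsPerWeek: number;
  activeAuthors: number;
  totalAuthors: number;
  churnRatio: number; // lines deleted / lines added
}

function healthScore(stats: RepoStats): number {
  const frequency = Math.min(stats.commitsPerWeek / 20, 1);     // why 20? no reason
  const distribution = stats.activeAuthors / stats.totalAuthors; // "bus factor", sort of
  const churn = 1 - Math.min(stats.churnRatio, 1);               // is low churn "healthy"? who knows
  // Weights chosen to make the output look precise without meaning anything.
  return Math.round((0.4 * frequency + 0.35 * distribution + 0.25 * churn) * 100);
}

A number like 87 from a function like this answers none of those questions.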

This realization forced multiple rewrites. Each iteration stripped away another layer of pretense, another attempt to derive meaning from data that simply didn't contain it. By version 0.5, I'd removed every evaluative metric and replaced each one with an honest observation.

The Problem with Current Metrics

The most popular Git-based metrics aren't just unhelpful—they're actively harmful. They measure motion instead of progress and incentivize behaviors that damage both code quality and team culture.

The Three Problematic Metrics

Commit Count

Rewards developers who split logical changes into artificially small commits. Punishes those who make well-structured, comprehensive changes.

Result: Noisy history and developers optimizing for metrics instead of quality.

Lines of Code

Measures verbosity, not value. Punishes refactoring and simplification. Rewards copy-paste programming and bloated implementations.

Result: Growing codebases that become harder to maintain.

Weekend Commits

Often interpreted as "dedication" when it actually signals burnout, poor work-life balance, or unrealistic deadlines.

Result: Normalized overwork and exhausted team members.

I learned this lesson the hard way. Within a month of introducing commit-based metrics, our most disciplined engineer—someone who routinely made comprehensive, well-tested commits—appeared to be our least productive. Meanwhile, a junior developer who committed after every minor change topped the charts. The metrics were giving us exactly the wrong signal.

What Git History Cannot Tell You

Building git-spark taught me something humbling: the most valuable aspects of software development leave absolutely no trace in commit logs. This was my biggest surprise and the most important lesson from the entire project.

I started this journey trying to answer one specific question: "How much of my code is generated by AI prompts?" After weeks of analysis, creating formulas, and examining every metric Git history offers, I have to admit defeat. Git simply doesn't—and cannot—record that information.

What Git Records

  • Files changed
  • Lines added/removed
  • Timestamp of changes
  • Commit author (always human)
  • Commit message (human-written)

What I Needed (But Doesn't Exist)

  • AI assistance level per commit
  • Prompt-to-code traceability
  • Human vs. AI authorship percentage
  • Code review quality and outcomes
  • Design decisions and trade-offs
  • Testing effort and coverage
  • Deployment success and impact
  • Refactoring vs. new feature work

This realization completely changed how I thought about developer productivity measurement. Every existing tool I'd examined—the ones claiming to measure "health" or "productivity"—suffered from the same fundamental flaw: they measured motion, not value. They counted commits but ignored impact.

Building Honest Metrics

Here's where I started to understand what Git analytics could legitimately provide. The problem wasn't using Git data—it was pretending it measured things it didn't. "Repository health" is marketing speak. "Activity patterns" is honest reporting.

Many tools generate "health scores" or "productivity ratings" that sound authoritative but are fundamentally subjective. They take the limited data Git provides, apply arbitrary weights and thresholds, then package it as objective truth.

Activity Index: What We Can Honestly Measure

Instead of fake health scores, we can measure observable activity patterns and let humans interpret what they mean in context:

  • Commit Frequency (Normalized): How often commits happen, adjusted for team size and project phase. Signals whether development is active, stalled, or sporadic.
  • Author Participation Breadth: How many team members contribute relative to total volume. Reveals whether work is distributed or concentrated.
  • Change Size Variability: Coefficient of variation in commit sizes over time. Indicates consistency in development rhythm and working style (a rough calculation is sketched after this list).
  • File Touch Patterns: Which files change together frequently and who works on them. Exposes coupling, specialization, and potential bottlenecks.
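To show how plainly descriptive these numbers are, here is a rough sketch of the change-size variability calculation: an ordinary coefficient of variation over lines touched per commit. It assumes you already have per-commit totals (for example, summed from the numstat output shown earlier) and illustrates the idea rather than git-spark's exact formula.

// Change size variability: coefficient of variation (std dev / mean)
// of lines touched per commit. A descriptive statistic, not a verdict.
function changeSizeVariability(linesPerCommit: number[]): number {
  if (linesPerCommit.length === 0) return 0;
  const mean =
    linesPerCommit.reduce((sum, n) => sum + n, 0) / linesPerCommit.length;
  if (mean === 0) return 0;
  const variance =
    linesPerCommit.reduce((sum, n) => sum + (n - mean) ** 2, 0) /
    linesPerCommit.length;
  return Math.sqrt(variance) / mean; // near 0 = very regular, above 1 = erratic
}

// Example: a steady stream of small commits vs. occasional huge drops.
console.log(changeSizeVariability([12, 15, 10, 14, 11]));    // low
console.log(changeSizeVariability([5, 8, 900, 6, 7, 1200])); // high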

This shift from evaluation to observation is crucial. We're not saying the team is productive or unproductive. We're saying "here are the patterns we observe; you decide what they mean for your context."

Team Patterns in Code

This is where Git analytics gets genuinely interesting: not for measuring individual productivity, but for revealing patterns that would otherwise remain invisible. Your code structure mirrors your team structure, and Git history exposes that relationship.

Conway's Law states that organizations build systems that mirror their communication structure. Git analytics makes this visible through patterns that emerge from how developers interact with the codebase. These patterns don't tell you if your team is "good" or "bad"—they tell you how your team actually works, which is far more valuable.

File Specialization Index

Measures how concentrated code ownership is across files. High FSI means few people touch each file; low FSI means broad collaboration.

What It Reveals:
  • Potential knowledge bottlenecks
  • Areas of deep expertise vs. shared ownership
  • Bus factor risks
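One way to put a number on this, sketched below, is to take the share of each file's commits that belong to its single most active author and average that across files. This is an illustration of the idea, not necessarily git-spark's exact formula.

// Illustrative file specialization index: the average, across files, of the
// top author's share of that file's commits. 1.0 means every file effectively
// has a single owner; lower values mean broader collaboration.
function fileSpecializationIndex(
  commitsByFileAndAuthor: Map<string, Map<string, number>>
): number {
  if (commitsByFileAndAuthor.size === 0) return 0;
  let total = 0;
  for (const authors of commitsByFileAndAuthor.values()) {
    const counts = [...authors.values()];
    const fileTotal = counts.reduce((a, b) => a + b, 0);
    total += Math.max(...counts) / fileTotal; // top author's share of this file
  }
  return total / commitsByFileAndAuthor.size;
}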

Ownership Entropy

Measures how evenly contributions are distributed across authors. High entropy means balanced collaboration; low entropy means concentrated ownership.

What It Reveals:
  • Whether "collaboration" is real or superficial
  • Dominant contributors vs. peripheral participants
  • Team knowledge distribution patterns
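A common way to compute something like this, sketched below, is normalized Shannon entropy over each author's share of changes: 1 means contributions are spread evenly, 0 means one person does everything. Again, this illustrates the concept rather than reproducing git-spark's internals.

// Ownership entropy: Shannon entropy of each author's share of changes,
// normalized to [0, 1]. 1 = evenly shared work, 0 = a single dominant author.
function ownershipEntropy(commitsByAuthor: Map<string, number>): number {
  const counts = [...commitsByAuthor.values()].filter((n) => n > 0);
  if (counts.length <= 1) return 0; // one author: no distribution to measure
  const total = counts.reduce((a, b) => a + b, 0);
  const entropy = counts.reduce((h, n) => {
    const p = n / total;
    return h - p * Math.log2(p);
  }, 0);
  return entropy / Math.log2(counts.length); // divide by maximum possible entropy
}

console.log(ownershipEntropy(new Map([["ana", 50], ["ben", 48], ["cho", 52]]))); // ~1, balanced
console.log(ownershipEntropy(new Map([["ana", 140], ["ben", 3], ["cho", 2]])));  // low, concentrated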

Co-Change Coupling: Hidden Dependencies

Files that change together frequently reveal architectural coupling that may not be obvious from static code analysis. This helps identify:

  • Architecture Friction: Files that shouldn't be coupled but always change together
  • Hidden Dependencies: Coupling between seemingly unrelated modules
  • Team Coordination Needs: Files requiring coordination between their maintainers

I discovered this accidentally when analyzing why certain features always took longer than estimated. Git analytics revealed that three supposedly independent modules had high co-change coupling—they couldn't be modified independently in practice, even though the architecture said they could. This explained why every "simple" change rippled through the system.
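For the curious, co-change coupling needs nothing more than a list of which files each commit touched: count how often a pair of files appears in the same commit and divide by how often either file changes at all. The sketch below is one such approximation (a Jaccard-style ratio), not necessarily the formula git-spark ships with.

// Co-change coupling: how often two files change in the same commit,
// relative to how often either changes at all. Input: file paths per commit.
function coChangeCoupling(commits: string[][]): Map<string, number> {
  const fileCounts = new Map<string, number>();
  const pairCounts = new Map<string, number>();

  for (const files of commits) {
    const unique = [...new Set(files)].sort();
    for (const f of unique) fileCounts.set(f, (fileCounts.get(f) ?? 0) + 1);
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]} <-> ${unique[j]}`;
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
  }

  const coupling = new Map<string, number>();
  for (const [key, together] of pairCounts) {
    const [a, b] = key.split(" <-> ");
    const either = fileCounts.get(a)! + fileCounts.get(b)! - together;
    coupling.set(key, together / either); // 1 = always change together, 0 = never
  }
  return coupling;
}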

What Git Spark Does

Despite failing to measure AI contributions, I succeeded in creating something useful: an honest Git analytics tool that respects what the data can and cannot tell us.

The Honest Approach

After multiple rewrites, here's what git-spark actually does:

What It Reports

  • Observable patterns in commit history
  • File coupling and change frequencies
  • Author contribution distributions
  • Temporal development trends
  • Code structure evolution over time

What It Refuses to Infer

  • Fake "health scores" or "productivity ratings"
  • Developer rankings or comparisons
  • Code quality judgments from LOC
  • AI contribution percentages
  • Anything not directly observable in Git

Git-spark doesn't tell you whether your repository is "healthy" or your team is "productive." Instead, it shows you patterns and lets you interpret them with your domain knowledge.

Try It Out

# Install globally
npm install -g git-spark

# Or use with npx
npx git-spark analyze

Get transparent insights into your repository's activity patterns. No magic formulas, no fake scores—just observable facts from your Git history.

Lessons Learned

I set out to answer a simple question: "How much of my code is generated by AI prompts?" I built an entire analytics tool, published my first npm package, and learned more about Git internals than I ever expected. And I still can't answer that question.

But that failure taught me something more valuable: the discipline of honest measurement. Not every question has a data-driven answer. Not every metric is meaningful. And tools that claim to measure everything often measure nothing reliably.

What This Journey Produced

Technical Skills

  • My first published npm package
  • Understanding of Git internals
  • Experience with testing and validation
  • Package organization skills
  • TypeScript/JavaScript development

Important Insights

  • Limits of Git-based metrics
  • Difference between motion and progress
  • Value of honest vs. authoritative reporting
  • Why AI contributions remain invisible
  • What makes metrics trustworthy

Git-spark doesn't tell you if your team is productive. It doesn't measure code quality. It can't identify AI-generated code. What it does is report observable patterns in your repository's history and then—crucially—shuts up. No fake scores. No pretend insights. Just data and the humility to admit its limitations.

Building git-spark taught me that the best metrics tools don't pretend to have all the answers. They provide honest data and trust you to ask better questions. That's the philosophy behind every line of code in git-spark, and I hope it's useful to others wrestling with these same measurement challenges.