You could easily find yourself mired in programmer debates over code coverage. Take one Stack Overflow question, for instance. It raged on for hundreds of votes, dozens of comments, many answers, and eight years before someone put it out to pasture as “opinion-based” and closed it.
Discussion participants debated a “reasonable” percentage of code coverage. I imagine you can arrive at the right answer by taking the mean of all of the responses. (I’m just kidding.)
Some of the folks in that thread, however, declined to give a simple percentage. Instead, they gave the obligatory consultant’s response of “it depends” or, in the case of the accepted answer, a parable. I found it refreshing that not everybody picked some number between 50 and 100 and blurted it out.
But I found it interesting that certain questions went unasked. What is the goal of this code coverage? What problem does it solve? And as for the reasonability of the number, reasonable to whom?
What is Code Coverage? (Briefly)
Before going any further, I’ll quickly explain the concept of code coverage for those not familiar, without belaboring the point. Code coverage measures the percentage of code that your automated unit test suite executes when it runs.
So let’s say that you had a tiny codebase consisting of two methods with one line of code each. If your unit test suite executed one method but not the other, you would have 50% code coverage. In the real world, this becomes significantly more complicated. In order to achieve total coverage, for instance, you must traverse all paths through control flow statements, such as if conditions, switch statements, and loops.
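To make the two-method example concrete, here is a minimal Python sketch. The functions add and subtract, the helper coverage_percent, and the test are all hypothetical names invented for illustration. The helper only counts which functions were entered, which matches line coverage here only because each function has a single line. Real coverage tools do far more, including tracking the paths through control flow mentioned above.

```python
import sys


def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def coverage_percent(test, functions):
    """Run test() and return the percentage of the given functions it entered."""
    targets = {f.__code__ for f in functions}
    entered = set()

    def tracer(frame, event, arg):
        # The interpreter calls this once for every new function call.
        if event == "call" and frame.f_code in targets:
            entered.add(frame.f_code)
        # Returning None skips per-line tracing, which one-line functions do not need.
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return 100.0 * len(entered) / len(targets)


def test_add():
    # The only test in this tiny suite; nothing ever calls subtract.
    assert add(2, 3) == 5


percent = coverage_percent(test_add, [add, subtract])
print(percent)
```

One of the two one-line functions runs and the other never does, so the sketch reports 50% coverage for this suite.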
When people debate how much code coverage to have, they aren’t debating how “correct” the code should be. Nor do they debate anything about the tests themselves, such as quantity or quality. They’re simply asking how many lines of code per one hundred your test suite should cause to execute.
You can get tools that calculate this for you. And you can also get tools that factor it into an overall code quality dashboard.
The Origins of Worry Over Code Coverage
So why does anyone start caring about this figure in the first place? To actually try to address a debate over the “reasonable” percentage, you really need to understand the context.
First, I’ll take a stab at imagining why the person asking that Stack Overflow question, “sanity,” may have done so. Caveat emptor. I don’t know him at all, so I’m engaging in pure speculation. But a software developer might ask a question like that in order to establish a barometer for self-assessment. The thinking goes: I want to make a reasonable investment in my automated test suite, so I want a sanity-check metric to see whether I’m on the right track. In this context, the metric serves as a benchmarking tool for developers.
But you’ll also see an entirely different archetype ask this same question: leadership. An architect, dev manager, or department head will ask what percentage the team should sustain and then instrument the teams’ codebases for measurement. In this context, the metric serves as a benchmarking tool used on developers. It becomes a de facto part of their performance evaluations.
The Perils of Well-Intentioned Micromanagement
To understand how these two contexts differ, I’ll use an analogy. Think about a home inspector.
Let’s say that you purchase a house and hire a home inspector. You’re understandably excited when the inspector tells you that the house you’re about to buy is in great shape. So you complete the purchase and move in. And then everything starts falling apart. You feel utterly betrayed by the inspector’s incompetence.
So the next time you buy a house, you resolve to do things differently. You start to research the elements of a good home inspection, and you find a checklist via Google. And then you reason that, while you don’t know all of the details, you can evaluate your next inspector on the basis of what percentage of things on that list he does.
But to your exasperation, your metric doesn’t seem to help. It turns out that nobody could score perfectly because your prospective house lacks a fireplace or any stucco on the outside. And when you selected the highest-scoring person, it went poorly. It seems that, while he did inspect the highest number of things, he did a bad job of it.
Failing to Solve the Actual Problem
If you did a postmortem on the inspection process, you’d realize that you failed to address the actual problem. Specifically, you don’t understand home inspections, and you’re in a position of trying to hire and evaluate people who do.
When it goes poorly the first time, you give yourself a cursory education in home inspections. Then, you try again, futilely micromanaging the next inspector. Downloading a checklist from the internet and asking the guy to fill it out doesn’t alter the knowledge gap at all. It’s just a lot of sound and fury on your part.
You need to address the problem of hiring in the dark by finding someone you can trust. And you do that not with haphazardly administered metrics, but through some other strategy, such as looking on Angie’s List or asking friends for recommendations.
The Dangers of Reductionist Metrics in Software
It goes no differently in the software world. A manager asking a question like, “What is the right percentage of code coverage?” is the frustrated home buyer evaluating inspectors by “percentage of this checklist completed.”
And the developers can hand that manager fool’s gold just as easily as the inspector saying, “Yep, I totally checked all of that stuff. Mark me down for 100%.” They could achieve this by writing a bunch of unit tests that assert nothing and cannot possibly fail.
public void CoverAllTheThings()
That method alone ought to be good for 40% coverage, depending on how far it gets before stuff blows up. The dev team, knowing relatively little about unit tests, could spend an afternoon writing things like this and generate a figure that looks awfully nice at a leadership governance meeting.
Of course, developers, unless utterly disenfranchised, rarely engage in anything that starkly cynical. But with a deadline looming and some outwardly imposed code coverage mandate, it can seem to make sense to cut corners … just until they can come back and fix it, of course.
Code Coverage, Properly Considered
I’m a practitioner of test-driven development (TDD) and have been for years. This means that I commonly write code at or near 100% coverage. You’d think I’d take the opportunity to crow about my figure, but I actually don’t even pay attention to my coverage percentage. It really doesn’t matter to me.
What I do pay attention to, however, is what bits of code don’t have covering unit tests. I treat this code with suspicion because I have no automated proof of how someone expects it to behave. I seek to get it under test.
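As a rough sketch of that habit, the Python below answers "which code has no covering test?" rather than "what percentage is covered?" The three functions, the two tests, and the untested_functions helper are all made-up names for illustration, and the helper only checks whether a function was ever called, which is far cruder than what a real coverage tool reports.

```python
import sys


def parse_quantity(text):
    return int(text.strip())


def order_total(unit_price, quantity):
    return unit_price * quantity


def refund_amount(total, restocking_fee):
    return total - restocking_fee


def untested_functions(tests, functions):
    """Run the tests and return the names of functions none of them called."""
    targets = {f.__code__: f.__name__ for f in functions}
    entered = set()

    def tracer(frame, event, arg):
        # Record each target function the first time any test reaches it.
        if event == "call" and frame.f_code in targets:
            entered.add(frame.f_code)
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for test in tests:
            test()
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return sorted(name for code, name in targets.items() if code not in entered)


def test_parse_quantity():
    assert parse_quantity(" 3 ") == 3


def test_order_total():
    assert order_total(5, 3) == 15


blind_spots = untested_functions(
    [test_parse_quantity, test_order_total],
    [parse_quantity, order_total, refund_amount],
)
print(blind_spots)
```

The output names refund_amount as the blind spot. That is a more useful starting point for a conversation about risk than learning that two of the three functions are covered.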
Code coverage percentage, you see, offers a reductionist proxy for a much more complicated concept that only those steeped in the codebase understand. Where are our blind spots? What does our testing address (or fail to address)? Developers then take that information and use it to make decisions and recommendations about risk, complexity, and design tradeoffs.
When developers discuss a “reasonable” code coverage percentage, they’re speaking in a subtle shop language about these blind spots and their implications. If management tries to eavesdrop and appropriate this language, the rich meaning completely evaporates, leaving only an inevitable example of the law of unintended consequences.
Management shouldn’t become well versed in code coverage. They should become well versed in hiring developers who know whether it matters or not, and then trusting them to use the metric as they see fit.