Managing Blind
Let me see if I can get in your head a little bit. You manage a team of developers, and it goes well at times. But, at other times, not so much. Deadlines slide past without a deliverable, weird things happen in production, and you sometimes get déjà vu when a bug comes in that you could swear the team fixed three months ago.
What do you do? It’s a maddening problem because, even though you may once have been technical, you can’t really dive into the code and see what’s going on. Are there systemic problems with the code base, or are you just experiencing normal growing pains? Are the developers in your group painting an accurate picture of what’s going on? Are they all writing good code? Are they writing decent code? Are any of them writing decent code? Can you rely on them to tell you?
As I said, it’s maddening. And it’s hard. It’s like coaching a sports team where you’re not allowed to watch the game. All you can do is rely on what the players tell you is going on in the game.
A Light in the Darkness
And then, you light upon a piece of salvation: automated unit tests. They’re perfect because, as you’ll learn from just about any modern writeup on the subject, they’ll help you guard against regressions, prevent field defects, keep your code clean and modular, and plenty more. You’ve got to get your team to start writing tests, and start writing them now.
But you weren’t born yesterday. Just writing tests isn’t sufficient. The tests have to be good. And, so you light upon another piece of salvation: measuring code coverage. This way, not only do you know that developers are writing tests, but you know that you’re covered. Literally. If 95% of your code base is covered, that probably means that you’re, like, 95% good, right? Okay, you realize it’s maybe not quite that simple, but, still, it’s a nice, comforting figure.
Conversely, if you’re down at, say, 20% coverage, that’s an alarming figure. That means that 80% of your code is not covered. It’s the Wild West, and who knows what’s going on out there? Right? So the answer becomes clear. Task a team member with instrumenting your build to measure automated test coverage and then dump it to a readout somewhere that you can monitor.
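What might that instrumentation look like? Here’s a minimal sketch, assuming a Python code base tested with pytest; the package name “myapp” and the “tests/” directory are placeholders for your own project:

```python
# run_with_coverage.py -- instrument the test run and dump a coverage readout.
# A sketch assuming pytest and the coverage package; "myapp" and "tests/" are
# placeholders for your own package and test directory.
import coverage
import pytest

cov = coverage.Coverage(source=["myapp"])  # only measure our own package
cov.start()

result = pytest.main(["tests/"])  # run the whole suite in-process

cov.stop()
cov.save()

percent = cov.report()  # prints a per-file table and returns the total percentage
print(f"Tests {'passed' if result == 0 else 'FAILED'}; total coverage: {percent:.1f}%")
```

In practice, most teams reach for an off-the-shelf plugin (pytest-cov, JaCoCo, Istanbul, and the like) and have the CI server publish the number, but the moving parts are the same: run the tests under instrumentation, then report.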
Time to call the code quality issue solved (or at least addressed), and move on to other matters. Er, right?
Fool’s Gold?
I’ve managed software developers in the past, so I understand the disconcerting feeling that comes from not really being able to tell what’s happening under the covers. These days, I’m living a little closer to the keyboard, and my living comes largely from coaching developers and doing IT management consulting and gap analysis. And a common gap is managers thinking that test coverage correlates a lot more strongly with well-written, well-tested code than it actually does.
Don’t get me wrong. In a nice, clean code base, you’re likely to see extremely high test coverage figures. But many managers get the cause-and-effect relationship backward. Really well-crafted code causes high coverage. High coverage does not necessarily cause really well-crafted code.
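To make that asymmetry concrete, consider a deliberately contrived Python example. The first test below executes every line of the function, so a coverage tool dutifully reports 100%, yet it verifies nothing and would keep passing even if the arithmetic were completely wrong:

```python
def apply_discount(price, percent):
    discount = price * percent / 100
    return price - discount

def test_apply_discount_covers_everything():
    # Executes every line of apply_discount: 100% coverage, zero verification.
    apply_discount(100, 20)

def test_apply_discount_actually_checks():
    # Nearly identical, but the assert is what guards against regressions.
    assert apply_discount(100, 20) == 80
```

Coverage counts the lines a test caused to run; it says nothing about whether the test would notice a bug in those lines.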
To understand the disconnect, consider the relatively common situation of purchasing a home. In the USA, part of this process is a (generally) mandatory home inspection in which a licensed home inspector examines the house in detail, making note of any deficiencies, problems, dangers, etc., and giving a general report on what kind of shape the house is in.
Now, you’re no carpenter, and you certainly don’t understand everything the home inspector is doing. But, you want to know if he’s doing a good job. So, you introduce a metric. You make note of how many rooms in the house he enters during his inspection. If he doesn’t cover at least 90% of the rooms, he’s not doing great work. If he only covers 20%, he’s terrible.
But the same problem arises. The fact that he wandered through every room in the house does not mean he did a good inspection. If he did a thorough inspection, the result will be that he covered a high percentage of rooms. If he covered a high percentage of rooms, he may or may not have done a thorough inspection.
There Are No Shortcuts
So, what if you timed the inspector? And, back in the software world, what if you introduced another metric alongside code coverage to make the picture harder to game or to get wrong? Well, that might help a bit, but there are no easy answers, really. If you want to know whether the inspector does good work, hire another inspector or two, and triangulate. If you want to know whether a test suite is good or not, you need to have someone with experience do a lot more digging and inspection of the code. I know you don’t want to hear this because you’re picturing costs, but knowledge isn’t cheap.
I assess code bases a lot, and when looking at test suites, here are some questions I ask (and then answer):
- What is the average number of asserts per test method? Way less than 1 means you might have useless tests. Way more than 1 means you might have convoluted, brittle tests. (See the sketch below for one way to automate this check.)
- What percentage of test methods have a number of asserts other than 1? This might indicate a flawed approach to testing.
- What is the average size of a test method? Large test methods tend to indicate brittle tests that people will abandon.
- Do tests often refer to static/global state? This may indicate an unreliable test suite.
- Do tests have a lot of duplication? This may indicate poorly written tests or even a poor design of production code.
I could go on, but you get the idea. An important takeaway is that I say “may” and “might” in all of my explanations because, you know what? Life isn’t simple. These things may exist in your test code, and there may be a perfectly reasonable explanation. Just as test coverage may or may not be indicative of code quality.
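None of this requires expensive tooling to start answering, either. Here’s a rough sketch that computes the first two figures for pytest-style Python tests by walking each test file’s syntax tree:

```python
# assert_stats.py -- a rough sketch, assuming pytest-style tests in files named
# test_*.py, with test functions/methods named test*.
# Usage: python assert_stats.py path/to/tests
import ast
import sys
from pathlib import Path

def count_asserts(func):
    """Count bare assert statements plus unittest-style self.assert*() calls."""
    total = 0
    for node in ast.walk(func):
        if isinstance(node, ast.Assert):
            total += 1
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Attribute)
              and node.func.attr.startswith("assert")):
            total += 1
    return total

counts = []
for path in Path(sys.argv[1]).rglob("test_*.py"):
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
            counts.append(count_asserts(node))

if counts:
    average = sum(counts) / len(counts)
    off_one = 100 * sum(1 for c in counts if c != 1) / len(counts)
    print(f"{len(counts)} test methods; {average:.2f} asserts per test on average; "
          f"{off_one:.0f}% have an assert count other than 1")
else:
    print("No test methods found.")
```

The output doesn’t pass judgment on its own; it tells an experienced reviewer where to dig first.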
Detecting whether things are on the right track or not isn’t easy. But the more metrics you have, and the more you use them as triggers for further investigation instead of quality gates, the better equipped you and your group will be.
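To put “trigger, not gate” in concrete terms: a gate fails the build when a number crosses a line, while a trigger surfaces movement for a human to look at. A hypothetical sketch of the latter, where the metric names and tolerance are illustrative rather than prescriptive:

```python
# A sketch of a metric used as a trigger rather than a gate: nothing fails;
# notable movement just gets surfaced for a human to investigate.

def review_prompts(previous, current, tolerance=5.0):
    """Flag metrics that moved more than `tolerance` points since the last snapshot."""
    prompts = []
    for name, new_value in current.items():
        old_value = previous.get(name, new_value)
        if abs(new_value - old_value) > tolerance:
            prompts.append(f"{name}: {old_value} -> {new_value}. Worth a conversation?")
    return prompts

last_sprint = {"coverage_percent": 87.0, "avg_asserts_per_test": 1.1}
this_sprint = {"coverage_percent": 74.0, "avg_asserts_per_test": 1.2}

for prompt in review_prompts(last_sprint, this_sprint):
    print(prompt)  # coverage_percent: 87.0 -> 74.0. Worth a conversation?
```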
A good article – as usual! – but you buried the lead!
“… the more you use them as triggers for further investigation instead of quality gates, the better equipped you and your group will be …”
As a manager, I mandate a specific code coverage % as a forcing function to get the developers to “eagerly” adopt TDD, but the real productivity gain is mine; I use introspection of the test code as a proxy both for evaluating core developer skill and for forming a (very!) subjective measure of how well the developer understands the feature requirements.
IOW … we are in violent agreement that a strict code coverage metric is not the end in itself, it is but a means to the end.