About a month ago, I wrote a post about how unit tests affect (and apparently don’t affect) codebases. That post turned out to be quite popular, which is exciting. You folks gave a lot of great feedback about where we might next take the study. I’ve incorporated some of that feedback and now have a followup on how unit tests affect codebases.
Summarized briefly, here are the high points of this second installment and refinement of the study:
- Eliminating the “buckets” from last time.
- Introducing more statistical rigor.
- Qualifying and refining conclusions from last time.
Also, for the purposes of this post, please keep in mind that non-incorporation of feedback is not a rejection of that feedback. I plan to continue refinement but also to keep posting about progress.
Addressing Some of the Easier Questions and Requests
Before getting started, I’ll answer a few of the quicker-to-answer items that arose out of the comments.
Did your analysis count unit test methods when assessing cyclomatic complexity, etc.?
Yes. It might be interesting to discount unit test methods and re-run analysis, and I may do that at some point.
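To make the idea of discounting test methods concrete, here is a minimal sketch of re-running an aggregate metric with test methods excluded. The method records and the `is_test` flag are invented for illustration; the actual analysis uses the NDepend API against real codebases.

```python
# Hypothetical sketch: average cyclomatic complexity with and without
# unit test methods. The data structure here is invented for illustration;
# the real study pulls method-level metrics via the NDepend API.

def average_complexity(methods, include_tests=True):
    """Average cyclomatic complexity, optionally discounting test methods."""
    selected = [m for m in methods if include_tests or not m["is_test"]]
    if not selected:
        return 0.0
    return sum(m["complexity"] for m in selected) / len(selected)

# Simple test methods tend to have complexity 1, dragging the average down.
methods = [
    {"name": "ProcessOrder", "complexity": 6, "is_test": False},
    {"name": "ApplyDiscount", "complexity": 4, "is_test": False},
    {"name": "ProcessOrder_Works", "complexity": 1, "is_test": True},
    {"name": "ApplyDiscount_Works", "complexity": 1, "is_test": True},
]

print(average_complexity(methods))                       # all methods: 3.0
print(average_complexity(methods, include_tests=False))  # production only: 5.0
```

The gap between the two figures illustrates why re-running the analysis without test methods could shift the results.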
Can you show the code you’re using? Which codebases did you use?
The scraping/analysis tooling I’ve built using the NDepend API is something that I use in my consulting practice and is in a private repo. As for the list of specific codebases, I’m thinking I’ll publish that following the larger sample size study. In the most general terms, I’m going through pages like this that list (mostly) C# repos and using their links.
What about different/better categorization of unit test quality (test coverage, bolted on later vs. written throughout vs. demonstrably test driven)?
This is definitely something I want to address, but the main barrier is that it’s non-trivial to assess from a data-gathering perspective. So I will do this, but it will take time.
Consider even just the anecdotally “easy” problem of determining TDD vs. non-TDD. I approximated this by positing that test-driving will create a certain ratio of test methods to production methods, since any production method will be preceded by a test method (notwithstanding future extract method refactorings). We could, perhaps, do better by auditing source control history and looking for a certain commit cadence (modifications to equal numbers of test and production classes, for instance). But that’s hard, and it doesn’t account for situations like large batch commits.
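The ratio heuristic above can be sketched in a few lines. Note that the threshold value and function names here are invented placeholders for illustration, not figures from the study; extract-method refactorings would argue for a threshold somewhat below 1.

```python
# Hypothetical sketch of the TDD-approximation heuristic: classify a
# codebase as "likely test-driven" when its ratio of test methods to
# production methods clears a threshold. The threshold of 1.0 is an
# invented placeholder, not a value taken from the actual study.

def likely_test_driven(test_method_count, production_method_count,
                       threshold=1.0):
    """Rough TDD proxy: test-driving implies roughly one test method per
    production method, so a ratio well below the threshold suggests tests
    were bolted on later (or not written at all)."""
    if production_method_count == 0:
        return False
    return test_method_count / production_method_count >= threshold

print(likely_test_driven(180, 200))  # ratio 0.9 -> False at this threshold
print(likely_test_driven(240, 200))  # ratio 1.2 -> True
```

A commit-cadence audit would be strictly more work: it means walking the repository history and comparing, per commit, how many test classes versus production classes changed, which is exactly where large batch commits muddy the signal.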
The upshot is that it’s going to take some doing, but I think we collectively can figure it out.