Today, I give you the third post in a series about how unit tests affect codebases.
The first one wound up getting a lot of attention, which was fun. In it, I presented some analysis I’d done of about 100 codebases. I had formed hypotheses about how unit tests would affect those codebases, and then I tested the hypotheses against the data.
In the second post, I incorporated a lot of the feedback I had requested in the first. Specifically, I partnered with someone to do more rigorous statistical analysis on the raw data I’d gathered. The result was much more clarity about not only the correlations among code properties but also how much confidence we could have in those relationships. Some correlations turned out to be strong, while others were likely spurious.
In this post, though, I’m incorporating the single biggest piece of feedback. I’m analyzing more codebases.
Analysis of 500 (ish) C# Codebases
Performing static analysis on and recording information about 500 codebases isn’t especially easy. To facilitate this, I’ve done significant work automating the ingestion of codebases (sketched after this list):
- Enabling autonomous batch operation
- Logging which codebases fail and why
- Building in redundancy against accidentally analyzing the same codebase twice
- Executing not just builds but also NuGet package restores and other build steps
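To make that concrete, here’s a minimal sketch of the driver loop. Everything in it is illustrative: the `AnalyzeWithNDepend` placeholder, the paths, and the exact build commands are stand-ins for my actual tooling, but the shape of the process (dedupe, clone, restore, build, analyze, log failures, keep going) is the part that matters.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Illustrative batch driver. The repo list, paths, and AnalyzeWithNDepend
// are stand-ins for the real tooling.
public static class BatchIngester
{
    public static void Run(IEnumerable<string> repoUrls, string workRoot, string failureLog)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (var url in repoUrls)
        {
            // Redundancy against analyzing the same codebase twice:
            // skip any URL already processed in this batch.
            if (!seen.Add(url.TrimEnd('/')))
                continue;

            var repoDir = Path.Combine(workRoot, Path.GetFileNameWithoutExtension(url));
            try
            {
                Exec("git", $"clone --depth 1 {url} \"{repoDir}\"");

                // Not just builds: restore NuGet packages first, then compile.
                foreach (var sln in Directory.GetFiles(repoDir, "*.sln", SearchOption.AllDirectories))
                {
                    Exec("nuget", $"restore \"{sln}\"");
                    Exec("msbuild", $"\"{sln}\" /p:Configuration=Release /v:minimal");
                }

                AnalyzeWithNDepend(repoDir); // placeholder for the static-analysis step
            }
            catch (Exception ex)
            {
                // Log which codebases fail and why, then keep going, so the
                // batch can run unattended.
                File.AppendAllText(failureLog,
                    $"{DateTime.Now:u} FAIL {url}: {ex.Message}{Environment.NewLine}");
            }
        }
    }

    private static void Exec(string fileName, string arguments)
    {
        using var p = Process.Start(new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false
        }) ?? throw new InvalidOperationException($"Failed to start {fileName}");

        p.WaitForExit();
        if (p.ExitCode != 0)
            throw new InvalidOperationException(
                $"'{fileName} {arguments}' exited with code {p.ExitCode}");
    }

    private static void AnalyzeWithNDepend(string repoDir)
    {
        // The real pipeline hands the built assemblies to NDepend here.
    }
}
```

The try/catch around each codebase is what makes the batch autonomous: one broken repository writes a line to the failure log instead of halting an overnight run.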
That’s been a big help, but there’s still the matter of finding these codebases. To do that, I mined a handful of “awesome codebase” lists, like this one. I pointed the analysis tool at something like 750 codebases, and it automatically filtered out any that didn’t compile or that otherwise had trouble in the automated process.
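The mining itself doesn’t need to be fancy. A regex pass over each list’s README yields a pool of candidate repository URLs; here’s a minimal sketch, assuming a locally downloaded copy of the list (the regex and file handling are my assumptions, not the exact tooling):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative scraper: pull candidate GitHub repository URLs out of a
// downloaded "awesome list" README.
public static class AwesomeListMiner
{
    private static readonly Regex RepoLink = new Regex(
        @"https://github\.com/[\w-]+/[\w.-]+",
        RegexOptions.Compiled);

    public static IReadOnlyList<string> ExtractRepoUrls(string markdownPath)
    {
        var markdown = File.ReadAllText(markdownPath);
        return RepoLink.Matches(markdown)
            .Cast<Match>()
            .Select(m => m.Value.TrimEnd('.')) // strip trailing punctuation the regex may grab
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

The deduplicated output from a few such lists is the kind of thing that gets fed to the batch driver above.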
After the dust settled, that filtering left me with 503 valid codebases. The number came down to 495 once I adjusted for codebases that, for whatever reason, contained no (non-third-party) methods or types, or that were otherwise trivial.
So the results here come from running NDepend static analysis on 495 C# codebases.
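As a quick aside on what “non-third-party” means in practice: NDepend’s query language, CQLinq, scopes queries to application code via its `JustMyCode` domain, which excludes third-party and generated code. A query along these lines (illustrative, not my exact metric set) is the kind of thing that drives the counts behind this analysis:

```csharp
// CQLinq sketch: count only application (non-third-party) methods,
// along with their lines of code.
from m in JustMyCode.Methods
where m.NbLinesOfCode > 0
select new { m, m.NbLinesOfCode }
```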