The last episode of this series was heavy on theory, so let’s balance it out a bit with some applied science. The lofty goal of this series of posts is to construct a way to predict the time to comprehend a method. But, regardless of how that turns out and how close we get, we’re going to take a detailed look at NDepend, static analysis, and code metrics along the way.
One of the simplest hypotheses from the last post was, “the more statements there are in a method, the longer it will take to comprehend.” Intuitively, this makes sense. The more stuff there is in a method, the longer it will take to grok. But you’ll notice that I said “statement” rather than “line of code.” What did I mean by that? Are these things interchangeable?
No, it turns out. There are actually a few different ways to reason about “lines of code.” Here are three that I’ll discuss today.
- Source/Physical Lines of Code: count up the actual file lines occupied by the method or class.
- Logical Lines of Code (LLOC): roughly, the lines of executable code, so you don’t count a curly brace as a line.
- IL Instructions: the number of bytecode/intermediate language instructions that your code compiles to.
So which is the right one? Well, if only life were that simple. They each have situational advantages and disadvantages. Physical lines of code is the easiest to compute and reason about, but it is prone to distortion. After all, if I am in the habit of adding a blank line between every single line of code in my code base, this will enormously bloat my physical line count without adding much in the way of complexity. Logical lines of code cuts closest to the heart of the issue because it talks about executable code. But it requires PDB files to compute, and it completely ignores things like abstract declarations which, while not executable, still contribute cognitive overhead and complexity. And finally, IL instructions may be the great equalizer across languages that target the .NET framework, but the count is hard for you to control without really diving into the workings of the compiler.
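To make the white space distortion concrete, here is a minimal sketch in C#. The class and method names are hypothetical, and the "approximate logical" count is a deliberately crude text heuristic of my own (skip blank lines and brace-only lines). It is not how NDepend computes its figure, which comes from the PDB file, so NDepend's number for real code can differ. For instance, this heuristic counts the signature line.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of NDepend.
public static class LineCountSketch
{
    // Physical count: every line in the text, including blanks and braces.
    public static int PhysicalLines(string source)
    {
        return source.Split('\n').Length;
    }

    // Crude stand-in for a logical count: ignore blank lines and lines
    // that contain nothing but a single curly brace.
    public static int ApproximateLogicalLines(string source)
    {
        return source.Split('\n')
                     .Select(line => line.Trim())
                     .Count(line => line.Length > 0 && line != "{" && line != "}");
    }

    public static void Main()
    {
        // The same method, written compactly and with a blank line after every line.
        string compact = "int Add(int a, int b)\n{\n    return a + b;\n}";
        string spacedOut = "int Add(int a, int b)\n\n{\n\n    return a + b;\n\n}";

        Console.WriteLine("compact:    physical=" + PhysicalLines(compact)
            + ", approx. logical=" + ApproximateLogicalLines(compact));
        Console.WriteLine("spaced out: physical=" + PhysicalLines(spacedOut)
            + ", approx. logical=" + ApproximateLogicalLines(spacedOut));
    }
}
```

The physical count grows from 4 to 7 when the blank lines are added, while the approximate logical count stays at 2 for both versions.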
Let’s not pick a “right” one, then. Instead, for now, let’s take a look at how we can access each one of these and how they differ for some pretty simple methods.
(Here is the address of the repository that I’m using in the video).
To summarize the video, you can access physical lines of code easily in the IDE, but you can augment this view of your lines of code with features available through CQLinq. Of course, you don’t need to create your own custom query to do that. It’s available to you in the NDepend menu, under metrics, as shown here.
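If you do want to write your own query, a sketch along these lines might be a reasonable starting point. It is not the exact query from the video. It assumes the standard CQLinq code model, where `Application.Methods` enumerates the methods in your own assemblies and each method exposes `NbLinesOfCode` and `NbILInstructions`.

```csharp
// CQLinq sketch: list methods with their logical lines of code and
// IL instruction counts side by side, largest methods first.
from m in Application.Methods
orderby m.NbLinesOfCode descending
select new { m, m.NbLinesOfCode, m.NbILInstructions }
```

Seeing the two columns next to each other makes it easy to spot methods where the IL instruction count is out of proportion to the logical line count.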
With the requisite back story in place, let’s now talk about which one to use for contemplating a method’s time to comprehend. Probably the easiest to eliminate is the number of IL instructions. This metric is just as much a function of the compiler as it is of the readability of the code. After all, we saw that IL instructions seemed to correlate somewhat with the complexity of what was going on in the method, but the relationship was not always obvious. And, frankly, for our purposes, we don’t really care about the compiler.
It’s a little more difficult, but I’m also going to eliminate physical lines of code (at least for now). Every method has a signature, and every method in the overwhelming majority of C# code bases has a line apiece dedicated to its opening and closing curly braces. So those things aren’t differentiating factors. White space, on the other hand, does differ, and methods with control flow statements may or may not have lines dedicated to curly braces of their own, so it’s not as though there won’t be variance.
But I submit that the variance isn’t significant in terms of comprehension. Apart from the occasional gotcha of a control flow statement omitting its curly braces (which we could always isolate separately), there aren’t too many purely physical concerns that threaten to muddy understanding. Does code with more white space between lines take longer to read? Maybe marginally, if the more compact method doesn’t require scrolling and the other one does. But, on balance, it’s probably relatively insignificant.
And so, I’m going to start out considering logical lines of code, or, in NDepend parlance, “NbLinesOfCode” or sometimes just “Lines of Code.” We’re going to send out an experiment to see what effect the number of lines of code has on time to comprehend. If you haven’t yet signed up to participate and want to do so, you can sign up here.
Next time in the series, I’ll incorporate the results of the experiment in the first bit of tentative construction of our metric, and we’ll take a look at how to examine another potentially relevant property of our source code.