Today I offer another one of the code research posts we’ve been doing. If you want more backstory on the series, check out the last post in the series, where I give a brief history. You should also read it if you want to understand what I mean by functional C# and to see details about its impact on codebases at the method level.
Quick editorial note: a couple of people have commented/sent notes asking about p-values. I’ve been eliding those to keep the posts more narrative. But as we’ve expanded the set of variables we capture, we’ve been looking only at dramatically lower p-values. Those cited in this post, for instance, range between 0 and 0.04, with most being less than 0.01.
I’ll summarize the last functional study here, briefly. Last time, I studied about 500 codebases to see what functional-style programming did to methods and types. And the answer was that it made them less object-oriented, but it had surprisingly little influence on clean code statistics, like
- Lines of code per method or type.
- Cyclomatic complexity.
- Parameters per method.
- Method nesting depth.
- Methods per type.
I expected that functional codebases would correlate with a reduction in all of those things. In other words, I figured that functional-style programming would lead to smaller, clearer, more focused, and less complex methods. It didn’t.
Undaunted, I vowed to take a broader look at the effect of functional programming on a wider array of concerns. And I did just that, with the help of my partner who runs the statistical regressions.
Broadening the Study of Functional C#
We did two things differently for the purpose of this analysis:
- Studied slightly more codebases.
- Studied many, many more code properties/metrics.
With regard to the codebases themselves, we’re constantly expanding the corpus that we study. So last time we studied roughly 500, and this time it was about 570 (which included the 500 from last time). Lest you worry that this materially altered what we saw last time, rest assured that it didn’t. The p-values and r^2 figures for the items I mentioned in the introduction still showed no significant relationship (i.e., we found no evidence that functional programming affects those things).
More germane to the rest of the post, however, is the broader array of things we studied. This included an enormous cross-section of the metrics that NDepend provides, including more at the method and type level and also expanding to the namespace and assembly level.
Why study so much more?
Well, simple. I wanted to see a more holistic and more architectural view of the codebases and the effect of functional C# on them. The fact that it didn’t make the methods squeaky clean doesn’t mean that it didn’t do anything. And I wasn’t disappointed with the results. It did a lot of stuff, all of which I find interesting.
The Curious Case of IL Instructions and Complexity
Recall that we recently examined counting lines of code in detail. Generally speaking, I use NDepend’s NbLinesOfCode (corresponding to logical lines of code). This approximates the counts you see in your editor but eliminates a lot of noise. You could think of it as corresponding to “statements” in the language.
And it’s this figure that I almost always use in assessing codebases. If this figure gets north of five for a method, the method becomes harder to understand, and if you start getting above 20, look out: you’ve got maintenance difficulties on your hands. I was surprised that the average for this figure per method and type didn’t vary as codebases became more functional.
But do you know what did?
The number of IL instructions per method goes down significantly in more functional codebases. IL cyclomatic complexity does as well, also quite significantly, if with a shallower slope.
That’s fascinating.
This means that methods across all of these codebases are relatively uniform at the C# source level, in both complexity and logical lines of code per method. But somehow OO codebases are, for lack of a better term, denser once compiled to IL. I don’t have a great hypothesis as to why object-oriented methods and types produce more IL logic, so I’d be pretty interested to hear your take in the comments.
Functional C# Creates a Massive Uptick in Immutability
Let’s move on to something a lot less mysterious. Here’s a graph of the rate of type immutability against the rate of codebases being functional.
The scatter plot indicates a direct and obvious correlation. The statistical relationship is also quite strong. I hadn’t previously been capturing immutability rates, and now I am.
This shouldn’t be particularly surprising, but it also wasn’t a given. Functional methods are ones that have no side effects at the instance, type, or global levels. Immutable types are ones whose state doesn’t change post-creation. You’d expect these to relate, but you could construct a codebase (e.g., one with an abundance of global state) where these were orthogonal.
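To make the distinction concrete, here is a minimal sketch of how the two properties can diverge. All of the names (Money, AuditLog, Invoice) are hypothetical and invented for illustration; they don’t come from any of the studied codebases. Both classes are immutable types, but only one of them has a functional method.

```csharp
using System;
using System.Collections.Generic;

// Immutable type: its state cannot change after construction.
public sealed class Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Functional method: no side effects at the instance, type, or global
    // level. It returns a new value instead of changing this one.
    public Money Add(Money other) => new Money(Amount + other.Amount, Currency);
}

// Hypothetical global state, standing in for the "abundance of global
// state" case described above.
public static class AuditLog
{
    public static readonly List<string> Entries = new List<string>();
}

// Still an immutable type, but its method is NOT functional.
public sealed class Invoice
{
    public Money Total { get; }

    public Invoice(Money total)
    {
        Total = total;
    }

    public Money TotalWithTax(decimal rate)
    {
        // Side effect at the global level: this is what makes the method
        // non-functional even though Invoice itself never changes.
        AuditLog.Entries.Add("tax computed");
        return new Money(Total.Amount * (1 + rate), Total.Currency);
    }
}
```

A codebase full of types like Invoice would score high on immutability and low on functional methods, which is why the strong relationship in the data wasn’t guaranteed up front.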
Still, it’s nice to see strong confirmation of the hypothesized result. Immutable types are more and more in favor these days, and they’re easy to reason about. And functional codebases tend to have many more immutable types.
Functional Codebases Have Significantly More Decoupled Architecture
Alright, let’s move on to the star of the show, in my opinion. Last time around, I was gathering almost nothing in the way of what you might call architectural statistics (assembly and namespace level statistics, coupling, cycles, etc.). The only thing that I did capture, average method rank, showed no variance.
This round was a much different story. Here are some of the statistically significant relationships we discovered:
- Functional codebase methods invoke fewer methods than their more OO counterparts.
- The figure for ABT (association between types) is lower in functional codebases.
- Functional codebase types have much lower fan-in and fan-out.
- Each namespace in a functional codebase has slightly fewer child namespaces.
- There’s a significant reduction in coupling among namespaces.
- Each assembly in functional codebases uses fewer other assemblies and fewer external types, on average.
By pretty much every measure (except the initial study of method rank), functional codebases have less snarl among their elements. From an architectural standpoint, this is unambiguously preferable. It makes code easier to maintain, migrate, partition, and deploy.
Pure Static and Extension Methods
Here’s another result that will probably not surprise anyone reading. Functional codebases have an extremely strong, statistically significant relationship with both the rate of extension methods and static methods in general. Here’s what the plot of functional versus static looks like.
There’s another critical piece to this puzzle, however. Functional codebases have a strong inverse relationship with the rate of methods changing type-level (meaning static) state. So:
- Functional codebases have more static methods.
- Functional codebases have less static state.
This means that our functional codebases contain way more of what I’m calling “pure” static and extension methods. These are non-instance methods that perform some kind of transformation purely on their inputs—the hallmark of the functional approach.
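Here’s a short sketch of what I mean by “pure” statics, next to a static method that isn’t pure. The class and method names are hypothetical, made up for this illustration rather than taken from the corpus.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PriceExtensions
{
    // "Pure" extension method: the result depends only on the inputs,
    // and nothing outside the method is read or modified.
    public static decimal WithDiscount(this decimal price, decimal rate)
        => price - price * rate;

    // "Pure" static method: transforms its input sequence into a total
    // without modifying the sequence or any static state.
    public static decimal TotalWithDiscount(IEnumerable<decimal> prices, decimal rate)
        => prices.Select(p => p.WithDiscount(rate)).Sum();
}

public static class DiscountCounter
{
    // Type-level (static) state.
    private static int _applied;

    public static int Applied => _applied;

    // Static, but not pure: every call changes type-level state, so this
    // is the kind of method that becomes rarer as codebases get more functional.
    public static decimal ApplyAndCount(decimal price, decimal rate)
    {
        _applied++;
        return price - price * rate;
    }
}
```

Both classes contribute to the static method count, but only DiscountCounter contributes to the count of methods changing static state.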
Again, this isn’t really surprising, but it is important. These codebases could, for instance, have achieved pure functionality via instance methods, but the fact that the developers of the codebases make them static indicates that they’re complying with static analysis guidance and that they’re making intentional design decisions. Both of these are indicators of good design, in my book.
Some Interesting Miscellany
Before wrapping up, I thought I’d sprinkle in some other curious relationships. These don’t necessarily have significant design/architecture implications, but they might raise an eyebrow, the way they did for me.
- Functional codebases have smaller instance sizes on average.
- You see generally less boxing and unboxing in functional codebases (perhaps indicating more elegant handling of types in general).
- Functional codebases make less use of events.
- They also use async less frequently (which surprised me, since less state would seem to do better in the async world).
- You’ll find way more generic methods in functional codebases, but there’s no relationship with generic types.
- Functional codebases are more likely to use PInvoke.
- And finally, functional codebases tend to have far more unit tests.
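For anyone unsure how generic methods and boxing relate, here’s a small sketch. It only illustrates the two terms; it is not an explanation of why the correlation shows up in the data, and the Comparisons class is hypothetical.

```csharp
using System;

public static class Comparisons
{
    // Non-generic version: passing an int as object boxes it on the way in,
    // and each (int) cast unboxes it again.
    public static int MaxBoxed(object a, object b)
        => (int)a > (int)b ? (int)a : (int)b;

    // Generic method on a non-generic type: when T is a value type such as
    // int, the arguments are passed and compared without boxing.
    public static T Max<T>(T a, T b) where T : IComparable<T>
        => a.CompareTo(b) >= 0 ? a : b;
}
```

Note that Max<T> is a generic method living on a non-generic type, which is the combination the data picked up: more generic methods without a corresponding change in generic types.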
Generally Better Design
As I do these studies, it’s sometimes hard to know where to editorialize and where to go purely with data. The whole point of this line of study is to get us away from defining “good code” or “good design” based on gut feel and to do it instead based on data and outcomes.
In this post, I took an awful lot of data and rolled it up to what you could likely call a subjective value judgment. Less coupling, more seams, more unit tests, reduced IL logic, pure statics, and more immutability…do these constitute better design overall? I would argue that they do and happily stake my reputation on it until/unless proved otherwise.
But I’d sure like to demonstrate it.
And that leads me to conclude by asking what you think I should research next. What would you like to see? Do you think there’s a good way to try to quantify whether these things truly do represent good design? Did this post inspire something else you’re wondering about? Feel free to weigh in below in the comments.