Over the course of the fall and winter, I’ve been gaining momentum with code research posts. Today, I bring that momentum to bear on the subject of functional programming. Or at least, on the functional style of programming in C# codebases.
But before I do that, let me provide a little background in case you haven’t caught the previous posts in the series. It started with me doing automated static analysis on 100 codebases to see how singletons impact those codebases. Next up, I used that data to look at how unit tests affect codebases. That post generated a lot of buzz, so I enlisted a partner to help with statistical analysis and then boosted the codebase sample size up to 500.
At the end of that last post, I suggested some future topics of study. Well, now I’ve picked one: functional programming.
What Is Functional Programming?
The idea with this post is mostly to report on findings, but I’d be remiss if I didn’t provide at least some background so that anyone reading has some context. So first, let’s cover the topic of functional programming briefly.
Functional programming is one of the major programming paradigms. Specifically, its calling card is that it disallows side effects. In other words, it models the rules of math, in which the result of the function (or method) is purely a deterministic function of its inputs.
So, in pseudo-code, it looks like this:
    public int Add(int x, int y)
    {
        return x + y;
    }
This is a functional method. But if you do something like this
    public int Add(int x, int y)
    {
        return GlobalVariable + x + y;
    }
or like this
    public int Add(int x, int y)
    {
        _databasePlopper.plopResultInDatabase(x + y);
        return x + y;
    }
then you’re out of the functional realm, because the method now depends on or changes state beyond its inputs. The first version reads a global variable, and the second writes to a database (a side effect). These two modified versions of Add() each concern themselves with the world beyond processing the inputs to add. (As an aside, you could “fix” this by passing the global variable or the _databasePlopper dependency to the method as a parameter.)
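Here’s a minimal sketch of what that aside might look like. The IDatabasePlopper interface, the Calculator class, and the offset parameter name are all made up for illustration. Note that the second method still writes to a database when called, which is why “fix” belongs in quotes: the method no longer reaches out to state it wasn’t handed, but the effect itself hasn’t gone anywhere.

```csharp
// Hypothetical abstraction standing in for the post's _databasePlopper.
public interface IDatabasePlopper
{
    void plopResultInDatabase(int result);
}

public static class Calculator
{
    // The former global variable is now an explicit input
    // ("offset" is an assumed name), so the result depends only on arguments.
    public static int Add(int offset, int x, int y)
    {
        return offset + x + y;
    }

    // The dependency arrives as a parameter instead of living in a field.
    // The database write still happens when this is called.
    public static int AddAndPlop(IDatabasePlopper plopper, int x, int y)
    {
        int sum = x + y;
        plopper.plopResultInDatabase(sum);
        return sum;
    }
}
```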
Now, take note of something because this matters to the rest of the post. While C# (or any other object-oriented language) is not a functional language, per se, you can write functional methods in C#.
Object-Oriented Programming
Alright, so C# is an object-oriented language. As we did with functional programming, let’s take a look at what object-oriented programming (OOP) is and what an object-oriented language does.
If you’re like me and you learned programming during the heady early days of enterprise Java and the heyday of C++, this is probably old hat. If you didn’t, what I’m about to say might amuse you. Back then, you couldn’t pass any interview on earth (just about) without being able to rattle the “pillars of OOP” off the top of your head:
- Abstraction
- Encapsulation
- Inheritance
- Polymorphism
All of that sounds a little…formal. And I’ll come back to some of those terms a little later. But OOP is essentially the idea that you define classes, which themselves define both data (fields) and behaviors (methods). At runtime, you then have instances of these classes with specific values for the fields. And you can compose these classes in such a way that they inherit behavior from one another, allowing for flexible elaboration on basic behaviors.
In other words, you can define a vehicle class with an “isOn” field and then create a more specific boat class and a more specific car class, each of which manipulates the parent “isOn” field in different ways.
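As a rough sketch, that vehicle example could look like the following. Only the isOn field comes from the description above. The TurnKey and PullStarter methods, and the rules for when each vehicle turns on, are invented purely to show two subclasses touching the same inherited state differently.

```csharp
public class Vehicle
{
    // Shared state that every vehicle inherits.
    protected bool isOn;

    public bool IsOn => isOn;
}

public class Car : Vehicle
{
    // In this sketch, a car turns on whenever the key is turned.
    public void TurnKey()
    {
        isOn = true;
    }
}

public class Boat : Vehicle
{
    // In this sketch, a boat turns on only if the safety lanyard is attached.
    public void PullStarter(bool lanyardAttached)
    {
        isOn = lanyardAttached;
    }
}
```

Notice that neither TurnKey nor PullStarter returns anything. They exist to change the object’s state, which is exactly the kind of method the functional style avoids.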
The Functional-OOP Mashup
Back in the 1990s and early-to-mid 2000s, OOP utterly dominated the landscape and people were class- and inheritance-happy. I don’t specifically recall, but I’m sure people wrote classes like ForLoop and Equals.
Functional programming was so far off the radar that not even hipster programmers knew about it. It was the province of academics. Personally, my only conscious encounter with it until the introduction of LINQ came during an obscure college class that seemed to blend programming and math.
But times changed. As Moore’s Law yielded to massive parallelism and people began to crave the simplicity of declarative styles, functional-style programming began to creep into OO languages. Programmers increasingly favored methods without side effects, while language authors added features like the aforementioned LINQ.
Fast forward to modern day, and C# is truly a hybrid language. It comes from OO roots, but as it adds features, it caters more and more to the functional style.
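To make the hybrid nature concrete, here’s a small sketch of the same computation written twice: once as an imperative loop and once declaratively with LINQ. The EvenSquares class and its method names are just illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EvenSquares
{
    // Imperative style: a loop that mutates a local accumulator.
    public static int SumImperative(IEnumerable<int> numbers)
    {
        int total = 0;
        foreach (int n in numbers)
        {
            if (n % 2 == 0)
            {
                total += n * n;
            }
        }
        return total;
    }

    // Declarative style: describe the result as a filter followed by a sum.
    public static int SumWithLinq(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).Sum(n => n * n);
    }
}
```

Both methods are free of side effects as far as a caller can tell. The LINQ version simply pushes the bookkeeping out of sight.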
So I decided to take a look at how favoring one style or the other affected codebases.
Computing Functional-Ness of Codebases
I’ll talk briefly about methodology, but it’s largely the same as last time. The main difference is that I added a column to the data corpus for analysis. In the data that I’ve gathered during research, I have a lot of information about individual methods in a codebase. For each codebase, I used this data to calculate the percentage of its methods that have no side effects.
Specifically, they don’t reference the following:
- Fields (instance state)
- Type state (static state)
- Global state
The percentage of methods for which this is true in a codebase became its “percent functional.” Now, this is certainly not the be-all and end-all for measuring functionality. But I’m constrained by certain feasibility angles and time, given that I’m doing automated analysis of millions of lines of code across hundreds of codebases. And with all of these posts, the aim is to start conversations, generate interest, and steer further research.
So, for now, “percent functional” it is.
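For the curious, the calculation boils down to something like the sketch below. This is not the actual analysis code. The MethodFacts record and its three flags are a hypothetical stand-in for the per-method data, and returning 0 for an empty codebase is an assumption of mine.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-method record; the real data corpus has its own shape.
public sealed class MethodFacts
{
    public string Name { get; set; } = "";
    public bool ReferencesFields { get; set; }
    public bool ReferencesStaticState { get; set; }
    public bool ReferencesGlobalState { get; set; }
}

public static class FunctionalMetric
{
    // A method counts as "functional" when it references none of the three kinds of state.
    public static bool IsFunctional(MethodFacts method)
    {
        return !method.ReferencesFields
            && !method.ReferencesStaticState
            && !method.ReferencesGlobalState;
    }

    // Share of functional methods, expressed as a percentage from 0 to 100.
    public static double PercentFunctional(List<MethodFacts> methods)
    {
        if (methods.Count == 0)
        {
            return 0.0; // Assumed convention for an empty codebase.
        }
        int functionalCount = methods.Count(m => IsFunctional(m));
        return 100.0 * functionalCount / methods.Count;
    }
}
```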
Functional Programming Makes Codebases Less Object-Oriented
We took all of the same pieces of data as last time and ran them through the same modeling. And here are the strong correlations with a high percentage of pure functional methods in these codebases:
- Fewer interfaces
- Fewer abstract types in general
- (Slightly) more overloads per method
- Lower incidence of inheritance (average type inheritance depth)
- Fewer virtual methods
- Fewer properties and fields per type
- Not as many enums (though a weak r^2 for this one)
- Higher LCOM (less type cohesion)
- Constructor logic
Omitting abstraction (which I’d argue is absolutely not OOP-specific anyway), we’re shedding our OOP pillars. These functional codebases have fewer interfaces (polymorphism), less inheritance, and less encapsulation of state. (Lack of cohesion of methods [LCOM] is a measure of how tightly a type interacts with its fields/state.)
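To see why methods that ignore fields push LCOM up, here’s a sketch using one common formulation: 1 minus the average share of a type’s fields that each method touches. I’m assuming this variant for illustration only; several definitions of LCOM exist, and the Cohesion class and its parameter names are made up.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Cohesion
{
    // fieldsUsedByMethod: one entry per method, holding the names of the
    // type's fields that the method touches (empty for a pure method).
    // fieldCount: how many fields the type declares.
    public static double Lcom(List<HashSet<string>> fieldsUsedByMethod, int fieldCount)
    {
        int methodCount = fieldsUsedByMethod.Count;
        if (methodCount == 0 || fieldCount == 0)
        {
            return 0.0; // Assumed convention: nothing to be incohesive about.
        }
        int totalAccesses = fieldsUsedByMethod.Sum(fields => fields.Count);
        return 1.0 - (double)totalAccesses / (methodCount * fieldCount);
    }
}
```

Under this formula, a type with two fields and two methods that each touch both fields scores 0 (fully cohesive). Add two methods that touch no fields at all, and the score rises to 0.5, even though nothing about the original methods changed.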
So that reinforces something we’d expect to see. The more you program in the functional style, the less you do in the OO style.
Functional Programming Doesn’t Do Much Else
I’ll pardon you if you’re not phoning the news outlets just yet. These findings are encouraging (from a methodological standpoint) but expected.
Here’s what’s less expected, though. The functional approach doesn’t seem to have any other significant effects. It doesn’t seem to make the code cleaner, more compact, or really anything. Here’s a list of things that were not at all affected statistically by the tradeoff between OOP and FP:
- Method cyclomatic complexity
- Rate of code comments
- Lines of code per method
- Lines of code per type
- Parameters per method
- Average method nesting depth
- Average method rank
- Types with dependency cycles
- Methods per type
There were a couple of relationships that, according to r and p values, were significant but not very explanatory: functional codebases referenced fewer types per type and had lower type rank.
Only one property with a significant correlation didn’t relate directly to how functional or object-oriented the code was: unit tests. A higher percentage of functional methods correlated with a higher percentage of unit test methods.
What’s Next?
What conclusions can we draw from this? Is it time to conclude that functional programming is a fad, conferring no actual benefits? No, of course not. The primary driver for adopting a functional style isn’t to reduce cyclomatic complexity or lines of code per method/type. The benefits to the approach stretch way beyond that and include ease of reasoning about the code, reduction of parallelism issues, and plenty more.
But it certainly is interesting that the functional vs OO decision, at least within a C# context, seems to have little impact on other code properties.
Stay tuned for next time. A lot of these are statistics at a granular level, and I think we need to put on our architecture hat to reach further conclusions. So next up, I want to see how a functional style impacts statistics at the assembly, namespace, and broader type level. Does it make architecture simpler? That’s what I’d like to know next.
As always, feel free to comment on what you think or what you’d like to see next.