Last time around, I promised a foray into Newtonian mechanics, so I might as well start with that. This is the equation for gravitational force, one of the four fundamental forces of nature.
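In symbols, with $F$ for the force, $m_1$ and $m_2$ for the two masses, and $r$ for the distance between them (conventional symbol names, assumed here), the equation reads:

$$F = G\,\frac{m_1 m_2}{r^2}$$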
To put it conversationally, the force of gravity between two objects is the product of the mass of each object, divided by the square of the distance between the objects, multiplied by some thing called “G”. Really, I’m not kidding about the last bit. “G” is the “gravitational constant” and just a placeholder thrown in to make the rest of the math work out.
What Newton figured out was the relationship between the variables at play when it comes to gravitation: the two masses and the distance between them. The heavier the masses, the more gravitation, but if you started moving the masses apart, the force dropped off precipitously. He figured out that the force of gravity varied proportionally with the mass of each object and varied inversely with the square of the distance. As far as Newton was concerned, the law of gravitation, specifically about the Earth, would have been expressed as follows.
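A sketch of that proportionality, assuming the same conventional symbols ($m_1$ and $m_2$ for the masses, one of which would be the Earth's, and $r$ for the distance between them):

$$F \propto \frac{m_1 m_2}{r^2}$$

There is no constant here, only a statement about how the force changes as each variable changes.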
This formula, this expression of proportionality, demonstrates that it is possible to understand relationships via experimentation, without being able to fully express reality in the form of a neat equation that always works out. Newton stuck a value in there, called the gravitational constant, and called it a day. Some 70 years or so after Newton died, a man named Henry Cavendish was able to perform an experiment and empirically determine the value of G, resulting in a pretty accurate equation (notwithstanding general relativity).
Code Readability Mechanics
Okay, so what does this have to do with our mission here, to work toward a metric for method readability? Well, it demonstrates that we can shave off thin slices of this thing for reasoning purposes, without having to go right away for the whole enchilada. Think of experiments that Newton, had he been the size of the solar system, might have run.
He could have placed two planets a million miles apart, recorded the force between them, increased the distance to 2 and then 3 million miles, and recorded what happened to the force of gravity. This would have told him nothing apart from the fact that the force of gravity was inversely proportional to the square of the distance. He could have placed two planets a million miles apart, and then swapped out one planet for others that had half and twice the mass of the original. This would have told him only that the force was linearly proportional to the mass of the planet he was swapping out. He then might have swapped a rocky planet for a gas planet of equal mass and observed that that particular variance was irrelevant.
And then, following each experiment, he could have used each piece of learning, in sequence, to build the equation one piece at a time. It stands to reason that we can, and probably should, do the same thing with the approach to creating a “time to read/comprehend” metric.
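To make the one-variable-at-a-time idea concrete, here is a minimal Python sketch of the first two thought experiments. Everything in it is an assumption for illustration: `measure_force` stands in for a physical measurement, and `PLANET` and `BASE_DISTANCE` are made-up values. The analysis only looks at the measurements and estimates the exponent of each variable from a log-log slope.

```python
import math

G = 6.674e-11  # gravitational constant in N*m^2/kg^2 (approximate)


def measure_force(m1, m2, r):
    """Hypothetical stand-in for reading a force gauge, in newtons."""
    return G * m1 * m2 / r ** 2


def log_log_slope(xs, ys):
    """Estimate the exponent p in y ~ x**p from the first and last samples."""
    return (math.log(ys[-1]) - math.log(ys[0])) / (math.log(xs[-1]) - math.log(xs[0]))


PLANET = 6.0e24        # kg, a made-up planetary mass
BASE_DISTANCE = 1.6e9  # m, roughly a million miles

# Experiment 1: hold both masses fixed and vary only the distance.
distances = [BASE_DISTANCE * k for k in (1, 2, 3)]
forces_by_distance = [measure_force(PLANET, PLANET, r) for r in distances]
distance_exponent = log_log_slope(distances, forces_by_distance)

# Experiment 2: hold the distance fixed and vary only one mass.
masses = [PLANET * k for k in (0.5, 1, 2)]
forces_by_mass = [measure_force(m, PLANET, BASE_DISTANCE) for m in masses]
mass_exponent = log_log_slope(masses, forces_by_mass)

print("exponent on distance:", round(distance_exponent, 3))
print("exponent on mass:", round(mass_exponent, 3))
```

The two estimated exponents come out close to -2 and 1. Each experiment yields one piece of the relationship, and neither requires knowing the value of G.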
So what are some things that would lengthen the time to comprehend a method? It’s brainstorming time. I’m going to put some ideas out there, but please feel free to chime in with more in the comments. For me, it boils down to thinking of the degenerate cases and expanding outward from there. The simplest method to understand would be a no-op, probably followed by simple getters and setters. So, thinking inductively, where do we get stuck?
Here are some hypotheses that I have. These all refer to the gestalt of comprehension. What I mean is you may be able to find a particular method that serves as a counter-example, but I’m hypothesizing that over a large sample size, these will hold up.
- The more statements there are in a method, the longer it will take to comprehend.
- Simple variable assignment has very little effect on time to comprehend.
- Assignment using arrays and other collection types has more effect than simple assignment.
- Control flow statements are harder to comprehend than assignments.
- Compound boolean conditions substantially increase time to comprehend.
- Naming of helper methods means the difference between extremely large time to comprehend (poorly named helper method) and nearly trivial (well named).
- Understanding methods that refer to class fields takes longer than understanding purely functional methods.
- Collaborators with poorly named methods sharply increase time to comprehend.
- Collaborators with well named methods are roughly equivalent to assignment and commands.
- Time to comprehend is dramatically increased by reference to global variables/state.
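As a small illustration of two of these hypotheses (compound boolean conditions and helper naming), here is the same check written twice in Python. The function and parameter names are hypothetical, invented for this sketch. Whether the second version actually reads faster is exactly the sort of claim the experiments would need to test.

```python
def can_checkout_inline(user_active, cart_items, balance, total):
    # One compound condition: the reader has to untangle all of it at once.
    if user_active and len(cart_items) > 0 and (balance >= total or total == 0):
        return True
    return False


def has_items(cart_items):
    """True when the cart contains at least one item."""
    return len(cart_items) > 0


def can_afford(balance, total):
    """True when the balance covers the total, or nothing is owed."""
    return balance >= total or total == 0


def can_checkout_named(user_active, cart_items, balance, total):
    # Same logic, but each condition now has a name the reader can trust.
    return user_active and has_items(cart_items) and can_afford(balance, total)


print(can_checkout_inline(True, ["book"], 5, 10))
print(can_checkout_named(True, ["book"], 5, 10))
```

Both calls print the same result, since the two versions behave identically. The only thing that differs is how much the reader has to hold in their head.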
From this list, we can extract some things that would need to be measured. Think of Newton with his hypotheses about mass, distance, and gas/rocky; he’d need a way to measure each of those properties so that he could vary them and observe the results. Same thing here. Given this list of hypotheses, here are some things that we’d have to be able to observe/count/measure.
- Count statements in a method.
- Identify simple assignment.
- Identify array/collection assignment.
- Identify and count control flow statements.
- Count conditions inside of a boolean expression.
- Poorly named versus well named members (this is probably going to be pretty hard).
- Identify and count class field references.
- Identify methods that refer to no class state.
- Identify method references to global state.
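To give a rough feel for what such measurements could look like, here is a sketch using Python's standard `ast` module. This is not the tooling the series will use, only an assumption-laden illustration. The `measure` function, the `SAMPLE` method, and the count names are all hypothetical, and the heuristics are crude: a "field reference" is any `self.<name>` access, a "global reference" is a use of a name declared with `global`, and naming quality is not measured at all.

```python
import ast
import textwrap

CONTROL_FLOW = (ast.If, ast.For, ast.While, ast.Try)


def measure(source):
    """Count a few structural properties of the first function in source."""
    tree = ast.parse(textwrap.dedent(source))
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    nodes = [n for n in ast.walk(func) if n is not func]

    # First pass: collect names the method declares with `global`.
    declared_globals = set()
    for node in nodes:
        if isinstance(node, ast.Global):
            declared_globals.update(node.names)

    counts = {
        "statements": 0,
        "simple_assignments": 0,
        "collection_assignments": 0,
        "control_flow": 0,
        "boolean_operators": 0,
        "field_references": 0,
        "global_references": 0,
    }
    for node in nodes:
        if isinstance(node, ast.stmt):
            counts["statements"] += 1
        if isinstance(node, ast.Assign):
            # Assigning into x[...] is treated as collection assignment.
            if any(isinstance(t, ast.Subscript) for t in node.targets):
                counts["collection_assignments"] += 1
            else:
                counts["simple_assignments"] += 1
        if isinstance(node, CONTROL_FLOW):
            counts["control_flow"] += 1
        if isinstance(node, ast.BoolOp):
            # `a and b and c` is one BoolOp with three values, two operators.
            counts["boolean_operators"] += len(node.values) - 1
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == "self"):
            counts["field_references"] += 1
        if isinstance(node, ast.Name) and node.id in declared_globals:
            counts["global_references"] += 1
    return counts


SAMPLE = '''
def apply_discount(self, order):
    global audit_count
    total = order.total
    if self.is_member and total > 100 or order.has_coupon:
        total = total * self.discount_rate
    self.history[order.id] = total
    audit_count += 1
    return total
'''

result = measure(SAMPLE)
for name, value in result.items():
    print(name, value)
```

A method with zero `field_references` and zero `global_references` would be a candidate for "refers to no class state" under these heuristics. The number of conditions in a boolean expression is its operator count plus one.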
Experimentation
There’s been a very science-y theme to this post. I started off with Newtonian mechanics and then formed some hypotheses about what makes code take a long time to comprehend. From those hypotheses, I extracted things that would need to be observed and measured to start trying to confirm them. So, in accordance with the scientific method, the next thing to do is to start running some experiments. In the next post, I’m going to show you how to use NDepend to actually make the observations I’ve outlined.
In parallel with that, I’d like to invite you to sign up to help me with running experiments in time to comprehend. I don’t mind using myself as the guinea pig for these experiments, but the more data, the better the result. As this series goes on, it’d be great if you could help by supplying your time to comprehend for some methods. Click below if you’re interested in signing up.
The landing page explains in more detail, but participation is pretty low impact. We’ll periodically send out code to read, and you just read it and record how long it took you to understand it. So, if you’re interested and you’re up for reading a little code, please join me!
This got me thinking about the Cognitive Complexity metric:
https://blog.sonarsource.com/cognitive-complexity-because-testability-understandability
Interesting — I hadn’t seen that before. Thanks for the link!