Last time, I talked through a little bit of housekeeping on the way to creating a metric that would be, uniquely, ours. Never mind the fact that, under the hood, the metric is still lines of code. It now has a promising name and is expressed in the units we want, and I think that’s okay. There is a lot of functionality and ground to cover in NDepend, so a steady, measurable pace makes sense.
It’s now time to start thinking about the nature of the metric to be created here, which is essentially a measure of time. That’s pretty ambitious because it involves two components: defining a composite metric (recall that this is a mathematical transform on measured properties) and then tying it to an observed outcome via experimentation. In this series, I’m not assuming that anyone reading has much advanced knowledge about static analysis and metrics, so let’s get you to the point where you grok a composite metric. We’ll tackle the experimentation a little later.
A Look at a Composite Metric
I could walk you through creating a second query under the “My Metrics” group that we created, but I also want this to be an opportunity to explore NDepend’s rich feature set. So instead, navigate to NDepend->Metric->Code Quality->Types with Poor Cohesion.
When you do that, you’re going to see a metric much more complicated than the one we defined in the “Queries and Rules Edit” window. Here’s the code for it, comments and all.
```
// Types with poor cohesion
warnif count > 0
from t in JustMyCode.Types where
  (t.LCOM > 0.8 || t.LCOMHS > 0.95) &&
  t.NbFields > 10 &&
  t.NbMethods > 10
  orderby t.LCOM descending, t.LCOMHS descending
select new { t, t.LCOM, t.LCOMHS, t.NbMethods, t.NbFields }

// Types where LCOM > 0.8 and NbFields > 10
// and NbMethods > 10 might be problematic.
// However, it is very hard to avoid such
// non-cohesive types. The LCOMHS metric
// is often considered as more efficient to
// detect non-cohesive types.
// See the definition of the LCOM metric here
// http://www.ndepend.com/Metrics.aspx#LCOM
```
There’s a good bit to process here. The CQLinq code is inspecting types and reporting data on them, along with a warning if anything matches. “Type” here means any class or struct in your code base (well, okay, in my code base). And what does matching mean? Looking at the compound conditional, a type matches if it has an “LCOM” greater than 0.8 or an “LCOMHS” greater than 0.95, and it also has more than 10 fields and more than 10 methods. So, to recap, poor cohesion means that there are a good number of fields, a good number of methods, and… something… for these acronyms.
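To make the matching logic concrete, here is a minimal C# sketch of the query’s where clause written as an ordinary predicate. The TypeMetrics record and the PoorCohesionRule class are hypothetical stand-ins invented for this illustration; in NDepend the numbers come from the t.LCOM, t.LCOMHS, t.NbFields and t.NbMethods properties, and the rule itself runs as CQLinq, not as this code.

```csharp
// Hypothetical stand-in for the numbers NDepend computes per type; in the
// CQLinq query they come from t.LCOM, t.LCOMHS, t.NbFields and t.NbMethods.
public record TypeMetrics(string Name, double Lcom, double LcomHs, int NbFields, int NbMethods);

public static class PoorCohesionRule
{
    // Mirrors the query's where clause: either cohesion threshold trips,
    // AND the type has strictly more than 10 fields and more than 10 methods.
    public static bool Matches(TypeMetrics t) =>
        (t.Lcom > 0.8 || t.LcomHs > 0.95)
        && t.NbFields > 10
        && t.NbMethods > 10;
}
```

The parentheses around the two cohesion checks matter: without them, && binds tighter than ||, so a type with a high LCOM would match no matter how few fields or methods it had.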
LCOM stands for “Lack [of] Cohesion of Methods.” If you look up cohesion in the dictionary, you’ll find the second definition particularly suited for this conversation: “cohering or tending to cohere; well-integrated; unified.” We’d say that a type is “cohesive” if it is unified in purpose or, to borrow from the annals of clean code, if it conforms to the single responsibility principle. To get a little more concrete, consider an extremely cohesive class.
```csharp
using System.Collections.Generic;

// Minimal stand-in so the example compiles (not part of the original snippet).
public class Ingredient { }

public class Recipe
{
    private List<Ingredient> _ingredients = new List<Ingredient>();

    public void AddIngredient(Ingredient ingredient)
    {
        _ingredients.Add(ingredient);
    }

    public void StartOver()
    {
        _ingredients.Clear();
    }

    public IEnumerable<Ingredient> GetAllIngredients()
    {
        return _ingredients;
    }
}
```
This class is extremely cohesive. It has one field and three methods, and every method in the class operates on the field. Type cohesion might be described as “how close do you get to every method operating on every field?”
Now, here’s the crux of our challenge in defining a composite metric: how do you take that anecdotal, qualitative description, and put a number to it? How do you get from “wow, Recipe is pretty cohesive” to 0.0?
Well, this is where the mathematical transform part comes in. Here is how NDepend calculates Lack of Cohesion of Methods (LCOM).
- LCOM = 1 – (SUM(MF)/(M*F))
Where:
- M is the number of methods in the class (both static and instance methods are counted, and it also includes constructors, property getters/setters, and event add/remove methods).
- F is the number of instance fields in the class.
- MF is the number of methods of the class accessing a particular instance field.
- Sum(MF) is the sum of MF over all instance fields of the class.
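The formula translates almost directly into code. The sketch below is a hypothetical helper, not NDepend’s implementation: it assumes you already know, for each method, which instance fields it touches (NDepend extracts that from compiled code; here you supply it by hand as a dictionary from method name to field names). Throwing when M or F is zero is a choice made for this sketch, since the formula divides by zero there; it says nothing about how NDepend reports such types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Cohesion
{
    // LCOM = 1 - Sum(MF) / (M * F)
    // accesses: hypothetical input mapping each method name (M = accesses.Count)
    //           to the names of the instance fields that method touches.
    // fields:   the class's instance fields (F = fields.Count), including any
    //           field no method touches, since it still counts toward F.
    public static double Lcom(IReadOnlyDictionary<string, string[]> accesses,
                              IReadOnlyCollection<string> fields)
    {
        int m = accesses.Count;
        int f = fields.Count;
        if (m == 0 || f == 0)
            throw new ArgumentException("LCOM is undefined when M or F is zero.");

        // MF for one field = number of methods whose access list contains it;
        // Sum(MF) adds that up over every instance field.
        int sumMf = fields.Sum(field => accesses.Values.Count(used => used.Contains(field)));

        return 1.0 - (double)sumMf / (m * f);
    }
}
```

For Recipe, map each method to _ingredients and pass that single field: every method touches the only field, so Sum(MF) equals M*F and the result is 0, whether or not you also list the compiler-generated constructor that runs the field initializer. A hypothetical class with two methods, each touching a different one of two fields, gives 1 − 2/4 = 0.5.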
Quantifying the Qualitative
Whoa. Okay, let’s walk before we run. It’ll be helpful to work backward from an already proposed formula. What do we know by looking at this?
Well, at the very highest level, we’re talking about fields and methods in the class and how they interact. It’s easy enough to count methods and fields; so far, so good. MF is, for a given field, the number of methods in the class that access that field, which means that Sum(MF) is the aggregate over all fields. Since a field can be accessed by at most all M methods, each MF is at most M, and adding up F of them gives a Sum(MF) that is less than or equal to M*F. The two are only equal in the case where every method accesses every field.
Thus the term SUM(MF)/(M*F) will range from 0 to 1, which means that the value of this metric ranges from 1 to 0. 1 is thus a perfectly non-cohesive class and 0 is a perfectly cohesive class. Notice that I described Recipe as “0.0”? If you run this metric on that class, you’ll see that it scores 0 for a “perfect cohesion score.” And so, the goal here becomes obvious. The creator of this metric wanted to come up with a way to describe cohesion normalized between 0 and 1 with a concept of bounded, “perfect” endpoints.
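Plugging a few class shapes into the formula shows those endpoints at work. The Recipe row counts only the three methods shown (adding the compiler-generated constructor, which also touches the field, would make it 4/4 with the same result); the other two rows are hypothetical shapes chosen for illustration.

```latex
\begin{aligned}
\text{Recipe } (M = 3,\ F = 1,\ \textstyle\sum MF = 3)&: \quad \mathrm{LCOM} = 1 - \frac{3}{3 \cdot 1} = 0 \\
\text{one field per method } (M = F = n,\ \textstyle\sum MF = n)&: \quad \mathrm{LCOM} = 1 - \frac{n}{n \cdot n} = 1 - \frac{1}{n} \\
\text{no method touches a field } (\textstyle\sum MF = 0)&: \quad \mathrm{LCOM} = 1 - \frac{0}{M \cdot F} = 1
\end{aligned}
```

The middle row is worth a second look: a class in which every method touches its own private field is about as fragmented as it gets, yet its score only approaches 1 as n grows (0.9 at n = 10). A score of exactly 1 requires that no method touch any instance field at all.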
And this is the essence of compositeness, though when you build such a metric yourself, you design the transform up front rather than working backward to deduce the reasoning. You start out with a qualitative evaluation and then form a hypothesis about how you want to represent that data. Is there a minimum? A maximum? Should the curve between them be linear? Exponential? Logarithmic?
It’s not a trivial line of thinking, by any stretch. As it turns out, there isn’t even agreement, per se, on the best way to describe type cohesion. You’ll notice that NDepend supports a secondary cohesion metric (which I intentionally omitted from the formula definition above for simplicity) called LCOM HS, which stands for “Henderson-Sellers Lack of Cohesion of Methods.” This is a slightly different algorithm for computing cohesion. And man, if experts in the field can’t agree on the ideal metric, you can see that this is a tall order. But hey, I didn’t say it’d be easy. Just fun.
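For comparison, here is a sketch that computes both metrics side by side. The LCOM HS formula it uses, (M − Sum(MF)/F) / (M − 1), is the Henderson-Sellers form as I understand NDepend to document it; treat it as an assumption to check against NDepend’s metric definitions. The class and method names, including the OneFieldPerMethod generator for a hypothetical class shape, are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CohesionComparison
{
    // Sum(MF): for each instance field, count the methods that touch it, then add up.
    static int SumMf(IReadOnlyDictionary<string, string[]> accesses, IReadOnlyCollection<string> fields) =>
        fields.Sum(field => accesses.Values.Count(used => used.Contains(field)));

    // LCOM = 1 - Sum(MF) / (M * F); ranges from 0 (cohesive) to 1.
    public static double Lcom(IReadOnlyDictionary<string, string[]> accesses, IReadOnlyCollection<string> fields)
    {
        if (accesses.Count == 0 || fields.Count == 0)
            throw new ArgumentException("LCOM needs at least one method and one field.");
        return 1.0 - (double)SumMf(accesses, fields) / (accesses.Count * fields.Count);
    }

    // Assumed Henderson-Sellers form: LCOM HS = (M - Sum(MF) / F) / (M - 1).
    // The M - 1 denominator makes it undefined for single-method classes.
    public static double LcomHs(IReadOnlyDictionary<string, string[]> accesses, IReadOnlyCollection<string> fields)
    {
        int m = accesses.Count;
        if (m < 2 || fields.Count == 0)
            throw new ArgumentException("LCOM HS needs at least two methods and one field.");
        double averageMf = (double)SumMf(accesses, fields) / fields.Count;
        return (m - averageMf) / (m - 1);
    }

    // Hypothetical class shape: n methods and n fields, where method i touches only field i.
    public static (Dictionary<string, string[]> Accesses, string[] Fields) OneFieldPerMethod(int n)
    {
        string[] fields = Enumerable.Range(0, n).Select(i => $"_f{i}").ToArray();
        Dictionary<string, string[]> accesses =
            Enumerable.Range(0, n).ToDictionary(i => $"M{i}", i => new[] { fields[i] });
        return (accesses, fields);
    }
}
```

On the one-field-per-method shape, LCOM drifts toward 1 as n grows (0.9 at n = 10), while LCOM HS is exactly 1 for any n of at least 2: the average MF is 1, so numerator and denominator are both M − 1. For Recipe, both come out at 0. Same intuition, different numbers, which is exactly the disagreement described above.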
So, having seen a little more of NDepend and established a foundation for understanding some of the theory behind composite code metrics, I’ll leave off until next time, when I’ll dig into how we can start reasoning about our own composite metric for “time to understand a method.” Stay tuned, because that will get interesting: I’ll go through a bit more of NDepend and even get into Newtonian mechanics a little. But not too far. I promise.