NDepend

Improve your .NET code quality with NDepend

Let’s Build a Metric: Using CQLinq to Reason about Application State

I’ve been letting the experiments run for a bit before posting results so as to give all participants enough time to submit, if they so choose.  So, I’ll refresh everyone’s memory a bit here.  Last time, I published a study of how long it took, in seconds (self reported) for readers to comprehend a series of methods that varied by lines of code.  (Gist here).  The result was that comprehension appears to vary roughly quadratically with the number of logical lines of code.  The results of the next study are now ready, and they’re interesting!

Off the cuff, I fully expected cyclomatic complexity to drive up comprehension time faster than the number of lines of code.  It turns out, however, that this isn’t the case.  Here is a graph of the results of people’s time to comprehend code that varied only by cyclomatic complexity.  (Gist here).

SecondsVsCyclomaticComplexity

If you look at the shape of this graph, the increase is slightly more aggressive than linear, but not nearly as aggressive as the increase that comes with an increase in lines of code.  When you account for the fact that a control flow statement is also a line of code, it actually appears that conditionals are easier to comprehend than the mathematical statements from the first experiment.

Because of this finding, I’m going to ignore cyclomatic complexity for the time being in our rough cut time to comprehend metrics.  I’ll assume that control flow statements impact time to comprehend as lines of code more than as conditional branching scenarios.  Perhaps this makes sense, too, since understanding all of the branching of a method is probably an easier task than testing all paths through it.

As an aside, one of the things I love about NDepend is that it lets me be relatively scientific about the approach to code.  I constantly have questions about the character and makeup of code, and NDepend provides a great framework for getting answers quickly.  I’ve actually parlayed this into a nice component of my consulting work — doing professional assessments of code bases and looking for gaps that can be assessed.

Going back to our in-progress metric, it’s going to be important to start reasoning about other factors that pertain to methods.  Here are a couple of the original hypotheses from earlier in the series that we could explore next.

  • Understanding methods that refer to class fields take longer than purely functional methods.
  • Time to comprehend is dramatically increased by reference to global variables/state.

If I turn a critical eye to these predictions, there are two key components: scope and popularity.  By scope, I mean, “how closely to the method is this thing defined?”  Is it a local variable, defined right there in the method?  Is it a class field that I have to scroll up to find a definition of?  Is it defined in some other file somewhere (or even some other assembly)?  One would assume that having to pause reading the method, navigate to some other file, open it, and read to find the definition of a variable would mean a sharp spike in time to comprehend versus an integer declared on the first line of the method.

And, by popularity, I mean, how hard is it to reason about the state of the member in question?  If you have a class with a field and two methods that use it, it’s pretty easy to understand the relationship and what the field’s value is likely to be.  If we’re talking about a global variable, then it quickly becomes almost unknowable what the thing might be and when.  You have to suck the entirety of the application’s behavior into your head to understand all the things that might happen in your method.

I’m not going to boil that ocean here, but I am going to introduce a few lesser known bits of awesomeness that come along for the ride in CQLinq.  Take a look at the following CQLinq.

If your reaction is anything like mine the first time I encountered this, you’re probably thinking, “you can do THAT?!” Yep, you sure can. Here’s what it looks like against a specific method in my Chess TDD code base.

MethodFieldsAndParametersResults

The constructor highlighted above is shown here:

BoardConstructor

As you can see, it has one parameter, uses two fields, and assigns both of those fields.

When you simply browse through the out of the box metrics that come with NDepend, these are not the kind of things you notice immediately.  The things toward which most people gravitate are obvious metrics, like method size, cyclomatic complexity, and test coverage.  But, under the hood, in the world of CQLinq, there are so many more questions that you can answer about a code base.

Stay tuned for next time, as we start exploring them in more detail and looking at how we can validate potential hypotheses about impact on time to comprehend.

And if you want to take part in this on going experiment, click below to sign up.




Join the Experiment



dev manager in a meeting

Mistakes Dev Managers Make

Managing a team of software developers is a tall order. This is doubly true when the line management includes both org chart duties (career development, HR administrivia, etc) and responsibility for the team’s performance when it comes to shipping. In this case, you’re being asked to understand their day to day performance well enough to evaluate their performance and drive improvement, in spite of the fact that what they do is utterly opaque to you. It’s like being asked to simultaneously coach a team and referee the game for a sport whose rules you don’t know. As I said, a tall order.

I’ll grant that, if you’re a dev manager, you may have been technical at some point, perhaps even recently. Or maybe not, but you’ve been around it long enough to pick up a lot of concepts, at least in the abstract. But in neither case, if you were asked what, exactly, Alice coded up yesterday, would you be able to answer. Whether it’s due to total lack of experience, being “rusty” from not having programmed in a number of years, or simply being unable to keep up with what 8 other people are doing, their work is opaque to you.

As with coaching/refereeing the game that you don’t understand, you can pick up on their body language and gestures. If all members of the team appear disgusted with one of their mates, that one probably did something bad. You’re not totally without context clues and levers to pull, but it wouldn’t be hard at all for them to put one over on you, if they were so inclined. You’re navigating a pretty tough obstacle course.

And so it becomes pretty easy to make mistakes. It’s also pretty understandable, given the lay of the land. I’ll take you through a few of the more common ones that I tend to see, and offer some thoughts on what you can do instead. Continue reading Mistakes Dev Managers Make

Let’s Build a Metric: Incorporating Results and Exploring CQLinq

It turns out I was wrong in the last post, at least if the early returns from the second experiment are to be believed.  Luckily, the scientific method allows for wrongness and is even so kind as to provide a means for correcting it.  I hypothesized that time to comprehend would vary at a higher order with cyclomatic complexity than with lines of code.  This appears not to be the case.  Hey, that’s why we are running the experiments, right?

By the way, as always, you can join the experiment if you want.
You don’t need to have participated from the beginning by any stretch, and you can opt in or out for any given experiment as suits your schedule.

Join the Experiment

 

Results of the First Experiment

Recall that the first experiment asked people to record time to comprehend for a series of methods that varied by number of lines of code.  To keep the signal to noise ratio as high as possible, the methods were simply sequential arithmetic operations, operating on an input and eventually returning a transformed output.  There were no local variables or class level fields, no control flow statements, no method invocations, and no reaching into global state.  Here is a graph of the results from doing this on 3 methods, with 1, 5, and 10 logical lines of code.

LogicalLinesOfCodeComprehensionTime

So as not to overburden anyone with work, and because it’s still early, the experiment contained three methods, yielding three points.  Because this looked loosely quadratic, I used the three points to generate a quadratic formula, which turned out to be this.

GraphOfComprehensionVsLLOC

It’s far from perfect, but this gives us our first crack at shaping time to comprehend as something experimental, rather than purely hypothetical.  Let’s take a look at how to do this using NDepend in Visual Studio.  Recall all the way back in the second post in this series that I defined a metric for time to comprehend.  It was essentially a placeholder for the concept, pending experimental results.

All we’re doing is setting the unit we’ve defined, “Seconds,” equal to the number of lines of code in a method.  But hey, now that we’ve got some actual data, let’s go with it!  The code for this metric now looks like this.

I’ve spread on multiple lines for the sake of readability and with a nod to the notion that this will grow as time goes by. Also to note is that I’ve included, for now, the number of logical lines of code as a handy reference point.

Exploring CQLinq Functionality

This is all fine, but it’s a little hard to read.  As long as we’re here, let’s do a brief foray into NDepend’s functionality.  I’m talking specifically about CQLinq syntax.  If you’re going to get as much mileage as humanly possible out of this tool, you need to become familiar with CQLinq.  It’s what will let you define your own custom ways of looking at and reasoning about your code.

I’ve  made no secret that I prefer fluent/expression Linq syntax over the operator syntax, but there are times when the former isn’t your best bet.  This is one of those times, because I want to take advantage of the “let” keyword to define some things up front for readability.  Here’s the metric converted to the operator syntax.

With that in place, let’s get rid of the cumbersome repetition of “m.NbLinesOfCode” by using the let keyword. And, while we’re at it, let’s give NbLinesOfCode a different name. Here’s what that looks like in CQLinq.

That looks a lot more readable, huh? It’s now something at least resembling the equation pictured above. But there are a few more tweaks we can make here to really clean this thing up, and they just so happen to demonstrate slightly more advanced CQLinq functionality. We’ll use the let keyword to define a function instead of a simple assignment, and then we’ll expand the names out a bit to boot. Here’s the result.

Pretty darned readable, if I do say so myself! It’s particularly nice the way seconds is now expressed — as a function of our LengthFactor equation. As we incorporate more results, this approach will allow this thing to scale better with readability, as you’ll be able to see how each consideration contributes to the seconds.

So, what does it look like? Check it out.

Updated Seconds Metric with CQLinq

Now we can examine the code base and get a nice readout of our (extremely rudimentary) calculation of how long a given method will take to understand.  And you know what else is cool?  The data points of 8.6 seconds for the 1 LLOC method and 51 for the 5 LLOC method.  Those are cool because those were the experimental averages, and seeing them in the IDE means that I did the math right. 🙂

So we finally have some experimental progress and there’s some good learning about CQLinq here.  Stay tuned for next time!

 

 

 

Refactoring is a Development Technique, Not a Project

One of the more puzzling misconceptions that I hear pertains to the topic of refactoring. I consult on a lot of legacy rescue efforts, and refactoring, and people in and around those efforts tend to think of “refactor” as “massive cleanup effort.”  I suspect this is one of those conflations that happens subconsciously.  If you actually asked some of these folks whether “refactor” and “massive cleanup effort” were synonyms, they would say no, but they never conceive of the terms in any other way during their day to day activities.

Let’s be clear.  Here is the actual definition of refactoring, per wikipedia.

 

Code refactoring is the process of restructuring existing computer code – changing the factoring – without changing its external behavior.

 

Significantly, this definition mentions nothing about the scope of the effort.  Refactoring is changing the code without changing the application’s behavior.  This means the following would be examples of refactoring, provided they changed nothing about the way the system interacted with external forces.

  • Renaming variables in a single method.
  • Adding whitespace to a class for readability.
  • Eliminating dead code.
  • Deleting code that has been commented out.
  • Breaking a large method apart into a few smaller ones.

I deliberately picked the examples above because they should be semi-understandable, even by non technical folks, and because they’re all scalable down to the tiny.  Some of these activities could be done by a developer in under a minute.  These are simple, low-effort refactorings.

Let’s now consider another definition of refactoring that can be found at Martin Fowler’s website.


“Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which “too small to be worth doing”. However the cumulative effect of each of these transformations is quite significant.”

 

I took the wikipedia definition and used it to suggest that refactorings could be small and low-effort.  Fowler takes it a step further and suggests that they should be small and low effort.  In fact, he suggests that they should be “too small to be worth doing.”  That’s fascinating. Continue reading Refactoring is a Development Technique, Not a Project

NDepend Case Study: Increasing Development Efficiency in the Medical Laboratory Sector

Developing applications for use in the health care industry is stressful because the margin of error is almost non-existent. Whether your tool is for treatment, research, or analysis, it needs to be dependable and accurate. The more complex the application is, the higher the chance for errors and delays in development. Dependable companies abide by rigorous methodologies to develop their code before deploying it to clients. In this NDepend case study, we learn why a company in this sector chose NDepend, and why it became an integral part of their development process.

Stago works in the medical lab industry, producing lab analysis tools that focus on haemostasis and coagulation. Working hard for over 60 years and valuing long term investments, they have created a name for themselves in the industry. A few years ago, they wanted to make their software development process more efficient. In addition, they wanted to easily enforce their own best practices and code quality standards across their teams. The goal was to be able to catch issues earlier in the development cycle to cut costs and time spent on quality assurance post-development.

“We selected NDepend after reviewing all the other options on the market and it quickly became the backbone of our development effort.”
– Fabien Prestavoine, Software Architect at Stago

We are very grateful for Stago for sharing their success story with us. Stories such as these is one of the main driving forces behind creating one of the most comprehensive and powerful .NET analysis tool on the market. Since implementing NDepend, Stago has:

  • Easily met all delivery deadlines
  • Cut both cost and time spent on quality assurance
  • Delivered a consistently dependable product
  • Improved communication between their developers and architects

To read more about how NDepend helped Stago streamline their development process, click here to download a PDF of the complete case study.

Or check out this slideshow:

Let’s Build a Metric 7: Counting the Inputs

Over the last two Let’s Build a Metric installments of this series, I’ve talked about different ways to count lines of code and about ways to count different paths through your code. So far, I’ve offered up the hypotheses that more statements/lines in a method means more time to comprehend, and that more paths through the code mean more time to comprehend. I’ll further offer the hypothesis that comprehension time varies more strongly with complexity than it does with lines of code.

I do have results in for the first hypothesis, but will hold off for one more installment before posting those. Everyone on the mailing list will soon receive the second experiment, around complexity, so I’ll post the results there in an installment or two, when I circle back to modifying the composite metric. If you haven’t yet signed up for the experiment, please do so here.

Join the Experiment

More Parameters Means Harder to Read?

In this post, I’d like to address another consideration that I hypothesize will directly correlate with time to comprehend a method: parameter count. Now, unlike these last two posts, parameter count offers no hidden surprises. Unlike lines of code, I don’t know of several different ways that one might approach counting method parameters, and unlike cyclomatic complexity, there’s no slick term for this that involves exponential growth vectors. This is just a matter of tabulating the number of arguments to your methods.

Instead of offering some cool new fact for geek water-cooler trivia, I’ll offer a relatively strong opinion about method parameters. Don’t have a lot of them. In fact, don’t have more than 3, and even 3 is pushing it. Do I have your attention? Good. Continue reading Let’s Build a Metric 7: Counting the Inputs

Software Rewrite: The Chase

Last week, a post I wrote, “The Myth of the Software Rewrite“, became pretty popular.  This generated a lot of comments and discussion, so I decided just to write a follow-up post to address the discussion, as opposed to typing a blog post’s worth of thoughts, distributed over 20 or 30 comments.  This is that post.

No Misconceptions

First of all, I want to be clear about what I’m talking about.  I’m specifically talking  about a situation where the prime, determining factor in whether or not to rewrite the software is that the development group has made a mess and is clamoring to rewrite it.  In essence, they’re declaring bankruptcy — “we’re in over our heads and need outside assistance to wipe the slate clean so we can have a fresh start.”  They’re telling the business and their stakeholders that the only path to joy is letting them start over.

Here are some situations that the article was not meant to address:

  • The business decides it wants a rewrite (which makes me skeptical, but I’m not addressing business decisions).
  • Piecemeal rewrite, a chunk at a time (because this is, in fact, what I would advocate).
  • A rewrite because the original made design assumptions that have become completely obsolete (e.g. designed around disk space being extremely expensive).
  • Rewriting the software to significantly expand or alter the offering (e.g. “we need to move from web to mobile devices and offer some new features, so let’s start fresh.”)

A Lesson From Joseph Heller

Joseph Heller is the author of one of my all time favorite works of fiction, Catch 22.  Even if you’ve never read this book, you’re probably familiar with the term from conversational reference.  A catch 22 is a paradoxical, no-win situation.  Consider an example from the book.

John Yossarian, the ‘protagonist,’ is an anti-heroic bombardier in World War II.  Among other character foibles, one is an intense desire not to participate in the war by flying missions.  He’d prefer to stay on the ground, where it’s safe.  To advance this interest, he attempts to convince an army doctor that he’s insane and thus not able to fly missions.  The army doctor responds with the eponymous catch 22:  “anyone who wants to get out of combat duty isn’t really crazy.”

If you take this to its logical conclusion, the only way that Yossarian could be too crazy to fly missions is if he actually wanted to fly missions.  And if he wanted to fly them, he wouldn’t be noteworthy and he wouldn’t be trying to get out of flying them in the first place.

I mention this vis a vis software rewrites for a simple reason.  The only team I would trust with a rewrite is a team that didn’t view rewriting the software as necessary or even preferable.

It’s the people who know how to manufacture small wins and who can inch back incrementally from the brink that I trust to start a codebase clean and keep it clean. People who view a periodic bankruptcy as “just the way it goes” are the people who are going to lead you to bankruptcy. Continue reading Software Rewrite: The Chase

Relax. Everyone’s Code Rots

I earn my living, or part of it, anyway, doing something very awkward.  I get called in to assess and analyze codebases for health and maintainability.  As you can no doubt imagine, this tends not to make me particularly popular with the folks who have constructed and who maintain this code.  “Who is this guy, and what does he know, anyway?” is a question that they ask, particularly when confronted with the parts of the assessment that paint the code in a less than flattering light.  And, frankly, they’re right to ask it.

But in reality, it’s not so much about who I am and what I know as it is about properties of code bases.  Are code elements, like types and methods, larger and more complex than those of the average code base?  Is there a high degree of coupling and a low degree of cohesion?  Are the portions of the code with the highest fan-in exercised by automated tests or are they extremely risky to change?  Are there volatile parts of the code base that are touched with every commit?  And, for all of these considerations and more, are they trending better or worse?

It’s this last question that is, perhaps, most important.  And it helps me answer the question, “who are you and what do you know?”  I’m a guy who has run these types of analyses on a lot of codebases and I can see how yours stacks up and where it’s going.  And where it’s going isn’t awesome — it’s rotting.

But I’ll come back to that point later. Continue reading Relax. Everyone’s Code Rots

Let’s Build a Metric 6: Cyclomatic Complexity Explained

In the last installment of this series, I talked a good bit about lines of code. As it turns out, the question, “what is a line of code?” is actually more complex than it first appears.  Of the three different ways I mentioned to regard a line of code, I settled on “logical lines of code” as the one to use as part of assessing time to comprehend.

As promised, I sent code to use as part of the experiment, and got some responses. So, thanks to everyone who participated. If you’d like to sign up for the experiment, but have yet to do so, please feel free to click below.

Join the Experiment

Here is the code that went out for consideration. I’m not posting the results yet so that people can still submit without any anchoring effect and also because I’m not, in this installment, going to be updating the composite metric just yet.

The reason that I’m discussing this code is to show how simple it was. I mean, really, look at this code and think of all that’s missing.

  • There are no control flow statements.
  • There are no field accesses.
  • There is no interaction with collaborators.
  • There is no interaction with global state.
  • There is no internal scoping of any kind.

These are purely functional methods that take an integer as input, do things to it using local declarations, and then return it as output.  And via this approach, we’ve fired the first tracer bullet at isolating logical lines of code in a method.  So let’s set that aside for now and fire another one at an orthogonal concern.

Before, I talked about the meaning of a line of code.  Now I’d like to talk about the meaning of complexity in your methods.  Specifically here, I’m referring to what’s called “cyclomatic complexity.”  Cyclomatic complexity is a measure of the number of path’s through a piece of source code.  Let’s see a few examples to get the hang of it.

Consider the following method from the Pawn class in the Chess TDD codebase.

This method has a cyclomatic complexity of 1 because there is only one path through the code. Contrast that with the following method from the Board class.

The cyclomatic complexity of this method is 2 because there are two paths through it.

  1. The if condition evaluates to true and the method throws an exception.
  2. The if condition evaluates to false and the method finishes executing.

Be mindful that “if” isn’t the only way to create multiple paths through the code.  For instance, this method also has a cyclomatic complexity of 2 because of the ternary operator creating two different execution paths.

Cyclomatic complexity can increase quite rapidly, particularly when nested conditionals enter the equation. This method has a cyclomatic complexity of 4, and you can see it already is starting to get hard to figure out exactly why.

Imagine what it starts to look like as methods have things like large degrees of nesting, switch statements, and conditional after conditional. The cyclomatic complexity can soar to the point where it’s unlikely that every path through the code has even ever been executed, let alone tested.

So it stands to reason that something pretty simple to articulate, like complexity, can have a nuanced effect on the time to comprehend a code base. In the upcoming installment of our experiments, I’d like to focus on cyclomatic complexity and its effect on method time to comprehend.

But I’ll close out this post by offering up a video showing you one of the ways that NDepend allows you to browse around your code by cyclomatic complexity.

 

 Let’s Build a Metric 7: Counting the Inputs >

< Let’s Build a Metric 5: Flavors of Lines of Code

Hidden Costs in Your Software

One of the things I remember most vividly from my CIO days was the RFP process for handling spikes in demands on my group’s time.  In case you haven’t participated in this on either side, the dance involves writing up a brief project description, sending it over to a handful of software consulting firms, and getting back tentative project proposals.  I’d pick one, and work would begin.

There were a lot more documents and legalese involved, but the gist, writ small, might be something like, “we need an application that will run in our data center, take information about customers out of our proprietary, and put it back into our CRM as notes, depending on a series of business rules.”  The response, in proposal form, would essentially be, “we’ll do that, and we think it’ll cost you about $30,000.”

This is what most people think of as the cost of a software project.  Perhaps it’s not a little, 5-figure line of business utility.  Perhaps it’s a $300,000 web or mobile application.  Perhaps it’s even a $30,000,000 enterprise workflow overhaul.  Whatever the case may be, there’s a tendency to think of software in terms of the cost of the labor necessary to write it.  The consulting firms would always base their proposals and estimates on a “blended rate” of the hourly cost of the labor.  Firms with in-house development staffs tend to reason about the cost of software projects as a function of the salaries of the folks tasked with the work.

Of course, if you’re a veteran at managing software projects and teams, you realize that there’s more to the total spend than just the cost of the labor.  No doubt you factor in the up-front and licensing cost of the tools you use to develop the software as well as the cost of the hardware on which it will eventually run.  You probably even factor in the cost of training users and operations folks, and paying maintenance programmers to keep the lights on.

But there are other, more subtle, hidden costs that I’d like to discuss — costs related to your approach to software development.  These are variable costs that depend on the nature of the code that your team is writing. Continue reading Hidden Costs in Your Software