When I’m called in to do a strategic assessment of a codebase, it’s never the result of everything being awesome. That is, no one calls me up and says, “we’re ahead of schedule, under budget, and knocking it out of the park, so can you come in and tell us what you think of our code?” Rather, I get calls when something isn’t going according to plan and the business people involved want to get some insight into what underlying causes there are in the code and in the team’s approach.
When the business gets involved this way, there is invariably a fiscal operational concern, either overtly or lurking just beneath the surface. I’ll roll this up to the general consideration of “total cost of ownership” for the codebase. The business is thus asking, “why are things proving to be more expensive than we thought?”
Typically, I come in and size up the situation, quantify it objectively, and then use analogies and examples to make clear what’s happening. After I do this, pretty much without exception, the decision-makers to whom I’m speaking want to know what small things they can do, internally, to course correct. This makes sense when you think about it. If your doctor told you that your health outlook wasn’t great, you’d cross your fingers and say, “but I can fix it by changing my diet and exercise a little, right?” You wouldn’t throw yourself on the table and say, “cut me open and make sure whatever you do is expensive!”
I am thus frequently asked, by both developers and by management, “what are the little things we can do to improve and maintain code quality?” As such, this seems like excellent fodder for a blog post. Here are my tips, based on years of observation of what correlates with healthy codebases and what correlates with distressed ones.
One of the things that has surprised me over the years is how infrequently people take advantage of custom code metrics. I say this not from the perspective of a geek with an esoteric interest in a subject, wishing other people would share my interest. Rather, I say this from the perspective of a businessman, making money, and wondering why I seem to have little competition.
As I’ve mentioned before, a segment of my consulting practice involves strategic code assessments that serve organizations in a number of ways. When I do this, the absolute most important differentiator is my ability to tailor metrics to the client and specific codebases on the fly. Anyone can walk in, install a tool, and say, “yep, your cyclomatic complexity in this class is too high, as evidenced by this tool I installed saying ‘your cyclomatic complexity in this class is too high.'” Not just anyone can come in and identify client-specific idiosyncrasies and back those findings with tangible data.
But, if they would invest some up-front learning time in how to create custom code metrics, they’d be a lot closer.
Being able to customize code metrics allows you to reason about code quality in very dynamic and targeted terms, and that is valuable. But you might think that, unless you want a career in codebase assessment, that value doesn’t apply to you. Let me assure you that it does, albeit not in quite as direct a way as it applies to me.
Custom code metrics can help make your team better and they can do so in a variety of ways. Let’s take a look at a few.
Code review is a subject with which I’m quite familiar. I’m familiar first as a participant, both reviewing and being reviewed, but it goes deeper than that. As an IT management consultant, I’ve advised on instituting and refining such processes and I actually write for SmartBear, whose products include Collaborator, a code review tool. In spite of this, I’ve never written much about the intersection between NDepend and code review. I’d like to do so today.
I suppose it’s the nature of my own work that has made this topic less than foremost on my mind. Over the last couple of years, I’ve done a lot of lone wolf, consultative code assessments for clients. In essence, I take a codebase and its version history and use NDepend and other tools to perform an extensive analysis. I also quietly apply some of the same practices to my own code that I use for example purposes. But neither of these is collaborative, and it’s been a while since I logged a lot of time in a collaborative delivery team environment.
But my situation being somewhat out of sync with industry norms does not, in any way, alter industry norms. And the norm is that software development is generally a highly collaborative affair, and that most code review is happening in highly collaborative environments. And NDepend is not just a way for lone wolves or pedants to do deep dives on code. It really shines in the group setting.
NDepend Can Automate the Easy Stuff out of Code Review
When discussing code review, I’m often tempted to leave “automate what you can” for the end, since it’s a powerful point. But, on the other hand, I also think it’s perhaps the first thing that you should go and do right out of the gate, so I’ll mention it here. After all, automating the easily-automated frees humans up to focus on things that require human intervention.
It’s pretty likely that you have some kind of automation in process for enforcing coding standards. And, if you don’t, get some in place. You should not be wasting time at code review with, “you didn’t put an underscore in front of that field.” That’s the sort of thing that a machine can easily figure out, and that many, many plugins will figure out for you.
The advantages are many, but two quick ones bear mentioning here. First is the time savings that I’ve discussed, and second is the tightening of the feedback loop. If a developer writes a line of code, forgetting that underscore, the code review may not happen for a week or more. If there’s a tool in place creating warnings, preventing a commit, or generating a failed build, the feedback loop is much tighter between undesirable code and undesirable outcome. This makes improvement more rapid, and it makes the source of the feedback an impartial machine instead of a (perceived) judgmental coworker.
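To make the underscore example concrete, here is a minimal sketch of what such an automated check could look like. It is not how NDepend or any particular plugin implements the rule; it is a plain reflection-based check, and the names `FieldNamingCheck`, `FindViolations`, and `Customer` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class FieldNamingCheck
{
    // Returns the names of private instance fields declared on the given type
    // that do not start with an underscore. Compiler-generated fields, such as
    // auto-property backing fields, are skipped.
    public static IReadOnlyList<string> FindViolations(Type type)
    {
        return type
            .GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly)
            .Where(f => f.IsPrivate)
            .Where(f => !f.IsDefined(typeof(CompilerGeneratedAttribute), false))
            .Select(f => f.Name)
            .Where(name => !name.StartsWith("_", StringComparison.Ordinal))
            .ToList();
    }
}

// Hypothetical class used only to exercise the check.
public class Customer
{
    private int _id;                  // follows the convention
    private string name;              // violates the convention
    public string City { get; set; }  // backing field is compiler-generated
}
```

Calling `FieldNamingCheck.FindViolations(typeof(Customer))` reports the single field `name`. Wired into a unit test or a build step, a check along these lines would fail within minutes of the offending commit instead of surfacing a week later in review.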
I can still remember my reaction to LINQ when I was first exposed to it. And I mean my very first reaction. You’d think, as a connoisseur of the programming profession, it would have been, “wow, groundbreaking!” But, really, it was, “wait, what? Why?!” I couldn’t fathom why we’d want to merge SQL queries with application languages.
Up until that point, a little after .NET 3.5 shipped, I’d done most of my programming in PHP, C++ and Java (and, if I’m being totally honest, a good bit of VB6 and VBA that I could never seem to escape). I was new to C#, and, at that time, it didn’t seem much different than Java. And, in all of these languages, there was a nice, established pattern. Application languages were where you wrote loops and business logic and such, and parameterized SQL strings were where you defined how you’d query the database. I’d just gotten to the point where ORMs were second nature. And now, here was something weird.
But, I would quickly realize, here was something powerful.
The object-oriented languages that I mentioned (and whatever PHP is) are imperative languages. This means that you’re giving the compiler/interpreter a step-by-step series of instructions on how to do something. “For an integer i, start at zero, increment by one, continue if less than 10, and for each integer…” SQL, on the other hand, is a declarative language. You describe what you want, and let something else (e.g. the RDBMS server) sort out the details. “I want all of the customer records where the customer’s city is ‘Chicago’ and the customer is less than 40 years old. You figure out how to do that and just give me the results.”
And now, all of a sudden, an object-oriented language could be declarative. I didn’t have to write loop boilerplate anymore!
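The contrast is easiest to see side by side. Below is a small sketch of the “customers in Chicago under 40” request written both ways, over an in-memory collection. The `Customer` class and the method names are hypothetical, made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type for illustration.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public int Age { get; set; }
}

public static class CustomerQueries
{
    // Imperative: spell out how to walk the collection and build the result.
    public static List<Customer> YoungChicagoansLoop(IEnumerable<Customer> customers)
    {
        var result = new List<Customer>();
        foreach (var c in customers)
        {
            if (c.City == "Chicago" && c.Age < 40)
            {
                result.Add(c);
            }
        }
        return result;
    }

    // Declarative: describe what is wanted and let LINQ handle the iteration.
    public static List<Customer> YoungChicagoansLinq(IEnumerable<Customer> customers)
    {
        return (from c in customers
                where c.City == "Chicago" && c.Age < 40
                select c).ToList();
    }
}
```

Both methods return the same customers in the same order. The second one simply reads much closer to the SQL sentence quoted above.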
Here’s a campfire horror story of legacy code that probably sounds at least somewhat familiar.
One day, your manager strolls by casually, sipping a cup of coffee, and drops a grenade in your lap. “Do you think we can add an extra field to the customer information form?” Sure, it may sound innocuous to an outsider, but you know better.
The customer information form is supported by something written almost a decade ago, by a developer long departed. Getting that data out of the database and onto the form prominently features a 60,000-line class called DataRepositoryManagerHelper and it also makes use of a gigantic XML file with odd spacing and no schema. Trying to add a field to that form casts you as Odysseus, navigating between Scylla and Charybdis. In fact, you’re pretty sure that the author of the legacy code made it necessary for the assigned developer to cut off and sacrifice a finger to get it working.
Aware of all of this, you look at your manager with a mix of incredulity and horror, telling her that you’ll need at least 6 weeks to do this. Already swirling around your mind is the dilemma between refactoring strategically where you can and running exhaustive manual testing for every character of the source code and XML that you change. It’s now her turn to look incredulous and she says, “I’m just asking for a new field on one form.” You’ve told her before about this, and she’s clearly forgotten. You’re frustrated, but can you really blame her? After all, it does sound a little crazy.
If you have a sadistic streak and manage a team of software developers, it’s probably high entertainment to dredge up some old, dusty piece of software and then to task them with maintaining it. If, on the other hand, you’re a normal human being and you’re asking this because it’s necessary for your business, you brace yourself. After all, this is legacy software, and the reaction of the team is likely to be quite predictable.
Alright, let’s take a look at this thing. Oh, man, look at that right there. A global variable. And — oh my god — there are dozens of these things. Who even wrote this? And, look at this over here. That’s the kind of idiotic, backward code that we used to have to write 20 years and 6 language versions ago when this code was current. But even when it was current, this code was horrible. This was obviously written by a trained ape.
When you’re a developer, the only thing worse and more contemptible than the uninformed code you wrote years ago is the code that someone else wrote years ago. Not only is it alien to you in makeup and reasoning, this legacy code also features patterns that have gone out of date or even been forgotten.
But lest you, as a manager, assume that this is simply a matter of developers being prima donnas, consider that an encounter with legacy code bothers developers precisely because it renders them less effective. They’re professionals, wanting to do good work, and the lead balloon you’ve dropped in their lap is an impediment to that.
Not too long ago, someone asked me for a comparison of ReSharper (commonly and affectionately abbreviated R#) and NDepend. I didn’t really grok the question, so I asked, “in what sense?” The response was, “well, let’s say NDepend vs ReSharper — which makes more sense for a given person?” Bemused, my slightly snarky quip in response was, “doctor vs dentist — which makes more sense for a given person?”
I went on to clarify the analogy. Doctors and dentists both provide healthcare services, so, in this sense, one could theoretically view them as competitors. But practically speaking, that competition is going to be rare or nonexistent. There is an intersection between what the tools offer, as would be the case if a dentist noticed a throat infection or a doctor needed to peer into your mouth. And yet that intersection is small because the two products, like doctors and dentists, have fundamentally different charters.
We firmly believe spaghetti belongs on the dinner table and not in code. Our mission when starting NDepend was to create a tool to make best coding practices easier to maintain and improve. Writing has always been part of our message (see Patrick Smacchia’s work on CodeBetter.com) and we are proud to present our favorite pieces of writing from around the web in the last year, collected in what we are calling the Better Code Book.
We wanted to focus not only on how developers and architects use NDepend to improve their code, but also on how to use static analysis in a broader, management sense. We are extremely grateful to our contributors in this project. Let us introduce them:
Bjørn Einar Bjartnes is a developer at the Norwegian Broadcasting Corporation. He currently works as a backend developer on the API team, serving metadata about programs and video streams to web, mobile, TV clients, and more. He holds an MSc in Engineering Cybernetics and has a background in the petroleum industry, which has probably shaped his view on systems design. Bjørn is also active in the local F# Meetup and a proud member of the lambda club, playing with all things useless related to computers. You can follow him on Twitter: @bjartnes
Jack Robinson is a twenty-something student in his final year of a degree in Software Engineering at Victoria University of Wellington. Currently an Intern Developer at Xero, he enjoys writing clean code, playing a board game or two with his friends, or just sitting down and watching a good film. You can read not just his musings on computer science, but also reviews on films and more at his website jackrobinson.co.nz
Prasad Narravula is a programmer, architect, consultant, and problem-solving leader. He helps teams with agile development essentials: feedback loops to fail fast, enabling (engineering) practices, iterative and incremental design, starting at the right place, discovery, and learning. When time permits, he writes at ObjectCraftworks.com.
Erik Dietrich, founder of DaedTech LLC, is a programmer, architect, development coach, writer, Pluralsight author, and technologist. You can read his writing and find out more about him at http://www.daedtech.com/ and you can follow him on Twitter @daedtech.
Anthony Sciamanna is a software developer from Philadelphia, PA who has worked in the industry for nearly 20 years. He specializes in leading and coaching development teams, improving development practices for cross-functional teams, Test-Driven Development (TDD), unit testing, pair programming, and other Agile / eXtreme Programming (XP) practices. He can be contacted via his website: anthonysciamanna.com
Tomasz Jaskula is a software craftsman, founder and organizer of Paris user groups for F# and Domain Driven Design. He focuses on creating software delivering true business value which aligns with the business’s strategic initiatives and bears solutions with clearly identifiable competitive advantage. He is currently working for a big French bank building reactive applications in F# and C#. In his free time, he runs a startup project on applying machine learning with F# to the recruitment field, speaks at conferences and user groups, and writes blogs and articles for a French magazine for coders called “Programmez !” You can visit his site jaskula.fr
When it comes to pets, there’s a heartbreaking lie that parents often tell little children when they believe that those children are not yet ready to wrap their heads around the concept of death. “Rex went to a nice farm in the countryside where he can run and play with all of the other animals all day!” In this fantasy, Rex the dog isn’t dead — he lives on in perpetuity.
Memoirs of a Dead Method
In the source code of an application, you can witness a similar lie, but in the other direction. Code lives on indefinitely, actively participating in the fate of an application, and yet we call it “dead.” I know this because I’ve lived it. Let me explain.
You see, I’m a method in a codebase — probably one that would be familiar to you. My name is GetCustomerById(int id) and I hail from a class called CustomerDaoMySqlImpl that implements the interface ICustomerDao.
I was born into this world during a time of both promise and tumult — a time when the application architects were not sure whether the application would be using SQL Server or MySQL. To hedge their bets, they mandated data access interfaces and had developers do a bit of prototyping with both tools. And so I came into this world, my destiny taking a single integer and using MySQL to turn that integer into a customer.
In the last installment of this series, I talked a good bit about lines of code. As it turns out, the question, “what is a line of code?” is actually more complex than it first appears. Of the three different ways I mentioned to regard a line of code, I settled on “logical lines of code” as the one to use as part of assessing time to comprehend.
As promised, I sent code to use as part of the experiment, and got some responses. So, thanks to everyone who participated. If you’d like to sign up for the experiment but have yet to do so, you still can.
Here is the code that went out for consideration. I’m not posting the results yet so that people can still submit without any anchoring effect and also because I’m not, in this installment, going to be updating the composite metric just yet.
The reason that I’m discussing this code is to show how simple it was. I mean, really, look at this code and think of all that’s missing.
There are no control flow statements.
There are no field accesses.
There is no interaction with collaborators.
There is no interaction with global state.
There is no internal scoping of any kind.
These are purely functional methods that take an integer as input, do things to it using local declarations, and then return it as output. And via this approach, we’ve fired the first tracer bullet at isolating logical lines of code in a method. So let’s set that aside for now and fire another one at an orthogonal concern.
Before, I talked about the meaning of a line of code. Now I’d like to talk about the meaning of complexity in your methods. Specifically here, I’m referring to what’s called “cyclomatic complexity.” Cyclomatic complexity is a measure of the number of independent paths through a piece of source code. Let’s see a few examples to get the hang of it.
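The original listings are not included in this text, so what follows is a minimal stand-in rather than the original example. The class and method names are hypothetical. It is a guard method whose only decision point is a single if statement that throws.

```csharp
using System;

public static class OrderValidation
{
    // One decision point (the if), so two paths through the method.
    public static void EnsureNotNegative(int quantity)
    {
        if (quantity < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(quantity), "Quantity cannot be negative.");
        }
    }
}
```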
The cyclomatic complexity of this method is 2 because there are two paths through it.
The if condition evaluates to true and the method throws an exception.
The if condition evaluates to false and the method finishes executing.
Be mindful that “if” isn’t the only way to create multiple paths through the code. For instance, this method also has a cyclomatic complexity of 2 because of the ternary operator creating two different execution paths.
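As a hypothetical illustration of that shape (again a stand-in with made-up names, not the original listing), consider a one-line method built around a ternary:

```csharp
public static class Pricing
{
    // No if statement, but the ternary still splits execution into two paths.
    public static int ClampToZero(int value)
    {
        return value < 0 ? 0 : value;
    }
}
```

One path returns 0 for negative input and the other returns the value unchanged, which is where the complexity of 2 comes from.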
Cyclomatic complexity can increase quite rapidly, particularly when nested conditionals enter the equation. This method has a cyclomatic complexity of 4, and you can see it already is starting to get hard to figure out exactly why.
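A hypothetical method with that score might look like the sketch below. The names are invented for illustration and this is not the original listing. A common way to arrive at the number is to count the decision points and add one: three if statements here, one of them nested, give a cyclomatic complexity of 4.

```csharp
public static class Shipping
{
    public static string Describe(int weightKg, bool express)
    {
        if (weightKg <= 0)          // decision point 1
        {
            return "invalid";
        }

        if (express)                // decision point 2
        {
            if (weightKg > 20)      // decision point 3, nested
            {
                return "express-freight";
            }
            return "express";
        }

        return "standard";
    }
}
```

Each of the four return statements corresponds to one distinct path, so fully exercising this small method already takes four test cases.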
Imagine what it starts to look like as methods have things like large degrees of nesting, switch statements, and conditional after conditional. The cyclomatic complexity can soar to the point where it’s unlikely that every path through the code has ever been executed, let alone tested.
So it stands to reason that something pretty simple to articulate, like complexity, can have a nuanced effect on the time to comprehend a codebase. In the upcoming installment of our experiments, I’d like to focus on cyclomatic complexity and its effect on method time to comprehend.
But I’ll close out this post by offering up a video showing you one of the ways that NDepend allows you to browse around your code by cyclomatic complexity.