NDepend

Improve your .NET code quality with NDepend

Let’s Build a Metric: Using CQLinq to Reason about Application State

I’ve been letting the experiments run for a bit before posting results, so as to give all participants enough time to submit if they so choose. So I’ll refresh everyone’s memory a bit here. Last time, I published a study of how long it took, in self-reported seconds, for readers to comprehend a series of methods that varied by lines of code. (Gist here). The result was that comprehension time appears to vary roughly quadratically with the number of logical lines of code. The results of the next study are now ready, and they’re interesting!

Off the cuff, I fully expected cyclomatic complexity to drive up comprehension time faster than the number of lines of code.  It turns out, however, that this isn’t the case.  Here is a graph of the results of people’s time to comprehend code that varied only by cyclomatic complexity.  (Gist here).

SecondsVsCyclomaticComplexity

If you look at the shape of this graph, the increase is slightly more aggressive than linear, but not nearly as aggressive as the increase that comes with an increase in lines of code.  When you account for the fact that a control flow statement is also a line of code, it actually appears that conditionals are easier to comprehend than the mathematical statements from the first experiment.

Because of this finding, I’m going to ignore cyclomatic complexity for the time being in our rough-cut time-to-comprehend metric. I’ll assume that control flow statements affect time to comprehend as lines of code more than as conditional branching scenarios. Perhaps this makes sense, too, since understanding all of the branching of a method is probably an easier task than testing all paths through it.

As an aside, one of the things I love about NDepend is that it lets me be relatively scientific about the approach to code.  I constantly have questions about the character and makeup of code, and NDepend provides a great framework for getting answers quickly.  I’ve actually parlayed this into a nice component of my consulting work — doing professional assessments of code bases and looking for gaps that can be assessed.

Going back to our in-progress metric, it’s going to be important to start reasoning about other factors that pertain to methods.  Here are a couple of the original hypotheses from earlier in the series that we could explore next.

  • Understanding methods that refer to class fields takes longer than understanding purely functional methods.
  • Time to comprehend is dramatically increased by reference to global variables/state.

If I turn a critical eye to these predictions, there are two key components: scope and popularity.  By scope, I mean, “how closely to the method is this thing defined?”  Is it a local variable, defined right there in the method?  Is it a class field that I have to scroll up to find a definition of?  Is it defined in some other file somewhere (or even some other assembly)?  One would assume that having to pause reading the method, navigate to some other file, open it, and read to find the definition of a variable would mean a sharp spike in time to comprehend versus an integer declared on the first line of the method.

And, by popularity, I mean, how hard is it to reason about the state of the member in question?  If you have a class with a field and two methods that use it, it’s pretty easy to understand the relationship and what the field’s value is likely to be.  If we’re talking about a global variable, then it quickly becomes almost unknowable what the thing might be and when.  You have to suck the entirety of the application’s behavior into your head to understand all the things that might happen in your method.

I’m not going to boil that ocean here, but I am going to introduce a few lesser known bits of awesomeness that come along for the ride in CQLinq.  Take a look at the following CQLinq.

If your reaction is anything like mine the first time I encountered this, you’re probably thinking, “you can do THAT?!” Yep, you sure can. Here’s what it looks like against a specific method in my Chess TDD code base.

MethodFieldsAndParametersResults

The constructor highlighted above is shown here:

BoardConstructor

As you can see, it has one parameter, uses two fields, and assigns both of those fields.
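For readers without NDepend at hand, the kind of question this query answers can be sketched in a few lines. What follows is only a rough analogue in Python using the standard ast module; it is not CQLinq and not NDepend's API. The Board class in the sample source and the helper name summarize_method are hypothetical, invented for illustration, and "field" here simply means any attribute reached through self.

```python
import ast

# Hypothetical sample source; this is NOT the Board class from the Chess TDD code base.
SOURCE = '''
class Board:
    def __init__(self, size):
        self._size = size
        self._pieces = [[None] * size for _ in range(size)]

    def piece_at(self, x, y):
        return self._pieces[x][y]
'''


def summarize_method(source, class_name, method_name):
    """Return parameters, fields used, and fields assigned for one method."""
    tree = ast.parse(source)
    for cls in ast.walk(tree):
        if isinstance(cls, ast.ClassDef) and cls.name == class_name:
            for fn in cls.body:
                if isinstance(fn, ast.FunctionDef) and fn.name == method_name:
                    return _summarize(fn)
    raise LookupError(f"{class_name}.{method_name} not found")


def _summarize(fn):
    params = [a.arg for a in fn.args.args if a.arg != "self"]
    used, assigned = set(), set()
    for node in ast.walk(fn):
        # Treat every attribute accessed through `self` as a field.
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == "self"):
            used.add(node.attr)
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.attr)
    return {
        "parameters": params,
        "fields_used": sorted(used),
        "fields_assigned": sorted(assigned),
    }
```

Calling summarize_method(SOURCE, "Board", "__init__") reports one parameter, two fields used, and the same two fields assigned, while piece_at uses one field and assigns none. The more fields a method touches, and the more methods touch each field, the more state a reader has to keep in mind.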

When you simply browse through the out of the box metrics that come with NDepend, these are not the kind of things you notice immediately.  The things toward which most people gravitate are obvious metrics, like method size, cyclomatic complexity, and test coverage.  But, under the hood, in the world of CQLinq, there are so many more questions that you can answer about a code base.

Stay tuned for next time, as we start exploring them in more detail and looking at how we can validate potential hypotheses about impact on time to comprehend.

And if you want to take part in this ongoing experiment, click below to sign up.

Join the Experiment

Let’s Build a Metric: Incorporating Results and Exploring CQLinq

It turns out I was wrong in the last post, at least if the early returns from the second experiment are to be believed.  Luckily, the scientific method allows for wrongness and is even so kind as to provide a means for correcting it.  I hypothesized that time to comprehend would vary at a higher order with cyclomatic complexity than with lines of code.  This appears not to be the case.  Hey, that’s why we are running the experiments, right?

By the way, as always, you can join the experiment if you want. You don’t need to have participated from the beginning by any stretch, and you can opt in or out for any given experiment as suits your schedule.

Join the Experiment

Results of the First Experiment

Recall that the first experiment asked people to record time to comprehend for a series of methods that varied by number of lines of code.  To keep the signal to noise ratio as high as possible, the methods were simply sequential arithmetic operations, operating on an input and eventually returning a transformed output.  There were no local variables or class level fields, no control flow statements, no method invocations, and no reaching into global state.  Here is a graph of the results from doing this on 3 methods, with 1, 5, and 10 logical lines of code.

LogicalLinesOfCodeComprehensionTime

So as not to overburden anyone with work, and because it’s still early, the experiment contained three methods, yielding three points.  Because this looked loosely quadratic, I used the three points to generate a quadratic formula, which turned out to be this.

GraphOfComprehensionVsLLOC
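For anyone who wants to reproduce the curve-fitting step, here is a minimal sketch in Python. It computes the unique quadratic a·x² + b·x + c that passes through three (x, y) points, using the closed-form solution of the 3×3 linear system. The function names are illustrative, and the sample points in the usage note come from a made-up polynomial, not from the experiment; substitute the measured averages to obtain the real coefficients.

```python
from fractions import Fraction


def fit_quadratic(p1, p2, p3):
    """Return (a, b, c) such that y = a*x**2 + b*x + c passes through three points."""
    (x1, y1), (x2, y2), (x3, y3) = [
        (Fraction(x), Fraction(y)) for x, y in (p1, p2, p3)
    ]
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    if denom == 0:
        raise ValueError("the three x values must be distinct")
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1
         + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    return a, b, c


def evaluate(coeffs, x):
    """Evaluate the fitted quadratic at x (for example, a method's LLOC)."""
    a, b, c = coeffs
    return a * x**2 + b * x + c
```

As a sanity check, fit_quadratic((1, 6), (5, 66), (10, 231)) recovers the coefficients (2, 3, 1) of the made-up polynomial 2x² + 3x + 1, and evaluate then reproduces each input point. With only three points the fit is exact by construction, so it says nothing yet about how well a quadratic describes methods of other lengths.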

It’s far from perfect, but this gives us our first crack at shaping time to comprehend as something experimental, rather than purely hypothetical.  Let’s take a look at how to do this using NDepend in Visual Studio.  Recall all the way back in the second post in this series that I defined a metric for time to comprehend.  It was essentially a placeholder for the concept, pending experimental results.

All we’re doing is setting the unit we’ve defined, “Seconds,” equal to the number of lines of code in a method.  But hey, now that we’ve got some actual data, let’s go with it!  The code for this metric now looks like this.

I’ve spread it across multiple lines for the sake of readability and with a nod to the notion that this will grow as time goes by. Also note that I’ve included, for now, the number of logical lines of code as a handy reference point.

Exploring CQLinq Functionality

This is all fine, but it’s a little hard to read.  As long as we’re here, let’s do a brief foray into NDepend’s functionality.  I’m talking specifically about CQLinq syntax.  If you’re going to get as much mileage as humanly possible out of this tool, you need to become familiar with CQLinq.  It’s what will let you define your own custom ways of looking at and reasoning about your code.

I’ve made no secret that I prefer fluent/expression LINQ syntax over the operator syntax, but there are times when the former isn’t your best bet. This is one of those times, because I want to take advantage of the “let” keyword to define some things up front for readability. Here’s the metric converted to the operator syntax.

With that in place, let’s get rid of the cumbersome repetition of “m.NbLinesOfCode” by using the let keyword. And, while we’re at it, let’s give NbLinesOfCode a different name. Here’s what that looks like in CQLinq.

That looks a lot more readable, huh? It’s now something at least resembling the equation pictured above. But there are a few more tweaks we can make here to really clean this thing up, and they just so happen to demonstrate slightly more advanced CQLinq functionality. We’ll use the let keyword to define a function instead of a simple assignment, and then we’ll expand the names out a bit to boot. Here’s the result.

Pretty darned readable, if I do say so myself! It’s particularly nice the way seconds is now expressed — as a function of our LengthFactor equation. As we incorporate more results, this approach will allow this thing to scale better with readability, as you’ll be able to see how each consideration contributes to the seconds.

So, what does it look like? Check it out.

Updated Seconds Metric with CQLinq

Now we can examine the code base and get a nice readout of our (extremely rudimentary) calculation of how long a given method will take to understand.  And you know what else is cool?  The data points of 8.6 seconds for the 1 LLOC method and 51 for the 5 LLOC method.  Those are cool because those were the experimental averages, and seeing them in the IDE means that I did the math right. 🙂

So we finally have some experimental progress, and there’s some good learning about CQLinq here. Stay tuned for next time!

Toward Bug Free Software: Lines of Defense

Hurrah!! Last week we released NDepend v6 RTM. Once again we relied on a two-month private beta-testing period and a one-month Release Candidate period to do our best to release a polished and stable product.

I’d like to talk about our lines of defense to fix as many bugs as possible. Except for a few pieces of software in the world that can afford mathematical demonstrations to prove they are bug-free (like aircraft software and some medical software), all other pieces of software, including NDepend, rely on an empirical approach to chasing bugs and fixing them. An empirical approach is an evidence-based approach that relies on direct observation and experimentation in the acquisition of new knowledge. An empirical approach will never lead to a bug-free product, but it can help a lot in keeping the number of bugs low, and in making sure that the remaining bugs happen in situations rare enough that they won’t have any impact on most users’ experience.

I could write many blog posts about each line of defense. After more than a decade of applying them there is so much to say, but I want this post to be concise.

Production Crash Logs

Crashes are due to unhandled exceptions. Unhandled exceptions are due to situations at runtime that were unexpected. These typically include:

  • null reference access,
  • division by zero,
  • disposed object accessed,
  • invalid cast,
  • wrong method call parameters.

A bug doesn’t necessarily lead to a crash, but a crash necessarily means that there is a bug. Certainly the most important line of defense against bugs is to log all production crashes and relentlessly fix them all. The .NET Framework offers several unhandled exception access points, including:

In some environments like Visual Studio hosting, these access points don’t work and you’ll have to write code to catch all exceptions in all possible handlers of your program.

Of course some users are disconnected from the internet, or behind a proxy, and you won’t get those production crashes. Our statistics show that this concerns only 20% of users at most. So being aware of 80% of all production crashes is certainly enough to have a good measure of what’s going wrong in production. And because Windows and .NET are highly sophisticated technologies that are constantly evolving, you can expect plenty of issues that never occurred on your team’s machines! For example, the NDepend v6 Release Candidate period showed us that users running NDepend v6 RC on a fresh Windows 8.1 or Windows 10 install experienced crashes because of a P/Invoked win32 method that our code calls. Oddly enough, this P/Invoked method behaves differently when .NET v3.5 is not installed on the machine!

The key to successful production crash logs is to get as much useful data as possible per log. For example, below is a production crash log we received a few weeks ago. When the same issue leads to several crash logs, we can start doing data mining on it. Do they all occur with the same stack trace? With the same high DPI resolution? On the same Windows platform version? On the same machine only? Notice also that the stack frames are enriched with the IL offset retrieved with the StackFrame.GetILOffset() method. Many times in the past, this alone led us to the root cause.
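To make the data-mining idea concrete, here is a small sketch in Python, chosen only for brevity; it is not NDepend's logging code. It turns an exception into a structured record and then groups records by a stack-trace signature so that recurring crashes stand out. The record fields and the function names are assumptions made for this illustration.

```python
import platform
import traceback
from collections import defaultdict


def build_crash_record(exc, extra=None):
    """Turn an exception into a dictionary that is rich enough to mine later."""
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "exception_type": type(exc).__name__,
        "message": str(exc),
        # One entry per frame: function name and line number.
        "stack": [f"{frame.name}:{frame.lineno}" for frame in frames],
        "os": platform.platform(),
        # Free-form context, e.g. DPI setting or product version.
        "extra": dict(extra or {}),
    }


def signature(record):
    """Crashes with the same type and the same call path share a signature."""
    path = tuple(entry.split(":")[0] for entry in record["stack"])
    return (record["exception_type"], path)


def group_by_signature(records):
    """Group crash records so that duplicates of one issue end up together."""
    groups = defaultdict(list)
    for record in records:
        groups[signature(record)].append(record)
    return dict(groups)
```

Once records are grouped, questions such as "do all crashes in this group share the same OS?" become a simple loop over one group's "os" or "extra" values.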

ProductionCrash

You’ll notice that we only log crashes; we don’t have other forms of runtime logs, like logging every major event that happens (button clicked, panel opened, analysis started…). Our experience with logging events is that ultimately we logged either too many or too few of them. In both cases, the information that could help fix a particular issue is then lost or hard to find. We found that having verbose crash logs was enough. Sometimes we can ask a user a question, like which action they took just before the crash, but in the vast majority of real-world cases, this information is implicitly contained in the stack trace. For the same reason we use neither remote debugging nor Windows dump files. In our context, custom and verbose production crash logs are enough.

Code Covered by Automatic Tests and Code Contracts

Not only are production crash logs an important line of defense, but they also demand just a few days of dedicated work to set up.

Having automatic tests is the second line of defense. Contrary to production crash logs, not only does it require a lot of work, but it even changes (forever) the way you write code. After a decade of writing automatic tests, a lot of conclusions can be drawn. In my attempt to remain concise in this post, let’s try to summarize the most relevant ones in a few points:

  • The number of tests is absolutely meaningless.  When it comes to unit/automatic tests, the king measure is the percentage of code covered by tests.
  • A high percentage of code covered by tests is not enough; everything that can be checked must be checked. In almost all literature related to unit testing, you’ll read that checks are assertions in unit test code. Few actually realize that assertions in the code itself are at least as important as assertions in unit-test code. There is a scientific terminology for assertions in code: Code Contracts. The important thing about code contracts is that they must fail both at manual-test time (see later) and at automatic-test time.
  • In NDepend, we use more than 20K simple Debug.Assert() calls in code. These are our contracts. Debug.Assert() calls are removed from production code. This is OK for us since we want maximum performance, and having some sophisticated assertions at runtime can significantly slow down an application. Hence we decided to sacrifice an important line of defense in the name of performance. By using MS Code Contracts, which can let assertions fail in production, we could increase the number and the accuracy of production crash logs. This is a choice you must make depending on your application. Note that NDepend.API actually supports MS Code Contracts for their great ability to provide active documentation to users.
  • How much coverage is enough? My answer is 100%.
    • Typically, developers don’t want to lose time writing tests to cover, say, property getters and setters. My point is twofold: typically you can write higher-level tests that will cover these, and if these getters and setters contain assertions, this is even better.
    • Typically, developers claim that 10% of a class is difficult to test, and that it takes as much effort to test this 10% as it does to test the remaining 90%. Once again, they don’t want to lose time! My point is that this 10% of code is by definition not easily testable; as a consequence this code is both complex and not well designed, and as a consequence it is certainly highly bug-prone. So basically, the most bug-prone portion of code ends up not being covered by automatic tests!! This is nonsense, but this is the reality in most dev shops.
    • Typically, developers say that not everything is coverable by tests, and I agree. Code calling blocking methods like MessageBox.Show() is just not coverable, which is why such calls must be mocked. Some other UI code can be especially tricky to test. Our approach is to design our UIs in a way that the underlying code can be triggered by unit tests (some would say automated by tests), and then we mostly rely on assertions in the UI code itself to catch any potential regression. Of course, when possible, assertions in tests are welcome, and of course such UI code is highly decoupled from non-UI logic that has its own set of tests. This approach has proven to work for our dev shop.
    • I’ll add that when a class or a group of classes is 100% covered by tests, experience shows that the innocent fact that a coverage hole suddenly appears often means that there is a new problem, either in the code, in the test code, or in both. More often than not, we discovered regression bugs this way that were not caught by assertions. This is why we use tooling (aka NDepend) to check that all code that used to be 100% covered remains 100% covered.
    • Last but not least, when a bug is fixed, if the fixed code portion is already covered by tests, it is easy to write assertions specific to the fix to avoid any regression in the future. And when most classes are 100% covered, more often than not it is a matter of minutes, or even seconds, to write such assertions.
  • If your application is successful enough, the code base will grow over the years. Finally, the biggest benefit you can expect from writing coverage-oriented automatic tests, is that the number of regression issues will remain under control because it won’t be proportional to the growing size of the code base. Keep in mind that only code covered by tests whose result is asserted somehow is protected by this line of defense.
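The point about assertions living in the code itself, and not only in the tests, can be illustrated with a short sketch. It is written in Python for brevity, where the assert statement plays roughly the role of Debug.Assert(): it is checked in normal runs and skipped when the interpreter is started with the -O flag. The Account class and the test function are hypothetical.

```python
class Account:
    """Hypothetical class whose contracts are plain assertions in the code."""

    def __init__(self, balance=0):
        assert balance >= 0, "precondition: balance must not be negative"
        self._balance = balance

    @property
    def balance(self):
        return self._balance

    def withdraw(self, amount):
        # Preconditions: checked whenever any test (or manual session) calls in.
        assert amount > 0, "precondition: amount must be positive"
        assert amount <= self._balance, "precondition: insufficient funds"
        before = self._balance
        self._balance -= amount
        # Postconditions: the code checks its own result.
        assert self._balance == before - amount
        assert self._balance >= 0
        return self._balance


def test_withdraw_reduces_balance():
    account = Account(100)
    assert account.withdraw(30) == 70  # assertion in the test code
    assert account.balance == 70
```

A higher-level test that merely passes through withdraw() still exercises its contracts, which is why coverage combined with in-code assertions protects more than coverage alone.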

Let’s illustrate this section with the NDepend 82.6% code coverage visualized with the NDepend metric view. We abide by our rules.

Coverage

Static Analysis and Code Review

I see static analysis as unit tests, but instead of exercising the code dynamically, static analysis exercises properties that can be inferred from the code. In the previous section for example, I wrote that if a class was 100% covered by tests, it must remain 100% covered by tests.  And I even underlined that if a hole suddenly pops up in this perfect coverage, more often than not, understanding the root cause of the hole will lead to a bug fix. This illustrates how static analysis is actually a line of defense against bugs.

In the previous analogy between static analysis and unit tests, a test is actually a code rule. NDepend makes it easy to write custom code rules; it is just a matter of writing a C# LINQ query based on a fluent API, for example:

Code rules involving code coverage and diff after setting a baseline are especially suited to hunt regression bugs. But static analysis can handle many other properties of the code and it is not only related to bugs but also to code maintainability.
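The "once 100% covered, always 100% covered" rule is easy to express outside of any particular tool. Below is a hedged sketch in Python that compares two coverage snapshots, a baseline and the current build. The dictionary format mapping class names to coverage percentages is an assumption made for this example and is not an NDepend data format.

```python
def coverage_regressions(baseline, current):
    """Return classes that were 100% covered in the baseline but no longer are.

    Both arguments map a class name to its coverage percentage (0-100).
    A class missing from `current` is ignored: it was removed, not regressed.
    """
    regressions = {}
    for name, old_pct in baseline.items():
        new_pct = current.get(name)
        if old_pct == 100 and new_pct is not None and new_pct < 100:
            regressions[name] = new_pct
    return regressions


if __name__ == "__main__":
    baseline = {"Parser": 100, "Lexer": 100, "Ui": 64.0}
    current = {"Parser": 100, "Lexer": 97.5, "Ui": 60.0, "Exporter": 10}
    # Only Lexer is reported: it dropped from 100% to 97.5%.
    print(coverage_regressions(baseline, current))
```

Classes that were never fully covered (Ui) and brand-new classes (Exporter) are deliberately left out, so the rule stays quiet unless a previously perfect class develops a hole.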

  • Code metrics: for example, methods with too many loops, if, else, switch, case… end up being non-understandable, hence non-maintainable. Counting these through the code metric Cyclomatic Complexity is a great way to assess when a method becomes too complex.
  • Dependencies: if the classes of your program are entangled, the effects of any change in the code become unpredictable. Static analysis can help to assess when classes and components are entangled.
  • Immutability: types that are used concurrently by several threads should be immutable, else you’ll have to protect state read/write access with complex lock strategies that will end up being unmaintainable. Static analysis can make sure that some classes remain immutable.
  • Dead code: dead code is code that can be removed safely, because it is not invoked anymore at runtime. Not only can it be removed, but it must be removed, because this extra code adds unnecessary complexity to the program. Static analysis can find most of the dead code in your program (yet not all).
  • API breaking change: if you present an API to your clients, it is very easy to remove a public member without noticing, thus breaking your clients’ code. Static analysis can compare two states of a program and can warn about this pitfall.
  • API usage: some APIs are intended to be used carefully. For example, a class that holds disposable fields must in general be disposable itself, except when the disposable field lifetime is not aligned with the class instances’ lifetime, which then sounds like a design problem.

The list of code properties that can be exercised by static analysis is endless. The ones quoted above refer to NDepend’s capabilities. Other tools like ReSharper or CodeRush offer other sorts of static analysis that warn about potential micro-issues, for example a foreach variable accessed from a closure, which can lead to major problems.
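The closure pitfall mentioned above is not specific to .NET. Here is how it looks in Python, where a closure also captures the loop variable itself rather than the value it held when the closure was created. The function names are illustrative.

```python
def make_callbacks_buggy(items):
    callbacks = []
    for item in items:
        # Each lambda refers to the loop variable itself, not a snapshot of it.
        callbacks.append(lambda: item)
    return callbacks


def make_callbacks_fixed(items):
    callbacks = []
    for item in items:
        # Binding the current value as a default argument takes the snapshot.
        callbacks.append(lambda item=item: item)
    return callbacks


if __name__ == "__main__":
    print([cb() for cb in make_callbacks_buggy(["a", "b", "c"])])  # ['c', 'c', 'c']
    print([cb() for cb in make_callbacks_fixed(["a", "b", "c"])])  # ['a', 'b', 'c']
```

The buggy version runs without any error and simply returns the wrong values, which is exactly the kind of silent defect that a static warning catches earlier than a crash log ever could.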

Static analysis is not only about directly finding bugs, but also about finding bug-prone situations that can decrease code understanding and maintainability.

Concerning code review, I don’t have much to say. This is static analysis, except that the logic is statically checked by a human instead of being checked by automatic rules. Thus it is highly imperfect and time consuming, yet we still practice it because experience shows that it helps find issues that can hardly be found otherwise. The key to code review is to do it on bug-prone code, which includes refactored code and new code that hasn’t reached production yet.

Manual Tests and Beta Testing

No matter how good a team is at the previously explained lines of defense, if it fails at manual tests and beta testing, the end product will be buggy and ultimately unusable.

Because not all bugs lead to obvious crashes, tests done by humans are essential. For example, only a potential user can notice an incoherent numerical result in a UI.

Manual testing is like code review: highly imperfect and time consuming, and a team cannot capitalize on it. Yet experience shows that it helps find issues that can hardly be found otherwise.

We mentioned code contracts previously; they work hand in hand with manual tests. When, during a manual test session, I have the chance to break an assertion, this actually makes me happy 🙂 because I know that this is a great opportunity to fix a bug before it reaches the next release line.

Manual testing actually includes user feedback. Users are paying for a product, and one main goal is to offer them a bug-free product. Nevertheless, users are de facto also testers, and listening carefully to users’ bug reports and relentlessly working to fix them is an essential line of defense. Of course this does not only apply to bugs, but also to feature improvements, new feature suggestions, documentation gaps, and much more, but these are other topics.