Don’t rely on someone else to protect your software

This morning I stumbled on the post “Decompilation of C# code made easy with Visual Studio” on the Visual Studio blog. Basically, VS will soon be able not only to decompile third-party code but also to generate some sort of PDB information that makes the decompiled code debuggable. The promise is no more “No Symbols Loaded” or “Source Not Found” from within a VS debugging session, and personally I find this awesome.

However, most of this post’s comments are like “How do I protect my code from being decompiled then?!” and “Microsoft does not care about its customers’ intellectual property.” These comments are absurd. Since .NET’s inception in 2002, compiled .NET code has been decompilable and readable crystal clear with popular tools like .NET Reflector, ILSpy, or dotPeek… This is a direct consequence of having IL/byte code and a CLR with a JIT compiler. I wonder to what extent those who wrote these comments are aware of that.

Protect your Intellectual Property

The first step is to make sure that your EULA forbids decompiling your code, with a clause like: “Licensee may not reverse-engineer, decompile, disassemble, modify, or translate the Product, or make any attempt to discover the source code of the Product.”

The second step is to obfuscate your compiled code. Since 2002, those who want to protect their compiled code from decompilation just have to obfuscate it. Mature tools are available, both free (like ConfuserEx) and commercial. This is what we have been doing successfully in our .NET shop since 2007.
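To illustrate what the most basic protection, renaming obfuscation, does, here is an invented before/after sketch (IsLicenseValid and the “NDP-” key format are hypothetical; real obfuscators also layer on control-flow obfuscation and string encryption):

    static class LicenseChecks
    {
        // Before obfuscation: a decompiler shows the code almost as written.
        public static bool IsLicenseValid(string key) =>
            key != null && key.StartsWith("NDP-") && key.Length == 20;

        // After renaming obfuscation (illustrative output; actual names vary
        // by tool): same logic, but the intent is much harder to recover.
        public static bool a(string b) =>
            b != null && b.StartsWith("NDP-") && b.Length == 20;
    }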

Also, with .NET Native and AOT (Ahead-of-Time) compilation, one can add a whole new layer of complexity by skipping IL code and compiling directly to machine code (e.g. x64 instructions).
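For reference, on current .NET SDKs (.NET 7 and later, so more recent than the .NET Native technology mentioned above), ahead-of-time compilation to machine code can be enabled at publish time:

    dotnet publish -c Release -r win-x64 /p:PublishAot=true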

But keep in mind that obfuscators and AOT only protect intellectual property to some extent. Your code’s best-kept secrets are still executable in both scenarios, which means they are still there. Someone skilled, and ready to spend a large amount of time reverse engineering your code, can still gain access to your intellectual property.

The ultimate way to protect your intellectual property is to provide your services as an online SaaS. This way, nobody will ever have access to your code. For example, the whole SEO industry is based on guessing what Google and other web search engines are doing: these algorithms are protected because nobody except Google employees has access to them. However, SaaS often also means forcing your clients to share their sensitive data, since that data is processed on your servers, and in many scenarios this is not acceptable. By spying on your data, Google and Facebook know you better than you know yourself, but it seems that only a fraction of humanity disagrees with that. OK, I digress…

Protect your software from hackers

Keep in mind that obfuscators and AOT don’t protect your code from hackers. It is still easy for any solid hacker to crack your license-checking layer and distribute a free version of your software online as warez. The only possible protection from hackers who have access to your code is integrity checks. An integrity check is made of two parts:

  1. A layer that detects whether some bytes of your compiled code have been tweaked (typically with a custom or standard hash function).
  2. A subtle malfunction of your software that prevents using it if the compiled code has been tweaked.

The whole point of an integrity check is to consume the hacker’s time. This is why the word subtle matters: the malfunction is not a dumb exception that provokes a fail-fast. The malfunction must be something that appears minutes after the integrity check failed and finally makes the software totally unusable (like a massive memory leak, clearing some data, freezing some UIs, or firing a timer that closes the user session after a random number of minutes…).
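To make this concrete, here is a minimal C# sketch of such an integrity check. Everything in it is illustrative: ExpectedHash stands for a value that a hypothetical post-build step would patch into the binary (real schemes exclude the patched region from the hash to avoid a chicken-and-egg problem), and the malfunction is reduced to a randomly delayed session kill for brevity; a production check would be far more discreet.

    using System;
    using System.IO;
    using System.Reflection;
    using System.Security.Cryptography;
    using System.Threading;

    static class IntegrityCheck
    {
        // Hypothetical: a post-build step patches in the SHA-256 hash of the
        // final binary (excluding this constant's own bytes from the hash).
        const string ExpectedHash = "TO_BE_PATCHED_AT_BUILD_TIME";

        static Timer _delayedMalfunction; // kept referenced so the GC won't collect it

        internal static void Verify()
        {
            // Part 1: detect whether the compiled bytes have been tweaked.
            string path = Assembly.GetExecutingAssembly().Location;
            string actualHash;
            using (var sha = SHA256.Create())
            using (var stream = File.OpenRead(path))
                actualHash = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");

            if (string.Equals(actualHash, ExpectedHash, StringComparison.OrdinalIgnoreCase))
                return; // untouched: nothing to do

            // Part 2: no fail-fast exception. Schedule a subtle malfunction
            // that fires minutes later, at a random point in time, here by
            // silently ending the user session.
            var delay = TimeSpan.FromMinutes(5 + new Random().Next(15));
            _delayedMalfunction = new Timer(
                _ => Environment.Exit(0), null, delay, Timeout.InfiniteTimeSpan);
        }
    }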

By multiplying the integrity checks and their corresponding subtle malfunctions, one can only hope to discourage a talented hacker from wasting days, weeks, or months cracking the software. But be aware that these guys are primarily driven by challenge… The good news is that with .NET there are tons of possible ways to write subtle integrity checks.

Conclusion

Protecting intellectual property, and the software itself, is a difficult task. Don’t complain to Microsoft or anyone else that they should offer an out-of-the-box tool for that. If such a mainstream protection tool existed, it would be a prime challenge for the most talented hackers, and it wouldn’t resist for long anyway.

The question you should ask is: is it worth spending resources on protecting my assets instead of spending those resources on making my paying clients even happier? This is a difficult trade-off that must be carefully thought out. But always keep in mind that you must not rely on someone else to protect your software.

The Mythical Man-Month: 10 lines per developer day

The mythical book The Mythical Man-Month states that no matter the programming language chosen, a professional developer writes on average 10 lines of code (LoC) per day.

After 14 years of full-time development on the tool NDepend I’d like to elaborate a bit here.

Let’s start with the definition of a logical line of code. Basically, a logical LoC is a PDB sequence point, excluding the sequence points that correspond to a method’s opening and closing braces. So, for example, here is a method with 5 logical lines of code:
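Since the original code snippet is not reproduced here, the following is an illustrative stand-in: each of the five statements maps to one sequence point, while the braces map to the two excluded sequence points.

    using System;

    static class Example
    {
        // Five statements => five sequence points => 5 logical LoC.
        // The opening and closing braces are excluded from the count.
        static int CountCharacters(string a, string b)
        {
            int lengthA = a.Length;        // 1
            int lengthB = b.Length;        // 2
            int total = lengthA + lengthB; // 3
            Console.WriteLine(total);      // 4
            return total;                  // 5
        }
    }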

I already hear readers complaining that LoC has nothing to do with productivity. Bill Gates once said, “Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.”

And indeed, measured over a range of a few days or weeks, LoC has nothing to do with productivity. As a full-time developer, some days I write 200 LoC in a row, and some days I spend 8 hours fixing a pesky bug without adding a single LoC. Some days I clean up dead code and remove LoC. Other days I refactor existing code without, all in all, adding a single LoC. Some days I create a large and complex UI control, and the editor automatically generates 300 additional LoC. Some days are dedicated solely to performance enhancements or to writing tests…

What is interesting is the average number of LoC over the long term. And if I do the simple math, our average is around 80 LoC per day. Let me be clear: we are strict about high code-quality standards, both in terms of code structure and formatting, and in terms of testing and code-coverage ratio (see the last picture of this post, which shows the NDepend code-coverage map). For a code quality tool for developers, being strict on code quality means dogfooding ☺.

So this average score of 80 LoC produced per day doesn’t sacrifice code quality, and it is a sustainable rhythm. Things get interesting with LoC after calibration: counting LoC becomes an accurate estimation tool. After coding and measuring dozens of features completed in this particular development context, the size of any feature can be estimated accurately in terms of LoC. Hence, with simple math, the time it will take to deliver a feature to production can be estimated accurately. To illustrate this, here is a decorated treemap view of the NDepend code base (K means 1,000 LoC). This view is obtained from the NDepend metric view panel, with handmade coloring to illustrate my point. The small rectangles are methods, grouped by parent classes, parent namespaces, and parent assemblies. A rectangle’s area is proportional to the corresponding method’s #LoC.

[Figure: treemap code-metric view of the NDepend code base]

Thanks to this map, I can compare the sizes of most components in terms of LoC. Coupling this information with the fact that the average coding score is 80 LoC per day, and looking back at the time cost of each component, we have an accurate method to tune our way of coding and to estimate future schedules.
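For instance, with hypothetical numbers: a new feature estimated to be comparable in size to a 2K LoC component translates, at 80 LoC per day, to 2,000 / 80 = 25 working days, i.e. roughly five weeks for one developer.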

Of course, not all components are equal. Most of them are the result of a long, evolutionary coding process. For example, the code model has undergone much more refactoring since the beginning than, say, the dependency matrix, which was delivered out-of-the-box after a few months of development.

This picture reveals something else interesting. We can see that all these years spent polishing the tool to meet high professional standards of ergonomics and performance actually consumed quite a few LoC. Obviously, building a performant code query engine based on C# LINQ, which is now the backbone of the product, took years: this feature alone now weighs 34K LoC. More surprisingly, just having clean Project Properties UI management and model takes (model + UI) = (4K + 7K) = 11K LoC, while a flagship feature such as the interactive Dependency Graph consumes only 8K LoC, less than the Project Properties implementation. Of course, the interactive Dependency Graph capitalizes a lot on existing infrastructure developed for other features, including the Dependency Model. But as a matter of fact, it took about the same amount of effort to develop the Dependency Graph as to develop a polished Project Properties model and UI.

All this confirms an essential lesson for everyone in charge of an ISV. It is lightweight and easy to develop a nice, flashy prototype application that brings in enthusiast users. What is really costly is to transform it into something usable, stable, clean, and fast, with all possible ergonomic candy to make the user’s life easier. And it is all these non-functional requirements that make the difference between a product used by only a few dozen enthusiasts and a product used by the masses.

To finish, it is also interesting to visualize the code base through the prism of code-coverage ratio. The NDepend code base being 86% covered, by comparing both pictures we can easily see which parts are almost 100% covered and which parts need more testing effort.