The classic book The Mythical Man-Month states that, no matter which programming language is chosen, a professional developer writes on average 10 lines of code (LoC) per day.
After 14 years of full-time development on NDepend, I’d like to elaborate a bit on this.
Let’s start with the definition of a logical line of code. Basically, a logical LoC is a PDB sequence point, excluding the sequence points that correspond to a method's opening and closing braces. For example, here is a method with 5 logical lines of code:
I can already hear readers complaining that LoC has nothing to do with productivity. Bill Gates once said: “Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.”
And indeed, measured over a range of a few days or a few weeks, LoC has nothing to do with productivity. As a full-time developer, some days I write 200 LoC in a row; other days I spend 8 hours fixing a pesky bug without adding a single LoC. Some days I clean up dead code and remove LoC. Other days I refactor existing code without, all in all, adding a single LoC. Some days I create a large and complex UI control and the editor automatically generates 300 additional LoC. Some days are dedicated solely to performance enhancements or writing tests…
What is interesting is the average number of LoC produced over the long term. Doing the simple math, our average is around 80 LoC per day. Note that we hold ourselves to strict code quality standards, both in terms of code structure and formatting, and in terms of testing and code coverage ratio (see the last picture of this post, which shows the NDepend code coverage map). For a code quality tool for developers, being strict on code quality means dogfooding ☺.
So this average of 80 LoC per day doesn’t sacrifice code quality, and it is a sustainable pace. Things get interesting once LoC counting is calibrated: it becomes an accurate estimation tool. After coding and measuring dozens of features in this particular development context, the size of any new feature can be estimated accurately in terms of LoC. Hence, with simple math, the time it will take to deliver a feature to production can be accurately estimated. To illustrate this, here is a decorated treemap view of the NDepend code base, where K means 1,000 LoC. This view is obtained from the NDepend metric view panel, with handmade coloring added to illustrate my point. The small rectangles are methods, grouped by parent classes, parent namespaces and parent assemblies. Each rectangle's area is proportional to the corresponding method's #LoC.
Thanks to this map, I can compare the size in LoC of most components. Coupling this information with the fact that our average coding rate is 80 LoC per day, and looking back at the time each component cost, we have an accurate method to tune our way of coding and to estimate future schedules.
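The estimation method described above boils down to two divisions: calibrate a LoC/day rate from features already delivered, then divide a new feature's estimated size by that rate. What follows is a minimal Python sketch under stated assumptions, not NDepend's actual process: the feature history is hypothetical data chosen so that it yields the 80 LoC/day average quoted above, and `DeliveredFeature`, `calibrate_rate` and `estimate_days` are illustrative names.

```python
# Sketch of LoC-based estimation: calibrate a LoC/day rate from past
# features, then divide a new feature's estimated size by that rate.
# All feature sizes and durations below are hypothetical.
from dataclasses import dataclass


@dataclass
class DeliveredFeature:
    loc: int             # logical LoC added by the feature
    working_days: float  # time it took to reach production


def calibrate_rate(history: list[DeliveredFeature]) -> float:
    """Long-term LoC/day: total LoC divided by total working days."""
    total_days = sum(f.working_days for f in history)
    if total_days <= 0:
        raise ValueError("history must contain some working days")
    return sum(f.loc for f in history) / total_days


def estimate_days(estimated_loc: int, loc_per_day: float) -> float:
    """Working days needed to write `estimated_loc` at the calibrated rate."""
    if loc_per_day <= 0:
        raise ValueError("loc_per_day must be positive")
    return estimated_loc / loc_per_day


history = [
    DeliveredFeature(loc=4_000, working_days=50),
    DeliveredFeature(loc=2_400, working_days=30),
    DeliveredFeature(loc=1_600, working_days=20),
]
rate = calibrate_rate(history)            # 8,000 LoC / 100 days
print(rate, estimate_days(8_000, rate))   # 80.0 100.0
```

Dividing total LoC by total days, rather than averaging per-feature rates, keeps a few small features with unusual rates from skewing the long-term average, which is the figure the post describes as reliable.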
Of course, not all components are equal. Most of them are the result of a long, evolutionary coding process. For example, the code model has undergone much more refactoring since the beginning than, say, the dependency matrix, which was delivered in one go after a few months of development.
This picture reveals something else interesting. All these years spent polishing the tool to meet high professional standards of ergonomics and performance actually consumed quite a lot of LoC. Obviously, building a performant code query engine based on C# LINQ, which is now the backbone of the product, took years: this feature alone now weighs 34K LoC. More surprisingly, just having a clean Project Properties model and UI takes 4K + 7K = 11K LoC (model + UI), while a flagship feature such as the interactive Dependency Graph only consumes 8K LoC, less than the Project Properties implementation. Of course, the interactive Dependency Graph capitalizes a lot on existing infrastructure developed for other features, including the Dependency Model. But as a matter of fact, it took the same amount of effort to develop the Dependency Graph as to develop a polished Project Properties model and UI.
All this confirms an essential lesson for everyone in charge of an ISV. It is quick and easy to develop a nice, flashy prototype application that will attract enthusiastic users. What is really costly is turning it into something usable, stable, clean and fast, with all the ergonomic candy that makes the user's life easier. And it is all these non-functional requirements that make the difference between a product used by only a few dozen enthusiastic users and a product used by the masses.
To finish, it is also interesting to visualize the code base through the prism of code coverage ratio. The NDepend code base is 86% covered, and by comparing both pictures we can easily see which parts are almost 100% covered and which parts need more testing effort.
Nice article! What tool did you use to generate the heat map?
The tool NDepend itself (that is, NDepend analyzing its own code base).
https://www.ndepend.com/docs/treemap-visualization-of-code-metrics
Coding rate follows an F distribution in my experience: it peaks early, falls off very fast, and has a long tail of rare people. Some years ago, I worked at Mastercard as a contractor. Most of the staff were contractors. LOC was an automated metric, one of several, and it covered the time from official project start through integration testing. Most people were in the range from 500 LOC to about 2,500. I was considered very high at around 5,000. But there was one person there who came in above 30,000 LOC. This was so high that they thought their tool had to be broken. Then they thought this woman had to be cutting and pasting large amounts of code. But after looking into it, neither turned out to be true. Not only that, she had the lowest number of bugs per kLOC of anyone. Her code floated through unit and integration tests. I found it as hard to believe as anybody. She had no idea she was doing so much better. She thought she was barely making it and was trying hard to keep up with everyone else.
Thanks for sharing, Brian. Your story is even more interesting given that the superhero coder is a humble woman.
I have trouble believing the 30,000 LOC/day average given by Brian.
So let’s assume this coder works on average 8 hours and 20 minutes per day without being interrupted (no lunch, no stand-up: 8 hours and 20 minutes of straight coding, in the zone).
We all know that there are 3,600 seconds in an hour, so that workday is 30,000 seconds.
What you are telling me here is that this person can write one line of code per second.
Assuming a line of code is 30 characters long, that is one character typed every 33 ms, which, I am sorry, is impossible, since most keyboards’ latency is above this number.
So your coder is basically typing as fast as key repeat with a finger held down on a key, 8h20min straight, without switching windows or compiling anything.
The numbers are wrong; maybe you meant 30,000 LOC per week or per month.
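The arithmetic in the comment above can be reproduced in a few lines of Python, under the same assumptions it states (8 hours 20 minutes of uninterrupted typing and 30 characters per line of code). This sketch only checks the numbers; it says nothing about keyboard latency.

```python
# Reproduces the commenter's arithmetic for 30,000 LOC typed in one day.
# Assumptions, as in the comment: 8 h 20 min of uninterrupted typing
# and 30 characters per line of code.
SECONDS_PER_HOUR = 3_600

seconds_per_day = 8 * SECONDS_PER_HOUR + 20 * 60   # 30,000 s
loc_per_day = 30_000
chars_per_line = 30

seconds_per_line = seconds_per_day / loc_per_day          # 1.0 s per line
ms_per_char = 1_000 * seconds_per_line / chars_per_line   # about 33.3 ms

print(seconds_per_day, seconds_per_line, round(ms_per_char, 1))  # 30000 1.0 33.3
```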
@Ant1
Read the whole comment: “It included time from official project start through integration testing.”
It’s not LOC/day. That was LOC/month. Sorry for not saying so. I thought I had, but I didn’t.
It was an assembler shop, so lines are shorter to type.
500 LOC/month, assuming 20 working days/month, is 25 LOC/day
2500 / 20 = 125 LOC/day
5000 / 20 = 250 LOC/day
30,000 / 20 = 1500 LOC/day
Yes, she typed quickly. Faster than me for sure. She sat across a divider from me and was scary focused. And very tense.
Days weren’t 8-hour days. They were 8 to 12 hours, probably 10 on average.
Like I said, they thought their system had to be broken. It seemed impossible.
She had really clean, nice code with excellent comments too. Short, to the point and clear.
But no, it wasn’t milliseconds. LOL.
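The corrected figures above can be turned into daily and hourly rates with a short Python sketch, using the assumptions stated in the reply: 20 working days per month and roughly 10-hour days. The `daily_and_hourly` helper and the row labels are illustrative only.

```python
# Converts monthly LOC figures to daily and hourly rates, assuming
# 20 working days per month and ~10-hour days, as stated in the reply.
WORKING_DAYS_PER_MONTH = 20
HOURS_PER_DAY = 10


def daily_and_hourly(monthly_loc: int) -> tuple[float, float]:
    """Return (LOC per day, LOC per hour) for a monthly LOC count."""
    per_day = monthly_loc / WORKING_DAYS_PER_MONTH
    return per_day, per_day / HOURS_PER_DAY


for label, loc in [("low end", 500), ("typical high", 2_500),
                   ("Brian", 5_000), ("outlier", 30_000)]:
    per_day, per_hour = daily_and_hourly(loc)
    print(f"{label}: {per_day:g} LOC/day, {per_hour:g} LOC/hour")
```

At 150 LOC per hour, the outlier works out to 2.5 lines per minute, far from the one line per second computed in the earlier comment.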
Yes, Patrick. Although I don’t know how this coding-bro idea that women aren’t good at coding came about. Women started out BEING the computers that ran calculations for government agencies and large firms. Women were critical to computing from the start. A woman wrote the code that didn’t break for Apollo. Women defined languages. I’m older, so in my career women were at least half the workforce. I once worked in a group where I was one of 2 men in a group of 9. The manager was a woman, her boss was a woman, and her boss’s boss was a woman too. It was normal.
In my experience, women are just as good as men at coding. What they aren’t as good at (in general) is handling a workplace that looks down on them. I’ve also found that women tend to have really good memories for where they’ve seen things and for finding them, which is useful in coding.
Women are very well represented in sciences. They dominate veterinary science for instance, and vet school is often considered harder than med school. There’s lots of data showing this, and the dominance of men in coding today is something that appeared after 2000. (With easier languages, BTW.)
You could measure the hours spent developing each feature. That would be a more accurate estimation tool.