The two pillars of code maintainability are automatic testing and clean code structure.
- Testing is used to regularly challenge code correctness and detect regression early. Testing can be easily assessed with numbers like code coverage ratio and the amount of assertions tested.
- A clean code structure prevents the phenomenon of spaghetti code, entangled code that is hard to understand and hard to maintain. However assessing the code structure cannot be achieved through numbers like for testing. Moreover the structure emerges from a myriad of details buried in many source files and thus appropriate tooling is needed.
For most engineers, code dependency graph is the tool of choice to explore code structure. Boxes and arrows graph is intuitive and well adapted to visualize a small amount of dependencies. However to visualize complex portion of code the Dependency Structure Matrix (DSM) is more adapted. See below the same set of 34 namespaces visualized with the NDepend Dependency Graph and the NDepend Dependency Matrix.
If the concept of dependency matrix is something new to you, it is important to note that:
- The Matrix headers’ elements represent graph boxes
- The Matrix non-empty cells correspond to graph arrows. Numbers on the cells represents a measure of the coupling in terms of numbers of methods and fields involved. In a symmetric matrix a pair of blue and green cell is symmetric because both cells represents the same thing: the blue cell represents A uses B and the green cells represents B is used by A.
Here is a 5 minutes introduction video if you are not familiar with the dependency matrix:
Clearly the graph is more intuitive, but apart the two red arrows that represent two pairs of namespaces mutually dependent this graph tells few things about the overall structure.
On the other hand the matrix algorithm naturally attempts to layer code elements, exhibit dependency cycles, shows which element is used a lot or not… Let’s enumerate some structural patterns that can be visualized at a glance with the dependency matrix:
Layers
One pattern that is made obvious by a DSM is layered structure (i.e acyclic structure). When the matrix is triangular, with all blue cells in the lower-left triangle and all green cells in the upper-right triangle, then it shows that the structure is perfectly layered. In other words, the structure doesn’t contain any dependency cycle.
On the right part of the snapshot, the same layered structure is represented with a graph. All arrows have the same left to right direction. The problem with graph, is that the graph layout doesn’t scale. Here, we can barely see the big picture of the structure. If the number of boxes would be multiplied by 2, the graph would be completely unreadable. On the other side, the DSM representation wouldn’t be affected; we say that the DSM scales better than graph.
Notice that NDepend proposes 2 rules out of the box to control layering by preventing dependency cycles to appear: ND1400 Avoid namespaces mutually dependent and ND1401 Avoid namespaces dependency cycles.
Interestingly enough, most of graph layout algorithms rely on the fact that a graph is acyclic. To compute layout of a graph with cycles, these algorithms temporarily discard some dependencies to deal with a layered graph, and then append the discarded dependencies at the last step of the computation.
Cycles
If a structure contains a cycle, the cycle is displayed by a red square on the DSM. We can see that inside the red square, green and blue cells are mixed across the diagonal. There are also some black cells that represent mutual direct usage (i.e A is using B and B is using A).
The NDepend’s DSM comes with the option Indirect Dependency. An indirect dependency between A and B means that A is using something, that is using something, that is using something … that is using B. Below is shown the same DSM with a cycle but in indirect mode. We can see that the red square is filled up with only black cells. It just means that given any element A and B in the cycle, A and B are indirectly and mutually dependent.
Here is the same structure represented with a graph. The red arrow shows that several elements are mutually dependent. But the graph is not of any help to highlight all elements involved in the parent cycle.
Notice that in NDepend, we provided a button to highlight cycles in the DSM (if any). If the structure is layered, then this button has for effect to triangularize the matrix and to keep non-empty cells as closed as possible to the diagonal.
High Cohesion / Low-Coupling
The idea of high-cohesion (inside a component) / low-coupling (between components) is popular nowadays. But if one cannot measure and visualize dependencies, it is hard to get a concrete evaluation of cohesion and coupling. DSM is good at showing high cohesion. In the DSM below, an obvious squared aggregate around the diagonal is displayed. It means that elements involved in the square have a high cohesion: they are strongly dependent on each other although. Moreover, we can see that they are layered since there is no cycle. They are certainly candidate to be grouped into a parent artifact (such as a namespace or an assembly).
On the other hand, the fact that most cells around the square are empty advocate for low-coupling between elements of the square and other elements.
In the DSM below, we can see 2 components with high cohesion (upper and lower square) and a pretty low coupling between them.
While refactoring, having such an indicator can be pretty useful to know if there are opportunities to split coarse components into several more fine-grained components.
Too many responsibilities
The popular Single Responsibility Principle (SRP) states that: a class shouldn’t have more than one reason to change. Another way to interpret the SRP is that a class shouldn’t use too many different other types. If we extend the idea at other level (assemblies, namespaces and method), certainly, if a code element is using dozens of other different code elements (at same level), it has too many responsibilities. Often the term God class or God component is used to qualify such piece of code.
DSM can help pinpoint code elements with too many responsibilities. Such code element is represented by columns with many blue cells and by rows with many green cells. The DSM below exposes this phenomenon.
Popular Code Elements
A popular code element is used by many other code elements. Popular code elements are unavoidable (think of the String class for example).
A popular code element is not a flaw. However it is advised that popular elements are interfaces and enumerations. This way consumers rely on abstractions and not on implementations details. The benefit is that consumers are less often broken because abstraction are less subject to change than implementations.
A popular code element is represented by columns with many green cells and by rows with many blue cells. The DSM below highlights a popular code element.
Something to notice is that when one is keeping its code structure perfectly layered, popular components are naturally kept at low-level. Indeed, a popular component cannot de-facto use many things, because popular component are low-level, they cannot use something at a higher level. This would create a dependency from low-level to high-level and this would break the acyclic property of the structure.
Mutual dependencies
You can see the coupling between 2 components by right clicking a non-empty cell, and select the menu Open this dependency.
If the opened cell was black as in the snapshot above (i.e if A and B are mutually dependent) then the resulting rectangular matrix will contains both green and blue cells (and eventually black cells as well) as in the snapshot below.
In this situation, you’ll often notice a deficit of green or blue cells (3 blue cells for 1 green cell here). It is because even if 2 code elements are mutually dependent, there often exists a natural level order between them. For example, consider the System.Threading namespaces and the System.String class. They are mutually dependent; they both rely on each other. But the matrix shows that Threading is much more dependent on String than the opposite (there are much more blue cells than green cells). This confirms the intuition that Threading is upper level than String.