NDepend Blog

Improve your .NET code quality with NDepend

Find C# Code Duplicate

October 22, 2024 3 minutes read

Copy-pasted duplicate code is problematic. It increases maintenance effort and the risk of inconsistencies: a change made in one copy may not be applied to the others, leading to bugs and reduced code quality.

NDepend can help identify duplicated C# code. First we present a case study, then we explain how the detection works.

Chasing C# Code Clones: A Case Study

First, start NDepend.PowerTools.exe, found in the NDepend redistributable.

NDepend.PowerTools.exe

Note that Power Tools are open-source, and you can view the source code in the NDepend.PowerTools.SourceCode folder.

Additionally, you can find a version of Power Tools that runs on Linux or macOS in the net8.0, net7.0, or net6.0 directories (supported versions will vary with future .NET releases).

Press the “e” key to start the Search For Code Duplicate power tool.

NDepend Power Tools

Then select an NDepend project. Here we selected a project we created to analyze NodaTime version 3.1.0. Here is a web report obtained on this project.

Within a few seconds, potential C# code duplicates are identified and listed individually. Pressing the “o” key opens the source declarations.

CSharp Code Duplicate Found

For example, here is a duplicate that was found:

Understanding the Code Duplicate Heuristic

The algorithm behind this code duplicate Power Tool is simple yet highly effective in practice. It works by identifying groups of methods that use the same members: methods that call the same methods, or read from or write to the same fields. These groups are referred to as “suspect sets.” The suspect sets are then ranked by how many common members they share.

The algorithm follows three key steps:

  1. Investigate each method (including third-party ones) to determine whether its callers could be considered suspects. Methods that are called very frequently, such as those from System.Collections.Generic, are discarded to reduce false positives.
  2. Merge suspect sets obtained from the first step.
  3. Sort suspect sets based on a weight calculated by the number of common members called.
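
The three steps above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the Power Tool's actual source (which is C# and built on NDepend.API). The function name find_suspect_sets, the thresholds max_callers and min_shared, and the sample method names are all hypothetical, chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations


def find_suspect_sets(uses, max_callers=3, min_shared=2):
    """uses maps a method name to the set of members it calls, reads, or writes."""
    # Step 1: invert the usage map so each member lists the methods that use it.
    users_of = defaultdict(set)
    for method, members in uses.items():
        for member in members:
            users_of[member].add(method)

    # Members used by too many methods say little about duplication; skip them.
    shared = defaultdict(set)
    for member, users in users_of.items():
        if 2 <= len(users) <= max_callers:
            for pair in combinations(sorted(users), 2):
                shared[pair].add(member)

    # Step 2: merge pairs that share enough members into larger suspect sets
    # (a small union-find keyed by method name).
    parent = {}

    def find(m):
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), members in shared.items():
        if len(members) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for m in list(parent):
        groups[find(m)].add(m)

    # Step 3: weight each set by the members that all of its methods share.
    result = []
    for methods in groups.values():
        common = set.intersection(*(uses[m] for m in methods))
        common = {c for c in common if len(users_of[c]) <= max_callers}
        result.append((len(common), sorted(methods), sorted(common)))
    result.sort(key=lambda r: (-r[0], r[1]))
    return result


# Hypothetical input: which members each method uses.
sample_uses = {
    "LayoutHorizontal": {"Measure", "Arrange", "_padding", "List.Add"},
    "LayoutVertical": {"Measure", "Arrange", "_padding", "List.Add"},
    "Parse": {"List.Add", "Tokenize"},
    "Format": {"List.Add", "Append"},
}

for weight, methods, common in find_suspect_sets(sample_uses):
    print(weight, methods, common)
# prints: 3 ['LayoutHorizontal', 'LayoutVertical'] ['Arrange', 'Measure', '_padding']
```

In this sample, List.Add is used by all four methods, so it exceeds max_callers and is ignored, which mirrors how very frequently called methods are discarded in step 1.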

Pros and Cons

The duplicates identified by this algorithm are generally highly relevant. One of its key advantages over other algorithms is its resilience to minor modifications in copy-pasted code, meaning it isn’t easily fooled by slightly altered duplicates. Another strength is that the algorithm can be run directly on IL code, without requiring the source code. It’s worth noting that while this post shows examples with two methods in a suspect set, a suspect set can actually contain more than two methods.
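
To see why comparing used members tolerates small edits, consider the toy sketch below. It assumes a crude regular expression as a stand-in for real IL analysis (NDepend does not work this way), and the two method bodies are invented for illustration. The pasted copy renames its locals and adds one call, yet the set of called members barely changes.

```python
import re

# Toy assumption: a "call" is any identifier followed by '('.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
KEYWORDS = {"if", "for", "foreach", "while", "switch", "return"}


def called_names(body):
    """Crude stand-in for IL analysis: names followed by '(' in a method body."""
    return {name for name in CALL.findall(body) if name not in KEYWORDS}


# Hypothetical C#-like method body.
original = """
    var size = Measure(child);
    Arrange(child, offset, size.Width);
    offset += size.Width + GetPadding();
"""

# A pasted copy with renamed locals and one extra call.
edited_copy = """
    var s = Measure(item);
    Log(item);
    Arrange(item, pos, s.Height);
    pos += s.Height + GetPadding();
"""

common = called_names(original) & called_names(edited_copy)
print(sorted(common))
# prints: ['Arrange', 'GetPadding', 'Measure']
```

A purely textual comparison sees two different bodies, while the member-based view still finds three shared calls.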

On the downside, some suspect sets may occasionally be considered false positives from a human perspective. However, it’s almost always possible to refactor the identified duplicates into one or more parameterized methods, as seen with the two layout methods. The reality is that when multiple methods use the same set of members, it rarely happens by coincidence.

Additionally, this algorithm is extremely fast, running 10 to 100 times faster than algorithms based on source code scanning. In practice, it takes only a few seconds on a large, real-world codebase. This speed comes from NDepend.API being highly optimized for navigating dependencies quickly.

Chasing C# Code Duplicates from Your CI/CD Pipeline

The algorithm itself is open-source, so you can browse and modify it as you wish. Being open-source and built on NDepend.API makes it particularly well suited for integration into a continuous integration (CI) process.
