NDepend Blog

Improve your .NET code quality with NDepend

C# Binary Search: Fast find of a free slot between 0 and uint.MaxValue

January 3, 2022 2 minutes read

When a user is exporting a result to a document with NDepend, the tool needs to find a file name not taken in the temporary dir. Generating a GUID for the file name could make it. However it is more elegant to increment a counter to get readable file names like QueryResult0.html, QueryResult1.html If the user exported many files without clearing up the temporary directory there can be thousands of files to check before finding the next free slot. This situation slows down the export operation. This is how we came to the need of implementing a fast find of a free slot between 0 and uint.MaxValue.

Here is the algorithm based on binary search. Notice that:

  • You can reuse this algorithm as-is since it has nothing to do with file name testing thanks to the abstracted procedure Func<uint, bool> isTakenProc.
  • Tests are also provided to show it works also on edge cases.
  • There is no recursive method call. This way the algorithm is faster and safer since there is no risk of StackOverflowException.
  • Some invariant assertions show that there is no risk of infinite loop.
  • An OverflowException is thrown if 0, 1, all powers of two and uint.MaxValue are taken. Actually this binary search algorithm relies on the fact that there exists A in [0, 2^32-1]  where x >= A implies x is free. Without such condition I guess nothing better than a O(N) algorithm can be used (while this algorithm is O( log^2(N) )).
  • Shift << and >> operators are used for fast multiple / divide by two. The IL code generated is different for both situations (shift right/left and mul/div). However this @Paulo Morgado sample code shows that the JIT compiler is smart enough to generate the same code based on add eax,eax and shr eax,1 in both situations.

In the worst case where the expected result is uint.MaxValue, isTakenProc is called 65 times. For reasonable range of values tested in practice (10.000 or less) isTakenProc is called 29 times or less. For intellectual satisfaction, if you find a way to run this algorithm faster, put your way in comment 🙂

The tests:

 

 

Comments:

  1. Patrick Tingen says:

    For temporary files I often use a date/time stamp as part of the name. A typical example could be QueryResult.20220103.203112.12.html where date and time are easy recognizable. An extra digit can be added for uniqueness but is hardly needed in my use-cases.

    How does your algorithm work when there are lots of temp files, but not necessarily evenly distributed.

  2. Patrick Smacchia says:

    @Patrick Tingen Clearly the algorithm relies on the fact that there exists A between [0, 2^32-1] where x >= A implies x is free. Without this condition it can throws an OverflowException even if there are some free slots, like for example when only 0, 1, all powers of two and uint.MaxValue are taken. Without such condition nothing can be assumed and we fall back to a O(N) search algorithm.

    Using date is a nice option but since export can be achieved through NDepend.API very quickly we need to either choose a fine grained data convention (milli-second or less), or use such anti-collision index as explained when needed.

  3. Patrick Smacchia says:

    Indeed thanks @Paulo, the reflex of using shift came from when I was coding against 68K chip more than 30 years ago 🙂

  4. @Patrick Smacchia, I still struggle with that and other patterns too.

  5. Without this condition, it may throw an Overflow Exception even though there are some vacant slots, as as when just 0 and 1, all powers of two, and unit are used. Max Value has already been taken. Nothing can be assumed in the absence of such a condition, and we are forced to resort to an O(N) search algorithm.

Comments are closed.