NDepend

Improve your .NET code quality with NDepend

C# Binary Search: Fast find of a free slot between 0 and uint.MaxValue

When a user is exporting a result to a document with NDepend, the tool needs to find a file name not taken in the temporary dir. Generating a GUID for the file name could make it. However it is more elegant to increment a counter to get readable file names like QueryResult0.html, QueryResult1.html If the user exported many files without clearing up the temporary directory there can be thousands of files to check before finding the next free slot. This situation slows down the export operation. This is how we came to the need of implementing a fast find of a free slot between 0 and uint.MaxValue.

Here is the algorithm based on binary search. Notice that:

  • You can reuse this algorithm as-is since it has nothing to do with file name testing thanks to the abstracted procedure Func<uint, bool> isTakenProc.
  • Tests are also provided to show it works also on edge cases.
  • There is no recursive method call. This way the algorithm is faster and safer since there is no risk of StackOverflowException.
  • Some invariant assertions show that there is no risk of infinite loop.
  • An OverflowException is thrown if 0, 1, all powers of two and uint.MaxValue are taken. Actually this binary search algorithm relies on the fact that there exists A in [0, 2^32-1]  where x >= A implies x is free. Without such condition I guess nothing better than a O(N) algorithm can be used (while this algorithm is O( log^2(N) )).
  • Shift << and >> operators are used for fast multiple / divide by two. The IL code generated is different for both situations (shift right/left and mul/div). However this @Paulo Morgado sample code shows that the JIT compiler is smart enough to generate the same code based on add eax,eax and shr eax,1 in both situations.

In the worst case where the expected result is uint.MaxValue, isTakenProc is called 65 times. For reasonable range of values tested in practice (10.000 or less) isTakenProc is called 29 times or less. For intellectual satisfaction, if you find a way to run this algorithm faster, put your way in comment 🙂

The tests:

 

 

My dad being an early programmer in the 70's, I have been fortunate to switch from playing with Lego, to program my own micro-games, when I was still a kid. Since then I never stop programming.

I graduated in Mathematics and Software engineering. After a decade of C++ programming and consultancy, I got interested in the brand new .NET platform in 2002. I had the chance to write the best-seller book (in French) on .NET and C#, published by O'Reilly and also did manage some academic and professional courses on the platform and C#.

Over my consulting years I built an expertise about the architecture, the evolution and the maintenance challenges of large & complex real-world applications. It seemed like the spaghetti & entangled monolithic legacy concerned every sufficiently large team. As a consequence, I got interested in static code analysis and started the project NDepend in 2004.

Nowadays NDepend is a full-fledged Independent Software Vendor (ISV). With more than 12.000 client companies, including many of the Fortune 500 ones, NDepend offers deeper insight and full control on their application to a wide range of professional users around the world.

I live with my wife and our twin kids Léna and Paul in the beautiful island of Mauritius in the Indian Ocean.

Comments:

  1. For temporary files I often use a date/time stamp as part of the name. A typical example could be QueryResult.20220103.203112.12.html where date and time are easy recognizable. An extra digit can be added for uniqueness but is hardly needed in my use-cases.

    How does your algorithm work when there are lots of temp files, but not necessarily evenly distributed.

  2. @Patrick Tingen Clearly the algorithm relies on the fact that there exists A between [0, 2^32-1] where x >= A implies x is free. Without this condition it can throws an OverflowException even if there are some free slots, like for example when only 0, 1, all powers of two and uint.MaxValue are taken. Without such condition nothing can be assumed and we fall back to a O(N) search algorithm.

    Using date is a nice option but since export can be achieved through NDepend.API very quickly we need to either choose a fine grained data convention (milli-second or less), or use such anti-collision index as explained when needed.

  3. Indeed thanks @Paulo, the reflex of using shift came from when I was coding against 68K chip more than 30 years ago 🙂

  4. Without this condition, it may throw an Overflow Exception even though there are some vacant slots, as as when just 0 and 1, all powers of two, and unit are used. Max Value has already been taken. Nothing can be assumed in the absence of such a condition, and we are forced to resort to an O(N) search algorithm.

Comments are closed.