When a user is exporting a result to a document with NDepend, the tool needs to find a file name not taken in the temporary dir. Generating a GUID for the file name could make it. However it is more elegant to increment a counter to get readable file names like QueryResult0.html
, QueryResult1.html
… If the user exported many files without clearing up the temporary directory there can be thousands of files to check before finding the next free slot. This situation slows down the export operation. This is how we came to the need of implementing a fast find of a free slot between 0
and uint.MaxValue
.
Here is the algorithm based on binary search. Notice that:
- You can reuse this algorithm as-is since it has nothing to do with file name testing thanks to the abstracted procedure
Func<uint, bool> isTakenProc
. - Tests are also provided to show it works also on edge cases.
- There is no recursive method call. This way the algorithm is faster and safer since there is no risk of
StackOverflowException
. - Some invariant assertions show that there is no risk of infinite loop.
- An
OverflowException
is thrown if0
,1
, all powers of two anduint.MaxValue
are taken. Actually this binary search algorithm relies on the fact that there existsA
in[0, 2^32-1]
wherex >= A
impliesx
is free. Without such condition I guess nothing better than aO(N)
algorithm can be used (while this algorithm isO( log^2(N) )
). - Shift
<<
and>>
operators are used for fast multiple / divide by two. The IL code generated is different for both situations (shift right/left and mul/div). However this @Paulo Morgado sample code shows that the JIT compiler is smart enough to generate the same code based onadd eax,eax
andshr eax,1
in both situations.
In the worst case where the expected result is uint.MaxValue
, isTakenProc
is called 65 times. For reasonable range of values tested in practice (10.000 or less) isTakenProc
is called 29 times or less. For intellectual satisfaction, if you find a way to run this algorithm faster, put your way in comment 🙂
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
internal static uint FindFreeSlot(Func<uint, bool> isTakenProc) { Assert.IsNotNull(isTakenProc); // Test 0 and 1 slots if (!isTakenProc(0)) { return 0; } if (!isTakenProc(1)) { return 1; } // Find the next power of 2 free slot uint powerOfTwo = 1, taken = powerOfTwo, free; while (true) { uint nextPowerOfTwo = powerOfTwo << 1; // multiply by 2 if (nextPowerOfTwo < powerOfTwo) { // Power of 2 went greater than uint.MaxValue (which is 2^32-1) // Try find a free slot between 'taken' and 'uint.MaxValue' // unless 'uint.MaxValue' is taken which leads to OverflowException free = uint.MaxValue; if (isTakenProc(free)) { throw new OverflowException(); } break; } // Invariant that shows that we avoid infinite loop Assert.IsTrue(nextPowerOfTwo > powerOfTwo); powerOfTwo = nextPowerOfTwo; if (isTakenProc(powerOfTwo)) { taken = powerOfTwo; continue; } free = powerOfTwo; break; } // Binary search algorithm to find free slot between 'free' and 'taken' while (true) { Assert.IsTrue(taken < free); uint increment = (free - taken) >> 1; // divide by 2 if (increment == 0) { return free; } uint middle = taken + increment; // Invariants that show that we avoid infinite loop Assert.IsTrue(middle < free); Assert.IsTrue(middle > taken); if (!isTakenProc(middle)) { free = middle; } else { taken = middle; } } } |
The tests:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
[Test] public void Test_FindFreeSlot() { // Test many values, including edge values close to uint.MaxValue for (uint i = 0; i < 1 * 1000 * 1000; i++) { foreach (uint freeSlot in new uint[] { i , uint.MaxValue - i, uint.MaxValue / 2 + i, uint.MaxValue / 2 - i }) { Assert.IsTrue(BinarySearch.FindFreeSlot(j => j < freeSlot) == freeSlot); } } // If all slots are taken, expect OverflowException Assert.Throws(typeof(OverflowException), () => BinarySearch.FindFreeSlot(j => true)); } [Test] public void Test_FindFreeSlot_WorstCase() { int nbCalls = 0; Assert.IsTrue(BinarySearch.FindFreeSlot(j => { nbCalls++; return j < uint.MaxValue; }) == uint.MaxValue); Assert.IsTrue(nbCalls == 65); } |
Have you benchmarked this? I would expect the generated assembly to be the same.
See it here: https://sharplab.io/#v2:C4LghgzgtgNAJiA1AHwAICYCMBYAUKgZgAIMiBhIgbzyNpONQBYiBlACwEsAzYACgFcOAO2BEAHgEoqNOrIBuYAE5EwRALziiAHi1FMAbhmzaC5QCN1mgHxW9h3LIC+R+iWYBZfgBsAIhzkCwqKS0g7GJkpEAMaWYkQAVETo9uERynCxRAD0SfpELs64jkA=
For temporary files I often use a date/time stamp as part of the name. A typical example could be QueryResult.20220103.203112.12.html where date and time are easy recognizable. An extra digit can be added for uniqueness but is hardly needed in my use-cases.
How does your algorithm work when there are lots of temp files, but not necessarily evenly distributed.
@Patrick Tingen Clearly the algorithm relies on the fact that there exists A between [0, 2^32-1] where x >= A implies x is free. Without this condition it can throws an OverflowException even if there are some free slots, like for example when only 0, 1, all powers of two and uint.MaxValue are taken. Without such condition nothing can be assumed and we fall back to a O(N) search algorithm.
Using date is a nice option but since export can be achieved through NDepend.API very quickly we need to either choose a fine grained data convention (milli-second or less), or use such anti-collision index as explained when needed.
Indeed thanks @Paulo, the reflex of using shift came from when I was coding against 68K chip more than 30 years ago 🙂
@Patrick Smacchia, I still struggle with that and other patterns too.
Without this condition, it may throw an Overflow Exception even though there are some vacant slots, as as when just 0 and 1, all powers of two, and unit are used. Max Value has already been taken. Nothing can be assumed in the absence of such a condition, and we are forced to resort to an O(N) search algorithm.