NDepend Blog

Improve your .NET code quality with NDepend

Faster Dictionary in C#

September 2, 2024 5 minutes read

faster-csharp-dictonary

In the .NET Base Class Library, Dictionary<TKey, TValue> is an essential key-based hashtable providing constant time access to values. This means accessing dictionary[key] takes the same amount of time regardless of the number of entries in the dictionary. Constant time access is usually referred to as O(1) access.

Avoiding duplicate lookup

Rewriting an algorithm using a hashtable can drastically boost performance. However, the main cost in hashtables comes from the internal O(1) lookup for each access. Some algorithms may become less efficient when they involve duplicated lookup to the same key.

A common dictionary use-case is to add an entry if the key is not already there:

Here the lookup for the key guid is achieved twice: during dico.ContainsKey(guid) and during dico.Add(guid, value). To solve this duplicate lookup the method TryAdd(key, value) has been added to Dictionary<TKey, TValue> in .NET Core 2.0 released in 2017. Now you can write instead:

Let’s benchmark with Benchmark.NET how faster it is to use this method. Here is the benchmark result:

Here is the benchmark code:

To add the package Benchmark.NET in your project just add to the .csproj:

Notice that the method TryAdd(key, value) is not available in the good old .NET Framework.

Accessing dictionary internal layout with managed pointers

There remain duplicate lookup scenarios not addressed by the TryAdd(key, value) method. This is why two methods were added to the class System.Runtime.InteropServices.CollectionsMarshal in .NET 6 released in 2021:

CollectionMarshal documentation states that it is an unsafe class that provides a set of methods to access the underlying data representations of collections.

The ref keyword and managed pointers

The usage of ref keywords in the methods’ definition above and the terms unsafe and underlying data representations of collections means that we are entering the realm of managed pointer. Managed pointer is a .NET runtime feature that existed since the inception of .NET 1.0 in 2001. C# 1.0 relied on it to pass parameters to a method with the ref and out keywords. A managed pointer must live on the stack and can point to any kind of memory: object reference, slot within an array, field within an object, or unmanaged memory. In a runtime with a garbage collector like .NET, managed pointers offer the best of both worlds:

  • They are as fast as a regular C or C++ pointer
  • They are tracked by the runtime in case the memory pointed gets moved by the garbage collector

This is why since the release of C# 7.0 in 2017, Microsoft expanded over the years the use of the ref keyword to a broader set of scenarios, including ref locals, ref returns, ref struct, and ref fields. This is all explained in this article: Managed pointers, Span, ref struct, C#11 ref fields and the scoped keyword

Scenario 1: Creating the value only if the key is not present in the dictionary

Now let’s see how we can concretely improve our dictionaries’ performance. The scenario 1 is similar to the previous one with TryAdd(), but here, the value is created only when the key is not already present in the dictionary. Here TryAdd() cannot be used anymore:

Thanks to the method CollectionMarshall.GetValueRefOrAddDefault() it is possible to address this scenario with only a single lookup for key:

The screen recording below demonstrates that GetValueRefOrAddDefault() functions as intended: if the key is absent from the dictionary, it adds the key with a default value (default(string) which is null), and then returns a reference to the value’s location within the dictionary’s layout.

You might be intrigued by the ! character at the end of:

It simply instructs the compiler not to emit a nullable-related warning about the possibility of a Dictionary<Guid, string> containing a null reference for a string value.

Benchmarking scenario 1

Here is the benchmark result:

Here is the benchmark code:

Scenario 2: Modifying structs value in dictionary

If the values in a dictionary are structs and you need to modify a value, a duplicate lookup is required. This happens because accessing the value through dico[guid] returns a copy of the struct stored in the dictionary.

Again, using a managed pointer to the value’s location within the dictionary can eliminate the need for the second lookup:

If it’s uncertain whether the key is present in the dictionary you must use if (!Unsafe.IsNullRef(pointerToValLocation)) instead of if (pointerToValLocation != null) because else you get this compiler warning:

Benchmarking scenario 2

Here is the benchmark result:

Here is the benchmark code:

Scenario 3: Large Structs value in dictionary

In this scenario, the dictionary values are large structs. With the usual approach, accessing a value results in a copy, which incurs a performance cost proportional to the struct footprint. Instead, we can access the large struct value using a managed pointer. Below are the benchmark results:

And here is the benchmark code:

Caution when using managed pointers on a dictionary

Using MemoryMarshal methods with a dictionary can be powerful but require careful handling. It lets obtain managed pointers toward dictionary’s internal layout. However, a dictionary layout at runtime is complex and involves numerous arrays called buckets. A managed pointer references a slot within one of these buckets. While the garbage collector safely updates such managed pointers if it moves a bucket array, problems can arise when entries are added or removed. Such operations might cause the dictionary internal layout to reorganize with new buckets, potentially leaving your managed pointer pointing to an orphaned array.

Don’t add or remove entries from a dictionary while holding a managed pointer obtained through a MemoryMarshal method on that dictionary.

Conclusion

Year after year, Microsoft enhances .NET to enable developers to write even more performant code. Managed pointers, accessible via the ref keyword, have become a crucial tool in achieving this goal. The initial push for increased use of managed pointers came with the introduction of Span<T>, as explained in Improve C# code performance with Span<T>. However, they have proven valuable in various scenarios, including optimizing dictionaries, faster array and list loops, and improving performance in C# 13 params collections.