In the .NET Base Class Library, Dictionary<TKey, TValue>
is an essential key-based hashtable providing constant time access to values. This means accessing dictionary[key]
takes the same amount of time regardless of the number of entries in the dictionary. Constant time access is usually referred to as O(1) access.
Avoiding duplicate lookup
Rewriting an algorithm using a hashtable can drastically boost performance. However, the main cost in hashtables comes from the internal O(1) lookup for each access. Some algorithms may become less efficient when they involve duplicated lookup to the same key.
A common dictionary use-case is to add an entry if the key is not already there:
1 2 3 4 5 6 |
var dico = new Dictionary<Guid, string>(); var guid = Guid.NewGuid(); var value = guid.ToString(); if(!dico.ContainsKey(guid)) { dico.Add(guid, value); } |
Here the lookup for the key guid
is achieved twice: during dico.ContainsKey(guid)
and during dico.Add(guid, value)
. To solve this duplicate lookup the method TryAdd(key, value)
has been added to Dictionary<TKey, TValue>
in .NET Core 2.0 released in 2017. Now you can write instead:
1 |
bool added = dico.TryAdd(guid, value); |
Let’s benchmark with Benchmark.NET how faster it is to use this method. Here is the benchmark result:
1 2 3 4 |
| Method | Mean | Allocated | |-------------------------- |---------:|----------:| | TryAddKeyValuePair | 274.6 ns | 518 B | | TryAddKeyValuePair_Faster | 211.4 ns | - | |
Here is the benchmark code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; BenchmarkRunner.Run<Benchmarks>(); [MemoryDiagnoser] [HideColumns("StdDev", "Median", "Job", "RatioSD", "Error", "Gen0", "Alloc Ratio")] [Config(typeof(FastRunConfig))] public class Benchmarks { static readonly Dictionary<Guid, string> s_Dico1 = new Dictionary<Guid, string>(); const string s_Value = "value"; [Benchmark] public void TryAddKeyValuePair() { var key = Guid.NewGuid(); if(!s_Dico1.ContainsKey(key)) { s_Dico1.Add(key, s_Value); } } static readonly Dictionary<Guid, string> s_Dico2 = new Dictionary<Guid, string>(); [Benchmark] public void TryAddKeyValuePair_Faster() { var key = Guid.NewGuid(); bool added = s_Dico1.TryAdd(key, s_Value); } } |
To add the package Benchmark.NET in your project just add to the .csproj:
1 2 3 |
<ItemGroup> <PackageReference Include="BenchmarkDotNet" Version="0.14.0" /> </ItemGroup> |
Notice that the method TryAdd(key, value)
is not available in the good old .NET Framework.
Accessing dictionary internal layout with managed pointers
There remain duplicate lookup scenarios not addressed by the TryAdd(key, value)
method. This is why two methods were added to the class System.Runtime.InteropServices.CollectionsMarshal
in .NET 6 released in 2021:
1 2 3 4 5 |
public static ref TValue GetValueRefOrNullRef<TKey,TValue> (System.Collections.Generic.Dictionary<TKey,TValue> dictionary, TKey key); public static ref TValue? GetValueRefOrAddDefault<TKey,TValue> (System.Collections.Generic.Dictionary<TKey,TValue> dictionary, TKey key, out bool exists); |
CollectionMarshal
documentation states that it is an unsafe class that provides a set of methods to access the underlying data representations of collections.
The ref keyword and managed pointers
The usage of ref keywords in the methods’ definition above and the terms unsafe and underlying data representations of collections means that we are entering the realm of managed pointer. Managed pointer is a .NET runtime feature that existed since the inception of .NET 1.0 in 2001. C# 1.0 relied on it to pass parameters to a method with the ref
and out
keywords. A managed pointer must live on the stack and can point to any kind of memory: object reference, slot within an array, field within an object, or unmanaged memory. In a runtime with a garbage collector like .NET, managed pointers offer the best of both worlds:
- They are as fast as a regular C or C++ pointer
- They are tracked by the runtime in case the memory pointed gets moved by the garbage collector
This is why since the release of C# 7.0 in 2017, Microsoft expanded over the years the use of the ref
keyword to a broader set of scenarios, including ref
locals, ref
returns, ref struct
, and ref
fields. This is all explained in this article: Managed pointers, Span, ref struct, C#11 ref fields and the scoped keyword
Scenario 1: Creating the value only if the key is not present in the dictionary
Now let’s see how we can concretely improve our dictionaries’ performance. The scenario 1 is similar to the previous one with TryAdd()
, but here, the value is created only when the key is not already present in the dictionary. Here TryAdd()
cannot be used anymore:
1 2 3 4 5 6 7 |
var key = Guid.NewGuid(); if(!s_Dico1.ContainsKey(key)) { // Create the object to be used as key's value // because the key is not present in the dictionary string val = key.ToString(); dico.Add(key, val); } |
Thanks to the method CollectionMarshall.GetValueRefOrAddDefault()
it is possible to address this scenario with only a single lookup for key
:
1 2 3 4 5 6 7 8 9 10 |
var key = Guid.NewGuid(); ref string pointerToValLocation = ref CollectionsMarshal.GetValueRefOrAddDefault( dico, key, out bool exists)!; if (!exists) { // Create the object to be used as key's value // because the key is not present in the dictionary var val = key.ToString(); pointerToValLocation = val; } |
The screen recording below demonstrates that GetValueRefOrAddDefault()
functions as intended: if the key is absent from the dictionary, it adds the key with a default value (default(string)
which is null
), and then returns a reference to the value’s location within the dictionary’s layout.
You might be intrigued by the !
character at the end of:
1 2 3 |
ref string pointerToValLocation = ref CollectionsMarshal.GetValueRefOrAddDefault( dico, key, out bool exists)!; |
Dictionary<Guid, string>
containing a null
reference for a string value.Benchmarking scenario 1
Here is the benchmark result:
1 2 3 4 |
| Method | Mean | Gen1 | Allocated | |------------------------------- |---------:|-------:|----------:| | CreateValIfKeyNotInDico | 575.5 ns | 0.0076 | 96 B | | CreateValIfKeyNotInDico_Faster | 361.9 ns | 0.0073 | 96 B | |
Here is the benchmark code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
using System.Runtime.InteropServices; using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; BenchmarkRunner.Run<Benchmarks>(); [MemoryDiagnoser] [HideColumns("StdDev", "Median", "Job", "RatioSD", "Error", "Gen0", "Alloc Ratio")] public class Benchmarks { static readonly Dictionary<Guid, string> s_Dico1 = new(); [Benchmark] public void CreateValIfKeyNotInDico() { var key = Guid.NewGuid(); if(!s_Dico1.ContainsKey(key)) { // Create the object to be used as key's value // because the key is not present in the dictionary string val = key.ToString(); s_Dico1.Add(key, val); } } static readonly Dictionary<Guid, string> s_Dico2 = new(); [Benchmark] public void CreateValIfKeyNotInDico_Faster() { var key = Guid.NewGuid(); ref string pointerToValLocation = ref CollectionsMarshal.GetValueRefOrAddDefault(s_Dico2, key, out bool exists); if (!exists) { // Create the object to be used as key's value // because the key is not present in the dictionary var val = key.ToString(); pointerToValLocation = val; } } } |