C# 7.2 introduced the structure System.Span<T>
. First we’ll present a concrete example where Span<T>
helps achieve better performance. Then we’ll explain what makes Span<T>
so special.
Span<T>
primary goal is to avoid allocating new objects on the heap when one needs to work with a contiguous region of arbitrary memory. Performance gain is twofold:
- A) the allocation on heap operation is not performed
- B) less pressure on the Garbage Collector (GC) since it doesn’t need to track non-allocated objects.
Using Span<T> to parse a comma separated uint string
Let’s use Span<T>
to obtain an array of uint
from the string "163,496,691,1729"
.
- Without
Span<T>
one would use"163,496,691,1729"
.Split(',')
. This call allocates four strings and an array to reference these four strings. Thenuint.Parse(string)
is used to parse each sub-string. - With
Span<T>
(actually withReadOnlySpan<char>
because the content of a string is immutable) the input string gets sliced into four spans. BecauseReadOnlySpan<T>
is a structure, each span is concretely a few bytes added on the current thread stack. Stack allocation is super fast and the GC is not impacted by values allocated on the stack. Thenuint.Parse(ReadOnlySpan<char>)
is used to parse each slice.
The method uint.Parse(ReadOnlySpan<char>)
shows that it’s not just about Span<T>
itself: many .NET APIs related to string
and other memory representations have been extended to accept Span<T>
or ReadOnlySpan<T>
.
Here is pseudo code and diagrams that summarizes both approaches:
Benchmarking Span<T> performance gain
Below is the complete code that can be pasted in a C# 6 Program.cs
source file. The NuGet package BenchmarkDotNet needs to be referenced. Here is the github project BenchmarkDotNet. Before digging into Benchmark.NET results, let’s note that:
- A third method
GetUIntArrayWithAstuteParsing()
shows an optimized way to parse"163,496,691,1729"
without the need ofSpan<T>
. - In the real-world the number of
uint
in the comma separated string input wouldn’t be known upfront. AList<uint>
would be used to storeuint
values parsed until we get them all. But here we want to highlight that no allocation is made bySpan<T>
. Thus to avoid cluttering the resultuint[] arrayToFill
is pre-allocated with the proper length.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 |
using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Order; using BenchmarkDotNet.Running; BenchmarkRunner.Run<UIntParserBenchmarks>(); [RankColumn] [Orderer(SummaryOrderPolicy.FastestToSlowest)] [MemoryDiagnoser] public class UIntParserBenchmarks { // We want to avoid allocating arrays to fill during benchmarks // thus s_NbUInt pre-determines their length const int s_NbUInt = 4; const string s_CommaSeparatedUInt = "163,496,691,1729"; uint[] m_ArrayToFill1 = new uint[s_NbUInt]; [Benchmark(Baseline = true)] public void GetUIntArrayWithSplit() { GetUIntArrayWithStringSplit(s_CommaSeparatedUInt, m_ArrayToFill1); } uint[] m_ArrayToFill2 = new uint[s_NbUInt]; [Benchmark] public void GetUIntArrayWithSpan() { GetUIntArrayWithSpan(s_CommaSeparatedUInt, m_ArrayToFill2); } uint[] m_ArrayToFill3 = new uint[s_NbUInt]; [Benchmark] public void GetUIntArrayWithAstuteParsing() { GetUIntArrayWithAstuteParsing(s_CommaSeparatedUInt, m_ArrayToFill3); } static uint[] GetUIntArrayWithStringSplit(string commaSeparatedUInt, uint[] arrayToFill){ // Split() allocates an array and 4x strings string[] arrayOfString = commaSeparatedUInt.Split(','); var length = arrayOfString.Length; for (int i = 0; i < length; i++) { arrayToFill[i] = uint.Parse(arrayOfString[i]); } return arrayToFill; } static void GetUIntArrayWithSpan(string commaSeparatedUInt, uint[] arrayToFill) { // View the string as a span, so we can slice it in loop ReadOnlySpan<char> span = commaSeparatedUInt.AsSpan(); int nextCommaIndex = 0; int insertValAtIndex = 0; bool isLastLoop = false; while (!isLastLoop) { int indexStart = nextCommaIndex; nextCommaIndex = commaSeparatedUInt.IndexOf(',', indexStart); isLastLoop = (nextCommaIndex == -1); if (isLastLoop) { nextCommaIndex = commaSeparatedUInt.Length; // Parse last uint } // Get a slice of the string that contains the next uint... ReadOnlySpan<char> slice = span.Slice(indexStart, nextCommaIndex - indexStart); // ... and parse it uint valParsed = uint.Parse(slice); // Then insert valParsed in arrayToFill arrayToFill[insertValAtIndex] = valParsed; insertValAtIndex++; // Skip the comma for next iteration nextCommaIndex++; } } static void GetUIntArrayWithAstuteParsing(string commaSeparatedUInt, uint[] arrayToFill){ var length = commaSeparatedUInt.Length; int insertValAtIndex = 0; int valParsed = 0; // Don't use a uint to avoid casting in astute parsing formula for (int i = 0; i < length; i++) { char @char = commaSeparatedUInt[i]; if (@char != ',') { // Astute Parsing: Modify valParsed from the actual @char valParsed = valParsed * 10 + (@char - '0'); continue; } // A comma is an opportunity to insert valParsed in arrayToFill arrayToFill[insertValAtIndex] = (uint)valParsed; insertValAtIndex++; valParsed = 0; } // Insert last valParsed arrayToFill[insertValAtIndex] = (uint)valParsed; } } |
For each cases, Benchmark.NET measures both memory allocation and durations. Here is how it presents the results:
1 2 3 4 5 |
Method | Mean | Error | StdDev | Rank | Gen 0 | Allocated | ----------------------------- |----------:|---------:|---------:|-----:|-------:|----------:| GetUIntArrayWithAstuteParsing | 18.46 ns | 0.162 ns | 0.151 ns | 1 | - | - | GetUIntArrayWithSpan | 79.99 ns | 1.247 ns | 1.166 ns | 2 | - | - | GetUIntArrayWithSplit | 129.36 ns | 1.464 ns | 1.369 ns | 3 | 0.0293 | 184 B | |
GetUIntArrayWithAstuteParsing()
is the fastest way and doesn’t allocate anything. The performance gain comes from the fact that we wrote our own dedicateduint
parsing implementation. This shows well that despite new goodies in the framework, best performance is often achieve with well though-out algorithm.GetUIntArrayWithSpan()
is 38% faster thanGetUIntArrayWithSplit()
. This is great but the essential saving is that nothing gets allocated. In a real world scenario where this method would be used to parse millions ofuint
values, a lot of GC pressure would be saved.
Span<T> implementation is base on managed pointer!!
Most of Span<T>
articles I read stop here. We now have a new cool way to avoid allocating sub-strings. But the key is that the super-performant implementation of Span<T>
required heavy changes in the runtime. Let’s explain what happened.
The Span<T>
source code (in .NET 7.0 and upper) shows that it contains two fields.
1 2 3 4 5 6 7 8 |
public readonly ref struct Span<T> { //A managed pointer (ref field is a new C#11 feature) internal readonly ref T _reference; //The number of elements this Span contains. private readonly int _length; ... } |
The _length
value is internally multiplied by sizeof(T)
to obtain the end address of the slice. Thus the slice in memory is the range [_reference, _reference + _length*sizeof(T)]
.
_reference
is a managed pointer field (or ref field). The ref field feature is a new feature added in C# 11. Before that the implementation of Span<T>
(in .NET 6.0, 5.0…) used an internal trick to reference a managed pointer through an internal ref struct named ByReference<T>
.
Span<T>
is declared as a ref struct
. A structure marked with ref
, is a special structure that can only live on the thread stack. This way it can hold a managed pointer as a field (ref field explained above). The feature ref struct was released with C# 7.2 just to make the implementation of Span<T>
upon a managed pointer possible. If the .NET team did all this efforts this is because a Span<T>
implementation based on managed pointer has significant advantages:
- Safe: Managed pointers are pointers but they belong to the safe world. There is no need to be within an
unsafe
scope to work withSpan<T>
. - Performance wise:
Span<T>
performance overhead is close to zero because even if they are managed, managed pointers are pointers and thus they have almost no overhead. Managed pointers are managed in the sense: A) the C# compiler refuses code that could lead to a managed pointer pointing to an invalid memory and B) if a managed pointer points to an object on the heap, the runtime takes care of updating such pointer if the GC relocates the pointed object. - Flexibility: Managed pointer can reference any kind of memory (object on heap, unmanaged buffer, value on the stack, field with an object, a slot within an array, a position within a string…). The
Span<T>
implementation benefits from this flexibility that makes its API and implementation concise. Because the memory pointed is typed asref T
, there is no need to bother if it’s a string, a slot of an array or a location in stack. - Thread safe: A fortunate consequence of being stack-only is that a
Span<T>
instance belongs to a single thread. This makesSpan<T>
de-facto thread-safe.
Managed pointer, ref struct
,ref field, extended usage of the keyword ref
, is an interesting topic and I dedicated an entire article to it: Managed pointers, Span<T>, ref struct, C#11 ref fields and the scoped keyword
No restriction with Memory<T>
The struct Memory<T>
is similar to Span<T>
but without the ref struct
restrictions. It can be used as a field of a class for example. As a consequence Memory<T>
doesn’t have this special relation with the GC and is a bit less performant. This performance loss is because its implementation has 3x fields instead of 2x: instead of having a special ByReference<T>
pointer, Memory<T>
needs to reference the _object
and then the _index
in the object.
1 2 3 4 5 6 7 8 |
public readonly struct Memory : IEquatable<Memory> { // NOTE: With the current implementation, Memory and ReadOnlyMemory must have the same layout, // as code uses Unsafe.As to cast between them. private readonly object? _object; private readonly int _index; private readonly int _length; ... } |
I wanted to benchmark the comma separated string code above with Memory<T>
but realized that there is no uint.Parse(Memory<T>)
API which suggests Memory<T>
didn’t get as much love as Span<T>
.
Span<T> and the .NET Framework
Because Span<T>
and ByReference<T>
imply significant updates on the runtime GC, they were not ported to the .NET Framework. They are only available on the .NET Core runtime (.NET 5, .NET 6…) since version 2.1. Here are Microsoft engineers discussions about it: “Fast Span is too fundamental change to be quirklable in reasonable way.”.
However an implementation of Span<T>
exists for .NET Framework, it is referred as slow span. To use it the Nuget package System.Memory must be referenced. This implementation is similar to the Memory<T>
implementation with 3x fields:
1 2 3 4 5 6 |
public readonly ref partial struct Span<T> { private readonly Pinnable<T> _pinnable; private readonly IntPtr _byteOffset; private readonly int _length; ... } |
Also when referencing the System.Memory package from a .NET Framework project you won’t get APIs like uint.Parse(Span<T>)
which makes it less attractive.
Span<T> API
As we saw Span<T>
is especially suitable to improve performance when working with string because it provides a way to manipulate sub-string without any allocation. However Span<T>
is a generic type and can be used on any kind of element like byte
. The complete Span<T> API including extension methods is huge because of all the methods overloaded. Here is a simplified API below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
ref struct Span<T> { Span(T[]? array); Span(T[]? array, int startIndex); Span(T[]? array, int startIndex, int length); unsafe Span(void* memory, int length); int Length { get; } ref T this[int index] { get; set; } Span<T> Slice(int start); Span<T> Slice(int start, int length); public T[] ToArray(); void Clear(); void Fill(T value); void CopyTo(Span<T> destination); bool TryCopyTo(Span<T> destination); } |
Notice above the unsafe constructor that takes a void*
pointer. Span can work on any kind of memory including unmanaged memory. It thus represents a simpler way to work with pointers and unmanaged memory.
1 2 3 4 5 |
Span<byte> stackMemory = stackalloc byte[1024]; IntPtr unmanagedHandle = Marshal.AllocHGlobal(1024); Span<byte> unmanaged = new Span<byte>(unmanagedHandle.ToPointer(), 1024); Marshal.FreeHGlobal(unmanagedHandle); |
However what makes Span<T>/ReadOnlySpan<T>
shines is that more than 5.000 methods in the .NET API use it. This is shown by the screenshot below taken from NDepend analyzing the .NET 6 framework in the directory C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.2:
Span<T> vs. Array
At this point one might wonder how Span<T>
differs from array and especially ArraySegment<T>
.
Span<T>
has special relation with GC that makes it more performant thanArraySegment<T>
in stack-only scenarios.ArraySegment<T>
is limited to managed memory while we sawSpan<T>
can handle also unmanaged memory.ArraySegment<T>
doesn’t provide a read-only view whileReadOnlySpan<T>
does.- The confusion between
Span<T>
and array comes from the fact thatSpan<T>
is a view on some data and most of the time this data is represented through an array. So array is still needed,Span<T>
is a just a convenient view on it.
Conclusion
Span<T>/ReadOnlySpan<T>
are special API that required heavy runtime modifications to offer improvements in very high performance critical scenarios. Not everyone needs its power but for those that needs it, it is a game changer.
Excellent article !