C# 7.2 introduced the structure System.Span<T>
. First we’ll present a concrete example where Span<T>
helps achieve better performance. Then we’ll explain what makes Span<T>
so special.
Span<T>
primary goal is to avoid allocating new objects on the heap when one needs to work with a contiguous region of arbitrary memory. Performance gain is twofold:
- A) the allocation on heap operation is not performed
- B) less pressure on the Garbage Collector (GC) since it doesn’t need to track non-allocated objects.
Using Span<T> to parse a comma separated uint string
Let’s use Span<T>
to obtain an array of uint
from the string "163,496,691,1729"
.
- Without
Span<T>
one would use"163,496,691,1729"
.Split(',')
. This call allocates four strings and an array to reference these four strings. Thenuint.Parse(string)
is used to parse each sub-string. - With
Span<T>
(actually withReadOnlySpan<char>
because the content of a string is immutable) the input string gets sliced into four spans. BecauseReadOnlySpan<T>
is a structure, each span is concretely a few bytes added on the current thread stack. Stack allocation is super fast and the GC is not impacted by values allocated on the stack. Thenuint.Parse(ReadOnlySpan<char>)
is used to parse each slice.
The method uint.Parse(ReadOnlySpan<char>)
shows that it’s not just about Span<T>
itself: many .NET APIs related to string
and other memory representations have been extended to accept Span<T>
or ReadOnlySpan<T>
.
Here is pseudo code and diagrams that summarizes both approaches:
Benchmarking Span<T> performance gain
Below is the complete code that can be pasted in a C# 6 Program.cs
source file. The NuGet package BenchmarkDotNet needs to be referenced. Here is the github project BenchmarkDotNet. Before digging into Benchmark.NET results, let’s note that:
- A third method
GetUIntArrayWithAstuteParsing()
shows an optimized way to parse"163,496,691,1729"
without the need ofSpan<T>
. - In the real-world the number of
uint
in the comma separated string input wouldn’t be known upfront. AList<uint>
would be used to storeuint
values parsed until we get them all. But here we want to highlight that no allocation is made bySpan<T>
. Thus to avoid cluttering the resultuint[] arrayToFill
is pre-allocated with the proper length.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 |
using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Order; using BenchmarkDotNet.Running; BenchmarkRunner.Run<UIntParserBenchmarks>(); [RankColumn] [Orderer(SummaryOrderPolicy.FastestToSlowest)] [MemoryDiagnoser] public class UIntParserBenchmarks { // We want to avoid allocating arrays to fill during benchmarks // thus s_NbUInt pre-determines their length const int s_NbUInt = 4; const string s_CommaSeparatedUInt = "163,496,691,1729"; uint[] m_ArrayToFill1 = new uint[s_NbUInt]; [Benchmark(Baseline = true)] public void GetUIntArrayWithSplit() { GetUIntArrayWithStringSplit(s_CommaSeparatedUInt, m_ArrayToFill1); } uint[] m_ArrayToFill2 = new uint[s_NbUInt]; [Benchmark] public void GetUIntArrayWithSpan() { GetUIntArrayWithSpan(s_CommaSeparatedUInt, m_ArrayToFill2); } uint[] m_ArrayToFill3 = new uint[s_NbUInt]; [Benchmark] public void GetUIntArrayWithAstuteParsing() { GetUIntArrayWithAstuteParsing(s_CommaSeparatedUInt, m_ArrayToFill3); } static uint[] GetUIntArrayWithStringSplit(string commaSeparatedUInt, uint[] arrayToFill){ // Split() allocates an array and 4x strings string[] arrayOfString = commaSeparatedUInt.Split(','); var length = arrayOfString.Length; for (int i = 0; i < length; i++) { arrayToFill[i] = uint.Parse(arrayOfString[i]); } return arrayToFill; } static void GetUIntArrayWithSpan(string commaSeparatedUInt, uint[] arrayToFill) { // View the string as a span, so we can slice it in loop ReadOnlySpan<char> span = commaSeparatedUInt.AsSpan(); int nextCommaIndex = 0; int insertValAtIndex = 0; bool isLastLoop = false; while (!isLastLoop) { int indexStart = nextCommaIndex; nextCommaIndex = commaSeparatedUInt.IndexOf(',', indexStart); isLastLoop = (nextCommaIndex == -1); if (isLastLoop) { nextCommaIndex = commaSeparatedUInt.Length; // Parse last uint } // Get a slice of the string that contains the next uint... ReadOnlySpan<char> slice = span.Slice(indexStart, nextCommaIndex - indexStart); // ... and parse it uint valParsed = uint.Parse(slice); // Then insert valParsed in arrayToFill arrayToFill[insertValAtIndex] = valParsed; insertValAtIndex++; // Skip the comma for next iteration nextCommaIndex++; } } static void GetUIntArrayWithAstuteParsing(string commaSeparatedUInt, uint[] arrayToFill){ var length = commaSeparatedUInt.Length; int insertValAtIndex = 0; int valParsed = 0; // Don't use a uint to avoid casting in astute parsing formula for (int i = 0; i < length; i++) { char @char = commaSeparatedUInt[i]; if (@char != ',') { // Astute Parsing: Modify valParsed from the actual @char valParsed = valParsed * 10 + (@char - '0'); continue; } // A comma is an opportunity to insert valParsed in arrayToFill arrayToFill[insertValAtIndex] = (uint)valParsed; insertValAtIndex++; valParsed = 0; } // Insert last valParsed arrayToFill[insertValAtIndex] = (uint)valParsed; } } |
For each cases, Benchmark.NET measures both memory allocation and durations. Here is how it presents the results:
1 2 3 4 5 |
Method | Mean | Error | StdDev | Rank | Gen 0 | Allocated | ----------------------------- |----------:|---------:|---------:|-----:|-------:|----------:| GetUIntArrayWithAstuteParsing | 18.46 ns | 0.162 ns | 0.151 ns | 1 | - | - | GetUIntArrayWithSpan | 79.99 ns | 1.247 ns | 1.166 ns | 2 | - | - | GetUIntArrayWithSplit | 129.36 ns | 1.464 ns | 1.369 ns | 3 | 0.0293 | 184 B | |
GetUIntArrayWithAstuteParsing()
is the fastest way and doesn’t allocate anything. The performance gain comes from the fact that we wrote our own dedicateduint
parsing implementation. This shows well that despite new goodies in the framework, best performance is often achieve with well though-out algorithm.GetUIntArrayWithSpan()
is 38% faster thanGetUIntArrayWithSplit()
. This is great but the essential saving is that nothing gets allocated. In a real world scenario where this method would be used to parse millions ofuint
values, a lot of GC pressure would be saved.
Understand what makes Span<T> special
Most of Span<T>
articles I read stop here. We now have a new cool way to avoid allocating sub-strings.
But the key is that the super-performant implementation of Span<T>
required heavy changes in the runtime. Let’s explain what happened.
Span<T> has a special relation with the GC
The Span<T>
source code shows that it contains two fields.
1 2 3 4 5 6 7 8 |
public readonly ref struct Span<T> { //A byref or a native ptr. internal readonly ByReference<T> _pointer; //The number of elements this Span contains. private readonly int _length; ... } |
The _length
value is internally multiplied by sizeof(T)
to obtain the end address of the slice. Thus the slice in memory is the range [_pointer, _pointer + _length*sizeof(T)]
.
_pointer
is typed with the special structure ByReference<T>
. This structure is special because it represents a pointer carefully handled by the GC. In our scenario _pointer
points to an offset position within a string object. When compacting memory the GC might move the string at a different address and the GC would then translate _pointer
properly. ByReference<T>
is not a public API. Because ByReference<T>
and Span<T>
have special relations with the GC they cannot live on the heap managed by the GC. They are stack-only values, they can only live on the thread stack. A Span<T>
can be passed as method parameter and can be returned by a method but it cannot be a field of an object for example.
A fortunate consequence of being stack-only is that a Span<T>
instance belongs to a single thread. This makes Span<T>
de-facto thread-safe.
ref struct restrictions
As we saw a whole range of restrictions applies to Span<T>
to guarantee it’ll never go to the heap. This is why it is declared as a ref struct
and not as a struct
. ref struct
is here to tell the compiler to applies restrictions.
Here are more restrictions that prevent that a ref struct
value ends up on the heap at runtime:
- A ref struct can’t be the element type of an array.
- A ref struct can’t be a declared type of a field of a class or a struct. However a ref-struct can type a field of a ref-struct. This is illustrated by the field
ByReference<T> _pointer
declared withinSpan<T>
. - A ref struct can’t implement interfaces.
- A ref struct can’t be boxed to
System.ValueType
orSystem.Object
. - A ref struct can’t be a type argument.
- A ref struct variable can’t be captured by a lambda expression or a local function.
- A ref struct variable can’t be used in an async method. However, you can use ref struct variables in synchronous methods, for example, in those that return
Task
orTask<TResult>
. - A ref struct variable can’t be used in iterators.
Interestingly enough, I learned about ref struct
peculiarities when some NDepend users got false positive on the rule Don’t use obsolete types, methods or fields. The compiler tags ref struct
with ObsoleteAttribute
to prevent them being used by older versions of C# that don’t know about the stack-only restrictions of ref struct
. Hence the false positive when using ref struct
that is fixed for the next version
No restriction with Memory<T>
The struct Memory<T>
is similar to Span<T>
but without the ref struct
restrictions. It can be used as a field of a class for example. As a consequence Memory<T>
doesn’t have this special relation with the GC and is a bit less performant. This performance loss is because its implementation has 3x fields instead of 2x: instead of having a special ByReference<T>
pointer, Memory<T>
needs to reference the _object
and then the _index
in the object.
1 2 3 4 5 6 7 8 |
public readonly struct Memory : IEquatable<Memory> { // NOTE: With the current implementation, Memory and ReadOnlyMemory must have the same layout, // as code uses Unsafe.As to cast between them. private readonly object? _object; private readonly int _index; private readonly int _length; ... } |
I wanted to benchmark the comma separated string code above with Memory<T>
but realized that there is no uint.Parse(Memory<T>)
API which suggests Memory<T>
didn’t get as much love as Span<T>
.
Span<T> and the .NET Framework
Because Span<T>
and ByReference<T>
imply significant updates on the runtime GC, they were not ported to the .NET Framework. They are only available on the .NET Core runtime (.NET 5, .NET 6…) since version 2.1. Here are Microsoft engineers discussions about it: “Fast Span is too fundamental change to be quirklable in reasonable way.”.
However an implementation of Span<T>
exists for .NET Framework, it is referred as slow span. To use it the Nuget package System.Memory must be referenced. This implementation is similar to the Memory<T>
implementation with 3x fields:
1 2 3 4 5 6 |
public readonly ref partial struct Span<T> { private readonly Pinnable<T> _pinnable; private readonly IntPtr _byteOffset; private readonly int _length; ... } |
Also when referencing the System.Memory package from a .NET Framework project you won’t get APIs like uint.Parse(Span<T>)
which makes it less attractive.
Span<T> API
As we saw Span<T>
is especially suitable to improve performance when working with string because it provides a way to manipulate sub-string without any allocation. However Span<T>
is a generic type and can be used on any kind of element like byte
. The complete Span<T> API including extension methods is huge because of all the methods overloaded. Here is a simplified API below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
ref struct Span<T> { Span(T[]? array); Span(T[]? array, int startIndex); Span(T[]? array, int startIndex, int length); unsafe Span(void* memory, int length); int Length { get; } ref T this[int index] { get; set; } Span<T> Slice(int start); Span<T> Slice(int start, int length); public T[] ToArray(); void Clear(); void Fill(T value); void CopyTo(Span<T> destination); bool TryCopyTo(Span<T> destination); } |
Notice above the unsafe constructor that takes a void*
pointer. Span can work on any kind of memory including unmanaged memory. It thus represents a simpler way to work with pointers and unmanaged memory.
1 2 3 4 5 |
Span<byte> stackMemory = stackalloc byte[1024]; IntPtr unmanagedHandle = Marshal.AllocHGlobal(1024); Span<byte> unmanaged = new Span<byte>(unmanagedHandle.ToPointer(), 1024); Marshal.FreeHGlobal(unmanagedHandle); |
However what makes Span<T>/ReadOnlySpan<T>
shines is that more than 5.000 methods in the .NET API use it. This is shown by the screenshot below taken from NDepend analyzing the .NET 6 framework in the directory C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.2:
Span<T> vs. Array
At this point one might wonder how Span<T>
differs from array and especially ArraySegment<T>
.
Span<T>
has special relation with GC that makes it more performant thanArraySegment<T>
in stack-only scenarios.ArraySegment<T>
is limited to managed memory while we sawSpan<T>
can handle also unmanaged memory.ArraySegment<T>
doesn’t provide a read-only view whileReadOnlySpan<T>
does.- The confusion between
Span<T>
and array comes from the fact thatSpan<T>
is a view on some data and most of the time this data is represented through an array. So array is still needed,Span<T>
is a just a convenient view on it.
Conclusion
Span<T>/ReadOnlySpan<T>
are special API that required heavy runtime modifications to offer improvements in very high performance critical scenarios. Not everyone needs its power but for those that needs it, it is a game changer.
Excellent article !