Welcome to our exploration of System.Span<T> and System.ReadOnlySpan<T>, two powerful structures introduced with C# 7.2 back in 2017. As a type-safe way to access contiguous memory regions, Span<T> can represent sequences of elements stored on the heap, on the stack, or even in unmanaged memory. This flexibility makes Span<T> a robust tool in a developer’s arsenal.
Span<T> in C# is not your everyday structure. It is declared as a ref struct, which means its instances can only live on the stack. We will explain how this design choice brings more performance but also more restrictions. For instance, a Span<T> cannot be used as a field in a class or stored across an await in an asynchronous method.
In this blog post, we will delve into practical examples and benchmarks to demonstrate how Span<T> can enhance performance. Additionally, we’ll discuss why Span<T> often outperforms typical C# code, offering insights into its efficient utilization. Join us as we uncover the potential of Span<T> for streamlining your code and boosting its execution speed.
C# Programming with Span<T>
In this section, we’ll dive into the practical use of Span<T> in C# programming by exploring a few code samples. This will help us understand how Span<T> can be employed to enhance code performance through more efficient data manipulation and memory management.
Basic Usage of Span<T>
First, let’s look at a simple example that demonstrates how to initialize and use Span<T> for basic operations. In this example, a Span<int> is created from an array of integers. We then modify the first element of the Span, which also modifies the original array, demonstrating the by-reference nature of Span<T>.

```csharp
int[] numbers = new int[] { 1, 2, 3, 4, 5 };
Span<int> numbersSpan = new Span<int>(numbers);

// Modifying through the Span will modify the original array
numbersSpan[0] = 99;

foreach (var number in numbers) {
    Console.WriteLine(number); // Output: 99, 2, 3, 4, 5
}
```
Slicing with Span<T>
Span<T> excels in creating slices of data without allocating new memory. Here’s how you can create slices. This example showcases how to slice a Span<byte> to focus on a specific segment of the array without copying the data, demonstrating the efficiency of Span<T>.

```csharp
byte[] data = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Span<byte> dataSpan = new Span<byte>(data);

// Create a slice of the original Span
Span<byte> slice = dataSpan.Slice(3, 5);

// Display the contents of the slice
foreach (var val in slice) {
    Console.WriteLine(val); // Output: 3, 4, 5, 6, 7
}
```
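As a complement, here is a minimal sketch of two other ways to obtain the same slice, assuming C# 8 or later for the range syntax. It also shows that writing through a slice writes to the underlying array, since no copy is involved. The variable names are illustrative only.

```csharp
using System;

byte[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Span<byte> dataSpan = data;   // implicit conversion from byte[] to Span<byte>

// Range syntax: equivalent to dataSpan.Slice(3, 5)
Span<byte> slice = dataSpan[3..8];

// AsSpan(start, length) slices the array directly
Span<byte> sameSlice = data.AsSpan(3, 5);

// Both slices are views on the same memory: writing through one
// is visible through the other and in the original array
slice[0] = 42;

Console.WriteLine(data[3]);        // 42
Console.WriteLine(sameSlice[0]);   // 42
Console.WriteLine(slice.Length);   // 5
```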
String and ReadOnlySpan<char>
A ReadOnlySpan<char> is highly effective for executing read-only, memory-efficient operations on strings in C#. Here’s a straightforward example showing how to work with a substring within a string using ReadOnlySpan<char>, without incurring extra memory allocation. By contrast, using String.Substring(1, 3) would actually allocate a new string object containing "234":

```csharp
string greeting = "123456789";
ReadOnlySpan<char> span = greeting.AsSpan();

// Access a slice of the string,
// a bit like Substring() but with no new string allocation
ReadOnlySpan<char> subStringSpan = span.Slice(1, 3);

// Parse the slice as a uint without having allocated any new string
uint i = uint.Parse(subStringSpan);

// Output the parsed value
Console.WriteLine(i); // Output: 234
```
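The same idea extends to other common string operations. The sketch below uses a made-up input string and relies on span-based methods from the BCL (Trim, IndexOf, StartsWith, uint.TryParse) to extract a value. A string is allocated only at the very end, when one is explicitly requested with ToString().

```csharp
using System;

string input = "  id=1729;name=taxi  ";   // hypothetical input

// Trim returns a narrower view on the same characters, not a new string
ReadOnlySpan<char> span = input.AsSpan().Trim();

// Locate the value between '=' and ';'
int eq = span.IndexOf('=');
int semi = span.IndexOf(';');
ReadOnlySpan<char> idSpan = span.Slice(eq + 1, semi - eq - 1);

bool startsWithId = span.StartsWith("id=", StringComparison.Ordinal);
bool parsed = uint.TryParse(idSpan, out uint id);

Console.WriteLine(startsWithId);   // True
Console.WriteLine(parsed);         // True
Console.WriteLine(id);             // 1729

// ToString() is the only call here that allocates a new string
string name = span.Slice(semi + 1 + "name=".Length).ToString();
Console.WriteLine(name);           // taxi
```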
Span<T> APIs
As we’ve seen, Span<T> is particularly effective in improving performance when dealing with strings. This is because it enables the manipulation of substrings without any need for memory allocation. However, Span<T> is a generic type. It can be used with various data types, such as byte. The complete Span<T> API, including extension methods, is huge because of all the overloaded methods. Here is a simplified version of it:

```csharp
ref struct Span<T> {
    Span(T[]? array);
    Span(T[]? array, int start, int length);
    unsafe Span(void* pointer, int length);

    int Length { get; }
    // The indexer returns a reference, so span[i] = value needs no setter
    ref T this[int index] { get; }

    Span<T> Slice(int start);
    Span<T> Slice(int start, int length);

    T[] ToArray();
    void Clear();
    void Fill(T value);
    void CopyTo(Span<T> destination);
    bool TryCopyTo(Span<T> destination);
}
```
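To make this API more concrete, here is a small sketch that exercises CopyTo, TryCopyTo, Fill, Clear and ToArray on a span of int. The buffer sizes and values are arbitrary.

```csharp
using System;

int[] source = { 1, 2, 3, 4, 5 };
Span<int> span = source;

Span<int> buffer = stackalloc int[5];
span.CopyTo(buffer);                      // would throw if buffer were too short

Span<int> tooSmall = stackalloc int[2];
bool copied = span.TryCopyTo(tooSmall);   // returns false instead of throwing

buffer.Slice(0, 2).Fill(7);   // buffer is now 7, 7, 3, 4, 5
buffer.Slice(3).Clear();      // buffer is now 7, 7, 3, 0, 0

// ToArray() is the only call here that allocates on the heap
int[] snapshot = buffer.ToArray();

Console.WriteLine(copied);                        // False
Console.WriteLine(string.Join(", ", snapshot));   // 7, 7, 3, 0, 0
Console.WriteLine(string.Join(", ", source));     // 1, 2, 3, 4, 5
```

Note that the source array is left untouched: CopyTo duplicates the elements, unlike Slice which only creates a view.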
Notice above the unsafe constructor that takes a void* pointer. Span<T> can work on any kind of memory, including unmanaged memory. It thus represents a simpler way to work with pointers and unmanaged memory, like in this code sample:

```csharp
Span<byte> stackMemory = stackalloc byte[1024];

IntPtr unmanagedHandle = Marshal.AllocHGlobal(1024);
unsafe {
    // ToPointer() returns a void*, hence the unsafe block
    Span<byte> unmanaged = new Span<byte>(unmanagedHandle.ToPointer(), 1024);
}
// The span must not be used once the buffer is freed
Marshal.FreeHGlobal(unmanagedHandle);
```
Above we were able to call uint i = uint.Parse(subStringSpan); because a new overload uint.Parse(ReadOnlySpan<char>) exists in the .NET Base Class Library (BCL). What truly sets Span<T> and ReadOnlySpan<T> apart is their widespread integration into the BCL. This fact is illustrated in the screenshot below. It shows NDepend analyzing the .NET 8 framework in the directory C:\Program Files\dotnet\shared\Microsoft.NETCore.App\8.0.0:
Span<T> vs. Array
At this point, one might wonder how Span<T> differs from standard arrays and especially from the structure ArraySegment<T>.
- Span<T> has a special relation with the GC that makes it more performant than ArraySegment<T> in stack-only scenarios.
- ArraySegment<T> is limited to managed memory, while we saw that Span<T> can also handle unmanaged memory.
- ArraySegment<T> doesn’t provide a read-only view while ReadOnlySpan<T> does.
- The confusion between Span<T> and array arises from the fact that Span<T> is a view on some data. Most of the time this data is represented through an array, so the array is still needed. In this context, Span<T> is just a convenient view on arrays.
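The comparison above can be sketched with a few lines of code. In this illustrative example, an ArraySegment<int> and a Span<int> both view elements 1 to 3 of the same array, and a ReadOnlySpan<int> provides the read-only view that ArraySegment<T> lacks.

```csharp
using System;

int[] numbers = { 10, 20, 30, 40, 50 };

// Both views cover elements 1..3 of the same array, neither copies it
ArraySegment<int> segment = new ArraySegment<int>(numbers, 1, 3);
Span<int> span = numbers.AsSpan(1, 3);

segment[0] = 21;   // writes numbers[1]
span[1] = 31;      // writes numbers[2]

// An ArraySegment<T> can be viewed as a span
Span<int> fromSegment = segment.AsSpan();

// Read-only view: assigning through the indexer does not compile
ReadOnlySpan<int> readOnly = span;
// readOnly[0] = 0;   // compile-time error

Console.WriteLine(string.Join(", ", numbers));   // 10, 21, 31, 40, 50
Console.WriteLine(fromSegment[1]);               // 31
Console.WriteLine(readOnly[0]);                  // 21
```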
Improving some C# code performance with Span<T>
Now let’s put Span<T> to work and see how it can significantly boost performance in a practical, real-world scenario.
In this section, we will use Span<T> to obtain an array of uint from the string "163,496,691,1729".
- Without Span<T> one would use "163,496,691,1729".Split(','). This call allocates four strings and an array to reference these four strings. Then uint.Parse(string) is used to parse each sub-string.
- Actually, we will use ReadOnlySpan<char> because the content of a string is immutable.
- With ReadOnlySpan<T> the input string gets sliced into four spans. Because ReadOnlySpan<T> is a ref struct, each of its instances occupies only a few bytes located on the current thread stack. Stack allocation is super fast, and values allocated on the stack do not impact the GC. Then uint.Parse(ReadOnlySpan<char>) is used to parse each slice.
Here is some pseudo-code and a few diagrams that summarize both approaches:
Benchmarking Span<T> performance gain
Below is the complete code that can be pasted into a C# Program.cs source file. To run this benchmark you need to reference the NuGet package BenchmarkDotNet. The BenchmarkDotNet project is hosted on GitHub. Before digging into the BenchmarkDotNet results, let’s note that:
- A third approach with the method GetUIntArrayWithAstuteParsing() presents an optimized way of parsing "163,496,691,1729" that does not require Span<T>.
- In the real world, the number of uint values in the comma-separated input string may not be known in advance. Typically, a List<uint> would be used to store the uint values as they are parsed. But here we want to demonstrate that no allocation is made by Span<T>. Thus, to avoid cluttering the performance results, uint[] arrayToFill is pre-allocated with the proper length.
```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Order;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<UIntParserBenchmarks>();

[RankColumn]
[Orderer(SummaryOrderPolicy.FastestToSlowest)]
[MemoryDiagnoser]
public class UIntParserBenchmarks {

   // We want to avoid allocating arrays to fill during benchmarks
   // thus s_NbUInt pre-determines their length
   const int s_NbUInt = 4;
   const string s_CommaSeparatedUInt = "163,496,691,1729";

   uint[] m_ArrayToFill1 = new uint[s_NbUInt];
   [Benchmark(Baseline = true)]
   public void GetUIntArrayWithSplit() {
      GetUIntArrayWithStringSplit(s_CommaSeparatedUInt, m_ArrayToFill1);
   }

   uint[] m_ArrayToFill2 = new uint[s_NbUInt];
   [Benchmark]
   public void GetUIntArrayWithSpan() {
      GetUIntArrayWithSpan(s_CommaSeparatedUInt, m_ArrayToFill2);
   }

   uint[] m_ArrayToFill3 = new uint[s_NbUInt];
   [Benchmark]
   public void GetUIntArrayWithAstuteParsing() {
      GetUIntArrayWithAstuteParsing(s_CommaSeparatedUInt, m_ArrayToFill3);
   }

   static uint[] GetUIntArrayWithStringSplit(string commaSeparatedUInt, uint[] arrayToFill) {
      // Split() allocates an array and 4x strings
      string[] arrayOfString = commaSeparatedUInt.Split(',');
      var length = arrayOfString.Length;
      for (int i = 0; i < length; i++) {
         arrayToFill[i] = uint.Parse(arrayOfString[i]);
      }
      return arrayToFill;
   }

   static void GetUIntArrayWithSpan(string commaSeparatedUInt, uint[] arrayToFill) {
      // View the string as a span, so we can slice it in loop
      ReadOnlySpan<char> span = commaSeparatedUInt.AsSpan();
      int nextCommaIndex = 0;
      int insertValAtIndex = 0;
      bool isLastLoop = false;
      while (!isLastLoop) {
         int indexStart = nextCommaIndex;
         nextCommaIndex = commaSeparatedUInt.IndexOf(',', indexStart);
         isLastLoop = (nextCommaIndex == -1);
         if (isLastLoop) {
            nextCommaIndex = commaSeparatedUInt.Length; // Parse last uint
         }

         // Get a slice of the string that contains the next uint...
         ReadOnlySpan<char> slice = span.Slice(indexStart, nextCommaIndex - indexStart);

         // ... and parse it
         uint valParsed = uint.Parse(slice);

         // Then insert valParsed in arrayToFill
         arrayToFill[insertValAtIndex] = valParsed;
         insertValAtIndex++;

         // Skip the comma for next iteration
         nextCommaIndex++;
      }
   }

   static void GetUIntArrayWithAstuteParsing(string commaSeparatedUInt, uint[] arrayToFill) {
      var length = commaSeparatedUInt.Length;
      int insertValAtIndex = 0;
      int valParsed = 0; // Don't use a uint to avoid casting in astute parsing formula
      for (int i = 0; i < length; i++) {
         char @char = commaSeparatedUInt[i];
         if (@char != ',') {
            // Astute Parsing: Modify valParsed from the actual @char
            valParsed = valParsed * 10 + (@char - '0');
            continue;
         }
         // A comma is an opportunity to insert valParsed in arrayToFill
         arrayToFill[insertValAtIndex] = (uint)valParsed;
         insertValAtIndex++;
         valParsed = 0;
      }
      // Insert last valParsed
      arrayToFill[insertValAtIndex] = (uint)valParsed;
   }
}
```
For each case, BenchmarkDotNet measures both memory allocation and duration. Here is how it presents the results:

| Method                        |      Mean |    Error |   StdDev | Rank |  Gen 0 | Allocated |
|------------------------------ |----------:|---------:|---------:|-----:|-------:|----------:|
| GetUIntArrayWithAstuteParsing |  18.46 ns | 0.162 ns | 0.151 ns |    1 |      - |         - |
| GetUIntArrayWithSpan          |  79.99 ns | 1.247 ns | 1.166 ns |    2 |      - |         - |
| GetUIntArrayWithSplit         | 129.36 ns | 1.464 ns | 1.369 ns |    3 | 0.0293 |     184 B |
- GetUIntArrayWithAstuteParsing() is the fastest way and doesn’t allocate anything. The performance gain comes from the fact that we wrote our own dedicated uint parsing implementation. This clearly illustrates that, despite the presence of new features in the framework, the best performance often results from well-thought-out algorithms.
- GetUIntArrayWithSpan() takes about 38% less time than GetUIntArrayWithSplit() (79.99 ns versus 129.36 ns). This is already a significant win. However, the core of the performance gain is that there is no heap allocation. In a real-world scenario where this method would be used to parse millions of uint values, a lot of GC pressure would be saved.
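When the number of values is not known in advance, the span-based loop can feed a List<uint> instead of a pre-allocated array. The sketch below is not part of the benchmark: ParseUInts is a hypothetical helper that repeatedly narrows the remaining span instead of tracking indexes. The only allocations left are those of the list itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the benchmark above
static List<uint> ParseUInts(ReadOnlySpan<char> remaining)
{
    var result = new List<uint>();
    while (!remaining.IsEmpty)
    {
        int comma = remaining.IndexOf(',');

        // The token is everything up to the next comma, or the rest of the input
        ReadOnlySpan<char> token = comma < 0 ? remaining : remaining.Slice(0, comma);
        result.Add(uint.Parse(token));   // throws FormatException on an empty token

        // Narrow the view to what follows the comma
        remaining = comma < 0 ? ReadOnlySpan<char>.Empty : remaining.Slice(comma + 1);
    }
    return result;
}

List<uint> values = ParseUInts("163,496,691,1729");
Console.WriteLine(string.Join(" ", values));   // 163 496 691 1729
```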
Explanations About the Magic Behind Span<T> Implementation
Many articles discussing Span<T> tend to conclude at this point. We’ve introduced an efficient approach to sidestep the need for allocating sub-strings. However, the critical aspect lies in the substantial runtime modifications necessary to achieve this performant implementation of Span<T>. Let’s explain what happened.
The Span<T> source code shows that it contains two fields.

```csharp
public readonly ref struct Span<T> {
   // A managed pointer (ref field is a new C# 11 feature)
   internal readonly ref T _reference;
   // The number of elements this Span contains.
   private readonly int _length;
   ...
}
```
The _length value is internally multiplied by sizeof(T) to obtain the size in bytes of the slice, and hence its end address. Thus the slice in memory is the half-open range [_reference, _reference + _length*sizeof(T)).
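This layout can be observed from user code. The following sketch, given for illustration only, uses MemoryMarshal.GetReference() to obtain the managed pointer held by two spans and Unsafe.ByteOffset() to measure the distance between them. Slicing at element 3 of an int span moves the reference by 3 * sizeof(int) bytes.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

int[] numbers = { 10, 20, 30, 40, 50 };
Span<int> whole = numbers;
Span<int> slice = whole.Slice(3);   // starts at element 3, length 2

// Managed pointers to the first element of each span
ref int wholeStart = ref MemoryMarshal.GetReference(whole);
ref int sliceStart = ref MemoryMarshal.GetReference(slice);

// Distance in bytes between the two references
long byteOffset = (long)Unsafe.ByteOffset(ref wholeStart, ref sliceStart);
int sliceSizeInBytes = slice.Length * sizeof(int);

Console.WriteLine(byteOffset);         // 12, i.e. 3 * sizeof(int)
Console.WriteLine(sliceSizeInBytes);   // 8, i.e. 2 * sizeof(int)
```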
_reference is a managed pointer field (or ref field). The ref field feature was added in C# 11 and .NET 7.0. Before that, the implementation of Span<T> (in .NET 6.0 and before…) used an internal trick to reference a managed pointer through an internal ref struct named ByReference<T>.
Span<T> is declared as a ref struct. A structure marked with ref is a special structure that can only be allocated on the thread stack. This way it can hold a managed pointer as a field (the ref field explained above).
The advantages of managed pointers
ref struct was released with C# 7.2 precisely to make the implementation of Span<T> through a managed pointer possible. The .NET team went to all this effort because basing the Span<T> implementation on a managed pointer has significant advantages:
- Safe: Managed pointers are pointers but they belong to the safe world. There is no need to declare an unsafe scope to work with Span<T>.
- Performance wise: The performance overhead of Span<T> is nearly negligible. This is because managed pointers, even though they are managed, are essentially regular pointers. Consequently, they incur minimal overhead. The management of these pointers includes two key aspects:
  - A) the C# compiler rejects code that could lead to a managed pointer pointing to invalid memory, and
  - B) if a managed pointer points to an object on the heap, the runtime automatically updates the pointer when the GC relocates the referenced object.
- Flexibility: A managed pointer can point to various types of memory, including an object on the heap, an unmanaged buffer, a value on the stack, a field within an object, a slot within an array, or a position within a string. The Span<T> implementation benefits from this flexibility, which makes its API and implementation concise. Because the memory pointed to is typed as ref T, there is no need to care whether it’s a string, a slot of an array or a location on the stack.
- Thread safe: A fortunate consequence of being stack-only is that a Span<T> instance belongs to a single thread. This makes the Span<T> instance itself de facto thread-safe, although the memory it points to can still be shared with other threads.
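The flexibility point can be illustrated with a short sketch. Sum is a hypothetical helper written once against ReadOnlySpan<int>. It works unchanged whether the elements live in an array on the heap, in a portion of that array, or in a stack-allocated buffer, and no unsafe scope is needed.

```csharp
using System;

// Hypothetical helper: one implementation for every kind of backing memory
static int Sum(ReadOnlySpan<int> values)
{
    int total = 0;
    foreach (int value in values)
    {
        total += value;
    }
    return total;
}

int[] onHeap = { 1, 2, 3, 4 };
Span<int> onStack = stackalloc int[] { 10, 20, 30 };

Console.WriteLine(Sum(onHeap));                // whole array: 10
Console.WriteLine(Sum(onHeap.AsSpan(1, 2)));   // slots 1 and 2 only: 5
Console.WriteLine(Sum(onStack));               // stack memory: 60
```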
Managed pointers, ref struct, ref fields and the extended usage of the keyword ref make for an interesting topic, and we dedicated an entire article to it: Managed pointers, Span<T>, ref struct, C#11 ref fields and the scoped keyword
No stack-only restriction with Memory<T>
The structures System.Memory<T> and System.ReadOnlyMemory<T> were introduced in the same release as System.Span<T> and System.ReadOnlySpan<T>.
Memory<T> shares similarities with Span<T> but it is a regular structure. It doesn’t have the ref struct stack-only restrictions. This makes it suitable for use as a field in a class, for instance. However, this lack of constraint also means Memory<T> doesn’t have this special relation with the GC. Consequently, it is slightly less performant. This performance loss arises from the fact that its implementation has three fields instead of two: instead of having a special ref pointer, Memory<T> needs to reference both the _object and then the _index in the object.

```csharp
public readonly struct Memory<T> : IEquatable<Memory<T>> {
   // NOTE: With the current implementation, Memory<T> and ReadOnlyMemory<T> must have the same layout,
   // as code uses Unsafe.As to cast between them.
   private readonly object? _object;
   private readonly int _index;
   private readonly int _length;
   ...
}
```
I wanted to benchmark the comma-separated string code above with Memory<T>. Then I realized that there is no uint.Parse(ReadOnlyMemory<char>) API, which suggests Memory<T> didn’t get as much love as Span<T>.
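In practice, the usual way to combine both types is to store or pass a Memory<T> and to obtain a span through its Span property only at the point of use. Here is a minimal sketch with hypothetical helper names: the ReadOnlyMemory<char> value survives across an await, which a span cannot do, and the span is only materialized inside a synchronous helper.

```csharp
using System;
using System.Threading.Tasks;

// Synchronous helper: spans are fine here
static uint ParseFirst(ReadOnlySpan<char> span)
{
    int comma = span.IndexOf(',');
    return uint.Parse(comma < 0 ? span : span.Slice(0, comma));
}

// Asynchronous method: it receives a ReadOnlyMemory<char>, not a span
static async Task<uint> ParseFirstAsync(ReadOnlyMemory<char> memory)
{
    await Task.Yield();                // the memory value lives across the await
    return ParseFirst(memory.Span);    // the span exists only during this call
}

ReadOnlyMemory<char> text = "163,496,691,1729".AsMemory();
uint first = await ParseFirstAsync(text);
uint second = ParseFirst(text.Slice(4).Span);   // Memory<T> can be sliced too

Console.WriteLine(first);    // 163
Console.WriteLine(second);   // 496
```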
Span<T> and the .NET Framework
Because Span<T> and ref fields imply significant updates on the runtime GC, they were not ported to the .NET Framework. The fast Span<T> implementation has only been available since .NET Core 2.1 and in its successors (.NET 5, .NET 6, .NET 7, .NET 8…), and ref fields require .NET 7 or later. Here is a discussion among Microsoft engineers about it: “Fast Span is too fundamental change to be quirklable in reasonable way.”.
However, an implementation of Span<T> exists for the .NET Framework. It is referred to as slow span. To use it, reference the NuGet package System.Memory from your .NET Framework project. This implementation is similar to the Memory<T> implementation, with three fields:

```csharp
public readonly ref partial struct Span<T> {
   private readonly Pinnable<T> _pinnable;
   private readonly IntPtr _byteOffset;
   private readonly int _length;
   ...
}
```

Also, when referencing the System.Memory package from a .NET Framework project you won’t get APIs similar to uint.Parse(ReadOnlySpan<char>), which makes it less attractive.
Conclusion
In this article, we’ve explored the innovative Span<T> and ReadOnlySpan<T> structures and their applications in refining code for peak performance.
Span<T> and ReadOnlySpan<T> hold a unique and significant place within the .NET Base Class Library. These types required substantial runtime modifications to deliver performance enhancements in extremely high-performance, critical scenarios. While not everyone may require their capabilities, for those who do, they can be a game-changing tool.