Welcome to our deep dive into System.Span<T> and System.ReadOnlySpan<T>. Introduced alongside C# 7.2 (2017) and natively supported by the .NET Core runtime since .NET Core 2.1, these powerful structures provide a type-safe way to work with contiguous regions of memory. Span<T> offers the flexibility to manage contiguous sequences of elements (not only bytes) located on the heap, on the stack, or even in unmanaged memory, making it an essential tool for high-performance applications.

This post explores practical examples and benchmarks to showcase how Span<T> enhances performance. We'll also explain why it often outperforms typical C# code and how to use it efficiently. Join us to unlock its potential for faster, streamlined code!
Understanding Span<T>
Here is how Span<T> is declared (simplified):

```csharp
public readonly ref struct Span<T>
{
    internal readonly ref T _reference; // managed pointer to the first element
    private readonly int _length;       // number of elements
    // ...
}
```
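Notice that Span<T> in C# is not just any structure. It is declared as a ref struct, which restricts it to the stack: a Span<T> can never be boxed or stored on the managed heap. This design choice enables its performance but imposes limitations. A Span<T> cannot be a field of a class (or of a regular struct), and it cannot be used across an await in an asynchronous method (before C# 13, ref struct locals were not allowed in async methods at all).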
The ref field holds a managed pointer (a ref T) to the first element, much like a C pointer. Indexing a span is therefore as efficient as indexing an array: the element address is computed directly from that pointer and the index, after a bounds check against the length.

Spans are merely views into existing memory, not a way to allocate it. Span<T> allows read-write access, while ReadOnlySpan<T> is read-only. Multiple spans on the same array create separate views of the same memory.
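To make the "separate views of the same memory" point concrete, here is a small sketch (the variable names are illustrative) with two overlapping spans over one array. A write through one view is immediately visible through the other view and through the array itself, while the ReadOnlySpan<T> view only allows reads.

```csharp
using System;

int[] buffer = { 10, 20, 30, 40, 50 };

// Two overlapping read-write views over the same array
Span<int> first = buffer.AsSpan(1, 3);   // 20, 30, 40
Span<int> second = buffer.AsSpan(2, 3);  // 30, 40, 50

// A read-only view over the whole array (implicit conversion from int[])
ReadOnlySpan<int> readOnly = buffer;

first[1] = 300;                  // writes buffer[2]
Console.WriteLine(second[0]);    // 300: same memory, different view
Console.WriteLine(readOnly[2]);  // 300
// readOnly[2] = 0;              // would not compile: the ReadOnlySpan<T> indexer returns a ref readonly
```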
C# Programming with Span<T>
In this section, we'll dive into the practical use of Span<T> in C# programming through a few code samples. They show how Span<T> can improve code performance through more efficient data manipulation and memory management.
Basic Usage of Span<T>
Let's look at a simple example that demonstrates how to initialize and use Span<T> for basic operations. In this example, a Span<int> is created from an array of integers. We then modify the first element of the span, which also modifies the original array, demonstrating the by-reference nature of Span<T>.
```csharp
int[] numbers = new int[] { 1, 2, 3, 4, 5 };
Span<int> numbersSpan = new Span<int>(numbers);

// Modifying through the Span will modify the original array
numbersSpan[0] = 99;

foreach (var number in numbers)
{
    Console.WriteLine(number); // Output: 99, 2, 3, 4, 5
}
```
Slicing with Span<T>
Span<T> excels at creating slices of T data without allocating new memory.

Here's how you can create slices. This example shows how to slice a Span<byte> to focus on a specific segment of the array without copying the data, demonstrating the efficiency of Span<T>.
```csharp
byte[] data = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Span<byte> dataSpan = new Span<byte>(data);

// Create a slice of the original Span
Span<byte> slice = dataSpan.Slice(3, 5);

// Display the contents of the slice
foreach (var val in slice)
{
    Console.WriteLine(val); // Output: 3, 4, 5, 6, 7
}
```
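Since C# 8, the range operator offers a more compact way to write the same slice: the compiler turns dataSpan[3..8] into a Slice(3, 5) call (start index 3, end index 8 excluded). A quick sketch, reusing the same data as above:

```csharp
using System;

byte[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Span<byte> dataSpan = data;

// Equivalent to dataSpan.Slice(3, 5): start at index 3, stop before index 8
Span<byte> slice = dataSpan[3..8];
// ^2.. means "from the second-to-last element to the end"
Span<byte> tail = dataSpan[^2..];

Console.WriteLine(string.Join(", ", slice.ToArray())); // 3, 4, 5, 6, 7
Console.WriteLine(string.Join(", ", tail.ToArray()));  // 8, 9

// The slice still aliases the original array: no copy was made
slice[0] = 42;
Console.WriteLine(data[3]); // 42
```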
String and ReadOnlySpan<char>
ReadOnlySpan<char> enables efficient, read-only string operations in C# without extra memory allocation. Here's a simple example of extracting a substring using ReadOnlySpan<char>. In contrast, String.Substring(1, 3) would allocate a new string object containing "234":
```csharp
string greeting = "123456789";
ReadOnlySpan<char> span = greeting.AsSpan();

// Access a slice of the string,
// a bit like Substring() but with no new string allocation
ReadOnlySpan<char> subStringSpan = span.Slice(1, 3);

// Parse the substring as a uint without having allocated any new string
uint i = uint.Parse(subStringSpan);

// Output the parsed value
Console.WriteLine(i); // Output: 234
```
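The same idea extends to other everyday string chores. The sketch below (input and variable names are illustrative) trims, compares and parses parts of a string without allocating any intermediate string. It relies on the MemoryExtensions helpers Trim, StartsWith and Equals (with a StringComparison) and on the uint.TryParse(ReadOnlySpan<char>, out uint) overload.

```csharp
using System;

string line = "  ID=1729  ";
ReadOnlySpan<char> trimmed = line.AsSpan().Trim();   // "ID=1729", no new string

bool hasPrefix = trimmed.StartsWith("id=", StringComparison.OrdinalIgnoreCase);
ReadOnlySpan<char> key = trimmed.Slice(0, 2);        // "ID"
bool isId = MemoryExtensions.Equals(key, "id", StringComparison.OrdinalIgnoreCase);

// TryParse avoids exceptions on malformed input and accepts a span directly
bool parsed = uint.TryParse(trimmed.Slice(3), out uint id);

Console.WriteLine($"{hasPrefix} {isId} {parsed} {id}"); // True True True 1729
```

TryParse is usually preferable to Parse when the input comes from outside the program, since a malformed slice then costs a boolean check instead of an exception.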
Span<T> APIs
We've seen that Span<T> excels at optimizing string operations by enabling substring manipulation without memory allocation. However, as a generic type, it works with many element types, including byte. The complete Span<T> API, including extension methods, is extensive, with many overloads. Here's a simplified version:
```csharp
// Simplified: the real type has more members and overloads
public readonly ref struct Span<T>
{
    public Span(T[]? array);
    public Span(T[]? array, int start, int length);
    public unsafe Span(void* pointer, int length);

    public int Length { get; }
    public ref T this[int index] { get; } // ref return: assignments go through the reference

    public Span<T> Slice(int start);
    public Span<T> Slice(int start, int length);

    public T[] ToArray();
    public void Clear();
    public void Fill(T value);
    public void CopyTo(Span<T> destination);
    public bool TryCopyTo(Span<T> destination);
}
```
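A short sketch exercising the members listed above (the buffer sizes are arbitrary). Note the difference between CopyTo, which throws when the destination is too short, and TryCopyTo, which just returns false; also note that ToArray is the one member here that allocates.

```csharp
using System;

Span<int> source = stackalloc int[4];
source.Fill(7);                             // 7, 7, 7, 7

int[] target = new int[6];
source.CopyTo(target);                      // target: 7, 7, 7, 7, 0, 0 (int[] converts to Span<int>)

Span<int> tooSmall = stackalloc int[2];
bool copied = source.TryCopyTo(tooSmall);   // false: destination too short, nothing copied
// source.CopyTo(tooSmall) would throw an ArgumentException instead

source.Slice(2).Clear();                    // source: 7, 7, 0, 0
int[] snapshot = source.ToArray();          // allocates a new array holding a copy

Console.WriteLine(string.Join(", ", target));   // 7, 7, 7, 7, 0, 0
Console.WriteLine(copied);                       // False
Console.WriteLine(string.Join(", ", snapshot)); // 7, 7, 0, 0
```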
Notice above the unsafe constructor that takes a void* pointer. Span<T> can work on any kind of memory, including unmanaged memory. It thus represents a simpler way to work with pointers and unmanaged memory, as in this code sample:
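```csharp
using System.Runtime.InteropServices;

Span<byte> stackMemory = stackalloc byte[1024]; // stack memory, no unsafe context needed

IntPtr unmanagedHandle = Marshal.AllocHGlobal(1024);
unsafe // ToPointer() returns a void*, which requires an unsafe context (AllowUnsafeBlocks)
{
    Span<byte> unmanaged = new Span<byte>(unmanagedHandle.ToPointer(), 1024);
    unmanaged.Clear(); // use it like any other span, and stop using it before freeing
}
Marshal.FreeHGlobal(unmanagedHandle);
```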
Above, we were able to call uint i = uint.Parse(subStringSpan); because a new overload uint.Parse(ReadOnlySpan<char>) exists in the .NET Base Class Library (BCL). What truly sets Span<T> and ReadOnlySpan<T> apart is their widespread integration into the BCL. This fact is illustrated in the screenshot below, which shows NDepend analyzing the .NET 9 framework in the directory C:\Program Files\dotnet\shared\Microsoft.NETCore.App\9.0.0.
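uint.Parse is only one of many span-accepting overloads in the BCL. As a hedged illustration (variable names and buffer sizes are arbitrary), the sketch below formats a number into a stack-allocated buffer with uint.TryFormat and then UTF-8 encodes it with Encoding.UTF8.GetBytes(ReadOnlySpan<char>, Span<byte>), without creating any string along the way.

```csharp
using System;
using System.Text;

uint value = 1729;

// Format into a stack buffer instead of allocating a string with ToString()
Span<char> chars = stackalloc char[16];
bool formatted = value.TryFormat(chars, out int charsWritten);
ReadOnlySpan<char> text = chars.Slice(0, charsWritten); // "1729"

// Encode those chars as UTF-8 into another stack buffer
Span<byte> utf8 = stackalloc byte[16];
int bytesWritten = Encoding.UTF8.GetBytes(text, utf8);

Console.WriteLine($"{formatted} {charsWritten} {bytesWritten}"); // True 4 4
```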
Span<T> vs. Array
How does Span<T> differ from standard arrays and ArraySegment<T>?

- Span<T> interacts with the GC differently: as a ref struct it can never end up on the heap, and the GC tracks its internal managed pointer directly, which makes it more efficient in stack-only scenarios.
- Unlike ArraySegment<T>, Span<T> supports both managed and unmanaged memory (including stack memory obtained with stackalloc).
- ArraySegment<T> lacks a read-only equivalent, while ReadOnlySpan<T> provides one.

The confusion arises because Span<T> is just a view on data, often an array. While arrays remain essential, Span<T> offers a more flexible way to work with them.
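The sketch below puts these types side by side over the same array (variable names are illustrative). It relies on the implicit conversions from ArraySegment<T> to Span<T> and from Span<T> to ReadOnlySpan<T>, and shows that only a span can also view stack memory.

```csharp
using System;

int[] array = { 1, 2, 3, 4, 5, 6 };

// ArraySegment<T> is an (array, offset, count) triple and a regular struct,
// so it could be stored in a class field or a List<T>
ArraySegment<int> segment = new ArraySegment<int>(array, 2, 3); // 3, 4, 5

// ArraySegment<T> converts implicitly to Span<T>, and Span<T> to ReadOnlySpan<T>
Span<int> fromSegment = segment;
ReadOnlySpan<int> readOnlyView = fromSegment;

fromSegment[0] = 30;            // writes array[2]
Console.WriteLine(segment[0]);  // 30
Console.WriteLine(array[2]);    // 30

// Stack memory can be viewed by a span, but never by an ArraySegment<T>
Span<int> onStack = stackalloc int[] { 7, 8, 9 };
Console.WriteLine(onStack.Length + readOnlyView.Length); // 6
```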
Improving some C# code performance with Span<T>
Now let's put Span<T> to work and see how it can significantly boost performance in a practical, real-world scenario.

In this section, we will use Span<T> to obtain an array of uint from the string "163,496,691,1729".
- Without Span<T>, one would use "163,496,691,1729".Split(','). This call allocates four strings and an array to reference these four strings. Then uint.Parse(string) is used to parse each substring.
- Actually, we will use ReadOnlySpan<char> rather than Span<char>, because the content of a string is immutable.
- With ReadOnlySpan<T>, the input string gets sliced into four spans. Because ReadOnlySpan<T> is a ref struct, each of its instances occupies only a few bytes on the current thread's stack. Stack allocation is very cheap, and values on the stack are not heap objects, so they put no pressure on the GC. Then uint.Parse(ReadOnlySpan<char>) is used to parse each slice.
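The span-based approach from the last bullet can also be written by repeatedly cutting the already-parsed part off the front of the span. This is a minimal sketch, not the benchmarked code: ParseCommaSeparated is a hypothetical helper name, and the input is assumed to be well formed, with a destination large enough for every value.

```csharp
using System;

uint[] values = new uint[4];
int written = ParseCommaSeparated("163,496,691,1729", values);
Console.WriteLine($"{written}: {string.Join(", ", values)}"); // 4: 163, 496, 691, 1729

// Hypothetical helper: parses "a,b,c" into destination and returns the number of values written.
// Assumes well-formed input and a destination large enough for every value.
static int ParseCommaSeparated(ReadOnlySpan<char> input, Span<uint> destination)
{
    int count = 0;
    while (true)
    {
        int comma = input.IndexOf(',');
        // The token is everything before the next comma, or the whole remainder
        ReadOnlySpan<char> token = comma < 0 ? input : input.Slice(0, comma);
        destination[count++] = uint.Parse(token);
        if (comma < 0) return count;
        input = input.Slice(comma + 1); // drop the token and its comma, no allocation
    }
}
```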
Here is a pseudo-code and some diagrams that summarize both approaches:
Benchmarking Span<T> performance gain
Below is the complete code that can be pasted into a C# Program.cs source file. To run this benchmark, you need to reference the NuGet package BenchmarkDotNet (its project is hosted on GitHub) and run it in a Release build. Before digging into the BenchmarkDotNet results, let's note that:

- A third approach, the method GetUIntArrayWithAstuteParsing(), presents an optimized way to parse "163,496,691,1729" without using Span<T>.
- In the real world, the number of uint values in the comma-separated input may not be known in advance. Typically, a List<uint> would be used to store the parsed uint values until all of them are obtained. But here we want to demonstrate that Span<T> makes no allocation. Thus, to avoid cluttering the performance results, uint[] arrayToFill is pre-allocated with the proper length.
```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Order;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<UIntParserBenchmarks>();

[RankColumn]
[Orderer(SummaryOrderPolicy.FastestToSlowest)]
[MemoryDiagnoser]
public class UIntParserBenchmarks
{
    // We want to avoid allocating arrays to fill during benchmarks,
    // thus s_NbUInt pre-determines their length
    const int s_NbUInt = 4;
    const string s_CommaSeparatedUInt = "163,496,691,1729";

    uint[] m_ArrayToFill1 = new uint[s_NbUInt];

    [Benchmark(Baseline = true)]
    public void GetUIntArrayWithSplit()
    {
        GetUIntArrayWithStringSplit(s_CommaSeparatedUInt, m_ArrayToFill1);
    }

    uint[] m_ArrayToFill2 = new uint[s_NbUInt];

    [Benchmark]
    public void GetUIntArrayWithSpan()
    {
        GetUIntArrayWithSpan(s_CommaSeparatedUInt, m_ArrayToFill2);
    }

    uint[] m_ArrayToFill3 = new uint[s_NbUInt];

    [Benchmark]
    public void GetUIntArrayWithAstuteParsing()
    {
        GetUIntArrayWithAstuteParsing(s_CommaSeparatedUInt, m_ArrayToFill3);
    }

    static uint[] GetUIntArrayWithStringSplit(string commaSeparatedUInt, uint[] arrayToFill)
    {
        // Split() allocates an array and 4x strings
        string[] arrayOfString = commaSeparatedUInt.Split(',');
        var length = arrayOfString.Length;
        for (int i = 0; i < length; i++)
        {
            arrayToFill[i] = uint.Parse(arrayOfString[i]);
        }
        return arrayToFill;
    }

    static void GetUIntArrayWithSpan(string commaSeparatedUInt, uint[] arrayToFill)
    {
        // View the string as a span, so we can slice it in the loop
        ReadOnlySpan<char> span = commaSeparatedUInt.AsSpan();
        int nextCommaIndex = 0;
        int insertValAtIndex = 0;
        bool isLastLoop = false;
        while (!isLastLoop)
        {
            int indexStart = nextCommaIndex;
            nextCommaIndex = commaSeparatedUInt.IndexOf(',', indexStart);
            isLastLoop = (nextCommaIndex == -1);
            if (isLastLoop)
            {
                nextCommaIndex = commaSeparatedUInt.Length; // Parse last uint
            }
            // Get a slice of the string that contains the next uint...
            ReadOnlySpan<char> slice = span.Slice(indexStart, nextCommaIndex - indexStart);
            // ... and parse it
            uint valParsed = uint.Parse(slice);
            // Then insert valParsed in arrayToFill
            arrayToFill[insertValAtIndex] = valParsed;
            insertValAtIndex++;
            // Skip the comma for next iteration
            nextCommaIndex++;
        }
    }

    static void GetUIntArrayWithAstuteParsing(string commaSeparatedUInt, uint[] arrayToFill)
    {
        var length = commaSeparatedUInt.Length;
        int insertValAtIndex = 0;
        int valParsed = 0; // Don't use a uint to avoid casting in astute parsing formula
        for (int i = 0; i < length; i++)
        {
            char @char = commaSeparatedUInt[i];
            if (@char != ',')
            {
                // Astute Parsing: Modify valParsed from the actual @char
                // (assumes digits only, no validation or overflow check)
                valParsed = valParsed * 10 + (@char - '0');
                continue;
            }
            // A comma is an opportunity to insert valParsed in arrayToFill
            arrayToFill[insertValAtIndex] = (uint)valParsed;
            insertValAtIndex++;
            valParsed = 0;
        }
        // Insert last valParsed
        arrayToFill[insertValAtIndex] = (uint)valParsed;
    }
}
```
For each case, BenchmarkDotNet measures both memory allocation and duration. Here is how it presents the results:
```
                       Method |      Mean |    Error |   StdDev | Rank |  Gen 0 | Allocated |
----------------------------- |----------:|---------:|---------:|-----:|-------:|----------:|
GetUIntArrayWithAstuteParsing |  18.46 ns | 0.162 ns | 0.151 ns |    1 |      - |         - |
         GetUIntArrayWithSpan |  79.99 ns | 1.247 ns | 1.166 ns |    2 |      - |         - |
        GetUIntArrayWithSplit | 129.36 ns | 1.464 ns | 1.369 ns |    3 | 0.0293 |     184 B |
```
GetUIntArrayWithAstuteParsing() is the fastest approach (about 7x faster than the Split() baseline) and doesn't allocate anything. The performance gain comes from the fact that we wrote our own dedicated uint parsing implementation, which skips the validation, whitespace, sign and overflow handling that uint.Parse() performs. This clearly illustrates that, despite the presence of new features in the framework, the best performance often results from well-thought-out algorithms.

GetUIntArrayWithSpan() takes about 38% less time than GetUIntArrayWithSplit() (79.99 ns vs 129.36 ns). This is already a significant win. However, the core of the performance gain is that there is no heap allocation, whereas the Split() version allocates 184 B per call. In a real-world scenario where this method would be used to parse millions of uint values, a lot of GC pressure would be saved.
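For readers who want to check these numbers, here is a back-of-the-envelope breakdown. It assumes a 64-bit runtime with the usual object layout (16 bytes of object header and method-table pointer, a 4-byte length field, UTF-16 characters plus a terminating null char for strings, sizes rounded up to a multiple of 8 bytes) and assumes Split() allocates nothing else on the heap. Under those assumptions, the 184 B reported for GetUIntArrayWithSplit() is exactly four small strings plus the string[] referencing them.

```latex
\begin{aligned}
\text{string}(n) &= \left\lceil \frac{8 + 8 + 4 + 2(n+1)}{8} \right\rceil \times 8 \\
\text{"163", "496", "691"} &: \lceil 28/8 \rceil \times 8 = 32\ \text{B each} \\
\text{"1729"} &: \lceil 30/8 \rceil \times 8 = 32\ \text{B} \\
\text{string[4]} &: 24 + 4 \times 8 = 56\ \text{B} \\
\text{total} &: 4 \times 32 + 56 = 184\ \text{B} \\
\text{time saved by the span version} &: 1 - 79.99 / 129.36 \approx 0.38
\end{aligned}
```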
Explanations About the Magic Behind Span<T> Implementation
Many articles discussing Span<T> tend to conclude at this point. We've introduced an efficient approach to sidestep the need for allocating substrings. However, the critical aspect lies in the substantial runtime modifications necessary to achieve this performant implementation of Span<T>. Let's explain what happened.
The Span<T> source code shows that it contains two fields.

```csharp
public readonly ref struct Span<T>
{
    // A managed pointer (ref fields are a new C# 11 feature)
    internal readonly ref T _reference;
    // The number of elements this Span contains
    private readonly int _length;
    // ...
}
```
The _length field counts elements, not bytes. To compute addresses, it is internally multiplied by sizeof(T): element i lives at _reference + i * sizeof(T), and the span occupies the memory range [_reference, _reference + _length * sizeof(T)), the end address being excluded.
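The sketch below checks this layout from C# code. It assumes .NET 5 or later, where System.Runtime.CompilerServices.Unsafe ships with the framework. MemoryMarshal.GetReference returns the span's underlying ref T (the equivalent of _reference), and Unsafe.Add and Unsafe.ByteOffset do the pointer arithmetic in elements and in bytes respectively. Despite its name, Unsafe does not require an unsafe context, but misusing it can corrupt memory, so treat this as an illustration rather than production code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

int[] numbers = { 10, 20, 30, 40, 50 };
Span<int> span = numbers.AsSpan(1, 3); // 20, 30, 40

// The equivalent of the _reference field: a managed pointer to span[0]
ref int first = ref MemoryMarshal.GetReference(span);

for (int i = 0; i < span.Length; i++)
{
    // Element i lives i * sizeof(int) bytes after the first element
    ref int element = ref Unsafe.Add(ref first, i);
    nint byteOffset = Unsafe.ByteOffset(ref first, ref element);
    Console.WriteLine($"span[{i}] = {element}, offset = {byteOffset} bytes");
}
// Offsets are 0, 4 and 8 bytes: the span covers [_reference, _reference + 3 * sizeof(int))
```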
_reference is a managed pointer field (or ref field). Ref fields are a feature added in C# 11 and .NET 7. Before that, the implementation of Span<T> (in .NET 6 and earlier) used an internal trick to hold a managed pointer: an internal ref struct named ByReference<T>.
Span<T> is declared as a ref struct. A structure marked with ref is a special structure whose instances can only live on the thread stack. This is what allows it to hold a managed pointer as a field (the ref field explained above).
The advantages of managed pointers
ref struct was released with C# 7.2 precisely to make the implementation of Span<T> through a managed pointer possible. The .NET team went to all this effort because basing Span<T> on a managed pointer has significant advantages:

- Safe: Managed pointers are pointers, but they belong to the safe world. There is no need to declare an unsafe scope to work with Span<T>.
- Performance: The performance overhead of Span<T> is nearly negligible. Managed pointers, even though they are managed, are essentially regular pointers, so they incur minimal overhead. Their management involves two key aspects:
  - A) the C# compiler rejects code that could lead to a managed pointer pointing to invalid memory, and
  - B) if a managed pointer points into an object on the heap, the GC automatically updates the pointer when it relocates the referenced object.
- Flexibility: A managed pointer can point to various kinds of memory, including an object on the heap, an unmanaged buffer, a value on the stack, a field within an object, a slot within an array, or a position within a string. The Span<T> implementation benefits from this flexibility, which keeps its API and implementation concise. Because the memory pointed to is typed as ref T, there is no need to care whether it is a string, an array slot or a location on the stack.
- Thread safe: A fortunate consequence of being stack-only is that a Span<T> instance belongs to a single thread and cannot be shared with other threads. Note that this only applies to the span itself: the memory it points to (for example, an array) can still be accessed concurrently through other references.
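The flexibility point is easiest to see with a single method that accepts a ReadOnlySpan<int> and is fed different kinds of memory. Sum is a hypothetical helper written for this illustration.

```csharp
using System;

int[] heapArray = { 1, 2, 3, 4 };
Span<int> stackBuffer = stackalloc int[] { 10, 20, 30 };

// The same method works on a whole array, on stack memory and on a slice of an array:
// it only ever sees a ref int plus a length
int a = Sum(heapArray);               // 10
int b = Sum(stackBuffer);             // 60
int c = Sum(heapArray.AsSpan(1, 2));  // 2 + 3 = 5
Console.WriteLine($"{a} {b} {c}");    // 10 60 5

// Hypothetical helper: no unsafe code, no copies, no knowledge of where the data lives
static int Sum(ReadOnlySpan<int> values)
{
    int total = 0;
    foreach (int v in values) total += v;
    return total;
}
```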
Managed pointers, ref struct, ref fields and the extended usage of the ref keyword form an interesting topic, and we dedicated an entire article to it: Managed pointers, Span<T>, ref struct, C#11 ref fields and the scoped keyword.
No stack-only restriction with Memory<T>
The structures System.Memory<T> and System.ReadOnlyMemory<T> were introduced alongside System.Span<T> and System.ReadOnlySpan<T> in the same release.
Memory<T> shares similarities with Span<T>, but it is a regular structure without the ref struct stack-only restrictions. This makes it suitable, for instance, for use as a field in a class. However, this lack of constraint also means Memory<T> doesn't have the special relationship with the GC that a ref field has. Consequently, it is slightly less performant. This performance loss arises from its implementation having three fields instead of two: instead of a single ref pointer, Memory<T> needs to reference both the _object and the _index within that object, and every access through its Span property must first inspect _object to find out what kind of memory it wraps.
```csharp
public readonly struct Memory<T> : IEquatable<Memory<T>>
{
    // NOTE: With the current implementation, Memory<T> and ReadOnlyMemory<T> must have the same layout,
    // as code uses Unsafe.As to cast between them.
    private readonly object? _object;
    private readonly int _index;
    private readonly int _length;
    // ...
}
```
I wanted to benchmark the comma-separated string code above with Memory<T>. Then I realized that there is no uint.Parse(Memory<T>) API (you have to go through its Span property), which suggests Memory<T> didn't get as much love as Span<T>.
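Here is a sketch of where Memory<T> earns its place (the input and variable names are illustrative). ReadOnlyMemory<char> views can be stored in a heap collection such as List<T> and kept across an await, both of which are impossible with ReadOnlySpan<char>; parsing then goes through the Span property, as noted above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

string csv = "163,496,691,1729";

// ReadOnlyMemory<char> is a regular struct: it can be stored in a List<T> on the heap,
// which a ReadOnlySpan<char> cannot
var tokens = new List<ReadOnlyMemory<char>>();
int start = 0;
for (int i = 0; i <= csv.Length; i++)
{
    if (i == csv.Length || csv[i] == ',')
    {
        tokens.Add(csv.AsMemory(start, i - start)); // a view, no substring allocated
        start = i + 1;
    }
}

// The memory views survive an await
await Task.Yield();

// There is no uint.Parse(ReadOnlyMemory<char>) overload: go through .Span to parse
uint sum = 0;
foreach (ReadOnlyMemory<char> token in tokens)
    sum += uint.Parse(token.Span);

Console.WriteLine($"{tokens.Count} tokens, sum = {sum}"); // 4 tokens, sum = 3079
```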
Span<T> and the .NET Framework
Because Span<T> and ref fields imply significant updates to the runtime and its GC, they were not ported to the .NET Framework. This fast span is only available on the .NET Core runtime since version 2.1 (including .NET 5, .NET 7, .NET 8…). Here is a Microsoft engineers' discussion about it: "Fast Span is too fundamental change to be quirklable in reasonable way."
However, an implementation of Span<T> does exist for .NET Framework. It is referred to as slow span. To use it, reference the NuGet package System.Memory from your .NET Framework project. This implementation is similar to the Memory<T> implementation, with three fields:
```csharp
public readonly ref partial struct Span<T>
{
    private readonly Pinnable<T> _pinnable;
    private readonly IntPtr _byteOffset;
    private readonly int _length;
    // ...
}
```
Also, when referencing the System.Memory package from a .NET Framework project, you won't get APIs such as uint.Parse(ReadOnlySpan<char>), which makes it less attractive.
Conclusion
In this article, we explored Span<T> and ReadOnlySpan<T> and their role in optimizing performance.

These structures are integral to the .NET Base Class Library and required significant runtime changes to deliver their efficiency in performance-critical scenarios. While not essential for every use case, they can be a game-changer for those who need them.