NDepend Blog

Improve C# code performance with Span<T>

November 7, 2023 9 minutes read

The structure Span<T> in the namespace System appeared with C# 7.2 in 2017. Span<T> offers type-safe access to a contiguous region of memory. Such a contiguous sequence of elements can be located on the heap, on the thread’s stack, or even in unmanaged memory. This makes Span<T> fairly versatile.

Span<T> in C# is not your everyday structure. It is declared as a ref struct. Being a ref struct imposes a constraint: its instances can only live on the stack, never on the managed heap. This constraint comes with some limitations compared to a regular struct. For example, a ref struct cannot be the type of a field within a class, and asynchronous methods cannot hold it across an await.

The primary motivation behind the ref struct design of Span<T> in C# is to ensure that its usage does not result in additional heap allocations. This way it won’t pressure the Garbage Collector (the GC) with additional objects to track. This key design principle underpins its suitability for highly optimized use cases.

In this post, we will first go through some Span<T> performance improvement examples with benchmarks. We’ll then explain the reasons why Span<T> performs better than your regular C# code.

Improving some C# code performance with Span<T>

Let’s use Span<T> to obtain an array of uint from the string "163,496,691,1729".

  • Without Span<T> one would use "163,496,691,1729".Split(','). This call allocates four strings and an array to reference these four strings. Then uint.Parse(string) is used to parse each sub-string.
  • Actually, we will use ReadOnlySpan<char> because the content of a string is immutable.
  • With ReadOnlySpan<T> the input string gets sliced into four spans. Because ReadOnlySpan<T> is a ref struct, each of its instances occupies only a few bytes located on the current thread stack. Stack allocation is super fast and it does not impact the GC with values allocated on the stack. Then uint.Parse(ReadOnlySpan<char>) is used to parse each slice.

The method uint.Parse(ReadOnlySpan<char>) shows that it’s not just about Span<T> itself: many .NET APIs related to string and other memory representations have been extended to accept Span<T> or ReadOnlySpan<T>.
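The two approaches described above can be sketched as follows. This is a minimal reconstruction, not the article's original listing: the method names come from the benchmark discussed later, while the class name UIntParsing, the parameters, and the loop structure are assumptions made for illustration.

```csharp
using System;

public static class UIntParsing
{
    // Baseline: Split() allocates one string per number plus the string[] that holds them.
    public static void GetUIntArrayWithSplit(string input, uint[] arrayToFill)
    {
        string[] parts = input.Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            arrayToFill[i] = uint.Parse(parts[i]);
        }
    }

    // Span version: every slice is a view over the original string, so no sub-string is allocated.
    // Assumes arrayToFill is long enough to receive every value.
    public static void GetUIntArrayWithSpan(ReadOnlySpan<char> input, uint[] arrayToFill)
    {
        int index = 0;
        while (true)
        {
            int comma = input.IndexOf(',');
            if (comma < 0)
            {
                arrayToFill[index] = uint.Parse(input); // last value, no trailing comma
                return;
            }
            arrayToFill[index++] = uint.Parse(input.Slice(0, comma));
            input = input.Slice(comma + 1); // move the view past the comma
        }
    }
}
```

A string converts implicitly to ReadOnlySpan<char>, so both methods can be called with the same literal, for example UIntParsing.GetUIntArrayWithSpan("163,496,691,1729", new uint[4]).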

Here is some pseudo-code and some diagrams that summarize both approaches:

[Figure: C# Span<T>]

Benchmarking Span<T> performance gain

Below is the complete code that can be pasted into a C# Program.cs source file. To run this benchmark one needs to reference the NuGet package BenchmarkDotNet, whose project is hosted on GitHub. Before digging into the BenchmarkDotNet results, let’s note that:

  • A third approach with the method GetUIntArrayWithAstuteParsing() presents an optimized method for parsing "163,496,691,1729" without the requirement of using Span<T>.
  • In the real world, the number of uint in the comma-separated string input may not be known in advance. Typically, a List<uint> would be used to store uint values parsed until all of them are obtained. But here we want to demonstrate that no allocation is made by Span<T>. Thus to avoid cluttering the result uint[] arrayToFill is pre-allocated with the proper length.
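To give an idea of what a dedicated parser might look like, here is a minimal sketch. The method name comes from the benchmark, but the body, the class name AstuteParsing, and the parameters are assumptions and not the article's original code. It supposes well-formed input (decimal digits separated by single commas) and performs no overflow or format checks.

```csharp
using System;

public static class AstuteParsing
{
    // Hand-written parser: walks the characters once and accumulates digits.
    // It needs neither Split() nor Span<T>, and it allocates nothing.
    public static void GetUIntArrayWithAstuteParsing(string input, uint[] arrayToFill)
    {
        int index = 0;
        uint current = 0;
        foreach (char c in input)
        {
            if (c == ',')
            {
                arrayToFill[index++] = current; // a comma closes the current number
                current = 0;
            }
            else
            {
                current = current * 10 + (uint)(c - '0'); // append one decimal digit
            }
        }
        arrayToFill[index] = current; // the last number has no trailing comma
    }
}
```

The pre-allocated arrayToFill mirrors the choice described above: the caller owns the result buffer, so the method itself never touches the heap.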

For each case, BenchmarkDotNet measures both memory allocation and duration. Here is how it presents the results:

  • GetUIntArrayWithAstuteParsing() is the fastest way and doesn’t allocate anything. The performance gain comes from the fact that we wrote our own dedicated uint parsing implementation. This clearly illustrates that, despite the presence of new features in the framework, the best performance often results from well-thought-out algorithms.
  • GetUIntArrayWithSpan() is 38% faster than GetUIntArrayWithSplit(). This is already a significant win. However, the core of the performance gain is that there is no heap allocation. In a real-world scenario where this method would be used to parse millions of uint values, a lot of GC pressure would be saved.

Span<T> implementation is based on a managed pointer!

A glimpse at the Span<T> implementation

Many articles discussing Span<T> tend to conclude at this point. We’ve introduced an efficient approach to sidestep the need for allocating sub-strings. However, the critical aspect lies in the substantial runtime modifications necessary to achieve this performant implementation of Span<T>. Let’s explain what happened.

The Span<T> source code shows that it contains two fields.

The _length value is internally multiplied by sizeof(T) to obtain the offset address of the slice. Thus the slice in memory is the range [_reference, _reference + _length*sizeof(T)].
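As a rough illustration of this two-field layout, here is a hypothetical MiniSpan<T>. It is a simplified stand-in written for this explanation, not the actual BCL source: it only wraps arrays, skips the array covariance check, and requires C# 11 / .NET 7 or later because it declares a ref field.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical, simplified stand-in for Span<T> (illustration only).
public readonly ref struct MiniSpan<T>
{
    private readonly ref T _reference; // managed pointer to the first element of the slice
    private readonly int _length;      // number of elements (not bytes) in the slice

    public MiniSpan(T[] array, int start, int length)
    {
        if ((uint)start > (uint)array.Length || (uint)length > (uint)(array.Length - start))
            throw new ArgumentOutOfRangeException();
        // Point at array[start] without creating any intermediate object.
        _reference = ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), start);
        _length = length;
    }

    public int Length => _length;

    // Element i lives at _reference + i * sizeof(T); Unsafe.Add performs that scaling.
    public ref T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_length) throw new IndexOutOfRangeException();
            return ref Unsafe.Add(ref _reference, index);
        }
    }
}
```

Because the indexer returns a ref, writing through the view (for example view[1] = 99) updates the underlying array directly.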

_reference is a managed pointer field (or ref field). The ref field feature was added in C# 11 and .NET 7.0. Before that, the implementation of Span<T> (in .NET 6.0 and earlier) used an internal trick to reference a managed pointer through an internal ref struct named ByReference<T>.

Span<T> is declared as a ref struct. A structure marked with ref is a special structure that can only be allocated on the thread stack. This way it can hold a managed pointer as a field (the ref field explained above).

The advantages of managed pointers

ref struct was released with C# 7.2 just to make the implementation of Span<T> through a managed pointer possible. The .NET team went to all this effort because basing the Span<T> implementation on a managed pointer has significant advantages:

  • Safe: Managed pointers are pointers but they belong to the safe world. There is no need to declare an unsafe scope to work with Span<T>.
  • Performance wise: The performance overhead of Span<T> is nearly negligible. This is because managed pointers, even though they are managed, are essentially regular pointers. Consequently, they incur minimal overhead. The management of these pointers includes two key aspects:
    • A) the C# compiler refuses code that could lead to a managed pointer pointing to an invalid memory and
    • B) if a managed pointer points to an object on the heap, the runtime automatically handles the updating of such pointers in the event of the GC relocating the referenced object
  • Flexibility: A managed pointer can point to various kinds of memory, including an object on the heap, an unmanaged buffer, a value on the stack, a field within an object, a slot within an array, or a position within a string. The Span<T> implementation benefits from this flexibility, which keeps its API and implementation concise. Because the memory pointed to is typed as ref T, there is no need to care whether it’s a string, a slot of an array or a location on the stack.
  • Thread safe: A fortunate consequence of being stack-only is that a Span<T> instance belongs to a single thread. This makes the Span<T> instance itself de-facto thread-safe, although the memory it points to can still be shared with other threads.
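The flexibility point can be illustrated with a short sketch: a single method typed on ReadOnlySpan<int> serves heap arrays, array slices, and stack memory alike. The class and method names below are hypothetical and chosen for this example only.

```csharp
using System;

public static class SpanFlexibility
{
    // One implementation, whatever memory backs the span.
    public static int Sum(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int v in values) total += v;
        return total;
    }

    public static int SumOverHeapArray()
    {
        int[] array = { 1, 2, 3, 4 };
        return Sum(array);              // the whole array, located on the managed heap
    }

    public static int SumOverArraySlice()
    {
        int[] array = { 1, 2, 3, 4 };
        return Sum(array.AsSpan(1, 2)); // only slots 1 and 2, no copy
    }

    public static int SumOverStack()
    {
        // stackalloc assigned to a Span<int> needs no unsafe scope.
        Span<int> onStack = stackalloc int[] { 1, 2, 3, 4 };
        return Sum(onStack);
    }
}
```

None of the three callers copies the data: each one hands Sum a view over memory that already exists.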

Managed pointers, ref struct, ref fields, and the extended usage of the keyword ref make for an interesting topic, and we dedicated an entire article to it: Managed pointers, Span<T>, ref struct, C#11 ref fields and the scoped keyword

No restriction with Memory<T>

Memory<T> shares similarities with Span<T> but it is a regular structure. It doesn’t have the ref struct stack-only restrictions. This makes it suitable for use as a field in a class, for instance. However, this lack of constraint also means Memory<T> doesn’t have this special relation with the GC. Consequently, it is slightly less performant. This performance loss arises from the fact that its implementation has three fields instead of two: instead of having a special ref pointer, Memory<T> needs to reference both the _object and then the _index in the object.

I wanted to benchmark the comma-separated string code above with Memory<T>. Then I realized that there is no uint.Parse(Memory<T>) API, which suggests Memory<T> didn’t get as much love as Span<T>.
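For illustration, here is a minimal sketch of how a slice can nevertheless be kept in a class field and parsed later: ReadOnlyMemory<char> is stored in the field, and its Span property is handed to uint.Parse(ReadOnlySpan<char>) at parse time. The NumberField type is hypothetical and exists only for this example.

```csharp
using System;

// Hypothetical holder type: ReadOnlyMemory<char> may be stored in a class field,
// which a Span<T> or ReadOnlySpan<T> cannot.
public sealed class NumberField
{
    private readonly ReadOnlyMemory<char> _text;

    public NumberField(string source, int start, int length)
    {
        _text = source.AsMemory(start, length); // a view on the string, no sub-string allocated
    }

    // The span only exists for the duration of this call.
    public uint Parse() => uint.Parse(_text.Span);
}
```

The NumberField object itself lives on the heap, so this pattern trades the zero-allocation property of a pure span for the ability to keep the slice around.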

Span<T> and the .NET Framework

Because Span<T> and ref fields require significant changes to the runtime and its GC, they were not ported to the .NET Framework. The fast Span<T> implementation is only available on the .NET Core runtime, since version 2.1 (and thus on .NET 7, .NET 8…). Here is a discussion among Microsoft engineers about it: “Fast Span is too fundamental change to be quirklable in reasonable way.”

However, an implementation of Span<T> exists for the .NET Framework. It is referred to as slow span. To use it, reference the NuGet package System.Memory from your .NET Framework project. This implementation is similar to the Memory<T> implementation, with three fields.

Also, when referencing the System.Memory package from a .NET Framework project, you won’t get APIs such as uint.Parse(ReadOnlySpan<char>), which makes it less attractive.

Span<T> API

As we’ve seen Span<T> is particularly effective in enhancing performance when dealing with strings. This is because it enables the manipulation of substrings without any need for memory allocation. However, Span<T> is a generic type. It can be used with various data types, such as byte. The complete Span<T> API including extension methods is huge because of all the overloaded methods. Here is a simplified API below:

Notice above the unsafe constructor that takes a void* pointer. Span<T> can work on any kind of memory, including unmanaged memory. It thus represents a simpler way to work with pointers and unmanaged memory, like in this code sample:
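The original listing is not reproduced here. As a minimal sketch of the idea, the code below wraps a buffer obtained from Marshal.AllocHGlobal in a Span<byte> through the void* constructor. The class and method names are hypothetical, and the project must be compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedBufferSample
{
    // Requires compiling with unsafe code enabled because of the void* constructor.
    public static unsafe int FillAndSum(int byteCount)
    {
        IntPtr buffer = Marshal.AllocHGlobal(byteCount); // unmanaged memory, not tracked by the GC
        try
        {
            var span = new Span<byte>(buffer.ToPointer(), byteCount);
            span.Fill(2);                         // every byte is set to 2
            span.Slice(0, byteCount / 2).Fill(1); // first half overwritten with 1
            int sum = 0;
            foreach (byte b in span) sum += b;    // bounds-checked, unlike raw pointer arithmetic
            return sum;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);          // the caller remains responsible for freeing
        }
    }
}
```

Once the span is created, the rest of the method uses the same safe API as for an array; only the construction and the lifetime of the buffer remain the caller's responsibility.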

What truly sets Span<T> and ReadOnlySpan<T> apart is their widespread integration into the .NET Base Class Library (BCL). This is illustrated in the screenshot below. It shows NDepend analyzing the .NET 8 framework in the directory C:\Program Files\dotnet\shared\Microsoft.NETCore.App\8.0.0:

Span used everywhere in .NET API

Span<T> vs. Array

At this point, one might wonder how Span<T> differs from standard arrays and especially the structure ArraySegment<T>.

  • Span<T> has a special relation with GC that makes it more performant than ArraySegment<T> in stack-only scenarios.
  • ArraySegment<T> is limited to managed memory, while we saw that Span<T> can also handle unmanaged memory.
  • ArraySegment<T> doesn’t provide a read-only view while ReadOnlySpan<T> does.
  • The confusion between Span<T> and arrays arises from the fact that Span<T> is a view on some data. Most of the time this data is represented through an array, so arrays are still needed. In this context, Span<T> is just a convenient view on arrays.
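The relationship between the two types can be sketched in a few lines: an ArraySegment<int> over an array converts implicitly to a ReadOnlySpan<int> that views the same slots. The SegmentVersusSpan class is a hypothetical name used for this example.

```csharp
using System;

public static class SegmentVersusSpan
{
    public static int SumSegment(ArraySegment<int> segment)
    {
        int total = 0;
        foreach (int v in segment) total += v; // segment is an (array, offset, count) triple
        return total;
    }

    public static int SumSpan(ReadOnlySpan<int> span)
    {
        int total = 0;
        foreach (int v in span) total += v;    // read-only view, elements cannot be assigned
        return total;
    }

    public static (int ViaSegment, int ViaSpan) Compare()
    {
        int[] array = { 1, 2, 3, 4, 5 };
        var segment = new ArraySegment<int>(array, 1, 3);
        ReadOnlySpan<int> span = segment;      // implicit conversion, same three slots
        return (SumSegment(segment), SumSpan(span));
    }
}
```

Both sums cover the elements 2, 3 and 4 of the same array; the difference is that the span version would work unchanged over stack or unmanaged memory.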

Conclusion

In this article, we delved into the Span<T> and ReadOnlySpan<T> structures and applied them to refactor code for optimal performance.

Span<T> and ReadOnlySpan<T> hold a unique and significant place within the .NET Base Class Library. These types required substantial runtime modifications to deliver performance enhancements in extremely high-performance, critical scenarios. While not everyone may require their capabilities, for those who do, they can be a game-changing tool.

 
