NDepend


Improve C# code performance with Span<T>

C# 7.2 introduced the structure System.Span<T>. First we’ll present a concrete example where Span<T> helps achieve better performance. Then we’ll explain what makes Span<T> so special.

The primary goal of Span<T> is to avoid allocating new objects on the heap when one needs to work with a contiguous region of arbitrary memory. The performance gain is twofold:

  • A) no heap allocation is performed.
  • B) there is less pressure on the Garbage Collector (GC), since it doesn’t need to track objects that were never allocated.
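To make point A concrete, here is a minimal sketch that uses GC.GetAllocatedBytesForCurrentThread() to compare slicing a string with Substring() and with AsSpan(). The class SliceAllocationDemo and its methods are hypothetical names made up for this illustration. Substring() has to create a new string object, while AsSpan() only builds a small view on the existing characters.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the original benchmark.
public static class SliceAllocationDemo
{
    // Bytes allocated on the GC heap by the current thread while taking a Substring().
    public static long BytesAllocatedBySubstring(string text, int start, int length)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        string slice = text.Substring(start, length);          // copies the characters into a new string object
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(slice);
        return after - before;
    }

    // Same measurement when slicing with AsSpan(): the slice is a view on the existing string.
    public static long BytesAllocatedBySpan(string text, int start, int length)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        ReadOnlySpan<char> slice = text.AsSpan(start, length); // no copy, the span itself lives on the stack
        long after = GC.GetAllocatedBytesForCurrentThread();
        return slice.Length == length ? after - before : -1;   // use the slice so it is not optimized away
    }
}
```

Called with ("163,496,691,1729", 4, 3), the first method reports a small positive number of bytes for the new "496" string, while the second should typically report no allocation at all.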

Using Span<T> to parse a comma separated uint string

Let’s use Span<T> to obtain an array of uint from the string "163,496,691,1729".

  • Without Span<T>, one would use "163,496,691,1729".Split(','). This call allocates four strings and an array to reference them. Then uint.Parse(string) is used to parse each sub-string.
  • With Span<T> (actually with ReadOnlySpan<char>, because the content of a string is immutable), the input string gets sliced into four spans. Because ReadOnlySpan<T> is a structure, each span is concretely just a few bytes on the current thread stack. Stack allocation is super fast and the GC is not impacted by values allocated on the stack. Then uint.Parse(ReadOnlySpan<char>) is used to parse each slice.

The method uint.Parse(ReadOnlySpan<char>) shows that it’s not just about Span<T> itself: many .NET APIs related to string and other memory representations have been extended to accept Span<T> or ReadOnlySpan<T>.
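Both approaches can be sketched as follows. The method names GetUIntArrayWithSplit() and GetUIntArrayWithSpan() are those used in the benchmark discussed below, but these bodies are an assumption of how such methods could look, not the author's code. Like the benchmark, they expect a pre-allocated uint[] arrayToFill of the right length.

```csharp
using System;

// Hypothetical sketch: method names follow the article, bodies are assumptions.
public static class CommaSeparatedParsing
{
    // Split() allocates one string per value, plus the string[] that references them.
    public static void GetUIntArrayWithSplit(string input, uint[] arrayToFill)
    {
        string[] parts = input.Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            arrayToFill[i] = uint.Parse(parts[i]);
        }
    }

    // Slicing a ReadOnlySpan<char> allocates nothing: each slice is a small value on the stack.
    public static void GetUIntArrayWithSpan(string input, uint[] arrayToFill)
    {
        ReadOnlySpan<char> remaining = input.AsSpan();
        int index = 0;
        while (true)
        {
            int comma = remaining.IndexOf(',');
            if (comma < 0)
            {
                arrayToFill[index] = uint.Parse(remaining); // last value, no trailing comma
                return;
            }
            arrayToFill[index++] = uint.Parse(remaining.Slice(0, comma));
            remaining = remaining.Slice(comma + 1);         // skip the comma
        }
    }
}
```

For the input "163,496,691,1729", both methods fill arrayToFill with 163, 496, 691 and 1729; only the first one allocates strings along the way.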

Here are pseudo code and diagrams that summarize both approaches:


Benchmarking Span<T> performance gain

Below is the complete code that can be pasted in a .NET 6 Program.cs source file. The NuGet package BenchmarkDotNet needs to be referenced. Here is the GitHub project BenchmarkDotNet. Before digging into the BenchmarkDotNet results, let’s note that:

  • A third method, GetUIntArrayWithAstuteParsing(), shows an optimized way to parse "163,496,691,1729" without needing Span<T>.
  • In the real world, the number of uint values in the comma-separated input string wouldn’t be known upfront. A List<uint> would be used to store the parsed values until we get them all. But here we want to highlight that Span<T> makes no allocation. Thus, to avoid cluttering the results, the uint[] arrayToFill is pre-allocated with the proper length.
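The code of GetUIntArrayWithAstuteParsing() isn’t reproduced here, so, as an assumption, here is one possible hand-written version. It walks the characters once and accumulates the digits itself, so it needs neither Split(), nor Span<T>, nor uint.Parse(). Unlike uint.Parse(), this sketch assumes well-formed input (ASCII digits separated by single commas) and performs no overflow or format checks.

```csharp
// Hypothetical sketch of a dedicated parser; not the author's implementation.
public static class AstuteParsing
{
    // Assumes well-formed input: ASCII digits separated by single commas, no overflow checks.
    public static void GetUIntArrayWithAstuteParsing(string input, uint[] arrayToFill)
    {
        int index = 0;
        uint current = 0;
        foreach (char c in input)          // foreach over a string compiles to an indexed loop, no allocation
        {
            if (c == ',')
            {
                arrayToFill[index++] = current;
                current = 0;
            }
            else
            {
                current = current * 10 + (uint)(c - '0'); // append one decimal digit
            }
        }
        arrayToFill[index] = current;      // the last value has no trailing comma
    }
}
```

To reproduce the kind of measurements discussed below, such methods can be called from public instance methods marked with BenchmarkDotNet’s [Benchmark] attribute, in a class annotated with [MemoryDiagnoser] so that allocations are reported next to durations.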

For each case, BenchmarkDotNet measures both memory allocation and duration. Here is how it presents the results:

  • GetUIntArrayWithAstuteParsing() is the fastest approach and allocates nothing. The performance gain comes from the fact that we wrote our own dedicated uint parsing implementation. This shows well that, despite the new goodies in the framework, the best performance is often achieved with a well-thought-out algorithm.
  • GetUIntArrayWithSpan() is 38% faster than GetUIntArrayWithSplit(). This is great, but the essential saving is that nothing gets allocated. In a real-world scenario where this method would be used to parse millions of uint values, a lot of GC pressure would be saved.

Understand what makes Span<T> special

Most of the Span<T> articles I have read stop here: we now have a cool new way to avoid allocating sub-strings.

But the key point is that the super-performant implementation of Span<T> required heavy changes to the runtime. Let’s explain what happened.

Span<T> has a special relation with the GC

The Span<T> source code shows that it contains two fields, _pointer and _length.

The _length value is a number of elements: it is internally multiplied by sizeof(T) to obtain the end address of the slice. Thus the slice in memory is the byte range [_pointer, _pointer + _length*sizeof(T)), the end address being excluded.
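The internal _pointer field isn’t accessible, but the public MemoryMarshal.GetReference() method and the Unsafe.Add()/Unsafe.ByteOffset() helpers make it possible to observe the same arithmetic. In this sketch (SliceRangeDemo is a hypothetical name), the reference to the first element stands in for _pointer, and the byte distance to the end of the slice equals _length*sizeof(T).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; it only uses public APIs, not Span<T>'s internal fields.
public static class SliceRangeDemo
{
    // Size in bytes of the memory range covered by the span, computed from element references.
    public static long ByteLengthOf<T>(Span<T> span)
    {
        ref T start = ref MemoryMarshal.GetReference(span);   // stands in for _pointer
        ref T end = ref Unsafe.Add(ref start, span.Length);   // _pointer + _length elements (end excluded)
        return (long)Unsafe.ByteOffset(ref start, ref end);   // equals _length * sizeof(T)
    }
}
```

For example, for new int[10].AsSpan(2, 5) the method returns 20, that is 5 elements of 4 bytes each.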

_pointer is typed with the special structure ByReference<T>. This structure is special because it represents a pointer carefully handled by the GC. In our scenario, _pointer points to an offset position within a string object. When compacting memory, the GC might move the string to a different address, and it would then update _pointer accordingly. ByReference<T> is not a public API. Because ByReference<T> and Span<T> have a special relation with the GC, they cannot live on the heap managed by the GC. They are stack-only values: they can only live on the thread stack. A Span<T> can be passed as a method parameter and can be returned by a method, but it cannot, for example, be a field of a class.

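A fortunate consequence of being stack-only is that a Span<T> instance belongs to a single thread. This makes Span<T> de facto thread-safe, in the sense that the span value itself is never shared between threads (the memory it points to can still be accessed by other threads).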

ref struct restrictions

As we saw, a whole range of restrictions applies to Span<T> to guarantee it never ends up on the heap. This is why it is declared as a ref struct and not as a plain struct: the ref struct modifier tells the compiler to apply these restrictions.


Here are more restrictions that prevent a ref struct value from ending up on the heap at runtime:

  • A ref struct can’t be the element type of an array.
  • A ref struct can’t be the declared type of a field of a class or a struct. However, a ref struct can be the type of a field of another ref struct. This is illustrated by the field ByReference<T> _pointer declared within Span<T>.
  • A ref struct can’t implement interfaces.
  • A ref struct can’t be boxed to System.ValueType or System.Object.
  • A ref struct can’t be a type argument.
  • A ref struct variable can’t be captured by a lambda expression or a local function.
  • A ref struct variable can’t be used in an async method. However, you can use ref struct variables in synchronous methods, for example, in those that return Task or Task<TResult>.
  • A ref struct variable can’t be used in iterators.
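Here is a sketch of a custom ref struct that iterates over the values of a comma-separated string (CommaTokenizer is a hypothetical type, not a .NET API). It is allowed to hold a ReadOnlySpan<char> field precisely because it is itself a ref struct. The commented-out lines at the end show declarations that the compiler rejects because of the restrictions listed above.

```csharp
using System;

// Hypothetical type for illustration, not a .NET API.
// Being a ref struct, it may itself hold a ReadOnlySpan<char> field.
public ref struct CommaTokenizer
{
    private ReadOnlySpan<char> _remaining; // part of the input not consumed yet
    private bool _done;

    public CommaTokenizer(ReadOnlySpan<char> input)
    {
        _remaining = input;
        _done = false;
    }

    // Returns the next comma-separated token as a slice of the input: no sub-string is allocated.
    public bool TryGetNext(out ReadOnlySpan<char> token)
    {
        if (_done)
        {
            token = default;
            return false;
        }

        int comma = _remaining.IndexOf(',');
        if (comma < 0)
        {
            token = _remaining;            // last token, no trailing comma
            _remaining = default;
            _done = true;
        }
        else
        {
            token = _remaining.Slice(0, comma);
            _remaining = _remaining.Slice(comma + 1);
        }
        return true;
    }
}

// Each of these declarations is rejected by the compiler, per the restrictions above:
// class Holder { ReadOnlySpan<char> _field; }           // ref struct as field type of a class
// object boxed = new CommaTokenizer(default);           // boxing to System.Object
// CommaTokenizer[] tokenizers;                          // ref struct as array element type
// System.Collections.Generic.List<CommaTokenizer> list; // ref struct as type argument
```

Looping with while (tokenizer.TryGetNext(out ReadOnlySpan<char> token)) and calling uint.Parse(token) on each token parses "163,496,691,1729" without allocating any sub-string.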

Interestingly enough, I learned about ref struct peculiarities when some NDepend users got false positives on the rule Don’t use obsolete types, methods or fields. The compiler tags ref struct types with ObsoleteAttribute to prevent them from being used by older versions of C# that don’t know about the stack-only restrictions of ref struct. Hence the false positives when using ref struct types, an issue that is fixed for the next version.

No restriction with Memory<T>

The struct Memory<T> is similar to Span<T> but without the ref struct restrictions. It can be used as a field of a class, for example. As a consequence, Memory<T> doesn’t have this special relation with the GC and is a bit less performant. This performance loss comes from its implementation having three fields instead of two: instead of a special ByReference<T> pointer, Memory<T> needs to reference the _object and then the _index within that object, in addition to its _length.

I wanted to benchmark the comma-separated string code above with Memory<T> but realized that there is no uint.Parse(ReadOnlyMemory<char>) API, which suggests Memory<T> didn’t get as much love as Span<T>.
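Memory<T> doesn’t suffer from the ref struct restrictions. In this sketch (CommaSeparatedValues is a hypothetical class), ReadOnlyMemory<char> slices are stored in a field of a class and even used as a type argument of List<T>, two things a ReadOnlySpan<char> can’t do. Since no uint.Parse() overload accepts a ReadOnlyMemory<char>, the code goes through its Span property to get a ReadOnlySpan<char> at the point of parsing. Slicing the memory doesn’t copy characters, although the List<T> itself is of course a heap allocation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only.
public sealed class CommaSeparatedValues
{
    // Allowed: ReadOnlyMemory<T> is a regular struct, so it can be a field of a class
    // and a type argument of List<T>.
    private readonly List<ReadOnlyMemory<char>> _tokens = new();

    public CommaSeparatedValues(string text)
    {
        ReadOnlyMemory<char> remaining = text.AsMemory();
        int comma;
        // Slicing a ReadOnlyMemory<char> does not copy the characters.
        while ((comma = remaining.Span.IndexOf(',')) >= 0)
        {
            _tokens.Add(remaining.Slice(0, comma));
            remaining = remaining.Slice(comma + 1);
        }
        _tokens.Add(remaining); // last value, no trailing comma
    }

    public int Count => _tokens.Count;

    // No uint.Parse() overload takes a ReadOnlyMemory<char>: go through the Span property.
    public uint ParseAt(int index) => uint.Parse(_tokens[index].Span);
}
```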

Span<T> and the .NET Framework

Because Span<T> and ByReference<T> imply significant changes to the runtime and its GC, they were not ported to the .NET Framework. They are only available on the .NET Core runtime since version 2.1 (and thus on .NET 5, .NET 6…). Here is how Microsoft engineers put it in a discussion about it: “Fast Span is too fundamental change to be quirklable in reasonable way.”

However, an implementation of Span<T> exists for the .NET Framework; it is referred to as slow span. To use it, the NuGet package System.Memory must be referenced. This implementation is similar to the Memory<T> implementation, with three fields:

Also, when referencing the System.Memory package from a .NET Framework project, you won’t get APIs like uint.Parse(ReadOnlySpan<char>), which makes it less attractive.

Span<T> API

As we saw, Span<T> is especially suitable to improve performance when working with strings, because it provides a way to manipulate sub-strings without any allocation. However, Span<T> is a generic type and can be used with any kind of element, such as byte. The complete Span<T> API, including extension methods, is huge because of all the overloads. Here is a simplified API below:

Notice above the unsafe constructor that takes a void* pointer. Span<T> can work on any kind of memory, including unmanaged memory. It thus offers a simpler way to work with pointers and unmanaged memory.
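To illustrate a few members of that API on an element type other than char, here is a sketch with Span<byte>. BuildFrame() and its frame layout (a length byte, the payload, then an XOR checksum byte) are hypothetical, made up for this example. The buffer comes from stackalloc, which yields memory that is not a managed array and can be assigned to a Span<byte> in safe code. The void* constructor itself requires an unsafe context and is not shown here.

```csharp
using System;

// Hypothetical example; the frame layout is made up for illustration.
public static class SpanByteDemo
{
    // Builds [length byte][payload][XOR checksum byte] in stack memory,
    // then copies the used part into a new array for the caller.
    public static byte[] BuildFrame(ReadOnlySpan<byte> payload)
    {
        if (payload.Length > 62)
            throw new ArgumentException("Payload too large for this sketch.", nameof(payload));

        // stackalloc into a Span<byte> is allowed in safe code; this memory is not a managed array.
        Span<byte> buffer = stackalloc byte[64];

        buffer[0] = (byte)payload.Length;      // bounds-checked indexer
        payload.CopyTo(buffer.Slice(1));       // Slice + CopyTo: no intermediate allocation

        byte checksum = 0;
        foreach (byte b in payload)            // foreach over a span does not allocate
            checksum ^= b;
        buffer[payload.Length + 1] = checksum;

        return buffer.Slice(0, payload.Length + 2).ToArray(); // ToArray() is the only heap allocation
    }
}
```

For the payload { 1, 2, 3 }, BuildFrame() returns { 3, 1, 2, 3, 0 }: the length, the payload, then the XOR of the payload bytes (1 ^ 2 ^ 3 = 0).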

However, what makes Span<T>/ReadOnlySpan<T> shine is that more than 5,000 methods in the .NET API use them. This is shown by the screenshot below, taken from NDepend analyzing the .NET 6 framework in the directory C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.2:

Span used everywhere in .NET API

Span<T> vs. Array

At this point, one might wonder how Span<T> differs from arrays, and especially from ArraySegment<T>.

  • Span<T> has a special relation with the GC that makes it more performant than ArraySegment<T> in stack-only scenarios.
  • ArraySegment<T> is limited to managed memory while, as we saw, Span<T> can also handle unmanaged memory.
  • ArraySegment<T> doesn’t provide a read-only view while ReadOnlySpan<T> does.
  • The confusion between Span<T> and arrays comes from the fact that Span<T> is a view on some data, and most of the time this data is represented through an array. So arrays are still needed: Span<T> is just a convenient view on them.
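The “view” nature of spans can be shown in a few lines. In this sketch (SpanViewDemo is a hypothetical name), writing through a Span<int> slice modifies the underlying array, and a ReadOnlySpan<char> views the characters of a string directly, something an ArraySegment<char> can’t do because a string is not a char[].

```csharp
using System;

// Hypothetical helpers for illustration only.
public static class SpanViewDemo
{
    // The span doesn't own any data: writes go straight to the memory it views.
    public static void DoubleInPlace(Span<int> values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            values[i] *= 2;
        }
    }

    // A ReadOnlySpan<char> can view a string directly, but writing through it doesn't compile:
    // text[0] = 'x'; // error: the ReadOnlySpan<T> indexer is read-only
    public static int CountDigits(ReadOnlySpan<char> text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c >= '0' && c <= '9') count++;
        }
        return count;
    }
}
```

With int[] numbers = { 1, 2, 3, 4, 5 }, calling DoubleInPlace(numbers.AsSpan(1, 3)) leaves numbers equal to { 1, 4, 6, 8, 5 }: only the viewed elements change.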

Conclusion

Span<T>/ReadOnlySpan<T> are special APIs that required heavy runtime modifications to offer improvements in highly performance-critical scenarios. Not everyone needs their power, but for those who do, they are a game changer.

 

My dad being an early programmer in the 70's, I was fortunate to switch from playing with Lego to programming my own micro-games when I was still a kid. Since then, I have never stopped programming.

I graduated in Mathematics and Software engineering. After a decade of C++ programming and consultancy, I got interested in the brand new .NET platform in 2002. I had the chance to write the best-selling book (in French) on .NET and C#, published by O'Reilly, and also managed some academic and professional courses on the platform and C#.

Over my consulting years, I built expertise in the architecture, the evolution and the maintenance challenges of large and complex real-world applications. It seemed like spaghetti code and entangled monolithic legacy concerned every sufficiently large team. As a consequence, I got interested in static code analysis and started the project NDepend.

Today, with more than 12,000 client companies, including many of the Fortune 500, NDepend offers deeper insight and full control over their applications to a wide range of professional users around the world.

I live with my wife and our twin kids Léna and Paul in the beautiful island of Mauritius in the Indian Ocean.
