The concept of managed pointer has existed in the .NET runtime and C# since the inception of the platform in the early 2000s. Managed pointers belong mostly to the pointer world, which makes them well suited for performance-critical scenarios. However, unlike regular pointers, extra care from the compiler and the runtime makes their usage safe.
Only recently, from C# 7.0 (2017) to C# 11 (2022), did the .NET engineers improve the runtime and the language constructs around managed pointers to unleash their flexibility and performance gain. These improvements mostly rely on more expressions supporting the C# keyword `ref`, which is dedicated to managed pointers.
These new constructs around the keyword `ref` are often not well understood. Many articles talking about the new `ref` features don’t even mention managed pointers. Only language design discussions (which are quite verbose) and a few insider posts (which often have a narrow focus) capture the primary intention. Some scattered Stack Overflow answers also contain interesting remarks. So I decided to write the present article to tell the whole story, with code samples illustrating these recent C# evolutions.
Keep in mind that managed pointer benefits are mostly for CPU-intensive scenarios on low-level constructs, like parsing a large data set. You might not use managed pointers every day, but it is certainly worth knowing about their capabilities.
Managed Pointer
Since the early days of C# and .NET there has been the concept of managed pointer. A managed pointer is like a pointer except that the runtime keeps track of it, and thus it doesn’t require an `unsafe` scope. The code below shows one managed pointer that points to an integer on the thread stack and another managed pointer that points to a `string` variable. The method changes the pointed integer value and then makes the pointed variable reference another `string` object. When a managed pointer points into the heap and the GC relocates objects (during compaction), the pointed location can change, which leads the runtime to update the pointer. This is why it is qualified as managed.
```csharp
using System.Diagnostics;

int i = 6;
string str = "6str";
AssignNewValues(ref i, ref str);

// The values of the variables i and str on the stack have been modified!
Debug.Assert(i == 7);
Debug.Assert(str == "7str");

static void AssignNewValues(ref int i, ref string str) {
    i = 7;
    str = "7str";
}
```
Interior pointer
The code below shows that a managed pointer can point to field locations nested within the layout of an object, and even to a particular slot within an array. Again, the GC can modify the in-memory location of the object and then update the managed pointers. This kind of managed pointer, pointing inside an object or an array, is known as an interior pointer.
```csharp
using System.Diagnostics;

var foo = new Foo();
AssignNewValues(ref foo.m_I, ref foo.m_Str, ref foo.m_Array[2]);

// The values of foo.m_I, foo.m_Str and foo.m_Array[2] have been modified!
Debug.Assert(foo.m_I == 7);
Debug.Assert(foo.m_Str == "7str");
Debug.Assert(foo.m_Array[2] == 42);

static void AssignNewValues(ref int i, ref string str, ref int j) {
    i = 7;
    str = "7str";
    j = 42;
}

class Foo {
    internal int m_I = 6;
    internal string m_Str = "6Str";
    internal int[] m_Array = new int[4];
}
```
Keeping managed pointer safe
Because managed pointers are not in `unsafe` scopes, the compiler must prevent a managed pointer from pointing to memory that is no longer valid. Take a method `UpdateRef()` with a local variable `j` and a managed pointer `i` that comes from outside the method. The variable `j` lives on the `UpdateRef()` method’s stack. Thus `j` doesn’t exist anymore when `UpdateRef()` returns. As a consequence, the managed pointer `i`, which is located outside the `UpdateRef()` method stack (maybe in the caller method stack), cannot point to the `j` location, which has a narrower escape scope.
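The original illustration is not reproduced here, so the following is only a minimal sketch of what such a method could look like. The signature of `UpdateRef()` (a single `ref int` parameter) is an assumption; the offending line is left commented out so that the rest compiles.

```csharp
using System.Diagnostics;

int value = 6;
UpdateRef(ref value);
Debug.Assert(value == 7);

// Assumed signature: i is a managed pointer received from the caller.
static void UpdateRef(ref int i) {
    int j = 7;   // j lives in the stack frame of UpdateRef()

    // Rejected by the compiler: j has a narrower escape scope than i,
    // so i would dangle as soon as UpdateRef() returns.
    // i = ref j;

    i = j;       // Fine: copies the value of j into the location pointed to by i
}
```

Copying the value is always allowed; only making `i` point to the shorter-lived location is refused.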
Managed pointer vs. managed reference
From now on, keep in mind that:
- A managed pointer can point to anything on the stack or on the heap. When the GC relocates an object, if a managed pointer points to the object or to its interior, the GC updates the managed pointer.
- A managed pointer always lives on the stack. The rationale behind this restriction is that if a managed pointer could live on the heap, GC relocation would require overly complex heuristics, which would go against the performance expectations of managed pointers.
- While managed references point only to objects, managed pointers are more flexible. A managed pointer can point to any kind of memory representation, including:
  - a method local variable
  - a method parameter (in or out)
  - a location on the stack
  - an object
  - a field of an object
  - an element of an array
  - a string or a location within a string
  - an unmanaged memory buffer
- Managed pointers are good for performance because in many scenarios they don’t require any new object to be created on the heap, which would put pressure on the GC.
If you want to dig deeper into managed pointer and interior pointer concepts (discussion about IL code generated, JITed assembly code, GC implication…) you can read this great article written by Konrad Kokosa.
Managed pointers vs. pointers, unsafe, and pinning
Managed pointers belong to the safe world:
- because the GC takes care of updating them when it relocates an object they point to
- and because the compiler prevents situations where a managed pointer could reference invalid memory.
Regular pointers do require the keyword `unsafe` to be used. And when a pointer points to a location in the managed heap, the object that holds the location must be pinned first. Pinning prevents the GC from relocating the object while some pointer work is performed on the object memory representation. Pinning can thus harm performance, which is exactly what we don’t want when we work with pointers.
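```csharp
using System.Numerics;

Vector3 vec = new Vector3(1.1f, 2.2f, 3.3f);
unsafe {
    var buffer = new byte[sizeof(Vector3)];
    // fixed pins the array so that the GC cannot move it while pointer is in use
    fixed (byte* pointer = buffer) {
        // Copy vec into the pinned buffer
        *(Vector3*)pointer = vec;
        // Do anything with the float representation bytes of vec!
    }
}
```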
C# 7.0 ref local and ref return
C# 7.0 extended the usage of the `ref` keyword. A local variable can be a managed pointer, as illustrated by the example below:
```csharp
using System.Diagnostics;

int i = 6;
ref int j = ref i;
j = 7;
Debug.Assert(i == 7);
```
Also a method can return a managed pointer:
```csharp
ref int i = ref GetRef();

static ref int GetRef() {
    int[] arr = new int[6];
    return ref arr[2];
}
```
This example is not trivial. First, notice that the array is allocated on the heap and lives longer than the method `GetRef()` that builds it. So it is fine to return an interior pointer to one of its elements. Second, the returned interior pointer is then the only GC root of the array. The array itself became unreachable from code once the method returned, because there is no way to obtain the array reference from its interior pointer, yet the GC keeps the array alive as long as the interior pointer is in use.
C# 7.2 ref struct and Span<T>
C# 7.2 introduced the notion of `ref struct`, mostly to provide a fast and flexible implementation for `Span<T>`. `Span<T>` lets us work with a contiguous region of arbitrary memory. For example, a `ReadOnlySpan<char>` (the read-only sibling of `Span<char>`, needed here because strings are immutable) can map a substring of a `string` object. While the method `string.Substring(int startIndex, int length)` returns a new `string` object, a substring represented by a span is the memory slice within the `string` itself. Thus working with substrings through `Span<T>` is much more performant because it doesn’t require any new string allocation.
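As a small sketch of the difference, the sample below extracts the same five characters twice: once with `Substring()`, once as a span. The input string and the variable names are made up for illustration.

```csharp
using System;
using System.Diagnostics;

string text = "id=12345;name=foo";

// Substring() allocates a new string object on the heap.
string copy = text.Substring(3, 5);

// AsSpan() only creates a (pointer, length) view over the characters of text.
ReadOnlySpan<char> slice = text.AsSpan(3, 5);

Debug.Assert(copy == "12345");
Debug.Assert(slice.Length == 5);
Debug.Assert(slice[0] == '1');

// Many BCL APIs accept a span directly, so no intermediate string is needed.
int id = int.Parse(slice);
Debug.Assert(id == 12345);
```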
Here is what we will see in this section:
- `Span<T>` internally holds a managed pointer to the memory slice it points to.
- Like all managed pointers, the one held by a `Span<T>` must live on the thread stack. Thus `Span<T>` must also live on the thread stack.
- This is why `ref struct` was created: to force a `Span<T>` to always live on the thread stack through compile-time restrictions.
- Because managed pointers are flexible and can point to virtually anything, `Span<T>` can address any memory scenario through a limited API.
- As an added bonus, living on the stack makes `Span<T>` thread-safe. This fact removes any need for synchronization and, as a consequence, unleashes some additional performance gain.
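To make the "any memory scenario" point concrete, here is a minimal sketch in which one helper works unchanged over a heap array, a stack buffer, and a slice. The helper name `Sum` is hypothetical.

```csharp
using System;
using System.Diagnostics;

int[] onHeap = { 1, 2, 3, 4 };
Span<int> onStack = stackalloc int[] { 10, 20 };

Debug.Assert(Sum(onHeap) == 10);                // int[] converts implicitly to a span
Debug.Assert(Sum(onStack) == 30);               // stack-allocated buffer
Debug.Assert(Sum(onHeap.AsSpan(1, 2)) == 5);    // slice {2, 3}, no copy

// One implementation, whatever memory the span points to.
static int Sum(ReadOnlySpan<int> values) {
    int total = 0;
    foreach (int v in values) {
        total += v;
    }
    return total;
}
```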
Span<T> implementation in .NET 6.0 (and earlier)
I explained `Span<T>` in detail in the post Improve C# code performance with Span<T>, but here I’d like to focus on the link between `Span<T>` and `ref struct`. `Span<T>` is declared as a `ref struct`. In .NET 6.0 (and prior), `Span<T>` internally held a managed pointer to the pointed memory through a field typed with `ByReference<T>` (see this source code here):
```csharp
public readonly ref struct Span<T> {
    /// <summary>A byref or a native ptr.</summary>
    internal readonly ByReference<T> _pointer;
    /// <summary>The number of elements this Span contains.</summary>
    private readonly int _length;
    ...
```
`ByReference<T>` is a `ref struct` declared as `internal`. Thus it couldn’t be used outside of the .NET Base Class Library. Basically, it was an internal trick to hold a managed pointer as a field of a `ref struct`. `ByReference<T>` was only used in the context of `Span<T>` and `ReadOnlySpan<T>`, as shown by an NDepend code query against the .NET 6.0 implementation. `TypedReference`, a low-level runtime helper, is also matched by this query.
Span<T> implementation in .NET 7.0 (and later)
Since C# 11 and .NET 7.0, `Span<T>` can rely on the new ref field C# 11 language construct (explained later in the present article), and its new implementation is now:
```csharp
public readonly ref struct Span<T> {
    /// <summary>A byref or a native ptr.</summary>
    internal readonly ref T _reference;
    /// <summary>The number of elements this Span contains.</summary>
    private readonly int _length;
    ...
```
By the way, one super cool fact not underlined yet is that in this code sample the managed pointer points to a generic `T`! `ref T` can be pretty much anything: a `char` location within a string, an `int` location within an array, a `byte` location within a `stackalloc` buffer. Hats off to the .NET team!
Span<T> implementation in .NET Framework v4.x
The runtime itself was significantly updated to make `Span<T>` faster thanks to its internal managed pointer. The update was significant enough that it was not applied to the .NET Framework 4.x, as explained here by Microsoft engineers: “Fast Span is too fundamental change to be quirklable in reasonable way.”
However, an implementation of `Span<T>` exists for the .NET Framework; it is referred to as slow span. To use it, the NuGet package System.Memory must be referenced.
```csharp
public readonly ref partial struct Span<T> {
    private readonly Pinnable<T> _pinnable;
    private readonly IntPtr _byteOffset;
    private readonly int _length;
    ...
}
```
Span<T> and thread-safety
Finally, the fact that a `Span<T>` always lives on the stack makes it de facto thread-safe: the instance itself can never be shared between threads (the memory it points to still can). This is the opportunity for an additional performance gain. Recall that `Span<T>` has two fields: `_pointer` and `_length`. In a concurrent scenario, modifying several states atomically requires a lock. Without such a lock we end up with the struct tearing phenomenon: the possibility of an inconsistent structure state in a concurrent environment. But since a `Span<T>` instance is confined to a single thread stack, we don’t need such a lock.
Memory<T>: a slower Span<T>
`Span<T>` is fairly flexible and has almost no overhead. A slower form of memory slice is available through the struct `Memory<T>`.
This implementation is conceptually similar to the .NET Fx slow `Span<T>` implementation, with three fields (see above). But since it is a `struct` and not a `ref struct`, a `Memory<T>` can be nested as a field within an object layout on the heap.
```csharp
public readonly struct Memory<T> : IEquatable<Memory<T>> {
    private readonly object? _object;
    private readonly int _index;
    private readonly int _length;
    ...
}
```
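As a sketch of why this matters in practice, a class can keep a `Memory<T>` as a field and only materialize a `Span<T>` when it needs to touch the data. `WindowHolder` is a hypothetical type used only for illustration.

```csharp
using System;
using System.Diagnostics;

byte[] data = { 1, 2, 3, 4 };
var holder = new WindowHolder(data, start: 1, length: 2);
holder.Fill(9);
Debug.Assert(data[0] == 1 && data[1] == 9 && data[2] == 9 && data[3] == 4);

class WindowHolder {
    // Legal: Memory<T> is a regular struct, so it can be a field of a heap object.
    private readonly Memory<byte> _window;

    // Not legal: a Span<T> field in a class is rejected at compile time.
    // private readonly Span<byte> _span;

    public WindowHolder(byte[] buffer, int start, int length) {
        _window = buffer.AsMemory(start, length);
    }

    // The Span<T> only exists for the duration of the call, on the stack.
    public void Fill(byte value) => _window.Span.Fill(value);
}
```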
The stack only restriction of ref struct
Not only was the runtime updated for `Span<T>`, but the language concept of `ref struct` was also introduced. A `ref struct` is a `struct` that can only live on the thread stack. Recall that there are numerous ways to end up with a `struct` instance on the object heap:
```csharp
var myStruct = new MyStruct() { Val = 3 };

// myStruct is boxed as a new object on the heap
object obj = myStruct;

// myStruct is boxed as a new IInterface object on the heap
IInterface ii = myStruct;

// 7x MyStruct instances live on the heap within the array
MyStruct[] arr = new MyStruct[7];

struct MyStruct : IInterface {
    internal int Val { get; init; }
}
interface IInterface { }
class MyClass {
    MyStruct m_Field; // MyStruct lives on the heap within any instance of MyClass
}
struct ParentStruct {
    MyStruct m_Field; // MyStruct lives on the heap when ParentStruct lives on the heap
}
```
All those are prohibited at compile time as soon as `MyStruct` becomes a `ref struct`.
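The sketch below restates those cases against a `ref struct`. `MyRefStruct` is a hypothetical type; every rejected construct is left commented out so that the sample still compiles.

```csharp
using System.Diagnostics;

// OK: a ref struct instance used as a local variable on the stack
var s = new MyRefStruct { Val = 3 };
Debug.Assert(s.Val == 3);

// Each commented line below is rejected by the compiler:
// object obj = s;                              // boxing
// MyRefStruct[] arr = new MyRefStruct[7];      // array element
// System.Func<int> f = () => s.Val;            // capture by a lambda

ref struct MyRefStruct {
    internal int Val { get; init; }
}

// class MyClass { MyRefStruct m_Field; }         // field of a class
// struct ParentStruct { MyRefStruct m_Field; }   // field of a regular struct
```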
A `ref struct` instance can only be used as a local variable, as a method parameter (in or out), and as a field of another `ref struct`:
```csharp
ReadOnlySpan<char> span = "hello".AsSpan().Slice(1, 2);

ref struct SpanHolder<T> {
    Span<T> Span1 { get; }
    Span<T> Span2 { get; }
}
```
I’ve read that `ref struct` was misnamed and that `stackonly struct` would have been better. Those who wrote that missed the primary goal of `ref struct`, which is to hold a managed pointer as a field. In C# the keyword `ref` means managed pointer, so it is perfectly named.
Interestingly enough, I learned about `ref struct` peculiarities when some NDepend users got false positives on the rule Don’t use obsolete types, methods or fields. The compiler tags each `ref struct` with `ObsoleteAttribute` to prevent it from being used by older versions of C# that don’t know about the stack-only restrictions of `ref struct`. Hence the false positives when using a `ref struct`, which are fixed in the next version.
C# 7.2 conditional ref expression
C# 7.2 also added some language facilities to work with managed pointers and the `?:` ternary conditional operator, as illustrated by the code sample below:
```csharp
using System.Diagnostics;
using System.Linq;

var arr1 = new int[] { 0, 1 };
var arr2 = new int[] { 100, 101, 102 };

for (int i = 0; i < arr2.Length; i++) {
    // Points into arr1 while i is a valid arr1 index, into arr2 otherwise
    ref int refToItemInArray = ref (i < arr1.Length ? ref arr1[i] : ref arr2[i]);
    refToItemInArray += 3;
}

Debug.Assert(arr1.SequenceEqual(new int[] { 3, 4 }));
Debug.Assert(arr2.SequenceEqual(new int[] { 100, 101, 105 }));
```
C# 7.2 readonly ref
C# 7.2 also introduced the `in` parameter modifier to prevent modifying the state of a structure passed as an argument. With this modifier the compiler can pass a reference to the structure instance instead of copying it, which is the default behavior. Just passing a reference is safe because the compiler prevents the `in` structure instance from being modified. This is quite a welcome performance gain, especially when working with larger structures.
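A minimal sketch of an `in` parameter follows. `BigStruct` and `Sum` are hypothetical names chosen for illustration.

```csharp
using System.Diagnostics;

var big = new BigStruct { A = 1, B = 2, C = 3, D = 4 };
Debug.Assert(Sum(in big) == 10);

// The struct is passed by reference (no copy) and is read-only inside Sum().
static long Sum(in BigStruct s) {
    // s.A = 42;   // compile error: s is a read-only variable
    return s.A + s.B + s.C + s.D;
}

struct BigStruct {
    public long A, B, C, D;
}
```

The `in` keyword at the call site is optional, but writing it makes the by-reference intent visible to the reader.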
The same benefit applies to a returned structure instance with the C# 7.2 `ref readonly` modifier. A managed pointer is returned instead of copying back the structure instance.
Defensive copies of a structure can also be avoided in both scenarios, passing in and returning, with the C# 7.2 concept of `readonly struct`.
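The sketch below combines a `ref readonly` return with a `readonly struct`. `Point3D` and `PointTable` are hypothetical types used only for illustration.

```csharp
using System.Diagnostics;

var table = new PointTable();

// No copy: first is a read-only managed pointer into the array held by table.
ref readonly Point3D first = ref table.First();
Debug.Assert(first.LengthSquared == 14.0);

// Because Point3D is a readonly struct, calling a member through the
// read-only reference requires no hidden defensive copy.
readonly struct Point3D {
    public Point3D(double x, double y, double z) { X = x; Y = y; Z = z; }
    public double X { get; }
    public double Y { get; }
    public double Z { get; }
    public double LengthSquared => X * X + Y * Y + Z * Z;
}

class PointTable {
    private readonly Point3D[] _points = { new Point3D(1, 2, 3) };

    // Returns a managed pointer to the array element; callers cannot write through it.
    public ref readonly Point3D First() => ref _points[0];
}
```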
See more on this C# 7.2 addition here.
C# 7.3 ref re-assignment
C# 7.3 introduced ref re-assignment illustrated by the short program below:
```csharp
using System.Diagnostics;

var items = new[] {
    new Item { Id = "Item A" },
    new Item { Id = "Item B" }
};

// itemRefX references ItemA
ref var itemRefX = ref GetItem(items, 0);
Debug.Assert(itemRefX.Id == "Item A");

// C# 7.3 ref re-assignment: the managed pointer is updated to ItemB
// The array hasn't been changed
itemRefX = ref GetItem(items, 1);
Debug.Assert(itemRefX.Id == "Item B");
Debug.Assert(items[0].Id == "Item A");
Debug.Assert(items[1].Id == "Item B");

ref var itemRefY = ref GetItem(items, 0);
// No use of the ref keyword below.
// This is not a C# 7.3 ref re-assignment,
// the first item in the array is now ItemB
itemRefY = GetItem(items, 1);
Debug.Assert(ReferenceEquals(items[0], items[1]));
Debug.Assert(items[0].Id == "Item B");
Debug.Assert(items[1].Id == "Item B");

static ref Item GetItem(Item[] item, int id) {
    return ref item[id];
}

class Item {
    public string Id { get; set; }
}
```
Managed pointer arithmetic
The ref re-assignment feature above opens the door to pointer arithmetic on managed pointers. Also, the class System.Runtime.CompilerServices.Unsafe exposes methods to perform pointer arithmetic with managed and unmanaged pointers. It was released with .NET Core 3.0 and is offered as a NuGet package for .NET Fx 4.x.
Such pointer arithmetic is used, for example, in the implementation of `Span<T>.Slice()` (available here), reproduced below. The call to `Unsafe.Add()` relies on pointer arithmetic to move the managed pointer to the offset of the slice start:
```csharp
/// <summary>
/// Forms a slice out of the given span, beginning at 'start'.
/// </summary>
/// <param name="start">The index at which to begin this slice.</param>
/// <exception cref="System.ArgumentOutOfRangeException">
/// Thrown when the specified <paramref name="start"/> index is not in range (&lt;0 or &gt;Length).
/// </exception>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public Span<T> Slice(int start)
{
    if ((uint)start > (uint)_length)
        ThrowHelper.ThrowArgumentOutOfRangeException();

    return new Span<T>(
        ref Unsafe.Add(ref _reference, (nint)(uint)start /* force zero-extension */),
        _length - start);
}
```
More on pointer arithmetic in safe code can be found in this article: Unsafe array access and pointer arithmetics in C#
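For a feel of what this looks like in user code, here is a small sketch that moves a managed pointer across an array with `Unsafe.Add()` and ref re-assignment. The array content and variable names are made up; note that `Unsafe.Add()` performs no bounds check, so staying inside the array is the caller's responsibility.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;

int[] data = { 10, 20, 30, 40 };

ref int first = ref data[0];

// Move the managed pointer two elements forward (no bounds check is performed).
ref int third = ref Unsafe.Add(ref first, 2);
third = 99;
Debug.Assert(data[2] == 99);

// Ref re-assignment (C# 7.3) lets a single managed pointer walk over the array.
ref int cursor = ref data[0];
int sum = 0;
for (int i = 0; i < data.Length; i++) {
    sum += cursor;
    if (i < data.Length - 1) {
        cursor = ref Unsafe.Add(ref cursor, 1);
    }
}
Debug.Assert(sum == 10 + 20 + 99 + 40);
```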
C# 8.0 disposable ref structs
In the C# versions covered here, a `ref struct` cannot implement an interface and thus cannot implement `IDisposable`. However, since C# 8.0 a `ref struct` can dispose of its internal state and the resources it holds through an accessible `void Dispose()` method. Thus the `using` pattern can be applied, and the `Dispose()` method is called when leaving the `using` scope.
```csharp
using System;

using (new MyStruct()) {
    Console.WriteLine("Hello");
}

ref struct MyStruct {
    int i;
    // Found by the using statement through pattern matching, no interface needed
    internal void Dispose() {
        Console.WriteLine("MyStruct disposed");
    }
}
```
C# 11 ref field
Some users asked the .NET runtime team to make the internal `ByReference<T>` trick available to all .NET developers. With a bit of cleverness it was possible to embed a 1-length `Span<T>` to get access to it (see here). A C# 11 ref field does exactly that: it holds a managed pointer within a field. Of course, because of the managed pointer stack requirement, a ref field can only be declared as a field of a `ref struct`.
The C# language specification for ref fields is quite long and verbose, and I hope that with everything explained above you now get the point. Of course, some extra care is needed when it comes to ref fields.
Unlike other `ref` scenarios, with ref fields it is easy to end up with a null managed pointer.
It is possible to test a managed pointer for nullity through helpers within the class `Unsafe` mentioned above.
```csharp
RefStruct local = default;
ref int i = ref local.Value;
if (!System.Runtime.CompilerServices.Unsafe.IsNullRef(ref i)) {
    string str = i.ToString();
}

ref struct RefStruct {
    public ref int Value;
}
```
Let’s explore an interesting use case for ref fields: a linked list whose nodes live on the stack. This example comes from the ref field language design discussion. Note that, to my knowledge, the compiler shipped with C# 11 does not accept a ref field whose type is itself a `ref struct`, so read this sample as a design sketch:
```csharp
using System;
using System.Runtime.CompilerServices;

ref struct StackLinkedListNode<T> {
    T _value;
    ref StackLinkedListNode<T> _next; // This is the ref field!

    public T Value => _value;
    public bool HasNext => !Unsafe.IsNullRef(ref _next);
    public ref StackLinkedListNode<T> Next {
        get {
            if (!HasNext) {
                throw new InvalidOperationException("No next node");
            }
            return ref _next;
        }
    }

    public StackLinkedListNode(T value) {
        this = default;
        _value = value;
    }
    public StackLinkedListNode(T value, ref StackLinkedListNode<T> next) {
        _value = value;
        _next = ref next;
    }
}
```
Of course, `Span<T>` remains the shiniest application of managed pointer flexibility and performance gain. But this short example is a good indication that a significant range of algorithms and runtime constructs can benefit from this new possibility. Just keep in mind that a thread stack size is limited: by default 4MB on 64-bit and 1MB on 32-bit. A `StackOverflowException` could easily pop up from a wrong usage of this `StackLinkedListNode<T>`. And keep in mind too that ref fields are not a stack-only matter: often a `Span<T>` points to a memory slice that belongs to the heap.
readonly ref readonly
A C# 11 ref field can be declared as `ref`, `ref readonly`, `readonly ref`, or `readonly ref readonly`. What? This looks complex but is actually straightforward. The `readonly` effect applies either to the reference or to the referenced value: `readonly ref` means the field cannot be ref re-assigned once initialized, while `ref readonly` means the pointed value cannot be modified through the field.
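The four combinations can be sketched as follows. `FourRefs` and its field names are hypothetical, and the operations the compiler refuses are left commented out so that the sample compiles.

```csharp
using System.Diagnostics;

int x = 1, y = 2;
var refs = new FourRefs(ref x);   // all four fields initially point to x

refs.Plain = 10;                  // writes x through the managed pointer
refs.Plain = ref y;               // ref re-assignment: Plain now points to y
refs.Plain = 20;                  // writes y
refs.ReadOnlyRef = 30;            // allowed: only the reference is read-only, x becomes 30
refs.RefToReadOnly = ref y;       // allowed: only the value is read-only

Debug.Assert(x == 30 && y == 20);
Debug.Assert(refs.RefToReadOnly == 20);
Debug.Assert(refs.Both == 30);

// Rejected by the compiler:
// refs.RefToReadOnly = 5;        // the referenced value is read-only
// refs.ReadOnlyRef = ref y;      // the reference is read-only
// refs.Both = 5;                 // both are read-only
// refs.Both = ref y;

ref struct FourRefs {
    public ref int Plain;                      // ref
    public ref readonly int RefToReadOnly;     // ref readonly
    public readonly ref int ReadOnlyRef;       // readonly ref
    public readonly ref readonly int Both;     // readonly ref readonly

    public FourRefs(ref int target) {
        Plain = ref target;
        RefToReadOnly = ref target;
        ReadOnlyRef = ref target;
        Both = ref target;
    }
}
```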
As we saw earlier, since C# 7.2 the compiler is smart enough to prevent some unnecessary copies thanks to the `readonly` usage.
C# 11 structure returning a managed pointer to its field
With C# 11, a structure (not necessarily a `ref struct`) can now return a managed pointer to its field. This example of a frugal list is also proposed in the ref field language design discussion:
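```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;

// Usage: the indexer returns a managed pointer into the local struct itself
var list = new FrugalList<int>();
list[1] = 42;
Debug.Assert(list[1] == 42);

struct FrugalList<T> {
    private T _item0;
    private T _item1;
    private T _item2;

    public ref T this[int index] {
        // [UnscopedRef] allows returning a reference to a field of this struct
        [UnscopedRef] get {
            switch (index) {
                case 0: return ref _item0;
                case 1: return ref _item1;
                case 2: return ref _item2;
                default: throw new ArgumentOutOfRangeException(nameof(index));
            }
        }
    }
}
```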
C# 11 scoped keyword
The new keyword `scoped` defines a compile-time guarantee about the lifetime of managed pointers.
`scoped` is like a contract that a method shows to its callers. It guarantees that a managed pointer passed as a parameter won’t escape the scope of the method.
Conversely, when a method doesn’t declare its parameter `scoped`, the compiler must assume that the managed pointer may escape, for example through the returned value, and it rejects calls that would let a pointer to a local variable outlive that variable.
Finally, the keyword `scoped` can also be used on a local managed pointer to prevent it from escaping.
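Here is a minimal sketch of that contract, assuming .NET 7 or later for the `Span<T>(ref T)` constructor. All method names are hypothetical, and the lines the compiler refuses are commented out.

```csharp
using System;
using System.Diagnostics;

int[] heap = { 7 };
Span<int> wrapped = Wrap(ref heap[0]);   // fine: the array element outlives the call
Debug.Assert(wrapped[0] == 7);
Debug.Assert(Caller()[0] == 42);
Debug.Assert(LocalDemo()[0] == 1);

// Without scoped, the compiler must assume the returned span may capture value.
static Span<int> Wrap(ref int value) => new Span<int>(ref value);

// With scoped, the method promises that value does not escape.
static Span<int> CopyIncremented(scoped ref int value) {
    // return new Span<int>(ref value);   // compile error: value is scoped
    return new int[] { value + 1 };       // a fresh heap array, nothing escapes
}

static Span<int> Caller() {
    int local = 41;
    // return Wrap(ref local);            // compile error: local could escape
    return CopyIncremented(ref local);    // OK thanks to scoped
}

static Span<int> LocalDemo() {
    scoped Span<int> temp = new int[] { 1 };
    // return temp;                       // compile error: temp is scoped to this method
    return new int[] { temp[0] };
}
```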
Conclusion
Since the inception of .NET, managed pointers have been here, awaiting some runtime and language improvements to unleash their flexibility and performance gain in safe user code. A few points are still being discussed by language designers, but with .NET 7.0 and C# 11 we now get the bulk of these improvements. Let’s recap the key points:
- A managed pointer sits in between a pointer and an object reference.
- A managed pointer always lives on the thread stack. There is no risk that it gets relocated by the GC.
- A managed pointer can point to a location within an object on the heap (like an element of an `int[]` or a `char` position within a `string`). This is known as an interior pointer. The GC updates the interior pointer when it relocates the object. Thus, unlike regular pointers, managed pointers belong to the safe world.
- Since C# 7.0 the C# keyword `ref`, which is the keyword for managed pointers, has been used in an increasing number of scenarios. The primary motivation was to obtain a fast and generic memory slice implementation based on managed pointers through `Span<T>`. `Span<T>` holds a managed pointer to the pointed memory slice. This makes it faster than something like `Memory<T>`.
- Since a managed pointer must live on the stack, the concept of `ref struct` was added to C# to prevent `Span<T>` from living anywhere other than the stack.
- A managed pointer can point to any kind of memory representation. This flexibility was an important point when designing a generic memory helper like `Span<T>`.
- C# 11 opened up to all developers what allowed `Span<T>` to hold a managed pointer as a field. This is the new ref field language construct.
Now you can improve the performance of some of your low-level algorithms by re-implementing them with managed pointers.
References
- C# 11 Language design: Low Level Struct Improvements (August 2022)
- Runtime implementation
- Managed pointers in .NET by Konrad Kokosa (January 2019)
- Span by Adam Sitnik (July 2017)
- Unsafe array access and pointer arithmetics in C# by Nicolas Portmann (January 2019)
- Improve C# code performance with Span<T> by Patrick Smacchia (February 2022)
- ref structs in C# 7.2 – .NET Concept of the Week by Gergely Kalapos (August 2018)
- Stackoverflow question: What is ref struct in definition site (January 2018)
Good article overall so far. I’m looking for something i can point less experienced engineers at. It is likely somewhat ill advised commenting before finishing the read, but I’ll take the risk for now.
On one hand i appreciate the fact that you are going with the orthodox line that “Pinning objects is not advised as it can affect performance”… as it even says it is bad in the CLR/GC c++ source code on github.
That said, as someone who has used unsafe pointers on high performance/throughput and distributed routines in .NET in anger most of his career (less though these days thanks to all the new stuff), I’d say that whilst it is possible to affect performance as stated, you really have to do something pretty stupid or generally ignorant to make that particular predicted outcome a reality, especially given how GC actually works internally (see github/dotnet/runtime in-repo doco/code and a certain 2018 Apress published book for further detail). I guess i’m stating that maybe the message should be more along the lines of ‘this is the official line, but so long as you only use unsafe pointers if all the other options have been exhausted first, then unless you are doing anything particularly esoteric or otherwise ill advised, you shouldn’t have any issues, and you can always profile your changes to verify too.’
I guess I simply dislike the blanket “it’s bad! ‘k?”, when I know that it patently isn’t 95% of the time…