.NET 9 brings significant improvements to LINQ performance, with some scenarios showing remarkable gains. Let’s take a closer look at what’s driving these enhancements. The lessons learned are relevant to your own code.
Iterating with Span<T> when Possible
Let’s start by running this benchmark on .NET 8 versus .NET 9.
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
[HideColumns("StdDev", "Median", "Job", "RatioSD", "Error", "Gen0", "Alloc Ratio")]
[SimpleJob(RuntimeMoniker.Net80, baseline: true)]
[SimpleJob(RuntimeMoniker.Net90)]
public class Benchmarks
{
    private IEnumerable<int> _array = Enumerable.Range(1, 10_000).ToArray();

    [Benchmark] public int  LinqCount()  => _array.Count(i => i > 0);
    [Benchmark] public bool LinqAll()    => _array.All(i => i > 500);
    [Benchmark] public bool LinqAny()    => _array.Any(i => i == 9_999);
    [Benchmark] public int  LinqFirst()  => _array.First(i => i > 9_000);
    [Benchmark] public int  LinqSingle() => _array.Single(i => i == 9_999);
    [Benchmark] public int  LinqLast()   => _array.Last(i => i > 0);
}
This project file lets the benchmark run against both runtimes:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net8.0;net9.0</TargetFrameworks>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.14.0" />
  </ItemGroup>
</Project>
Here are the results, which clearly speak for themselves.
| Method     | Runtime  | Mean          | Ratio | Allocated |
|----------- |--------- |--------------:|------:|----------:|
| LinqCount  | .NET 8.0 | 16,198.490 ns |  1.00 |      32 B |
| LinqCount  | .NET 9.0 |  3,043.563 ns |  0.19 |         - |
|            |          |               |       |           |
| LinqAll    | .NET 8.0 |     10.588 ns |  1.00 |      32 B |
| LinqAll    | .NET 9.0 |      2.562 ns |  0.24 |         - |
|            |          |               |       |           |
| LinqAny    | .NET 8.0 | 17,096.735 ns |  1.00 |      32 B |
| LinqAny    | .NET 9.0 |  2,483.927 ns |  0.15 |         - |
|            |          |               |       |           |
| LinqFirst  | .NET 8.0 | 15,289.747 ns |  1.00 |      32 B |
| LinqFirst  | .NET 9.0 |  2,243.341 ns |  0.15 |         - |
|            |          |               |       |           |
| LinqSingle | .NET 8.0 | 21,684.114 ns |  1.00 |      32 B |
| LinqSingle | .NET 9.0 |  4,884.329 ns |  0.23 |         - |
|            |          |               |       |           |
| LinqLast   | .NET 8.0 |     15.967 ns |  1.00 |         - |
| LinqLast   | .NET 9.0 |      6.918 ns |  0.43 |         - |
The TryGetSpan() Method
In the post C# Array and List Fastest Loop, we demonstrated that iterating over an array through a Span<T> is faster than regular for and foreach loops. In the benchmark above, the performance improvement is primarily due to the method TryGetSpan(): if the enumerable being iterated is an array or a list, TryGetSpan() returns a ReadOnlySpan<T> over its elements for faster iteration. Here is the code extracted from TryGetSpan() that tests whether the source to enumerate is an array or a list, and then obtains the span from it.
if (source.GetType() == typeof(TSource[]))
{
    span = Unsafe.As<TSource[]>(source);
}
else if (source.GetType() == typeof(List<TSource>))
{
    span = CollectionsMarshal.AsSpan(Unsafe.As<List<TSource>>(source));
}
To me, this code does not look optimized:

- source.GetType() is called twice!
- Why not cast source to TSource[] or List<TSource> only once, test the obtained reference for null, and then use it?
This code was written by Stephen Toub and his team, who are THE .NET performance experts. They have a deep understanding of the C# compiler and JIT optimizations, so it is safe to assume this approach is the optimal one. The good news is that you can reuse this pattern in your own performance-critical code. And there is a lesson: in today’s highly optimized .NET stack, micro-optimizations are not obvious at all. The advice to avoid premature optimization has never been more relevant.
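Since the real helper is internal to System.Linq, here is a minimal sketch of how you could reuse the same pattern in your own hot paths. The helper name SpanHelpers.TryGetSpan is ours, not the BCL’s, and this sketch only handles the array and list cases:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class SpanHelpers
{
    // Sketch: expose the contiguous storage of an array or a List<T> as a read-only span.
    // Returns false for any other IEnumerable<T> implementation.
    public static bool TryGetSpan<T>(IEnumerable<T> source, out ReadOnlySpan<T> span)
    {
        if (source.GetType() == typeof(T[]))
        {
            span = Unsafe.As<T[]>(source);
            return true;
        }
        if (source.GetType() == typeof(List<T>))
        {
            span = CollectionsMarshal.AsSpan(Unsafe.As<List<T>>(source));
            return true;
        }
        span = default;
        return false;
    }
}

A caller can then iterate the span when it is available and fall back to a regular foreach otherwise.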
One final note: List<TSource> internally references an array. When the list’s capacity needs to grow or shrink, a new array is created and referenced in place of the old one. The call CollectionsMarshal.AsSpan(Unsafe.As<List<TSource>>(source)) retrieves a Span<TSource> over this internal array. Do you see the risk? If the list’s capacity changes, the span obtained through this method becomes invalid: it still points to the old array.
Definitely, the class System.Runtime.CompilerServices.Unsafe is well-named.
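Here is a small illustration of that risk (our own example, not BCL code): after the list grows, the span still points to the list’s abandoned backing array.

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var list = new List<int> { 1, 2, 3, 4 };

// Span over the list's current backing array.
Span<int> span = CollectionsMarshal.AsSpan(list);

// Force the list to grow: a new, larger backing array is allocated and the items are copied into it.
for (int i = 0; i < 100; i++)
    list.Add(i);

list[0] = 42;

// The span still reads from the old array.
Console.WriteLine(span[0]); // 1
Console.WriteLine(list[0]); // 42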
TryGetSpan() Callers
Now, let’s examine which methods call TryGetSpan(). Using NDepend, we scanned the assembly located at C:\Program Files\dotnet\shared\Microsoft.NETCore.App\9.0.0-rc.1.24431.7\System.Linq.dll. From the TryGetSpan() method, we generated a code query to identify both direct and indirect callers, then exported the 56 matched methods to the dependency graph. This analysis reveals that many standard Enumerable methods attempt to iterate over a span when the collection is an array or a list.
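Note that this fast path requires the runtime type of the source to be exactly an array or a List<T>; a lazily produced sequence keeps going through IEnumerator<T>. A quick sketch (our own example):

int[] array = Enumerable.Range(1, 10_000).ToArray();
List<int> list = array.ToList();
IEnumerable<int> lazy = Enumerable.Range(1, 10_000).Where(i => i != 0);

// The first two calls can iterate a span over contiguous memory;
// the last one falls back to a regular enumerator walk.
Console.WriteLine(array.Count(i => i > 0)); // 10000
Console.WriteLine(list.Count(i => i > 0));  // 10000
Console.WriteLine(lazy.Count(i => i > 0));  // 10000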
However, since holding on to the internal array of a list obtained via CollectionsMarshal.AsSpan() is not safe (as mentioned earlier), the Enumerable operations that defer iteration (for example, those implemented with the yield C# keyword) cannot rely on this optimization.
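To see why, remember that a deferred operator only reads its source when it is enumerated, so a span captured at query-creation time could already be stale. A small illustration (our own example):

var list = new List<int> { 1, 2, 3 };

// Deferred: nothing is read from the list yet.
IEnumerable<int> query = list.Where(i => i > 1);

// The list grows before enumeration, so its backing array is replaced.
for (int i = 4; i <= 100; i++)
    list.Add(i);

// Enumeration happens now and must see the current backing array.
// This is why a deferred iterator holds the List<int> itself rather than a span captured earlier.
Console.WriteLine(query.Count()); // 99 (the values 2 to 100)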
Specialized Iterators
Now let’s run the following benchmark, found in this PR: Consolidate LINQ’s internal IIListProvider/IPartition into base Iterator class
using Perfolizer.Horology;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
[HideColumns("StdDev", "Median", "Job", "RatioSD", "Error", "Gen0", "Alloc Ratio")]
[SimpleJob(RuntimeMoniker.Net80, baseline: true)]
[SimpleJob(RuntimeMoniker.Net90)]
public class Benchmarks
{
    private IEnumerable<int> _arrayDistinct = Enumerable.Range(0, 1000).ToArray().Distinct();
    private IEnumerable<int> _appendSelect = Enumerable.Range(0, 1000).ToArray().Append(42).Select(i => i * 2);
    private IEnumerable<int> _rangeReverse = Enumerable.Range(0, 1000).Reverse();
    private IEnumerable<int> _listDefaultIfEmptySelect = Enumerable.Range(0, 1000).ToList().DefaultIfEmpty().Select(i => i * 2);
    private IEnumerable<int> _listSkipTake = Enumerable.Range(0, 1000).ToList().Skip(500).Take(100);
    private IEnumerable<int> _rangeUnion = Enumerable.Range(0, 1000).Union(Enumerable.Range(500, 1000));
    private IEnumerable<int> _selectWhereSelect = Enumerable.Range(0, 1000).Select(i => i * 2).Where(i => i % 2 == 0).Select(i => i * 2);

    [Benchmark] public int DistinctFirst() => _arrayDistinct.First();
    [Benchmark] public int AppendSelectLast() => _appendSelect.Last();
    [Benchmark] public int RangeReverseCount() => _rangeReverse.Count();
    [Benchmark] public int DefaultIfEmptySelectElementAt() => _listDefaultIfEmptySelect.ElementAt(999);
    [Benchmark] public int ListSkipTakeElementAt() => _listSkipTake.ElementAt(99);
    [Benchmark] public int RangeUnionFirst() => _rangeUnion.First();
    [Benchmark] public int SelectWhereSelectSum() => _selectWhereSelect.Sum();
}
In most of these scenarios, the performance improvements are even more remarkable! What caused this?
| Method                        | Runtime  | Mean         | Ratio | Allocated |
|------------------------------ |--------- |-------------:|------:|----------:|
| DistinctFirst                 | .NET 8.0 |    65.318 ns |  1.00 |     328 B |
| DistinctFirst                 | .NET 9.0 |    11.192 ns |  0.17 |         - |
|                               |          |              |       |           |
| AppendSelectLast              | .NET 8.0 | 4,122.007 ns | 1.000 |     144 B |
| AppendSelectLast              | .NET 9.0 |     2.661 ns | 0.001 |         - |
|                               |          |              |       |           |
| RangeReverseCount             | .NET 8.0 |    11.024 ns |  1.00 |         - |
| RangeReverseCount             | .NET 9.0 |     6.134 ns |  0.56 |         - |
|                               |          |              |       |           |
| DefaultIfEmptySelectElementAt | .NET 8.0 | 4,090.818 ns | 1.000 |     144 B |
| DefaultIfEmptySelectElementAt | .NET 9.0 |     5.724 ns | 0.001 |         - |
|                               |          |              |       |           |
| ListSkipTakeElementAt         | .NET 8.0 |     6.268 ns |  1.00 |         - |
| ListSkipTakeElementAt         | .NET 9.0 |     2.916 ns |  0.47 |         - |
|                               |          |              |       |           |
| RangeUnionFirst               | .NET 8.0 |    66.309 ns |  1.00 |     344 B |
| RangeUnionFirst               | .NET 9.0 |     6.193 ns |  0.09 |         - |
|                               |          |              |       |           |
| SelectWhereSelectSum          | .NET 8.0 | 3,959.622 ns |  1.00 |     112 B |
| SelectWhereSelectSum          | .NET 9.0 | 4,460.008 ns |  1.13 |     112 B |
The Astute Optimization
In summary, the .NET performance team designed the code to recognize common LINQ call chains. When such a chain is detected, specialized iterators are created to handle the workflow more efficiently. Further optimizations kick in when the chain ends with a method like Count(), First(), Last(), ElementAt() or Sum(). For instance, OrderBy(criteria).First() can be optimized to execute as Min(criteria).
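In other words, a query like the one below no longer needs to sort anything to find its answer. This is our own sketch of the equivalence (for a non-trivial key selector, the matching shortcut is MinBy(criteria)):

int[] values = { 7, 3, 9, 1, 4 };

// Two semantically equivalent ways to get the smallest element:
int viaOrderBy = values.OrderBy(i => i).First(); // recognized chain: no full sort is needed
int viaMin     = values.Min();                   // what the chain boils down to

Console.WriteLine(viaOrderBy); // 1
Console.WriteLine(viaMin);     // 1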
The Implementation: Iterator<T> and its Derived Classes
Let’s have a look at the abstract base class Iterator<T> and its 40 derivatives. They are all nested in the class Enumerable.
Iterator<T> is an abstract class, but most of its methods are virtual rather than abstract. Hence its derivatives only override the methods they need.
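A heavily simplified sketch of this design (our own illustration with made-up names, not the actual System.Linq source):

// A base iterator whose virtual methods provide generic fallbacks,
// so each derived iterator overrides only what it can speed up.
abstract class MyIterator<T>
{
    public abstract bool MoveNext();
    public T Current { get; protected set; } = default!;

    // Generic fallback: count by enumerating.
    public virtual int GetCount()
    {
        int count = 0;
        while (MoveNext()) count++;
        return count;
    }

    // Generic fallback: first element by enumerating one step.
    public virtual bool TryGetFirst(out T first)
    {
        if (MoveNext()) { first = Current; return true; }
        first = default!;
        return false;
    }
}

// A derived iterator over a list can answer these questions without enumerating.
sealed class MyListIterator<T>(List<T> list) : MyIterator<T>
{
    private int _index = -1;

    public override bool MoveNext()
    {
        if (_index + 1 < list.Count) { Current = list[++_index]; return true; }
        return false;
    }

    public override int GetCount() => list.Count;

    public override bool TryGetFirst(out T first)
    {
        if (list.Count > 0) { first = list[0]; return true; }
        first = default!;
        return false;
    }
}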
Here are the derived classes, listed and exported to the dependency graph:
Case Study: ListWhereSelectIterator<TSource, TResult>
Let’s focus on the iterator ListWhereSelectIterator<TSource, TResult>. It is instantiated by the Select() override of ListWhereIterator<TSource>:
public override IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector) =>
    new ListWhereSelectIterator<TSource, TResult>(_source, _predicate, selector);
ListWhereIterator<TSource> itself is instantiated within the Enumerable.Where() method using the following code:
if (source is List<TSource> list)
{
    return new ListWhereIterator<TSource>(list, predicate);
}
The ListWhereSelectIterator<TSource, TResult> doesn’t override methods like TryGetFirst() or TryGetLast(), so how does it improve performance? The key optimization is that it acts as a single iterator for the super-common Where(...).Select(...) chain on a list, a chain that would otherwise require two separate iterators. By consolidating both operations into one, it saves an entire layer of enumeration. You can see it in its implementation of MoveNext(), where both delegates _predicate and _selector are invoked:
while (_enumerator.MoveNext())
{
    TSource item = _enumerator.Current;
    if (_predicate(item))
    {
        _current = _selector(item);
        return true;
    }
}
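Any query of this shape benefits automatically on .NET 9, with no change to the calling code. For example (our own illustration):

var people = new List<(string Name, int Age)>
{
    ("Ana", 34), ("Bob", 17), ("Carl", 52)
};

// A Where(...).Select(...) chain over a List<T> is handled by a single
// consolidated iterator instead of two chained ones.
var adultNames = people.Where(p => p.Age >= 18).Select(p => p.Name);

foreach (string name in adultNames)
    Console.WriteLine(name); // Ana, Carl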
Case Study: IListSkipTakeIterator<TSource>
Here is the implementation of MoveNext() in the IListSkipTakeIterator<TSource> class:
public override bool MoveNext()
{
    // _state - 1 represents the zero-based index into the list.
    // Having a separate field for the index would be more readable. However, we save it
    // into _state with a bias to minimize field size of the iterator.
    int index = _state - 1;
    if ((uint)index <= (uint)(_maxIndexInclusive - _minIndexInclusive) &&
        index < _source.Count - _minIndexInclusive)
    {
        _current = _source[_minIndexInclusive + index];
        ++_state;
        return true;
    }

    Dispose();
    return false;
}
Notice how the cast to uint lets a single comparison verify that the index falls within the _minIndexInclusive and _maxIndexInclusive range: a negative index wraps around to a huge unsigned value and is rejected by the same test.
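This is a classic trick worth stealing. A minimal stand-alone illustration (our own example):

// One comparison instead of two: (uint)index <= (uint)max rejects both
// negative indexes (they wrap to huge unsigned values) and indexes beyond max.
static bool IsInRange(int index, int maxInclusive) =>
    (uint)index <= (uint)maxInclusive;

Console.WriteLine(IsInRange(-1, 9)); // False: (uint)-1 == 4294967295
Console.WriteLine(IsInRange(5, 9));  // True
Console.WriteLine(IsInRange(10, 9)); // False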
Conclusion

With .NET 9, LINQ becomes faster in several common scenarios. As with every new version of .NET, you simply need to migrate and recompile to take advantage of these improvements. LINQ has also been optimized in other ways: SIMD is used whenever possible, for instance when summing a sequence of integers, and enumerating empty sequences costs less thanks to early detection.
One final note: the web is increasingly inundated with AI-generated crap content. Search engines struggle to differentiate between valuable, handcrafted content and inferior material. If you appreciate this article and others like it that are thoughtfully created, please consider sharing it.