Using Span<T> to improve performance of C# code
In my experience, the main way to improve application performance is to reduce the number and duration of IO calls. Once that option is exhausted, another path developers take is using memory on the stack. The stack allows very fast allocation and deallocation, although it should be used only for small allocations since the stack size is pretty small. Using the stack also reduces pressure on the GC. To allocate memory on the stack, one uses value types or the stackalloc operator combined with the usage of unmanaged memory.
The second option is rarely used by developers since the API for unmanaged memory access is quite verbose.
Span<T> is a family of value types that arrived with C# 7.2 and provides an allocation-free representation of memory from different sources. Span<T> allows developers to work with regions of contiguous memory in a more convenient fashion while ensuring memory and type safety.
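As a rough illustration of the paragraph above, the sketch below allocates a small buffer on the stack and works with it through Span<T> without any unsafe code. It is written with top-level statements (C# 9 or later), and all variable names are made up for this example.

```csharp
using System;

// Allocate a small buffer on the stack. No unsafe context is needed
// because the result is assigned directly to a Span<int>.
Span<int> numbers = stackalloc int[4];
for (var i = 0; i < numbers.Length; i++)
{
    numbers[i] = (i + 1) * 10; // 10, 20, 30, 40
}

// Slice creates a view over the same memory; nothing is copied.
Span<int> tail = numbers.Slice(2);
tail[0] = 99; // also changes numbers[2]

var sum = 0;
foreach (var n in numbers)
{
    sum += n;
}

Console.WriteLine(sum); // 10 + 20 + 99 + 40 = 169
```

Since the slice shares memory with the original span, the write through tail is visible through numbers as well.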
Span implementation
Ref return
For those who don’t closely follow updates to the C# language, the first step in wrapping your head around the Span<T> implementation is learning about ref returns, which were introduced in C# 7.0.
While most readers are familiar with passing a method argument by reference, C# now also allows returning a reference to a value instead of the value itself.
Let us examine how it works. We’ll create a simple wrapper around an array of prominent musicians which exhibits both the traditional behavior and the new ref return feature.
public class ArtistsStore
{
private readonly string[] _artists = new[] { "Amenra", "The Shadow Ring", "Hiroshi Yoshimura" };
public string ReturnSingleArtist()
{
return _artists[1];
}
public ref string ReturnSingleArtistByRef()
{
return ref _artists[1];
}
public string AllArtists => string.Join(", ", _artists);
}
Now let’s call those methods:
var store = new ArtistsStore();
// A plain return hands back a copy of the reference, so reassigning the local leaves the array untouched.
var artist = store.ReturnSingleArtist();
artist = "Henry Cow";
var allArtists = store.AllArtists; //Amenra, The Shadow Ring, Hiroshi Yoshimura
// Without ref on the call site, the ref return is still copied into an ordinary local.
artist = store.ReturnSingleArtistByRef();
artist = "Frank Zappa";
allArtists = store.AllArtists; //Amenra, The Shadow Ring, Hiroshi Yoshimura
// A ref local aliases the array element itself, so the assignment writes into the array.
ref var artistReference = ref store.ReturnSingleArtistByRef();
artistReference = "Valentyn Sylvestrov";
allArtists = store.AllArtists; //Amenra, Valentyn Sylvestrov, Hiroshi Yoshimura
Ref structs
As we know, value types may be allocated on the stack, but this is not guaranteed: it depends on the context where the value is used. To make sure that a value is always allocated on the stack, the concept of a ref struct was introduced in C# 7.2. Span<T> is a ref struct, so we can be sure that it is always allocated on the stack.
Span implementation
Span<T> is a ref struct which contains a pointer to memory and the length of the span, similar to the sketch below.
public readonly ref struct Span<T>
{
private readonly ref T _pointer;
private readonly int _length;
public ref T this[int index] => ref _pointer + index;
...
}
Note the ref modifier near the pointer field. Such a construct couldn’t be declared in plain C# before ref fields arrived in C# 11; in .NET Core it was implemented via the internal ByReference<T> type.
As you can see, indexing is implemented via ref return, which allows reference-type-like behavior for a stack-only struct.
Span limitations
To ensure that a ref struct is always used on the stack, it comes with a number of limitations: it can’t be boxed; it can’t be assigned to variables of type object, dynamic or any interface type; it can’t be a field in a reference type; and it can’t be used across await and yield boundaries. In addition, calls to two methods, Equals and GetHashCode, throw a NotSupportedException. All of these limitations apply to Span<T> because it is a ref struct.
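The following sketch shows what some of these limitations look like in practice. The rejected lines are left commented out, since the compiler refuses them, and the comparison part shows what one might use instead of Equals. It uses top-level statements (C# 9 or later), and the variable names are invented for illustration.

```csharp
using System;

Span<byte> bytes = stackalloc byte[8];
bytes.Fill(7);

// The lines below are commented out because they do not compile:
// object boxed = bytes;              // a ref struct cannot be boxed
// Func<int> f = () => bytes.Length;  // a ref struct cannot be captured by a lambda

// Instead of Equals, the == operator checks whether two spans
// point to the same memory and have the same length.
var sameRegion = bytes.Slice(0, 4) == bytes.Slice(0, 4);
var differentRegion = bytes.Slice(0, 4) == bytes.Slice(4, 4);

// Comparing the contents is done with SequenceEqual.
var sameContent = bytes.Slice(0, 4).SequenceEqual(bytes.Slice(4, 4));

Console.WriteLine($"{sameRegion} {differentRegion} {sameContent}");
```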
Using Span instead of string
Reworking existing codebase
Let’s examine code that converts Linux permissions to octal representation. Here is the original code:
internal class SymbolicPermission
{
private struct PermissionInfo
{
public int Value { get; set; }
public char Symbol { get; set; }
}
private const int BlockCount = 3;
private const int BlockLength = 3;
private const char MissingPermissionSymbol = '-';
private readonly static Dictionary<int, PermissionInfo> Permissions = new Dictionary<int, PermissionInfo>() {
{0, new PermissionInfo {
Symbol = 'r',
Value = 4
} },
{1, new PermissionInfo {
Symbol = 'w',
Value = 2
}},
{2, new PermissionInfo {
Symbol = 'x',
Value = 1
}} };
private string _value;
private SymbolicPermission(string value)
{
_value = value;
}
public static SymbolicPermission Parse(string input)
{
if (input.Length != BlockCount * BlockLength)
{
throw new ArgumentException("input should be a string of 3 blocks of 3 characters each");
}
for (var i = 0; i < input.Length; i++)
{
TestCharForValidity(input, i);
}
return new SymbolicPermission(input);
}
public int GetOctalRepresentation()
{
var res = 0;
for (var i = 0; i < BlockCount; i++)
{
var block = GetBlock(i);
res += ConvertBlockToOctal(block) * (int)Math.Pow(10, BlockCount - i - 1);
}
return res;
}
private static void TestCharForValidity(string input, int position)
{
var index = position % BlockLength;
var expectedPermission = Permissions[index];
var symbolToTest = input[position];
if (symbolToTest != expectedPermission.Symbol && symbolToTest != MissingPermissionSymbol)
{
throw new ArgumentException($"invalid input in position {position}");
}
}
private string GetBlock(int blockNumber)
{
return _value.Substring(blockNumber * BlockLength, BlockLength);
}
private int ConvertBlockToOctal(string block)
{
var res = 0;
foreach (var (index, permission) in Permissions)
{
var actualValue = block[index];
if (actualValue == permission.Symbol)
{
res += permission.Value;
}
}
return res;
}
}
public static class SymbolicUtils
{
public static int SymbolicToOctal(string input)
{
var permission = SymbolicPermission.Parse(input);
return permission.GetOctalRepresentation();
}
}
The reasoning is pretty straightforward: a string is essentially an array of char, so why not work with pieces of it through a stack-only view instead of allocating new strings on the heap?
So our first goal is to declare the field _value of SymbolicPermission as ReadOnlySpan<char> instead of string. To achieve this we must declare SymbolicPermission as a ref struct, since a field or property cannot be of type Span<T> unless it is an instance member of a ref struct.
internal ref struct SymbolicPermission
{
...
private ReadOnlySpan<char> _value;
}
Now we just change every string within our reach to ReadOnlySpan<char>. The only point of interest is the GetBlock method, since here we replace Substring with Slice.
private ReadOnlySpan<char> GetBlock(int blockNumber)
{
return _value.Slice(blockNumber * BlockLength, BlockLength);
}
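To make the difference between the two calls more tangible, here is a small standalone sketch (top-level statements, C# 9 or later). The sample input and variable names are assumptions made for the example and are not part of the original code.

```csharp
using System;

const int BlockLength = 3;
var input = "rwxr-x--x";

// Substring allocates a new string on the heap for every block.
string secondBlockCopy = input.Substring(1 * BlockLength, BlockLength);

// Slice only creates a view (reference + length) over the original characters.
ReadOnlySpan<char> secondBlockView = input.AsSpan().Slice(1 * BlockLength, BlockLength);

Console.WriteLine(secondBlockCopy);            // r-x
Console.WriteLine(secondBlockView.ToString()); // r-x
```

Note that calling ToString on the span allocates a string again, so in real code it is worth keeping the data as a span for as long as possible.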
Evaluation
Let’s measure the outcome.
We notice a speedup of about 50 nanoseconds, which is roughly a 10% performance improvement. One can argue that 50 nanoseconds is not that much, but it cost us almost nothing to achieve it!
Now we’re going to evaluate this improvement on a permission having 18 blocks of 12 characters each to see whether we can gain significant improvements.
As you can see, we’ve managed to gain 0.5 microseconds, or a 5% performance improvement. Again, it may look like a modest achievement, but remember that this was really low-hanging fruit.
Using Span instead of arrays
Let’s expand on arrays of other types. Consider the example from the ASP.NET Channels pipeline. The reasoning behind the code below is that data often arrives in chunks over the network, which means that a piece of data may reside in multiple buffers simultaneously. In the example such data is parsed to uint.
public unsafe static uint GetUInt32(this ReadableBuffer buffer) {
ReadOnlySpan<byte> textSpan;
if (buffer.IsSingleSpan) { // if data in single buffer, it’s easy
textSpan = buffer.First.Span;
}
else if (buffer.Length < 128) { // else, consider temp buffer on stack
var data = stackalloc byte[128];
var destination = new Span<byte>(data, 128);
buffer.CopyTo(destination);
textSpan = destination.Slice(0, buffer.Length);
}
else {
// else pay the cost of allocating an array
textSpan = new ReadOnlySpan<byte>(buffer.ToArray());
}
uint value;
// yet the actual parsing routine is always the same and simple
if (!Utf8Parser.TryParse(textSpan, out value)) {
throw new InvalidOperationException();
}
return value;
}
Let’s break down what happens here. Our goal is to parse the sequence of bytes textSpan into uint.
if (!Utf8Parser.TryParse(textSpan, out value)) {
throw new InvalidOperationException();
}
return value;
Now let’s see how textSpan is obtained. The input parameter is an instance of a buffer that can read a sequential series of bytes. ReadableBuffer implements ISequence<ReadOnlyMemory<byte>>, which basically means that it consists of multiple memory segments.
In case the buffer consists of a single segment, we just use the underlying Span from the first segment.
if (buffer.IsSingleSpan) {
textSpan = buffer.First.Span;
}
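Otherwise, if the data is shorter than 128 bytes, we allocate a 128-byte block on the stack and create a Span<byte> based on it.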
var data = stackalloc byte[128];
var destination = new Span<byte>(data, 128);
Then we call buffer.CopyTo(destination), which iterates over each memory segment of the buffer and copies it to the destination Span. After that we just slice a Span of the buffer’s length.
textSpan = destination.Slice(0, buffer.Length);
As we can see, the Span<T> API allows us to work with memory manually allocated on the stack in a much more convenient fashion than was possible prior to its arrival.
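For readers who want to try the same idea without the Channels library, below is a minimal sketch that swaps ReadableBuffer for ReadOnlySequence<byte> from System.Buffers. This substitution is my assumption and not the code from the pipeline; the function names ParseUInt32 and GetUInt32 are hypothetical. It also avoids unsafe code by assigning stackalloc directly to a Span<byte>.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

// The parsing routine is the same no matter where the bytes live.
uint ParseUInt32(ReadOnlySpan<byte> text)
{
    if (!Utf8Parser.TryParse(text, out uint value, out _))
    {
        throw new InvalidOperationException();
    }
    return value;
}

uint GetUInt32(ReadOnlySequence<byte> buffer)
{
    if (buffer.IsSingleSegment)
    {
        // Data sits in a single segment: use its span directly.
        return ParseUInt32(buffer.First.Span);
    }
    if (buffer.Length < 128)
    {
        // Several small segments: gather them in a temporary stack buffer.
        Span<byte> destination = stackalloc byte[128];
        buffer.CopyTo(destination);
        return ParseUInt32(destination.Slice(0, (int)buffer.Length));
    }
    // Large input: pay the cost of allocating an array.
    return ParseUInt32(buffer.ToArray());
}

// Usage with a single-segment sequence (hypothetical input).
var single = new ReadOnlySequence<byte>(Encoding.UTF8.GetBytes("1234"));
var parsed = GetUInt32(single);
Console.WriteLine(parsed);
```

The usage line only exercises the single-segment branch; building a multi-segment sequence requires a custom ReadOnlySequenceSegment<byte> subclass, which is left out to keep the sketch short.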
Using Span instead of List
Let’s get back to the section where we introduced ReadOnlySpan<char> instead of string. As you might recall, the static factory of the SymbolicPermission struct expects ReadOnlySpan<char> as input.
public static SymbolicPermission Parse(ReadOnlySpan<char> input)
{
...
return new SymbolicPermission(input);
}
We in turn provide a string to the factory, and everything compiles smoothly thanks to the implicit conversion from string to ReadOnlySpan<char>.
public static int SymbolicToOctal(string input)
{
var permission = SymbolicPermission.Parse(input);
return permission.GetOctalRepresentation();
}
Since string is an array of char, we might expect similar behavior from List<char>. After all, they are both just indexed collections that utilize an array under the hood.
But in fact things are not so rosy: there is no implicit conversion from List<char> to ReadOnlySpan<char>, so passing the list directly does not compile.
The way to combat this is to use the CollectionsMarshal.AsSpan helper method:
public static int SymbolicToOctal(List<char> input)
{
var permission = SymbolicPermission.Parse(CollectionsMarshal.AsSpan(input));
return permission.GetOctalRepresentation();
}
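One thing worth keeping in mind is that the span returned by CollectionsMarshal.AsSpan (available since .NET 5) is a view over the list’s backing array, not a copy. The small sketch below, with made-up variable names, illustrates this.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var chars = new List<char>("rwxr-x--x");

// The span points at the list's internal array.
Span<char> view = CollectionsMarshal.AsSpan(chars);

// Writing through the span changes the list as well.
view[0] = '-';

Console.WriteLine(chars[0]);    // -
Console.WriteLine(view.Length); // 9
```

Because of this, items should not be added to or removed from the list while the span is in use.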
F# support
.NET Core is not limited to C# only. In one of my previous blog posts I’ve mentioned some reasons why you might consider F#. Since version 4.5, F# also supports Span<T>, so let’s delegate some of the Linux permission conversion to an F# helper and see if it can keep up with C# in terms of performance.
We’ll declare a Helpers type which will calculate the octal representation.
open System
[<Struct>]
type PermissionInfo(symbol: char, value: int) =
    member x.Symbol = symbol
    member x.Value = value
type Helpers =
    val private Permissions : PermissionInfo[]
    new () = {
        Permissions =
            [|PermissionInfo('r', 4);
              PermissionInfo('w', 2);
              PermissionInfo('x', 1); |]
    }
    // Sums the values of the permissions that are present in the block.
    member x.ConvertBlockToOctal (block : ReadOnlySpan<char>) =
        let mutable acc = 0
        for i = 0 to x.Permissions.Length - 1 do
            if block.[i] = x.Permissions.[i].Symbol then
                acc <- acc + x.Permissions.[i].Value
        acc
Note that the Permissions array is marked as val. As the documentation states, it allows declaring a location to store a value in a class or structure type, without initializing it.
Calling it from C# is seamless.
var block = GetBlock(i);
res += new Helpers().ConvertBlockToOctal(block) * (int)Math.Pow(10, BlockCount - i - 1);
Although the F# version allocates more memory, the execution time difference is quite impressive.
Conclusion
Span<T> provides a safe and easy-to-use alternative to using stackalloc with raw pointers, which makes performance improvements easy to get. While the gain from each usage is relatively small, consistent usage helps avoid what is known as death by a thousand cuts. Span<T> is widely used across the .NET Core 3.0 codebase, which contributed to its performance improvement compared to the previous version.
Here are some things you might consider when you decide whether you should use Span<T>:
If your method accepts an array of data and doesn’t change its size, you might consider Span<T>. If you don’t modify the input, you might consider ReadOnlySpan<T>.
If your method accepts a string to compute some statistics or to perform syntactic analysis, you should accept ReadOnlySpan<char>.
If your method needs a short temporary array of data, you can allocate it on the stack with Span<T> buf = stackalloc T[size]. Remember that T should be a value type, and that a span over stack memory cannot be returned from the method that allocated it.
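As a closing illustration of the second point, here is a sketch of a method that accepts ReadOnlySpan<char> and can therefore be called with a string, an array, or stack memory alike. CountVowels is a hypothetical helper invented for this example (top-level statements, C# 9 or later).

```csharp
using System;

// Accepting ReadOnlySpan<char> lets callers pass data from different sources.
int CountVowels(ReadOnlySpan<char> text)
{
    var count = 0;
    foreach (var c in text)
    {
        if ("aeiou".IndexOf(char.ToLowerInvariant(c)) >= 0)
        {
            count++;
        }
    }
    return count;
}

// A string converts implicitly to ReadOnlySpan<char>.
var fromString = CountVowels("permission");

// So does a char array.
var fromArray = CountVowels(new[] { 'r', 'e', 'a', 'd' });

// And so does a Span<char> over stack memory.
Span<char> onStack = stackalloc char[] { 'e', 'x', 'e', 'c' };
var fromStack = CountVowels(onStack);

Console.WriteLine($"{fromString} {fromArray} {fromStack}"); // 4 2 2
```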