How we got a 14% performance boost in Uno Platform 2.2 over the 2.1 release

The performance improvements made in Uno Platform 2.2 when compared with 2.1 are spread across multiple categories:

Memory pressure, with 40% of arrays reused
try/finally optimizations for WebAssembly, with 10x improvements
GC Handles pressure, with a 2.8x speedup from reusing handles
JavaScript tweaking, with operations ranging from 20% to 10x faster
Finalizers cost, with a 4x improvement in object creation
UP NEXT IN 2.3: Last night we tweeted about a WebAssembly improvement which we believe will give a further 35% boost to interpreted mode performance. We will include this improvement in Uno Platform 2.3 or later.

For the Windows Community Toolkit DataGrid, this translates to a 14% increase in load time performance under WebAssembly.

We’ve been looking at performance in order to optimize the runtime on all platforms, and on WebAssembly in particular. There is still a lot left to analyze in different parts of the framework, specifically to enhance the experience for the DataGrid.

Memory Pressure Updates (40% of array reuse)

Memory pressure is an important part of the performance characteristics, and in recent iterations of the .NET BCL, the System.Memory package has provided many tools to help.

One such tool is the ArrayPool class, which allows renting and returning arrays of at least a certain size. This technique is very useful when objects tend to create and release lots of arrays, which is the case for the DependencyObject mechanism.

For example, each instance of DependencyObject maintains a list of its own DependencyProperties as a sorted array. During the initialization of the sample application Uno uses for its own unit testing, here’s how the arrays from the pool were used:

This means that 40% of those arrays do not need to be recreated or garbage collected, which leaves the GC with less work to do.
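As an illustration of the rent/return pattern described above, here is a minimal sketch using ArrayPool<T>.Shared from System.Buffers. The PooledScratch class and its SumFirst method are hypothetical names made up for this example and are not Uno's actual DependencyObject code. The point to notice is that Rent may hand back an array longer than requested, so only the first count slots are used.

```csharp
using System;
using System.Buffers;

// Hypothetical helper for illustration only; not part of Uno.
public static class PooledScratch
{
    // Fills a rented scratch buffer with 0..count-1 and returns the sum.
    public static int SumFirst(int count)
    {
        // Rent returns an array of AT LEAST `count` elements,
        // possibly a recycled one, so never rely on buffer.Length.
        int[] buffer = ArrayPool<int>.Shared.Rent(count);

        for (int i = 0; i < count; i++)
        {
            buffer[i] = i;
        }

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += buffer[i];
        }

        // Hand the array back so the next Rent call can reuse it
        // instead of allocating a new one.
        ArrayPool<int>.Shared.Return(buffer);
        return sum;
    }
}
```

Every array that comes back from the pool this way is one less allocation and one less object for the GC to track.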

try/finally optimizations for WebAssembly (10x faster)

As of the current specification of WebAssembly, exceptions are not supported and need to be emulated through JavaScript. While this may seem like a minor detail, try/finally blocks are used pretty much everywhere, and in many cases through syntactic sugar.

For example, the using keyword creates a try/finally block to ensure a disposable is disposed even in case of an exception. Similar uses can be found in foreach over enumerables, lock for monitors, and pretty much any LINQ operator.
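To make the hidden try/finally concrete, here is a small sketch showing a using statement next to roughly what the C# compiler generates for it. The TrackedResource and UsingLowering names are hypothetical and exist only for this example. The lowered form is an approximation of the compiler output, not a literal dump of it.

```csharp
using System;

// Hypothetical disposable used only to observe that Dispose ran.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class UsingLowering
{
    // What you write: no visible try/finally.
    public static TrackedResource WithUsing()
    {
        var resource = new TrackedResource();
        using (resource)
        {
            // work with the resource
        }
        return resource;
    }

    // Roughly what the compiler emits for the method above.
    public static TrackedResource Lowered()
    {
        var resource = new TrackedResource();
        try
        {
            // work with the resource
        }
        finally
        {
            if (resource != null)
            {
                resource.Dispose();
            }
        }
        return resource;
    }
}
```

Both methods return a disposed resource, and both pay for the try/finally at runtime. That is why the cost described in this section shows up in code that never spells out a try block.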

When dealing with try/finally blocks like this one:

try { MyMethod(); } finally { Console.WriteLine("finally"); }

the .NET runtime is doing something like this:

try { WrapJavaScript(() => MyMethod()); } finally { Console.WriteLine(""); }

Here’s what it looks like when benchmarking a hundred empty method calls, using the excellent BenchmarkDotNet:

TryFinallyTesting.SingleCall (Interpreter):
Mean = 2.2655 us, StdErr = 0.0215 us (1.04%); N = 5, StdDev = 0.0481 us
TryFinallyTesting.WithTryFinally (Interpreter):
Mean = 22.7151 us, StdErr = 0.2735 us (1.20%); N = 5, StdDev = 0.6115 us
TryFinallyTesting.SingleCall (AOT):
Mean = 2.2474 us, StdErr = 0.0194 us (0.79%); N = 5, StdDev = 0.0434 us
TryFinallyTesting.WithTryFinally (AOT):
Mean = 9.8859 us, StdErr = 0.2506 us (2.53%); N = 5, StdDev = 0.5602 us

The performance difference is very significant when using the interpreter (about 10x), and somewhat less with AOT (about 4.4x). Yet considering that try/finally blocks are sprinkled all around the BCL and Uno, altering some code paths can give some interesting benefits.

To work around this, we’ve made some changes in places where try/finally blocks were not necessary, or were not required all the time. In other cases, for the time being, try/finally blocks are simply conditionally removed at the expense of more precise error handling in Uno, particularly in DependencyObject.

GC Handles Pressure (2.8x faster)

GC Handles are an important part of the Uno infrastructure to make sure that memory does not leak, but they are also interesting when used in conjunction with Span and the readonly struct feature introduced in C# 7.2.

A while back, we added an optimization in the Grid control, for which lots of the computations were built around using Spans and Memory instead of arrays. At the time, this brought a good deal of performance improvements.

The move to span-based computation brought all sorts of improvements, like this one (using AOT on WebAssembly):

SpanTesting.EnumerableSum: [Items=20]
Mean = 8.1537 us, StdErr = 0.2720 us (3.34%); N = 5, StdDev = 0.6083 us

SpanTesting.SpanSum: [Items=20]
Mean = 2.6413 us, StdErr = 0.0522 us (1.98%); N = 5, StdDev = 0.1168 us

But we also had to work around an issue where we needed to put an object reference in a read-only struct, something that is not supported by the .NET runtime.
To work around this issue, we had to create a GCHandle to the instance, since a GCHandle is a type that can be placed in a struct.

Something that was missed when that code was written is that all the object instances that need a GCHandle are DependencyObjects, and every DependencyObject already has a WeakReference that the Grid control can reuse.

As such, here’s a performance comparison between code that creates a GCHandle and then gets the instance from the handle, 100 times, and code that only gets the instance from an existing handle, with the interpreter and with AOT:

GCHandleTesting.NewHandles (Interpreter):
Mean = 150.5836 us, StdErr = 2.6788 us (1.78%); N = 5, StdDev = 5.9899 us
GCHandleTesting.ReuseHandle (Interpreter):
Mean = 33.3696 us, StdErr = 0.6236 us (1.87%); N = 5, StdDev = 1.3944 us
GCHandleTesting.NewHandles (AOT):
Mean = 98.8505 us, StdErr = 0.8746 us (0.88%); N = 5, StdDev = 1.9557 us
GCHandleTesting.ReuseHandle (AOT):
Mean = 19.5913 us, StdErr = 0.2486 us (1.27%); N = 5, StdDev = 0.5558 us

This is a significant difference, considering Grids are a fundamental part of the layout techniques used in XAML.
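The two code paths being compared can be sketched as follows. This is an illustration under assumptions, not the Grid control's actual code: HandleRef and HandleDemo are hypothetical names, and a weak GCHandle stands in for the reference that a DependencyObject already carries.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical read-only struct that carries a GCHandle
// instead of a direct object reference.
public readonly struct HandleRef
{
    private readonly GCHandle _handle;

    public HandleRef(GCHandle handle)
    {
        _handle = handle;
    }

    // Resolves the handle back to the object it points to.
    public object Target => _handle.Target;
}

public static class HandleDemo
{
    // Slow path: allocate a fresh handle, resolve it, free it.
    public static object LookupWithNewHandle(object instance)
    {
        GCHandle handle = GCHandle.Alloc(instance, GCHandleType.Weak);
        object target = new HandleRef(handle).Target;
        handle.Free();
        return target;
    }

    // Fast path: resolve a handle that already exists.
    public static object LookupWithExistingHandle(HandleRef existing)
    {
        return existing.Target;
    }
}
```

The fast path skips GCHandle.Alloc and Free entirely, which is where the gap in the numbers above comes from.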

JavaScript Tweaking (20% and 10x improvements)

On the JavaScript side, Uno uses a few object maps to associate DOM objects with their XAML counterparts, or to map .NET methods to JavaScript functions. Originally, those tables were keyed by integers coming from .NET, stored as JavaScript number values, on the assumption that looking up by number would be the fastest. That turns out not to be the case in JavaScript land.

First, .NET provides a number as the ID for DOM elements, and converting that number to a string makes the lookup about 23% faster.
Second, those strings have to be created from a number, and while JavaScript supports myString = String(myNumber), it turns out that myString = "" + myNumber is about 10 times faster.

Those optimizations are used in small but very high traffic portions of the Uno code.

Finalizers Cost (4x improvement)

This one is a .NET adage: if you don’t need a finalizer, don’t add one. If you do add one, you’d better have a very good reason, because finalizers are very expensive.
In order to manage the renting and returning of ArrayPool arrays, the internal structures of the DependencyObject class went through successive improvements that required the use of finalizers. In the end, managing the lifetime of the individual items of the object graph turned out to be unnecessary, and replacing the per-item finalizers with a single finalizer for the whole graph is also giving interesting results.

For instance, here’s the difference for the creation of instances with and without a finalizer under WebAssembly, with the interpreter and with AOT:

FinalizerBenchmark.WithFinalizer (Interpreter):
Mean = 57.6490 us, StdErr = 1.3018 us (2.26%); N = 5, StdDev = 2.9110 us
FinalizerBenchmark.WithoutFinalizer (Interpreter):
Mean = 17.2081 us, StdErr = 0.4114 us (2.39%); N = 5, StdDev = 0.9200 us
FinalizerBenchmark.WithFinalizer (AOT):
Mean = 62.3575 us, StdErr = 3.7519 us (6.02%); N = 5, StdDev = 8.3896 us
FinalizerBenchmark.WithoutFinalizer (AOT):
Mean = 16.9974 us, StdErr = 0.2383 us (1.40%); N = 5, StdDev = 0.5329 us

Those results are interesting, as the difference between AOT and the interpreter for objects with a finalizer is within the margin of error. This suggests that the time spent creating an instance is almost entirely spent in the runtime itself, which is always compiled to WebAssembly code.
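The idea of moving from per-item finalizers to a single finalizer for the graph can be sketched like this. It is a simplified illustration, not the DependencyObject internals: PropertyStore and ObjectGraphRoot are hypothetical names, and the sketch assumes the pooled arrays are the only resource that needs cleanup.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical child item: owns a pooled array but declares NO finalizer,
// so creating one does not pay the finalizer registration cost.
public sealed class PropertyStore
{
    private int[] _values;

    public PropertyStore(int size)
    {
        _values = ArrayPool<int>.Shared.Rent(size);
    }

    // Returns true if an array was actually handed back to the pool.
    public bool Release()
    {
        if (_values == null)
        {
            return false;
        }
        ArrayPool<int>.Shared.Return(_values);
        _values = null;
        return true;
    }
}

// Hypothetical graph root: the only type in the graph with a finalizer.
public sealed class ObjectGraphRoot
{
    private readonly List<PropertyStore> _stores = new List<PropertyStore>();

    public PropertyStore AddStore(int size)
    {
        var store = new PropertyStore(size);
        _stores.Add(store);
        return store;
    }

    // Releases every store in the graph and reports how many were released.
    public int ReleaseAll()
    {
        int released = 0;
        foreach (var store in _stores)
        {
            if (store.Release())
            {
                released++;
            }
        }
        _stores.Clear();
        return released;
    }

    // Single finalizer for the whole graph. The stores are still safe to
    // touch here because they have no finalizers of their own.
    ~ObjectGraphRoot()
    {
        ReleaseAll();
    }
}
```

With this shape, only one object per graph goes through the finalization queue, no matter how many stores it owns.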

Trying it out with BenchmarkDotNet

You can try some of the benchmarks mentioned here by running the app in this repo:

https://github.com/unoplatform/Uno.Samples/tree/master/UI/Benchmark

This application uses a custom build of BenchmarkDotNet that is compatible with WebAssembly and runs the tests by blocking the UI thread. This custom build won’t be needed once threading is widely available in browsers.

Wrapping up

Every little update counts, particularly when dealing with hot execution paths, and we will keep working on improving Uno across the board.

About Uno Platform
For those new to Uno Platform: it enables the creation of single-source C# and XAML apps which run natively on iOS, Android, macOS and the Web via WebAssembly (or #WinUIEverywhere). Uno Platform is Open Source (Apache 2.0) and available on GitHub. To learn more about Uno Platform, see how it works, or create a small sample app.

Jerome Laban, on behalf of Uno Platform team.
