.NET 8 Update: Hardware Intrinsics #1744
pCYSl5EDgo wants to merge 28 commits into MessagePack-CSharp:develop
My environment. All benchmarks were run without dynamic PGO.
Report column description: Method, Job, Setting, Baseline.
ulong[] serialize benchmark report (x5 in .NET8, x3 in .NET6)
long[] serialize benchmark report (x2 in .NET8, x2 in .NET6)
float[] serialize benchmark report (x5 in .NET8, x2.5 in .NET6)
double[] serialize benchmark report (x5 in .NET8, x1.5 in .NET6)
bool[] deserialize benchmark report (x16~33 in .NET8, x1.4 in .NET6)
bool[] serialize benchmark report (x10~50 in .NET8, x1.5 in .NET6)
short[] serialize benchmark report (x1.5~10 in .NET8, x2 in .NET6)
ushort[] serialize benchmark report (x3~4 in .NET8, x2.5 in .NET6)
int[] serialize benchmark report (x2~4 in .NET8, x1.2~2 in .NET6)
uint[] serialize benchmark report (x5 in .NET8, x4 in .NET6)
All benchmarks are done.
AArnott left a comment:
I will prepare a clean commit log Pull Request when you give the go-ahead.
I'll wait for this.
I've rewritten the formatters in C# 9 and moved them to in main
Unsafe.AsRef is silently changed from
I found that this PR depends on #1734.
SIMD does not support integer division or 64-bit integer multiplication, so SIMD performs very poorly for DateTime serialization.
This comment is written for later developers and describes why DateTimeArrayFormatter does not use SIMD.
Reference: SharpLab assembly code.
DateTime serialize without SIMD benchmark report (x2 in both)
@pCYSl5EDgo I haven't merged this as it's still marked Draft. I'm curious what your intention is for this PR going forward.
This is a follow up pull request of #988.
Goals
Improve (U?Int16|32|64)|Single|Double|BooleanArrayFormatter with SIMD instructions and make them about twice as fast in .NET 8.
List<T>, ArraySegment<T> and (ReadOnly)?Memory<T> can also be accelerated by this.
Even without SIMD in .NET 6, performance is generally improved by this proposal.
History
Three years have passed, and .NET 7 introduced many convenient SIMD hardware intrinsics, as I explained in this Japanese article.
SIMD intrinsics between .NET Core 3.1 and .NET 6 required a fixed statement and unsafe pointer operations.
In .NET 7, Vector.LoadUnsafe(ref T source) emerged, which takes a reference of type T.
Admittedly, you still end up doing pseudo-pointer operations with the Unsafe class, but not needing a fixed statement is an advantage.
.NET 7 also added many cross-platform SIMD instructions. There is no need to write lots of platform-dependent branches any more!
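To illustrate the fixed-free style described above, here is a minimal sketch, assuming .NET 7 or later. It is not code from this PR: BoolEncodeSketch and Encode are hypothetical names chosen for the example. It uses Vector128.LoadUnsafe and StoreUnsafe with a ref and an element offset, relies on the MessagePack format codes 0xC2 (false) and 0xC3 (true), and assumes every bool in the input is stored as the byte 0 or 1.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only; not part of MessagePack-CSharp.
public static class BoolEncodeSketch
{
    // Writes one MessagePack bool code per input element:
    // 0xC2 for false, 0xC3 for true (assumes true is stored as the byte 1).
    public static void Encode(ReadOnlySpan<bool> source, Span<byte> destination)
    {
        if (destination.Length < source.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        // Managed references instead of pinned pointers: no fixed statement needed.
        ref byte src = ref MemoryMarshal.GetReference(MemoryMarshal.AsBytes(source));
        ref byte dst = ref MemoryMarshal.GetReference(destination);

        nuint length = (nuint)source.Length;
        nuint width = (nuint)Vector128<byte>.Count;
        nuint i = 0;

        if (Vector128.IsHardwareAccelerated && length >= width)
        {
            Vector128<byte> falseCode = Vector128.Create((byte)0xC2);
            nuint lastBlock = length - width;

            // Process 16 bools per iteration: 0xC2 + (0 or 1).
            for (; i <= lastBlock; i += width)
            {
                Vector128<byte> values = Vector128.LoadUnsafe(ref src, i);
                (values + falseCode).StoreUnsafe(ref dst, i);
            }
        }

        // Scalar loop for the remaining elements (or for everything
        // when Vector128 is not hardware accelerated).
        for (; i < length; i++)
        {
            Unsafe.Add(ref dst, i) = (byte)(0xC2 + Unsafe.Add(ref src, i));
        }
    }
}
```

The Vector128.IsHardwareAccelerated check is the only branch: the same code path covers x86 and Arm without platform-specific intrinsics, and the scalar loop handles the tail as well as inputs shorter than one vector.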
Changes
Finally, I am now able to write code that is (I hope) more understandable to others than it was before.
I ran BenchmarkDotNet on my machine and found that this SIMD implementation performed about the same as the previous implementation on arrays with few elements, where SIMD does not help much, and 2 to 5 times better on arrays with many elements, where SIMD does well.
Annotation
This Draft Pull Request is for performance measurement and is not intended to be actually merged.
I will prepare a clean commit log Pull Request when you give the go-ahead.