Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/5.0-rc2] Disable arm64 intrinsics in UTF-16 validation code paths #42064

Merged
merged 1 commit into from
Sep 10, 2020

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Sep 10, 2020

Backport of #42052 to release/5.0-rc2 to fix #41699, which is a performance regression in UTF8/UTF-16 code paths that was introduced during our ARM64 Optimization effort during Preview 8 and discovered during our 5.0 perf sign-off effort.

Customer Impact

Disables the ARM64-intrinsicified UTF-16 validation code paths since they ended up having poorer performance characteristics for non-ASCII data compared to their 3.x counterparts. With this PR, we skip the ARM64 intrinsics entirely but stay within vectorized code as much as possible. The codegen and perf should be on-par with what shipped in 3.x.

Here are the regressions that were found in Utf8Encoding.GetByteCount that this PR would resolve:

Method Runtime Input Mean Ratio
GetByteCount .NET Core 3.1 EnglishAllAscii 38.00 us 1.00
GetByteCount .NET Core 5.0 EnglishAllAscii 40.66 us 1.07
GetByteCount .NET Core 3.1 EnglishMostlyAscii 117.50 us 1.00
GetByteCount .NET Core 5.0 EnglishMostlyAscii 221.40 us 1.88
GetByteCount .NET Core 3.1 Chinese 53.34 us 1.00
GetByteCount .NET Core 5.0 Chinese 90.21 us 1.69
GetByteCount .NET Core 3.1 Cyrillic 45.35 us 1.00
GetByteCount .NET Core 5.0 Cyrillic 76.01 us 1.68
GetByteCount .NET Core 3.1 Greek 58.36 us 1.00
GetByteCount .NET Core 5.0 Greek 97.41 us 1.67

Testing

We have extensive unit test coverage of this area and all tests are passing. We still need to complete performance runs with this fix merged into master, but we will collect that data to verify that the regressions are eliminated before merging this into RC2.

Risk

Low. We are bypassing the intrinsics code path and falling back to the previous code path used, with good unit test coverage across the area.

Once this change is in, we can re-introduce tweaked versions of the ARM64 code path, validate the perf numbers, and consider a cherry-pick of the "real" fix into the 5.x servicing branches if we see performance improvements that would justify the servicing.

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@danmoseley
Copy link
Member

danmoseley commented Sep 10, 2020

@GrabYourPitchforks or @adamsitnik could one of you please fill in template?

@danmoseley danmoseley added Servicing-consider Issue for next servicing release review area-System.Runtime labels Sep 10, 2020
@jeffhandley
Copy link
Member

I'll fill in the template....

@danmoseley
Copy link
Member

@dotnet/dnceng this is a new one:

.packages\microsoft.dotnet.helix.sdk\5.0.0-beta.20453.7\tools\Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(54,5): error : (NETCORE_ENGINEERING_TELEMETRY=Helix) RestApiException: The response contained an invalid status code 429 

Body: {"Message":"API rate limit exceeded. Maximum allowed 90 per 1m.","ActivityId":"e924d70d316431438a6526b0eeb243c1"}
   at Microsoft.DotNet.Helix.Client.Job.OnPassFailFailed(Request req, Response res) in /_/src/Microsoft.DotNet.Helix/Client/CSharp/generated-code/Job.cs:line 322
   at Microsoft.DotNet.Helix.Client.Job.PassFailAsync(String job, CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.Helix/Client/CSharp/generated-code/Job.cs:line 302
   at Microsoft.DotNet.Helix.Sdk.WaitForHelixJobCompletion.WaitForHelixJobAsync(String jobName, String queueName, CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.Helix/Sdk/WaitForHelixJobCompletion.cs:line 38
   at Microsoft.DotNet.Helix.Sdk.WaitForHelixJobCompletion.ExecuteCore(CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.Helix/Sdk/WaitForHelixJobCompletion.cs:line 26
   at Microsoft.DotNet.Helix.Sdk.HelixTask.Execute() in /_/src/Microsoft.DotNet.Helix/Sdk/HelixTask.cs:line 51

https://github.com/dotnet/runtime/pull/42064/checks?check_run_id=1096236869

@riarenas
Copy link
Member

@danmosemsft This was fallout from switching to the new service fabric API which was reverted today. New runs should not see this: https://github.com/dotnet/core-eng/issues/10759

@danmoseley
Copy link
Member

Thanks @riarenas

@danmoseley
Copy link
Member

Presumably this means there was a hole in our microbenchmarks (for non ASCII) -- is that tracked by dotnet/performance#1512?

@jeffhandley
Copy link
Member

Presumably this means there was a hole in our microbenchmarks (for non ASCII) -- is that tracked by dotnet/performance#1512?

That is correct; thanks for calling that out, @danmosemsft.

@jeffhandley jeffhandley changed the title [release/5.0-rc2] Temporarily disable arm64 intrinsics in UTF-16 validation code paths [release/5.0-rc2] Disable arm64 intrinsics in UTF-16 validation code paths Sep 10, 2020
@jeffhandley jeffhandley added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Sep 10, 2020
@jeffhandley
Copy link
Member

@GrabYourPitchforks / @adamsitnik - This backport was approved, but before we merge into RC2, we need to re-run the benchmarks and verify that we got the expected results. Could one of you do that and post the results here please?

@jeffhandley
Copy link
Member

I overlooked @adamsitnik's comment on the PR for master.

@carlossanlop - are you set up to be able to build this and run the benchmarks?

@carlossanlop
Copy link
Member

@jeffhandley I can verify this in my WSL2. I'll run the performance tests before and after the changes in this PR.

@carlossanlop
Copy link
Member

carlossanlop commented Sep 10, 2020

I built the runtime repo in my WSL2 using release/5.0-rc2 as the base, and release/5.0-rc2 + this PR's change cherry-picked as the diff. Results:

root@calopearm:/home/calope/performance/src/tools/ResultsComparer# /home/calope/runtime/.dotnet/dotnet run --base /home/calope/pitchforks_before/ --diff /home/calope/pitchforks_after/ --threshold 0.01%
summary:
better: 8, geomean: 1.520
worse: 1, geomean: 1.156
total diff: 9

| Slower                                                             | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------------------------------ | ---------:| ----------------:| ----------------:| --------:|
| System.Text.Perf_Utf8Encoding.GetByteCount(Input: EnglishAllAscii) |      1.16 |         29047.75 |         33568.81 |         |

| Faster                                                                | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| --------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Text.Perf_Utf8Encoding.GetByteCount(Input: Cyrillic)           |      2.08 |         67897.12 |         32697.20 | several?|
| System.Text.Perf_Utf8Encoding.GetByteCount(Input: Chinese)            |      1.96 |         63074.59 |         32121.14 |         |
| System.Text.Perf_Utf8Encoding.GetByteCount(Input: Greek)              |      1.79 |         69171.05 |         38577.66 |         |
| System.Text.Perf_Utf8Encoding.GetByteCount(Input: EnglishMostlyAscii) |      1.71 |        157169.23 |         91817.25 |         |
| System.Text.Perf_Utf8Encoding.GetBytes(Input: EnglishMostlyAscii)     |      1.32 |        287968.96 |        218270.16 |         |
| System.Text.Perf_Utf8Encoding.GetString(Input: Chinese)               |      1.28 |        390277.00 |        305710.81 |         |
| System.Text.Perf_Utf8Encoding.GetBytes(Input: Chinese)                |      1.27 |        240803.25 |        190089.55 |         |
| System.Text.Perf_Utf8Encoding.GetBytes(Input: Greek)                  |      1.07 |        268047.54 |        250588.14 | several?|

@GrabYourPitchforks
Copy link
Member

Thank you for the confirmation, Carlos! :)

@jeffhandley
Copy link
Member

Thanks, @carlossanlop! That data is super clear and matches expectations based on what we previously observed. I'll go ahead and merge.

@jeffhandley jeffhandley merged commit 9e70a06 into release/5.0-rc2 Sep 10, 2020
@jeffhandley jeffhandley deleted the backport/pr-42052-to-release/5.0-rc2 branch September 10, 2020 23:53
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants