
Platform specific prefetcher #396

Draft · wants to merge 11 commits into main

Conversation


@Scooletz Scooletz commented Sep 26, 2024

This PR migrates the prefetching from an SSE-based to an OS-specific solution. The prefetch happens on a separate thread that accumulates pages to be prefetched in a channel and schedules them in bulk. A similar implementation could be provided for hinting that pages are no longer needed, but it would require either a no-op implementation on Windows or a better understanding of memory advising on Windows. DiscardVirtualMemory cannot be used, as it destroys the content, which is not what is needed here; what is needed is a hint that the given area won't be used anymore.

The following methods are used to hint that given addresses will be used:

  1. PrefetchVirtualMemory on Windows
  2. madvise on Linux

While testing stats gathering, issuing prefetches when .Accept is called on a page increased the IO throughput significantly.

If only we had dotnet/runtime#59776 in .NET itself 😭

@Scooletz Scooletz added the operating system (OS related issue, that may require intrinsic knowledge of low level OS stuff) and 🐌 performance (performance related issue) labels on Sep 26, 2024
@@ -24,6 +27,18 @@ public sealed class MemoryMappedPageManager : PointerPageManager
private readonly MemoryMappedViewAccessor _whole;
private readonly unsafe byte* _ptr;

// Prefetcher
private const int PrefetcherCapacity = 1000;
Contributor

Would it be better to make this a configurable parameter instead of a hardcoded value?

Contributor

Also, could that configuration parameter be used to turn off prefetching altogether in case it is not required?

Contributor Author

Sure thing, will add such capability.

@@ -24,6 +27,18 @@ public sealed class MemoryMappedPageManager : PointerPageManager
private readonly MemoryMappedViewAccessor _whole;
private readonly unsafe byte* _ptr;

// Prefetcher
Contributor

New to GitHub PR reviews, I can't find a way to comment on the existing line of code. But anyway, doesn't the comment above, describing that "prefetching is futile", need to be updated?

Contributor Author

Ah, it could be made more specific: this is about the Sequential flag. We don't use Sequential because it's clearly random access all over the place. Will amend.


while (await reader.WaitToReadAsync())
{
PrefetchImpl(reader, this);
Contributor

A basic question: since the prefetcher runner thread is running asynchronously, is it theoretically possible that, for a certain address, it executes the prefetch after the actual read of that address is already done? The prefetcher will find that the address is already in memory, but can these calls still effectively double the number of IOs?

Contributor Author

It is possible that it will run after the page is fetched. And yes, it's possible that it will be prefetching pages that are already there. We could then discuss whether to keep the SSE-based soft prefetch and leave this implementation for heavier use.

The place where I'd still keep it as is, is the AbandonedList that prepares the next page to be copied to. In this case, the page is returned, then copied to, etc., while the next one is scheduled for a prefetch. This should give it enough time.

The next place where it works (just following the raw execution time) is visiting. We schedule prefetching for all the children up front, and as the visitor walks in preorder, all but the first child will most likely have had time to be prefetched. Again, raw execution time shows a very nice boost (also, disk throughput is much higher).

The one place that makes it less easy to reason about is DataPage.TryGet. Maybe in this case we should use the SSE prefetch. If the page faults, it will take more time anyway than a search across the page (now ~10ns), so it might be the case that we'd like to skip issuing a costly OS call.

Contributor

Thanks for clarifying! Makes sense to me to use two different variants based on the usage 😄

@Scooletz (Contributor Author)

@dipkakwani Summing up: maybe the SSE prefetch should still be available as a soft version, and it should be possible to use it depending on the call site (DataPage).


Code Coverage

Package   Line Rate            Branch Rate
Paprika   84%                  79%
Summary   84% (4743 / 5631)    79% (1577 / 1986)

Minimum allowed line rate is 75%
