[DOC] Add best practices/advice with respect to using pool allocators #1694
Comments
Side thought: Maybe we should experiment with replacing the |
Maybe, though we'd have the usual static destruction problems, so we'd never explicitly free that memory pool. It might also be problematic in the multiple library case where one library is not configured with a specific pool, so makes an allocation from the |
I was thinking by default the async resource uses the default pool, which we would not own. Maybe I'm misremembering how it's implemented. |
The async resource will use the pool managed by the CUDA driver, which we do not own and would probably be fine. Ideally everyone would use that and then all pooling would be handled by the driver. If we use the async mr by default and a different library does not but constructs their own pool manually using a different underlying allocation routine (e.g. cudaMalloc instead of cudaMallocAsync), then we could conflict. |
In |
My mistake, I didn't realize that we were allocating from a specific pool that we created. The failure mode should still be relatively graceful if two processes both use the async allocation routines and one pool blocks another's growth. I don't think it will be as graceful if you mix and match async with non-async allocation, but I could be wrong there. |
I believe the reason that Perhaps, however, we should wait to make this default change until we can start using the |
RMM has multiple pool-like allocators:

- `pool_memory_resource`, which wraps a coalescing best-fit suballocator around an upstream resource;
- `arena_memory_resource`, which similarly wraps around an upstream resource but divides the global allocation into size-binned arenas to mitigate fragmentation when allocating/deallocating;
- `cuda_async_memory_resource`, which uses the memory pool implementation provided by `cudaMallocAsync`. This one can avoid fragmentation because it is in control of the virtual address space.

Since these are all composable, one can happily wrap a `pool_memory_resource` around a `cuda_async_memory_resource` (or an arena, ...). But should one?

It would be useful if the documentation provided some guidance on which combinations make sense, and what typical allocation scenarios best fit a particular pool.

We should also recommend best practices for picking initial pool sizes: a bad choice here can lead to overfragmentation.
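To make the fragmentation concern concrete, here is a minimal pure-Python sketch of a coalescing best-fit suballocator over a single fixed-size region. It is a toy model for illustration only, not RMM's implementation: the class `BestFitPool` and its methods are hypothetical names, it tracks offsets rather than device pointers, and it never grows by asking an upstream resource for more memory.

```python
from bisect import insort


class BestFitPool:
    """Toy coalescing best-fit suballocator over the range [0, size).

    Hypothetical illustration only; this is not how RMM is implemented.
    """

    def __init__(self, size):
        self.size = size
        self.free = [(0, size)]  # sorted list of (offset, length) free blocks
        self.used = {}           # offset -> length of each live allocation

    def allocate(self, nbytes):
        # Best fit: pick the smallest free block that is large enough.
        candidates = [blk for blk in self.free if blk[1] >= nbytes]
        if not candidates:
            raise MemoryError(f"no contiguous free block of {nbytes} bytes")
        offset, length = min(candidates, key=lambda blk: (blk[1], blk[0]))
        self.free.remove((offset, length))
        if length > nbytes:
            # Return the unused tail of the block to the free list.
            insort(self.free, (offset + nbytes, length - nbytes))
        self.used[offset] = nbytes
        return offset

    def deallocate(self, offset):
        length = self.used.pop(offset)
        insort(self.free, (offset, length))
        # Coalesce: merge free blocks that are adjacent in address order.
        merged = []
        for off, ln in self.free:
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + ln)
            else:
                merged.append((off, ln))
        self.free = merged

    def total_free(self):
        return sum(ln for _, ln in self.free)

    def largest_free_block(self):
        return max((ln for _, ln in self.free), default=0)


# Fill a 100-byte pool with ten 10-byte blocks, then free every other one.
demo_pool = BestFitPool(100)
demo_ptrs = [demo_pool.allocate(10) for _ in range(10)]
for p in demo_ptrs[::2]:
    demo_pool.deallocate(p)

# Half the pool is free, but no free block is larger than 10 bytes,
# so a 20-byte request fails even though 50 bytes are available.
print(demo_pool.total_free(), demo_pool.largest_free_block())  # prints "50 10"
```

In this toy model, coalescing only helps once neighbouring blocks are freed. A pool sized too tightly for the workload can therefore report plenty of free bytes and still fail a larger request, which is the kind of behaviour that guidance on initial pool sizes would need to address.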