Investigate rearchitecture of the native memory circuit breaker #1582
Comments
One more thing I would like to see in this re-architecture is tracking native memory usage during the indexing and training steps too. I understand that indexing is more CPU intensive, but if we are redesigning the CB from scratch, we should cover all the places where we use native memory.
@navneet1v, we actually do track memory during training (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/training/TrainingJob.java#L120-L142), but yes, this will include tracking during indexing too.
Thanks for the clarification. Let's track memory during indexing.
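To make tracking during indexing and training concrete, here is a minimal sketch, assuming every native step can estimate its footprint up front. `NativeMemoryTracker`, `Reservation` and `trainModel` are hypothetical names, not existing k-NN classes. Each step reserves its bytes before calling into native code and releases them through try-with-resources, so the accounting is returned even if the native call throws.

```java
public class ScopedNativeReservation {

    /** Hypothetical tracker shared by the search, indexing and training paths. */
    static class NativeMemoryTracker {
        private final long limitBytes;
        private long usedBytes;

        NativeMemoryTracker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        /** Reserves bytes before native work starts; rejects the request if it would exceed the limit. */
        synchronized Reservation reserve(long bytes, String label) {
            if (usedBytes + bytes > limitBytes) {
                throw new IllegalStateException(
                    label + " needs " + bytes + " bytes but only " + (limitBytes - usedBytes) + " are free");
            }
            usedBytes += bytes;
            return new Reservation(this, bytes);
        }

        synchronized void release(long bytes) {
            usedBytes -= bytes;
        }

        synchronized long used() {
            return usedBytes;
        }
    }

    /** Handle that gives the reserved bytes back exactly once when closed. */
    static final class Reservation implements AutoCloseable {
        private final NativeMemoryTracker tracker;
        private final long bytes;
        private boolean closed;

        Reservation(NativeMemoryTracker tracker, long bytes) {
            this.tracker = tracker;
            this.bytes = bytes;
        }

        @Override
        public synchronized void close() {
            if (!closed) {
                closed = true;
                tracker.release(bytes);
            }
        }
    }

    /** Hypothetical stand-in for a native training (or indexing) step. */
    static void trainModel(NativeMemoryTracker tracker, long estimatedBytes) {
        try (Reservation reservation = tracker.reserve(estimatedBytes, "training")) {
            // Native training would run here; the bytes stay counted until this block exits,
            // even if the native call throws.
        }
    }

    public static void main(String[] args) {
        NativeMemoryTracker tracker = new NativeMemoryTracker(1000);
        trainModel(tracker, 400);
        System.out.println("used after training: " + tracker.used());
    }
}
```

Running `main` prints `used after training: 0`, because the reservation is released when the try block exits.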
@jmazanec15, do you have any thoughts on this item, or is it still in the ideation phase?
Right, I think we need to replace the Guava cache with our own implementation that enforces capacity more strictly. In other words, what the cache reflects as allocated should be very consistent with what we have actually allocated. The Guava cache does not seem to be designed for such a use case. A big challenge here will be maintaining existing functionality:
How about
The issue is that if we have maintenance threads, it leaves room for inconsistency (i.e., a load can happen before the corresponding free). With the large allocations that graphs require, this can lead to node drops.
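One way to read "enforce capacity more strictly" is a cache that evicts and frees on the loading thread itself, so nothing is admitted until enough bytes have actually been released and there is no maintenance thread to race against. The sketch below is a hypothetical illustration of that idea, not how the Guava cache or k-NN works today: a `LinkedHashMap` in access order supplies LRU ordering, and the `freeNative` callback stands in for the JNI free. `StrictNativeCache` and `NativeEntry` are illustrative names.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class StrictNativeCacheSketch {

    /** Hypothetical cache entry standing in for a loaded native graph. */
    static final class NativeEntry {
        final String key;
        final long sizeBytes;

        NativeEntry(String key, long sizeBytes) {
            this.key = key;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * LRU cache that enforces its byte capacity on the loading thread: victims are
     * evicted and freed before the new entry is admitted, with no maintenance thread.
     */
    static class StrictNativeCache {
        private final long capacityBytes;
        private final LongConsumer freeNative;
        // accessOrder = true: iteration starts at the least recently used entry
        private final LinkedHashMap<String, NativeEntry> entries = new LinkedHashMap<>(16, 0.75f, true);
        private long allocatedBytes;

        StrictNativeCache(long capacityBytes, LongConsumer freeNative) {
            this.capacityBytes = capacityBytes;
            this.freeNative = freeNative;
        }

        synchronized NativeEntry get(String key) {
            return entries.get(key);
        }

        synchronized NativeEntry load(String key, long sizeBytes) {
            NativeEntry existing = entries.get(key);
            if (existing != null) {
                return existing;
            }
            if (sizeBytes > capacityBytes) {
                throw new IllegalArgumentException(key + " is larger than the whole cache");
            }
            Iterator<Map.Entry<String, NativeEntry>> lru = entries.entrySet().iterator();
            while (allocatedBytes + sizeBytes > capacityBytes && lru.hasNext()) {
                NativeEntry victim = lru.next().getValue();
                lru.remove();
                freeNative.accept(victim.sizeBytes); // free before the new load, not later
                allocatedBytes -= victim.sizeBytes;
            }
            NativeEntry entry = new NativeEntry(key, sizeBytes); // the native load would happen here
            entries.put(key, entry);
            allocatedBytes += sizeBytes;
            return entry;
        }

        synchronized long allocatedBytes() {
            return allocatedBytes;
        }
    }

    public static void main(String[] args) {
        AtomicLong freed = new AtomicLong();
        StrictNativeCache cache = new StrictNativeCache(100, freed::addAndGet);
        cache.load("a", 60);
        cache.load("b", 30);
        cache.get("a");      // "b" is now least recently used
        cache.load("c", 40); // needs 130 bytes, so "b" is evicted and freed first
        System.out.println("allocated=" + cache.allocatedBytes() + ", freed=" + freed.get());
    }
}
```

Here `main` prints `allocated=100, freed=30`: loading `c` evicts only `b`, the least recently used entry. Holding one lock across eviction and loading serializes concurrent loads, which trades some throughput for accounting that never runs ahead of the actual frees.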
Which one is it, and did we verify it?
I think as we are using
I think I was referring to https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/LocalCache.java#L109-L133.
I can't remember. I do know we run the free via an executor, so it will be asynchronous. I thought that invalidating an entry manually would not explicitly trigger it, but that needs to be verified.
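If the free keeps running on an executor, one way to stop the reported allocation from undercounting is to subtract the bytes only after the free itself has run, rather than at eviction time. Below is a minimal sketch under that assumption; `NativeAccounting` and `evictAsync` are hypothetical names, and the `Runnable` stands in for the native free call. A `CountDownLatch` holds the free back so the window between eviction and free is visible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class AsyncFreeAccountingSketch {

    /** Hypothetical accounting that keeps evicted bytes counted until the free has actually run. */
    static class NativeAccounting {
        private final AtomicLong allocatedBytes = new AtomicLong();
        private final ExecutorService freeExecutor = Executors.newSingleThreadExecutor();

        void allocate(long bytes) {
            allocatedBytes.addAndGet(bytes);
        }

        /** Schedules the free on the executor; the bytes are released only after it completes. */
        void evictAsync(long bytes, Runnable freeNative) {
            freeExecutor.execute(() -> {
                freeNative.run();                 // e.g. the JNI call that frees the graph
                allocatedBytes.addAndGet(-bytes); // reached only if the free returned normally
            });
        }

        long allocatedBytes() {
            return allocatedBytes.get();
        }

        void shutdown() throws InterruptedException {
            freeExecutor.shutdown();
            if (!freeExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("free did not finish in time");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NativeAccounting accounting = new NativeAccounting();
        accounting.allocate(80);
        CountDownLatch freeMayRun = new CountDownLatch(1);
        accounting.evictAsync(80, () -> {
            try {
                freeMayRun.await(); // simulate a free that has not run yet
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("before free: " + accounting.allocatedBytes());
        freeMayRun.countDown();
        accounting.shutdown();
        System.out.println("after free: " + accounting.allocatedBytes());
    }
}
```

`main` prints `before free: 80` and then `after free: 0`. If the free throws, the bytes stay counted, which errs toward reporting too much memory rather than too little.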
Description
I wanted to open an issue to track a potential re-architecture of our current circuit breaker/native memory management system. Currently, our circuit-breaking logic has a couple of flaws and can be confusing to users. I think we should consider re-architecting it to be more in line with the JVM heap-based circuit breaker that OpenSearch uses (https://github.com/opensearch-project/OpenSearch/blob/main/libs/core/src/main/java/org/opensearch/core/common/breaker/CircuitBreaker.java).
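For comparison, a native memory breaker modeled loosely on the OpenSearch interface linked above might look like the sketch below. The method names `addEstimateBytesAndMaybeBreak` and `addWithoutBreaking` mirror that interface, but `NativeMemoryCircuitBreaker`, its exception type, and the simplified `long` return values are assumptions for illustration. The key property is that a reservation is checked and recorded atomically in a compare-and-set loop, so two concurrent graph loads cannot both slip under the limit.

```java
import java.util.concurrent.atomic.AtomicLong;

public class NativeMemoryBreakerSketch {

    /** Thrown when a reservation would push tracked native memory over the limit. */
    static class NativeMemoryBreakingException extends RuntimeException {
        NativeMemoryBreakingException(String message) {
            super(message);
        }
    }

    /** Hypothetical breaker that tracks native bytes against a fixed limit. */
    static class NativeMemoryCircuitBreaker {
        private final long limitBytes;
        private final AtomicLong usedBytes = new AtomicLong();

        NativeMemoryCircuitBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        /** Adds the estimate, or throws without changing usage if it would exceed the limit. */
        long addEstimateBytesAndMaybeBreak(long bytes, String label) {
            while (true) {
                long current = usedBytes.get();
                long next = current + bytes;
                if (bytes > 0 && next > limitBytes) {
                    throw new NativeMemoryBreakingException(
                        "[" + label + "] would use " + next + " bytes, limit is " + limitBytes);
                }
                if (usedBytes.compareAndSet(current, next)) {
                    return next;
                }
                // Another thread changed usage concurrently; re-read and re-check.
            }
        }

        /** Adjusts usage unconditionally, e.g. with a negative value once memory is freed. */
        long addWithoutBreaking(long bytes) {
            return usedBytes.addAndGet(bytes);
        }

        long getUsed() {
            return usedBytes.get();
        }

        long getLimit() {
            return limitBytes;
        }
    }

    public static void main(String[] args) {
        NativeMemoryCircuitBreaker breaker = new NativeMemoryCircuitBreaker(100);
        breaker.addEstimateBytesAndMaybeBreak(60, "graph-a");
        try {
            breaker.addEstimateBytesAndMaybeBreak(50, "graph-b");
        } catch (NativeMemoryBreakingException e) {
            System.out.println("tripped: " + e.getMessage());
        }
        breaker.addWithoutBreaking(-60); // graph-a freed
        System.out.println("used after free: " + breaker.getUsed());
    }
}
```

`main` prints `tripped: [graph-b] would use 110 bytes, limit is 100` followed by `used after free: 0`.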
A couple of issues I have with our circuit breaker:
Proposed Solution
I think we should investigate migrating toward the approach OpenSearch has taken for the JVM circuit breaker limits. This would allow: