-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RELEASE] cuvs v24.12 #523
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) - rhdong (https://github.com/rhdong) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #352
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Contributes to rapidsai/build-planning#94 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - James Lamb (https://github.com/jameslamb) URL: #392
1) Replace the domain name from `raft` to `cuvs` to avoid confusion when using tools such as NSYS to analyze the program timeline. 2) Use C++17 feature `__has_include` instead of a CMake script to find out if NVTX available in the benchmark executable. It turns out our CMake check has been not reliable due to not finding include directories correctly. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Micka (https://github.com/lowener) URL: #400
Branch 24.12 merge 24.10
Contributes to rapidsai/build-planning#106 Proposes specifying the RAPIDS version in `conda install` calls that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Mike Sarahan (https://github.com/msarahan) URL: #406
Add a command-line option to disable the CUDA event/stream synchronization on each iteration. Only one sync is done per benchmark loop in this case instead. As a result, the measured QPS is observed due to: 1) A small `cudaEventSynchronize` is removed from the benchmark loop; 2) If a GPU algorithm has little to no sync between the GPU and CPU, the kernel launch latency and other CPU overheads are completely hidden. The new option is experimental and disabled by default. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #405
This package is available in `dependencies.yaml`, but due to an oversight was not added to conda metas. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - James Lamb (https://github.com/jameslamb) URL: #408
Remove a collection of unneccesarily complex CMake logic. Major change is that we explicitly opt-in to building the C API bindings by default since it is a hard requirement for our python bindings, and the project has numerous conditions to disable it. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Ben Frederickson (https://github.com/benfred) URL: #416
This attempts to rename `cuvs_bench` to `cuvs-bench` and similarly for the CPU package. This follows from this thread: rapidsai/docker#715 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) - Dante Gama Dessavre (https://github.com/dantegd) URL: #417
Fixes an issue in CI computations of sccache hit rates. See #414 for details.
Follow-up PR to address feedback: rapidsai/raft#2474 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #422
I noticed it was missing while switching Milvus to cuVS Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #461
`.view()` is required Authors: - tsuki (https://github.com/enp1s0) Approvers: - Micka (https://github.com/lowener) URL: #484
This PR adds an option to build the full HNSW hierarchy on the CPU when converting a CAGRA index to an hnswlib index. This lets us enable an `extend()` API. For hnswlib: 1. Update to `v0.7.0` 2. Remove dependency as symbols are compiled within DSO Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Bradley Dice (https://github.com/bdice) - Corey J. Nolet (https://github.com/cjnolet) URL: #465
Wheel-building CI jobs are failing like this: > Checking final_dist/cuvs_cu12-25.2.0a26-cp310-cp310-manylinux_2_28_aarch64.whl: FAILED due to warnings > WARNING `long_description` missing. > Error: Process completed with exit code 1. ([build link](https://github.com/rapidsai/cuvs/actions/runs/12133882036)) Looks like the root cause is a combination of the following: * there was a new `twine` release (6.x) 3 days ago: https://pypi.org/project/twine/#history * it contains pypa/twine#1168, which makes `twine check --strict` fail if the wheel's `long_description` is empty * the `cuvs` wheel README (used as the wheel `long_description`) is empty This proposes adding a small README, with just 2 sentences copied from the project's root-level README, to get past that check. ## Notes for Reviewers The `long_description` becomes the project homepage when a project is hosted on PyPI. The wheels produced from this repo aren't currently being published to pypi.org so this change won't be seen there, but a more user-friendly README should be added if/when we decide to publish `cuvs-cu{11,12}` to pypi.org. ref: rapidsai/build-planning#70
This notebook is adapting the Question Retrieval nb to use Milvus. It can serve as a good example on how to do Bulk ingest, how to use cuVS, and especially CAGRA+HNSW on Milvus Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #451
Ref : #472 ## The cause of the bug The bitonic sort was used on an array that was not a power of 2 long. In the current search implementation, the bitonic sort is used to move the invalid elements to the end of the buffer as: https://github.com/rapidsai/cuvs/blob/5062594138a40231475299c7bac61083b0669fd1/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L758-L763 https://github.com/rapidsai/cuvs/blob/5062594138a40231475299c7bac61083b0669fd1/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L644-L649 The problem is that the (max) array length (=`MAX_ITOPK + MAX_CANDIDATES`) is not always the power of two. These bitonic sorts are called even if no elements are filtered out unless `cuvs::neighbors::filtering::none_sample_filter` is specified as the filter, so #472 occurs. ## Fix This PR changes the filtering process so that the bitonic sort is not used to move the invalid elements to the end of the buffer. Authors: - tsuki (https://github.com/enp1s0) Approvers: - Artem M. Chirkin (https://github.com/achirkin) URL: #489
Let me know if this is out of scope for cuVS! Authors: - Ajit Mistry (https://github.com/ajit283) - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) URL: #311
Non-blocking / stream-ordered dynamic batching as a new index type. ## API This PR implements dynamic batching as a new index type, mirroring the API of other indices. * [_building is wrapping_] Building the index means creating a lightweight wrapper on top of an existing index and initializing necessary components, such as IO batch buffers and synchronization primitives. * [_type erasure_] The underlying/upstream index type is erased once the dynamic_batching wrapper is created, i.e. there's no way to recover the original search index type or parameters. * [_explicit control over batching_] To allow multiple user requests group into a dynamic batch request, the users must use copies of the same dynamic batching index (the user-facing index type is a thin wrapper on top of a shared pointer, hence the copy is shallow and cheap). The search function is thread-safe. ## Feature: stream-ordered dynamic batching Non-blocking / stream-ordered dynamic batching means the batching does not involve synchronizing with a GPU stream. The control is returned to the user as soon as the necessary work is submitted to the GPU. This entails a few good-to-know features: 1. The dynamic batching index has the same blocking properties as the upstream index: if the upstream index does not involve stream sync during search, that the dynamic batching index does not involve it as well (otherwise, the dynamic batching search obviously waits till the upstream search synchronizes under the hood). 2. It's responsibility of the user to synchronize the stream before getting the results back - even if the upstream index search does not need it (the batch results are scattered back to the request threads in a post-processing kernel). 3. If the upstream index does not synchronize during search, the dynamic batching index can group the queries even in a single-threaded application (_try it with --no-lap-sync option in the ann-bench benchmarks_). Overall, stream-ordered dynamic batching makes it easy to modify existing cuVS indexes, because the wrapped index has the same execution behavior as the upstream index. ## Work-in-progress TODO - [x] Add dynamic batching option to more indices in ann-bench - [x] Add tests - [x] **(postponed to 25.02)** Do proper benchmarking and possibly fine-tune the inter-thread communication - [x] Review the API side (`cpp/include/cuvs/neighbors/dynamic_batching.hpp`) [ready for review CC @cjnolet] - [x] Review the algorithm side (`cpp/src/neighbors/detail/dynamic_batching.cuh`) [ready for preliminary review: requests for algorithm docsting/clarifications are especially welcome] Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #261
First draft for scalar quantization. WIP status: * only int8_t target type * quantile computation inefficient (via sampling & sorting) Authors: - Malte Förster (https://github.com/mfoerste4) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #494
Skip some checks involving hard-coded offsets into the data when the number of records in the checked PQ list is smaller than needed. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #512
Due to the tight integration between cuvs and raft, we need to ensure that cuvs is updated for rapidsai/raft#2503 or builds of cuvs that rely on cloning raft will get an incompatible version of cutlass due to raft's update. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Ben Frederickson (https://github.com/benfred) URL: #516
Partially addresses #455 Authors: - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) Approvers: - Ben Frederickson (https://github.com/benfred) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Bradley Dice (https://github.com/bdice) URL: #463
…466) PR allows calculating ground truth for cuvs-bench on CPU systems. Current version uses a simple NumPy brute force, perhaps we should consider using faiss? cc @cjnolet @divyegala Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #466
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
❄️ Code freeze for
branch-24.12
and v24.12 releaseWhat does this mean?
Only critical/hotfix level issues should be merged into
branch-24.12
until release (merging of this PR).What is the purpose of this PR?
branch-24.12
intomain
for the release