RuntimeError: Failed to find native CUDA module #33

Open
scutcsq opened this issue Jan 11, 2024 · 10 comments
Comments

@scutcsq

scutcsq commented Jan 11, 2024

RuntimeError: Failed to find native CUDA module, make sure that you compiled the code with K2_WITH_CUDA.

@csukuangfj

Could you describe how you installed fast_rnnt?

@scutcsq
Author

scutcsq commented Jan 11, 2024

> Could you describe how you installed fast_rnnt?

I installed fast_rnnt with pip. I have since installed k2, and the problem is solved by using the equivalent function in k2.

@bene-ges

Hi, we hit the same error after successfully building fast_rnnt for AMD using ROCm 5.4, with correctly installed pytorch 2.0.1 and torchaudio 0.15.2:

File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/rnnt_loss.py", line 533, in rnnt_loss
    scores_and_grads = mutual_information_recursion(
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/mutual_information.py", line 294, in mutual_information_recursion
    scores = MutualInformationRecursionFunction.apply(
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/mutual_information.py", line 157, in forward
    ans = _fast_rnnt.mutual_information_forward(px, py, boundary, p)
RuntimeError: Failed to find native CUDA module, make sure that you compiled the code with K2_WITH_CUDA.

We want to use fast_rnnt on its own, without k2. We built and installed it from source:

git clone https://github.com/danpovey/fast_rnnt.git
cd fast_rnnt
export FT_MAKE_ARGS="-j32"
pip install --verbose fast_rnnt
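When debugging this kind of error, it can help to first confirm which GPU backend the installed PyTorch wheel targets. The following is a minimal sketch, not part of fast_rnnt; the helper name `describe_torch_backend` is hypothetical. It relies on `torch.version.hip` being set on ROCm builds and `torch.version.cuda` being set on CUDA builds.

```python
def describe_torch_backend():
    """Return 'rocm', 'cuda', 'cpu', or 'missing' for the installed PyTorch.

    Hypothetical helper for illustration; it is not part of fast_rnnt.
    """
    try:
        import torch
    except ImportError:
        return "missing"
    # ROCm builds of PyTorch report a HIP version string here.
    if getattr(torch.version, "hip", None):
        return "rocm"
    # CUDA builds report a CUDA version string here.
    if getattr(torch.version, "cuda", None):
        return "cuda"
    # Neither is set: a CPU-only build.
    return "cpu"


if __name__ == "__main__":
    print("PyTorch backend:", describe_torch_backend())
```

A result of `rocm` only says that PyTorch itself targets ROCm; it says nothing about whether the fast_rnnt extension was compiled with GPU support.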

@bene-ges

It seems that ROCm isn't supported by the build. The build log shows:
-- No NVCC detected. Disable CUDA support
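A quick way to see which GPU compilers are visible on `PATH` is sketched below. This is an assumption-laden illustration: it does not reproduce how the fast_rnnt build actually detects NVCC, and the helper names `detect_gpu_compilers` and `summarize` are hypothetical. Note also that ROCm tools often live under a versioned directory such as `/opt/rocm-5.3.0/bin`, which may not be on `PATH`.

```python
import shutil


def detect_gpu_compilers(which=shutil.which):
    """Map each compiler name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in ("nvcc", "hipcc", "hipify-clang")}


def summarize(found):
    """Turn the detection result into a one-line, human-readable summary."""
    if found["nvcc"]:
        return "nvcc found: a CUDA build should be possible"
    if found["hipcc"]:
        return "only hipcc found: the build needs explicit ROCm/HIP support"
    return "no GPU compiler found on PATH"


if __name__ == "__main__":
    result = detect_gpu_compilers()
    for tool, path in result.items():
        print(f"{tool}: {path or 'not found'}")
    print(summarize(result))
```

On an AMD machine with only ROCm installed, `nvcc` will normally be absent, which is consistent with the log line above.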

@pkufool
Contributor

pkufool commented Jan 19, 2024

@bene-ges Basically if pytorch can run on Rocm, fast_rnnt can also run on it. Will have a look at this issue. Thanks!

@danpovey
Collaborator

But the core of fast_rnnt is the CUDA code, no? And I believe ROCm does not use CUDA? So it would require a rewrite to support that?

@bene-ges

bene-ges commented Jan 20, 2024

@danpovey, ROCm can compile CUDA code into AMD binaries. Most projects just add the ROCm compile commands, as PyTorch does, so the PyTorch build system can serve as an example of the right solution (Docs).

Here is an example of converting CUDA code to ROCm (HIP) code and compiling it on Ubuntu (matrix-cuda is just an example of CUDA code):
git clone https://github.com/lzhengchun/matrix-cuda
cd matrix-cuda
/opt/rocm-5.3.0/bin/hipify-clang matrix_cuda.cu
This produces a file matrix_cuda.cu.hip, which is the source code for ROCm.
It can then be compiled with hipcc:
/opt/rocm-5.3.0/bin/hipcc matrix_cuda.cu.hip
This produces the file a.out.
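To give a feel for why the port is mostly mechanical, here is a toy sketch of the kind of renaming a hipify pass performs. It is not how `hipify-clang` works internally (that tool is built on Clang), and the mapping below covers only a handful of runtime API names; the function name `toy_hipify` is hypothetical.

```python
import re

# A tiny subset of the CUDA -> HIP renames; the real tools cover far more.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source):
    """Rename known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


if __name__ == "__main__":
    sample = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_a, n);\n"
        "cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_a);\n"
    )
    print(toy_hipify(sample))
```

Identifiers that are not in the table, such as `cudaMallocManaged` here, are left untouched, which is why a real port should use the actual hipify tools rather than a hand-written table.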

@bene-ges

Another useful link on porting CUDA to HIP (the notation is almost identical):
https://www.lumi-supercomputer.eu/preparing-codes-for-lumi-converting-cuda-applications-to-hip/

@bene-ges

I can help with testing on AMD if needed.

@danpovey
Collaborator

danpovey commented Feb 18, 2024

OK, that's interesting. If it's possible for you to add support for ROCm to our build system (which I think is not entirely trivial), then we'd appreciate that very much. This kind of thing will no doubt be used more frequently in the future.
(Also: apologies for the very late response.)
