Critical Dr.Jit compiler failure: cuda_check(): API error 0718 (CUDA_ERROR_INVALID_PC): "invalid program counter" in D:\a\drjit\drjit\ext\drjit-core\src\eval.cpp:395 #296

Open
aantg opened this issue Oct 8, 2024 · 8 comments


@aantg

aantg commented Oct 8, 2024

Hello. I got this error after updating the NVIDIA driver to the latest version, 565.90, even with the "Hello World" example from the Mitsuba 3 documentation (using the 'cuda_ad_rgb' variant, of course). Rolling back to the previous driver version (561.09) makes the error disappear.

Looks like there's some incompatibility?

DrJit 0.4.6 + mitsuba 3.5.2

@njroussel
Member

Hi @aantg

What OS and GPU model are you using?

In my personal experience, these types of errors are often related to a faulty driver installation.

@tatue64

tatue64 commented Nov 9, 2024

I can confirm this on Manjaro Linux with NVIDIA driver 565.57.01 and CUDA 12.6.2 (downgrading CUDA did not help).

The error occurs with drjit 0.4.6 and Mitsuba 3.5.2, but also with drjit 1.0.0 in the current development version of Mitsuba 3.6 when compiled in Release mode (clang 18.1.8). "llvm_ad_rgb" always works. Somewhat surprisingly, the error does not occur if the program is compiled in Debug mode.

In the latter case (Mitsuba 3.6) the scene from the tutorial "editing_a_scene" notebook works with "cuda_ad_rgb", but fails, for example, when the "diffuse" material is replaced with "roughplastic".

This may be a driver problem (as far as I know, this is a beta driver), but the dependence of the error on the compilation mode and the dependence on scene parameters suggest a problem with drjit.

===================================

2024-11-09 18:31:12 INFO  main  [mitsuba.cpp:334] Mitsuba version 3.6.0 (master[a8a03722], Linux, 64bit, 64 threads, 8-wide SIMD)
2024-11-09 18:31:12 INFO  main  [mitsuba.cpp:335] Copyright 2022, Realistic Graphics Lab, EPFL
2024-11-09 18:31:12 INFO  main  [mitsuba.cpp:336] Enabled processor features: cuda llvm avx2 avx fma f16c sse4.2 x86_64
2024-11-09 18:31:12 INFO  main  [xml.cpp:1380] Loading XML file "scenes/test.xml" with variant "cuda_ad_rgb"..
2024-11-09 18:31:13 INFO  main  [Scene] Building scene in OptiX ..
2024-11-09 18:31:13 INFO  main  [Scene] OptiX ready. (took 54ms)
2024-11-09 18:31:13 INFO  main  [xml.cpp:1398] Done loading XML file "scenes/test.xml" (took 1.346s).
2024-11-09 18:31:13 INFO  main  [SamplingIntegrator] Starting render job (1028x516, 528 samples)
2024-11-09 18:31:13 INFO  main  [SamplingIntegrator] Computation graph recorded. (took 4ms)
2024-11-09 18:31:13 INFO  main  [SamplingIntegrator] Code generation finished. (took 16ms)

Dr.Jit encountered an unrecoverable error and will now shut
down. Please re-run your program in debug mode to check for
out-of-bounds reads, writes, and other sources of undefined
behavior. You can do so by calling

   dr.set_flag(dr.JitFlag.Debug, True)

at the beginning of the program. If these additional checks
fail to pinpoint the problem, then you have likely found a
bug. We are happy to help investigate and fix the problem if
you can you create a self-contained reproducer and submit it
at https://github.com/mitsuba-renderer/drjit.

The error message of this specific failure is as follows:
>>> cuda_check(): API error 0718 (CUDA_ERROR_INVALID_PC): "invalid program counter" in /home/xxx/progs/mitsuba_x/mitsuba36/ext/drjit/ext/drjit-core/src/init.cpp:462.

@Kyriota

Kyriota commented Nov 25, 2024

I have also encountered this problem:

Critical Dr.Jit compiler failure: cuda_check(): API error 0718 (CUDA_ERROR_INVALID_PC): "invalid program counter" in D:\a\drjit\drjit\ext\drjit-core\src\eval.cpp:395.

Env:

Platform: Windows 11
Python Ver: 3.10.0
Mitsuba Ver: 3.5.0
Dr Jit Ver: 0.4.4

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Oct_30_01:18:48_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.90                 Driver Version: 565.90         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   46C    P8             14W /  320W |    1097MiB /  16376MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
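
Since the reports in this thread hinge on the driver version, it may help to extract it from the `nvidia-smi` banner programmatically. The following is a minimal sketch, not part of Dr.Jit or Mitsuba; the function name `parse_driver_version` is hypothetical, and it assumes the banner contains a `Driver Version: X.Y[.Z]` field as in the output above.

```python
import re
from typing import Optional, Tuple

# Matches the "Driver Version: X.Y[.Z]" field of the nvidia-smi banner.
_DRIVER_RE = re.compile(r"Driver Version:\s*(\d+(?:\.\d+)+)")


def parse_driver_version(smi_output: str) -> Optional[Tuple[int, ...]]:
    """Return the driver version as a tuple of ints, or None if absent.

    Hypothetical helper for illustration only. Leading zeros are dropped,
    so "565.57.01" becomes (565, 57, 1).
    """
    match = _DRIVER_RE.search(smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Banner line copied from the nvidia-smi output above.
banner = ("| NVIDIA-SMI 565.90                 Driver Version: 565.90"
          "         CUDA Version: 12.7     |")
print(parse_driver_version(banner))
```

Returning a tuple of integers makes it easy to compare the major branch (the first element) against the versions mentioned in this thread.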

@devmdxac

devmdxac commented Nov 25, 2024

Hello.

I also have the issue. I tried rolling back to a previous version of Nvidia driver but that did not fix the issue.

I am on Fedora 41
Nvidia Driver Version: 565.57.01
Python 3.13.0
DrJit 1.0.1
Mitsuba 3.6 (latest stable release)

I used pip install mitsuba to install the Mitsuba renderer. I also tried to compile it, but I ended up with the same error message (although I must say that it was not easy to compile Mitsuba on the latest Fedora release, and I do not know how much I can rely on my installation).

Would it be relevant to try Mitsuba 3.5? If so, could you let me know whether I need to roll back to drjit 0.4.6, and how to do so? Everything seems to work well when I use the 'llvm_ad_rgb' variant, or any other variant in general.

Best,
Happy to provide any further details if needed.

@tatue64

tatue64 commented Nov 25, 2024

This problem is related to the NVIDIA driver version 565, not to the Mitsuba 3.x version, CUDA version, compiler, Linux kernel, etc. For me, only downgrading the driver (in my case to 550.135) helped. Version 560 probably works as well.

It may simply be a bug in the beta driver, but it may also be an incompatibility between drjit and a new procedure implemented in the newer driver.
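
For anyone who cannot downgrade, one possible stopgap is to select the variant based on the driver branch, since "llvm_ad_rgb" is reported to work in every case. The sketch below only encodes the versions reported by users in this thread; it is not an official compatibility list, and the names `report_status` and `choose_variant` are hypothetical.

```python
# Driver branches (major versions) reported in this thread. This is a summary
# of user reports, not an official compatibility list.
REPORTED_FAILING = {565}            # 565.90 (Windows), 565.57.01 (Linux)
REPORTED_WORKING = {550, 560, 561}  # 550.135, 560.35.03, 561.09


def report_status(driver_version: str) -> str:
    """Classify a driver version string as 'failing', 'working', or 'unknown'."""
    major = int(driver_version.split(".")[0])
    if major in REPORTED_FAILING:
        return "failing"
    if major in REPORTED_WORKING:
        return "working"
    return "unknown"


def choose_variant(driver_version: str) -> str:
    """Pick the LLVM variant only for driver branches reported as failing."""
    if report_status(driver_version) == "failing":
        return "llvm_ad_rgb"
    return "cuda_ad_rgb"


for version in ("565.90", "565.57.01", "561.09", "560.35.03", "550.135"):
    print(version, report_status(version), choose_variant(version))
```

The returned name could then be passed to `mi.set_variant(...)`. Note that the LLVM variant runs on the CPU, so this trades the crash for slower rendering.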

@devmdxac

Hello,

It does indeed seem to work now (I did not test many tutorials, but the ones that did not run correctly before the downgrade now work as expected). Thank you very much.

I evidently did not downgrade the driver correctly before, and probably introduced issues when trying to build Mitsuba.

The Nvidia driver version I am currently using is 560.35.03.

Thank you again for your help.

Best wishes

@xacond00

This is a critical issue, yet nobody talks about it. Downgrading drivers isn't always an option.

@wjakob
Member

wjakob commented Dec 18, 2024

Dear all, this is indeed a serious problem. But I don't think this is our fault: a change made between NVIDIA driver versions 561 and 565.90 causes a miscompilation of very simple programs.

I filed a reproducer with NVIDIA. But even if this is fixed in an upcoming driver release, I don't think that we will be able to work around it for the existing drivers. In that case, we will likely push out a new release of Dr.Jit that simply errors out with a message telling the user to install a newer or older driver.
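
To illustrate what such a guard could look like in user code, here is a rough sketch. It is not Dr.Jit's actual check: the exception class, the function name, and the assumption that exactly the 565 branch is affected are all hypothetical, based only on the versions reported in this thread.

```python
class UnsupportedDriverError(RuntimeError):
    """Raised when the driver branch is one reported to miscompile kernels."""


# Assumption: only the 565 branch is affected. The thread reports failures
# with 565.90 and 565.57.01; the true affected range is not known here.
AFFECTED_BRANCH = 565


def check_driver(version: str) -> None:
    """Raise UnsupportedDriverError if `version` is in the affected branch."""
    major = int(version.split(".")[0])
    if major == AFFECTED_BRANCH:
        raise UnsupportedDriverError(
            f"NVIDIA driver {version} is reported to miscompile CUDA "
            "programs. Please install a newer or older driver."
        )


check_driver("561.09")  # passes silently
try:
    check_driver("565.90")
except UnsupportedDriverError as exc:
    print(exc)
```

Failing early with a clear message avoids the much harder to diagnose CUDA_ERROR_INVALID_PC crash at kernel launch.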
