Numba v0.60.0 not compiling on GPUs #276

jpmorgan98 · 2025-01-21T18:55:16Z

I am seeing this issue on Lassen, and @alexandermote also reported it on Tioga. The current dev branch is moving the minimum Numba version to 0.60.0, which means we no longer need to patch Numba for the vector atomic issue. It seems that Numba might have changed the syntax for generating IR code.

To replicate, install the current dev branch of mcdc and main of harmonize. Then, on a CUDA or AMD system, run any problem with:

python input.py --mode=numba --target=gpu

The pertinent part of the Python traceback is the first chunk, which reports:

Traceback (most recent call last):
  File "/usr/WS1/morgan83/pp-mcdc/kobyashi/mcdc_gpu/input.py", line 78, in <module>
    mcdc.run()
  File "/usr/WS1/morgan83/MCDC/mcdc/main.py", line 67, in run
    data_arr, mcdc_arr = prepare()
                         ^^^^^^^^^
  File "/usr/WS1/morgan83/MCDC/mcdc/main.py", line 1132, in prepare
    build_gpu_progs(input_deck, config.args)
  File "/usr/WS1/morgan83/MCDC/mcdc/loop.py", line 843, in build_gpu_progs
    src_spec = gpu_sources_spec()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/MCDC/mcdc/loop.py", line 422, in gpu_sources_spec
    return adapt.harm.RuntimeSpec("mcdc_source", adapt.state_spec, base_fns, async_fns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/harmonize/harmonize/python/runtime.py", line 277, in __init__
    self.generate_code(gpu_platform)
  File "/usr/WS1/morgan83/harmonize/harmonize/python/runtime.py", line 826, in generate_code
    self.generate_async_ptx(RuntimeSpec.cache_path,suffix,gpu_platform)
  File "/usr/WS1/morgan83/harmonize/harmonize/python/runtime.py", line 753, in generate_async_ptx
    ir_text  = extern_device_ir(fn,self.type_map,suffix,platform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/harmonize/harmonize/python/codegen.py", line 99, in extern_device_ir
    ir_text, res_type = device_ir(func,platform)
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/harmonize/harmonize/python/codegen.py", line 86, in device_ir
    ptx, res_type = config.cuda.compile_ptx_for_current_device(func,fn_arg_ano(func),device=True,debug=config.DEBUG,opt=(not config.DEBUG))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/workspace/morgan83/conda_lassen/envs/mcdc_env/lib/python3.11/site-packages/numba/cuda/compiler.py", line 391, in compile_ptx_for_current_device
    return compile_ptx(pyfunc, sig, debug=debug, lineinfo=lineinfo,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/workspace/morgan83/conda_lassen/envs/mcdc_env/lib/python3.11/site-packages/numba/cuda/compiler.py", line 380, in compile_ptx
    return compile(pyfunc, sig, debug=debug, lineinfo=lineinfo, device=device,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and the very end

File "/usr/workspace/morgan83/conda_lassen/envs/mcdc_env/lib/python3.11/site-packages/numba/core/untyped_passes.py", line 125, in run_pass
    raise TypeError("Signature mismatch: %d argument types given, "
TypeError: Signature mismatch: 4 argument types given, but function takes 5 arguments
^CERROR:  One or more process (first noticed rank 0) terminated with signal 2

The full traceback is attached but probably not important. trace_back.txt
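For anyone reproducing this, the final TypeError is an arity check: the number of argument types in the signature passed to the compiler (here, whatever `fn_arg_ano(func)` returns, per the traceback) does not match the number of parameters on the Python function. The sketch below is not Numba's implementation; it only mimics the message with the standard library, and `check_arity` and `example_fn` are hypothetical names used for illustration.

```python
import inspect


def check_arity(func, arg_types):
    """Raise TypeError if the number of argument types differs from
    the number of parameters that func declares."""
    n_params = len(inspect.signature(func).parameters)
    if len(arg_types) != n_params:
        raise TypeError(
            "Signature mismatch: %d argument types given, "
            "but function takes %d arguments" % (len(arg_types), n_params)
        )


# Hypothetical device function with five parameters.
def example_fn(a, b, c, d, e):
    return a


# Passing only four argument types reproduces the shape of the error.
try:
    check_arity(example_fn, ("int64",) * 4)
    message = None
except TypeError as err:
    message = str(err)

print(message)
```

Running the same kind of comparison on the failing function and its generated signature should show which side gained or lost a parameter.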

The Numba Side

So here is the documentation for the compile_ptx_for_current_device() from Numba/0.58.0

numba.cuda.compile_ptx_for_current_device(pyfunc, sig, debug=False, lineinfo=False, device=False, fastmath=False, opt=True)

and here's the same documentation from Numba/0.60.0

numba.cuda.compile_ptx_for_current_device(pyfunc, sig, debug=False, lineinfo=False, device=False, fastmath=False, opt=True, abi='numba', abi_info=None)

The options definitely changed, but this shouldn't be an issue since the new ones are not required. @alexandermote, can you confirm you are using numba>=0.60.0?
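A quick way to sanity-check that claim without a GPU is to compare the two documented parameter lists directly. The stubs below copy only the signatures quoted above and never call Numba; the names `compile_ptx_058` and `compile_ptx_060` are made up for this sketch, and the call shape mirrors the one in harmonize's codegen.py from the traceback, with placeholder strings standing in for the function and its signature.

```python
import inspect


# Stub with the parameter list documented for Numba 0.58.0.
def compile_ptx_058(pyfunc, sig, debug=False, lineinfo=False, device=False,
                    fastmath=False, opt=True):
    pass


# Stub with the parameter list documented for Numba 0.60.0.
def compile_ptx_060(pyfunc, sig, debug=False, lineinfo=False, device=False,
                    fastmath=False, opt=True, abi='numba', abi_info=None):
    pass


old = inspect.signature(compile_ptx_058).parameters
new = inspect.signature(compile_ptx_060).parameters

# Parameters that exist only in the newer signature.
added = [name for name in new if name not in old]

# True when every added parameter has a default, so callers may omit it.
all_added_have_defaults = all(
    new[name].default is not inspect.Parameter.empty for name in added
)

# Two positionals plus device/debug/opt keywords, as in the traceback call.
bound = inspect.signature(compile_ptx_060).bind(
    "func", "sig", device=True, debug=False, opt=True
)

print(added, all_added_have_defaults)
```

Since the old call shape still binds against the newer parameter list, the added `abi` and `abi_info` options alone would not explain an argument-count error.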


alexandermote · 2025-01-21T18:58:32Z

Correct; I am on numba==0.61.0. This is the exact error I was getting.

jpmorgan98 · 2025-01-21T19:02:04Z

OK, actually I think this is an issue with the current dev branch, not Numba. I reverted to main (release) and that seems to have fixed it, so some PR caused this. Good lord, I need to get GPU CI working again.

@ilhamv any thoughts?

ilhamv · 2025-01-22T01:42:55Z

@jpmorgan98 Can you try tracing back the commits and see at which point it starts to break the GPU mode?
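Tracing back commits like this is a binary search, which `git bisect` automates in practice. As a rough sketch of the idea only: the function below assumes the commits are ordered oldest to newest, that the first is known good (e.g. main) and the last is known bad (e.g. current dev), and that once GPU mode breaks it stays broken. `first_bad_commit`, `is_broken`, and the commit labels are hypothetical; a real `is_broken` would check out the commit and run a GPU problem.

```python
def first_bad_commit(commits, is_broken):
    """Return the earliest commit for which is_broken(commit) is True.

    Assumes commits[0] is good, commits[-1] is bad, and that the
    history flips from good to bad exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi]


# Placeholder history: eight commits, GPU mode breaks starting at "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad_commit(commits, lambda c: commits.index(c) >= 5)
print(culprit)
```

With n commits between main and dev this needs roughly log2(n) test runs instead of n.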

jpmorgan98 added the bug, gpu, and numba labels Jan 21, 2025

jpmorgan98 assigned alexandermote, braxtoncuneo and jpmorgan98 Jan 21, 2025
