Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[:bug: bug report] Incompatible function arguments (probably related to int32 variable indexing) #108

Open
DoeringChristian opened this issue Dec 8, 2022 · 9 comments

Comments

@DoeringChristian
Copy link
Contributor

DoeringChristian commented Dec 8, 2022

Hi,
I think I have found a bug related to the recent change allowing more than 4 billion variables.
When optimizing a large scene I get the error after 910 optimization steps.

File "/home/doeringc/workspace/python/zed/ddtar/zed/test01/out/final/../../difr.py", line 95, in step
    img = mi.render(
  File "/home/doeringc/workspace/cpp/mitsuba3-test/build/python/mitsuba/python/util.py", line 518, in render
    return dr.custom(_RenderOp, scene, sensor, params, integrator,
  File "/home/doeringc/workspace/cpp/mitsuba3-test/build/python/drjit/router.py", line 5613, in custom
    Type.ad_add_edge_(index, tmp_in.index_ad)
TypeError: ad_add_edge_(): incompatible function arguments. The following argument types are supported:
    1. (src_index: int, dst_index: int, cb: handle = None) -> None

Invoked with: 2148013987, 2148014840

This seems pretty ironic since 2148013987 and 2148014840 are both integers in the python sense.
In the ad_add_edge interface the arguments are however defined as int32_t C++ types here or here. I'm not quite sure which definition is the correct one for this issue but I imagine that since the 4 billion variables update not all bindings where updated to use int64_t types.
Currently I don't have the time for it but otherwise I would try to fix it myself since it seems to easy at first glance.
Thanks for your Help.

@Speierers
Copy link
Member

Hi @DoeringChristian ,

Variable indices are still stored using uint32_t. Moreover, the indices passed to ad_add_edge_ are indices of variables in the AD graph.

Could you please provide a minimal reproducer script so we can take a look at this issue on our side?

@DoeringChristian
Copy link
Contributor Author

Hi @Speierers,
unfortunately i cannot reproduce the error with a simpler setup at the moment. I'll try some things to be sure that this is not an issue in my code. This could take a while though since the error only appears after 910 steps and each of them takes about 3min.

@vid8687
Copy link

vid8687 commented May 16, 2024

@DoeringChristian did you find an alternate reason for this ? I've noticed the same issue that shows up only after a large number of iterations.

@DoeringChristian
Copy link
Contributor Author

No unfortunately I haven't het gotten around to testing this yet.

@merlinND
Copy link
Member

I believe this will be fixed in the upcoming version of DrJit, where the numbering scheme has been changed.

@vid8687
Copy link

vid8687 commented May 16, 2024

Thanks! Will watch out for that drjit update.

@vid8687
Copy link

vid8687 commented May 22, 2024

I notice the above numbering issue when the iterations are high even when I have a sequence of fairly small optimization iterations. What is the "correct" way to run something like this, is there a way to "reset" any graph numbering since each internal loop is independent. This would fail with the error mentioned above at some value of loop count but enough of them run okay.
For e.g.:

for i in range(loop_count): #loop count ~ 200
    #setup parameters
    #differentiating over 3 textures ~150^2 pixels
    opt = mi.ad.Adam(..)
    for it in range(opt_iteration_count): #opt_iteration_count ~30
        #render, backward, step
        #update parameters
    #this probably doesn't help, but delete and garbage collect
    del opt 
    gc.collect() 

@vid8687
Copy link

vid8687 commented Jun 3, 2024

@merlinND any suggestions on the above?

@merlinND
Copy link
Member

merlinND commented Jun 3, 2024

Hello @vid8687,

Until the release of the next version, I believe that the simplest way would be to start iterations of your outer for loop in different processes.
I am not aware of a way to reset the variable counter in the current version (@njroussel, please let us know if you know a way).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants