Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix index tensor device mismatch by moving to CPU #195

Closed
wants to merge 1 commit into from

Conversation

recisic
Copy link

@recisic recisic commented Jan 13, 2025

Description

This PR ensures the index tensor is sent to the CPU for full unrolling SCF mode during batched computations on a CUDA device. The tensor conv = self.mixer.converged may be a CUDA tensor as it is derived from deltas in the mixer class:

conv = self.mixer.converged
if conv.any():
# Simultaneous convergence does not require culling.
# Occurs if batch size equals amount of True in `conv`.
if guess.shape[0] == conv.count_nonzero():
q_converged = q
converged[:] = True
culled = False
break
# save all necessary variables for converged system
iconv = idxs[conv]

@property
def converged(self) -> Tensor:
"""
Tensor of bools indicating convergence status of the system(s).
A system is considered to have converged if the maximum absolute
difference between the current and previous systems is less than
the ``tolerance`` value.
"""
# Check that mixing has been conducted
if self.delta is None:
raise RuntimeError("Nothing has been mixed")
if self._batch_mode == 0:
delta_norm = torch.norm(self.delta)
else:
# norm goes over all dims except first (batch dimension)
dims = tuple(range(-(self.delta.ndim - 1), 0))
delta_norm = torch.norm(self.delta, dim=dims)
return delta_norm < self.options["x_tol"]

Fixes #194.

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.04%. Comparing base (92a7e77) to head (43fabc2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #195   +/-   ##
=======================================
  Coverage   84.04%   84.04%           
=======================================
  Files         200      200           
  Lines        9846     9846           
  Branches     1125     1125           
=======================================
  Hits         8275     8275           
  Misses       1216     1216           
  Partials      355      355           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marvinfriede
Copy link
Member

Thanks for reporting and fixing the issue!

However, instead of moving to the CPU, I prefer to initialize everything on the correct device. Usually, all tensors receive the device keyword argument, but in the full SCF, there some tensors without. I found more instances in the full SCF code, which is why I created a separate PR (#196).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index device error for batched GPU calculations
2 participants