Commit 9d27527 (parent fc437da), "Update paper.md", authored by AndySAnker on Feb 20, 2024, changing `paper/paper.md` (1 addition, 1 deletion).

To benchmark our implementation, we compare scattering patterns simulated with `DebyeCalculator` against those from DiffPy-CMI [@juhas2015complex], widely recognised software for scattering pattern computations. Our implementation reproduces the scattering patterns of DiffPy-CMI (\autoref{fig:figure_S1}) while being faster on the CPU for structures of up to ~20,000 atoms (\autoref{fig:figure_1}). Both calculations are run on an x86-64 CPU with 64GB of memory and a batch size of 10,000.
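
To make the benchmarked workload concrete: the Debye scattering equation, $I(Q) = \sum_i \sum_j f_i f_j \frac{\sin(Q r_{ij})}{Q r_{ij}}$, is a double sum over all atom pairs. The plain-Python sketch below evaluates it by processing unique pair distances in chunks of `batch_size`. It is a hypothetical illustration, not the `DebyeCalculator` implementation or API: the function name `debye_intensity`, the use of Q-independent (unit by default) form factors, and the assumption that a batch bounds the number of pair terms processed at once are all ours.

```python
import math
from itertools import combinations, islice

def debye_intensity(positions, q_values, form_factors=None, batch_size=10_000):
    """Hypothetical sketch (not the DebyeCalculator API) of the Debye equation
    I(Q) = sum_i sum_j f_i f_j sin(Q r_ij) / (Q r_ij),
    assuming Q-independent form factors (unit weights by default)."""
    n = len(positions)
    f = list(form_factors) if form_factors is not None else [1.0] * n
    # Self terms (i == j): sin(x)/x -> 1 as x -> 0, so each contributes f_i**2.
    self_term = sum(fi * fi for fi in f)
    intensity = [self_term] * len(q_values)

    pairs = combinations(range(n), 2)  # lazy iterator over unique pairs i < j
    while True:
        # Pull at most batch_size pairs; only this batch is held in memory.
        batch = list(islice(pairs, batch_size))
        if not batch:
            break
        for i, j in batch:
            r = math.dist(positions[i], positions[j])
            w = 2.0 * f[i] * f[j]  # factor 2 accounts for both (i, j) and (j, i)
            for k, q in enumerate(q_values):
                x = q * r
                intensity[k] += w * (math.sin(x) / x if x != 0.0 else 1.0)
    return intensity

# Usage: a 2x2x2 cube of unit-spaced atoms.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(debye_intensity(cube, [0.0])[0])  # 64.0, i.e. N**2 at Q = 0 for unit form factors
```

In this sketch the batch size does not change the result; it only limits how much pairwise work is held at once, which is the trade-off a batch size exposes on memory-limited hardware.
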
Running the calculations on the GPU provides a further notable speed-up (\autoref{fig:figure_1}). This improvement stems primarily from distributing the double-sum calculations across far more cores than a CPU provides. For small atomic structures, the overhead of initiating GPU calculations makes the NVIDIA RTX A3000 Laptop GPU slower than both DiffPy-CMI and our CPU implementation. Once the atomic structure exceeds ~14 Å in diameter (~300 atoms), we observe a ~5-fold speed-up using the NVIDIA RTX A3000 Laptop GPU with 6GB of memory and a batch size of 10,000.
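
The double sum grows quadratically with the number of atoms $N$, since there are $N(N-1)/2$ unique pair distances. The short calculation below compares a ~300-atom structure with a 20,000-atom one, assuming (hypothetically; the helper names are ours) that each batch processes at most 10,000 pair distances. With only a handful of batches for small structures, a fixed GPU start-up overhead is hard to amortize.

```python
import math

def pair_count(n_atoms: int) -> int:
    """Number of unique atom pairs (i < j) in the Debye double sum."""
    return n_atoms * (n_atoms - 1) // 2

def batch_count(n_atoms: int, batch_size: int = 10_000) -> int:
    """Batches needed if (by assumption) each batch holds batch_size pairs."""
    return math.ceil(pair_count(n_atoms) / batch_size)

for n in (300, 20_000):
    print(f"{n} atoms: {pair_count(n):,} pairs, {batch_count(n):,} batches")
# 300 atoms: 44,850 pairs, 5 batches
# 20000 atoms: 199,990,000 pairs, 19,999 batches
```
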
The choice of GPU hardware strongly influences this speed advantage. As shown in \autoref{fig:figure_1}, the speed benefits become even more evident with an NVIDIA Titan RTX GPU, which offers 24GB of memory. The NVIDIA Titan RTX GPU delivers a ~10-fold speed-up, seemingly across all structure sizes, underlining the significant role of the hardware. With the advancement of GPUs such as NVIDIA's Grace Hopper Superchip [@NVIDIA], which offers 624GB of fast-access memory, `DebyeCalculator` has the potential to achieve even greater speeds in the future.

![Computation-time comparison of the $G(r)$ calculation using our CPU and GPU implementations against DiffPy-CMI [@juhas2015complex]. The CPU implementation was run with a batch size of 10,000 (x86-64 CPU with 64GB of memory). Both GPU implementations were also run with a batch size of 10,000 (NVIDIA RTX A3000 Laptop GPU with 6GB of memory and NVIDIA Titan RTX GPU with 24GB of memory). The mean and standard deviation of the PDF simulation times are calculated from 10 runs. Note that, due to limited memory, the Laptop GPU was unable to handle structures larger than ~24,000 atoms. A [CIF](https://github.com/FrederikLizakJohansen/DebyeCalculator/blob/main/data/AntiFluorite_Co2O.cif) of an antifluorite structure was used to generate this data.\label{fig:figure_1}](../figures/figure_1.png)
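
The figure reports the mean and standard deviation of simulation times over 10 runs. A minimal timing harness of that kind might look like the sketch below. `benchmark` and the toy workload are hypothetical placeholders, not part of `DebyeCalculator` or DiffPy-CMI; the untimed warm-up run and the use of the sample standard deviation are our assumptions. Because GPU work is typically launched asynchronously, timing GPU code would also require a device synchronization before each clock read (for example, `torch.cuda.synchronize()` in PyTorch).

```python
import statistics
import time
from typing import Callable

def benchmark(fn: Callable[[], object], runs: int = 10, warmup: int = 1) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of wall-clock times in seconds.
    Requires runs >= 2 for the standard deviation."""
    for _ in range(warmup):
        fn()  # untimed warm-up run(s), e.g. to absorb one-off initialization
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

# Usage with a toy CPU workload standing in for a PDF simulation.
mean_s, std_s = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"{mean_s * 1e3:.2f} ms ± {std_s * 1e3:.2f} ms")
```

Separating warm-up from timed runs keeps one-off start-up costs out of the reported mean, which matters most for the small structures where GPU overhead dominates.
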
