In order to monitor the completion of jobs submitted to Slurm, we use files and filesystem polling.
Depending on the polling frequency, this adds latency (the delay between the end of a task and the moment the computation manager identifies it as completed) and puts load on the underlying filesystem, in particular when multiple processes using a computation manager are running at the same time.
What is the expected behavior?
It should be possible to configure how completion monitoring is performed.
Polling would be one implementation of this functionality.
Other interesting implementations would be:
- A very simple in-house networking protocol, for example implemented with Netty.
- A message broker (Kafka, RabbitMQ, ...). This should probably be left for client projects to implement.
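To make the proposal concrete, here is a minimal sketch of what a pluggable completion monitor could look like, with polling as one implementation. All names (`CompletionMonitor`, `PollingCompletionMonitor`, the `.done` flag-file suffix) are hypothetical and are not taken from powsybl-hpc; the sketch only assumes that a finished job leaves a flag file in a known directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical abstraction: the way job completion is detected becomes pluggable. */
interface CompletionMonitor extends AutoCloseable {
    /** Returns a future that completes once the job with the given id has finished. */
    CompletableFuture<Void> awaitCompletion(String jobId);

    @Override
    void close();
}

/** Polling implementation: checks for a per-job flag file at a fixed interval. */
final class PollingCompletionMonitor implements CompletionMonitor {
    private final Path flagDir;
    private final Duration interval; // must be at least 1 ms
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PollingCompletionMonitor(Path flagDir, Duration interval) {
        this.flagDir = flagDir;
        this.interval = interval;
    }

    @Override
    public CompletableFuture<Void> awaitCompletion(String jobId) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        // Assumed naming convention for the flag file; purely illustrative.
        Path flag = flagDir.resolve(jobId + ".done");
        ScheduledFuture<?> task = scheduler.scheduleWithFixedDelay(() -> {
            if (Files.exists(flag)) {
                done.complete(null);
            }
        }, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
        // Stop polling as soon as the future completes (or is cancelled by the caller).
        done.whenComplete((value, error) -> task.cancel(false));
        return done;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A network-based or broker-based monitor would implement the same interface and complete the future when a notification arrives, so callers would not need to know which mechanism is configured.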
What is the motivation / use case for changing the behavior?
Improving perceived performance while relieving the filesystem.
Please tell us about your environment:
powsybl-hpc version: 2.7.0
Yes, but the problem is that even in "local" mode, there is a good chance that the flag dir is actually on a shared filesystem, for instance an NFS mount, so that Slurm nodes can access it. In that case, the watch service will probably not work (or will itself be implemented with polling).
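One way to hedge against that, sketched here under assumptions, is to use the JDK `java.nio.file.WatchService` only as a fast path and still re-check the flag file explicitly at a bounded interval. The class `FlagFileWaiter`, the method `awaitFlag`, and its parameters are hypothetical names for illustration. If the watch service delivers no events (as may happen on a network mount), the loop degrades to plain polling at `recheckMillis`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

final class FlagFileWaiter {
    /**
     * Waits until flagDir/flagName exists or the timeout elapses.
     * Returns true if the flag file was seen, false on timeout.
     */
    static boolean awaitFlag(Path flagDir, String flagName,
                             long recheckMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        Path flag = flagDir.resolve(flagName);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try (WatchService watcher = flagDir.getFileSystem().newWatchService()) {
            // Register before the first existence check so no creation is missed.
            flagDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (!Files.exists(flag)) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                long waitMillis = Math.min(recheckMillis,
                        TimeUnit.NANOSECONDS.toMillis(remainingNanos) + 1);
                // Wakes up early on a creation event, otherwise after waitMillis.
                WatchKey key = watcher.poll(waitMillis, TimeUnit.MILLISECONDS);
                if (key != null) {
                    key.pollEvents(); // drain; the loop condition does the real check
                    key.reset();
                }
            }
            return true;
        }
    }
}
```

This does not remove the filesystem load on a shared mount; it only avoids relying on events that may never arrive.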
Feature