Commit

Fix bug in slurm jobid recovery code.
A missing return statement caused incorrect text to be assigned to the jobid.
christopherwharrop-noaa committed May 30, 2019
1 parent 8e2985f commit 59f51d7
Showing 1 changed file with 11 additions and 3 deletions.
lib/workflowmgr/slurmbatchsystem.rb: 14 changes (11 additions, 3 deletions)
@@ -289,6 +289,10 @@ def submit(task)
 queued_jobs=""
 errors=""
 exit_status=0
+
+# Wait a few seconds for information to propagate before trying to look if job was still submitted
+sleep(5)
+
 begin
 
 # Get the username of this process
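The added `sleep(5)` gives Slurm a moment to register a job that may have been submitted despite the socket timeout, so the queue lookup that follows has a chance to find it. As a sketch only, the same idea can be generalized into a bounded poll loop; `poll_for_job` and its `attempts` and `delay` parameters are hypothetical and not part of Rocoto:

```ruby
# Hypothetical helper (not Rocoto API): wait, then run a caller-supplied
# lookup, repeating up to `attempts` times until it returns non-nil.
def poll_for_job(attempts: 3, delay: 5)
  attempts.times do
    sleep(delay)                      # let the scheduler register the job
    result = yield                    # e.g. search squeue output for our tag
    return result unless result.nil?  # found it: stop polling
  end
  nil                                 # job never appeared in the queue
end
```

With `attempts: 1, delay: 5` this reduces to the commit's behavior: one five-second wait followed by a single lookup. Retrying trades a longer worst-case submission time for a better chance of recovering the jobid.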
@@ -316,9 +320,9 @@ def submit(task)
 # Look for a job that matches the randomID we inserted into the comment
 queued_jobs.split("\n").each { |job|
 
-# Skip headers
-next if job=~/CLUSTER/
-next if job=~/JOBID/
+# Skip headings
+next if job[0..4] == 'JOBID'
+next if job[0..7] == 'CLUSTER:'
 
 # Extract job id
 jobid=job[0..39].strip
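The old checks skipped any line containing `CLUSTER` or `JOBID` anywhere, while the new ones skip only lines that begin with those headings. The sketch below contrasts the two; the method names, the sample line, and its 40-column jobid field (inferred from `job[0..39]`) are hypothetical:

```ruby
# Prefixes of the heading lines the new code skips (assumes headings
# start in column 0 of the squeue output).
HEADING_PREFIXES = ['JOBID', 'CLUSTER:'].freeze

# New behavior: a line is a heading only if it starts with a prefix.
def heading?(line)
  HEADING_PREFIXES.any? { |prefix| line.start_with?(prefix) }
end

# Old behavior: any line mentioning CLUSTER or JOBID anywhere was skipped.
def heading_by_regex?(line)
  line.match?(/CLUSTER|JOBID/)
end

# Jobid occupies the first 40 characters, padded with spaces.
def extract_jobid(line)
  line[0..39].strip
end

# Hypothetical data line whose comment happens to contain "JOBID".
data = format('%-40s%s', '123456', 'rocoto_JOBID_tag')
puts heading_by_regex?(data)  # old check wrongly treats it as a heading
puts heading?(data)           # new check keeps it
puts extract_jobid(data)
```

`line.start_with?('JOBID')` behaves the same as `job[0..4] == 'JOBID'`, including for lines shorter than five characters.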
@@ -331,6 +335,10 @@ def submit(task)
 end
 }
 
+WorkflowMgr.stderr("WARNING: Unable to retrieve jobid after sbatch failed with socket time out when submitting #{task.attributes[:name]}",1)
+
+return nil,output
+
 else
 WorkflowMgr.stderr("WARNING: job submission failed: #{output}", 1)
 return nil,output
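Without the added `return nil,output`, a failed search for the tagged job fell through the timeout branch, and text that was not a jobid ended up assigned to the jobid. A minimal sketch of that control flow, with the fix toggled by a `fixed:` flag; the method name, the `Socket timed out` match, and the success-path parsing of sbatch's `Submitted batch job <id>` message are assumptions for illustration, not Rocoto's actual code:

```ruby
# Hypothetical sketch of the control flow around the fix.
def submit_result(output, queued_jobs, tag, fixed: true)
  if output.include?('Socket timed out')
    # Recovery path: look for our tagged job in the queue listing.
    queued_jobs.split("\n").each do |job|
      return job[0..39].strip, output if job.include?(tag)
    end
    # Job not found: the fix reports failure here instead of falling through.
    return nil, output if fixed
  end
  # Success path (assumed): sbatch prints "Submitted batch job <id>".
  jobid = output.split.last
  return jobid, output
end
```

With `fixed: false`, a timeout message ending in `Socket timed out on send/recv operation` yields the jobid `"operation"`, the kind of incorrect value the commit message describes.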