[URGENT] Power issues #1581

Closed
psyhtest opened this issue Jan 25, 2024 · 5 comments · Fixed by #1584 or mlcommons/power-dev#325

@psyhtest
Contributor

psyhtest commented Jan 25, 2024

We've stumbled upon new submission checker issues with the latest power workflow.

Whitespace lines at the end of spl.txt files

For some reason, spl.txt files now end with a whitespace-only line (not entirely empty, probably a single space, as len(line) == 1). This causes exceptions such as "exception in check_power_dir: list index out of range".
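The failure can be reproduced outside the checker. The following is only a minimal sketch: the sample rows and the helper name first_failure are made up for illustration, and the only details taken from the checker are that sample lines start with "Time" and that the timestamp is read from field 1 of the comma-separated line.

```python
import io

# Hypothetical spl.txt content. The column layout is assumed for illustration;
# the last line is the whitespace-only line described above.
sample = (
    "Time,01-25-2024 10:00:00.000,Watts,100.0\n"
    "Time,01-25-2024 10:00:01.000,Watts,101.0\n"
    "\n"
)


def first_failure(text):
    """Return (line_number, repr(line)) for the first line without a field 1."""
    for number, line in enumerate(io.StringIO(text), start=1):
        try:
            # Same access pattern as the checker: the timestamp is field 1.
            line.split(",")[1]
        except IndexError:
            # A line without a comma splits into a single-element list,
            # so index 1 is out of range.
            return number, repr(line)
    return None


print(first_failure(sample))
```

Any line without a comma triggers the same IndexError, whether it holds a single space or only the newline character.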

We can work around this with the following patch:

krai@aus655-pci-bowie:~/work_collection/mlperf_inference_git_master$ git diff
diff --git a/tools/submission/power/power_checker.py b/tools/submission/power/power_checker.py
index 5adcd19..ac9cb0a 100755
--- a/tools/submission/power/power_checker.py
+++ b/tools/submission/power/power_checker.py
@@ -408,6 +408,7 @@ def phases_check(

         with open(spl_fname) as f:
             for line in f:
+                if not line.startswith("Time"): continue
                 timestamp = (
                     datetime.strptime(line.split(",")[1], datetime_format)
                 ).replace(tzinfo=timezone.utc)
diff --git a/tools/submission/submission_checker.py b/tools/submission/submission_checker.py
index e61590e..3b2ffe6 100755
--- a/tools/submission/submission_checker.py
+++ b/tools/submission/submission_checker.py
@@ -2230,6 +2230,7 @@ def get_power_metric(config, scenario_fixed, log_path, is_valid, res):
     power_list = []
     with open(spl_fname) as f:
         for line in f:
+            if not line.startswith("Time"): continue
             timestamp = (
                 datetime.datetime.strptime(line.split(",")[1], datetime_format)
                 + server_timezone

I suggest making a PR with this change, because we've already collected a number of logs with this issue, and their checksums include this line.
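The guard can also be exercised in isolation. This is a rough sketch rather than the checker's actual code: read_timestamps, DATETIME_FORMAT and the sample rows are assumptions for illustration, and the checker defines its own datetime_format.

```python
import io
from datetime import datetime, timezone

# Assumed timestamp layout for illustration only.
DATETIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f"


def read_timestamps(lines):
    """Parse the timestamp of every sample line and skip everything else."""
    timestamps = []
    for line in lines:
        # Same guard as in the patch: lines that do not start with "Time"
        # (including a trailing whitespace-only line) are ignored.
        if not line.startswith("Time"):
            continue
        stamp = datetime.strptime(line.split(",")[1], DATETIME_FORMAT)
        timestamps.append(stamp.replace(tzinfo=timezone.utc))
    return timestamps


# Hypothetical file content ending with a whitespace-only line.
sample = io.StringIO(
    "Time,01-25-2024 10:00:00.000,Watts,100.0\n"
    "Time,01-25-2024 10:00:01.500,Watts,101.0\n"
    " \n"
)
parsed = read_timestamps(sample)
print(len(parsed))
```

Filtering on the "Time" prefix, rather than stripping and testing for an empty line, also skips any other non-sample line, which is a slightly broader filter than the whitespace problem strictly needs.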

tools/submission/power/sources_checksums.json is 6 months old

Despite multiple changes to mlcommons/power-dev since the last round, the checksum file has not been updated since 2 August 2023.

@arjunsuresh
Contributor

@psyhtest Can you please open a PR for this in the power-dev repository? The sources checksum is updated here and can be copied to the inference repository. Since the change won't break any collected results, it's safe to merge.

@arjunsuresh
Contributor

This PR should automatically copy the checksum file to the inference repo from now on.

@psyhtest
Contributor Author

Thank you, @arjunsuresh. I've created PR #1584 for the workaround. @pgmpablo157321, ideally, it should be merged when the Submission UI opens for testing.

I will also open an issue in the power-dev repo to investigate how trailing whitespace lines have appeared in spl.txt files recently.

@arjunsuresh
Contributor

Thank you, @psyhtest. I believe this change introduced the extra newline. Unfortunately, no one else has tested a power submission for 4.0 so far.

@psyhtest
Contributor Author

As Seymour Cray famously said: "I hate to be a pioneer" :))
