Misc cleanup for HELM Capabilities #3274

Merged: 6 commits, Jan 15, 2025

Conversation

yifanmai (Collaborator) commented Jan 15, 2025

  • Switch aggregation to mean
  • Use rescaled rather than raw wildbench score
  • Move MMLU-Pro from lite_run_specs.py to capabilities_run_specs.py
  • Clean up run spec names
  • Change some default arguments
  • Add revisions for Hugging Face datasets
  • Remove unused parameter in wildbench
  • Switch BigCodeBench version from v0.1.2 to v0.1.3
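
The first two items (mean aggregation, rescaled WildBench score) can be illustrated with a minimal sketch. This is not the code from the pull request: the 1 to 10 judge scale, the linear mapping, and the names `rescale`, `aggregate_mean`, `RAW_MIN`, and `RAW_MAX` are assumptions made for illustration only. The idea is that a raw judge score has to be brought onto the same 0 to 1 range as the other metrics before an unweighted mean across scenarios is meaningful.

```python
from statistics import mean
from typing import Mapping

# Assumption: the raw judge score lies on a 1-10 scale (illustrative bounds).
RAW_MIN, RAW_MAX = 1.0, 10.0


def rescale(raw_score: float) -> float:
    """Map a raw judge score linearly onto [0, 1]."""
    return (raw_score - RAW_MIN) / (RAW_MAX - RAW_MIN)


def aggregate_mean(scores_by_scenario: Mapping[str, float]) -> float:
    """Aggregate per-scenario scores with an unweighted mean."""
    return mean(list(scores_by_scenario.values()))


# Hypothetical per-scenario scores; only the WildBench value needs rescaling.
scores = {
    "mmlu_pro": 0.70,
    "wildbench": rescale(7.75),  # 0.75 after rescaling
}
print(aggregate_mean(scores))
```

Without the rescaling step, a raw score such as 7.75 would dominate a mean taken alongside metrics that are bounded by 1.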

@@ -103,6 +108,11 @@ metrics:
short_display_name: WB Score
description: Score of the AI output judged by GPT-4o.
lower_is_better: false
- name: wildbench_score
Contributor (review comment):

should this be _rescaled?

Collaborator, Author (reply):

Yes, thanks for catching that. This pull request was missing a commit; it should be fixed now.

@yifanmai yifanmai merged commit 6d70e98 into main Jan 15, 2025
8 checks passed
@yifanmai yifanmai deleted the yifanmai/fix-capabilities-cleanup branch January 15, 2025 06:16