{ai,bio}[foss/2023a] Geneformer v0.1.0-20241204, accelerate v0.33.0 #21994

Open
wants to merge 3 commits into
base: develop

PetrKralCZ
Collaborator

@PetrKralCZ PetrKralCZ commented Dec 9, 2024

(created using eb --new-pr)
resolves vscentrum/vsc-software-stack#472

github-actions bot commented Dec 9, 2024

Updated software accelerate-0.33.0-foss-2023a.eb

Diff against accelerate-0.33.0-foss-2023a-CUDA-12.1.1.eb

easybuild/easyconfigs/a/accelerate/accelerate-0.33.0-foss-2023a-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/a/accelerate/accelerate-0.33.0-foss-2023a-CUDA-12.1.1.eb b/easybuild/easyconfigs/a/accelerate/accelerate-0.33.0-foss-2023a.eb
index 7f6d595a7c..65b13726cb 100644
--- a/easybuild/easyconfigs/a/accelerate/accelerate-0.33.0-foss-2023a-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/a/accelerate/accelerate-0.33.0-foss-2023a.eb
@@ -2,7 +2,6 @@ easyblock = 'PythonBundle'
 
 name = 'accelerate'
 version = '0.33.0'
-versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://github.com/huggingface/accelerate'
 description = """A simple way to launch, train, and use PyTorch models on almost any device and 
@@ -15,8 +14,7 @@ dependencies = [
     ('Python', '3.11.3'),
     ('Python-bundle-PyPI', '2023.06'),
     ('SciPy-bundle', '2023.07'),
-    ('CUDA', '12.1.1', '', SYSTEM),
-    ('PyTorch-bundle', '2.1.2', versionsuffix),
+    ('PyTorch-bundle', '2.1.2'),
     ('PyYAML', '6.0'),
     ('Safetensors', '0.4.3'),
 ]

@PetrKralCZ
Collaborator Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@PetrKralCZ: Request for testing this PR well received on login1

PR test command 'EB_PR=21994 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21994 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14840

Test results coming soon (I hope)...

- notification for comment with ID 2530428431 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
cns2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/c958e80a5153f8c8ae4334e80b62da62 for a full test report.

@PetrKralCZ
Collaborator Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@PetrKralCZ: Request for testing this PR well received on login1

PR test command 'EB_PR=21994 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21994 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14846

Test results coming soon (I hope)...

- notification for comment with ID 2531124373 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/38182fa8ef4a885b2f35dde44435a911 for a full test report.

@PetrKralCZ
Collaborator Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@PetrKralCZ: Request for testing this PR well received on login1

PR test command 'EB_PR=21994 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21994 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14847

Test results coming soon (I hope)...

- notification for comment with ID 2531522556 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

boegelbot commented Dec 10, 2024

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/6736ae05c9a0f7bd9d54edd88ae5c8a6 for a full test report.

edit:

git-lfs filter-process: git-lfs: command not found

@PetrKralCZ PetrKralCZ marked this pull request as draft December 10, 2024 12:46
@PetrKralCZ
Collaborator Author

Test report by @PetrKralCZ
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4014.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8
See https://gist.github.com/PetrKralCZ/67c85ed2d7fc2d8dfd5fdaecc783b4f9 for a full test report.

@PetrKralCZ
Collaborator Author

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@PetrKralCZ: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21994 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21994 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5455

Test results coming soon (I hope)...

- notification for comment with ID 2548437996 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.19
See https://gist.github.com/boegelbot/bc3c99ca0e801de33f01a6fa488485c4 for a full test report.

@boegel
Member

boegel commented Dec 17, 2024

@PetrKralCZ Manual clone on generoso fails:

[boegelbot@login1 tmp]$ git clone https://huggingface.co/ctheodoris/Geneformer
Cloning into 'Geneformer'...
remote: Enumerating objects: 1013, done.
remote: Counting objects: 100% (845/845), done.
remote: Compressing objects: 100% (626/626), done.
remote: Total 1013 (delta 617), reused 214 (delta 214), pack-reused 168 (from 1)
Receiving objects: 100% (1013/1013), 6.34 MiB | 19.98 MiB/s, done.
Resolving deltas: 100% (617/617), done.
git-lfs filter-process: git-lfs: command not found
fatal: the remote end hung up unexpectedly
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

[boegelbot@login1 tmp]$ echo $?
128
[boegelbot@login1 tmp]$ git --version
git version 2.39.3

On jsc-zen3, works fine:

[boegelbot@jsczen3l1 tmp]$ git clone https://huggingface.co/ctheodoris/Geneformer
Cloning into 'Geneformer'...
remote: Enumerating objects: 1013, done.
remote: Counting objects: 100% (845/845), done.
remote: Compressing objects: 100% (626/626), done.
remote: Total 1013 (delta 618), reused 214 (delta 214), pack-reused 168 (from 1)
Receiving objects: 100% (1013/1013), 6.34 MiB | 27.06 MiB/s, done.
Resolving deltas: 100% (618/618), done.
[boegelbot@jsczen3l1 tmp]$ git --version
git version 2.43.5

Manual clone also fails on donphan (RHEL8):

$ git clone https://huggingface.co/ctheodoris/Geneformer
Cloning into 'Geneformer'...
remote: Enumerating objects: 1013, done.
remote: Counting objects: 100% (108/108), done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 1013 (delta 67), reused 32 (delta 32), pack-reused 905 (from 1)
Receiving objects: 100% (1013/1013), 5.68 MiB | 13.11 MiB/s, done.
Resolving deltas: 100% (624/624), done.
git-lfs filter-process: git-lfs: command not found
fatal: the remote end hung up unexpectedly
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
$ echo $?
128

I suspect the newer Git version in RHEL9 (like on jsc-zen3) has some kind of built-in support for git-lfs?

@PetrKralCZ PetrKralCZ marked this pull request as ready for review December 18, 2024 07:53
@boegel
Member

boegel commented Jan 7, 2025

@PetrKralCZ generoso has this:

$ cat ~/.gitconfig
[credential]
	helper = cache
[filter "lfs"]
	clean = git-lfs clean -- %f
	smudge = git-lfs smudge -- %f
	process = git-lfs filter-process
	required = true

If I remove the filter "lfs" part, then the manual clone works on generoso...
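
The observations so far suggest a simple rule of thumb: the checkout breaks when an lfs filter is configured in Git but the git-lfs binary is not on the PATH. A minimal Python sketch of that check follows, assuming the rule holds in general; the function names `lfs_filter_configured` and `diagnose` are hypothetical and not part of EasyBuild or Git.

```python
import shutil
import subprocess


def lfs_filter_configured():
    """Return True if any Git config scope sets filter.lfs.process."""
    if shutil.which("git") is None:
        return False
    result = subprocess.run(
        ["git", "config", "--get", "filter.lfs.process"],
        capture_output=True, text=True, check=False,
    )
    # `git config --get` exits non-zero when the key is not set
    return result.returncode == 0 and bool(result.stdout.strip())


def diagnose(filter_configured, lfs_binary):
    """Classify the setup (assumed rule, derived from the clones in this thread)."""
    if filter_configured and lfs_binary is None:
        # matches generoso: filter "lfs" in ~/.gitconfig, git-lfs not installed
        return "broken: lfs filter configured but git-lfs not found, checkout will fail"
    if filter_configured:
        return "ok: lfs filter configured and git-lfs found, real files get downloaded"
    # no filter: clone works, but LFS-tracked files stay small pointer files
    return "pointer-only: no lfs filter, LFS-tracked files are left as pointers"


if __name__ == "__main__":
    print(diagnose(lfs_filter_configured(), shutil.which("git-lfs")))
```

Running this on a build host before the clone step would show which of the three situations applies there.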

@boegel
Member

boegel commented Jan 7, 2025

@PetrKralCZ On donphan, the manual clone now works (without having lfs stuff in .gitconfig), but that may be because of a bug fix in Git:

$ git --version
git version 2.39.5

git-lfs really is needed: without it, model.safetensors is not the actual model file, only a small Git LFS pointer that tells git-lfs which file to download:

$ cat model.safetensors
version https://git-lfs.github.com/spec/v1
oid sha256:4365ba23e393fcfa0e65a94ac64a0983cd788bd23a8d4914f4ab66f85cfe043c
size 152012980
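
A sanity check could catch this case by testing whether a checked-out file is still an LFS pointer. The sketch below is one possible way to do that in Python, based only on the three-line pointer layout shown above; the helper names `parse_lfs_pointer` and `is_lfs_pointer_file` are hypothetical.

```python
LFS_SPEC_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(data):
    """Return {'oid': ..., 'size': ...} if data looks like a Git LFS pointer, else None."""
    if not data.startswith(LFS_SPEC_PREFIX):
        return None
    fields = {}
    for line in data.decode("utf-8", errors="replace").splitlines():
        # each pointer line is "<key> <value>"
        key, _, value = line.partition(" ")
        fields[key] = value
    if "oid" not in fields or not fields.get("size", "").isdigit():
        return None
    return {"oid": fields["oid"], "size": int(fields["size"])}


def is_lfs_pointer_file(path):
    """Check only the first 1024 bytes; pointer files are small text files."""
    with open(path, "rb") as handle:
        return parse_lfs_pointer(handle.read(1024)) is not None
```

For the file above, `parse_lfs_pointer` would report the sha256 oid and a size of 152012980 bytes, which can be compared with the size of the file on disk to tell whether the real model was downloaded.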
