Error: Failed to pull data from the cloud: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied #2

Open
mkoohim opened this issue Jul 12, 2021 · 3 comments


mkoohim commented Jul 12, 2021

Hello,
I tried to pull the DVC file. I configured AWS with my own access key ID and secret access key. I also set an S3 policy that allows s3:ListBucket, but I still get the following error when I run dvc pull:

Error: Failed to pull data from the cloud: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
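For context, the ListObjectsV2 call is authorized by the s3:ListBucket action on the bucket ARN itself, while downloading objects additionally needs s3:GetObject on the object ARNs. The sketch below builds such a read-only policy document. The bucket name `example-bucket` and the helper `read_only_policy` are hypothetical and are not taken from this repository. A policy attached to your own AWS identity does not by itself grant access to a bucket owned by another account; the bucket owner has to allow it as well.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows listing and reading one bucket.

    `bucket` is a placeholder name chosen for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListObjectsV2 is checked against s3:ListBucket on the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Downloading objects is checked against s3:GetObject on object ARNs.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = read_only_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

A common cause of this error is attaching s3:ListBucket to the object ARN (`.../*`) instead of the bucket ARN, so it may be worth checking which resource each action is attached to.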

I think you need to change some settings on your S3 bucket to make it publicly accessible. Please see the following video for more information:
https://www.youtube.com/watch?v=_dOBPpeBAxs

Could you please check and advise how we can solve the problem?

Regards,
Mohamad

prihoda (Contributor) commented Jul 12, 2021

Hi @mkoohim, the DVC files cannot be used to download the actual data; they are only here to document the commands used. If you want to retrain DeepBGC on new BGC data, please refer to the DeepBGC repository: https://github.com/Merck/deepbgc#train-deepbgc-on-your-own-data

Training and validation data can be downloaded from release 0.1.0 and release 0.1.5.

mkoohim (Author) commented Jul 12, 2021

Hello. Thanks for your reply.
May I ask if there is any way to access the corpus data?
https://github.com/Merck/bgc-pipeline/tree/main/data/bacteria/corpus

I couldn't find it in the previous versions.

Thanks,

prihoda (Contributor) commented Jul 12, 2021

Hi @mkoohim, I added the missing file to the 0.1.0 release: https://github.com/Merck/deepbgc/releases/tag/v0.1.0

Please keep in mind that its domains were detected using Pfam 31.0; the current Pfam version is 34.0. To generate an updated corpus, you would have to run HMMSCAN using Pfam 34.0 on thousands of genomes, which would be very computationally intensive. We might do it at some point in the future, but not in the near term.
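As a rough illustration of what re-annotating a single genome would involve, the sketch below only assembles an `hmmscan` command line and does not run it. It assumes HMMER is installed and that a Pfam 34.0 `Pfam-A.hmm` file has already been indexed with `hmmpress`. The file names and the helper `hmmscan_command` are hypothetical, and this is not necessarily the exact invocation used to build the original corpus.

```python
import shlex
from pathlib import Path


def hmmscan_command(pfam_hmm, proteins_fasta, domtbl_out, cpus=4):
    """Assemble an hmmscan invocation: hmmscan [options] <hmmdb> <seqfile>."""
    return [
        "hmmscan",
        "--cpu", str(cpus),              # number of worker threads
        "--domtblout", str(domtbl_out),  # per-domain hits as a parseable table
        str(pfam_hmm),                   # Pfam HMM database prepared with hmmpress
        str(proteins_fasta),             # protein sequences of one genome
    ]


# All paths below are placeholders chosen for illustration.
cmd = hmmscan_command(
    Path("Pfam-A.hmm"),
    Path("genome.proteins.fa"),
    Path("genome.pfam.domtbl"),
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once per genome, which is where the cost mentioned above comes from when repeated over thousands of genomes.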
