Does IGV support non amazon s3 buckets? #1636

Open
mrvollger opened this issue Jan 7, 2025 · 19 comments

@mrvollger

Hi Jim,

I have been trying to determine whether IGV can work with non-AWS S3 buckets. For example, the University of Washington hosts its own s3 endpoint (endpoint_url = https://s3.kopah.orci.washington.edu), on which we host a lot of data we want to view with IGV.

I can go into my .aws/credentials and .aws/config and set my defaults such that I can see into these buckets by default with the aws cli (or any other s3 cli):

$ aws s3 ls s3://stergachis/data/
                           PRE UDN/
                           PRE assemblies/
                           PRE bulk/
                           PRE iso-seq/

But when I try opening up IGV and viewing these buckets, nothing happens when I click the "Load from S3 bucket" menu item.

Any advice would be much appreciated! And if IGV doesn't support this it would be awesome if it could be added. I would think the change should be small, though I don't know much about the API for s3.

Thanks,
Mitchell
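
For readers less familiar with S3-compatible stores, the sketch below shows one way an `s3://` URI could map onto a custom endpoint such as the one above, using path-style addressing (endpoint, then bucket, then key). It is only an illustration in Java, IGV's language: `PathStyleUrlSketch` and `toPathStyle` are hypothetical names, not IGV or AWS SDK code, and it assumes the server accepts path-style requests and that keys need no percent-encoding.

```java
import java.net.URI;

// Illustrative helper, not IGV code: rewrites s3://bucket/key against a custom endpoint.
final class PathStyleUrlSketch {

    /**
     * Returns endpoint/bucket/key (path-style addressing).
     * Assumes the key is already URL-safe; no percent-encoding is applied.
     */
    static URI toPathStyle(URI endpoint, String s3Uri) {
        URI parsed = URI.create(s3Uri);
        String bucket = parsed.getRawAuthority();
        if (!"s3".equals(parsed.getScheme()) || bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("Expected s3://bucket/key, got: " + s3Uri);
        }
        String base = endpoint.toString().replaceAll("/+$", ""); // drop trailing slashes
        String path = parsed.getRawPath() == null ? "" : parsed.getRawPath();
        return URI.create(base + "/" + bucket + path);
    }

    public static void main(String[] args) {
        // Endpoint and bucket taken from the report above.
        URI endpoint = URI.create("https://s3.kopah.orci.washington.edu");
        System.out.println(toPathStyle(endpoint, "s3://stergachis/data/"));
    }
}
```

A real request would still need to be signed with the credentials from `.aws/credentials`; the sketch covers only the addressing.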

@jrobinso
Contributor

jrobinso commented Jan 7, 2025

I don't have enough information to help you debug your problem. For starters what version of IGV are you using?

@ohofmann

ohofmann commented Jan 7, 2025

And what is running that S3 server - MinIO or something similar? We have some on-prem S3-compatible object stores that we could test against if needed.

@jrobinso
Contributor

jrobinso commented Jan 7, 2025

@mrvollger If you could set up a test instance and give me credentials, I will look further, but first verify that you are using the latest IGV. There were S3 issues with some 2.18.x versions.

@mrvollger
Author

mrvollger commented Jan 7, 2025

Thanks for the reply!

And sorry, I wanted first to establish that this should work before troubleshooting, but it sounds like it should.

I downloaded the latest version of IGV (2.19.1, the macOS Apple build) yesterday for my test.

I am not sure about the details of what is running the s3 server. It is a service provided by UW, so I will inquire about it with them and get back to you. They also administer credentials, so I will request those to establish a test case.
https://hyak.uw.edu/docs/storage/kopah

Thanks,
Mitchell

@jrobinso
Contributor

jrobinso commented Jan 7, 2025

If you're able to create test credentials you can email them privately to [email protected]. I don't know what I will be able to determine but I will look.

There might also be information in your IGV log file (usually named igv0.log in /igv).

@mrvollger
Author

mrvollger commented Jan 8, 2025

Hi Jim,

I just sent an email with the credentials!

This is an example error log I get when I try doing this:

INFO [Jan 06,2025 17:26] [Main] Startup  IGV Version 2.19.1 12/04/2024 02:15 PM
INFO [Jan 06,2025 17:26] [Main] Java 21.0.5 (build 21.0.5+11-LTS) 2024-10-15
INFO [Jan 06,2025 17:26] [Main] Java Vendor: Eclipse Adoptium https://adoptium.net/
INFO [Jan 06,2025 17:26] [Main] JVM: OpenJDK 64-Bit Server VM Temurin-21.0.5+11
INFO [Jan 06,2025 17:26] [Main] OS: Mac OS X 15.1.1 aarch64
INFO [Jan 06,2025 17:26] [Main] IGV Directory: /Users/mrvollger/igv
INFO [Jan 06,2025 17:26] [Main] Resoluction scale = 0.0
INFO [Jan 06,2025 17:26] [OAuthUtils] Loading Google oAuth properties
INFO [Jan 06,2025 17:26] [CommandListener] Listening on port 60151
INFO [Jan 06,2025 17:26] [AmazonUtils] AWS default credentials found. AWS support enabled.
INFO [Jan 06,2025 17:26] [GenomeManager] Loading genome: /Users/mrvollger/igv/genomes/hg38.json
INFO [Jan 06,2025 17:26] [TrackLoader] Loading resource:  https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz
SEVERE [Jan 06,2025 17:26] [DefaultExceptionHandler] Unhandled exception
SEVERE [Jan 06,2025 17:26] [DefaultExceptionHandler] software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 1YD1BWPNB43A8ZBP, Extended Request ID: LFAY19Xg5oNJl0qlTAPQZdv/zeJz8QNNrltcBwNL7tCuO2RKed3+vOSSLAmzWPlMWxamM16RhbmXVjOLgucj4gYMm/9exK+dyTkPoF0T9xs=)
	at [email protected]/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at [email protected]/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at [email protected]/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at [email protected]/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at [email protected]/software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:93)
	at [email protected]/software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:279)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.executeRequest(RetryableStage2.java:93)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.execute(RetryableStage2.java:56)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.execute(RetryableStage2.java:36)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at [email protected]/software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
	at [email protected]/software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at [email protected]/software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at [email protected]/software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
	at [email protected]/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at [email protected]/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
	at [email protected]/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
	at [email protected]/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
	at [email protected]/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
	at [email protected]/software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at [email protected]/software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
	at [email protected]/software.amazon.awssdk.services.s3.DefaultS3Client.listBuckets(DefaultS3Client.java:6786)
	at org.igv/org.broad.igv.util.AmazonUtils.ListBucketsForUser(AmazonUtils.java:275)
	at org.igv/org.broad.igv.ui.IGVMenuBar.lambda$createAWSMenu$13(IGVMenuBar.java:1018)
	at java.desktop/javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
	at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
	at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
	at java.desktop/javax.swing.DefaultButtonModel.setPressed(Unknown Source)
	at java.desktop/javax.swing.AbstractButton.doClick(Unknown Source)
	at java.desktop/com.apple.laf.ScreenMenuItem.actionPerformed(Unknown Source)
	at java.desktop/java.awt.MenuItem.processActionEvent(Unknown Source)
	at java.desktop/java.awt.MenuItem.processEvent(Unknown Source)
	at java.desktop/java.awt.MenuComponent.dispatchEventImpl(Unknown Source)
	at java.desktop/java.awt.MenuComponent.dispatchEvent(Unknown Source)
	at java.desktop/java.awt.EventQueue.dispatchEventImpl(Unknown Source)
	at java.desktop/java.awt.EventQueue$4.run(Unknown Source)
	at java.desktop/java.awt.EventQueue$4.run(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.desktop/java.awt.EventQueue$5.run(Unknown Source)
	at java.desktop/java.awt.EventQueue$5.run(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.desktop/java.awt.EventQueue.dispatchEvent(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.run(Unknown Source)

Finally, our IT team shared that this s3 endpoint is hosted by a "Ceph RADOS Gateway"; hopefully, that is useful information, but I can always ask for more details if this does not help.

I really appreciate the help.

Thanks,
Mitchell

@brainstorm
Contributor

AFAICT, currently non-Amazon buckets shouldn't work since there's no provision for so-called "custom endpoints" in the current S3Client builder:

s3Client = S3Client.builder().credentialsProvider(s3CredsProvider).region(region).build();

The change could be relatively straightforward: use the .endpointOverride() builder method (conditionally, via oauth-config.json). See the following for an example:

aws/aws-sdk-java-v2#4996

BUT, as you can see in the issue above, this use case doesn't seem to be fully supported by the AWS SDK for Java 2.x, so care is needed to avoid the problematic code paths (i.e., avoid parsing non-S3 URIs) and/or to find suitable workarounds.

I'll leave it to Jim to put those bits together, but let me know if you need assistance, @jrobinso ;)
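
A minimal sketch of the "conditional" part of this suggestion, using only the Java standard library so it runs without the AWS SDK. The class `EndpointOverrideSketch` and method `resolve` are hypothetical names, not IGV code; where the configured value would come from (oauth-config.json, the AWS config file, or an IGV preference) is left open, as it is in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper, not part of IGV: decides whether a custom S3 endpoint should be applied.
final class EndpointOverrideSketch {

    /**
     * Returns the endpoint to override with, or empty to keep the SDK's default AWS endpoint.
     * Throws IllegalArgumentException if a value is configured but is not a usable http(s) URL.
     */
    static Optional<URI> resolve(String configured) {
        if (configured == null || configured.isBlank()) {
            return Optional.empty(); // nothing configured: plain AWS behaviour
        }
        URI uri;
        try {
            uri = new URI(configured.trim());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed endpoint: " + configured, e);
        }
        String scheme = uri.getScheme();
        boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        if (!http || uri.getHost() == null) {
            throw new IllegalArgumentException("Endpoint must be an http(s) URL: " + configured);
        }
        return Optional.of(uri);
    }

    public static void main(String[] args) {
        // With the AWS SDK for Java 2.x on the classpath, the result could feed the builder, e.g.
        //   resolve(value).ifPresent(builder::endpointOverride);
        System.out.println(resolve("https://s3.kopah.orci.washington.edu"));
        System.out.println(resolve(""));
    }
}
```

Returning an empty `Optional` when nothing is configured keeps the existing AWS-only behaviour untouched, so the override would only come into play for users who opt in.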

@jrobinso
Contributor

jrobinso commented Jan 8, 2025

Thanks @brainstorm, but they are not configuring oAuth. They are using .aws/credentials.

@brainstorm
Contributor

> Thanks @brainstorm, but they are not configuring oAuth. They are using .aws/credentials.

Oh, I didn't know the S3Client code paths became independent (imho they shouldn't), I will revisit the implementation... anyway, here's something that could help you as well, Jim:

https://stackoverflow.com/questions/52494196/is-there-any-way-to-specify-endpoint-url-in-aws-cli-config-file

@jrobinso
Contributor

jrobinso commented Jan 9, 2025

@mrvollger Thanks for the test credentials and file. No success yet. The link @brainstorm included above seemed somewhat promising: I tried setting the endpoint in the "config" file as suggested there, but no luck. I got an exception that "region" was not set. I'm not sure what the region should be for a non-AWS endpoint, but setting it to us-east-1 did not work either (not surprisingly). There are probably IGV bugs as well, or implicit assumptions that the provider is AWS. So this is going to take more time than I have this evening, or even in the next couple of days, but keep the issue open. I'm hopeful there will be a solution, but not certain. If you could ask the powers-that-be to leave those test credentials in place for a while, that would be helpful.

@brainstorm I'm not sure what you are referring to with "S3Client code paths became independent", but it's probably not relevant at the moment given the other issues I've found.
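
To make the two settings discussed above concrete, here is a simplified sketch that reads `endpoint_url` and `region` for one profile from AWS-config-style text and falls back to a placeholder region when none is set. Everything in it is an assumption for illustration: `AwsProfileSketch`, `parseProfile`, `regionOrPlaceholder`, and the profile name `example` are hypothetical, the parser ignores nested settings, and whether a placeholder region is actually enough for a non-AWS endpoint is exactly what remains unresolved in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch, not IGV code: reads top-level settings of one profile from AWS-config-style text.
final class AwsProfileSketch {

    /** Parses "key = value" lines of the named profile; nested (indented) settings are ignored. */
    static Map<String, String> parseProfile(String configText, String profile) {
        String wanted = profile.equals("default") ? "default" : "profile " + profile;
        Map<String, String> settings = new LinkedHashMap<>();
        boolean inWanted = false;
        for (String raw : configText.split("\\R")) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) continue;
            if (line.startsWith("[") && line.endsWith("]")) {
                // Section header: track whether we are inside the requested profile.
                inWanted = line.substring(1, line.length() - 1).strip().equals(wanted);
                continue;
            }
            boolean indented = Character.isWhitespace(raw.charAt(0));
            int eq = line.indexOf('=');
            if (inWanted && !indented && eq > 0) {
                settings.put(line.substring(0, eq).strip(), line.substring(eq + 1).strip());
            }
        }
        return settings;
    }

    /** Region to pass on: the configured one, else a placeholder (assumption: the gateway ignores it). */
    static String regionOrPlaceholder(Map<String, String> settings) {
        return settings.getOrDefault("region", "us-east-1");
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "[default]",
                "region = us-west-2",
                "",
                "[profile example]",
                "endpoint_url = https://s3.kopah.orci.washington.edu");
        Map<String, String> p = parseProfile(config, "example");
        System.out.println(p.get("endpoint_url") + " region=" + regionOrPlaceholder(p));
    }
}
```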

@jrobinso self-assigned this Jan 9, 2025
@jrobinso added this to the 2.19.2 milestone Jan 9, 2025
@jrobinso
Contributor

jrobinso commented Jan 9, 2025

@mrvollger One more question: it might be relevant, or at least helpful, to know what S3-compatible server you are running (e.g. MinIO).

Putting this here as possibly relevant, mostly for myself when I return to this: https://stackoverflow.com/questions/76780500/how-to-using-aws-s3-java-v2-sdk-to-talk-to-s3-compatible-storage-minio

@mrvollger
Author

Thanks so much for helping! We are really glad that you are willing to put time into this.

We will keep the credentials valid as long as you need them, and if something happens, we can get another set up.

I think when using a custom endpoint, the region doesn't matter or isn't used. But I am only guessing that based on this AWS CLI test:

[07:34:35 AM]➜  aws s3 ls  s3://userprod/web/private/hashed.PacBio-Fiber-seq/PS00272/d194a9281237ccb68a548092fb434216/hg38/phased/PS00272.phased.bam --profile k_stergachislab --region 'us-west-2'
2025-01-01 07:47:48 93166395386 PS00272.phased.bam
2025-01-01 07:40:15   23788032 PS00272.phased.bam.bai
mvollger in n3459 in fire-figures on  main [!?]
[07:34:45 AM]➜  aws s3 ls  s3://userprod/web/private/hashed.PacBio-Fiber-seq/PS00272/d194a9281237ccb68a548092fb434216/hg38/phased/PS00272.phased.bam --profile k_stergachislab --region 'us-east-2'
2025-01-01 07:47:48 93166395386 PS00272.phased.bam
2025-01-01 07:40:15   23788032 PS00272.phased.bam.bai
mvollger in n3459 in fire-figures on  main [!?]
[07:34:49 AM]➜  aws s3 ls  s3://userprod/web/private/hashed.PacBio-Fiber-seq/PS00272/d194a9281237ccb68a548092fb434216/hg38/phased/PS00272.phased.bam --profile k_stergachislab --region 'us-east-1'
2025-01-01 07:47:48 93166395386 PS00272.phased.bam
2025-01-01 07:40:15   23788032 PS00272.phased.bam.bai

Last time, they told us the underlying vendor was Ceph RADOS Gateway, but I will ask more specifically what server is running the s3 instance and get back to you.

Thanks again!

@mrvollger
Author

CCing @sjneph.

@jrobinso
Contributor

jrobinso commented Jan 9, 2025

@mrvollger A workaround for the interim, if it's allowed there, would be to create signed URLs with the AWS command line tools. The signed URLs should work.

@mrvollger
Author

Thanks for the suggestion! We have been doing that for things in a real-time crunch, but we have a couple of applications where we cannot do that since the URLs are posted publicly. (I also hate it when the links in my painfully constructed IGV sessions (XMLs) die.)

We are still waiting for our IT to share details about what runs the s3 server.

@mrvollger
Author

https://stackoverflow.com/questions/68005239/how-do-you-configure-the-endpoint-for-amazon-s3-by-using-the-aws-sdk-v2

Another potentially relevant Stack Overflow question, though it looks like it covers material already in the other two linked here.

@jrobinso
Contributor

@mrvollger Thanks, that looks like potentially new info, and from a vendor (Cloudflare). Amazon is not interested in fixing bugs on this or making this easy, understandably. I'm in the middle of a block of work I must finish before concentrating on this, hopefully next week.

Feel free to continue posting links here.

@jrobinso
Contributor

BTW it would be really helpful, possibly essential, to know what backend you are running. Cloudflare has some instructions here, but I won't waste my time with this if you are running something else. https://developers.cloudflare.com/r2/examples/aws/aws-sdk-java/

@mrvollger
Author

mrvollger commented Jan 11, 2025

We have a Ceph RADOS Gateway, which I believe is built on a library called librados (https://docs.ceph.com/en/latest/radosgw/), but I am still waiting to get official confirmation...

If I am right about this, I just found some docs that look relevant:
https://docs.ceph.com/en/latest/radosgw/s3/java/

UW also has some docs on programmatically accessing data on kopah (our S3 server), but they focus on boto3, the AWS SDK for Python:
https://hyak.uw.edu/docs/storage/boto3

I will send a reminder email to make sure they follow up with me.

Thanks for the help!
