Add tutorial on relationship matrices #1072
This looks like a great addition.
I read through the text and found a few minor typos:
"It is also common to use a 0 to indicate an unknown parent pedigree files." Change to "an unknown parent in pedigree files".
"to ensure they are note mistaken": should be "not".
"dimensions named "samples" and "parent"": should be "parents".
"Using out pedigree dataset": should be "our".
Thanks!
Where is the data being downloaded from and how big is it? In the GWAS tutorial we download data from Google Cloud; perhaps we could do the same here?
@tomwhite I was pulling the data from PMC supplementary material (not ideal). I can cut the data down to a pair of files of 1.1 MB and 0.3 MB. Should we upload these to the same bucket as the GWAS example?
That sounds like a good idea. I just checked and I don't see the GCS bucket in my GCP account, which is odd. @hammer did you create the gs://sgkit-gwas-tutorial bucket for #463?
Maybe? Let me check!
I do not see that bucket on my personal or Hammer Lab account. I may have created it with my Related Sciences account but I can no longer access that. @eric-czech do you happen to see that bucket on any RS-owned accounts?
Might have been me too but either way, I'd rather use the project and bucket we have specifically for sgkit. I added you all (@tomwhite, @jeromekelleher, @hammer, @timothymillar) as storage object admins for https://console.cloud.google.com/storage/browser/sgkit-data. There is a @timothymillar can you upload this to
Thanks @eric-czech, I was able to create the Edit: Ahh, it's because the whole bucket is set up as "requester pays" which I think means it won't work for tutorials anyway.
Ok, I disabled that. Try again? Everybody I mentioned above should have full control over the bucket now.
Thanks, I've uploaded the data now. The only remaining issue is that the bucket grants read access to "allAuthenticatedUsers" rather than "allUsers":
This means a user can't just grab the data with the URL only. I should be able to make this change but thought I should check with whoever is paying the bills! This would also apply to the validation data which is a bit more substantial. This also outdates some of the documentation
Ah, didn't realize that. It's "allUsers" now so there should be no more restrictions.
Good catch! That last line could be removed and it would be better if that path to the file was
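For anyone following along, the practical difference between "allAuthenticatedUsers" and "allUsers" is that only the latter allows an anonymous download over plain HTTPS. Below is a minimal sketch of what that looks like, assuming the standard `https://storage.googleapis.com/<bucket>/<object>` URL form for public objects. The bucket name `sgkit-data` comes from this thread; the helper names and the object path in the usage note are hypothetical and do not refer to the files uploaded for this PR.

```python
from urllib.parse import quote
from urllib.request import urlopen


def public_gcs_url(bucket, object_path):
    """Build the anonymous HTTPS URL for an object in a public GCS bucket.

    Assumes the object is readable by "allUsers"; no credentials are used.
    """
    # quote() leaves "/" intact, so nested object paths keep their structure.
    return f"https://storage.googleapis.com/{quote(bucket)}/{quote(object_path)}"


def fetch_public_object(bucket, object_path, timeout=30):
    """Download a public object and return its raw bytes.

    This fails with an HTTP error if the bucket only grants access to
    authenticated users or is configured as "requester pays".
    """
    with urlopen(public_gcs_url(bucket, object_path), timeout=timeout) as response:
        return response.read()
```

Usage would be something like `fetch_public_object("sgkit-data", "some/hypothetical/file.csv")`, with the real object path substituted in.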
Codecov Report
@@ Coverage Diff @@
## main #1072 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 49 50 +1
Lines 4990 5023 +33
=========================================
+ Hits 4990 5023 +33
see 4 files with indirect coverage changes
I think this is ready to go unless there are any suggestions? I imagine it'll get updated as relevant features are added.
It's still a bit rough, but I think the key parts are there. This covers working with pedigree data, calculating relationship matrices from pedigree and molecular data, and combining them into a hybrid relationship matrix. It also showcases the flexibility of having sgkit based on xarray.
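To give a rough idea of what a pedigree-based relationship matrix is, here is a minimal NumPy sketch of the textbook tabular method for the additive relationship matrix. This is not the sgkit implementation and does not use the sgkit API; the function name, the use of `-1` for an unknown parent, and the requirement that parents are listed before their offspring are all assumptions made for illustration.

```python
import numpy as np


def additive_relationship(parents):
    """Additive relationship matrix via the textbook tabular method.

    parents: integer array of shape (n_samples, 2) holding the row index of
    each parent, with -1 for an unknown parent (an assumption of this sketch).
    Parents must appear before their offspring.
    """
    parents = np.asarray(parents)
    n = len(parents)
    A = np.zeros((n, n))
    for i in range(n):
        known = [p for p in parents[i] if p >= 0]
        assert all(p < i for p in known), "parents must precede offspring"
        # Relationship to each earlier individual j is the mean of j's
        # relationships to i's two parents (unknown parents contribute zero).
        for j in range(i):
            A[i, j] = A[j, i] = sum(A[j, p] for p in known) / 2
        # Self-relationship is 1 plus the inbreeding coefficient, which is
        # half the relationship between the two parents when both are known.
        first, second = parents[i]
        inbreeding = A[first, second] / 2 if first >= 0 and second >= 0 else 0.0
        A[i, i] = 1.0 + inbreeding
    return A


# Two unrelated founders, two full siblings, and a child of the siblings.
pedigree = [[-1, -1], [-1, -1], [0, 1], [0, 1], [2, 3]]
A = additive_relationship(pedigree)
```

In this example the full siblings have a relationship of 0.5, and the child of the sibling mating has a self-relationship of 1.25, reflecting an inbreeding coefficient of 0.25.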