
Use case with microhaplotypes? #39

Open
btmartin721 opened this issue Nov 7, 2023 · 2 comments

@btmartin721

Hi,

I was wondering what steps might be needed to use Locator with GTSeq microhaplotypes. The genotypes would basically be haplotype numbers, and there could be multiple haplotypes per locus (e.g., 4 or 5), so I wasn't sure how that might affect the model's learning.

Thanks!

-Bradley

@cjbattey
Collaborator

cjbattey commented Nov 7, 2023

Hi Bradley,

Unfortunately, locator's model expects biallelic data only, so it can't handle multiallelic sites. If the haplotypes are generated from sequencing or genotyping data, you could enter the individual variant-level data instead (though this will discard any phase information).
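To illustrate the variant-level route, here is a minimal sketch that collapses phased microhaplotype calls back into per-SNP alternate-allele counts (0, 1 or 2). It assumes each locus is described by its reference bases at the variable positions and that every SNP inside the haplotype is itself biallelic; the function name and input layout are hypothetical, not part of locator. Note how phase is lost: only counts per position survive.

```python
def haplotypes_to_snp_dosages(ref_alleles, hap_pairs):
    """Return alt-allele counts (0, 1 or 2), one per SNP position.

    Hypothetical helper, not part of locator.
    ref_alleles: reference bases at the SNP positions, e.g. "ACT".
    hap_pairs: (hap1, hap2) strings with the same length as ref_alleles.
    Any base differing from the reference counts as the alternate allele,
    which assumes each SNP is biallelic. Phase is discarded.
    """
    hap1, hap2 = hap_pairs
    if not (len(hap1) == len(hap2) == len(ref_alleles)):
        raise ValueError("haplotypes and reference must have equal length")
    return [
        int(a != ref) + int(b != ref)
        for ref, a, b in zip(ref_alleles, hap1, hap2)
    ]


# Example: one locus with three SNPs, two diploid samples.
ref = "ACT"
samples = {"ind1": ("ACT", "GCT"), "ind2": ("GTT", "GCA")}
for name, pair in samples.items():
    print(name, haplotypes_to_snp_dosages(ref, pair))
```

Here `ind1` gives `[1, 0, 0]` and `ind2` gives `[2, 1, 1]`; the per-SNP columns from all loci would then be concatenated into one biallelic genotype table.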

For haplotypes with only 2 alleles, you could encode the data as a matrix with samples on rows, haplotypes on columns, and entries giving the count of the minor allele (the less common haplotype), using the "--matrix" input option. But if most haplotypes are multiallelic, that will throw away much of the information as well.
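As a rough sketch of that encoding, the snippet below builds a samples-by-loci table of minor-haplotype counts. The nested-dict genotype format with integer haplotype IDs is a hypothetical assumption, missing data is not handled, and the exact file layout `--matrix` expects should be checked against locator's own documentation before writing the table to disk.

```python
from collections import Counter


def minor_allele_count_matrix(genotypes, loci):
    """Build a samples x loci matrix of minor-haplotype counts.

    Hypothetical helper, not part of locator.
    genotypes: dict sample ID -> dict locus -> (hap_a, hap_b), integer IDs.
    loci: locus names to keep; each must have exactly two haplotypes.
    Returns (sample_ids, matrix) with one row per sample.
    """
    sample_ids = sorted(genotypes)
    minor = {}
    for locus in loci:
        counts = Counter()
        for sid in sample_ids:
            counts.update(genotypes[sid][locus])  # count both haplotypes
        if len(counts) != 2:
            raise ValueError(f"{locus} is not biallelic: {dict(counts)}")
        # Minor allele = less frequent haplotype; ties broken by smaller ID.
        minor[locus] = min(counts, key=lambda h: (counts[h], h))
    matrix = [
        [genotypes[sid][locus].count(minor[locus]) for locus in loci]
        for sid in sample_ids
    ]
    return sample_ids, matrix


genos = {
    "ind1": {"loc1": (1, 1), "loc2": (1, 2)},
    "ind2": {"loc1": (1, 2), "loc2": (2, 2)},
    "ind3": {"loc1": (1, 1), "loc2": (1, 2)},
}
ids, mat = minor_allele_count_matrix(genos, ["loc1", "loc2"])
for sid, row in zip(ids, mat):
    print(sid, row)
```

Loci with more than two haplotypes raise an error here, which is exactly the information loss described above: they would have to be dropped or handled differently.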

It's definitely doable to run a locator-like model on haplotype data though -- this just isn't the implementation for it.

CJ

@btmartin721
Author

Ok that makes sense. Thanks for your input.

If I were to modify the Locator code to accommodate microhaplotypes, what steps would generally be involved? I'm asking because I need this use case and am considering modifying the Locator code to do so.

Is it generally just a matter of modifying the input matrix and the model architecture to accommodate multiple haplotypes? There may of course be a lot of little things to change, but what I'm asking is: given the Locator code, is this feasible, or would it be easier to create a whole new model/codebase?
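One possible way to handle the input side, offered as an assumption rather than anything locator does today, is a per-allele count (one-hot dosage) encoding: a locus with k observed haplotypes becomes k columns, each holding 0, 1 or 2 copies. A dense Locator-like network would then simply see a wider input vector; whether it trains as well as on SNP data is untested. The function and data format below are hypothetical.

```python
def allele_count_encoding(genotypes, loci):
    """Encode multiallelic haplotype genotypes as per-allele count columns.

    Hypothetical helper, not part of locator.
    genotypes: dict sample ID -> dict locus -> (hap_a, hap_b), integer IDs.
    Each (locus, haplotype) pair observed in the data becomes one column
    holding 0, 1 or 2 copies. Returns (sample_ids, columns, matrix).
    """
    sample_ids = sorted(genotypes)
    columns = []
    for locus in loci:
        # All haplotype IDs seen at this locus across samples, in ID order.
        observed = sorted({h for sid in sample_ids for h in genotypes[sid][locus]})
        columns.extend((locus, h) for h in observed)
    matrix = [
        [genotypes[sid][locus].count(h) for locus, h in columns]
        for sid in sample_ids
    ]
    return sample_ids, columns, matrix


genos = {
    "ind1": {"loc1": (1, 3)},
    "ind2": {"loc1": (2, 2)},
    "ind3": {"loc1": (3, 4)},
}
ids, cols, mat = allele_count_encoding(genos, ["loc1"])
print(cols)
for sid, row in zip(ids, mat):
    print(sid, row)
```

Each row sums to 2 per locus for diploids, so one column per locus is redundant and could be dropped; keeping all k columns is simpler and keeps every haplotype symmetric.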

Thanks for your time.


2 participants