This project is inspired by and builds upon Maria Nattestad's project: https://youtu.be/-PCKK_nwFdA
The limited representation of Asian populations in genomic datasets hinders the study of genetic variation, population structure, and disease manifestation in these groups. This project visualizes genotypic variation across Asian population groups by applying PCA and t-SNE to single-nucleotide variant data (phased VCF) for chromosome 22 from the 1000 Genomes Project (http://www.internationalgenome.org).
Link to the Colab notebook: https://colab.research.google.com/drive/1JUnItMIRTAJtMbbkLRTFJe-EbUOWOMgK?usp=sharing
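As a rough illustration of the approach (not the notebook's actual code), the sketch below turns phased VCF genotype calls into a samples-by-variants matrix of alternate-allele counts and projects it onto its first two principal components. The function names `genotype_matrix` and `pca`, the sample IDs, and the toy records are all hypothetical; the sketch also assumes biallelic sites only and counts missing calls as zero alternate alleles, which a real analysis may want to handle differently.

```python
import numpy as np

def genotype_matrix(vcf_lines):
    """Return (sample_ids, matrix), where matrix[i, j] is the number of
    alternate alleles (0, 1, or 2) that sample i carries at variant j."""
    samples, rows = [], []
    for line in vcf_lines:
        if line.startswith("##"):          # meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):      # header line: sample IDs start at column 10
            samples = fields[9:]
            continue
        if "," in fields[4]:               # assumption: skip multiallelic sites
            continue
        # GT is the first subfield of each sample column; phased alleles are split by "|".
        # Missing alleles (".") are counted as 0 here, which is a simplification.
        rows.append([sum(allele == "1" for allele in call.split(":")[0].split("|"))
                     for call in fields[9:]])
    return samples, np.array(rows, dtype=float).T

def pca(matrix, n_components=2):
    """Project the rows of matrix onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy input in VCF layout (hypothetical samples and positions, not 1000 Genomes data).
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4",
    "22\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\t1|0",
    "22\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|0\t1|1\t0|1",
    "22\t300\t.\tG\tA,C\t.\tPASS\t.\tGT\t0|1\t2|0\t0|0\t1|1",  # multiallelic, skipped
    "22\t400\t.\tT\tC\t.\tPASS\t.\tGT\t1|1\t1|0\t0|0\t0|0",
]

samples, matrix = genotype_matrix(toy_vcf)
coords = pca(matrix)
for sample_id, (pc1, pc2) in zip(samples, coords):
    print(f"{sample_id}\tPC1={pc1:.3f}\tPC2={pc2:.3f}")
```

For the t-SNE step, one option would be to pass the leading principal components to `sklearn.manifold.TSNE`; for a full chromosome, a dedicated VCF reader and variant filtering would likely be needed before either step.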
References:
- GenomeAsia100K Consortium (2019). The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature, 576(7785), 106-111. doi: 10.1038/s41586-019-1793-z.
- The 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74. doi: 10.1038/nature15393.
- McVean, G. (2009). A genealogical interpretation of principal components analysis. PLoS Genetics, 5(10), e1000686. doi: 10.1371/journal.pgen.1000686.