Fasta parser and Multiple sequence alignments
Created on Thu Oct 19 22:11:07 2023
@author: MingHeng Hsiung
The entire project consists of two parts. The first part is a FASTA parser, which reads specific genetic data provided in the package and formats the output for the second part of the code. The second part performs multiple sequence alignment, aligning the sequences and calculating the score based on the parameters provided. This project can assist in conducting further phylogenetic analysis.
The Fasta parser This is a program that writes a parser that reads the GeneticData –1.txt file and outputs two fasta files, one for mtDNA and one for the Y chromosome. Usage: python FastaParser.py text_file output_fasta_file
The results will show as follows:
Multiple sequence alignments This is a program that 1. Reads an aligned fasta file. 2. Evaluate the alignment and output two measures of the alignment quality: the identity score and the alignment score of all possible pairs. 3. It also allows the user to determine the weight matrix.
The default values: “match score: 1; gap penalty: -1; transition: -1, transversion: -2”,the values commonly used in the alignment of closely related DNA sequences, are also the values we used in this analysis.
The Multiple sequence aligner is as follows and was written to run on the command line:
Usage: python MSA.py -f fasta_file(change the name according to the input data) -p weight_parameters(input from the user) -o output_file
Usage:
python MSA.py -f fasta_file(change the name according to the input data) -p weight_parameters(input from the user) -o output_file(don't forget to change name according to the infile')
The results will show as follows: