Requesting help with some issues with PsiCLASS #1
Comments
Thank you for the feedback and examples!
Thank you! |
Hello, Thank you for the fast reply! Here are my comments.
Let me know if you need any more files from my end. I will be happy to provide it to you. |
Thanks. |
Here is the link to the multiple sequence alignment of their corresponding protein sequences. And as expected, PsiCLASS does not report the gene from any of the 3 loci. Thank you! |
Even if the aligner reports one alignment as the primary alignment, it can have the same alignment quality as the secondary alignments. I think plant genomes are more repetitive than the human genome. Since the sequences of these genes are so similar to each other, it is difficult to infer which genes are really expressed, and PsiCLASS is a bit conservative here and reports none of them. In any case, I will add an option to retain the gene if most of the alignments in it are primary. |
Yes, I had STAR treat all alignments that share the highest score as primary, so my BAM file can contain several equal-scoring primary alignments for the same read. It would be great if you could add an option to output all such genes. Thanks! |
I was trying different settings and I found that eliminating the NH and HI from the STAR alignments helps to retain genes that were constructed from multi-mapping reads. So that solves one of the issues I was having! I just wanted to run it by you to make sure this "fix" isn't breaking anything. Thanks. |
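For anyone who wants to reproduce the workaround described above, here is a minimal sketch of dropping the NH and HI optional fields from SAM text records. The function name `strip_tags` and the example record are hypothetical; the thread does not say which tool was actually used to remove the tags, so this is only one way to do it.

```python
def strip_tags(sam_line, tags=("NH", "HI")):
    """Return a SAM record with the given optional tags removed."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through untouched
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept) + "\n"


# Made-up multi-mapping record, for illustration only.
record = (
    "r1\t256\tChr1\t100\t3\t50M\t*\t0\t0\t"
    + "A" * 50 + "\t" + "I" * 50
    + "\tNH:i:3\tHI:i:2\tAS:i:98\n"
)
cleaned = strip_tags(record)
```

To apply this to a BAM file, the records would first need to be converted to SAM text (for example with `samtools view -h`) and converted back afterwards.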
Yes, this "fix" is definitely fine. I just uploaded the code to add the option "--primaryParalog", so it will use primary alignment instead of the proportion of unique alignments to filter those paralog genes. This essentially is equivalent to what you did. I'm working on other issues. |
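To illustrate the idea of filtering on primary alignments, here is a rough sketch that computes the fraction of primary alignments among the reads in a gene from their SAM flags. This is not the PsiCLASS implementation: the function names, the 0.5 threshold, and the decision to look only at the secondary-alignment bit (0x100) are assumptions made for the example.

```python
SECONDARY = 0x100  # SAM flag bit marking a secondary alignment


def primary_fraction(flags):
    """Fraction of alignments whose secondary bit is not set."""
    if not flags:
        return 0.0
    primary = sum(1 for flag in flags if not flag & SECONDARY)
    return primary / len(flags)


def keep_gene(flags, threshold=0.5):
    """Keep a gene when more than `threshold` of its alignments are primary."""
    return primary_fraction(flags) > threshold


# Made-up SAM flags for the alignments overlapping two hypothetical genes.
mostly_primary = [0, 16, 256, 0]
mostly_secondary = [256, 272, 0]
```

With these inputs, `keep_gene(mostly_primary)` is true (3 of 4 alignments are primary) and `keep_gene(mostly_secondary)` is false (1 of 3).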
Thank you so much for taking care of these issues! |
What is the length of those falsely merged exons you saw? I think I can add some more tests before merging those exons of medium size. |
Hello, I looked at a few examples. For most of the cases, the length of the exons was not the problem. The main issue was with coverage which formed a trough. I have noticed very small exons being merged and also quite large ones. Hopefully, that helps. Thank you. |
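As an illustration of the kind of test that could catch these cases, here is a minimal sketch that looks for a trough in a per-base coverage profile. It is not taken from PsiCLASS; the function name `find_trough` and the `max_ratio` cutoff of 0.2 are assumptions chosen for the example.

```python
def find_trough(coverage, max_ratio=0.2):
    """Return the index of a coverage trough, or None if there is none.

    A trough is reported when the lowest point is at most `max_ratio`
    times the lower of the highest coverage values on either side of it.
    """
    if len(coverage) < 3:
        return None
    low = min(range(len(coverage)), key=coverage.__getitem__)
    left_peak = max(coverage[:low], default=0)
    right_peak = max(coverage[low + 1:], default=0)
    shoulder = min(left_peak, right_peak)
    if shoulder > 0 and coverage[low] <= max_ratio * shoulder:
        return low
    return None


# Made-up coverage profiles across a merged exon.
with_dip = [30, 32, 28, 3, 2, 25, 27]
fairly_flat = [10, 11, 10, 9, 10]
```

Because the check uses the lower of the two shoulders, it also fires when one neighbouring transcript has much lower coverage than the other, as long as the dip between them is deep enough.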
Hi @mourisl, Thanks for creating a conda package. I was wondering if you could modify PsiCLASS to include an option to extend the end exons. I am currently modifying the code and compiling it myself, which means I have to ship the whole package within my pipeline. It would be nice to have an option to generate full-length end exons instead of the default median. Thank you. |
Yes, I think I can add an option to specify which rank to determine the end of the exon, such as 0 for the shortest exon from all the samples, 0.5 uses median, and 1 for longest end exon. Is this what you need? |
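The rank idea can be sketched as follows. This is only an illustration working on end-exon lengths; the function name `length_by_quantile` and the nearest-rank rounding rule are assumptions, since the thread does not say how PsiCLASS handles ties or ranks that fall between two samples.

```python
def length_by_quantile(lengths, q):
    """Pick one end-exon length from per-sample lengths by rank q in [0, 1].

    q = 0 selects the shortest, q = 0.5 the median, q = 1 the longest.
    """
    if not lengths:
        raise ValueError("need at least one sample")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(lengths)
    # Nearest-rank selection (an assumption, not necessarily what PsiCLASS does).
    index = int(q * (len(ordered) - 1) + 0.5)
    return ordered[index]


# Made-up end-exon lengths (bp) observed in five samples.
sample_lengths = [120, 200, 150, 310, 180]
shortest = length_by_quantile(sample_lengths, 0.0)
median = length_by_quantile(sample_lengths, 0.5)
longest = length_by_quantile(sample_lengths, 1.0)
```

For these five samples the three calls select 120, 180, and 310 respectively.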
Exactly, That's what I need. |
Hi @mourisl, Did you get around to implementing this feature in the conda package? Thanks. |
Sorry for the late reply. I just updated the feature with the new option "--tssTesQuantile". The default value is 0.5, which corresponds to the median previously. I guess conda will automatically update some time? |
Hello @mourisl, Thank you for attending to this. I will check it out. I don't think conda will automatically update it. You will need to create a new version of the Thank you. |
Hello,
I am running PsiCLASS on 43 RNA-Seq samples from Arabidopsis thaliana. The whole run completes very fast and produces quite good results. I compared the assemblies from PsiCLASS with those I obtained from StringTie and Scallop. Though PsiCLASS performs better on a genome-wide scale, I noticed a few issues with some transcripts. I am highlighting each issue below. For each case, I have attached an example IGV screenshot of the locus. In the images, you will find 4 transcriptomic tracks. The first track is the TAIR10 annotation, representing the ground truth. The second track represents transcripts merged from StringTie and Scallop. The third track is the transcriptome generated from ONE of the 43 samples. The fourth track represents transcripts from the consensus of all 43 samples.
Most of the transcripts have the end exons trimmed (image attached). I know that some assemblers like StringTie have parameters to disable trimming. I was wondering if PsiCLASS offers any such parameters.
Some transcripts have low coverage but have spliced junctions. Those transcripts are not assembled. I have tried reducing the depth coverage cutoff to 0.5 but it did not help.
Transcripts located very closely on the same strand are being merged. Sometimes the two transcripts are of high coverage - separated by a region of low coverage. In other cases, one of the transcripts has a high coverage but the other one has low coverage. There are several examples of this kind in the assembly.
For some overlapping transcripts on opposite strands, the first exon of one transcript is exactly the same as the last exon of the other transcript. There is a noticeable dip in the coverage in that exon which could be used to correctly determine the extent of each exon.
There are a few loci where the transcript seems to have been abruptly truncated in spite of there being read coverage.
It would be great if you could help me with these cases.
Thank you.