Strategy Question for genome annotation #20

Open
sanyalab opened this issue Aug 9, 2021 · 4 comments

sanyalab commented Aug 9, 2021

Hello,

This is not an issue but a question about which strategy you would recommend.

I am building transcripts for genome annotation in plants. Which of the following two strategies works better with PsiCLASS?

  1. Align reads from all samples to the genome, then build transcripts with PsiCLASS from a single combined BAM file.
  2. Align reads from each sample to the genome separately, then build transcripts with PsiCLASS by providing the per-sample BAM files.

Thanks
Abhijit

Collaborator

mourisl commented Aug 9, 2021

Since different samples may have different sequencing depths and capture different biological events, using a directly combined BAM file may cause problems and lose some features. For example, one deeply sequenced sample can contribute a large number of alignments for one transcript, so that alternative transcripts supported by other samples are filtered out as noise.

I think approaches such as PsiCLASS, or treating samples independently and combining the results later, can resolve the normalization issue and have a better chance of capturing biological features that show up in only a few samples.
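
As a rough illustration of the depth problem described above, the following Python sketch compares pooled filtering with per-sample filtering on made-up numbers. The read counts, the isoform names, the 5% cutoff, and the `kept_isoforms` helper are all assumptions chosen for illustration; this is not how PsiCLASS itself scores or filters transcripts.

```python
from collections import Counter

# Hypothetical read support per isoform at a single locus, for three samples.
# One sample is sequenced much more deeply than the other two.
samples = {
    "deep_sample": {"isoform_A": 1000, "isoform_B": 0},
    "shallow_1": {"isoform_A": 5, "isoform_B": 20},
    "shallow_2": {"isoform_A": 5, "isoform_B": 20},
}

# Hypothetical noise cutoff (fraction of reads at the locus).
# This is an illustrative value, not a PsiCLASS parameter.
MIN_FRACTION = 0.05


def kept_isoforms(counts, min_fraction=MIN_FRACTION):
    """Return the isoforms whose share of the locus reads reaches the cutoff."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {iso for iso, n in counts.items() if n / total >= min_fraction}


# Strategy 1: pool all reads first, then filter once.
pooled = Counter()
for counts in samples.values():
    pooled.update(counts)  # adds the counts of each sample
pooled_kept = kept_isoforms(pooled)

# Strategy 2: filter each sample on its own, then take the union.
per_sample_kept = set()
for counts in samples.values():
    per_sample_kept |= kept_isoforms(counts)

print("pooled:    ", sorted(pooled_kept))
print("per sample:", sorted(per_sample_kept))
```

With these toy numbers, the pooled filter keeps only `isoform_A`, because the deep sample dominates the total, while the per-sample filter keeps both isoforms.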

@sanyalab
Author

Thank you @mourisl. That was very helpful and makes total sense. Since transcripts follow a probability distribution under any given condition, it is fair to expect more support for one transcript than for others. Combining the alignments from several samples into a single input will give a conservative estimate of the transcripts, possibly losing isoforms.

Thanks
Abhijit

@sanyalab
Author

Hi mourisl,

I have a question regarding PsiCLASS.

If I give the tool a splice-site file, like the one that HISAT2 generates, using the "-s" option, will PsiCLASS ONLY use those splice junctions? In other words, if the BAM data suggest novel or modified junctions, will the provided junction details be corrected in light of the BAM data?

Further, if novel junctions are found, will they also be retained?

Thanks
Abhijit

sanyalab reopened this Aug 16, 2021

Collaborator

mourisl commented Aug 16, 2021

When "-s" is given, PsiCLASS will ONLY use those splice junctions. More precisely, each sample will use only the intersection of its own splice junctions and the sites provided in the "-s" file. PsiCLASS will not correct the junctions from the "-s" file and will not use novel junctions. Note that the "-s" file covers splice sites only, so a novel intron that pairs two trusted splice sites can still be discovered.

The motivation for the "-s" file is that PsiCLASS uses a relatively simple method to infer the trusted splice junctions/sites across all the samples, which might not be good enough now that other approaches, such as JULiP (https://github.com/splicebox/JULiP), have been developed.
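
The "-s" behavior described in the comment above can be pictured with a small Python sketch. It assumes trusted splice sites are held in memory as (chromosome, position) pairs and that an intron is usable only when both of its ends are trusted. The coordinates, the data layout, and the `usable_introns` helper are hypothetical; the sketch does not parse a real "-s" file and is not PsiCLASS's actual implementation.

```python
# Hypothetical trusted splice sites, standing in for the contents of a "-s" file.
trusted_sites = {
    ("chr1", 100),
    ("chr1", 200),
    ("chr1", 300),
    ("chr1", 400),
}

# Hypothetical introns observed in one sample's alignments:
# (chromosome, start site, end site).
sample_introns = [
    ("chr1", 100, 200),  # both ends trusted
    ("chr1", 100, 400),  # novel pairing of two trusted sites
    ("chr1", 100, 250),  # end site 250 is not trusted
    ("chr1", 150, 300),  # start site 150 is not trusted
]


def usable_introns(introns, trusted):
    """Keep introns whose two splice sites are both in the trusted set."""
    return [
        (chrom, start, end)
        for chrom, start, end in introns
        if (chrom, start) in trusted and (chrom, end) in trusted
    ]


kept = usable_introns(sample_introns, trusted_sites)
for intron in kept:
    print(intron)
```

In this toy case the intron from 100 to 400 survives even though that pairing is new, because both of its sites are trusted, while the introns that touch the untrusted sites 250 and 150 are dropped.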
