-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a QC condition for phenotype_imputation.ipynb #1061
Conversation
rl3328
commented
Sep 17, 2024
- create a rm index showing less than 75% of PSI values >=0.05, combine with the other two conditions
- Keep a cleaner version that also works
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
I dont think this would work. 0.05 is a number that is very arbitrary and is scale dependent. In this case you are basically removing an event if >25% have small values such as 0.05 and less. That is too much data removed. We can talk about it when we meet how you should approach this kind of tasks. But for now, just remove anything that has >95% data equal to zero |
@rfeng2023 @rl3328 i just realized that the step to remove rare events ended up in the imputation code ... do we have the normalization step? I think it is conceptually clearer if this QC goes to that step. What do you think? |
@gaow I agree, in principle we should add those qc to normalization section, which would be clearer. While we have different molecular phenotype, which could have different normalization pipeline, so we need to that qc part to each normalization pipeline. And we are aiming to do imputation first then do normalization as we discussed. so I think it might be better to still put it here? |
then it is indeed more convenient to do it here but conceptually it is somewhat confusing. I would argue that conceptual clarity precedes that of engineering convenience. So I am still in favor of throwing in this line for every molecular phenotype at their own QC part. |
As we discussed today, we can keep it in imputation for now and move it in future, which is also posted as a ticket #1073 (comment) Another thing is, this filtering should not only be set in |
|