The result of NA12878 and HX1 #35

Open
Smart-zhi opened this issue Jun 30, 2020 · 13 comments

@Smart-zhi

Hello Liu:
I've been working on analyzing CpG methylation in the human genome. I tried to run DeepMod on the NA12878 and HX1 nanopore data myself, but I can't get the expected results.
Would it be possible for you to send me the DeepMod results for NA12878 and HX1?

@liuqianhn
Collaborator

@Smart-zhi Thanks for your interest in DeepMod. I have received your message and am working on it. In the meantime, it would be great if you could share the commands you ran and the steps you took, so that I can try to reproduce the results in case there is an issue.

@Smart-zhi
Author

Thank you.
Since the NA12878 data set is relatively large, I divided it into several groups and, for each group, ran Albacore followed by DeepMod.py detect.

read_fast5_basecaller.py -i raw/ -r -t 20 -s na12878.${i}.albacore/ -f FLO-MIN106 -k SQK-LSK108 -o fast5 

python ${DeepMod}/bin/DeepMod.py detect --wrkBase na12878.${i}.albacore/workspace/pass --Ref ${ref} --FileID Notts_group1 --modfile ${DeepMod}/train_mod/rnn_conmodC_P100wd21_f7ne1u0_4/mod_train_conmodC_P100wd21_f3ne1u0 --threads 20 --outFolder ${out_folder}

This produced the following in ${out_folder}:

Bham_group0               Norwich_group1_1         UBC_group1_2
Bham_group0.done          Norwich_group1_1.done    UBC_group1_2.done
Bham_group1_1             Norwich_group2           UBC_group1_3
Bham_group1_1.done        Norwich_group2.done      UBC_group1_3.done
Bham_group1_2             Notts_group0             UBC_group1_4
Bham_group1_2.done        Notts_group0.done        UBC_group1_4.done
Bham_group1_3             Notts_group1             UBC_group2
Bham_group1_3.done        Notts_group1_1           UBC_group2.done
Bham_group2               Notts_group1_1.done      UCSC_group0
Bham_group2.done          Notts_group1.done        UCSC_group0.done
Norwich_group0            UBC_group1_1             
Norwich_group0.done       UBC_group1_1.done        

Then,

python ${DeepMod}/tools/sum_chr_mod.py ${out_folder}/ C na12878_C

I got

na12878_C.chr10.C.bed  na12878_C.chr14.C.bed  na12878_C.chr18.C.bed  na12878_C.chr21.C.bed  na12878_C.chr4.C.bed  na12878_C.chr8.C.bed  na12878_C.chrY.C.bed
na12878_C.chr11.C.bed  na12878_C.chr15.C.bed  na12878_C.chr19.C.bed  na12878_C.chr22.C.bed  na12878_C.chr5.C.bed  na12878_C.chr9.C.bed
na12878_C.chr12.C.bed  na12878_C.chr16.C.bed  na12878_C.chr1.C.bed   na12878_C.chr2.C.bed   na12878_C.chr6.C.bed  na12878_C.chrM.C.bed
na12878_C.chr13.C.bed  na12878_C.chr17.C.bed  na12878_C.chr20.C.bed  na12878_C.chr3.C.bed   na12878_C.chr7.C.bed  na12878_C.chrX.C.bed

I ran

python ${DeepMod}/tools/generate_motif_pos.py ${ref} ${genome_motif}/C C CG 0
python ${DeepMod}/tools/hm_cluster_predict.py ${out_folder}/na12878_C ${genome_motif}/C

and got:

na12878_C_clusterCpG.chr10.C.bed  na12878_C_clusterCpG.chr15.C.bed  na12878_C_clusterCpG.chr1.C.bed   na12878_C_clusterCpG.chr3.C.bed  na12878_C_clusterCpG.chr8.C.bed
na12878_C_clusterCpG.chr11.C.bed  na12878_C_clusterCpG.chr16.C.bed  na12878_C_clusterCpG.chr20.C.bed  na12878_C_clusterCpG.chr4.C.bed  na12878_C_clusterCpG.chr9.C.bed
na12878_C_clusterCpG.chr12.C.bed  na12878_C_clusterCpG.chr17.C.bed  na12878_C_clusterCpG.chr21.C.bed  na12878_C_clusterCpG.chr5.C.bed  na12878_C_clusterCpG.chrM.C.bed
na12878_C_clusterCpG.chr13.C.bed  na12878_C_clusterCpG.chr18.C.bed  na12878_C_clusterCpG.chr22.C.bed  na12878_C_clusterCpG.chr6.C.bed  na12878_C_clusterCpG.chrX.C.bed
na12878_C_clusterCpG.chr14.C.bed  na12878_C_clusterCpG.chr19.C.bed  na12878_C_clusterCpG.chr2.C.bed   na12878_C_clusterCpG.chr7.C.bed  na12878_C_clusterCpG.chrY.C.bed

During the analysis, I merged the na12878_C_clusterCpG.chr* files into one file named total.bed (cat na12878_C_clusterCpG.chr* >total.bed). Then I used a simple shell script to merge the plus-strand and minus-strand records of each CpG into a single site.
In the end I got a file similar to the following (positions are 1-based):

chr_pos1	coverage	met	rmet
chr17_19342304	10	2	0.2000
chr11_64368472	15	8	0.5333
chr9_70171213	17	1	0.0588
chr2_101126946	17	6	0.3529
chr7_92826868	40	8	0.2000
chr5_137781115	22	4	0.1818
chr17_4922691	15	1	0.0667
chr14_39170567	29	3	0.1034
chrX_32494303	24	2	0.0833
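
The strand-merging step described above could be sketched in Python roughly as follows. This is only an illustration, not the shell script that was actually used: it assumes each record has already been parsed into (chrom, 0-based start, strand, coverage, methylated count), and that a minus-strand CpG call sits on the G, one base after the plus-strand C. The function name `merge_cpg_strands`, the record layout, and the toy `chrT` values are all hypothetical.

```python
from collections import defaultdict


def merge_cpg_strands(records):
    """Merge plus- and minus-strand CpG calls into one site per CpG.

    records: iterable of (chrom, start0, strand, coverage, met), where start0
    is a 0-based position (an assumed layout, not a documented DeepMod format).
    Returns {"chrom_pos1": (coverage, met, rmet)} keyed by the 1-based
    position of the plus-strand C.
    """
    totals = defaultdict(lambda: [0, 0])
    for chrom, start0, strand, coverage, met in records:
        # Assumption: a minus-strand call lies on the G, so shift back to the C.
        c_start0 = start0 - 1 if strand == "-" else start0
        key = f"{chrom}_{c_start0 + 1}"  # convert to a 1-based position
        totals[key][0] += coverage
        totals[key][1] += met
    return {
        key: (cov, met, round(met / cov, 4) if cov else 0.0)
        for key, (cov, met) in totals.items()
    }


# Toy input (made-up values): one CpG seen on both strands.
example = [
    ("chrT", 99, "+", 6, 1),
    ("chrT", 100, "-", 4, 1),
]
merged = merge_cpg_strands(example)
print(merged)
```

Summing counts before dividing, rather than averaging the two per-strand percentages, keeps the merged methylation ratio weighted by coverage.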

@liuqianhn
Collaborator

@Smart-zhi Sorry for the late reply. I wanted to give you a fuller update, since one of our lab members is now working on the whole evaluation process. However, I do not have more results yet, and I may have more updates later.

For now, I checked all the positions you listed above and found that your coverage at these positions differs from mine. This might be due to different versions of the basecaller used on the Nanopore data. My methylation percentages are therefore significantly different from yours. I would like to share my DeepMod results with you, but they are several GB in size. Let me figure out a way to share the results with you later. Thanks.

@Smart-zhi
Author

Smart-zhi commented Jul 10, 2020

Thank you for your reply, and thank you very much for offering to share the data with me. I can receive data through any service, such as Google Drive, OneDrive, Baidu Drive, etc. Any way is OK. My email address is [email protected].

@Smart-zhi
Author

@liuqianhn I recently used DeepMod to analyze CpG methylation on HX1. I calculated the Pearson correlation coefficient between each tool (DeepMod and nanopolish) and the bisulfite results (Bismark). My results are as follows:

NA12878

Tool	Number of CpGs intersecting with Bismark	Pearson correlation coefficient
nanopolish	26,733,082	0.9023
DeepMod	19,936,625	0.4325

HX1

Tool	Number of CpGs intersecting with Bismark	Pearson correlation coefficient
nanopolish	27,303,077	0.9092
DeepMod	26,303,675	0.7708

I can't explain the performance of DeepMod on the NA12878 dataset. Could you please share the results of NA12878 and HX1 with me? I want to compare them to track down these problems. Thank you.
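
For readers who want to repeat this kind of comparison, here is a minimal Python sketch of a Pearson correlation computed only over the CpG sites shared by two call sets. It is not the script used for the numbers above. It assumes each call set has already been loaded into a dict mapping a site key to a methylation fraction between 0 and 1; the function name `pearson_on_shared_sites` and the toy `chrT` values are hypothetical.

```python
import math


def pearson_on_shared_sites(a, b):
    """Pearson r between two {site: methylation_fraction} dicts.

    Only sites present in both dicts are used. Returns (n_shared, r);
    r is NaN when there are fewer than two shared sites or when either
    side has zero variance.
    """
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        return n, float("nan")
    xs = [a[s] for s in shared]
    ys = [b[s] for s in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return n, float("nan")
    return n, cov / math.sqrt(var_x * var_y)


# Toy data (made-up values): three shared sites, one unique site on each side.
tool_calls = {"chrT_1": 0.0, "chrT_2": 0.5, "chrT_3": 1.0, "chrT_9": 0.3}
bisulfite = {"chrT_1": 0.0, "chrT_2": 0.4, "chrT_3": 0.8, "chrT_7": 0.9}
n_shared, r = pearson_on_shared_sites(tool_calls, bisulfite)
print(n_shared, r)
```

Reporting the number of shared sites alongside r matters here, because two tools compared against the same bisulfite data can intersect with it at very different numbers of CpGs.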

@liuqianhn
Collaborator

@Smart-zhi, thanks for sharing your results. I will summarize the files and share them with you. Based on what you shared earlier for NA12878, there are significant differences caused by the basecaller. I am not sure about the HX1 data yet.

@Smart-zhi
Author

Hello @liuqianhn,
I followed the instructions provided in Supplementary Table 5 to reproduce the chrY results on HX1. The results are as follows:

Source	Un-meth	Meth	Prec	Rec
Supplementary Table 5	1,338	30,825	0.989	0.967
my test	1,320	34,729	0.994	0.967

Coverage>=3, threshold = 0.5
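
The evaluation settings above (coverage >= 3, threshold = 0.5) can be illustrated with a small Python sketch. It is not the evaluation code behind the supplementary tables. It assumes that a site is called methylated when its methylation fraction is at least the threshold, that methylated is the positive class, that sites below the coverage cutoff are skipped rather than counted as misses, and that ground-truth labels are available as booleans. The function name `precision_recall` and the toy sites are hypothetical.

```python
def precision_recall(calls, truth, min_coverage=3, threshold=0.5):
    """Binary precision/recall for methylation calls.

    calls: {site: (coverage, methylated_fraction)}
    truth: {site: True if methylated, False if unmethylated}
    Sites missing from calls or below min_coverage are skipped (assumption).
    """
    tp = fp = fn = 0
    for site, is_methylated in truth.items():
        if site not in calls:
            continue
        coverage, fraction = calls[site]
        if coverage < min_coverage:
            continue
        predicted = fraction >= threshold  # assumption: ">=" rather than ">"
        if predicted and is_methylated:
            tp += 1
        elif predicted and not is_methylated:
            fp += 1
        elif not predicted and is_methylated:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy data (made-up values): s4 is dropped by the coverage filter.
toy_calls = {"s1": (10, 0.9), "s2": (5, 0.2), "s3": (4, 0.7),
             "s4": (2, 0.8), "s5": (8, 0.1)}
toy_truth = {"s1": True, "s2": True, "s3": False, "s4": True, "s5": False}
precision, recall = precision_recall(toy_calls, toy_truth)
print(precision, recall)
```

Whether low-coverage sites are skipped or counted as missed calls changes the recall, so that choice should be stated whenever two runs are compared.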


However, when I followed the instructions in Supplementary Table 4 for chrX on NA12878, I could not reproduce the reported results. In my results, the precision is close to that in Supplementary Table 4, but the recall is very low. I guess that some of the sites are lost due to the reduced coverage.

I urgently need the results on NA12878 so that my work can continue. If you have this part of the data and can share with me, I would be very grateful.
Thank you.

@liuqianhn
Collaborator

@Smart-zhi You are right, there might be a coverage issue with the newly basecalled NA12878 data. I am sorry that I do not have the data ready for you, because one of the lab members who partially worked on this has left. I will try to finish the work I have in hand and prepare the data for you soon. May I know when your deadline is?

@Smart-zhi
Author

@liuqianhn Thank you. I hope to get the NA12878 results before August 31st. In the meantime, I plan to run DeepMod again, but I need to run basecalling first. I am also very worried that the results will not be satisfactory.
I am very fortunate and honored that I can get your help.

@liuqianhn
Collaborator

@Smart-zhi Could you please check whether you can access the NA12878 data from the link? I tested the performance for binary classification rather than correlation.

@Smart-zhi
Author

Smart-zhi commented Sep 1, 2020

@liuqianhn , thank you very much. I have received the data. I find that chromosome 22 seems to be missing from the data.

As a next step, I will test the classification performance. Again, I would like to express my warm thanks to you!
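
A quick way to spot a missing chromosome in a set of per-chromosome files is sketched below in Python. It assumes file names follow the `<prefix>.chrN.C.bed` pattern seen in the listings earlier in this thread, which may not match the naming of the shared data; the function name `missing_chromosomes` and the `sample` prefix are hypothetical.

```python
import re

# Chromosomes present in the per-chromosome listings earlier in the thread.
EXPECTED = [f"chr{n}" for n in range(1, 23)] + ["chrX", "chrY", "chrM"]


def missing_chromosomes(filenames, expected=EXPECTED):
    """Return expected chromosomes that have no matching '.chrN.C.bed' file."""
    pattern = re.compile(r"\.(chr[0-9A-Za-z]+)\.C\.bed$")
    found = set()
    for name in filenames:
        match = pattern.search(name)
        if match:
            found.add(match.group(1))
    return [chrom for chrom in expected if chrom not in found]


# Toy file list (hypothetical names) with chr22 left out on purpose.
names = [f"sample.{chrom}.C.bed" for chrom in EXPECTED if chrom != "chr22"]
print(missing_chromosomes(names))
```

In practice the file list could come from `os.listdir` on the directory holding the results.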

@liuqianhn
Collaborator

Thanks for sharing, @Smart-zhi. It seems that I need to look into how to improve DeepMod for correlation testing. Thanks.

