Merge pull request #54 from hiruna72/dev

Dev
hiruna72 · Mar 5, 2024 · 10b26be · 10b26be
2 parents c4ddfe5 + d2f07fd
commit 10b26be
Show file tree

Hide file tree

Showing 18 changed files with 725 additions and 387 deletions.
diff --git a/README.md b/README.md
@@ -10,6 +10,10 @@ Watch [the video](https://youtu.be/kClYH4KpOjk) to learn a few tricks to get the
 ![PyPI Downloads](https://img.shields.io/pypi/dm/squigualiser?label=pypi%20downloads)
 [![PyPI](https://img.shields.io/pypi/v/squigualiser.svg?style=flat)](https://pypi.python.org/pypi/squigualiser)
 [![Snake CI](https://github.com/hiruna72/squigualiser/actions/workflows/snake.yml/badge.svg)](https://github.com/hiruna72/squigualiser/actions/workflows/snake.yml)
+[![GitHub Downloads](https://img.shields.io/github/downloads/hiruna72/squigualiser/total?logo=GitHub)](https://github.com/hiruna72/squigualiser/releases)
+<!---
+[![BioConda Install](https://img.shields.io/conda/dn/bioconda/slow5tools.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/slow5tools)
+-->
 
 Squigualiser preprint - https://www.biorxiv.org/content/10.1101/2024.02.19.581111v2
 
@@ -21,11 +25,11 @@ Squigualiser preprint - https://www.biorxiv.org/content/10.1101/2024.02.19.58111
 # Table of Contents
 1. [Quickstart](#quickstart)
 2. [Advanced Setup](#advanced-setup)
-3. [Signal to read visualisation](#signal-to-read-visualisation)
+3. [Signal-to-read visualisation](#signal-to-read-visualisation)
    1. [Option 1 - Using f5c resquiggle](#option-1---f5c-resquiggle)
    2. [Option 2 - Using basecaller move table](#option-2---basecaller-move-table)
    3. [Option 3 - Using squigulator signal simulation](#option-3---squigulator-signal-simulation)
-5. [Signal to reference visualisation](#signal-to-reference-visualisation)
+5. [Signal-to-reference visualisation](#signal-to-reference-visualisation)
    1. [Option 1 - Using f5c eventalign](#option-1-f5c-eventalign)
    2. [Option 2 - Using basecaller move table](#option-2---basecaller-move-table-1)
    3. [Option 3 - Using squigulator signal simulation](#option-3---squigulator-signal-simulation-1)
@@ -179,13 +183,13 @@ SLOW5 files compressed with *zstd* offer smaller file size and better performanc
 </div>
 </details>
 
-## Signal to read visualisation
+## Signal-to-read visualisation
 
 This section explains how you can use squigualiser to visualise a raw signal alignment against its basecalled read. Click on the arrow to expand the revalent method.
 
 #### Option 1 - f5c resquiggle
 <details>
-<summary>Steps for using f5c resquiggle signal-read alignment</summary>
+<summary>Steps for using f5c resquiggle signal-to-read alignment</summary>
 <div markdown=1>
 
 1. Install f5c [v1.3 or higher](https://github.com/hasindu2008/f5c/releases) as explained in [f5c documentation](https://github.com/hasindu2008/f5c/#quick-start).
@@ -202,7 +206,7 @@ f5c resquiggle -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}
 * Refer [Note(2)](#notes) for more information about `--kmer-model [KMER_MODEL]`, which is optional.
 * Refer [Note(3)](#notes) for more information about RNA.
 
-3. Plot signal to read alignment
+3. Plot signal-to-read alignment
 
 ````
 OUTPUT_DIR=output_dir
@@ -244,7 +248,7 @@ squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam basecalls.sam -
 * Refer [Note(5)](#notes) for a description about `sig_move_offset`.
 * Refer [Note(6)](#notes) for handling a potential SAM/BAM error.
 
-3. Plot signal to read alignment
+3. Plot signal-to-read alignment
 
 ````
 FASTA_FILE=read.fasta
@@ -272,13 +276,13 @@ squigualiser plot --file ${FASTA_FILE} --slow5 ${SIGNAL_FILE} --alignment ${ALIG
 ````
 REF=ref.fasta          #reference
 READ=sim.fasta         #simulated reads
-ALIGNMENT=sim.paf      #contains signal-read alignment
+ALIGNMENT=sim.paf      #contains signal-to-read alignment
 SIGNAL_FILE=sim.blow5  #simultated raw signal data
 
 squigulator -x dna-r10-prom ${REF} -n 1 -o ${SIGNAL_FILE} -q ${READ} -c ${ALIGNMENT} # instead of dna-r10-prom, you can specify any other profile
 ````
 
-3. Plot signal to read alignment.
+3. Plot signal-to-read alignment.
 
 ````
 OUTPUT_DIR=output_dir
@@ -288,7 +292,7 @@ squigualiser plot -f ${READ} -s ${SIGNAL_FILE} -a ${ALIGNMENT} -o ${OUTPUT_DIR}
 </div>
 </details>
 
-## Signal to reference visualisation
+## Signal-to-reference visualisation
 
 This section explains how you can use squigualiser to visualise a raw signal alignment against a reference. Click on the arrow to expand the relevant method.
 
@@ -392,7 +396,7 @@ REALIGN_BAM=realign_output.bam
 squigualiser realign --bam ${MAPP_SAM} --paf ${REFORMAT_PAF} -o ${REALIGN_BAM}
 ````
 
-5. Plot signal to reference alignment
+5. Plot signal-to-reference alignment
 
 ````
 REGION=chr1:6811404-6811443
@@ -423,7 +427,7 @@ NUM_READS=50 #number of reads to simulate
 squigulator -x dna-r10-prom ${REF} -o ${SIGNAL_FILE} -a sim.sam -n ${NUM_READS} && samtools sort sim.sam -o ${ALIGNMENT} && samtools index ${ALIGNMENT}
 ```
 
-3. Plot signal to reference alignment.
+3. Plot signal-to-reference alignment.
 ````
 OUTPUT_DIR=output_dir
 REGION=chr1:6811404-6811443
@@ -466,10 +470,10 @@ tabix -0 -b 9 -e 8 -s 6 ${ALIGNMENT}
 ![image](docs/figures/pileup/pileup_plot.png)
 
 Similar to IGV pileup view now you can view the signal pileup view. To create a pileup view the following conditions should be met.
-1. The plot is a signal to reference visualisation, not a signal to read.
+1. The plot is a signal-to-reference visualisation, not a signal-to-read.
 2. A genomic region should be specified using the argument `--region`
 
-First, create an alignment file by following the steps mentioned in [Signal to reference visualisation](#signal-to-reference-visualisation)
+First, create an alignment file by following the steps mentioned in [Signal-to-reference visualisation](#signal-to-reference-visualisation)
 
 ````
 REGION=chr1:6811011-6811198
@@ -541,7 +545,7 @@ However, the user is adviced to use `--profile` (documented [here](docs/profiles
 For more information please refer [base_shift and eventalignment](docs/base_shift_and_eventalignment.md) and [base_shift and reverse mapped reads](docs/base_shift_of_reverse_mapped_reads.md).
 
 ### Signal scaling
-The commands `plot` and `plot_pileup` can take the argument `--sig_scale`. By providing the argument `--sig_scale znorm` or `--sig_scale medmad` the signals will be zscore or median MAD normalized respectively.
+The commands `plot` and `plot_pileup` can take the argument `--sig_scale`. Provide the argument `--sig_scale znorm` to zscore normalize, `--sig_scale medmad` to median MAD normalize, and `--sig_scale scaledpA` to scale the raw signal to the pore model.
 
 
 ## Plot Conventions
@@ -553,15 +557,15 @@ The commands `plot` and `plot_pileup` can take the argument `--sig_scale`. By pr
 
 
 ## Calculate alignment statistics
-Calculate basic statistics of read/reference - signal alignments.
+Calculate basic statistics of signal-to-read/reference alignments.
 Check [here](docs/commands.md/#metric) for the command.
 Check [here](docs/different_alignments.md) for an example.
 
 ## Notes
 
 1. If your FASTQ file is a multi-line file (not to confuse with multi-read), then install [seqtk](https://github.com/lh3/seqtk) and use `seqtk seq -l0 in.fastq > out.fastq`  to convert multi-line FASTQ to 4-line FASTQ.
 2. The optional argument `--kmer-model KMER_MODEL` can be used to specify a custom k-mer model if you wish.
-3. To plot RNA signal-read alignment use the alignment file created using `f5c resquiggle --rna -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}`. Also, provide the argument `--rna` to the visualising command. Currently, there exists no RNA kmer model for r10.4.1 chemistry.
+3. To plot RNA signal-to-read alignment use the alignment file created using `f5c resquiggle --rna -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}`. Also, provide the argument `--rna` to the visualising command. Currently, there exists no RNA kmer model for r10.4.1 chemistry.
 4. The input alignment format accepted by `squigualiser plot` is explained [here](https://hasindu2008.github.io/f5c/docs/output#resquiggle). This standard format made plotting a lot easier.
 5. The argument `sig_move_offset` is the number of moves `n` to skip in the signal to correct the start of the alignment. This will not skip bases in the fastq sequence. For example, to align the first move with the first kmer `--sig_move_offset 0` should be passed. To align from the second move onwards, `--sig_move_offset 1` should be used.
 6. Pysam does not allow reading SAM/BAM files without a `@SQ` line in the header. Hence, `squigualiser reform` script might error out with `NotImplementedError: can not iterate over samfile without header`. Add a fake `@SQ` header line with a zero length reference as follows,
@@ -577,12 +581,12 @@ Check [here](docs/different_alignments.md) for an example.
 
 ![image](docs/figures/preview.png)
 
-1. The first read is a signal-read alignment using guppy_v.6.3.7 move table annotation ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.1.html)).
-2. The second read is a signal-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-2.1.html)).
-3. The third read is a signal-read alignment using the squigulator's simulated output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.11.html)).
-4. The fourth read (RNA) is a signal-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-3.2.html)).
+1. The first read is a signal-to-read alignment using guppy_v.6.3.7 move table annotation ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.1.html)).
+2. The second read is a signal-to-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-2.1.html)).
+3. The third read is a signal-to-read alignment using the squigulator's simulated output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.11.html)).
+4. The fourth read (RNA) is a signal-to-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-3.2.html)).
 
-* [This](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_reference/testcase-8.1.html) signal-reference alignment aligns a signal to the region `chr1:4270161-4271160`.
+* [This](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_reference/testcase-8.1.html) signal-to-reference alignment aligns a signal to the region `chr1:4270161-4271160`.
 * [This](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_reference/testcase-8.2.html) is the same plot with a fixed base width.
 
 These examples were generated using the testcases - `1.1, 2.1, 1.11,` and `3.2` respectively in [test_plot_signal_to_read.sh](test/test_plot_signal_to_read.sh).

diff --git a/docs/commands.md b/docs/commands.md
@@ -108,9 +108,10 @@ Plot read/reference - signal alignments.
 * `--sig_scale`
 	(optional) Plot the scaled signal.
 	By default, the signal is not scaled but converted to pA values.
-	Supported scalings are: [medmad, znorm].
+	Supported scalings are: [medmad, znorm, scaledpA].
 	`medmad` is median absolute deviation scaling.  
 	`znorm` is zscore normalization scaling.
+	`scaledpA` uses `sc` and `sh` tags to scale the raw signal to the pore model.
 	The implementation of each method can be found at `src/plot_utils.py/scale_signal()`
 * `--no_pa`
 	(optional) Do not convert the signal to pA levels. By default, the raw signal is converted to pA levels [default value: false].
@@ -132,6 +133,14 @@ Plot read/reference - signal alignments.
 	(optional) The maximum number of signal samples to draw on a plot [default value: 20000].
 * `--bed FILE`
 	(optional) The bed file with annotations.
+* `--no_colours`
+	(optional) hide base colours [default value: false].
+* `--no_samples`
+	(optional) hide sample points [default value: false].
+* `--save_svg`
+	(optional) save as svg. tweak --region and --xrange to capture the necessary part of the plot [default value: false].
+* `--xrange`
+	(optional) initial x range [default value: 350].
 
 ### plot_pileup
 
@@ -169,9 +178,10 @@ Plot reference - signal alignment pileups.
 * `--sig_scale`
 	(optional) Plot the scaled signal.
 	By default, the signal is not scaled but converted to pA values.
-	Supported scalings are: [medmad, znorm].
+	Supported scalings are: [medmad, znorm, scaledpA].
 	`medmad` is median absolute deviation scaling.  
 	`znorm` is zscore normalization scaling.
+	`scaledpA` uses `sc` and `sh` tags to scale the raw signal to the pore model.
 	The implementation of each method can be found at `src/plot_utils.py/scale_signal()`
 * `--no_pa`
 	(optional) Do not convert the signal to pA levels. By default, the raw signal is converted to pA levels [default value: false].
@@ -203,6 +213,12 @@ Plot reference - signal alignment pileups.
 	(optional) Create a log file using python cprofile profiler [default value: false].
 * `--return_plot`
 	(optional) Return the plot object without saving to output. This is used in `plot_tracks` tool [default value: false]. Cannot be used in conjunction with `--output_dir`.
+* `--no_colours`
+	(optional) hide base colours [default value: false].
+* `--save_svg`
+	(optional) save as svg. tweak --region and --xrange to capture the necessary part of the plot [default value: false].
+* `--xrange`
+	(optional) initial x range [default value: 350].
 
 ### plot_tracks
 
@@ -297,9 +313,10 @@ Instead of generating figures `metric` will generate statistics after parsing th
 * `--sig_scale`
 	(optional) Plot the scaled signal.
 	By default, the signal is not scaled but converted to pA values.
-	Supported scalings are: [medmad, znorm].
+	Supported scalings are: [medmad, znorm, scaledpA].
 	`medmad` is median absolute deviation scaling.  
 	`znorm` is zscore normalization scaling.
+	`scaledpA` uses `sc` and `sh` tags to scale the raw signal to the pore model.
 	The implementation of each method can be found at `src/plot_utils.py/scale_signal()`
 * `--no_pa`
 	(optional) Do not convert the signal to pA levels. By default, the raw signal is converted to pA levels [default value: false].

diff --git a/docs/figures/variants/chr22/fig3.png b/docs/figures/variants/chr22/fig3.png
diff --git a/docs/move_table.md b/docs/move_table.md
@@ -24,16 +24,32 @@ The second move corresponds with `2 x stride` signal points. The third with `4 x
 
 Different models have different strides. Strides from some of the guppy models are listed below.
 
-| Model                            | Stride |
-|----------------------------------|--------|
-| dna_r10.4.1_e8.2_400bps_fast.cfg | 5      |
-| dna_r10.4.1_e8.2_400bps_hac.cfg  | 5      |
-| dna_r10.4.1_e8.2_400bps_sup.cfg  | 5      |
-| dna_r9.4.1_450bps_fast_prom.cfg  | 5      |
-| dna_r9.4.1_450bps_hac_prom.cfg   | 5      |
-| dna_r9.4.1_450bps_sup_prom.cfg   | 5      |
-| rna_r9.4.1_70bps_hac_prom.cfg    | 10     |
-| rna_r9.4.1_70bps_fast_prom.cfg   | 12     |
+| Model                                 | Stride | Guppy version |
+|---------------------------------------|--------|---------------|
+| dna_r10.4.1_e8.2_400bps_fast.cfg      | 5      | v6.3.7        |
+| dna_r10.4.1_e8.2_400bps_hac.cfg       | 5      | v6.3.7        |
+| dna_r10.4.1_e8.2_400bps_sup.cfg       | 5      | v6.3.7        |
+| dna_r9.4.1_450bps_fast_prom.cfg       | 5      | v6.3.7        |
+| dna_r9.4.1_450bps_hac_prom.cfg        | 5      | v6.3.7        |
+| dna_r9.4.1_450bps_sup_prom.cfg        | 5      | v6.3.7        |
+| rna_r9.4.1_70bps_fast_prom.cfg        | 12     | v6.3.7        |
+| rna_r9.4.1_70bps_hac_prom.cfg         | 10     | v6.3.7        |
+|                                       |        |               |
+| dna_r10.4.1_e8.2_400bps_fast.cfg      | 5      | v6.5.7        |
+| dna_r10.4.1_e8.2_400bps_fast_prom.cfg | 5      | v6.5.7        |
+| dna_r10.4.1_e8.2_400bps_hac.cfg       | 5      | v6.5.7        |
+| dna_r10.4.1_e8.2_400bps_hac_prom.cfg  | 5      | v6.5.7        |
+| dna_r10.4.1_e8.2_400bps_sup.cfg       | 5      | v6.5.7        |
+| dna_r9.4.1_450bps_fast.cfg            | 5      | v6.5.7        |
+| dna_r9.4.1_450bps_fast_prom.cfg       | 5      | v6.5.7        |
+| dna_r9.4.1_450bps_hac.cfg             | 5      | v6.5.7        |
+| dna_r9.4.1_450bps_hac_prom.cfg        | 5      | v6.5.7        |
+| dna_r9.4.1_450bps_sup.cfg             | 5      | v6.5.7        |
+| dna_r9.4.1_450bps_sup_prom.cfg        | 5      | v6.5.7        |
+| rna_r9.4.1_70bps_fast_prom.cfg        | 12     | v6.5.7        |
+| rna_r9.4.1_70bps_fast.cfg             | 12     | v6.5.7        |
+| rna_r9.4.1_70bps_hac.cfg              | 10     | v6.5.7        |
+| rna_r9.4.1_70bps_hac_prom.cfg         | 10     | v6.5.7        |
 
 As of [dorado_v0.4.0](https://github.com/nanoporetech/dorado/tree/v0.4.0) stride is used in three stages.
 1. In the process of creating chunks to feed into the NN model ([code](https://github.com/nanoporetech/dorado/blob/b2af8e828a03d85448bb39ee5629660e6ef5e74f/dorado/read_pipeline/BasecallerNode.cpp#L78)).