Skip to content

Commit

Permalink
Merge pull request #54 from hiruna72/dev
Browse files Browse the repository at this point in the history
Dev
  • Loading branch information
hiruna72 authored Mar 5, 2024
2 parents c4ddfe5 + d2f07fd commit 10b26be
Show file tree
Hide file tree
Showing 18 changed files with 725 additions and 387 deletions.
46 changes: 25 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,10 @@ Watch [the video](https://youtu.be/kClYH4KpOjk) to learn a few tricks to get the
![PyPI Downloads](https://img.shields.io/pypi/dm/squigualiser?label=pypi%20downloads)
[![PyPI](https://img.shields.io/pypi/v/squigualiser.svg?style=flat)](https://pypi.python.org/pypi/squigualiser)
[![Snake CI](https://github.com/hiruna72/squigualiser/actions/workflows/snake.yml/badge.svg)](https://github.com/hiruna72/squigualiser/actions/workflows/snake.yml)
[![GitHub Downloads](https://img.shields.io/github/downloads/hiruna72/squigualiser/total?logo=GitHub)](https://github.com/hiruna72/squigualiser/releases)
<!---
[![BioConda Install](https://img.shields.io/conda/dn/bioconda/slow5tools.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/slow5tools)
-->

Squigualiser preprint - https://www.biorxiv.org/content/10.1101/2024.02.19.581111v2

Expand All @@ -21,11 +25,11 @@ Squigualiser preprint - https://www.biorxiv.org/content/10.1101/2024.02.19.58111
# Table of Contents
1. [Quickstart](#quickstart)
2. [Advanced Setup](#advanced-setup)
3. [Signal to read visualisation](#signal-to-read-visualisation)
3. [Signal-to-read visualisation](#signal-to-read-visualisation)
1. [Option 1 - Using f5c resquiggle](#option-1---f5c-resquiggle)
2. [Option 2 - Using basecaller move table](#option-2---basecaller-move-table)
3. [Option 3 - Using squigulator signal simulation](#option-3---squigulator-signal-simulation)
5. [Signal to reference visualisation](#signal-to-reference-visualisation)
5. [Signal-to-reference visualisation](#signal-to-reference-visualisation)
1. [Option 1 - Using f5c eventalign](#option-1-f5c-eventalign)
2. [Option 2 - Using basecaller move table](#option-2---basecaller-move-table-1)
3. [Option 3 - Using squigulator signal simulation](#option-3---squigulator-signal-simulation-1)
Expand Down Expand Up @@ -179,13 +183,13 @@ SLOW5 files compressed with *zstd* offer smaller file size and better performanc
</div>
</details>

## Signal to read visualisation
## Signal-to-read visualisation

This section explains how you can use squigualiser to visualise a raw signal alignment against its basecalled read. Click on the arrow to expand the revalent method.

#### Option 1 - f5c resquiggle
<details>
<summary>Steps for using f5c resquiggle signal-read alignment</summary>
<summary>Steps for using f5c resquiggle signal-to-read alignment</summary>
<div markdown=1>

1. Install f5c [v1.3 or higher](https://github.com/hasindu2008/f5c/releases) as explained in [f5c documentation](https://github.com/hasindu2008/f5c/#quick-start).
Expand All @@ -202,7 +206,7 @@ f5c resquiggle -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}
* Refer [Note(2)](#notes) for more information about `--kmer-model [KMER_MODEL]`, which is optional.
* Refer [Note(3)](#notes) for more information about RNA.

3. Plot signal to read alignment
3. Plot signal-to-read alignment

````
OUTPUT_DIR=output_dir
Expand Down Expand Up @@ -244,7 +248,7 @@ squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam basecalls.sam -
* Refer [Note(5)](#notes) for a description about `sig_move_offset`.
* Refer [Note(6)](#notes) for handling a potential SAM/BAM error.

3. Plot signal to read alignment
3. Plot signal-to-read alignment

````
FASTA_FILE=read.fasta
Expand Down Expand Up @@ -272,13 +276,13 @@ squigualiser plot --file ${FASTA_FILE} --slow5 ${SIGNAL_FILE} --alignment ${ALIG
````
REF=ref.fasta #reference
READ=sim.fasta #simulated reads
ALIGNMENT=sim.paf #contains signal-read alignment
ALIGNMENT=sim.paf #contains signal-to-read alignment
SIGNAL_FILE=sim.blow5 #simultated raw signal data
squigulator -x dna-r10-prom ${REF} -n 1 -o ${SIGNAL_FILE} -q ${READ} -c ${ALIGNMENT} # instead of dna-r10-prom, you can specify any other profile
````

3. Plot signal to read alignment.
3. Plot signal-to-read alignment.

````
OUTPUT_DIR=output_dir
Expand All @@ -288,7 +292,7 @@ squigualiser plot -f ${READ} -s ${SIGNAL_FILE} -a ${ALIGNMENT} -o ${OUTPUT_DIR}
</div>
</details>

## Signal to reference visualisation
## Signal-to-reference visualisation

This section explains how you can use squigualiser to visualise a raw signal alignment against a reference. Click on the arrow to expand the relevant method.

Expand Down Expand Up @@ -392,7 +396,7 @@ REALIGN_BAM=realign_output.bam
squigualiser realign --bam ${MAPP_SAM} --paf ${REFORMAT_PAF} -o ${REALIGN_BAM}
````

5. Plot signal to reference alignment
5. Plot signal-to-reference alignment

````
REGION=chr1:6811404-6811443
Expand Down Expand Up @@ -423,7 +427,7 @@ NUM_READS=50 #number of reads to simulate
squigulator -x dna-r10-prom ${REF} -o ${SIGNAL_FILE} -a sim.sam -n ${NUM_READS} && samtools sort sim.sam -o ${ALIGNMENT} && samtools index ${ALIGNMENT}
```

3. Plot signal to reference alignment.
3. Plot signal-to-reference alignment.
````
OUTPUT_DIR=output_dir
REGION=chr1:6811404-6811443
Expand Down Expand Up @@ -466,10 +470,10 @@ tabix -0 -b 9 -e 8 -s 6 ${ALIGNMENT}
![image](docs/figures/pileup/pileup_plot.png)

Similar to IGV pileup view now you can view the signal pileup view. To create a pileup view the following conditions should be met.
1. The plot is a signal to reference visualisation, not a signal to read.
1. The plot is a signal-to-reference visualisation, not a signal-to-read.
2. A genomic region should be specified using the argument `--region`

First, create an alignment file by following the steps mentioned in [Signal to reference visualisation](#signal-to-reference-visualisation)
First, create an alignment file by following the steps mentioned in [Signal-to-reference visualisation](#signal-to-reference-visualisation)

````
REGION=chr1:6811011-6811198
Expand Down Expand Up @@ -541,7 +545,7 @@ However, the user is adviced to use `--profile` (documented [here](docs/profiles
For more information please refer [base_shift and eventalignment](docs/base_shift_and_eventalignment.md) and [base_shift and reverse mapped reads](docs/base_shift_of_reverse_mapped_reads.md).

### Signal scaling
The commands `plot` and `plot_pileup` can take the argument `--sig_scale`. By providing the argument `--sig_scale znorm` or `--sig_scale medmad` the signals will be zscore or median MAD normalized respectively.
The commands `plot` and `plot_pileup` can take the argument `--sig_scale`. Provide the argument `--sig_scale znorm` to zscore normalize, `--sig_scale medmad` to median MAD normalize, and `--sig_scale scaledpA` to scale the raw signal to the pore model.


## Plot Conventions
Expand All @@ -553,15 +557,15 @@ The commands `plot` and `plot_pileup` can take the argument `--sig_scale`. By pr


## Calculate alignment statistics
Calculate basic statistics of read/reference - signal alignments.
Calculate basic statistics of signal-to-read/reference alignments.
Check [here](docs/commands.md/#metric) for the command.
Check [here](docs/different_alignments.md) for an example.

## Notes

1. If your FASTQ file is a multi-line file (not to confuse with multi-read), then install [seqtk](https://github.com/lh3/seqtk) and use `seqtk seq -l0 in.fastq > out.fastq` to convert multi-line FASTQ to 4-line FASTQ.
2. The optional argument `--kmer-model KMER_MODEL` can be used to specify a custom k-mer model if you wish.
3. To plot RNA signal-read alignment use the alignment file created using `f5c resquiggle --rna -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}`. Also, provide the argument `--rna` to the visualising command. Currently, there exists no RNA kmer model for r10.4.1 chemistry.
3. To plot RNA signal-to-read alignment use the alignment file created using `f5c resquiggle --rna -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}`. Also, provide the argument `--rna` to the visualising command. Currently, there exists no RNA kmer model for r10.4.1 chemistry.
4. The input alignment format accepted by `squigualiser plot` is explained [here](https://hasindu2008.github.io/f5c/docs/output#resquiggle). This standard format made plotting a lot easier.
5. The argument `sig_move_offset` is the number of moves `n` to skip in the signal to correct the start of the alignment. This will not skip bases in the fastq sequence. For example, to align the first move with the first kmer `--sig_move_offset 0` should be passed. To align from the second move onwards, `--sig_move_offset 1` should be used.
6. Pysam does not allow reading SAM/BAM files without a `@SQ` line in the header. Hence, `squigualiser reform` script might error out with `NotImplementedError: can not iterate over samfile without header`. Add a fake `@SQ` header line with a zero length reference as follows,
Expand All @@ -577,12 +581,12 @@ Check [here](docs/different_alignments.md) for an example.

![image](docs/figures/preview.png)

1. The first read is a signal-read alignment using guppy_v.6.3.7 move table annotation ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.1.html)).
2. The second read is a signal-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-2.1.html)).
3. The third read is a signal-read alignment using the squigulator's simulated output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.11.html)).
4. The fourth read (RNA) is a signal-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-3.2.html)).
1. The first read is a signal-to-read alignment using guppy_v.6.3.7 move table annotation ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.1.html)).
2. The second read is a signal-to-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-2.1.html)).
3. The third read is a signal-to-read alignment using the squigulator's simulated output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-1.11.html)).
4. The fourth read (RNA) is a signal-to-read alignment using f5c resquiggle output ([link](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_read/testcase-3.2.html)).

* [This](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_reference/testcase-8.1.html) signal-reference alignment aligns a signal to the region `chr1:4270161-4271160`.
* [This](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_reference/testcase-8.1.html) signal-to-reference alignment aligns a signal to the region `chr1:4270161-4271160`.
* [This](https://hiruna72.github.io/squigualiser/docs/figures/sig_to_reference/testcase-8.2.html) is the same plot with a fixed base width.

These examples were generated using the testcases - `1.1, 2.1, 1.11,` and `3.2` respectively in [test_plot_signal_to_read.sh](test/test_plot_signal_to_read.sh).
Expand Down
23 changes: 20 additions & 3 deletions docs/commands.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,10 @@ Plot read/reference - signal alignments.
* `--sig_scale`
(optional) Plot the scaled signal.
By default, the signal is not scaled but converted to pA values.
Supported scalings are: [medmad, znorm].
Supported scalings are: [medmad, znorm, scaledpA].
`medmad` is median absolute deviation scaling.
`znorm` is zscore normalization scaling.
`scaledpA` uses `sc` and `sh` tags to scale the raw signal to the pore model.
The implementation of each method can be found at `src/plot_utils.py/scale_signal()`
* `--no_pa`
(optional) Do not convert the signal to pA levels. By default, the raw signal is converted to pA levels [default value: false].
Expand All @@ -132,6 +133,14 @@ Plot read/reference - signal alignments.
(optional) The maximum number of signal samples to draw on a plot [default value: 20000].
* `--bed FILE`
(optional) The bed file with annotations.
* `--no_colours`
(optional) hide base colours [default value: false].
* `--no_samples`
(optional) hide sample points [default value: false].
* `--save_svg`
(optional) save as svg. tweak --region and --xrange to capture the necessary part of the plot [default value: false].
* `--xrange`
(optional) initial x range [default value: 350].

### plot_pileup

Expand Down Expand Up @@ -169,9 +178,10 @@ Plot reference - signal alignment pileups.
* `--sig_scale`
(optional) Plot the scaled signal.
By default, the signal is not scaled but converted to pA values.
Supported scalings are: [medmad, znorm].
Supported scalings are: [medmad, znorm, scaledpA].
`medmad` is median absolute deviation scaling.
`znorm` is zscore normalization scaling.
`scaledpA` uses `sc` and `sh` tags to scale the raw signal to the pore model.
The implementation of each method can be found at `src/plot_utils.py/scale_signal()`
* `--no_pa`
(optional) Do not convert the signal to pA levels. By default, the raw signal is converted to pA levels [default value: false].
Expand Down Expand Up @@ -203,6 +213,12 @@ Plot reference - signal alignment pileups.
(optional) Create a log file using python cprofile profiler [default value: false].
* `--return_plot`
(optional) Return the plot object without saving to output. This is used in `plot_tracks` tool [default value: false]. Cannot be used in conjunction with `--output_dir`.
* `--no_colours`
(optional) hide base colours [default value: false].
* `--save_svg`
(optional) save as svg. tweak --region and --xrange to capture the necessary part of the plot [default value: false].
* `--xrange`
(optional) initial x range [default value: 350].

### plot_tracks

Expand Down Expand Up @@ -297,9 +313,10 @@ Instead of generating figures `metric` will generate statistics after parsing th
* `--sig_scale`
(optional) Plot the scaled signal.
By default, the signal is not scaled but converted to pA values.
Supported scalings are: [medmad, znorm].
Supported scalings are: [medmad, znorm, scaledpA].
`medmad` is median absolute deviation scaling.
`znorm` is zscore normalization scaling.
`scaledpA` uses `sc` and `sh` tags to scale the raw signal to the pore model.
The implementation of each method can be found at `src/plot_utils.py/scale_signal()`
* `--no_pa`
(optional) Do not convert the signal to pA levels. By default, the raw signal is converted to pA levels [default value: false].
Expand Down
Binary file added docs/figures/variants/chr22/fig3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
36 changes: 26 additions & 10 deletions docs/move_table.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,16 +24,32 @@ The second move corresponds with `2 x stride` signal points. The third with `4 x

Different models have different strides. Strides from some of the guppy models are listed below.

| Model | Stride |
|----------------------------------|--------|
| dna_r10.4.1_e8.2_400bps_fast.cfg | 5 |
| dna_r10.4.1_e8.2_400bps_hac.cfg | 5 |
| dna_r10.4.1_e8.2_400bps_sup.cfg | 5 |
| dna_r9.4.1_450bps_fast_prom.cfg | 5 |
| dna_r9.4.1_450bps_hac_prom.cfg | 5 |
| dna_r9.4.1_450bps_sup_prom.cfg | 5 |
| rna_r9.4.1_70bps_hac_prom.cfg | 10 |
| rna_r9.4.1_70bps_fast_prom.cfg | 12 |
| Model | Stride | Guppy version |
|---------------------------------------|--------|---------------|
| dna_r10.4.1_e8.2_400bps_fast.cfg | 5 | v6.3.7 |
| dna_r10.4.1_e8.2_400bps_hac.cfg | 5 | v6.3.7 |
| dna_r10.4.1_e8.2_400bps_sup.cfg | 5 | v6.3.7 |
| dna_r9.4.1_450bps_fast_prom.cfg | 5 | v6.3.7 |
| dna_r9.4.1_450bps_hac_prom.cfg | 5 | v6.3.7 |
| dna_r9.4.1_450bps_sup_prom.cfg | 5 | v6.3.7 |
| rna_r9.4.1_70bps_fast_prom.cfg | 12 | v6.3.7 |
| rna_r9.4.1_70bps_hac_prom.cfg | 10 | v6.3.7 |
| | | |
| dna_r10.4.1_e8.2_400bps_fast.cfg | 5 | v6.5.7 |
| dna_r10.4.1_e8.2_400bps_fast_prom.cfg | 5 | v6.5.7 |
| dna_r10.4.1_e8.2_400bps_hac.cfg | 5 | v6.5.7 |
| dna_r10.4.1_e8.2_400bps_hac_prom.cfg | 5 | v6.5.7 |
| dna_r10.4.1_e8.2_400bps_sup.cfg | 5 | v6.5.7 |
| dna_r9.4.1_450bps_fast.cfg | 5 | v6.5.7 |
| dna_r9.4.1_450bps_fast_prom.cfg | 5 | v6.5.7 |
| dna_r9.4.1_450bps_hac.cfg | 5 | v6.5.7 |
| dna_r9.4.1_450bps_hac_prom.cfg | 5 | v6.5.7 |
| dna_r9.4.1_450bps_sup.cfg | 5 | v6.5.7 |
| dna_r9.4.1_450bps_sup_prom.cfg | 5 | v6.5.7 |
| rna_r9.4.1_70bps_fast_prom.cfg | 12 | v6.5.7 |
| rna_r9.4.1_70bps_fast.cfg | 12 | v6.5.7 |
| rna_r9.4.1_70bps_hac.cfg | 10 | v6.5.7 |
| rna_r9.4.1_70bps_hac_prom.cfg | 10 | v6.5.7 |

As of [dorado_v0.4.0](https://github.com/nanoporetech/dorado/tree/v0.4.0) stride is used in three stages.
1. In the process of creating chunks to feed into the NN model ([code](https://github.com/nanoporetech/dorado/blob/b2af8e828a03d85448bb39ee5629660e6ef5e74f/dorado/read_pipeline/BasecallerNode.cpp#L78)).
Expand Down
Loading

0 comments on commit 10b26be

Please sign in to comment.