diff --git a/docs/evaluate_motif.html b/docs/evaluate_motif.html new file mode 100644 index 0000000..54fe57a --- /dev/null +++ b/docs/evaluate_motif.html @@ -0,0 +1,251 @@ + + + + + + Evaluate and refine a table of known motifs - Modkit + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + +
+
+

Evaluate and refine a table of known motifs

+

The modkit motif search command has an option to provide any number of known motifs with --known-motif. +If you already have a list of candidate motifs (e.g. from a previous run of modkit motif search), you can check these motifs quickly against a bedMethyl table with modkit motif evaluate.

+
modkit motif evaluate -i ${bedmethyl} --known-motifs-table motifs.tsv -r ${ref}
+
+

Similarly, the search algorithm can be run using known motifs as seeds:

+
modkit motif refine -i ${bedmethyl} --known-motifs-table motifs.tsv -r ${ref}
+
+

The output tables from both of these commands have the same schema:

+
+ + + + + + + + +
| column | name | description | type |
|--------|------|-------------|------|
| 1 | mod_code | code specifying the modification found in the motif | str |
| 2 | motif | sequence of the identified motif using IUPAC codes | str |
| 3 | offset | 0-based offset into the motif sequence of the modified base | int |
| 4 | frac_mod | fraction of occurrences of this sequence that are in the high-modified set, col-5 / (col-5 + col-6) | float |
| 5 | high_count | number of occurrences of this sequence in the high-modified set | int |
| 6 | low_count | number of occurrences of this sequence in the low-modified set | int |
| 7 | mid_count | number of occurrences of this sequence in the mid-modified set | int |
| 8 | log_odds | log2 odds of the motif being in the high-modified set | float |
+
+

In the human-readable table, columns (1) and (2) are merged to show the modification code in the motif sequence context; the rest of the columns are the same as in the machine-readable table.
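To illustrate how the frac_mod column relates to the count columns, the sketch below parses one row and recomputes frac_mod from high_count and low_count. It assumes the machine-readable table is tab-separated, has no header row, and has its columns in the order listed in the schema. The helper names `parse_row` and `frac_mod` are hypothetical (they are not part of modkit), and the example row is invented purely for illustration.

```python
def frac_mod(high_count: int, low_count: int) -> float:
    """Fraction of occurrences in the high-modified set: high / (high + low)."""
    total = high_count + low_count
    if total == 0:
        # Assumption: a motif with no high or low occurrences is reported as 0.0.
        return 0.0
    return high_count / total


def parse_row(line: str) -> dict:
    """Parse one tab-separated row following the 8-column schema."""
    fields = line.rstrip("\n").split("\t")
    return {
        "mod_code": fields[0],
        "motif": fields[1],
        "offset": int(fields[2]),
        "frac_mod": float(fields[3]),
        "high_count": int(fields[4]),
        "low_count": int(fields[5]),
        "mid_count": int(fields[6]),
        "log_odds": float(fields[7]),
    }


# Hypothetical row; the values are invented and do not come from a real run.
example = "a\tGATC\t1\t0.9\t900\t100\t50\t3.17"
row = parse_row(example)
recomputed = frac_mod(row["high_count"], row["low_count"])
print(row["motif"], row["offset"], recomputed)
```

Comparing `recomputed` with the stored frac_mod value is a quick sanity check when post-processing the table.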

+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + + + + + + +
+ + diff --git a/docs/images/modkit_localise_ctcf_5mC.png b/docs/images/modkit_localise_ctcf_5mC.png new file mode 100644 index 0000000..f54c64f Binary files /dev/null and b/docs/images/modkit_localise_ctcf_5mC.png differ diff --git a/docs/intro_localize.html b/docs/intro_localize.html new file mode 100644 index 0000000..6b13300 --- /dev/null +++ b/docs/intro_localize.html @@ -0,0 +1,255 @@ + + + + + + Investigating patterns with localise - Modkit + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + +
+
+

Investigating patterns with localise

+

Once a bedMethyl table has been created, modkit localise uses the pileup to calculate aggregate per-base modification information around genomic features of interest. +For example, we can investigate base modification patterns around CTCF binding sites.

+

+ 5mC patterns at CTCF sites +

+

The input requirements to modkit localise are simple:

+
  1. BedMethyl table that has been bgzf-compressed and tabix-indexed.
  2. Regions file in BED format (plaintext).
  3. Genome sizes tab-separated file: <chrom>\t<size_in_bp>.
+
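One way to produce the genome sizes file is to take the first two columns of a FASTA index. The sketch below is a minimal example, assuming a `.fai` index in the format written by `samtools faidx`, where the first two tab-separated columns are the sequence name and its length. The function name `fai_to_genome_sizes` is hypothetical, and the byte-offset columns in the example index text are illustrative placeholders.

```python
def fai_to_genome_sizes(fai_text: str) -> str:
    """Convert FASTA index text into '<chrom>\\t<size_in_bp>' lines."""
    out = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        out.append(f"{name}\t{length}\n")
    return "".join(out)


# Illustrative index text: only the name and length columns are used.
example_fai = (
    "chr20\t64444167\t7\t60\t61\n"
    "chrM\t16569\t65518250\t60\t61\n"
)
sizes = fai_to_genome_sizes(example_fai)
print(sizes, end="")
```

The returned text can be written to a file and passed to --genome-sizes.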

An example command:

+
modkit localise ${bedmethyl} --regions ${ctcf} --genome-sizes ${sizes}
+
+

The output table has the following schema:

+
+ + + + + +
| column | Name | Description | type |
|--------|------|-------------|------|
| 1 | mod code | modification code as present in the bedMethyl | str |
| 2 | offset | distance in base pairs from the center of the genomic feature; negative values are towards the 5' end of the genome | int |
| 3 | n_valid | number of valid calls at this offset for this modification code | int |
| 4 | n_mod | number of calls for this modification code at this offset | int |
| 5 | percent_modified | n_mod / n_valid * 100 | float |
+
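As an illustration of working with this table, the sketch below parses rows that follow the schema and finds the offset with the lowest percent_modified. It assumes the output is tab-separated with the five columns in the order listed; whether the real file carries a header row is not stated here, so the parser skips any line whose n_valid field is not an integer. The function names and the example rows are hypothetical and invented for illustration.

```python
def percent_modified(n_mod: int, n_valid: int) -> float:
    """n_mod / n_valid * 100, with 0.0 when there are no valid calls (assumption)."""
    return n_mod / n_valid * 100 if n_valid else 0.0


def parse_localise(text: str) -> list:
    """Return (mod_code, offset, n_valid, n_mod, percent) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5 or not fields[2].isdigit():
            continue  # skip blank lines and a possible header row
        mod_code = fields[0]
        offset = int(fields[1])
        n_valid = int(fields[2])
        n_mod = int(fields[3])
        rows.append(
            (mod_code, offset, n_valid, n_mod, percent_modified(n_mod, n_valid))
        )
    return rows


# Invented rows: a dip in modification at the center of the features.
example = (
    "m\t-1\t200\t150\t75.0\n"
    "m\t0\t200\t20\t10.0\n"
    "m\t1\t200\t140\t70.0\n"
)
rows = parse_localise(example)
lowest = min(rows, key=lambda r: r[4])
print("offset with lowest modification:", lowest[1])
```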
+

Optionally the --chart argument can be used to create HTML charts of the modification patterns.

+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + + + + + + +
+ + diff --git a/docs/intro_motif.html b/docs/intro_motif.html new file mode 100644 index 0000000..050a6a9 --- /dev/null +++ b/docs/intro_motif.html @@ -0,0 +1,239 @@ + + + + + + Working with sequence motifs - Modkit + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + +
+
+

Working with sequence motifs

+

The modkit motif suite contains tools for discovery and exploration of short degenerate sequences (motifs) that may be enriched in a sample. +A common use case is to discover the motifs enriched for modification in a native bacterial sample, which can indicate the methyltransferase enzymes present in the genomes in the sample.

+

The following tools are available:

+
  1. Find enriched motifs de novo from a bedMethyl with search.
  2. Check or improve a table of known motifs with evaluate or refine.
  3. Make a motif BED file with motif bed.
+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + + + + + + +
+ + diff --git a/docs/intro_stats.html b/docs/intro_stats.html new file mode 100644 index 0000000..f04e043 --- /dev/null +++ b/docs/intro_stats.html @@ -0,0 +1,265 @@ + + + + + + Calculating modification statistics in regions - Modkit + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + +
+
+

Calculating modification statistics in regions

+

There are many analysis operations available in modkit once you've generated a bedMethyl table. +One such operation is to calculate aggregation statistics on specific regions, for example in CpG islands or gene promoters. +The modkit stats command is designed for this purpose.

+
# these files can be found in the modkit repository
+cpgs=tests/resources/cpg_chr20_with_orig_names_selection.bed
+sample=tests/resources/lung_00733-m_adjacent-normal_5mc-5hmc_chr20_cpg_pileup.bed.gz
+modkit stats ${sample} --regions ${cpgs} -o ./stats.tsv [--mod-codes "h,m"]
+
+
+

Note that the argument --mod-codes can alternatively be passed multiple times, e.g. this is equivalent:
+--mod-codes h --mod-codes m

+
+

The output TSV has the following schema:

+
+ + + + + + + + +
| column | Name | Description | type |
|--------|------|-------------|------|
| 1 | chrom | name of reference sequence from BAM header | str |
| 2 | start position | 0-based start position | int |
| 3 | end position | 0-based exclusive end position | int |
| 4 | name | name of the region from input BED (. if not provided) | str |
| 5 | strand | strand (+, -, .) from the input BED (. assumed when not provided) | str |
| 6+ | count_x | total number of x base modification codes in the region | int |
| 7+ | count_valid_x | total valid calls for the primary base modified by code x | int |
| 8+ | percent_x | count_x / count_valid_x * 100 | float |
+
+

Columns 6, 7, and 8 are repeated for each modification code found in the bedMethyl file or provided with the --mod-codes argument.

+

An example output:

+
chrom  start     end       name     strand   count_h        count_valid_h  percent_h   count_m        count_valid_m  percent_m
+chr20  9838623   9839213   CpG: 47  .        12             1777           0.6752954   45             1777           2.532358
+chr20  10034962  10035266  CpG: 35  .        7              1513           0.46265697  0              1513           0
+chr20  10172120  10172545  CpG: 35  .        15             1229           1.2205045   28             1229           2.278275
+chr20  10217487  10218336  CpG: 59  .        29             2339           1.2398461   108            2339           4.617358
+chr20  10433628  10434345  CpG: 71  .        29             2750           1.0545455   2              2750           0.07272727
+chr20  10671925  10674963  CpG: 255 .        43             9461           0.45449743  24             9461           0.25367296
+
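The percent columns can be checked against the count columns with a few lines of Python. This is a minimal sketch, assuming the stats output is tab-separated with the header row shown in the example (the listing above is aligned with spaces for display). The first two example rows are embedded here with explicit tabs, and the function name `recompute_percents` is hypothetical.

```python
import csv
import io

# First two rows of the example output, re-typed with tab separators.
example_tsv = (
    "chrom\tstart\tend\tname\tstrand\tcount_h\tcount_valid_h\tpercent_h"
    "\tcount_m\tcount_valid_m\tpercent_m\n"
    "chr20\t9838623\t9839213\tCpG: 47\t.\t12\t1777\t0.6752954\t45\t1777\t2.532358\n"
    "chr20\t10034962\t10035266\tCpG: 35\t.\t7\t1513\t0.46265697\t0\t1513\t0\n"
)


def recompute_percents(tsv_text: str, mod_codes=("h", "m")) -> list:
    """Recompute percent_x = count_x / count_valid_x * 100 for each region."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    results = []
    for row in reader:
        record = {"name": row["name"]}
        for code in mod_codes:
            count = int(row[f"count_{code}"])
            valid = int(row[f"count_valid_{code}"])
            # Assumption: report 0.0 when a region has no valid calls.
            record[code] = count / valid * 100 if valid else 0.0
        results.append(record)
    return results


results = recompute_percents(example_tsv)
for record in results:
    print(record["name"], round(record["h"], 4), round(record["m"], 4))
```

The recomputed values agree with the percent_h and percent_m columns in the example to the precision shown.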
+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + + + + + + +
+ +