-
Notifications
You must be signed in to change notification settings - Fork 42
/
Copy path12-high-level-genomic-functions.Rmd
200 lines (159 loc) · 8.23 KB
/
12-high-level-genomic-functions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
```{r, echo = FALSE}
library(circlize)
```
# High-level genomic functions
In this chapter, several high-level functions which create tracks are introduced.
## Ideograms
`circos.initializeWithIdeogram()` initializes the circular plot and adds ideogram track
if the cytoband data is available. Actually, the ideograms are drawn by `circos.genomicIdeogram()`.
`circos.genomicIdeogram()` creates a small track of ideograms and can be used anywhere
in the circle. By default it adds ideograms for human genome hg19 (Figure \@ref(fig:ideogram)).
```{r ideogram, fig.cap = "Circular ideograms."}
circos.initializeWithIdeogram(plotType = c("labels", "axis"))
circos.track(ylim = c(0, 1))
circos.genomicIdeogram() # put ideogram as the third track
circos.genomicIdeogram(track.height = 0.2)
```
## Heatmaps {#genomic-heatmap}
Matrix which corresponds to genomic regions can be visualized as heatmaps. Heatmaps
completely fill the track and there are connection lines connecting heatmaps and original positions
in the genome. `circos.genomicHeatmap()` draws connection lines and heatmaps as two tracks
and combines them as an integrated track.
Generally, all numeric columns (excluding the first three columns) in the
input data frame are used to make the heatmap. Columns can also be specified
by `numeric.column` which is either an numeric vector or a character vector.
Colors can be specfied as a color matrix or a color mapping function generated
by `colorRamp2()`.
The height of the connection line track and the heatmap track can be set by `connection_height`
and `heatmap_height` arguments. Also parameters for the styles of lines and rectangle borders
can be adjusted, please check the help page of `circos.genomicHeatmap()`.
```{r genomic-heatmap1, eval = FALSE}
circos.initializeWithIdeogram()
bed = generateRandomBed(nr = 100, nc = 4)
col_fun = colorRamp2(c(-1, 0, 1), c("green", "black", "red"))
circos.genomicHeatmap(bed, col = col_fun, side = "inside", border = "white")
circos.clear()
```
In the left figure in Figure \@ref(fig:genomic-heatmap), the heatmaps are put inside the
normal genomic track. Heatmaps are also be put outside the normal genomic track by setting
`side = "outside"` (Figure \@ref(fig:genomic-heatmap), right).
```{r genomic-heatmap2, eval = FALSE}
circos.initializeWithIdeogram(plotType = NULL)
circos.genomicHeatmap(bed, col = col_fun, side = "outside",
line_col = as.numeric(factor(bed[[1]])))
circos.genomicIdeogram()
circos.clear()
```
```{r genomic-heatmap, echo = FALSE, fig.width = 8, fig.height = 4, fig.cap = "Genomic heamtaps."}
chunks <- knitr:::knit_code$get()
par(mfrow = c(1, 2))
eval(parse(text = chunks[["genomic-heatmap1"]]))
eval(parse(text = chunks[["genomic-heatmap2"]]))
```
## Labels
`circos.genomicLabels()` adds text labels for regions that are specified.
Positions of labels are automatically adjusted so that they do not
overlap to each other.
Similar as `circos.genomicHeatmap()`, `circos.genomicLabels()` also
creates two tracks where one for the connection lines and one for the
labels. You can set the height of the labels track to be the maximum
width of labels by `labels_height = max(strwidth(labels))`. `padding`
argument controls the gap between two neighbouring labels.
```{r genomic-labels1, eval = FALSE}
circos.initializeWithIdeogram()
bed = generateRandomBed(nr = 50, fun = function(k) sample(letters, k, replace = TRUE))
bed[1, 4] = "aaaaa"
circos.genomicLabels(bed, labels.column = 4, side = "inside")
circos.clear()
```
Similarlly, labels can be put outside of the normal genomic track (Figure \@ref(fig:genomic-labels) right).
```{r genomic-labels2, eval = FALSE}
circos.initializeWithIdeogram(plotType = NULL)
circos.genomicLabels(bed, labels.column = 4, side = "outside",
col = as.numeric(factor(bed[[1]])), line_col = as.numeric(factor(bed[[1]])))
circos.genomicIdeogram()
circos.clear()
```
```{r genomic-labels, echo = FALSE, fig.width = 8, fig.height = 4, fig.cap = "Genomic labels."}
chunks <- knitr:::knit_code$get()
par(mfrow = c(1, 2))
eval(parse(text = chunks[["genomic-labels1"]]))
eval(parse(text = chunks[["genomic-labels2"]]))
```
## Genomic axes
The genomic axes are not really high-level graphics, but it is better to also introduce
here. For `circos.initializeWithIdeogram()`, by default it draws axes with tick labels properly
formatted. The axes are internally implemented by `circos.genomicAxis()` and it can be
used to add genomic axes at any track (Figure \@ref(fig:genomic-axis)).
```{r genomic-axis, fig.cap = "Add genomic axes."}
circos.initializeWithIdeogram(plotType = NULL)
circos.genomicIdeogram()
# still work on the ideogram track
circos.track(track.index = get.current.track.index(), panel.fun = function(x, y) {
circos.genomicAxis(h = "top")
})
circos.track(ylim = c(0, 1), track.height = 0.1)
circos.track(track.index = get.current.track.index(), panel.fun = function(x, y) {
circos.genomicAxis(h = "bottom", direction = "inside")
})
circos.clear()
```
## Genomic density and Rainfall plot
Rainfall plots are used to visualize the distribution of genomic regions in the genome.
Rainfall plots are particularly useful to identify clusters of regions. In the
rainfall plot, each dot represents a region. The x-axis corresponds to the
genomic coordinate, and the y-axis corresponds to the minimal distance (log10
transformed) of the region to its two neighbouring regions. A cluster of regions will
appear as a “rainfall” in the plot.
`circos.genomicRainfall()` calculates neighbouring distance for each region
and draw as points on the plot. Since `circos.genomicRainfall()` generates data on
y-direction (`log10(distance)`), it is actually a high-level function which
creates a new track.
The input data can be a single data frame or a list of data frames.
```{r, eval = FALSE}
circos.genoimcRainfall(bed)
circos.genoimcRainfall(bed_list, col = c("red", "green"))
```
However, if the amount of regions in a cluster is high, dots will overlap, and
direct assessment of the number and density of regions in the cluster will be
impossible. To overcome this limitation, additional tracks are added which
visualize the genomic density of regions (defined as the fraction of a genomic
window that is covered by genomic regions).
`circos.genomicDensity()` calculates how much a genomic window is covered by
regions in `bed`. It is also a high-level function and creates a new track.
The input data can be a single data frame or a list of data frames.
```{r, eval = FALSE}
circos.genomicDensity(bed)
circos.genomicDensity(bed, baseline = 0)
circos.genomicDensity(bed, window.size = 1e6)
circos.genomicDensity(bedlist, col = c("#FF000080", "#0000FF80"))
```
Following example makes a rainfall plot for differentially methylated regions
(DMR) and their genomic densities. In the plot, red corresponds to hyper-methylated
DMRs (gain of methylation) and blue corresponds to hypo-methylated
DMRs (loss of methylation). You may see how the combination of rainfall track
and genomic density track helps to get a more precise inference of the
distribution of DMRs on genome (Figure \@ref(fig:genomic-rainfall)).
```{r genomic-rainfall, fig.width = 6, fig.height = 6, fig.cap = "Genomic rainfall plot and densities."}
load(system.file(package = "circlize", "extdata", "DMR.RData"))
circos.initializeWithIdeogram(chromosome.index = paste0("chr", 1:22))
bed_list = list(DMR_hyper, DMR_hypo)
circos.genomicRainfall(bed_list, pch = 16, cex = 0.4, col = c("#FF000080", "#0000FF80"))
circos.genomicDensity(DMR_hyper, col = c("#FF000080"), track.height = 0.1)
circos.genomicDensity(DMR_hypo, col = c("#0000FF80"), track.height = 0.1)
circos.clear()
```
`circos.genomicDensity()` also supports to calculate the overlap as the **number** of the regions
that overlap to each window by setting `count_by = "number"`.
```{r genomic-rainfall2, fig.width = 6, fig.height = 6, fig.cap = "Genomic densities."}
circos.initializeWithIdeogram(chromosome.index = paste0("chr", 1:22))
circos.genomicDensity(DMR_hyper, col = c("#FF000080"), track.height = 0.1)
circos.genomicDensity(DMR_hyper, col = c("#FF000080"), count_by = "number", track.height = 0.1)
circos.clear()
```
Internally, `rainfallTransform()` and `genomicDensity()` are used to the neighbrouing distance
and the genomic density values.
```{r}
head(rainfallTransform(DMR_hyper))
head(genomicDensity(DMR_hyper, window.size = 1e6))
```