# (PART) General Functionality {-}
# Introduction {#introduction}
Circular layout is very useful for representing complicated information. First, it
elegantly represents information with long axes or a large number of
categories; second, it intuitively shows data with multiple tracks focusing on
the same object; third, it easily demonstrates relations between elements. It
provides an efficient way to arrange information on the circle and it is
beautiful.
[**Circos**](http://circos.ca) is a pioneering tool, implemented in _Perl_, that
is widely used for circular layout representations. It greatly enhances the
visualization of scientific results (especially in the field of genomics). Thus,
plots with circular layout are commonly called **"circos plots"**. The
**circlize** package aims to implement **Circos** in R. One important
advantage of the implementation in R is that R is an ideal environment which
provides seamless connection between data analysis and data visualization.
**circlize** is not a front-end wrapper to generate configuration files for
**Circos**, but is written entirely in R, using R's elegant statistical
and graphic engine. We aim to keep the flexibility and configurability of
**Circos**, but also make the package more straightforward to use and enhance
it to support more types of graphics.
In this book, chapters in Part I give detailed overviews of the general **circlize**
functionalities. Part II introduces functions specifically designed for
visualizing genomic datasets. Part III gives comprehensive guides on
visualizing relationships by Chord diagram.
## Principle of design
A circular layout is composed of _sectors_ and _tracks_. Data in different
categories are allocated to different sectors, and multiple
measurements on the same category are represented as stacked tracks from the
outside of the circle to the inside. The intersection of a sector and a track
is called a _cell_ (or a grid, a panel), which is the basic unit in a circular
layout. It is an imaginary plotting region for drawing data points.
Since most figures are composed of simple graphics, such as points,
lines and polygons, **circlize** implements low-level graphic functions for adding
graphics in the circular plotting regions, so that more complicated graphics
can be easily generated by different combinations of low-level graphic
functions. This principle ensures generality: the types of high-level
graphics are not restricted by the software itself, and high-level packages
focusing on specific interests can be built on it.
Currently, the following low-level graphic functions can be used for
adding graphics. Their usage is very similar to that of the functions without the
`circos.` prefix in the base graphic engine, except for some enhancements
specifically designed for circular visualization.
- `circos.points()`: adds points in a cell.
- `circos.lines()`: adds lines in a cell.
- `circos.segments()`: adds segments in a cell.
- `circos.rect()`: adds rectangles in a cell.
- `circos.polygon()`: adds polygons in a cell.
- `circos.text()`: adds text in a cell.
- `circos.axis()` and `circos.yaxis()`: add axes in a cell.
The following function draws links between two positions in the circle:
- `circos.link()`
The following functions draw high-level graphics:
- `circos.barplot()`: draws barplots.
- `circos.boxplot()`: draws boxplots.
- `circos.violin()`: draws violin plots.
- `circos.heatmap()`: draws circular heatmaps.
- `circos.raster()`: draws raster images.
- `circos.arrow()`: draws circular arrows.
The following functions arrange the circular layout:
- `circos.initialize()`: allocates sectors on the circle.
- `circos.track()`: creates plotting regions for cells in one single track.
- `circos.update()`: updates an existing cell.
- `circos.par()`: graphic parameters.
- `circos.info()`: prints general parameters of current circular plot.
- `circos.clear()`: resets graphic parameters and internal variables.
Thus, theoretically, you are able to draw most kinds of circular figures with
the functions above. Figure \@ref(fig:circlize-example) lists several
complex circular plots made by **circlize**. After going through this
book, you will definitely be able to implement your own.
```{r circlize-example, echo = FALSE, fig.cap = "Examples by circlize", out.width = "100%"}
knitr::include_graphics("images/ciclize_examples.jpg")
```
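As a rough orientation before the walk-through, the layout functions listed above are usually called in a fixed order: initialize, create a track, add graphics, clear. The following is only a minimal sketch; the three sector names, the x-range of 0 to 10 and the plotted values are made up for illustration.

```{r, eval = FALSE}
library(circlize)

# Step 1: allocate three hypothetical sectors, each with x-range [0, 10]
circos.initialize(c("a", "b", "c"), xlim = c(0, 10))

# Step 2: create one track; panel.fun is run once for every cell
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
    # Step 3: low-level functions draw into the current cell
    circos.points(5, 0.5, pch = 16)
    circos.text(5, 0.8, CELL_META$sector.index)
})

# Keep the sector names for inspection before the layout is reset
sector_names = get.all.sector.index()

# Step 4: reset graphic parameters and internal variables
circos.clear()
```

Each of these steps is explained in detail in the rest of this chapter.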
## A quick glance {#a-qiuck-glance}
Before we go too deep into the details, I first demonstrate a simple example
that uses basic functionality in the **circlize** package to help you get
a basic idea of how the package works.
First let's generate some random data. We need a character vector to
represent categories, a numeric vector of x values and a numeric vector of y values.
```{r}
set.seed(999)
n = 1000
df = data.frame(sectors = sample(letters[1:8], n, replace = TRUE),
    x = rnorm(n), y = runif(n))
```
First, we initialize the circular layout. The circle is split into sectors
based on the data range on the x-axis in each category. In the following code, `df$x`
is split by `df$sectors` and the widths of sectors are automatically calculated
based on the data ranges in each category. By default, sectors are positioned
starting from $\theta = 0$ (in the polar coordinate system) and go clockwise
around the circle. You will not see anything after running the following code
because no track has been added yet.
```{r circlize_glance_0, eval = FALSE, echo = -3}
library(circlize)
circos.par("track.height" = 0.1)
circos.par("points.overflow.warning" = FALSE)
circos.initialize(df$sectors, x = df$x)
```
We set the global parameter `track.height` to 0.1 with the option function
`circos.par()` so that all tracks added afterwards have a default height of
0.1. The circle used by **circlize** always has a radius of 1, thus a height of
0.1 means 10% of the circle radius. In later chapters, you will learn how to set the
height with physical units, e.g. cm.
Note that the sector placement only needs values in the x direction (the circular
direction); values in the y direction (the radial direction) are used
in the step of creating tracks.
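To get a feeling for why sectors end up with different widths, the per-category x-ranges can be inspected with base R. This sketch only mirrors the idea described above; it is not how **circlize** computes positions internally, and it ignores the gaps between sectors. The names `x_range`, `span` and `rel_width` are chosen here for illustration.

```{r, eval = FALSE}
# Same simulated data as in the example above
set.seed(999)
n = 1000
df = data.frame(sectors = sample(letters[1:8], n, replace = TRUE),
    x = rnorm(n), y = runif(n))

# Range of x within each category (one c(min, max) pair per sector)
x_range = tapply(df$x, df$sectors, range)

# Relative share of each sector if widths were proportional to the x-range
span = sapply(x_range, diff)
rel_width = span / sum(span)
round(rel_width, 3)
```

Only `df$x` and `df$sectors` enter this calculation, which matches the point that y values play no role in placing the sectors.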
After the circular layout is initialized, graphics can be added to the plot in
a track-by-track manner. Before drawing anything, note that every
track must first be created by `circos.trackPlotRegion()` or, for short,
`circos.track()`; low-level functions can then add graphics to it. This is just
like the base R graphic engine, where you first call `plot()` and then
use functions such as `points()` and `lines()` to add graphics. Since x-ranges
for cells in the track have already been defined in the initialization step,
here we only need to specify the y-range for each cell. The y-ranges can be
specified by the `y` argument as a numeric vector (so that the y-range is
automatically extracted and calculated in each cell) or by the `ylim` argument as a
vector of length two. In principle, y-ranges should be the same for all cells in
the same track (see Figure \@ref(fig:circlize-glance-track-1)).
```{r circlize_glance_1, eval=FALSE}
circos.track(df$sectors, y = df$y,
    panel.fun = function(x, y) {
        circos.text(CELL_META$xcenter,
            CELL_META$cell.ylim[2] + mm_y(5),
            CELL_META$sector.index)
        circos.axis(labels.cex = 0.6)
})
col = rep(c("#FF0000", "#00FF00"), 4)
circos.trackPoints(df$sectors, df$x, df$y, col = col, pch = 16, cex = 0.5)
circos.text(-1, 0.5, "text", sector.index = "a", track.index = 1)
```
```{r circlize-glance-track-1, echo = FALSE, fig.cap = "First example of circlize, add the first track."}
chunks <- knitr:::knit_code$get()
eval(parse(text = chunks[["circlize_glance_0"]]))
eval(parse(text = chunks[["circlize_glance_1"]]))
circos.clear()
```
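The analogy with the base graphic engine can be made concrete. The sketch below sets up an empty base R plot and then adds graphics to it, followed by the same two-step pattern in **circlize**, using `ylim` instead of `y` to define the y-range. The data (`px`, `py`) and the four sector names are invented for this illustration.

```{r, eval = FALSE}
library(circlize)

# Made-up data on the unit square
px = seq(0, 1, length.out = 20)
py = px^2

# Base R: plot() creates the plotting region, points()/lines() add to it
plot(px, py, type = "n")
points(px, py, pch = 16)
lines(px, py)

# circlize: circos.track() creates the cells, low-level functions add to them
circos.initialize(letters[1:4], xlim = c(0, 1))
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
    # No data was passed to circos.track(), so px and py are used directly
    circos.points(px, py, pch = 16, cex = 0.5)
    circos.lines(px, py)
})
circos.clear()
```

With `ylim = c(0, 1)` every cell in the track gets the same y-range, regardless of the data drawn in it.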
Axes of the circular plot are normally drawn on the outermost side of the
track. Here we add axes in the first track by putting `circos.axis()` inside
the self-defined function `panel.fun` (see the code above). `circos.track()`
creates plotting regions in a cell-by-cell manner and `panel.fun` is
executed immediately after the plotting region for a certain cell is
created. Thus, `panel.fun` actually means adding graphics in the "current
cell" (usage of `panel.fun` is further discussed in Section \@ref(panel-fun)).
By default, `circos.axis()` draws x-axes on top of each cell (that is, on the
outside of each cell).
Sector names outside the first track are added using `circos.text()`.
`CELL_META` provides "meta information" for the current cell. There are
several parameters which can be retrieved from `CELL_META`. Its usage is
explained in Section \@ref(panel-fun). In the above code, the sector names are
drawn outside the cells. You may see warning messages saying that data points
exceed the plotting regions. That is expected here and nothing to worry about.
You can also add sector names by creating an empty track without borders as the
first track and adding sector names in it (as
`circos.initializeWithIdeogram()` and `chordDiagram()` do, explained in the
following chapters).
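The alternative of a dedicated, borderless label track might look like the following sketch. The four sector names, the track height of 0.05 and the second empty track are assumptions made only for this example.

```{r, eval = FALSE}
library(circlize)

circos.initialize(letters[1:4], xlim = c(0, 1))

# First track: no border, used only to hold the sector names
circos.track(ylim = c(0, 1), bg.border = NA, track.height = 0.05,
    panel.fun = function(x, y) {
        circos.text(CELL_META$xcenter, CELL_META$ycenter,
            CELL_META$sector.index)
})

# Second track: an ordinary track for the data
circos.track(ylim = c(0, 1))

n_tracks = length(get.all.track.index())
circos.clear()
```

Because the names are drawn inside a cell here, no text falls outside a plotting region.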
When specifying the position of text in the y direction, an offset of
`mm_y(5)` (5mm) is added to the y position of the text. In `circos.text()`, x and y
values are measured in the data coordinate (the coordinate in the cell), and there
are helper functions that convert absolute units to the corresponding values
in the data coordinate. Section
\@ref(convert-functions) provides more information on converting units between
different coordinates.
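A small sketch of the conversion, assuming a single hypothetical sector `"a"` with one track: `mm_y()` is evaluated for a specific cell, so the sector and track are passed explicitly here.

```{r, eval = FALSE}
library(circlize)

# Text is placed outside the cell on purpose, so silence the overflow notice
circos.par("points.overflow.warning" = FALSE)

# A single sector and track, just to have a cell to measure in
circos.initialize("a", xlim = c(0, 10))
circos.track(ylim = c(0, 1))

# Length in y data units that corresponds to 5 mm in this cell
offset = mm_y(5, sector.index = "a", track.index = 1)

# Place text 5 mm above the top of the cell (top of the cell is y = 1)
circos.text(5, 1 + offset, "5 mm above", sector.index = "a", track.index = 1)

circos.clear()
```

The numeric value of `offset` depends on the track height and the size of the graphics device, which is why it is computed rather than hard-coded.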
After the track is created, points are added to the first track by
`circos.trackPoints()`. `circos.trackPoints()` simply adds points in all cells
simultaneously. As further explained in Section \@ref(points), it can be
replaced by putting `circos.points()` in `panel.fun`; however,
`circos.trackPoints()` is more convenient if you only need points
in the cells (but I don't really recommend it). The
function needs a categorical variable (`df$sectors`) and values in the x
and y directions (`df$x` and `df$y`).
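The `panel.fun` alternative to `circos.trackPoints()` might look like the following sketch, which calls `circos.points()` once per cell. The simulated data and the counter `n_drawn` are assumptions added for illustration; the counter only confirms that every data point reaches some cell.

```{r, eval = FALSE}
library(circlize)

set.seed(1)
df = data.frame(sectors = sample(letters[1:4], 200, replace = TRUE),
    x = rnorm(200), y = runif(200))

n_drawn = 0
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, x = df$x, y = df$y,
    panel.fun = function(x, y) {
        # x and y hold only the values of the current cell
        circos.points(x, y, pch = 16, cex = 0.5)
        n_drawn <<- n_drawn + length(x)
})
circos.clear()
```

Creating the track and adding the points happen in one call here, whereas `circos.trackPoints()` needs the track to exist already.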
Low-level functions such as `circos.text()` can also be used outside
`panel.fun` as shown in the above code. If so, `sector.index` and `track.index`
need to be specified explicitly, because the "current" sector and "current"
track may not be what you want. If the graphics are added directly to the most recently created
track, `track.index` can be omitted, because
this track is the one marked as the "current" track.
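The rule about the "current" track can be tried out with a small sketch. The four sectors and two empty tracks are hypothetical and serve only to give the low-level functions something to target.

```{r, eval = FALSE}
library(circlize)

# Four sectors and two empty tracks
circos.initialize(letters[1:4], xlim = c(0, 1))
circos.track(ylim = c(0, 1))
circos.track(ylim = c(0, 1))

# Track 1 is no longer the current track, so both indices are given
circos.text(0.5, 0.5, "track 1", sector.index = "b", track.index = 1)

# Track 2 was created last and is the current track: track.index is omitted
circos.points(0.5, 0.5, pch = 16, sector.index = "c")
current_track = get.current.track.index()

circos.clear()
```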
OK, now we add histograms to the second track. Here `circos.trackHist()` is a
high-level function, which means it creates a new track (as you can imagine, `hist()`
is also a high-level function). `bin.size` is explicitly set so that the bin
size for histograms in all cells is the same and the histograms can be compared
to each other (see Figure \@ref(fig:circlize-glance-track-2)).
```{r circlize_glance_2, eval=FALSE}
bgcol = rep(c("#EFEFEF", "#CCCCCC"), 4)
circos.trackHist(df$sectors, df$x, bin.size = 0.2, bg.col = bgcol, col = NA)
```
```{r circlize-glance-track-2, echo = FALSE, fig.cap = "First example of circlize, add the second track."}
eval(parse(text = chunks[["circlize_glance_0"]]))
eval(parse(text = chunks[["circlize_glance_1"]]))
eval(parse(text = chunks[["circlize_glance_2"]]))
circos.clear()
```
In the third track, inside `panel.fun`, we randomly pick 10 data points in
each cell, sort them by x-values and connect them with lines. In the following
code, when the `sectors` (the first unnamed argument), `x` and `y` arguments are set in
`circos.track()`, x values and y values are split by `df$sectors` and the
corresponding subsets of x and y values are sent to `panel.fun` through
`panel.fun`'s `x` and `y` arguments. Thus, `x` and `y` in `panel.fun` are
exactly the values in the "current" cell (see Figure
\@ref(fig:circlize-glance-track-3)).
```{r circlize_glance_3, eval=FALSE}
circos.track(df$sectors, x = df$x, y = df$y,
    panel.fun = function(x, y) {
        ind = sample(length(x), 10)
        x2 = x[ind]
        y2 = y[ind]
        od = order(x2)
        circos.lines(x2[od], y2[od])
})
```
```{r circlize-glance-track-3, echo = FALSE, fig.cap = "First example of circlize, add the third track."}
eval(parse(text = chunks[["circlize_glance_0"]]))
eval(parse(text = chunks[["circlize_glance_1"]]))
eval(parse(text = chunks[["circlize_glance_2"]]))
eval(parse(text = chunks[["circlize_glance_3"]]))
circos.clear()
```
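Conceptually, the way `x` and `y` reach `panel.fun` resembles splitting the data by category and calling a function on each piece. The base R sketch below imitates that idea with `split()` and `Map()`; it is an analogy only, not the internal implementation of `circos.track()`, and `pick_cell` is a hypothetical stand-in for `panel.fun`.

```{r, eval = FALSE}
# Same simulated data as in the example above
set.seed(999)
n = 1000
df = data.frame(sectors = sample(letters[1:8], n, replace = TRUE),
    x = rnorm(n), y = runif(n))

# Split x and y by category: one subset per sector
xs = split(df$x, df$sectors)
ys = split(df$y, df$sectors)

# Stand-in for panel.fun: receives the values of one cell at a time,
# picks 10 random points and sorts them by x
pick_cell = function(x, y) {
    ind = sample(length(x), 10)
    od = order(x[ind])
    data.frame(x = x[ind][od], y = y[ind][od])
}

picked = Map(pick_cell, xs, ys)
sapply(picked, nrow)
```

Sorting by x before drawing matters because `circos.lines()` connects points in the order they are given.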
Now we go back to the second track and update the cell in sector "d".
This is done by `circos.updatePlotRegion()` or the short version
`circos.update()`. The function erases the current graphics in the cell.
`circos.update()` cannot modify the `xlim` and `ylim` of the cell, nor
other settings related to the position of the cell. The sector index and track
index need to be specified explicitly in `circos.update()` unless the "current"
cell is the one you want to update. After calling `circos.update()`,
the "current" cell is redirected to the cell you just specified and you
can use low-level graphic functions to add graphics directly into it
(see Figure \@ref(fig:circlize-glance-track-update)).
```{r circlize_glance_3_update, eval=FALSE}
circos.update(sector.index = "d", track.index = 2,
bg.col = "#FF8080", bg.border = "black")
circos.points(x = -2:2, y = rep(0.5, 5), col = "white")
circos.text(CELL_META$xcenter, CELL_META$ycenter, "updated", col = "white")
```
```{r circlize-glance-track-update, echo = FALSE, fig.cap = "First example of circlize, update the second track."}
eval(parse(text = chunks[["circlize_glance_0"]]))
eval(parse(text = chunks[["circlize_glance_1"]]))
eval(parse(text = chunks[["circlize_glance_2"]]))
eval(parse(text = chunks[["circlize_glance_3"]]))
eval(parse(text = chunks[["circlize_glance_3_update"]]))
circos.clear()
```
Next we continue to create new tracks. Although we have gone back to the
second track, the newly created track is still added as
the innermost track. In this new track, we add heatmaps by
`circos.rect()`. Note that here we do not set any input data, only
the `ylim` argument, because heatmaps just fill the whole cell from left
to right and from bottom to top. Also, the exact value of `ylim` is not
important and `x`, `y` in `panel.fun()` are not used (in fact, they are both
`NULL`) (see Figure \@ref(fig:circlize-glance-track-4)).
```{r circlize_glance_4, eval=FALSE}
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
    xlim = CELL_META$xlim
    ylim = CELL_META$ylim
    breaks = seq(xlim[1], xlim[2], by = 0.1)
    n_breaks = length(breaks)
    circos.rect(breaks[-n_breaks], rep(ylim[1], n_breaks - 1),
                breaks[-1], rep(ylim[2], n_breaks - 1),
                col = rand_color(n_breaks), border = NA)
})
```
```{r circlize-glance-track-4, echo = FALSE, fig.cap = "First example of circlize, add the fourth track."}
eval(parse(text = chunks[["circlize_glance_0"]]))
eval(parse(text = chunks[["circlize_glance_1"]]))
eval(parse(text = chunks[["circlize_glance_2"]]))
eval(parse(text = chunks[["circlize_glance_3"]]))
eval(parse(text = chunks[["circlize_glance_3_update"]]))
eval(parse(text = chunks[["circlize_glance_4"]]))
circos.clear()
```
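The indexing used to build the heatmap rectangles is easier to follow on a small example. In this base R sketch the x-range of -1 to 1 and the step of 0.5 are made-up values: dropping the last break gives the left edges and dropping the first break gives the right edges.

```{r, eval = FALSE}
# Hypothetical x-range of one cell
xlim = c(-1, 1)
breaks = seq(xlim[1], xlim[2], by = 0.5)
n_breaks = length(breaks)

left = breaks[-n_breaks]   # left edge of every rectangle
right = breaks[-1]         # right edge of every rectangle
data.frame(left, right)
```

A sequence of `n_breaks` breaks therefore defines `n_breaks - 1` rectangles. If the x-range is not a multiple of the step, `seq()` stops short of `xlim[2]` and the last strip of the cell stays unfilled.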
Inside the circle, links or ribbons are added. Links can go
from point to point, point to interval, or interval to interval. Section \@ref(links)
gives detailed usage of links (see Figure \@ref(fig:circlize-glance-track-links)).
```{r circlize_glance_5, eval=FALSE}
circos.link("a", 0, "b", 0, h = 0.4)
circos.link("c", c(-0.5, 0.5), "d", c(-0.5,0.5), col = "red",
border = "blue", h = 0.2)
circos.link("e", 0, "g", c(-1,1), col = "green", border = "black", lwd = 2, lty = 2)
```
```{r circlize-glance-track-links, echo = FALSE, fig.cap = "First example of circlize, add links."}
eval(parse(text = chunks[["circlize_glance_0"]]))
eval(parse(text = chunks[["circlize_glance_1"]]))
eval(parse(text = chunks[["circlize_glance_2"]]))
eval(parse(text = chunks[["circlize_glance_3"]]))
eval(parse(text = chunks[["circlize_glance_3_update"]]))
eval(parse(text = chunks[["circlize_glance_4"]]))
eval(parse(text = chunks[["circlize_glance_5"]]))
circos.clear()
```
Finally, we need to reset the graphic parameters and internal variables so
that they do not mess up your next plot.
```{r, eval = FALSE}
circos.clear()
```
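The effect of the reset can be checked by querying a parameter before and after `circos.clear()`. This is a minimal sketch with a hypothetical two-sector plot; `before` and `after` are names introduced here.

```{r, eval = FALSE}
library(circlize)

# Change a global parameter, then draw a small plot
circos.par("track.height" = 0.1)
circos.initialize(letters[1:2], xlim = c(0, 1))
circos.track(ylim = c(0, 1))

before = circos.par("track.height")   # the value set above
circos.clear()
after = circos.par("track.height")    # back to the package default

c(before = before, after = after)
```

Without the call to `circos.clear()`, the changed `track.height` would silently carry over into the next plot.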