forked from hadley/ggplot2-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscales-colour.Rmd
728 lines (533 loc) · 37.1 KB
/
scales-colour.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
```{r setup, include = FALSE}
source("common.R")
columns(1, 2 / 3)
```
# Colour scales and legends {#scale-colour}
After position, the most commonly used aesthetics are those based on colour, and there are many ways to map values to colours in ggplot2. Because colour is complex, the chapter starts with a discussion of colour theory (Section \@ref(colour-theory)) with special reference to colour blindness (Section \@ref(colour-blindness)). Mirroring the structure in the previous chapters, the next three sections are dedicated to continuous colour scales (Section \@ref(colour-continuous)), discrete colour scales (Section \@ref(colour-discrete)), and binned colour scales (Section \@ref(binned-colour)). The chapter concludes by discussing date/time colour scales (Section \@ref(date-colour-scales)), transparency scales (Section \@ref(scales-alpha)), and the mechanics of legend positioning (Section \@ref(legend-layout)).
\index{Scales!colour}
## A little colour theory {#colour-theory}
<!-- ...as a treat -->
Before we look at the details, it's useful to learn a little bit of colour theory. Colour theory is complex because the underlying biology of the eye and brain is complex, and this introduction will only touch on some of the more important issues. An excellent and more detailed exposition is available online at <http://tinyurl.com/clrdtls>. \index{Colour}
At the physical level, colour is produced by a mixture of wavelengths of light. To characterise a colour completely, we need to know the complete mixture of wavelengths. Fortunately for us the human eye only has three different colour receptors, and so we can summarise the perception of any colour with just three numbers. You may be familiar with the RGB encoding of colour space, which defines a colour by the intensities of red, green and blue light needed to produce it. One problem with this space is that it is not perceptually uniform: the two colours that are one unit apart may look similar or very different depending on where they are in the colour space. This makes it difficult to create a mapping from a continuous variable to a set of colours. There have been many attempts to come up with colours spaces that are more perceptually uniform. We'll use a modern attempt called the HCL colour space, which has three components of **h**ue, **c**hroma and **l**uminance: \index{Colour!spaces}
* **Hue** ranges from 0 to 360 (an angle) and gives the "colour" of the colour (blue, red, orange, etc).
* **Chroma** is the "purity" of a colour, ranging from 0 (grey) to a maximum that varies with luminance.
* **Luminance** is the lightness of the colour, ranging from 0 (black) to 1 (white).
The three dimensions have different properties. Hues are arranged around a colour wheel and are not perceived as ordered: e.g. green does not seem "larger" than red, and blue does not seem to be "in between" green or red. In contrast, both chroma and luminance are perceived as ordered: pink is perceived as lying between red and white, and grey is seen to fall between black and white.
The combination of these three components does not produce a simple geometric shape. Figure \@ref(fig:hcl) attempts to show the 3d shape of the space. Each slice is a constant luminance (brightness) with hue mapped to angle and chroma to radius. You can see the centre of each slice is grey and the colours get more intense as they get closer to the edge.
`r columns(1, 1, 1)`
```{r hcl, echo = FALSE, out.width = "100%", fig.cap="The shape of the HCL colour space. Hue is mapped to angle, chroma to radius and each slice shows a different luminance. The HCL space is a pretty odd shape, but you can see that colours near the centre of each slice are grey, and as you move towards the edges they become more intense. Slices for luminance 0 and 100 are omitted because they would, respectively, be a single black point and a single white point."}
knitr::include_graphics("diagrams/hcl-space.png", dpi = 300)
```
### Colour blindness {#colour-blindness}
An additional complication is that a sizeable minority of people do not possess the usual complement of colour receptors and so can distinguish fewer colours than others. \index{Colour!blindness} Because of this, it is important to consider how a colour palette will look to people with common forms of colour blindness. A simple heuristic is to avoid red-green contrasts, and to check your plots with systems that simulate colour blindness. In addition to the many online tools that can assist with this (e.g., https://www.vischeck.com/), there are several R packages provide tools you may find helpful. The dichromat package [@dichromat] provides tools for simulating colour blindness, and a set of colour schemes known to work well for colour-blind people. Another useful tool is the colorBlindness package [@colorBlindness] which provides a `displayAllColors()` function that helps you approximate the appearance of a given set of colours under different forms of colour blindness. As an illustration, it quickly reveals that colours provided by the `rainbow()` palette are not appropriate if you are trying to create plots that are readable by colour blind people, nor do they reproduce well in greyscale:
`r columns(1, 2/3, 1)`
```{r, warning=FALSE}
colorBlindness::displayAllColors(rainbow(6))
```
By way of contrast, colours provided by `viridis::viridis()` are discriminable under the most common forms of colour blindness, and reproduce well in greyscale:
```{r, warning=FALSE}
colorBlindness::displayAllColors(viridis::viridis(6))
```
In addition to the viridis package there are other R packages that provide palettes that are explicitly colour blind safe, and you'll see these in use throughout the chapter. Finally, you can also help people with colour blindness in the same way that you can help people with black-and-white printers: by providing redundant mappings to other aesthetics like size, line type or shape.
## Continuous colour scales {#colour-continuous}
Colour gradients are often used to show the height of a 2d surface. The plots in this section use the surface of a 2d density estimate of the `faithful` dataset [@azzalini:1990], which records the waiting time between eruptions and during each eruption for the Old Faithful geyser in Yellowstone Park. I hide the legends and set `expand` to 0, to focus on the appearance of the data. Remember: although I use the `erupt` plot to illustrate concepts using with a fill aesthetic, the same ideas apply to colour scales. Any time I refer to `scale_fill_*()` in this section there is a corresponding `scale_colour_*()` for the colour aesthetic (or `scale_color_*()` if you prefer US spelling).
\index{Colour!gradients} \index{Scales!colour}
`r columns(3, 1)`
```{r}
erupt <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
geom_raster() +
scale_x_continuous(NULL, expand = c(0, 0)) +
scale_y_continuous(NULL, expand = c(0, 0)) +
theme(legend.position = "none")
```
### Particular palettes {#particular-palettes}
There are multiple ways to specify continuous colour scales. Later I'll talk about general purpose tools that you can use to construct your own palette, but this is often unnecessary as there are many "hand picked" palettes available. For example, ggplot2 supplies two scale functions that bundle pre-specified palettes, `scale_fill_viridis_c()` and `scale_fill_distiller()`. The viridis scales [@viridis] are designed to be perceptually uniform in both colour and when reduced to black and white, and to be perceptible to people with various forms of colour blindness.
```{r}
erupt
erupt + scale_fill_viridis_c()
erupt + scale_fill_viridis_c(option = "magma")
```
For most use cases, the viridis scales will work better than other continuous scales built into ggplot2, but there are other options that are useful in some situations. A second group of continuous colour scales built in to ggplot2 are derived from the ColorBrewer scales: `scale_fill_brewer()` provides these colours as discrete palettes, while `scale_fill_distiller()` and `scale_fill_fermenter()` are the continuous and binned analogs. I discuss these scales in Section \@ref(colour-discrete), but for illustrative purposes include some examples here:
```{r}
erupt + scale_fill_distiller()
erupt + scale_fill_distiller(palette = "RdPu")
erupt + scale_fill_distiller(palette = "YlOrBr")
```
There are many other packages that provide useful colour palettes. For example, scico [@scico] provides more palettes that are perceptually uniform and suitable for scientific visualisation:
```{r}
erupt + scico::scale_fill_scico(palette = "bilbao") # the default
erupt + scico::scale_fill_scico(palette = "vik")
erupt + scico::scale_fill_scico(palette = "lajolla")
```
However, as there are a great many palette packages in R, a particularly useful package is paletteer [@paletteer], which aims to provide a common interface:
```{r}
erupt + paletteer::scale_fill_paletteer_c("viridis::plasma")
erupt + paletteer::scale_fill_paletteer_c("scico::tokyo")
```
### Robust recipes {#robust-recipes}
The default scale for continuous fill scales is `scale_fill_continuous()` which in turn defaults to `scale_fill_gradient()`. As a consequence, these three commands produce the same plot using a gradient scale:
```{r}
erupt
erupt + scale_fill_continuous()
erupt + scale_fill_gradient()
```
Gradient scales provide a robust method for creating any colour scheme you like. All you need to do is specify two or more reference colours, and ggplot2 will interpolate linearly between them. There are three functions that you can use for this purpose:
\indexf{scale\_colour\_gradient} \indexf{scale\_fill\_gradient} \indexf{scale\_colour\_gradient2} \indexf{scale\_fill\_gradient2}
- `scale_fill_gradient()` produces a two-colour gradient
- `scale_fill_gradient2()` produces a three-colour gradient with specified midpoint
- `scale_fill_gradientn()` produces an n-colour gradient
The use of gradient scales is illustrated below. The first plot uses a scale that linearly interpolates from grey (hex code: `"#bebebe"`) at the `low` end of the scale limits to brown (`"#a52a2a"`) at the `high` end. The second plot has the same endpoints but uses `scale_fill_gradient2()` to interpolate first from grey to white (`#ffffff`) and then from white to brown. Note that the `mid` argument specifies the colour to be shown at the intermediate point, and `midpoint` is the value in the data at which this colour is used (the default is `midpoint = 0`). The third method is to use `scale_fill_gradientn()` which takes a vector of reference `colours` as its argument, and constructs a scale that linearly interpolates between the specified values. By default, the `colours` are presumed to be equally spaced along the scale, but if you prefer you can specify a vector of `values` that correspond to each of the reference colours.
```{r}
erupt + scale_fill_gradient(low = "grey", high = "brown")
erupt +
scale_fill_gradient2(
low = "grey",
mid = "white",
high = "brown",
midpoint = .02
)
erupt + scale_fill_gradientn(colours = terrain.colors(7))
```
Creating good colour palettes requires some care. Generally, for a two-point gradient scale you want to convey the perceptual impression that the values are sequentially ordered, so you want to keep hue constant, and vary chroma and luminance. The Munsell colour system is useful for this as it provides an easy way of specifying colours based on their hue, chroma and luminance. The munsell package [@munsell] provides easy access to the Munsell colours, which can then be used to specify a gradient scale:
`r columns(2, 1)`
```{r}
munsell::hue_slice("5P") + # generate a ggplot with hue_slice()
annotate( # add arrows for annotation
geom = "segment",
x = c(7, 7),
y = c(1, 10),
xend = c(7, 7),
yend = c(2, 9),
arrow = arrow(length = unit(2, "mm"))
)
# construct scale
erupt + scale_fill_gradient(
low = munsell::mnsl("5P 2/12"),
high = munsell::mnsl("5P 7/12")
)
```
The labels on the left plot are a little difficult to read at this scale, so I have used `annotate()` to add arrows highlighting the column used to construct the scale on the right. For more information on the munsell package see <https://github.com/cwickham/munsell/>.
Three-point gradient scales have slightly different design criteria. Typically the goal in such a scale is to convey the perceptual impression that there is a natural midpoint (often a zero value) from which the other values diverge. The left plot below shows how to create a divergent "yellow/blue" scale, though it is a little artificial in this example.
Finally, if you have colours that are meaningful for your data (e.g., black body colours or standard terrain colours), or you'd like to use a palette produced by another package, you may wish to use an n-point gradient. As an illustration, the middle and right plots below use the **colorspace** package [@zeileis:2008]. For more information on the colorspace package see <https://colorspace.r-forge.r-project.org/>.
\index{Colour!palettes} \indexf{scale\_colour\_gradientn} \indexf{scale\_fill\_gradientn}
`r columns(3, 1)`
```{r}
# munsell example
erupt + scale_fill_gradient2(
low = munsell::mnsl("5B 7/8"),
high = munsell::mnsl("5Y 7/8"),
mid = munsell::mnsl("N 7/0"),
midpoint = .02
)
# colorspace examples
erupt + scale_fill_gradientn(colours = colorspace::heat_hcl(7))
erupt + scale_fill_gradientn(colours = colorspace::diverge_hcl(7))
```
### Missing values
All continuous colour scales have an `na.value` parameter that controls what colour is used for missing values (including values outside the range of the scale limits). By default it is set to grey, which will stand out when you use a colourful scale. If you use a black and white scale, you might want to set it to something else to make it more obvious. You can set `na.value = NA` to make missing values invisible, or choose a specific colour if you prefer: \indexc{na.value} \index{Missing values!changing colour}
```{r}
df <- data.frame(x = 1, y = 1:5, z = c(1, 3, 2, NA, 5))
base <- ggplot(df, aes(x, y)) +
geom_tile(aes(fill = z), size = 5) +
labs(x = NULL, y = NULL) +
scale_x_continuous(labels = NULL)
base
base + scale_fill_gradient(na.value = NA)
base + scale_fill_gradient(na.value = "yellow")
```
### Limits, breaks, and labels {#colour-continuous-limits}
In the previous chapter I discussed how the appearance of axes can be controlled by setting the `limits` (Section \@ref(position-continuous-limits)), `breaks` (Section \@ref(position-continuous-breaks)) and `labels` (Section \@ref(position-continuous-labels)) argument to the scale function. The behaviour of colour scales can be controlled in an analogous fashion:
```{r, echo=FALSE}
toy <- data.frame(
const = 1,
up = 1:4,
txt = letters[1:4],
big = (1:4)*1000,
log = c(2, 5, 10, 2000)
)
```
`r columns(2, 2/3)`
```{r}
base <- ggplot(toy, aes(up, up, fill = big)) +
geom_tile() +
labs(x = NULL, y = NULL)
base
base + scale_fill_continuous(limits = c(0, 10000))
```
`r columns(2, 2/3)`
```{r}
base + scale_fill_continuous(breaks = c(1000, 2000, 4000))
base + scale_fill_continuous(labels = scales::label_dollar())
```
(The toy data set used here is the same one defined in Section \@ref(position-continuous-breaks)). You can suppress the breaks entirely by setting them to `NULL`, which removes the keys and labels.
### Legends {#guide-colourbar}
\index{Legend!colour bar} \index{Colour bar}
Every scale is associated with a guide that displays the relationship between the aesthetic and the data. For position scales, the axes serve this function. For colour scales this role is played by the legend, which can be customised with the help of a guide function. For continuous colour scales, the default legend takes the form of a "colour bar" displaying a continuous gradient of colours:
`r columns(3)`
```{r}
base <- ggplot(mpg, aes(cyl, displ, colour = hwy)) +
geom_point(size = 2)
base
```
The appearance of the legend can be controlled using the `guide_colourbar()` function. There are many arguments to this function, allowing you to exercise precise control over the legend. The most important arguments are illustrated below:
* `reverse` flips the colour bar to put the lowest values at the top.
* `barwidth` and `barheight` allow you to specify the size of the bar.
These are grid units, e.g. `unit(1, "cm")`.
* `direction` specifies the direction of the guide, `"horizontal"` or
`"vertical"`.
In Section \ref@(guide-axis) I introduced the `guides()` function that is used to set customised legends and axes. When applied to colour scales, it allows you to create custom legends like these:
```{r}
base + guides(colour = guide_colourbar(reverse = TRUE))
base + guides(colour = guide_colourbar(barheight = unit(2, "cm")))
base + guides(colour = guide_colourbar(direction = "horizontal"))
```
An alternative way to accomplish the same goal is to specify the `guide` argument to the scale function. These two plot specifications are identical:
```{r}
base + guides(colour = guide_colourbar(reverse = TRUE))
base + scale_colour_continuous(guide = guide_colourbar(reverse = TRUE))
```
You can learn more about guide functions in Section \@ref(scale-guide).
## Discrete colour scales {#colour-discrete}
Discrete colour and fill scales occur in many situations. A typical example is a barchart that encodes both position and fill to the same variable. Many concepts from Section \@ref(colour-continuous) apply to discrete scales, which I will illustrate using this barchart as the running example: \index{Colour!discrete scales}
`r columns(3, 1)`
```{r}
df <- data.frame(x = c("a", "b", "c", "d"), y = c(3, 4, 1, 2))
bars <- ggplot(df, aes(x, y, fill = x)) +
geom_bar(stat = "identity") +
labs(x = NULL, y = NULL) +
theme(legend.position = "none")
```
The default scale for discrete colours is `scale_fill_discrete()` which in turn defaults to `scale_fill_hue()` so these are identical plots:
```{r}
bars
bars + scale_fill_discrete()
bars + scale_fill_hue()
```
This default scale has some limitations (discussed shortly) so I'll begin by discussing tools for producing nicer discrete palettes.
### Brewer scales
`scale_colour_brewer()` is a discrete colour scale that---along with the continuous analog `scale_colour_distiller()` and binned analog `scale_colour_fermenter()`---uses handpicked "ColorBrewer" colours taken from <http://colorbrewer2.org/>. These colours have been designed to work well in a wide variety of situations, although the focus is on maps and so the colours tend to work better when displayed in large areas. There are many different options:
`r columns(1, 2)`
```{r}
RColorBrewer::display.brewer.all()
```
The first group of palettes are sequential scales that are useful when your discrete scale is ordered (e.g., rank data), and are available for continuous data using `scale_colour_distiller()`. For unordered categorical data, the palettes of most interest are those in the second group. 'Set1' and 'Dark2' are particularly good for points, and 'Set2', 'Pastel1', 'Pastel2' and 'Accent' work well for areas.
\index{Colour!Brewer} \indexf{scale\_colour\_brewer}
`r columns(3, 1)`
```{r}
bars + scale_fill_brewer(palette = "Set1")
bars + scale_fill_brewer(palette = "Set2")
bars + scale_fill_brewer(palette = "Accent")
```
Note that no palette is uniformly good for all purposes. Scatter plots typically use small plot markers, and bright colours tend to work better than subtle ones:
`r columns(3, 1)`
```{r brewer-pal}
# scatter plot
df <- data.frame(
x = 1:3 + runif(30),
y = runif(30),
z = c("a", "b", "c")
)
point <- ggplot(df, aes(x, y)) +
geom_point(aes(colour = z)) +
theme(legend.position = "none") +
labs(x = NULL, y = NULL)
# three palettes
point + scale_colour_brewer(palette = "Set1")
point + scale_colour_brewer(palette = "Set2")
point + scale_colour_brewer(palette = "Pastel1")
```
Bar plots usually contain large patches of colour, and bright colours can be overwhelming. Subtle colours tend to work better in this situation:
```{r}
# bar plot
df <- data.frame(x = 1:3, y = 3:1, z = c("a", "b", "c"))
area <- ggplot(df, aes(x, y)) +
geom_bar(aes(fill = z), stat = "identity") +
theme(legend.position = "none") +
labs(x = NULL, y = NULL)
# three palettes
area + scale_fill_brewer(palette = "Set1")
area + scale_fill_brewer(palette = "Set2")
area + scale_fill_brewer(palette = "Pastel1")
```
### Hue and grey scales
The default colour scheme picks evenly spaced hues around the HCL colour wheel. This works well for up to about eight colours, but after that it becomes hard to tell the different colours apart. You can control the default chroma and luminance, and the range of hues, with the `h`, `c` and `l` arguments:
\indexf{scale\_colour\_hue}
```{r}
bars
bars + scale_fill_hue(c = 40)
bars + scale_fill_hue(h = c(180, 300))
```
There are some problems with this default scheme. One is that unlike many of the other palettes discussed in this chapter, they are not colour blind safe (discussed in Section \@ref(colour-blindness)). A second is that because the colours all have the same luminance and chroma, they all appear as an identical shade of grey when printed in black and white. If you are intending a discrete colour scale to be printed in black and white, it is better to explicitly use `scale_fill_grey()` which maps discrete data to grays, from light to dark:
\indexf{scale\_colour\_grey} \index{Colour!greys}
```{r}
bars + scale_fill_grey()
bars + scale_fill_grey(start = 0.5, end = 1)
bars + scale_fill_grey(start = 0, end = 0.5)
```
### Paletteer scales
Another alternative is provided by the paletteer package, discussed earlier in connection to continuous colour scales in Section \@ref(particular-palettes). By providing a unified interface that spans a large number of packages, paletteer makes it possible to choose among a very large number of palettes in a consistent way:
```{r}
bars + paletteer::scale_fill_paletteer_d("rtist::vangogh")
bars + paletteer::scale_fill_paletteer_d("colorBlindness::paletteMartin")
bars + paletteer::scale_fill_paletteer_d("wesanderson::FantasticFox1")
```
### Manual scales {#manual-colour}
If none of the preexisting palettes is suitable, or if you have your own preferred colours, you can use `scale_fill_manual()` to set the colours manually. This can be useful if you wish to choose colours that highlight a secondary grouping structure or draw attention to different comparisons:
\indexf{scale\_colour\_manual}
```{r}
bars +
scale_fill_manual(
values = c("sienna1", "sienna4", "hotpink1", "hotpink4")
)
bars +
scale_fill_manual(
values = c("tomato1", "tomato2", "tomato3", "tomato4")
)
bars +
scale_fill_manual(
values = c("grey", "black", "grey", "grey")
)
```
You can also use a named vector to specify colors to be assigned to each level which allows you to specify the levels in any order you like:
```{r}
bars +
scale_fill_manual(
values = c(
"d" = "grey",
"c" = "grey",
"b" = "black",
"a" = "grey"
)
)
```
For more information about manual scales see Section \@ref(scale-manual).
### Limits, breaks, and labels {#colour-discrete-limits}
Scale limits for discrete colour scales can be set using the `limits` argument to the scale argument, or by using the `lims()` helper function. This can be important when the same variable is represented in different plots, and you want to ensure that the colours are consistent across plots. To demonstrate this I'll extend the example from Section \@ref(position-continuous-limits). Colour represents the fuel type, which can be **r**egular, **e**thanol, **d**iesel, **p**remium or **c**ompressed natural gas.
`r columns(2, 1)`
```{r}
mpg_99 <- mpg %>% filter(year == 1999)
mpg_08 <- mpg %>% filter(year == 2008)
base_99 <- ggplot(mpg_99, aes(displ, hwy, colour = fl)) + geom_point()
base_08 <- ggplot(mpg_08, aes(displ, hwy, colour = fl)) + geom_point()
base_99
base_08
```
Each plot makes sense on its own, but visual comparison between the two is difficult. The axis limits are different, and because only regular, premium and diesel fuels are represented in the 1998 data the colours are mapped inconsistently. To ensure a consistent mapping for the colour aesthetic, we can use `lims()` to manually set the limits. As discussed in Section \@ref(position-continuous-limits) it takes name-value pairs as input, where the name specifies the aesthetic and the value specifies the limits:
```{r}
base_99 + lims(colour = c("c", "d", "e", "p", "r"))
base_08 + lims(colour = c("c", "d", "e", "p", "r"))
```
The nice thing about `lims()` is that we can set the limits for multiple aesthetics at once. To ensure that x, y, and colour all use consistent limits we can do this:
```{r}
base_99 +
lims(
x = c(1, 7),
y = c(10, 45),
colour = c("c", "d", "e", "p", "r")
)
base_08 +
lims(
x = c(1, 7),
y = c(10, 45),
colour = c("c", "d", "e", "p", "r")
)
```
There are two potential limitations to these plots. First, while setting the scale limits does ensure that colours are mapped identically in both plots, it also means that the plot for the 1999 data displays labels for all five fuel types, despite the fact that ethanol and compressed natural gas fuels were not in use at that time. We can address this by manually setting the scale breaks, ensuring that only those fuel types that appear in the data are shown in the legend. The second limitation is that the labels are not particularly helpful, which we can address by specifying them manually. When setting multiple properties of a single scale, it can be more useful to customise using the arguments to the scale function rather than using the `lims()` helper function:
`r columns(2, 1)`
```{r}
base_99 +
scale_color_discrete(
limits = c("c", "d", "e", "p", "r"),
breaks = c("d", "p", "r"),
labels = c("diesel", "premium", "regular")
)
```
However, there is nothing stopping you from using `lims()` to control the position aesthetic limits, while using `scale_colour_discrete()` to exercise more fine-grained control over the colour aesthetic:
`r columns(2, 1)`
```{r}
base_99 +
lims(x = c(1, 7), y = c(10, 45)) +
scale_color_discrete(
limits = c("c", "d", "e", "p", "r"),
breaks = c("d", "p", "r"),
labels = c("diesel", "premium", "regular")
)
base_08 +
lims(x = c(1, 7), y = c(10, 45)) +
scale_color_discrete(
limits = c("c", "d", "e", "p", "r"),
labels = c("compressed", "diesel", "ethanol", "premium", "regular")
)
```
### Legends {#guide-legend}
\index{Legend!guide}
Legends for discrete colour scales can be customised using the `guide` argument to the scale function or with the `guides()` helper function, described in Section \@ref(guide-colourbar). For a discrete scale the default legend displays individual keys in a table, which can be customised using `guide_legend()`. The most useful options are:
* `nrow` or `ncol` which specify the dimensions of the table. `byrow`
controls how the table is filled: `FALSE` fills it by column (the default),
`TRUE` fills it by row.
`r columns(3)`
```{r legend-rows-cols}
base <- ggplot(mpg, aes(drv, fill = factor(cyl))) + geom_bar()
base
base + guides(fill = guide_legend(ncol = 2))
base + guides(fill = guide_legend(ncol = 2, byrow = TRUE))
```
* `reverse` reverses the order of the keys:
```{r}
base
base + guides(fill = guide_legend(reverse = TRUE))
```
* `override.aes` is useful when you want the elements in the legend
display differently to the geoms in the plot. This is often required
when you've used transparency or size to deal with moderate overplotting
and also used colour in the plot. \indexf{override.aes}
`r columns(2, 2/3)`
```{r}
base <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
geom_point(size = 4, alpha = .2, stroke = 0)
base + guides(colour = guide_legend())
base + guides(colour = guide_legend(override.aes = list(alpha = 1)))
```
* `keywidth` and `keyheight` (along with `default.unit`) allow you to specify
the size of the keys. These are grid units, e.g. `unit(1, "cm")`.
You can learn more about guides in Section \@ref(scale-guide).
<!-- ### Exercises -->
<!-- 1. Compare and contrast the four continuous colour scales with the four discrete scales. -->
<!-- 1. Explore the distribution of the built-in `colors()` using the `luv_colours` dataset. -->
<!-- ### Exercises -->
<!-- 1. Recreate the following plot: -->
<!-- ```{r, echo = FALSE} -->
<!-- drv_labels <- c("4" = "4wd", "f" = "fwd", "r" = "rwd") -->
<!-- ggplot(mpg, aes(displ, hwy)) + -->
<!-- geom_point(aes(colour = drv)) + -->
<!-- scale_colour_discrete(labels = drv_labels) -->
<!-- ``` -->
## Binned colour scales {#binned-colour}
Colour scales also come in binned versions. The default scale is `scale_fill_binned()` which in turn defaults to `scale_fill_steps()`. As with the binned position scales discussed in Section \@ref(binned-position) these scales have an `n.breaks` argument that controls the number of discrete colour categories created by the scale. Counterintuitively---because the human visual system is very good at detecting edges---this can sometimes make a continuous colour gradient easier to perceive:
`r columns(3, 1)`
```{r}
erupt + scale_fill_binned()
erupt + scale_fill_steps()
erupt + scale_fill_steps(n.breaks = 8)
```
In other respects `scale_fill_steps()` is analogous to `scale_fill_gradient()`, and allows you to construct your own two-colour gradients. There is also a three-colour variant `scale_fill_steps2()` and n-colour scale variant `scale_fill_stepsn()` that behave similarly to their continuous counterparts:
```{r}
erupt + scale_fill_steps(low = "grey", high = "brown")
erupt +
scale_fill_steps2(
low = "grey",
mid = "white",
high = "brown",
midpoint = .02
)
erupt + scale_fill_stepsn(n.breaks = 12, colours = terrain.colors(12))
```
The viridis palettes can be used in the same way, by calling the palette generating functions directly when specifying the `colours` argument to `scale_fill_stepsn()`:
```{r}
erupt + scale_fill_stepsn(n.breaks = 9, colours = viridis::viridis(9))
erupt + scale_fill_stepsn(n.breaks = 9, colours = viridis::magma(9))
erupt + scale_fill_stepsn(n.breaks = 9, colours = viridis::inferno(9))
```
Alternatively, a brewer analog for binned scales also exists, and is called `scale_fill_fermenter()`:
```{r}
erupt + scale_fill_fermenter(n.breaks = 9)
erupt + scale_fill_fermenter(n.breaks = 9, palette = "Oranges")
erupt + scale_fill_fermenter(n.breaks = 9, palette = "PuOr")
```
Note that like the discrete `scale_fill_brewer()`---and unlike the continuous `scale_fill_distiller()`---the binned function `scale_fill_fermenter()` does not interpolate between the brewer colours, and if you set `n.breaks` larger than the number of colours in the palette a warning message will appear and some colours will not be displayed.
### Limits, breaks, and labels
In most respects setting limits, breaks, and labels for a binned scale follows the same logic that applies to continuous scales (Sections \@ref(position-continuous-breaks) and \@ref(colour-continuous-limits)). Like a continuous scale, the `limits` argument is typically a numeric vector of length two specifying the end points, `breaks` is a numeric vector specifying the break points, and `labels` is a character vector specifying the labels. All three arguments will accept functions as input (discussed in Section \@ref(numeric-position-scales)). The main difference between binned and continuous scales is that the `breaks` argument defines the edges of the bins rather than simply specifying locations of tick marks.
### Legends {#guide-coloursteps}
The default legend for binned scales uses colour steps rather than a colourbar, and can be customised using the `guide_coloursteps()` function. A colour step legend shows the area between breaks as a single constant colour, rather than displaying a colour gradient that varies smoothly along the bar. The arguments to `guide_coloursteps()` mostly mirror those for `guide_colourbar()` (see Section \@ref(guide-colourbar)), with additional arguments that are relevant to binned scales:
* `show.limits` indicates whether values should be shown at the ends of the stepped colour bar, analogous to the corresponding argument in `guide_bins()`
`r columns(2)`
```{r}
base <- ggplot(mpg, aes(cyl, displ, colour = hwy)) +
geom_point(size = 2) +
scale_color_binned()
base
base + guides(colour = guide_coloursteps(show.limits = TRUE))
```
* `ticks` is a logical variable indicating whether tick marks should be displayed adjacent to the legend labels (default is `NULL`, in which case the value is inherited from the scale)
* `even.steps` is a logical variable indicating whether bins should be evenly spaced (default is `TRUE`) or proportional in size to their frequency in the data
## Date-time colour scales {#date-colour-scales}
When a colour aesthetic is mapped to a date/time type, ggplot2 uses `scale_colour_date()` or `scale_colour_datetime()` to specify the scale. These are designed to handle date data, analogous to the date scales discussed in Section \@ref(date-scales). These scales have `date_breaks` and `date_labels` arguments that make it a little easier to work with these data, as the slightly contrived example below illustrates:
`r columns(2, 1)`
```{r}
base <- ggplot(economics, aes(psavert, uempmed, colour = date)) +
geom_point()
base
base +
scale_colour_date(
date_breaks = "142 months",
date_labels = "%b %Y"
)
```
## Alpha scales {#scales-alpha}
Alpha scales map the transparency of a shade to a value in the data. They are not often useful, but can be a convenient way to visually down-weight less important observations. `scale_alpha()` is an alias for `scale_alpha_continuous()` since that is the most common use of alpha, and it saves a bit of typing. An example of an alpha scale using the eruptions data is shown below:
`r columns(2, 2/3)`
```{r}
ggplot(faithfuld, aes(waiting, eruptions, alpha = density)) +
geom_raster(fill = "maroon") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))
```
## Legend position {#legend-layout}
A number of settings that affect the overall display of the legends are controlled through the theme system. You'll learn more about that in Section \@ref(themes), but for now, all you need to know is that you modify theme settings with the `theme()` function. \index{Themes!legend}
The position and justification of legends are controlled by the theme setting `legend.position`, which takes values "right", "left", "top", "bottom", or "none" (no legend). \index{Legend!layout}
`r columns(2, 2/3)`
```{r legend-position}
base <- ggplot(toy, aes(up, up)) +
geom_point(aes(colour = txt), size = 3) +
xlab(NULL) +
ylab(NULL)
base + theme(legend.position = "left")
base + theme(legend.position = "right") # the default
base + theme(legend.position = "bottom")
base + theme(legend.position = "none")
```
Switching between left/right and top/bottom modifies how the keys in each legend are laid out (horizontal or vertically), and how multiple legends are stacked (horizontal or vertically). If needed, you can adjust those options independently:
* `legend.direction`: layout of items in legends ("horizontal" or "vertical").
* `legend.box`: arrangement of multiple legends ("horizontal" or "vertical").
* `legend.box.just`: justification of each legend within the overall bounding
box, when there are multiple legends ("top", "bottom", "left", or "right").
Alternatively, if there's a lot of blank space in your plot you might want to place the legend inside the plot. You can do this by setting `legend.position` to a numeric vector of length two. The numbers represent a relative location in the panel area: `c(0, 1)` is the top-left corner and `c(1, 0)` is the bottom-right corner. You control which corner of the legend the `legend.position` refers to with `legend.justification`, which is specified in a similar way. Unfortunately positioning the legend exactly where you want it requires a lot of trial and error.
`r columns(3, 1.5)`
```{r legend-position-man}
base <- ggplot(toy, aes(up, up)) +
geom_point(aes(colour = txt), size = 3)
base +
theme(
legend.position = c(0, 1),
legend.justification = c(0, 1)
)
base +
theme(
legend.position = c(0.5, 0.5),
legend.justification = c(0.5, 0.5)
)
base +
theme(
legend.position = c(1, 0),
legend.justification = c(1, 0)
)
```
There's also a margin around the legends, which you can suppress with `legend.margin = unit(0, "mm")`.
<!-- ### Exercises -->
<!-- 1. How do you make legends appear to the left of the plot? -->
<!-- 1. What's gone wrong with this plot? How could you fix it? -->
<!-- `r columns(1, 2 / 3)` -->
<!-- ```{r} -->
<!-- ggplot(mpg, aes(displ, hwy)) + -->
<!-- geom_point(aes(colour = drv, shape = drv)) + -->
<!-- scale_colour_discrete("Drive train") -->
<!-- ``` -->
<!-- 1. Can you recreate the code for this plot? -->
<!-- `r columns(1, 2 / 3)` -->
<!-- ```{r, echo = FALSE} -->
<!-- ggplot(mpg, aes(displ, hwy, colour = class)) + -->
<!-- geom_point(show.legend = FALSE) + -->
<!-- geom_smooth(method = "lm", se = FALSE) + -->
<!-- theme(legend.position = "bottom") + -->
<!-- guides(colour = guide_legend(nrow = 1)) -->
<!-- ``` -->