-
Notifications
You must be signed in to change notification settings - Fork 20
/
Copy path07-MAW-PVI.Rmd
372 lines (218 loc) · 9.8 KB
/
07-MAW-PVI.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
---
title: "OPEN & REPRODUCIBLE MICROBIOME DATA ANALYSIS SPRING SCHOOL 2018"
author: "Sudarshan"
date: "`r Sys.Date()`"
output: bookdown::gitbook
site: bookdown::bookdown_site
---
# Inference of Microbial Ecological Networks
More information on [SPIEC-EASI](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004226).
The input for SPIEC-EASI is a counts table. The normalization and tranformation is done by the function.
This step is heavy on computational memory and slow. Noise filtered OTU-OTU level covariance would be ideal.
**Load packages and data**
```{r, tidy=TRUE, tidy.opts=list(width.cutoff=60), ignore = TRUE, eval=FALSE}
library(devtools)
install_github("zdk123/SpiecEasi")
#Other packages you need to install are
#install.packages("igraph")
install.packages("intergraph")
install.packages("GGally")
devtools::install_github("briatte/ggnet")
install.packages("network")
install.packages("ggnetwork")
```
```{r, warning=FALSE, message=FALSE}
library(microbiome) # data analysis and visualisation
library(phyloseq) # also the basis of data object. Data analysis and visualisation
library(RColorBrewer) # nice color options
library(ggpubr) # publication quality figures, based on ggplot2
library(dplyr) # data handling
library(SpiecEasi) # Network analysis for sparse compositional data
library(network)
library(intergraph)
#devtools::install_github("briatte/ggnet")
library(ggnet)
library(igraph)
```
**Read data**
```{r}
ps1 <- readRDS("./phyobjects/ps.ng.tax.rds")
```
**Select only stool samples**
We will subset our data to include only stool samples.
```{r}
ps1.stool <- subset_samples(ps1, bodysite == "Stool")
```
**For testing reduce the number of ASVs**
```{r}
ps1.stool.otu <- prune_taxa(taxa_sums(ps1.stool) > 100, ps1.stool)
# Add taxonomic classification to OTU ID
ps1.stool.otu.f <- microbiomeutilities::format_to_besthit(ps1.stool.otu)
head(tax_table(ps1.stool.otu))
```
Check the difference in two phyloseq objects.
```{r, eval=FALSE}
head(tax_table(ps1.stool.otu.f))
```
## Prepare data for SpiecEasi
The calculation of SpiecEasi are time consuming. For this tutorial, we will have the necessary input files for SpiecEasi.
* OTU table
* Taxonomy table
We save it as *.rds* object.
```{r}
otu.c <- t(otu_table(ps1.stool.otu.f)@.Data) #extract the otu table from phyloseq object
tax.c <- as.data.frame(tax_table(ps1.stool.otu.f)@.Data)#extract the taxonomy information
head(tax.c)
# use this only for first attempt to run it on server to save time
#saveRDS(otu.c, "input_data/stool.otu.c.rds")
#saveRDS(tax.c, "input_data/stool.tax.c.rds")
```
## SPIEC-EASI network reconstruction
More information on [SPIEC-EASI](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004226).
The input for SPIEC-EASI is a counts table. The normalization and tranformation is done by the function. This is very handy tool.
This step is heavy on computational memory and very slow. For this workshop we have already have the output and will skip this chuck.
```{r, eval=FALSE}
# In practice, use more repetitions
set.seed(1244)
net.c <- spiec.easi(otu.c, method='mb', icov.select.params=list(rep.num=50)) # reps have to increases for real data
# saveRDS(net.c, "input_data/net.c.rds")
#please use more numebr of rep.num (99 or 999) the paraemters
## Create graph object and get edge values
```
**We have save the output of net.c to save time**
The output of `spiec.easi` is stored in *./input_data/* as *stool.net.c.rds*. Read this file in R and follow the steps below.
```{r}
# the PC has low processing power, you can read the otuput created by us present in the input_data folder.
source("scripts/symBeta.R") # load custom function to get weights.
net.c <- readRDS("input_data/stool.net.rds")
class(net.c)
n.c <- symBeta(getOptBeta(net.c))
```
**Add names to IDs**
We also add abundance values to vertex (nodes).
```{r}
colnames(n.c) <- rownames(n.c) <- colnames(otu.c)
vsize <- log2(apply(otu.c, 2, mean)) # add log abundance as properties of vertex/nodes.
```
### Prepare data for plotting
```{r}
stool.ig <- graph.adjacency(n.c, mode='undirected', add.rownames = TRUE, weighted = TRUE)
stool.ig # we can see all the attributes and weights
#plot(stool.ig)
```
set the layout option
```{r, eval=FALSE}
# check what is it?
?layout_with_fr
```
```{r}
coords.fdr = layout_with_fr(stool.ig)
```
### igraph network
```{r}
E(stool.ig)[weight > 0]$color<-"steelblue" #now color the edges based on their values positive is steelblue
E(stool.ig)[weight < 0]$color<-"orange" #now color the edges based on their values
plot(stool.ig, layout=coords.fdr, vertex.size = 2, vertex.label.cex = 0.5)
```
The visualisation can be enhanced using [ggnet](https://briatte.github.io/ggnet/) R package.
```{r}
stool.net <- asNetwork(stool.ig)
network::set.edge.attribute(stool.net, "color", ifelse(stool.net %e% "weight" > 0, "steelblue", "orange"))
```
Start adding taxonomic information.
```{r}
colnames(tax_table(ps1.stool.otu.f))
phyla <- map_levels(colnames(otu.c), from = "best_hit", to = "Phylum", tax_table(ps1.stool.otu.f))
stool.net %v% "Phylum" <- phyla
stool.net %v% "nodesize" <- vsize
```
### Network plot
```{r, warning=FALSE, message=FALSE}
mycolors <- scale_color_manual(values = c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c","#fb9a99","#e31a1c","#fdbf6f","#ff7f00","#cab2d6","#6a3d9a","#ffff99","#b15928"))
p <- ggnet2(stool.net, node.color = "Phylum",
label = TRUE, node.size = "nodesize",
label.size = 2, edge.color = "color") + guides(color=guide_legend(title="Phylum"), size = FALSE) + mycolors
p
```
This is difficult to interpret. One way is to remove nodes that are connected to few other nodes. We can use degree as a network statisitic.
```{r}
stl.mb <- degree.distribution(stool.ig)
plot(0:(length(stl.mb)-1), stl.mb, ylim=c(0,.35), type='b',
ylab="Frequency", xlab="Degree", main="Degree Distributions")
# we will look at only taxa connect more than 10 others
p <- ggnet2(stool.net, node.color = "Phylum",
label = TRUE,
label.size = 3, edge.color = "color",
size = "degree", size.min = 10) + guides(color=guide_legend(title="Phylum"), size = FALSE) + mycolors
p
```
## Network properties
Check for the number of positive and negative edges.
```{r}
betaMat=as.matrix(symBeta(getOptBeta(net.c)))
# We divide by two since an edge is represented by two entries in the matrix.
positive=length(betaMat[betaMat>0])/2
negative=length(betaMat[betaMat<0])/2
total=length(betaMat[betaMat!=0])/2
```
### Modularity in networks
```{r}
net.c
mod.net <- net.c$refit
colnames(mod.net) <- rownames(mod.net) <- colnames(otu.c)#you can remove this
vsize <- log2(apply(otu.c, 2, mean))# value we may or may not use as vertex.attribute
stool.ig.mod <- graph.adjacency(mod.net, mode='undirected', add.rownames = TRUE)
plot(stool.ig.mod) # we can see all the attributes and weights
stool.net.mod <- asNetwork(stool.ig.mod)
```
Set vertex attributes. We can color by phyla and set the size of nodes based on log2 abundance.
```{r}
phyla <- map_levels(colnames(otu.c), from = "best_hit", to = "Phylum", tax_table(ps1.stool.otu.f))
stool.net.mod %v% "Phylum" <- phyla
stool.net.mod %v% "nodesize" <- vsize
```
### Network plot
```{r, warning=FALSE, message=FALSE}
mycolors <- scale_color_manual(values = c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c","#fb9a99","#e31a1c","#fdbf6f","#ff7f00","#cab2d6","#6a3d9a","#ffff99","#b15928"))
# check the colorpicker in the addins option in RStudio to interactively select color options.
p <- ggnet2(stool.net.mod, node.color = "Phylum",
label = TRUE, node.size = 2,
label.size = 2) + guides(color=guide_legend(title="Phylum"), size = FALSE) + mycolors
p
```
Identify modularity in networks.
```{r}
modules =cluster_fast_greedy(stool.ig.mod)
print(modules)
modularity(modules)
V(stool.ig.mod)$color=modules$membership
plot(stool.ig.mod, col = modules, vertex.size = 4, vertex.label = NA)
stool.net.mod %v% "membership" <- modules$membership
p <- ggnet2(stool.net.mod, node.color = "membership",
label = TRUE, node.size = "nodesize",
label.size = 2) + guides(color=guide_legend(title="membership"), size = FALSE) + mycolors
p
```
Check which OTUs are part of different modules.
```{r}
modulesOneIndices=which(modules$membership==1)
modulesOneOtus=modules$names[modulesOneIndices]
modulesTwoIndices=which(modules$membership==2)
modulesTwoOtus=modules$names[modulesTwoIndices]
modulesThreeIndices=which(modules$membership==3)
modulesThreeOtus=modules$names[modulesThreeIndices]
modulesFourIndices=which(modules$membership==4)
modulesFourOtus=modules$names[modulesFourIndices]
modulesFiveIndices=which(modules$membership==5)
modulesFiveOtus=modules$names[modulesFiveIndices]
modulesSixIndices=which(modules$membership==6)
modulesSixOtus=modules$names[modulesSixIndices]
print(modulesOneOtus)
```
### Good reads for ecological networks
[Using network analysis to explore co-occurrence patterns in soil microbial communities](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3260507/)
[Microbial Co-occurrence Relationships in the Human Microbiome](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002606)
[Correlation detection strategies in microbial data sets vary widely in sensitivity and precision](http://www.nature.com/ismej/journal/v10/n7/full/ismej2015235a.html)
```{r}
sessionInfo()
```