forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPA1_template.Rmd
117 lines (92 loc) · 3.59 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
---
title: "Reproducible Research: Peer Assessment 1"
output:
html_document:
keep_md: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = T, message = F)
library(tidyverse)
library(here)
```
## Loading and preprocessing the data
```{r loadData}
dat <- read.csv(file = unz(here('activity.zip'),
file = 'activity.csv')) %>%
mutate(date = as.Date(date))
```
## What is mean total number of steps taken per day?
```{r calcMean}
totalSteps <- dat %>%
na.omit() %>%
group_by(date) %>%
summarize(totalSteps = sum(steps))
ggplot(totalSteps, aes(totalSteps)) +
geom_histogram() +
labs(title = 'Histogram of total number of steps taken per day',
x = 'Total number of steps per day',
y = '')
mymean <- totalSteps %>%
summarize(meanSteps = mean(totalSteps),
medianSteps = median(totalSteps))
# as.character necessary to simplify formatting of inline code output. Otherwise we get scientific format and googling to format inline R knitr numerical output was not successful.
meanSteps <- as.character(round(mymean$meanSteps, 0))
medianSteps <- as.character(round(mymean$medianSteps, 0))
```
The mean steps taken per day is `r meanSteps`.
The median steps taken per day is `r medianSteps`.
## What is the average daily activity pattern?
```{r averageSteps}
avInt <- dat %>%
na.omit() %>%
group_by(interval) %>%
summarize(aveSteps = mean(steps))
ggplot(avInt, aes(interval, aveSteps)) +
geom_line() +
labs(title = 'Average number of steps per 5-minute interval',
x = '5-minute interval',
y = 'Average steps')
maxSteps <- avInt$interval[avInt$aveSteps == max(avInt$aveSteps)]
```
The maximum number of steps in the 5-minute average intervals was the interval `r maxSteps`
## Imputing missing values
```{r missing}
totalNA <- sum(is.na(dat$steps))
```
There are `r totalNA` missing values in the dataset.
```{r fillMissing}
# Fill missing values with the average 5-min steps and find daily total steps
datMiss <- full_join(dat, avInt, by = 'interval') %>%
mutate(steps = ifelse(is.na(steps), aveSteps, steps)) %>%
group_by(date) %>%
summarize(totalSteps = sum(steps))
```
```{r hist}
ggplot(datMiss, aes(totalSteps)) +
geom_histogram() +
labs(title = 'Histogram of total number of steps taken per day',
x = 'Total number of steps per day',
y = '')
meanStepsF <- as.character(round(mean(datMiss$totalSteps, 0)))
medianStepsF <- as.character(round(median(datMiss$totalSteps, 0)))
```
The mean steps taken per day is `r meanStepsF` in the filled data set. This represents a difference of `r as.numeric(meanStepsF) - as.numeric(meanSteps)` steps in the filled data set.
The median steps taken per day is `r medianStepsF` in the filled data set. This is a difference of `r as.numeric(medianStepsF) - as.numeric(medianSteps)` steps in the filled data set.
## Are there differences in activity patterns between weekdays and weekends?
```{r weekday}
datAll <- full_join(dat, avInt, by = 'interval') %>%
mutate(steps = ifelse(is.na(steps), aveSteps, steps),
day = weekdays(date, abbreviate = T))
datAll <- datAll %>%
mutate(isweekend = ifelse(day %in% c('Sat', 'Sun'), 'Weekend', 'Weekday')) %>%
mutate(isweekend = factor(isweekend)) %>%
select(-aveSteps, -day) %>%
group_by(isweekend, interval) %>%
summarize(aveSteps = mean(steps))
ggplot(datAll, aes(interval, aveSteps)) +
geom_line() +
facet_wrap(~isweekend, dir = 'v') +
labs(title = 'Step comparison of weekdays vs weekends',
x = '5-minute time interval',
y = 'Average number of steps')
```