SunKyoung Moon Solution for Kickbones1 project #329

Draft
wants to merge 9 commits into
base: main
Changes from 2 commits
80 changes: 80 additions & 0 deletions Projects/kickbones1/Solution for kickbones1 (first trial).Rmd
Member

File naming: don't put "first trial", "version X", etc. into file names. That is exactly the kind of meta information a version control system like Git manages through its history. Please rename!

@@ -0,0 +1,80 @@
---
title: "kickbones1_from SunKyoung Moon(fist trial)"
author: "Luis"
output: md_document
date: "2024-12-03"
---

Member

You are missing the code block start and end markers (three backticks each). Please double-check with the R Markdown chapter from the beginning of the course!
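
For reference, a minimal sketch of how fenced R chunks look in an Rmd file. The chunk label `setup` and the `message=FALSE` option are illustrative choices, not taken from this PR.

````markdown
```{r setup, message=FALSE}
library(ggplot2)
library(dplyr)
library(tidyr)
```

Text outside the fences is rendered as prose.

```{r}
# code inside the fences is executed when knitting
head(mtcars)
```
````

Without the fences, knitr treats the code as ordinary text, which is why the generated md file further down shows the script as run-together paragraphs.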

library(ggplot2)
library(dplyr)
library(tidyr)

#Loading Data
url <- "https://raw.githubusercontent.com/Dr-Eberle-Zentrum/Data-projects-with-R-and-GitHub/refs/heads/main/Projects/SunKyoung%20Moon/kaggle_dataset.csv"

data <- read.csv(url)
Member @martin-raden, Dec 6, 2024

Since the data is within the same folder:

  • set the working directory to the script file's location
  • load the data directly from your local file
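
The two steps above could be sketched as follows. The temporary folder and the two-row data frame are hypothetical stand-ins so that the snippet runs anywhere, and the variable name `raw_data` is an assumption; in the real project the CSV file already exists.

```r
# Hypothetical stand-in for the course data set so the sketch runs anywhere.
project_dir <- tempfile("kickbones1_")
dir.create(project_dir)
write.csv(
  data.frame(Device.Model = c("Model A", "Model B"), Age = c(25, 41)),
  file.path(project_dir, "kaggle_dataset.csv"),
  row.names = FALSE
)

# Step 1: make the script's folder the working directory.
# (knitr does this by default when knitting; interactively, RStudio offers
# Session > Set Working Directory > To Source File Location.)
old_wd <- setwd(project_dir)

# Step 2: read the file through a relative path instead of a URL.
raw_data <- read.csv("kaggle_dataset.csv")

setwd(old_wd)  # restore the previous working directory
head(raw_data)
```

Reading the local copy keeps the script working offline and ties the analysis to the file version that is committed in the repository.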

head(data)

# Convert app usage time from minutes/day to hours/day
data <- data %>%
Member

Recommendation: don't overwrite your data! Keep the raw data and the modified data in separate variables in case you need both data sets. Typically, though, you will not need both.
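
A minimal sketch of that recommendation, assuming the hypothetical variable names `raw_data` and `usage_data` and a made-up three-row sample; only the column name is taken from the diff.

```r
library(dplyr)

# Hypothetical miniature of the data set; the column name mirrors the diff.
raw_data <- data.frame(
  Gender = c("Male", "Female", "Female"),
  App.Usage.Time..min.day. = c(120, 90, 300)
)

# Derive a new object; raw_data itself stays untouched.
usage_data <- raw_data %>%
  mutate(AppUsageHours = App.Usage.Time..min.day. / 60)
```

Because `raw_data` is never reassigned, rerunning a later chunk cannot silently apply the same transformation twice.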

mutate(AppUsageHours = `App.Usage.Time..min.day.` / 60)

# Compute median app usage time and sum users for each device model
device_summary <- data %>%
group_by(`Device.Model`, Gender) %>%
summarize(
MedianAppUsage = median(AppUsageHours, na.rm = TRUE),
UserCount = n()
) %>%
ungroup()

# Summarize age groups for scatter plot
data <- data %>%
mutate(AgeGroup = case_when(
Member

Move this up into the pipeline after line 20. Don't split your data wrangling.
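
One way to keep the wrangling together is a single `mutate()` call that creates both derived columns. This is a sketch on a made-up five-row sample; the names `raw_data` and `usage_data` and the use of `between()` instead of paired comparisons are assumptions, not part of the PR.

```r
library(dplyr)

# Hypothetical miniature of the data set; column names mirror the diff.
raw_data <- data.frame(
  Age = c(24, 35, 47, 58, 63),
  App.Usage.Time..min.day. = c(60, 120, 180, 240, 300)
)

# One mutate() call holds every derived column.
usage_data <- raw_data %>%
  mutate(
    AppUsageHours = App.Usage.Time..min.day. / 60,
    AgeGroup = case_when(
      between(Age, 20, 29) ~ "20-29",
      between(Age, 30, 39) ~ "30-39",
      between(Age, 40, 49) ~ "40-49",
      between(Age, 50, 59) ~ "50-59",
      TRUE ~ "Others"
    )
  )
```
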

Age >= 20 & Age <= 29 ~ "20-29",
Age >= 30 & Age <= 39 ~ "30-39",
Age >= 40 & Age <= 49 ~ "40-49",
Age >= 50 & Age <= 59 ~ "50-59",
TRUE ~ "Others"
))

#Visualization part!
ggplot(data, aes(x = Device.Model, y = AppUsageHours, fill = Gender)) +
geom_violin(alpha = 0.7, scale = "width") +
geom_point(aes(color = AgeGroup), position = position_jitter(width = 0.2, height = 0), alpha = 0.5) +
scale_fill_manual(values = c("Male" = "blue", "Female" = "red")) +
scale_color_manual(values = c(
"20-29" = "gray",
"30-39" = "green",
"40-49" = "pink",
"50-59" = "purple"
)) +
labs(
title = "Mobile Device Usage for Different Models",
subtitle = "Median app usage time differentiated by gender and age groups",
x = "Device Model",
y = "Median App Usage Time (hours/day)",
fill = "Gender",
color = "Age Group"
) +
theme_minimal()

device_user_counts <- data %>%
group_by(Device.Model) %>%
summarize(UserCount = n())

Member

Don't store the data in a variable just for the subsequent visualization. Pipe directly from data through your changes into the ggplot call. That way, fewer things can go wrong!

Member

And you have fewer variables ;)
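
A sketch of piping straight into the plot, using a made-up three-row sample. Replacing `group_by()` plus `summarize(UserCount = n())` with `count()` is a suggestion of this sketch, not something the review asks for.

```r
library(dplyr)
library(ggplot2)

# Hypothetical miniature of the data set; the column name mirrors the diff.
data <- data.frame(Device.Model = c("Model A", "Model A", "Model B"))

# The counted table flows straight into ggplot(); no intermediate variable.
data %>%
  count(Device.Model, name = "UserCount") %>%
  ggplot(aes(x = Device.Model, y = UserCount)) +
  geom_col(fill = "lightblue") +
  geom_text(aes(label = UserCount), vjust = -0.5) +
  labs(
    title = "User Count per Device Model",
    x = "Device Model",
    y = "User Count"
  ) +
  theme_minimal()
```

Note the switch from `%>%` to `+` once `ggplot()` is reached: the pipe passes data between functions, while `+` adds layers to the plot.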

ggplot(device_user_counts, aes(x = Device.Model, y = UserCount)) +
geom_col(fill = "lightblue") +
geom_text(aes(label = UserCount), vjust = -0.5) +
labs(
title = "User Count per Device Model",
x = "Device Model",
y = "User Count"
) +
theme_minimal()

#Result

![visualization_Result](https://imgur.com/jni0RwO)
Member

Don't upload files somewhere else. They should be generated within the Rmd script and subsequently added to the GitHub repo!

Member @martin-raden, Dec 6, 2024

So please:

  • knit the Rmd file within RStudio (this will generate an md file and the respective png files in a subfolder)
  • commit the md and png files along with the markdown changes

![Visualization_Result2](https://imgur.com/DOEtSmH)

47 changes: 47 additions & 0 deletions Projects/kickbones1/Solution-for-kickbones1--first-trial-.md
@@ -0,0 +1,47 @@
library(ggplot2) library(dplyr) library(tidyr)

\#Loading Data url &lt;-
“<https://raw.githubusercontent.com/Dr-Eberle-Zentrum/Data-projects-with-R-and-GitHub/refs/heads/main/Projects/SunKyoung%20Moon/kaggle_dataset.csv>”

data &lt;- read.csv(url) head(data)

# Convert app usage time from minutes/day to hours/day

data &lt;- data %&gt;% mutate(AppUsageHours = `App.Usage.Time..min.day.`
/ 60)

# Compute median app usage time and sum users for each device model

device\_summary &lt;- data %&gt;% group\_by(`Device.Model`, Gender)
%&gt;% summarize( MedianAppUsage = median(AppUsageHours, na.rm = TRUE),
UserCount = n() ) %&gt;% ungroup()

# Summarize age groups for scatter plot

data &lt;- data %&gt;% mutate(AgeGroup = case\_when( Age &gt;= 20 & Age
&lt;= 29 ~ “20-29”, Age &gt;= 30 & Age &lt;= 39 ~ “30-39”, Age &gt;= 40
& Age &lt;= 49 ~ “40-49”, Age &gt;= 50 & Age &lt;= 59 ~ “50-59”, TRUE ~
“Others” ))

\#Visualization part! ggplot(data, aes(x = Device.Model, y =
AppUsageHours, fill = Gender)) + geom\_violin(alpha = 0.7, scale =
“width”) + geom\_point(aes(color = AgeGroup), position =
position\_jitter(width = 0.2, height = 0), alpha = 0.5) +
scale\_fill\_manual(values = c(“Male” = “blue”, “Female” = “red”)) +
scale\_color\_manual(values = c( “20-29” = “gray”, “30-39” = “green”,
“40-49” = “pink”, “50-59” = “purple” )) + labs( title = “Mobile Device
Usage for Different Models”, subtitle = “Median app usage time
differentiated by gender and age groups”, x = “Device Model”, y =
“Median App Usage Time (hours/day)”, fill = “Gender”, color = “Age
Group” ) + theme\_minimal()

device\_user\_counts &lt;- data %&gt;% group\_by(Device.Model) %&gt;%
summarize(UserCount = n())

ggplot(device\_user\_counts, aes(x = Device.Model, y = UserCount)) +
geom\_col(fill = “lightblue”) + geom\_text(aes(label = UserCount), vjust
= -0.5) + labs( title = “User Count per Device Model”, x = “Device
Model”, y = “User Count” ) + theme\_minimal()

\#Result ![visualization\_Result](https://imgur.com/jni0RwO)
![Visualization\_Result2](https://imgur.com/DOEtSmH)