-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SunKyoung Moon Solution for Kickbones1 project #329
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sun,
thanks for the submission. Looks good but your Rmarkdown is not valid and thus you are not generating (and linking) the resulting markdown (including its figures) but a separately uploaded image... please correct (see comments).
Best,
Martin
#Loading Data | ||
url <- "https://raw.githubusercontent.com/Dr-Eberle-Zentrum/Data-projects-with-R-and-GitHub/refs/heads/main/Projects/SunKyoung%20Moon/kaggle_dataset.csv" | ||
|
||
data <- read.csv(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the data is within the same folder:
- set the working directory to the script file's location
- load the data directly from your local file
|
||
#Result | ||
|
||
![visualization_Result](https://imgur.com/jni0RwO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont upload files somewhere else... they should be generated within the Rmd script and subsequently added to the github repo..!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so please:
- knit the Rmd file within Rstudio (will generate a md file and respective png files in a subfolder
- commit the md and png files along with the markdown changes
|
||
# Summarize age groups for scatter plot | ||
data <- data %>% | ||
mutate(AgeGroup = case_when( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this up to the pipeline after line 20... dont split your data wrangling
head(data) | ||
|
||
# Convert app usage time from minutes/day to hours/day | ||
data <- data %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommendation: dont overwrite your data! split rawdata and modified one, in case you need two data sets. but typically, it is not needed
device_user_counts <- data %>% | ||
group_by(Device.Model) %>% | ||
summarize(UserCount = n()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont store the data in a variable just for subsequent visualization. directly pipe from data via your changes into the ggplot call.. that way, less things can go wrong!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and you have less variables.. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file name wise: dont put "first trial", "version X", etc. into file names.. this is why we are using a version control system like git to manage such meta information via its history or versioning. please rename!
output: md_document | ||
date: "2024-12-03" | ||
--- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are missing a "code block start and end" using three backticks.. .please double check with the rmarkdown chapter from the beginning of the course!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sun,
your graph looks great! I just have two remarks:
- You've computed the user count for each device model and created a separat graph with that information. It would be nice if you could add the user count into the violin charts at the corresponding device names. Additionally please sort the different charts corresponding to the user count in a descending order. That being said the x-axis should look something like this:
----------------------------------------------------------------------------------------->
Iphone 12 Xiaomi Mi 11 Google Pixel 5 OnePlus 9 Samsung Galaxy S21
146 Users 146 Users 142 Users 133 Users 133 Users
- As of now you display a scatterplot for the age without the differantiation of the gender. This leads to the points being between the violin charts of male and female. It would be nice if you could change your code in such a way that the scatterplot points appear inside the corresponding charts.
2. chage the variable name of data (redundancy problem) 3. add the user count into the violin charts at the corresponding device names 4. scatterplot points appear inside the corresponding charts
Thanks for good feed-backs. Here is the version 2 link |
@kickbones1 project solution is here
link)