My Working Repository for R. It was a great learning experience to be able to configure this repository and connect it to RStudio.
Contribution on 12th May 2022
My first Python script using reticulate
Contribution on 10th May 2022
Tidy Tuesday Week 18 Data Analysis and Plotting - Use of {ggtext}
Contribution on 05th May 2022
Started a new data visualization project on Andhra Pradesh Health Insurance Data
Contribution on 03rd May 2022
Answered a few R4DS Slack questions ....
Contribution on 02nd May 2022
Answered a few R4DS Slack questions ....
Contribution on 29th April 2022
What anonymous functions are and how they can be used inline in a pipe to build and add transformations ... All of it can be found in crosswords.R
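As a minimal sketch of the idea (not the code in crosswords.R), the example below uses anonymous functions inside a base-pipe chain. The `clues` vector and the variable names are made up for illustration, and the `\(x)` shorthand and `|>` assume R 4.1 or newer.

```r
# Hypothetical clue data; the real data is handled in crosswords.R
clues <- c("Small dog (3)", "River in Egypt (4)", "Large feline (5)")

# Anonymous function passed to sapply() as one step of a pipe
clue_lengths <- clues |>
  sapply(\(x) nchar(x), USE.NAMES = FALSE)

# An anonymous function can also be a pipe step by itself;
# the base pipe needs it wrapped in parentheses and then called with ()
answer_lengths <- clues |>
  (\(x) as.integer(gsub(".*\\(([0-9]+)\\)$", "\\1", x)))()

print(clue_lengths)
print(answer_lengths)
```

The second form is handy when a one-off transformation does not deserve a named function.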
Some old plots were removed and combined into a faceted plot of answers and clues for both the Big Dave and Times data sets.
Happy Learning !!!!
Contribution on 25th April 2022
Tidy Tuesday Week 16 - 1st plot updated. 2nd plot created with the time data frame. Used the {tidytext} package to tokenize the words in the clues and visualize the distribution of the most frequently used words. Also a major learning on how to create word clouds with the {wordcloud2} package.
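The tokenize-and-count step can be sketched in base R as below. This is a stand-in for the tidy text workflow, not the code used for the plot, and the `clues` vector is invented for illustration.

```r
# Hypothetical clues standing in for the crossword data
clues <- c("Small dog barks", "Dog chases small cat", "Cat sleeps")

# Tokenize: lower-case the text, then split on anything that is not a letter
words <- unlist(strsplit(tolower(clues), "[^a-z]+"))
words <- words[nzchar(words)]  # drop empty tokens

# Count each word and sort with the most frequent first
word_counts <- sort(table(words), decreasing = TRUE)
print(head(word_counts, 3))
```

A word and frequency table of this shape is the kind of summary a bar chart or word cloud is drawn from.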
Good learning ...
Contribution on 17th April 2022
I have managed to update my R and RStudio installation after almost a year. Hurrah !!! I have managed to install all the essential packages as well, and I am up and running again with a new version of R and its base pipe operator.
Hence I updated the Readme as well ...
Contribution on 13th June 2021
Hierarchical clustering as a method of unsupervised learning is demonstrated in the R Markdown script hierarchical_clustering.rmd. This is an applied exercise from the book ISLR and performs hierarchical clustering on the USArrests data set. Dendrogram plots are shown for the hierarchical clusters. Some interesting observations come out of this notebook when the clustering is done on both a scaled and an un-scaled version of the data. The plots are also interesting and depict the differences between the two approaches.
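A minimal sketch of the scaled versus un-scaled comparison, using only base R, is shown below. It is not the notebook itself: complete linkage and the cut at three clusters are assumptions chosen for illustration.

```r
# USArrests ships with base R (datasets package)
hc_raw    <- hclust(dist(USArrests), method = "complete")
hc_scaled <- hclust(dist(scale(USArrests)), method = "complete")

# Cut both trees into three clusters (three is an illustrative choice)
clusters_raw    <- cutree(hc_raw, k = 3)
clusters_scaled <- cutree(hc_scaled, k = 3)

# Cross-tabulate to see how scaling changes cluster membership
print(table(raw = clusters_raw, scaled = clusters_scaled))

# Dendrograms side by side
op <- par(mfrow = c(1, 2))
plot(hc_raw, main = "Unscaled", cex = 0.6)
plot(hc_scaled, main = "Scaled", cex = 0.6)
par(op)
```

Without scaling, the variable with the largest numeric range dominates the Euclidean distances, which is why the two dendrograms differ.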
Lots of learning to do but so little a lifespan we have !!!
Contribution on 05th June 2021
Data set Courtesy :- Kaggle
Data Set Link :- https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction
Health Insurance Cross Sell Predictions have been added today. The script insurance_cross_sell.R uses the Kaggle Health Insurance Cross Sell data set and applies the XGBoost algorithm to predict whether a customer can be cross-sold a health insurance plan based on a set of predictors. As the response variable is a bit skewed (imbalanced), and based on feedback from the Kaggle community, the data has been balanced by under-sampling with the ovun.sample function from the ROSE package. This exercise can be a good tutorial on how to:
- Use the R xgboost package
- Use the ROSE package for handling imbalanced data sets
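To show what under-sampling does, here is a base R sketch on made-up data. It is plain random under-sampling written by hand, not a call to ovun.sample and not the code in insurance_cross_sell.R; the `insurance` data frame and its columns are hypothetical.

```r
set.seed(42)

# Hypothetical imbalanced data: roughly 10% positive responses
n <- 1000
insurance <- data.frame(
  age      = sample(20:80, n, replace = TRUE),
  response = rbinom(n, 1, 0.1)
)

# Split row indices by class
idx_pos <- which(insurance$response == 1)
idx_neg <- which(insurance$response == 0)

# Keep every minority row and an equally sized random sample of majority rows
idx_keep <- c(idx_pos, sample(idx_neg, length(idx_pos)))
balanced <- insurance[idx_keep, ]

print(table(insurance$response))  # before: heavily skewed
print(table(balanced$response))   # after: equal class counts
```

The balanced data frame is what would then be passed on to model training.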
Contribution on 22nd May 2021
Mobile Price Classification using a Kaggle data set, applying Support Vector Machines with different non-linear kernels: radial, polynomial and sigmoid. The script gives a comparative study of the training and test error rates of these kernels. Random Forest is also applied as a reference model, to see how the performance of the Support Vector Machine models compares with it on the same training and test data. There are some plots with ggplot2 as well.
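The kernel comparison can be sketched as below. This assumes the e1071 package is installed and uses the built-in iris data as a stand-in for the mobile price data, so the split and the variable names are illustrative only.

```r
library(e1071)  # provides svm(); assumed to be installed

set.seed(1)

# iris stands in for the mobile price data (assumption)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

kernels <- c("radial", "polynomial", "sigmoid")

# Fit one SVM per kernel and record training and test error rates
errors <- sapply(kernels, function(k) {
  fit <- svm(Species ~ ., data = train, kernel = k)
  c(train = mean(predict(fit, train) != train$Species),
    test  = mean(predict(fit, test)  != test$Species))
})

print(errors)
```

Each column of `errors` is one kernel, so the training and test rows can be compared directly or reshaped for a ggplot2 chart.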
Keep watching this space for some good stuff on R !!!
Thanks !!!
Contribution on 22nd July 2023
Added a program that lists the installed packages and then uses that list to reinstall them after upgrading to a new R version. The script was added in the my_rough_copy.R file.
The auto-install code for R packages has now been moved into a new file, auto_install_r_packages.R
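The general idea can be sketched in base R as below. This is not the contents of auto_install_r_packages.R; the file name `installed_packages.rds` and the variable names are hypothetical.

```r
# Hypothetical location for the saved list; tempdir() keeps the sketch
# self-contained, a permanent path is needed to survive an R upgrade
pkg_file <- file.path(tempdir(), "installed_packages.rds")

# Before upgrading R: save the names of all installed packages
old_pkgs <- rownames(installed.packages())
saveRDS(old_pkgs, pkg_file)

# After upgrading R: work out which saved packages are missing
saved_pkgs   <- readRDS(pkg_file)
missing_pkgs <- setdiff(saved_pkgs, rownames(installed.packages()))

# Install only what is missing (nothing, if run in the same session)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Installing only the difference avoids re-downloading packages that already exist in the new library.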