ahmad-u/tutorials-for-data-scientists

 
 


Please note: The code in these repos is sourced from the DataRobot user community and is not owned or maintained by DataRobot, Inc. You may need to make edits or updates for this code to function properly in your environment.

Tutorials for Data Scientists

This repository contains various end-to-end use case examples using the DataRobot API. Each use case directory contains instructions for its own use.

A simple example to get you started can be found here. The example can also be executed through Google Colab.

Additional code examples can be accessed from the following locations:

Usage

For each respective guide, follow the instructions in its own .ipynb or .Rmd file.

Please pay attention to the different DataRobot API Endpoints.

The API endpoint you specify for accessing DataRobot is dependent on the deployment environment, as follows:

The DataRobot API Endpoint is used to connect your IDE to DataRobot.
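
As a rough illustration of the point above, the sketch below assembles an endpoint URL and token into settings that could be handed to the client. It assumes the endpoint has the form `https://<host>/api/v2`, as in the managed cloud address `https://app.datarobot.com/api/v2`. The helper names `build_endpoint` and `client_settings` and the `DATAROBOT_HOST` variable are hypothetical, introduced only for this example; check the address for your own environment before relying on it.

```python
import os


def build_endpoint(host: str) -> str:
    """Return an API endpoint of the assumed form https://<host>/api/v2."""
    host = host.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "https://" + host
    if host.endswith("/api/v2"):
        return host
    return host + "/api/v2"


def client_settings(env=None) -> dict:
    """Collect endpoint and token from environment-style settings.

    DATAROBOT_HOST is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "endpoint": build_endpoint(env.get("DATAROBOT_HOST", "app.datarobot.com")),
        "token": env.get("DATAROBOT_API_TOKEN", ""),
    }


# With the datarobot package installed, these values could be passed on as
# dr.Client(endpoint=settings["endpoint"], token=settings["token"]).
settings = client_settings({"DATAROBOT_HOST": "app.datarobot.com",
                            "DATAROBOT_API_TOKEN": "YOUR_API_TOKEN"})
```

Keeping the host in configuration rather than in each notebook makes it easier to move a guide between deployment environments.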

Important Links

Contents

Classification

  • Lead Scoring for selling online courses: Predict who is likely to become a customer using a binary classification strategy. Create a custom feature list. Get the ROC Curve, Feature Impact, and Feature Effects, and plot them for analysis. Retrain your model and make predictions. Python

  • Predict Hospital Readmissions: Predict which patients are likely to be readmitted within 30 days after being discharged by using binary classification. Install the software, find your API token, choose the best model, get the evaluation metrics, and make predictions. Python R

  • Predict COVID-19 at the County Level: Predict high-risk counties with a look-alike modeling strategy. Build a binary classification model and rank each county by the probability of seeing cases. Set up the project, get evaluation and interpretability metrics, plot results, and get prediction explanations. Python

  • Predict Medical Fraud: Predict fraudulent medical claims with binary classification. Connect to a SQL database, create a data store, write custom functions to build multiple projects, conduct anomaly detection, and deploy the model using the prediction server. Save the results for a custom dashboard. Python

  • Lead Scoring Bank Marketing: Predict which customers are likely to purchase a product or service in response to a bank telemarketing campaign. Upload data, create a project, and get and plot the ROC Curve and Feature Impact. Get the holdout predictions. R
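
Several of the classification guides above retrieve and plot the ROC Curve. As a reminder of what those points represent, here is a minimal pure-Python sketch that sweeps a threshold over predicted scores and computes the area under the curve with the trapezoid rule. It illustrates the concept only and is not how DataRobot computes the curve; the function names `roc_points` and `auc` are made up for this example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    labels are 0/1; scores are predicted probabilities of the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Rows that share a score move across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
area = auc(curve)
```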

DRU

API Training: The DataRobot API Training is targeted at data scientists and motivated individuals with at least basic coding skills who want to take automation with DataRobot to the next level. Python R

Here you will be able to learn how to use the DataRobot API through a series of exercises that will challenge you, and teach you how to solve some of the most common problems that people run into.

Start by carefully reading the "API Training - Introductory Notebook" Python or R. This will help you learn the basics and give you a concrete overview of the API. Afterwards, go to the /Exercises folder and start downloading and solving the exercises.

The list of exercises is as follows:

  • Exercise 1. Feature Selection Curves Python R

  • Exercise 2. Advanced Feature Manipulation Python R

  • Exercise 3. Model Documentation Python R

  • Exercise 4. Beyond AutoPilot Python R

  • Exercise 5. Model Factory Python R

  • Exercise 6. Continuous Model Training Python R

  • Exercise 7. Using a Database Python R

Model Factories

  • Classification Model Factory: Create a model factory for a binary classification problem using our readmissions dataset. Predict the likelihood of patient readmission. Build a single project and find the best model. Then, build more projects based on admission ID. Find the best model for each subproject. Make this model ready for deployment. Python R

  • Time Series Model Factory: Create a time series model factory using our store sales multiseries dataset. Set up a time series multiseries project. Get the best model and its performance. Cluster the data and create plots over time. Create a project for each cluster and evaluate the results. Python R
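
The model factory pattern described above comes down to splitting the data by a key and running the same training step once per subset. The sketch below shows that loop in plain Python. `train_fn` stands in for whatever builds a project and returns its best model, and `run_factory`, `split_by_key`, and `min_rows` are hypothetical names for this example rather than anything taken from the notebooks.

```python
from collections import defaultdict


def split_by_key(rows, key):
    """Group a list of dict rows by the value stored under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def run_factory(rows, key, train_fn, min_rows=2):
    """Call train_fn once per group, skipping groups that are too small."""
    results = {}
    for value, subset in split_by_key(rows, key).items():
        if len(subset) < min_rows:
            continue  # too little data to train on
        results[value] = train_fn(subset)
    return results


def base_rate(subset):
    """Stand-in for a real training step: the share of positive rows."""
    return sum(r["readmitted"] for r in subset) / len(subset)


rows = [
    {"admission_id": "A", "readmitted": 1},
    {"admission_id": "A", "readmitted": 0},
    {"admission_id": "B", "readmitted": 1},
    {"admission_id": "B", "readmitted": 1},
    {"admission_id": "C", "readmitted": 0},
]
per_group = run_factory(rows, "admission_id", base_rate)
```

In a real factory, `train_fn` would create one project per subset and return the chosen model, so the result maps each key value to its own model.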

Model Management

  • Automated Retraining and Replacement of Models: Automatically retrain and replace models with this automated continuous training pipeline. Python/cURL

  • Monitoring Drift and Replacing Models: Monitor your deployment for data drift and replace the model once a criterion is met. Connect to a SQL server and create a data store. Create a project based on the data source. Deploy the recommended model and set up drift tracking settings. Upload and make predictions on a dataset with drift. Check the drift results and replace the model. Python
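
The drift guide above replaces a model once a criterion is met, but the criterion itself is not spelled out here. As an assumption for illustration, the sketch below uses the Population Stability Index (PSI), one common drift statistic, with a rule-of-thumb threshold of 0.2. The names `psi` and `should_replace`, the bin edges, and the threshold are all choices made for this example.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    `edges` are sorted bin boundaries; eps avoids log(0) for empty bins.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        return [max(c / len(values), eps) for c in counts]

    exp_frac = fractions(expected)
    act_frac = fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_frac, act_frac))


def should_replace(psi_value, threshold=0.2):
    """Assumed criterion: flag the model when PSI exceeds the threshold."""
    return psi_value > threshold


training = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
scoring = [0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 0.85, 0.1]
drift = psi(training, scoring, edges=[0.5])
```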

Multiclass

  • Multiclass one-vs-rest Modeling: Create a one-vs-rest model to do geophysical classification with nine potential classes. Preprocess the data and split up the dataset. Use a loop to build nine projects and put the results into a DataFrame. Then, get the predictions and plot them with an advanced visualization technique. Python

  • Predicting Product Type Based on Customer Complaints: Use the free text from customer complaints to predict which product the customers are addressing. Python
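
The one-vs-rest approach mentioned above turns a single multiclass target into one binary target per class, then picks the class whose binary model is most confident. A minimal sketch of both steps follows, with hypothetical function names and toy labels. The notebook itself builds one project per class, which this example does not attempt to reproduce.

```python
def one_vs_rest_targets(labels):
    """Map each class to a 0/1 target marking rows of that class."""
    return {c: [1 if y == c else 0 for y in labels]
            for c in sorted(set(labels))}


def combine_predictions(prob_by_class):
    """Pick, for every row, the class with the highest predicted probability.

    prob_by_class maps class -> list of per-row probabilities from the
    binary model trained for that class.
    """
    classes = list(prob_by_class)
    n_rows = len(prob_by_class[classes[0]])
    return [max(classes, key=lambda c: prob_by_class[c][i])
            for i in range(n_rows)]


targets = one_vs_rest_targets(["shale", "sand", "shale", "lime"])
predicted = combine_predictions({
    "lime": [0.1, 0.7],
    "sand": [0.3, 0.6],
    "shale": [0.9, 0.2],
})
```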

Out of Time Validation (OTV)

  • Predict CO2 levels of Mauna Loa: Create an OTV project to predict CO2 levels. This project trains on older data and then validates on newer data. This strategy is used because the data is known to change over time. Import your data, create lagged features, define date-time partitioning, select a model, and get Feature Impact. Python
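
Two steps from the OTV guide above, creating lagged features and validating on the most recent data, can be sketched without any libraries. The function names, the lag choices, and the validation fraction below are assumptions for illustration. In the actual project, date-time partitioning is configured in DataRobot rather than done by hand.

```python
def add_lags(series, lags):
    """Build rows with the current value as target and earlier values as lags."""
    rows = []
    for t in range(max(lags), len(series)):
        row = {"target": series[t]}
        for k in lags:
            row[f"lag_{k}"] = series[t - k]
        rows.append(row)
    return rows


def out_of_time_split(rows, validation_fraction=0.25):
    """Train on the oldest rows and validate on the newest ones.

    Rows must already be in time order; no shuffling is done.
    """
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rows = add_lags(series, lags=[1, 2])
train, valid = out_of_time_split(rows)
```

Because the split is by position in time rather than at random, every validation row is newer than every training row, which is the property OTV relies on.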

Regression

  • Double Pendulum with Eureqa Models: Solve a regression problem using Eureqa blueprints. Eureqa makes no prior assumptions about the dataset, instead fitting models to the data dynamically. The models are presented as mathematical equations, so end users can readily understand results and recommendations. Set up a manual mode project and select Eureqa blueprints from the repository. Apply advanced tuning to the default model and print the mathematical expression. Python

  • Analyzing Residuals to Build Better Models: Use residuals created by DataRobot insights to evaluate your models and make them better. Python
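
For readers new to residual analysis, a residual is the actual value minus the predicted value. The sketch below computes residuals and a few summary figures in plain Python. It is only a reminder of the arithmetic, with hypothetical function names, and does not reproduce the DataRobot insight the notebook works with.

```python
import math


def residuals(actual, predicted):
    """Residual = actual - predicted, row by row."""
    return [a - p for a, p in zip(actual, predicted)]


def summarize(res):
    """Mean residual (bias), RMSE, and the index of the largest miss."""
    n = len(res)
    return {
        "mean": sum(res) / n,
        "rmse": math.sqrt(sum(r * r for r in res) / n),
        "worst_row": max(range(n), key=lambda i: abs(res[i])),
    }


res = residuals([3, 5, 7], [2, 5, 10])
summary = summarize(res)
```

A mean residual far from zero suggests systematic over- or under-prediction, while `worst_row` points at the rows worth inspecting first.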

Time Series

  • Forecasting US COVID-19 Cases Using Time-Series: Create an AutoTS model on historical data taken from the US, France, and Spain. Clean and prepare the data. Create the time series project and build models. Forecast 10 days ahead for each country and write the results to a CSV file. R
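
The last step of the guide above, writing a multi-day forecast per country to CSV, might look like the following in Python. The forecast itself is replaced by a naive last-value baseline so the example runs on its own. That baseline is a stand-in and not the AutoTS model; the column names and helper names are assumptions.

```python
import csv
import io
from datetime import date, timedelta


def naive_forecast(history, horizon=10):
    """Stand-in forecast: repeat the last observed value for `horizon` days.

    history is a list of (date, value) pairs sorted by date.
    """
    last_day, last_value = history[-1]
    return [(last_day + timedelta(days=h), last_value)
            for h in range(1, horizon + 1)]


def forecasts_to_csv(history_by_series, horizon=10):
    """Return CSV text with one forecast row per series and future day."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["series_id", "date", "forecast"])
    for series_id in sorted(history_by_series):
        for day, value in naive_forecast(history_by_series[series_id], horizon):
            writer.writerow([series_id, day.isoformat(), value])
    return buffer.getvalue()


history = {
    "US": [(date(2020, 4, 1), 100), (date(2020, 4, 2), 120)],
    "France": [(date(2020, 4, 2), 50)],
}
csv_text = forecasts_to_csv(history, horizon=2)
```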

VisualAI

  • VisualAI Heartbeats: Create a Visual AI project to classify images of sound. Heartbeats of people with normal and atypical heart conditions were recorded onto WAV files. This code shows you how to create spectrograms from the recordings and import them into DataRobot for Visual AI classification. Python

  • Detecting Droids with DataRobot: Create a Visual AI project to classify images of droids and create a custom Shiny application. Build file paths to images and set up folders for Visual AI. Import the data into the platform and create image classification models. Get evaluation metrics and plot them with ggplot. Create a deployment using the prediction server. Make a Shiny app that calls the deployment. R

  • Visual AI Oxford Pets: Create a Visual AI project to classify dog breeds! Python
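
The Visual AI guides above prepare image folders before importing them. One way to package such folders is a ZIP archive with one folder per class, sketched below with the standard library. The one-folder-per-class layout, the accepted extensions, and the function name `build_image_archive` are assumptions for this example, so confirm the expected layout against the notebooks before using it.

```python
import zipfile
from pathlib import Path


def build_image_archive(image_root, archive_path,
                        extensions=(".png", ".jpg", ".jpeg")):
    """Zip images as <class_name>/<file_name> and return the stored names.

    image_root is expected to contain one sub-folder per class.
    """
    image_root = Path(image_root)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        class_dirs = sorted(p for p in image_root.iterdir() if p.is_dir())
        for class_dir in class_dirs:
            for image in sorted(class_dir.iterdir()):
                if image.suffix.lower() not in extensions:
                    continue  # skip notes, metadata, and other non-images
                name = f"{class_dir.name}/{image.name}"
                archive.write(image, name)
                added.append(name)
    return added
```

A call such as `build_image_archive("images", "images.zip")` would then produce a single file to upload as the project's source data.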

Anomaly Detection (Unsupervised Learning)

  • Anti-Money Laundering with Outlier Detection: Create an unsupervised model that can flag money-laundering related transactions. Use a small set of labeled data to evaluate how the different models perform. Python
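
When only a small labeled set is available, as in the guide above, one simple way to compare unsupervised models is precision at k: of the k transactions with the highest anomaly scores, what share are labeled as suspicious? The sketch below is a generic illustration with made-up scores. The notebook may use a different evaluation, and the function name is hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Share of true positives among the k highest-scoring rows.

    scores are anomaly scores (higher means more anomalous); labels are 0/1.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of rows")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


labels = [1, 0, 0, 1]
model_a = precision_at_k([0.9, 0.1, 0.8, 0.2], labels, k=2)
model_b = precision_at_k([0.9, 0.1, 0.2, 0.8], labels, k=2)
```

Here `model_b` ranks both labeled cases at the top, so it scores higher than `model_a` on this toy data.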

Feature Discovery

  • Feature Discovery with Instacart Dataset: An example of how to use Feature Discovery through the Python API. Python

Development and Contributing

If you'd like to report an issue or bug, suggest improvements, or contribute code to this project, please refer to CONTRIBUTING.md.

Code of Conduct

This project has adopted the Contributor Covenant for its Code of Conduct. See CODE_OF_CONDUCT.md to read it in full.

License

Licensed under the Apache License 2.0. See LICENSE to read it in full.
