Understand your code with Machine Learning

Workshop given at DevFest Nantes 2019.

Slides: on gDrive

OSS tools covered:

Abstract

Machine Learning on Source Code (MLonCode) is an emerging and exciting research domain which stands at the sweet spot between deep learning, natural language processing, social science, and programming.

During this two-hour workshop, we will show you how to extract insights from codebases, step by step, by shedding light on these crucial aspects:

What information is available in your code

How to extract this information

What you can do with this knowledge: which tasks MLonCode can solve

Which models can be used to solve them
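To make the first two points concrete, here is a minimal sketch of identifier extraction. It uses Python's built-in `ast` module as a stand-in, so it only handles Python source; the workshop itself relies on the source{d} stack for language-agnostic parsing. The function name `extract_identifiers` and the sample snippet are illustrative assumptions, not part of the workshop material.

```python
import ast
from collections import Counter


def extract_identifiers(source: str) -> Counter:
    """Count the identifiers found in a piece of Python source code."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            counts[node.name] += 1   # names of definitions
        elif isinstance(node, ast.arg):
            counts[node.arg] += 1    # function parameters
        elif isinstance(node, ast.Name):
            counts[node.id] += 1     # variables and called functions
        elif isinstance(node, ast.Attribute):
            counts[node.attr] += 1   # attribute and method names
    return counts


# Hypothetical sample input, used only for illustration.
SAMPLE = '''
def read_config(path):
    with open(path) as handle:
        return handle.read()
'''

identifiers = extract_identifiers(SAMPLE)
print(sorted(identifiers))
```

Identifier counts like these are one of the simplest signals available in code, and they are the kind of input the similarity and name-suggestion tasks listed in this abstract build on.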

To get our hands dirty, we will solve several example tasks, using source{d}, an open source stack to gain insights from codebases:

Suggest function names automatically

Cluster developers

Search projects by similarity

Prerequisites: a laptop with Docker installed. We will provide an image to all participants.

Prerequisites

Docker

Dependencies

Import Docker images (works offline):

docker load -i images/jupyter.tgz
docker load -i images/gitbase.tgz
docker load -i images/bblfshd-with-drivers.tgz

docker images

Run bblfsh

docker run \
    --detach \
    --rm \
    --name devfest_bblfshd \
    --privileged \
    --publish 9432:9432 \
    bblfsh/bblfshd:v2.15.0-drivers \
    --log-level DEBUG

Run gitbase

docker run \
    --detach \
    --rm \
    --name devfest_gitbase \
    --publish 3306:3306 \
    --link devfest_bblfshd:devfest_bblfshd \
    --env BBLFSH_ENDPOINT=devfest_bblfshd:9432 \
    --env MAX_MEMORY=1024 \
    --volume "$(pwd)/repos/git-data:/opt/repos" \
    srcd/gitbase:v0.24.0-rc2

Run the jupyter image

docker run \
    --rm \
    --name devfest_jupyter \
    --publish 8888:8888 \
    --link devfest_bblfshd:devfest_bblfshd \
    --link devfest_gitbase:devfest_gitbase \
    --volume "$(pwd)/notebooks:/devfest/notebooks" \
    --volume "$(pwd)/repos:/devfest/repos" \
    mloncode/devfest
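Once the three containers are started, it can help to confirm that their published ports accept connections before opening the notebooks. The following is a small sketch using only the Python standard library; the helper names `port_is_open` and `check_services` are hypothetical, and the host `127.0.0.1` assumes Docker publishes the ports on the local machine. An open port only shows that something is listening, not that the service is fully initialized.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports published by the three `docker run` commands above.
SERVICES = {"bblfshd": 9432, "gitbase": 3306, "jupyter": 8888}


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its published port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'up' if is_up else 'not reachable'}")
```

Run it from a second terminal, since the Jupyter container above is started in the foreground.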

With make

To build the workshop image and launch the three required containers:

make build-and-run

To launch only the three required containers:

make

Workflow

1. Download the data

We are going to use the top 50 repositories from the Apache Software Foundation throughout this workshop.

Notebook 1: data collection pipeline

2. Project and Developer Similarities

Build a vector model for projects and developers using Topic Modelling of code identifiers.
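As a rough illustration of the idea, the sketch below splits identifiers into subtokens, builds a bag of words per project, and compares projects with cosine similarity. This is a deliberately simplified stand-in: it works on raw token counts rather than a topic model, which would replace those counts with topic distributions. All names and the toy `projects` data are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Splits "parseHTTPRequest" into parse / HTTP / Request; digits and underscores are dropped.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier: str) -> list:
    """Split a camelCase or snake_case identifier into lowercase subtokens."""
    return [token.lower() for token in _SUBTOKEN.findall(identifier)]


def bag_of_words(identifiers) -> Counter:
    """Count subtokens over a collection of identifiers."""
    bag = Counter()
    for identifier in identifiers:
        bag.update(split_identifier(identifier))
    return bag


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data: hypothetical identifiers for three imaginary projects.
projects = {
    "http_client": ["sendRequest", "parseHTTPResponse", "request_timeout"],
    "http_server": ["handleRequest", "writeHTTPResponse", "server_socket"],
    "matrix_lib": ["multiplyMatrix", "transpose_matrix", "identityMatrix"],
}
bags = {name: bag_of_words(ids) for name, ids in projects.items()}

for other in ("http_server", "matrix_lib"):
    score = cosine_similarity(bags["http_client"], bags[other])
    print(f"http_client vs {other}: {score:.2f}")
```

The same vectors can describe developers instead of projects by grouping identifiers by the author who wrote them.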

Notebook 2: project and developer similarities

3. Function Name Suggestion

Train an NMT seq2seq model for predicting method names based on identifiers in method bodies.
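The data such a model consumes can be pictured as pairs of token sequences: the subtokens of the identifiers in a function body as the source, and the subtokens of the function name as the target. The sketch below only prepares such pairs and does not train anything. It uses Python's `ast` module as a stand-in parser, so it is limited to Python code, and body tokens come out in the traversal order of `ast.walk` rather than source order. The names `subtokens` and `training_pairs` are hypothetical.

```python
import ast
import re

# Same splitting rule for camelCase and snake_case identifiers.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def subtokens(identifier: str) -> list:
    """Split an identifier into lowercase subtokens."""
    return [token.lower() for token in _SUBTOKEN.findall(identifier)]


def training_pairs(source: str) -> list:
    """Return one (body subtokens, name subtokens) pair per function."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body_tokens = []
        for statement in node.body:
            for child in ast.walk(statement):
                if isinstance(child, ast.Name):
                    body_tokens.extend(subtokens(child.id))
                elif isinstance(child, ast.Attribute):
                    body_tokens.extend(subtokens(child.attr))
        # Source sequence: identifiers in the body. Target sequence: the name.
        pairs.append((body_tokens, subtokens(node.name)))
    return pairs


# Hypothetical sample function, used only for illustration.
SAMPLE = '''
def load_user_profile(user_id):
    record = database.fetch(user_id)
    return parse_profile(record)
'''

pairs = training_pairs(SAMPLE)
for body, name in pairs:
    print(" ".join(body), "->", " ".join(name))
```

Splitting names into subtokens keeps the vocabulary small and lets a model compose names it has never seen from familiar parts.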

Notebook 3: function name suggestion
