

Andy Feng edited this page Feb 6, 2017 · 5 revisions

Create TensorFlowOnSpark (TFoS) AMI on EC2

This tutorial outlines the steps to create a TFoS AMI on AWS EC2 from a p2.xlarge instance running Ubuntu Server 16.04.

A prebuilt AMI is available for you to use; see Get Started on EC2.

1. Launch an Ubuntu Server Instance

Launch an Ubuntu Server 16.04 LTS (HVM) AMI on a p2.xlarge instance in Amazon EC2. The root partition needs at least 16 GiB of storage.

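  1. Go to https://eu-west-1.console.aws.amazon.com/console and switch to the region you will use (key pairs are per region, so it must match the region of the key pair you create in step 2)
  2. Select EC2
  3. Select Spot Requests, then Request Spot Instances
  4. Specify the Ubuntu Server 16.04 LTS (HVM) AMI, the p2.xlarge instance type, and a 16 GiB root volume
  5. Specify the spot max price
  6. Wait for the instance to enter the running state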
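The console steps above can also be scripted with the AWS CLI. The sketch below assumes the AWS CLI is installed and configured; the helper `write_spot_spec`, the file name `spot-spec.json`, the AMI ID, the spot price, and the gp2 volume type are hypothetical choices, not part of this tutorial. It writes a launch specification for a p2.xlarge whose root volume (`/dev/sda1` on Ubuntu HVM AMIs) is 16 GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an EC2 launch specification for a p2.xlarge
# spot instance whose root volume (/dev/sda1 on Ubuntu HVM AMIs) is 16 GiB.
# Usage: write_spot_spec <ami-id> <key-name> <output-file>
write_spot_spec() {
  local ami_id="$1" key_name="$2" out="$3"
  cat > "$out" <<EOF
{
  "ImageId": "${ami_id}",
  "KeyName": "${key_name}",
  "InstanceType": "p2.xlarge",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": { "VolumeSize": 16, "VolumeType": "gp2", "DeleteOnTermination": true }
    }
  ]
}
EOF
}

# Example (the AMI ID and spot price are placeholders you must choose):
#   write_spot_spec ami-xxxxxxxx "ec2_${USER}" spot-spec.json
#   aws ec2 request-spot-instances --region us-west-2 --spot-price "0.30" \
#     --instance-count 1 --type one-time \
#     --launch-specification file://spot-spec.json
```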

2. SSH into Your Instance

Follow the AWS instructions (http://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-keypairs.html) to create a key pair. Here is an example using the legacy EC2 API tools:

export EC2_KEY=ec2_${USER}
export EC2_PEM_FILE=~/.ssh/ec2_${USER}.pem
ec2-create-keypair -O ${AWS_ACCESS_KEY_ID} -W ${AWS_SECRET_ACCESS_KEY} --region us-west-2 ${EC2_KEY}
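emacs ${EC2_PEM_FILE}  # paste the private key printed above, from -----BEGIN RSA PRIVATE KEY----- through -----END RSA PRIVATE KEY-----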
chmod 600 ${EC2_PEM_FILE}
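The linked AWS CLI guide uses `aws ec2 create-key-pair` rather than the legacy `ec2-create-keypair` tool. A minimal sketch, assuming the AWS CLI is installed and configured with your credentials; the helper `create_keypair_pem` is hypothetical. Writing only the `KeyMaterial` field avoids hand-editing the PEM file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an EC2 key pair with the AWS CLI and save only
# the private key material to a PEM file that only its owner can read.
# Usage: create_keypair_pem <key-name> <pem-file> [region]
create_keypair_pem() {
  local key_name="$1" pem="$2" region="${3:-us-west-2}"
  # umask 077 inside a subshell: the PEM file is created with mode 600
  # without changing the caller's umask.
  if ! ( umask 077
         aws ec2 create-key-pair --region "$region" --key-name "$key_name" \
           --query 'KeyMaterial' --output text > "$pem" ); then
    rm -f "$pem"   # do not leave an empty or partial key file behind
    return 1
  fi
  chmod 600 "$pem"
}

# Example, reusing the variables defined above:
#   create_keypair_pem "${EC2_KEY}" "${EC2_PEM_FILE}" us-west-2
```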

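SSH into your instance, where <MASTER> is its public DNS name. Ubuntu AMIs accept logins as the ubuntu user, not root; because later steps assume /root as the home directory, switch to root once logged in:

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${EC2_PEM_FILE} ubuntu@<MASTER>
sudo -i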

3. Installing dependencies & build tools

sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget pkg-config zip g++ zlib1g-dev libcurl3-dev
sudo pip install -U pip
sudo apt install yum
sudo pip install werkzeug

4. Installing CUDA 8

wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
rm cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda
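A quick sanity check after installing CUDA, sketched under the assumption that the toolkit records its version in `/usr/local/cuda/version.txt` as a line such as `CUDA Version 8.0.44`; the helper name `cuda_version` is hypothetical. Running `nvidia-smi` should also list the instance's Tesla K80; if it reports that it cannot communicate with the driver, rebooting the instance may be needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CUDA toolkit version from version.txt,
# assuming a line of the form "CUDA Version 8.0.44".
# Usage: cuda_version [path-to-version.txt]
cuda_version() {
  local file="${1:-/usr/local/cuda/version.txt}"
  awk '/^CUDA Version/ { print $3; exit }' "$file"
}

# Example on the instance:
#   cuda_version     # the version of the installed toolkit
#   nvidia-smi       # should list the Tesla K80 GPU
```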

5. Installing cuDNN

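Downloading cuDNN requires logging in to the NVIDIA developer site, so you can't fetch the files with wget. Download the two cuDNN 5.1.10 packages for CUDA 8.0 installed below (the runtime library and the developer package) from NVIDIA and copy them to your AWS instance, for example with scp.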

sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb
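sudo mkdir -p /usr/lib/x86_64-linux-gnu/include
# TensorFlow's configure step (below) is pointed at /usr/lib/x86_64-linux-gnu for cuDNN, so place a copy of cudnn.h under it
sudo cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/include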
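To confirm which cuDNN version is installed, and therefore what to enter at the TensorFlow configure prompt later, you can read the version macros from `cudnn.h`. A minimal sketch; the helper `cudnn_version` is hypothetical and assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`, as cuDNN 5 headers do. For the packages above it should print 5.1.10.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cuDNN version as MAJOR.MINOR.PATCHLEVEL by
# reading the #define lines in cudnn.h.
# Usage: cudnn_version [path-to-cudnn.h]
cudnn_version() {
  local header="${1:-/usr/include/cudnn.h}"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") printf "%s.%s.%s\n", major, minor, patch }
  ' "$header"
}

# Example: check the copy made for TensorFlow's configure step
#   cudnn_version /usr/lib/x86_64-linux-gnu/include/cudnn.h
```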

6. Configure the Environment

Add the following lines to your ~/.bash_profile file, then run source ~/.bash_profile (or log in again) so they take effect:

export CUDA_ROOT=/usr/local/cuda 
export CUDA_HOME=$CUDA_ROOT 
export PATH=$PATH:$CUDA_ROOT/bin 
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64:$CUDA_ROOT/extras/CUPTI/lib64
export TFoS_HOME=/root/TensorFlowOnSpark
export HADOOP_HOME=/root/ephemeral-hdfs
export SPARK_HOME=/root/spark
export PATH=${PATH}:${HADOOP_HOME}/bin:${SPARK_HOME}/bin
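If you run this setup more than once, appending the exports blindly will duplicate them. A small sketch of an idempotent alternative; `append_once` is a hypothetical helper, and each line is passed in single quotes so variables such as `$CUDA_ROOT` are written literally and expanded only when the profile is sourced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is not
# already there, so re-running the setup does not duplicate exports.
# Usage: append_once '<line>' <file>
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x: match the whole line, -F: treat it as a fixed string, not a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $CUDA_ROOT literal in the profile):
#   append_once 'export CUDA_ROOT=/usr/local/cuda' ~/.bash_profile
#   append_once 'export CUDA_HOME=$CUDA_ROOT' ~/.bash_profile
#   source ~/.bash_profile
```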

7. Installing Java 8

sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update 
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer
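To check that the installer left Java 8 as the active `java`, a minimal sketch; `java_is_8` is a hypothetical helper that relies on `java -version` printing its version string (beginning with `1.8.` for Java 8) to stderr, as Oracle and OpenJDK 8 builds do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the active java reports version 1.8.
# `java -version` writes its version string to stderr, hence 2>&1.
java_is_8() {
  java -version 2>&1 | head -n 1 | grep -q '"1\.8\.'
}

# Example:
#   java_is_8 && echo "Java 8 is active" || echo "Java 8 is not the default java"
```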

8. Installing Bazel

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install -y bazel
sudo apt-get upgrade bazel
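A given TensorFlow checkout may not build with every Bazel release, so it helps to record which version apt installed. A sketch, assuming `bazel version` prints a line of the form `Build label: <version>`; `bazel_release` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installed Bazel release, assuming
# `bazel version` prints a line of the form "Build label: <version>".
bazel_release() {
  bazel version 2>/dev/null | awk -F': ' '/^Build label:/ { print $2; exit }'
}

# Example:
#   bazel_release
```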

9. Building and Installing TensorFlow

Fetch the TFoS source code (TFoS_HOME above assumes it is cloned into /root):

git clone --recurse-submodules https://github.com/yahoo/TensorFlowOnSpark.git
cd TensorFlowOnSpark
git submodule init
git submodule update --force
git submodule foreach --recursive git clean -dfx

Configure TensorFlow:

cd tensorflow 
TF_UNOFFICIAL_SETTING=1 ./configure

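You can accept the defaults for almost all prompts. Answer the following ones as shown (the Tesla K80 GPU in a p2.xlarge has compute capability 3.7):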

  • Hadoop File System support? [y/N] y
  • CUDA support? [y/N] y
  • CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
  • Cudnn version you want to use. [Leave empty to use system default]: 5.1.10
  • location where cuDNN 5.1.10 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
  • compute capability of your device [Default is: "3.5,5.2"]: 3.7
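The configure answers above can also be supplied non-interactively. This is a sketch under a loud assumption: that the `configure` script in your TensorFlow checkout reads these environment variables (`TF_NEED_HDFS`, `TF_NEED_CUDA`, `TF_CUDA_VERSION`, `CUDA_TOOLKIT_PATH`, `TF_CUDNN_VERSION`, `CUDNN_INSTALL_PATH`, `TF_CUDA_COMPUTE_CAPABILITIES`). Check the script before relying on this; a variable it does not read is simply ignored and the question is asked interactively. `tf_configure_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pre-answer the configure prompts listed above.
# ASSUMPTION: your TensorFlow checkout's ./configure reads these variables;
# verify against the script, since unread variables are silently ignored.
tf_configure_env() {
  export TF_NEED_HDFS=1                                # Hadoop File System support
  export TF_NEED_CUDA=1                                # CUDA support
  export TF_CUDA_VERSION=8.0                           # CUDA SDK version
  export CUDA_TOOLKIT_PATH=/usr/local/cuda
  export TF_CUDNN_VERSION=5.1.10                       # cuDNN version
  export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu  # cuDNN location
  export TF_CUDA_COMPUTE_CAPABILITIES=3.7              # Tesla K80 (p2.xlarge)
}

# Example, from the tensorflow directory:
#   tf_configure_env
#   TF_UNOFFICIAL_SETTING=1 ./configure
```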

Build TensorFlow (be patient; this can take several hours):

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install the resulting wheel (its file name depends on the TensorFlow version in the submodule):

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp27-cp27mu-linux_x86_64.whl
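The wheel's file name encodes the TensorFlow version and Python ABI, so it changes whenever the submodule does. A sketch that locates the single wheel produced by `build_pip_package` instead of hard-coding its name; `find_tf_wheel` is a hypothetical helper, written for bash because it holds the glob results in an array.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the single TensorFlow wheel produced by
# build_pip_package, so the install step does not hard-code the version.
# Usage: find_tf_wheel [package-dir]
find_tf_wheel() {
  local dir="${1:-/tmp/tensorflow_pkg}"
  local wheels=( "$dir"/tensorflow-*.whl )
  # Without nullglob, an unmatched glob stays literal, so -e catches "no wheel".
  if [ ! -e "${wheels[0]}" ] || [ "${#wheels[@]}" -ne 1 ]; then
    echo "expected exactly one wheel in $dir" >&2
    return 1
  fi
  printf '%s\n' "${wheels[0]}"
}

# Example:
#   sudo pip install "$(find_tf_wheel)"
```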

Test TensorFlow:

pushd ${TFoS_HOME}/tensorflow
python tensorflow/examples/tutorials/mnist/mnist_with_summaries.py --data_dir ${TFoS_HOME}/mnist

10. Download tensorflow-hadoop-1.0-SNAPSHOT.jar for the TensorFlow Record reader

11. Create AMI image

Clear the Bazel cache and shell history, then exit from your terminal session:

rm -rf /root/.cache/bazel/
cat /dev/null > ~/.bash_history && history -c && exit

Use Amazon EC2 console to create an AMI image from your instance: Actions -> Image -> Create Image.
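The console action above has an AWS CLI equivalent, which you run from your own machine after exiting the instance. A minimal sketch, assuming a configured AWS CLI and that you know the instance ID; `create_tfos_ami`, the name prefix, and the description are hypothetical. By default EC2 reboots the instance while creating the image so that the file system is captured in a consistent state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an AMI from the prepared instance with the AWS
# CLI (the equivalent of Actions -> Image -> Create Image) and print its ID.
# Usage: create_tfos_ami <instance-id> [region]
create_tfos_ami() {
  local instance_id="$1" region="${2:-us-west-2}"
  local name="TFoS-$(date +%Y%m%d-%H%M%S)"   # AMI names must be unique per region
  aws ec2 create-image --region "$region" --instance-id "$instance_id" \
    --name "$name" \
    --description "TensorFlowOnSpark AMI (Ubuntu 16.04, CUDA 8, cuDNN 5.1)" \
    --query 'ImageId' --output text
}

# Example (the instance ID is a placeholder):
#   create_tfos_ami i-0123456789abcdef0 us-west-2
```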
