Create_AMI
This tutorial outlines the steps to create a TensorFlowOnSpark (TFoS) AMI on AWS EC2, using a p2.xlarge instance running Ubuntu Server 16.04.
A prebuilt AMI image is available for you to use. See Get Started on EC2.
We launch an Ubuntu Server 16.04 LTS (HVM) AMI on a p2.xlarge instance in Amazon EC2. The root partition needs at least 16 GiB of storage.
- Go to https://eu-west-1.console.aws.amazon.com/console
- Select EC2
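- Open Spot Requests and choose Request Spot Instances
- Specify the Ubuntu Server 16.04 LTS (HVM) AMI, the p2.xlarge instance type, and a root volume of at least 16 GiB
- Specify your maximum spot price
- Wait for the instance to enter the running state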
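The console steps above can also be scripted with the AWS CLI. The sketch below assumes the AWS CLI is installed and configured; make_launch_spec is a hypothetical helper, and the AMI ID, key name, and price in the example are placeholders. It prints a launch specification for a p2.xlarge instance with a 16 GiB root volume on /dev/sda1, the root device name Ubuntu AMIs use.

```bash
# Hypothetical helper: print an EC2 launch specification (JSON) for a
# p2.xlarge instance with a 16 GiB gp2 root volume.
# Usage: make_launch_spec <ami-id> <key-name>
make_launch_spec() {
  local ami_id="$1" key_name="$2"
  cat <<EOF
{
  "ImageId": "${ami_id}",
  "InstanceType": "p2.xlarge",
  "KeyName": "${key_name}",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": { "VolumeSize": 16, "VolumeType": "gp2", "DeleteOnTermination": true }
    }
  ]
}
EOF
}

# Example (placeholder AMI ID, key name, and price; use your own region):
#   make_launch_spec ami-xxxxxxxx my-key > spec.json
#   aws ec2 request-spot-instances --region eu-west-1 --spot-price "0.50" \
#       --instance-count 1 --type one-time \
#       --launch-specification file://spec.json
```

Writing the spec to a file keeps the JSON readable and lets you reuse it for later spot requests.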
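Follow the [AWS instructions](http://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-keypairs.html) to create a key pair. Key pairs are region-specific, so create it in the region where you launch the instance (the example below uses us-west-2). ec2-create-keypair prints the private key; save it to ${EC2_PEM_FILE} (here with emacs) and make the file readable only by you. Here is an example: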
export EC2_KEY=ec2_${USER}
export EC2_PEM_FILE=~/.ssh/ec2_${USER}.pem
ec2-create-keypair -O ${AWS_ACCESS_KEY_ID} -W ${AWS_SECRET_ACCESS_KEY} --region us-west-2 ${EC2_KEY}
emacs ${EC2_PEM_FILE}
chmod 600 ${EC2_PEM_FILE}
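ssh refuses a private key that group or other users can read. A minimal check you can run before connecting; check_pem_perms is a hypothetical helper and relies on GNU stat (Linux).

```bash
# Hypothetical helper: succeed only if the key file exists and grants no
# permissions to group or others (e.g. mode 600 or 400), as ssh requires.
# Uses GNU stat (-c '%a' prints the octal mode, e.g. "600").
check_pem_perms() {
  local pem="$1" mode
  if [ ! -f "$pem" ]; then
    echo "missing key file: $pem" >&2
    return 1
  fi
  mode=$(stat -c '%a' "$pem")
  case "$mode" in
    *00) return 0 ;;   # last two digits are group and other: both must be 0
    *)
      echo "$pem has mode $mode; run: chmod 600 $pem" >&2
      return 1
      ;;
  esac
}

# Example:
#   check_pem_perms "${EC2_PEM_FILE}"
```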
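SSH into your instance as the ubuntu user, since Ubuntu AMIs do not accept root logins, then switch to root, because the paths used below (for example /root/TensorFlowOnSpark) assume root's home directory. Replace <MASTER> with the instance's public DNS name:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${EC2_PEM_FILE} ubuntu@<MASTER>
sudo -i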
Install the build dependencies:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget pkg-config zip g++ zlib1g-dev libcurl3-dev
sudo pip install -U pip
sudo apt install yum
sudo pip install werkzeug
Install CUDA 8.0:
wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
rm cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda
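Before moving on, it can help to confirm that the installed toolkit matches the CUDA version you will give ./configure (8.0). A small sketch; cuda_release is a hypothetical helper that extracts the release number from nvcc --version output.

```bash
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# CUDA release number, e.g. "8.0". Prints nothing if no release is found.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (nvcc is not on PATH until $CUDA_ROOT/bin is added to it):
#   /usr/local/cuda/bin/nvcc --version | cuda_release
```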
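Downloading cuDNN requires logging in to the NVIDIA developer site, so wget cannot fetch the files. Download libcudnn5_5.1.10-1+cuda8.0_amd64.deb and libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb from NVIDIA, upload them to your AWS instance (for example with scp), and install them. The last two commands copy cudnn.h under /usr/lib/x86_64-linux-gnu, the cuDNN location given to ./configure below: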
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb
sudo mkdir /usr/lib/x86_64-linux-gnu/include
sudo cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/include
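To confirm that the header you just copied is the version you will give ./configure (5.1.10), you can read its version macros. A sketch using a hypothetical helper, cudnn_version, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as cuDNN 5 does.

```bash
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h header.
# Usage: cudnn_version [header]   (default: /usr/include/cudnn.h)
cudnn_version() {
  local header="${1:-/usr/include/cudnn.h}"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      # Fail if any of the three macros was not found.
      if (major == "" || minor == "" || patch == "") exit 1
      print major "." minor "." patch
    }
  ' "$header"
}

# Example:
#   cudnn_version /usr/lib/x86_64-linux-gnu/include/cudnn.h
```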
Add the following lines to your ~/.bash_profile file, then reload it with: source ~/.bash_profile
export CUDA_ROOT=/usr/local/cuda
export CUDA_HOME=$CUDA_ROOT
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64:$CUDA_ROOT/extras/CUPTI/lib64
export TFoS_HOME=/root/TensorFlowOnSpark
export HADOOP_HOME=/root/ephemeral-hdfs
export SPARK_HOME=/root/spark
export PATH=${PATH}:${HADOOP_HOME}/bin:${SPARK_HOME}/bin
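Once the file is sourced, a quick sanity check can catch typos in these paths. A sketch of a hypothetical helper, check_env_dirs, that reports each named variable that is unset or does not point to an existing directory. TFoS_HOME is created by the clone step later, and HADOOP_HOME and SPARK_HOME by whatever installs Hadoop and Spark, so they may be reported as missing until then.

```bash
# Hypothetical helper: report each named variable that is unset or does not
# point to an existing directory. Returns 1 if any check fails.
# Usage: check_env_dirs VAR_NAME...
check_env_dirs() {
  local status=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable called $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
#   check_env_dirs CUDA_ROOT CUDA_HOME TFoS_HOME HADOOP_HOME SPARK_HOME
```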
Install Oracle Java 8 and Bazel:
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel
Fetch the TFoS source code from /root, so that the checkout matches TFoS_HOME (/root/TensorFlowOnSpark):
git clone --recurse-submodules https://github.com/yahoo/TensorFlowOnSpark.git
cd TensorFlowOnSpark
git submodule init
git submodule update --force
git submodule foreach --recursive git clean -dfx
Configure TensorFlow:
cd tensorflow
TF_UNOFFICIAL_SETTING=1 ./configure
You can accept the defaults for almost all prompts; answer the following ones as shown:
- Hadoop File System support? [y/N] y
- CUDA support? [y/N] y
- CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
- Cudnn version you want to use. [Leave empty to use system default]: 5.1.10
- location where cuDNN 5.1.10 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
- compute capability of your device [Default is: "3.5,5.2"]: 3.7
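The same answers can likely be supplied non-interactively. A sketch, assuming the configure script in this TensorFlow revision reads the environment variables below before prompting (TensorFlow's configure scripts of that era did; verify against ./configure in your checkout) and still prompts for anything left unset. tf_configure_env is a hypothetical helper.

```bash
# Sketch (assumed variable names; check ./configure in your checkout):
# export the answers listed above so ./configure can skip those prompts.
tf_configure_env() {
  export TF_NEED_HDFS=1                                # Hadoop File System support
  export TF_NEED_CUDA=1                                # CUDA support
  export TF_CUDA_VERSION=8.0                           # CUDA SDK version
  export CUDA_TOOLKIT_PATH=/usr/local/cuda             # CUDA install location
  export TF_CUDNN_VERSION=5.1.10                       # cuDNN version
  export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu  # cuDNN location
  export TF_CUDA_COMPUTE_CAPABILITIES=3.7              # Tesla K80 on p2.xlarge
}

# Example:
#   tf_configure_env
#   TF_UNOFFICIAL_SETTING=1 ./configure
```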
Build TensorFlow (be patient: this can take several hours):
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install the resulting wheel:
sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp27-cp27mu-linux_x86_64.whl
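The wheel name on the line above is tied to TensorFlow 0.12.1 and Python 2.7 (cp27mu); if your checkout builds a different version, the name changes. A sketch of a hypothetical helper, find_tf_wheel, that locates the single wheel written by build_pip_package instead of hardcoding its name.

```bash
# Hypothetical helper: print the single tensorflow-*.whl in a directory,
# or fail if there are none or more than one.
# Usage: find_tf_wheel [dir]   (default: /tmp/tensorflow_pkg)
find_tf_wheel() {
  local dir="${1:-/tmp/tensorflow_pkg}" count=0 found="" whl
  for whl in "$dir"/tensorflow-*.whl; do
    # With no match the glob stays literal, so skip non-existent paths.
    [ -e "$whl" ] || continue
    count=$((count + 1))
    found="$whl"
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one tensorflow wheel in $dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Example:
#   whl=$(find_tf_wheel /tmp/tensorflow_pkg) && sudo pip install "$whl"
```

Failing on more than one match avoids silently installing a stale wheel left over from an earlier build.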
Test TensorFlow:
pushd ${TFoS_HOME}/tensorflow
python tensorflow/examples/tutorials/mnist/mnist_with_summaries.py --data_dir ${TFoS_HOME}/mnist
Build the TensorFlow Hadoop InputFormat/OutputFormat for TFRecords:
- git clone https://github.com/tensorflow/ecosystem.git
- follow the build instructions in ecosystem/hadoop to generate tensorflow-hadoop-1.0-SNAPSHOT.jar
Clean up and exit the terminal:
rm -rf /root/.cache/bazel/
cat /dev/null > ~/.bash_history && history -c && exit
Use the Amazon EC2 console to create an AMI from your instance: Actions -> Image -> Create Image.