You naturally need serious computational resources for deep learning, and when your own PC isn't up to it, quickly spinning up an EC2 GPU instance is a good option.
The framework is Keras with the GPU build of TensorFlow as the backend, set up inside a pyenv virtual environment. TensorFlow 1.0, released the other day, apparently ships tf.keras, but I haven't tried it yet, so I will use plain Keras as usual.
For the environment setup I referred to this article: Run TensorFlow on AWS GPU instance
EC2 instance: Ubuntu Server 16.04 LTS (HVM), SSD Volume Type (from Quick Start)
Instance type: g2.2xlarge
Storage etc.: standard configuration (15 GB memory, 8 GB storage)
First, log in with SSH
ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
CUDA and the related packages are quite large, so the default disk runs out of free space partway through the work. Create a symbolic link so that /tmp points to /mnt/tmp, using the instance's ephemeral storage as the work area.
sudo mkdir /mnt/tmp
sudo chmod 777 /mnt/tmp
sudo rm -rf /tmp
sudo ln -s /mnt/tmp /tmp
cd /tmp
sudo apt-get update
sudo apt-get upgrade -y
After the upgrade, locale-related warnings become annoying, so set the locale:
sudo apt-get install language-pack-ja
sudo update-locale LANG=ja_JP.UTF-8
Install Python and the packages needed for the rest of the build.
sudo apt-get install python
sudo apt-get install -y build-essential python-pip python-dev git python-numpy swig python-dev default-jdk zip zlib1g-dev ipython
Blacklist the nouveau driver so it does not conflict with the NVIDIA driver, rebuild the initramfs, and reboot.
echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot
After rebooting, log in and install linux-image-extra-virtual.
ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
sudo apt-get install -y linux-image-extra-virtual
sudo reboot
After rebooting, log in and install linux-headers.
ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
sudo apt-get install -y linux-source linux-headers-`uname -r`
Next, install CUDA. The latest version at the moment is 8.0; download and install it:
cd /tmp
wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda_8.0.44_linux-run
chmod +x cuda_8.0.44_linux-run
./cuda_8.0.44_linux-run -extract=`pwd`/nvidia_installers
cd nvidia_installers/
sudo ./NVIDIA-Linux-x86_64-367.48.run
#Select Accept. The kernel setup progress bar takes a while even after it reaches 100%
#Select OK
#Select OK
#Select Yes
#Select OK
sudo modprobe nvidia
sudo ./cuda-linux64-rel-8.0.44-21122537.run
#The README is displayed; press q to exit it
#Enter accept
# install Path: default
# shortcut?: default
Download cuDNN from https://developer.nvidia.com/cudnn to your local machine first; it cannot be downloaded without signing up for the NVIDIA Developer program, so there is no way around doing it locally.
Then transfer the downloaded cuDNN archive from your local terminal to the EC2 instance:
scp -i [ec2key].pem cudnn-8.0-linux-x64-v5.1.tgz ubuntu@[Instance IP]:/tmp
Back on the EC2 instance, extract it and copy the files into the CUDA directories:
cd /tmp
tar -xzf cudnn-8.0-linux-x64-v5.1.tgz
sudo mv ./cuda/lib64/* /usr/local/cuda/lib64/
sudo mv ./cuda/include/* /usr/local/cuda/include/
Add the following to ~/.bashrc:
.bashrc
# cuDNN
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
source ~/.bashrc
python -V
Python 2.7.12
The system Python is 2.7, so build a Python 3 environment with pyenv. First install the build dependencies, then clone pyenv and pyenv-virtualenv.
sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
Create ~/.bash_profile and write the following:
.bash_profile
# pyenv
export PYENV_ROOT=$HOME/.pyenv
export PATH=$PYENV_ROOT/bin:$PATH
eval "$(pyenv init -)"
# virtualenv
eval "$(pyenv virtualenv-init -)"
export PYENV_VIRTUALENV_DISABLE_PROMPT=1
source ~/.bash_profile
Create a virtual environment for Keras
pyenv install 3.5.3
pyenv virtualenv 3.5.3 keras
pyenv activate keras
python -V
Python 3.5.3
pip install tensorflow-gpu
Check the TensorFlow version; if the output looks like the following, the CUDA libraries are loading and the GPU can be used.
python -c 'import tensorflow as tf; print(tf.__version__)'
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
1.0.0
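To check more explicitly that TensorFlow can see the GPU, you can also list the local devices (tensorflow.python.client.device_lib is a semi-internal module, but it works on TensorFlow 1.0 as far as I know; the expected output below is my assumption, not from the log above):
python -c 'from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])'
#You should see something like ['/cpu:0', '/gpu:0'] if the GPU is recognized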
pip install pillow
pip install h5py
pip install matplotlib
pip install keras
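After installing Keras, a quick sanity check that it picked up the TensorFlow backend is useful (the backend can also be switched later in ~/.keras/keras.json):
python -c 'from keras import backend as K; print(K.backend())'
#Should print: tensorflow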
cd /tmp
git clone https://github.com/fchollet/keras.git
cd keras/examples
To start with, try the example that solves MNIST with a CNN.
python mnist_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
60000/60000 [==============================] - 13s - loss: 0.3770 - acc: 0.8839 - val_loss: 0.0932 - val_acc: 0.9709
Epoch 2/12
60000/60000 [==============================] - 11s - loss: 0.1363 - acc: 0.9603 - val_loss: 0.0632 - val_acc: 0.9801
Epoch 3/12
60000/60000 [==============================] - 11s - loss: 0.1064 - acc: 0.9687 - val_loss: 0.0509 - val_acc: 0.9835
Epoch 4/12
60000/60000 [==============================] - 11s - loss: 0.0900 - acc: 0.9736 - val_loss: 0.0443 - val_acc: 0.9857
Epoch 5/12
60000/60000 [==============================] - 11s - loss: 0.0769 - acc: 0.9775 - val_loss: 0.0405 - val_acc: 0.9865
Epoch 6/12
60000/60000 [==============================] - 11s - loss: 0.0689 - acc: 0.9795 - val_loss: 0.0371 - val_acc: 0.9870
Epoch 7/12
60000/60000 [==============================] - 11s - loss: 0.0649 - acc: 0.9803 - val_loss: 0.0361 - val_acc: 0.9881
Epoch 8/12
60000/60000 [==============================] - 11s - loss: 0.0594 - acc: 0.9823 - val_loss: 0.0356 - val_acc: 0.9886
Epoch 9/12
60000/60000 [==============================] - 11s - loss: 0.0547 - acc: 0.9841 - val_loss: 0.0321 - val_acc: 0.9889
Epoch 10/12
60000/60000 [==============================] - 11s - loss: 0.0525 - acc: 0.9841 - val_loss: 0.0320 - val_acc: 0.9889
Epoch 11/12
60000/60000 [==============================] - 11s - loss: 0.0506 - acc: 0.9850 - val_loss: 0.0323 - val_acc: 0.9892
Epoch 12/12
60000/60000 [==============================] - 11s - loss: 0.0471 - acc: 0.9856 - val_loss: 0.0314 - val_acc: 0.9897
Test score: 0.0314083654978
Test accuracy: 0.9897
Execution time: 2 min 23 s (excluding data download time). The same script took about 35 minutes on my MacBook Air, so this is more than a 10x speedup, at roughly 10 seconds per epoch.
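For reference, the model that mnist_cnn.py trains is roughly the following. This is a sketch from memory against the Keras 1.x API installed above, not the script verbatim, so the exact filter counts and parameters may differ.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D

#Two 3x3 convolutions, max pooling and dropout, then a small dense classifier
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='valid', input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
model.summary()
The input shape (28, 28, 1) matches the X_train shape in the log above.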
I also tried the IMDB CNN-LSTM example, which looked heavier.
python imdb_cnn_lstm.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 100)
X_test shape: (25000, 100)
Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
30/25000 [..............................] - ETA: 1397s - loss: 0.6936 - acc: 0.4333I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3811 get requests, put_count=2890 evicted_count=1000 eviction_rate=0.346021 and unsatisfied allocation rate=0.530307
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
360/25000 [..............................] - ETA: 160s - loss: 0.6935 - acc: 0.4833I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2156 get requests, put_count=2374 evicted_count=1000 eviction_rate=0.42123 and unsatisfied allocation rate=0.373377
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
870/25000 [>.............................] - ETA: 94s - loss: 0.6925 - acc: 0.5287I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4249 get requests, put_count=4491 evicted_count=1000 eviction_rate=0.222668 and unsatisfied allocation rate=0.192281
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 655 to 720
25000/25000 [==============================] - 63s - loss: 0.3815 - acc: 0.8210 - val_loss: 0.3519 - val_acc: 0.8456
Epoch 2/2
25000/25000 [==============================] - 60s - loss: 0.1970 - acc: 0.9238 - val_loss: 0.3471 - val_acc: 0.8534
24990/25000 [============================>.] - ETA: 0sTest score: 0.347144101623
Test accuracy: 0.853440059948
Execution time: 2 min 25 s (excluding data download time). On my MacBook Air it took more than 40 minutes, so the difference is overwhelming.
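Again for reference, imdb_cnn_lstm.py puts a 1D convolution and pooling in front of an LSTM so the recurrent part sees a shorter sequence. The sketch below is from memory against the Keras 1.x API; max_features, the filter sizes and the LSTM width are illustrative, not necessarily the script's actual values.
from keras.models import Sequential
from keras.layers import Embedding, Dropout, Dense, Activation
from keras.layers import Convolution1D, MaxPooling1D, LSTM

max_features = 20000   #vocabulary size (assumed)
maxlen = 100           #matches the padded length in the log above

#Embedding -> 1D conv + pooling to shorten the sequence -> LSTM -> binary sentiment output
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(Dropout(0.25))
model.add(Convolution1D(nb_filter=64, filter_length=5, border_mode='valid', activation='relu'))
model.add(MaxPooling1D(pool_length=4))
model.add(LSTM(70))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()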
Carrying on, let's try something that looks even heavier. mnist_acgan.py solves MNIST with something called an ACGAN (Auxiliary Classifier Generative Adversarial Network), apparently a relative of DCGAN. The details are described in the linked paper, but they are involved, so I will leave them for later. Being a GAN, it should be heavy in any case. Let's see how it goes.
python mnist_acgan.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Epoch 1 of 50
0/600 [..............................] - ETA: 0sI tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 3.74GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
599/600 [============================>.] - ETA: 1s
Testing for epoch 1:
component | loss | generation_loss | auxiliary_loss
-----------------------------------------------------------------
generator (train) | 3.88 | 1.49 | 2.39
generator (test) | 3.36 | 1.04 | 2.32
discriminator (train) | 2.11 | 0.53 | 1.58
discriminator (test) | 2.12 | 0.70 | 1.43
Epoch 2 of 50
44/600 [=>............................] - ETA: 732s
One epoch took about 15 minutes, with 48 epochs still to go. That would have taken forever and cost accordingly, so I stopped partway through. Still, leave it running for half a day and you would get results, which is impressive.
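As background on why each epoch is so expensive: the heart of mnist_acgan.py, and of GAN training in general, is an alternating update in which every batch first trains the discriminator on real and generated samples and then trains the generator through a frozen discriminator. The toy sketch below shows only that pattern; it uses throwaway dense models and random data, and it omits the auxiliary class output that the "AC" in ACGAN adds, so it is not the actual script.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

latent_dim, batch = 10, 32

#Stand-in generator and discriminator (the real script uses conv nets)
generator = Sequential()
generator.add(Dense(784, input_dim=latent_dim, activation='sigmoid'))
generator.compile(loss='binary_crossentropy', optimizer='adam')

discriminator = Sequential()
discriminator.add(Dense(1, input_dim=784, activation='sigmoid'))
discriminator.compile(loss='binary_crossentropy', optimizer='adam')

#Stack generator + frozen discriminator so generator gradients can flow through it
discriminator.trainable = False
combined = Sequential()
combined.add(generator)
combined.add(discriminator)
combined.compile(loss='binary_crossentropy', optimizer='adam')

real = np.random.rand(batch, 784)                      #stand-in for a batch of real MNIST images
noise = np.random.uniform(-1, 1, (batch, latent_dim))
fake = generator.predict(noise)

#1) discriminator step: real vs. generated samples
x = np.concatenate([real, fake])
y = np.concatenate([np.ones(batch), np.zeros(batch)])
d_loss = discriminator.train_on_batch(x, y)

#2) generator step: try to make the frozen discriminator answer "real"
g_loss = combined.train_on_batch(noise, np.ones(batch))
print(d_loss, g_loss)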
df -k
Filesystem 1K-blocks Used Available Use% Mounted on
udev 7679880 0 7679880 0% /dev
tmpfs 1539900 8800 1531100 1% /run
/dev/xvda1 8117828 6149060 1533492 81% /
tmpfs 7699496 0 7699496 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 7699496 0 7699496 0% /sys/fs/cgroup
/dev/xvdb 66946696 3017852 60521484 5% /mnt
tmpfs 1539904 0 1539904 0% /run/user/1000
Looking at df -k, system disk usage is at 81%. Ephemeral storage is wiped when the instance is stopped, so if the default 8 GB worries you, it is better to follow the procedure for expanding the EBS volume.
I took a snapshot of the instance I had built, created an AMI from it, and launched a new g2.2xlarge instance to see whether it works properly.
pyenv activate keras
python -V
Python 3.5.3
python -c 'import tensorflow as tf; print(tf.__version__)'
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 72, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
TensorFlow throws an error. I'm not sure of the cause, but it looks CUDA-related (libcudart.so.8.0 cannot be found). Whether this environment can be cleanly turned into an AMI needs more investigation later. If you only want to use it intermittently, it may be better to just stop the instance and keep it.
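My guess, and it is only a guess, is that the dynamic linker cannot find the CUDA runtime on the new instance, for example because the LD_LIBRARY_PATH setting from ~/.bashrc is not in effect. A quick way to check from Python:
python -c 'import ctypes; ctypes.CDLL("libcudart.so.8.0")'
#If this raises OSError, the loader cannot find the CUDA runtime; re-check LD_LIBRARY_PATH and /usr/local/cuda/lib64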
Building the Keras environment above takes less than an hour. Stopping the instance and keeping it around is fine, but rebuilding from scratch whenever you want to use it is not much trouble either; choose based on how often and how you use it. In any case, building a TensorFlow GPU environment is much easier now than it used to be.