I decided to try ChainerMN and built an environment on AWS, so I will keep a record of the work.
AWS p2 instances are not cheap, so you want to finish building the environment quickly.
- How to install CUDA on Ubuntu 16.04
- Unofficial tips for people who have trouble installing Chainer 1.5
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install linux-generic
$ sudo apt-get install build-essential
$ vi .bashrc #Added the following two lines
export LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
export CPATH="/usr/local/include"
Go to the CUDA Toolkit Download page, select Linux, x86_64, Ubuntu, 16.04, deb (network), copy the download link, and run the following:
$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda nvidia-367
$ sudo reboot
$ sudo apt-get autoremove
$ rm cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ vi .bashrc #Added the following 4 lines
export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export CPATH="$CUDA_HOME/include:$CPATH"
Log in again.
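After logging in again, it may be worth confirming that the new settings took effect. The following is only a minimal sketch and not part of the original procedure; `report_tool` is a hypothetical helper name, and it only checks whether the commands are reachable via PATH, not whether the GPU actually works.

```shell
# Hypothetical helper: report whether a command is reachable via PATH.
report_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "found: $1 ($path)"
    else
        echo "missing: $1"
    fi
}

# nvcc ships with the CUDA toolkit, nvidia-smi with the driver package.
report_tool nvcc
report_tool nvidia-smi
echo "CUDA_HOME=${CUDA_HOME:-(not set)}"
```

If nvcc is reported as missing, the PATH line in .bashrc is the first thing I would check.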
From the cuDNN Download page, download "cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0" > "cuDNN v5.1 Library for Linux" and upload it to the AWS instance.
$ tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp -a cuda/lib64/* $CUDA_HOME/lib64/
$ sudo cp -a cuda/include/* $CUDA_HOME/include/
$ sudo ldconfig
$ rm -rf cuda cudnn-8.0-linux-x64-v5.1.tgz
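To confirm that the cuDNN header was copied into place, one option is to read the version macros from cudnn.h. This is a sketch under the assumption that the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on `#define` lines, as the v5.1 header does; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL read from a cudnn.h header.
cudnn_version() {
    header="$1"
    if [ ! -f "$header" ]; then
        echo "missing: $header"
        return 0
    fi
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Falls back to the path used in this article if CUDA_HOME is not set.
cudnn_version "${CUDA_HOME:-/usr/local/cuda-8.0}/include/cudnn.h"
```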
When I ran
$ sudo ldconfig
in the previous section, I got the following messages:
/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link
so I recreated the symbolic links as follows.
$ sudo mv /usr/lib/nvidia-375/libEGL.so.1 /usr/lib/nvidia-375/libEGL.so.1.org
$ sudo mv /usr/lib32/nvidia-375/libEGL.so.1 /usr/lib32/nvidia-375/libEGL.so.1.org
$ sudo unlink /usr/lib/nvidia-375/libEGL.so
$ sudo ln -s /usr/lib/nvidia-375/libEGL.so.375.66 /usr/lib/nvidia-375/libEGL.so
$ sudo unlink /usr/lib32/nvidia-375/libEGL.so
$ sudo ln -s /usr/lib32/nvidia-375/libEGL.so.375.66 /usr/lib32/nvidia-375/libEGL.so
$ sudo ldconfig
I am not sure this is the right fix, but it works for the time being.
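To see what the libEGL entries look like before or after relinking, something like the following could be used. It is only a sketch: `link_status` is a hypothetical helper, and the paths are the ones from the messages above, so they will differ for other driver versions.

```shell
# Hypothetical helper: describe what kind of file a path is.
link_status() {
    if [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -e "$1" ]; then
        echo "regular file or directory"
    else
        echo "missing"
    fi
}

# Paths taken from the ldconfig messages in this article.
for f in /usr/lib/nvidia-375/libEGL.so /usr/lib/nvidia-375/libEGL.so.1; do
    echo "$f: $(link_status "$f")"
done
```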
$ sudo vi /etc/default/grub #Changed line 12 to the following
GRUB_CMDLINE_LINUX="systemd.unit=multi-user.target"
$ sudo update-grub
$ sudo reboot
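After the reboot, the GRUB change can be checked by looking at the kernel command line. The sketch below assumes a Linux system that exposes /proc/cmdline; `has_boot_param` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the kernel command line string ($1)
# contains the exact parameter ($2) as a whitespace-separated word.
has_boot_param() {
    for word in $1; do
        if [ "$word" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

if has_boot_param "$(cat /proc/cmdline 2>/dev/null)" "systemd.unit=multi-user.target"; then
    echo "booted with systemd.unit=multi-user.target"
else
    echo "systemd.unit=multi-user.target not found on the kernel command line"
fi
```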
Copy the Open MPI download link from the "Open MPI: Open Source High Performance Computing" site and run the following:
$ wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.1.tar.bz2
$ tar jxvf openmpi-2.1.1.tar.bz2
$ cd openmpi-2.1.1
$ ./configure --with-cuda
$ make -j4
$ sudo make install
$ cd
$ rm -rf openmpi-2.1.1 openmpi-2.1.1.tar.bz2
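To check whether the build really picked up --with-cuda, the Open MPI documentation suggests looking for mpi_built_with_cuda_support in the output of ompi_info. The following sketch wraps that check; `mpi_cuda_support` is a hypothetical helper that only parses the text it is given.

```shell
# Hypothetical helper: read `ompi_info --parsable --all` output on stdin and
# print the value of mpi_built_with_cuda_support, or "unknown" if absent.
mpi_cuda_support() {
    awk -F: '
        /mpi_built_with_cuda_support:value:/ { v = $NF }
        END { print (v == "" ? "unknown" : v) }
    '
}

if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --parsable --all | mpi_cuda_support
else
    echo "ompi_info not found on PATH"
fi
```

A result of "true" would suggest that CUDA support was compiled in.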
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ make CUDA_HOME=/usr/local/cuda-8.0
$ sudo mkdir /usr/local/nccl
$ sudo make PREFIX=/usr/local/nccl install
$ cd
$ rm -rf nccl
$ vi .bashrc #Added the following 4 lines
export NCCL_ROOT="/usr/local/nccl"
export CPATH="$NCCL_ROOT/include:$CPATH"
export LD_LIBRARY_PATH="$NCCL_ROOT/lib/:$LD_LIBRARY_PATH"
export LIBRARY_PATH="$NCCL_ROOT/lib/:$LIBRARY_PATH"
Log in again.
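Before moving on to the Python packages, it may help to confirm that the NCCL header and library ended up under the install prefix. This is a sketch that assumes the layout include/nccl.h and lib/libnccl.so* under the prefix; `check_nccl` is a hypothetical helper name.

```shell
# Hypothetical helper: check a NCCL install prefix for the header and library.
check_nccl() {
    root="$1"
    status="ok"
    [ -f "$root/include/nccl.h" ] || status="incomplete"
    ls "$root"/lib/libnccl.so* >/dev/null 2>&1 || status="incomplete"
    echo "$status"
}

# Falls back to the prefix used in this article if NCCL_ROOT is not set.
check_nccl "${NCCL_ROOT:-/usr/local/nccl}"
```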
$ sudo apt-get install python3-pip
$ sudo pip3 install --upgrade pip
$ pip3 install --user pillow h5py chainer==1.24.0
$ pip3 install --user cython
$ pip3 install --user chainermn
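A quick way to see whether the packages installed above can be imported is sketched below. `py_module_status` is a hypothetical helper; it only reports whether `import` succeeds and says nothing about whether GPU or MPI support actually works.

```shell
# Hypothetical helper: report whether python3 can import the given module.
py_module_status() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
    elif python3 -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "import failed: $1"
    fi
}

for m in cython chainer chainermn; do
    py_module_status "$m"
done
```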
- Cause: Cython did not pick up LD_LIBRARY_PATH and CPATH correctly
According to the reference "Unofficial tips for people who have trouble installing Chainer 1.5", LD_LIBRARY_PATH and CPATH must be set before installing Cython. Also note that if you run pip with sudo, your environment variables are not inherited by root, so install with --user instead.
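The effect of a scrubbed environment can be illustrated without sudo by using `env -i`, which starts a command with an empty environment. This is only an analogy: what sudo actually keeps or drops depends on the sudoers configuration, and the CPATH value here is just an example.

```shell
# With `env -i` the child shell starts with an empty environment, so the
# variable set for the parent command is not visible to it.
scrubbed=$(CPATH=/usr/local/include env -i /bin/sh -c 'echo "${CPATH:-unset}"')

# Without `env -i` the child shell inherits the variable as usual.
inherited=$(CPATH=/usr/local/include /bin/sh -c 'echo "${CPATH:-unset}"')

echo "scrubbed environment:  CPATH=$scrubbed"
echo "inherited environment: CPATH=$inherited"
```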