The other day I wrote an article, "xgboost (python) on EC2 spot instance environment is prepared by AWS Lambda". This is the Chainer version of it.

I wanted to be able to build the environment with a single button press or CLI command, so, in short, I used Lambda to automate the steps from the following article I wrote earlier.

Set up AWS EC2 g2.2xlarge with a spot instance and try running chainer http://qiita.com/pyr_revs/items/e1545e6f464b712517ed

What it does

1. Create an sh file that installs Chainer and save it to S3.
2. Request an EC2 Spot Instance, with UserData that installs the NVIDIA driver, CUDA, and the other dependencies.
3. When the instance launches, UserData installs the dependencies, then downloads the sh file created in step 1 from S3 and hands processing over to ec2-user.
4. The sh file, running with ec2-user privileges, installs Chainer.
5. To check that everything works, it runs Chainer's mnist example on the GPU.
6. When everything is done, it sends a notification to SNS.

What needs preparation / configuration

The IAM Role preparation, the Lambda settings, the values to edit in the code, and so on are almost the same as in the section linked below.

xgboost (python) on EC2 Spot instance environment is prepared by AWS Lambda # Where preparation / setting is required http://qiita.com/pyr_revs/items/4cc188a63eb9313cd232#%E6%BA%96%E5%82%99%E8%A8%AD%E5%AE%9A%E3%81%8C%E5%BF%85%E8%A6%81%E3%81%AA%E3%81%A8%E3%81%93%E3%82%8D
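
As a rough guide to what the two roles are likely to need, the sketch below builds policy documents from the AWS calls that the code in this article makes. It is an assumption-laden illustration, not the setup from the linked article: the function names are hypothetical, the account ID and ARNs are placeholders, logging permissions for Lambda are left out, and a real account may need additional actions.

```javascript
// Hypothetical helpers (not from the linked article): build IAM policy
// documents covering the AWS calls made by the code in this article.

// For the Lambda execution role: upload the sh file, request the spot
// instance, and pass the EC2 role to it. CloudWatch Logs access is omitted.
function lambdaRolePolicy(bucket, scriptKey, ec2RoleArn) {
    return {
        Version: '2012-10-17',
        Statement: [
            { Effect: 'Allow', Action: ['s3:PutObject'],
              Resource: 'arn:aws:s3:::' + bucket + '/' + scriptKey },
            { Effect: 'Allow', Action: ['ec2:RequestSpotInstances'], Resource: '*' },
            { Effect: 'Allow', Action: ['iam:PassRole'], Resource: ec2RoleArn }
        ]
    };
}

// For the EC2 instance role: download files from the bucket and publish
// the completion message to the SNS topic.
function ec2RolePolicy(bucket, topicArn) {
    return {
        Version: '2012-10-17',
        Statement: [
            { Effect: 'Allow', Action: ['s3:GetObject'],
              Resource: 'arn:aws:s3:::' + bucket + '/*' },
            { Effect: 'Allow', Action: ['sns:Publish'], Resource: topicArn }
        ]
    };
}

// Placeholder ARN; substitute your own account ID and topic name.
var examplePolicy = ec2RolePolicy(
    'mybucket', 'arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:My-SNS-Topic');
console.log(JSON.stringify(examplePolicy, null, 2));
```

The printed JSON can be pasted into the IAM console as an inline policy and tightened from there.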

Lambda Function

Because of g2.2xlarge pricing, the EC2 instance runs in N. Virginia (the Availability Zone is us-east-1d, which seems to be stable as of today). S3, SNS, and Lambda are assumed to be in the Tokyo region.
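
One way to put numbers on "seems to be stable" is to summarize recent spot prices per Availability Zone. The sketch below is only an illustration: the function names are hypothetical, the sample records are made up, and the input is assumed to have the shape of the SpotPriceHistory array returned by the EC2 describeSpotPriceHistory API (objects with AvailabilityZone and a string SpotPrice).

```javascript
// Summarize spot price history per Availability Zone.
// `history` is assumed to look like the SpotPriceHistory array from
// EC2 describeSpotPriceHistory: [{ AvailabilityZone, SpotPrice, ... }].
function summarizeByZone(history) {
    var zones = {};
    history.forEach(function (h) {
        var price = parseFloat(h.SpotPrice);
        var z = zones[h.AvailabilityZone];
        if (!z) {
            z = zones[h.AvailabilityZone] =
                { zone: h.AvailabilityZone, min: price, max: price, samples: 0 };
        }
        z.min = Math.min(z.min, price);
        z.max = Math.max(z.max, price);
        z.samples += 1;
    });
    return Object.keys(zones).map(function (k) { return zones[k]; });
}

// Pick the zone whose highest recent price is lowest, ignoring zones
// that went above the bid. Returns null if no zone qualifies.
function pickCalmestZone(history, bidPrice) {
    var candidates = summarizeByZone(history)
        .filter(function (z) { return z.max <= bidPrice; })
        .sort(function (a, b) { return a.max - b.max; });
    return candidates.length > 0 ? candidates[0].zone : null;
}

// Made-up sample data for illustration only.
var sampleHistory = [
    { AvailabilityZone: 'us-east-1a', SpotPrice: '0.0900' },
    { AvailabilityZone: 'us-east-1a', SpotPrice: '0.6500' },
    { AvailabilityZone: 'us-east-1d', SpotPrice: '0.0800' },
    { AvailabilityZone: 'us-east-1d', SpotPrice: '0.1100' }
];
console.log(pickCalmestZone(sampleHistory, 0.2));
```

With real data, the history would come from describeSpotPriceHistory filtered to g2.2xlarge, and the bid would be the spotPrice value used in the Lambda function.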

Since the code is long, I have also posted it as a gist. https://gist.github.com/pyr-revs/31dba1c9aeff575f58b9

console.log('Launch-Chainer: Start');

var ec2Region = 'us-east-1';
var s3Region = 'ap-northeast-1';
var snsRegion = 'ap-northeast-1';

var s3Bucket = 'mybucket';
var shellScriptS3Key = 'sh/launch_chainer.sh';
var shellScriptS3Path = 's3://' + s3Bucket + '/' + shellScriptS3Key;

var cuDnnAS3Path = 's3://' + s3Bucket + '/cuda/cudnn-6.5-linux-x64-v2.tgz'; // optional

var availabilityZone = ec2Region + 'd';
var spotPrice = '0.2';
var imageId = 'ami-65116700'; // us-east-1, Amazon Linux 2015.09, HVM Instance Store 64 bit
//var imageId = 'ami-e3106686'; // us-east-1, Amazon Linux 2015.09, HVM（SSD）EBS-Backed 64 bit
//var imageId = 'ami-a22fb8a2'; // ap-northeast-1, Amazon Linux 2015.09, HVM Instance Store 64 bit
//var imageId = 'ami-9a2fb89a'; // ap-northeast-1, Amazon Linux 2015.09, HVM（SSD）EBS-Backed 64 bit

var instanceType = 'g2.2xlarge';
var iamInstanceProfile = 'my_ec2_role';
var securityGroup = 'launch-wizard-1';
var keyName = 'my_ssh_keypair';

var userData = (function () {/*#!/bin/bash
cd /root
# Update sudoers
tmp_sudoers=/root/sudoers_tmp
cat /etc/sudoers > $tmp_sudoers
cat >> $tmp_sudoers <<EOF
Defaults:ec2-user !requiretty
EOF
cat $tmp_sudoers > /etc/sudoers
# Install yum deps
yum update -y
yum groupinstall -y "Development tools"
yum -y install gcc-c++ python27-devel atlas-sse3-devel lapack-devel
yum install -y kernel-devel-`uname -r`
# Install NVIDIA Driver
wget -q http://us.download.nvidia.com/XFree86/Linux-x86_64/346.96/NVIDIA-Linux-x86_64-346.96.run
chmod +x NVIDIA-Linux-x86_64-346.96.run
./NVIDIA-Linux-x86_64-346.96.run -s > driver.log 2>&1
# Install CUDA (without driver installation... for Amazon Linux 2015.09)
wget -q http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
chmod +x cuda_7.0.28_linux.run
./cuda_7.0.28_linux.run -extract=/root
./cuda-linux64-rel-7.0.28-19326674.run -noprompt > cuda.log 2>&1
# Install cuDNN (Optional)
#aws s3 cp %s ./
#tar zxvf cudnn-6.5-linux-x64-v2.tgz
#cd cudnn-6.5-linux-x64-v2
#cp lib* /usr/local/cuda/lib64/
#cp cudnn.h /usr/local/cuda/include/
# Install python deps
pip install numpy
pip install six
# Update .bashrc for ec2-user
tmp_bashrc=/home/ec2-user/.bashrc_backup
cat /home/ec2-user/.bashrc > $tmp_bashrc
# Quote the delimiter so that $PATH is expanded at login, not while UserData runs as root
cat >> $tmp_bashrc <<'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
EOF
cat $tmp_bashrc > /home/ec2-user/.bashrc
# Launch post-installation script with ec2-user
aws s3 cp %s /home/ec2-user/launch_chainer.sh
chown ec2-user /home/ec2-user/launch_chainer.sh
chmod +x /home/ec2-user/launch_chainer.sh
su - ec2-user /home/ec2-user/launch_chainer.sh
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];

var shellScriptContents = (function () {/*#!/bin/bash
cd /home/ec2-user
# Install Chainer
git clone https://github.com/pfnet/chainer
cd /home/ec2-user/chainer
sudo -s python setup.py install > setup.log 2>&1
# Run Chainer Sample with GPU
cd /home/ec2-user/chainer/examples/mnist
python train_mnist.py --gpu=0 > run.log 2>&1
# Send SNS Message
export AWS_DEFAULT_REGION=%s
aws sns publish --topic-arn arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:My-SNS-Topic --subject "Launch Chainer Done" --message "Launch Chainer Done!!"
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];

exports.handler = function(event, context) {
    var util = require('util');
    var AWS = require('aws-sdk');
    
    // Write sh file for chainer launch to S3
    AWS.config.region = s3Region;
    var shellScriptContentsFormatted = util.format(shellScriptContents, snsRegion);
    var s3 = new AWS.S3();
    var s3Params = {Bucket: s3Bucket, Key: shellScriptS3Key, Body: shellScriptContentsFormatted};
    var s3Options = {partSize: 10 * 1024 * 1024, queueSize: 1};
    
    s3.upload(s3Params, s3Options, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            context.fail('[Fail]');
        }
        else {
            console.log(data);
            
            // Launch EC2 Spot Instance with UserData
            var userDataFormatted = util.format(userData, cuDnnAS3Path, shellScriptS3Path);
            var userDataBase64 = new Buffer(userDataFormatted).toString('base64');
    
            var ec2LaunchParams = {
                SpotPrice: spotPrice, 
                LaunchSpecification : {
                    IamInstanceProfile: {
                      Name: iamInstanceProfile
                    },
                    // EBS Setting (for ami-e3106686)
                    /*
                    BlockDeviceMappings : [
                        {
                          DeviceName : '/dev/xvda',
                          Ebs : { VolumeSize : 16 }
                        },
                    ],
                    */
                    // Instance Storage Setting (for ami-65116700)
                    BlockDeviceMappings : [
                        {
                          DeviceName  : '/dev/sdb',
                          VirtualName : 'ephemeral0'
                        }
                    ],
                    ImageId: imageId,
                    InstanceType: instanceType,
                    KeyName: keyName,
                    Placement: {
                      AvailabilityZone: availabilityZone
                    },
                    SecurityGroups: [
                        securityGroup
                    ],
                    UserData: userDataBase64
                }
            };
            
            AWS.config.region = ec2Region;
            var ec2 = new AWS.EC2();
            ec2.requestSpotInstances(ec2LaunchParams, function(err, data) {
                if (err) {
                    console.log(err, err.stack);
                    context.fail('[Fail]');
                }
                else {
                    console.log(data);
                    context.succeed('[Succeed]');
                }
            });
        }
    });
};
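
The userData and shellScriptContents variables above rely on a trick that is easy to miss: the shell script is written as a block comment inside a function, recovered with toString(), and then filled in with util.format. The sketch below isolates that technique. The heredoc helper name and its slightly simplified regular expression are illustrative assumptions, not part of the original code.

```javascript
var util = require('util');

// Hypothetical helper: return the text of the block comment inside `fn`.
// It relies on Function.prototype.toString() returning the function's
// source text, comment included.
function heredoc(fn) {
    var m = fn.toString().match(/\/\*([^]*)\*\/\s*\}$/);
    if (!m) {
        throw new Error('No block comment found in function body');
    }
    return m[1];
}

// The template must not contain "*/", and every %s is a placeholder.
var template = heredoc(function () {/*#!/bin/bash
aws s3 cp %s /home/ec2-user/launch_chainer.sh
*/});

// util.format replaces each %s, in order, with the following arguments.
var script = util.format(template, 's3://mybucket/sh/launch_chainer.sh');
console.log(script);
```

Because the placeholders are positional, a %s inside a commented-out shell line still consumes an argument. That is why the Lambda function passes cuDnnAS3Path even though the cuDNN lines are commented out. On Node.js versions that support template literals, a backtick string would be a simpler alternative, but the technique would also break if the source were minified with comments stripped.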

Pitfalls & changes from the previous article

NVIDIA Driver Version for Amazon Linux 2015.09

On Amazon Linux 2015.09, released the other day, extracting and installing the NVIDIA driver bundled with CUDA fails with the following error message.

ERROR: Unable to build the NVIDIA kernel module

It seems that the OS kernel version went up, so kernel-devel and the bundled driver no longer matched and the build failed.

NVIDIA driver download http://www.nvidia.co.jp/Download/Find.aspx?lang=jp

Among the drivers listed on this page, "346.96 / 1.9.2015" was the closest to the previous version number. I installed it and it worked, so this is a stopgap rather than a proper fix, and I expect the same problem to come up again in the future.

If it comes to that, you can launch NVIDIA's AMI on demand (not as a spot instance) and check the driver version with nvidia-smi, or check the CUDA version. You may also need to check the state of kernel-devel.
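
Checking the state of kernel-devel mostly means confirming that a kernel-devel package exists for the kernel that is actually running. A minimal sketch of that comparison follows, assuming the output of `uname -r` and `rpm -q kernel-devel` has been captured as strings. The helper name is hypothetical and the version strings are illustrative, not taken from a real instance.

```javascript
// Hypothetical helper: is there an installed kernel-devel package that
// matches the running kernel?
//   unameR    - output of `uname -r`
//   rpmOutput - output of `rpm -q kernel-devel` (one package per line)
function kernelDevelMatches(unameR, rpmOutput) {
    var wanted = 'kernel-devel-' + unameR.trim();
    return rpmOutput.split('\n')
        .map(function (line) { return line.trim(); })
        .indexOf(wanted) !== -1;
}

// Illustrative version strings only.
var runningKernel = '4.1.7-15.23.amzn1.x86_64\n';
var installedDevel = 'kernel-devel-4.1.10-17.31.amzn1.x86_64\n';
console.log(kernelDevelMatches(runningKernel, installedDevel));
```

On an instance, the two strings could be obtained with child_process.execSync. A mismatch like the one above is the situation in which the driver's kernel module build would be expected to fail.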

Disk settings

To be honest, I previously didn't really understand the difference between EBS and instance storage, so I carelessly raised EBS to 16GB to avoid running out of tmp space. I have since noticed that I don't need EBS if I use the "HVM Instance Store 64 bit" AMI. Now the OS is installed directly on instance storage. With the 60GB SSD that comes with the g2.2xlarge, I have no complaints for the moment.
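
The two disk setups correspond to the two BlockDeviceMappings variants in the Lambda function. As a sketch, the choice could be made from the AMI's root device type instead of by commenting code in and out. The helper name is hypothetical, and the 'ebs' / 'instance-store' strings are assumed to be the RootDeviceType values reported for an AMI.

```javascript
// Hypothetical helper: pick BlockDeviceMappings by the AMI's root device
// type. Both shapes are the ones used in the Lambda function above.
function blockDeviceMappingsFor(rootDeviceType, ebsSizeGiB) {
    if (rootDeviceType === 'ebs') {
        // EBS-backed AMI (e.g. ami-e3106686): enlarge the root volume.
        return [{ DeviceName: '/dev/xvda', Ebs: { VolumeSize: ebsSizeGiB || 16 } }];
    }
    if (rootDeviceType === 'instance-store') {
        // Instance Store AMI (e.g. ami-65116700): map the ephemeral disk.
        return [{ DeviceName: '/dev/sdb', VirtualName: 'ephemeral0' }];
    }
    throw new Error('Unknown root device type: ' + rootDeviceType);
}

console.log(JSON.stringify(blockDeviceMappingsFor('instance-store')));
console.log(JSON.stringify(blockDeviceMappingsFor('ebs', 16)));
```

The returned array would be assigned to LaunchSpecification.BlockDeviceMappings in ec2LaunchParams.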

Support for Chainer 1.3 series

Now that pycuda no longer has to be installed, setup is much easier. To check that cuDNN is working, I think it is enough to start python in interactive mode and confirm that the following prints True.

from chainer import cuda
print(cuda.cudnn_enabled)