TensorFlow is a deep learning framework developed by Google and released under the Apache 2.0 license. It supports GPUs and provides C++ and Python APIs.
"I used the GPU to move it crunchy!"
Since there are many posts such as, I dared to install TensorFlow (Python version, Anaconda) on AWS EC2 t2.micro (for free usage tier) and execute it. I hope it helps you understand the CPU credits for your AWS EC2 T2 instance.
** LeNet-5, MNIST ** This time we will run the sample program tensorflow/models/image/mnist/convolutional.py. The LeNet-5 model looks like this. (Source)
MNIST is a dataset of handwritten digits 0-9 (MNIST DATABASE), with 60,000 training images and 10,000 evaluation images.
Both LeNet-5 and MNIST are described in this paper:
[LeCun et al., 1998] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.
Using this TensorFlow sample (convolutional.py), we can reach an accuracy of about 99.2%.
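Since the model diagram is linked rather than shown, here is a quick shape walk-through of the LeNet-5-style stack that convolutional.py builds, as a sketch based on the layer sizes in the TensorFlow 0.10 sample (conv layers use SAME padding, so they keep the spatial size):

```python
# Shape walk-through of the LeNet-5-style model in convolutional.py (a sketch).
shapes = [
    ("input",           (28, 28, 1)),
    ("conv 5x5 + ReLU", (28, 28, 32)),
    ("max-pool 2x2",    (14, 14, 32)),
    ("conv 5x5 + ReLU", (14, 14, 64)),
    ("max-pool 2x2",    (7, 7, 64)),
    ("fc 512 + ReLU",   (512,)),
    ("fc 10 (logits)",  (10,)),
]
for name, shape in shapes:
    print("%-17s %s" % (name, shape))
```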
** Anaconda Download **
```
$ mkdir tensorflow
$ cd tensorflow
$ curl https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh -o Anaconda3-4.2.0-Linux-x86_64.sh
```
** Anaconda installation ** You will be asked to accept the license, confirm the install path, and choose whether to add the PATH to .bashrc.
```
$ bash Anaconda3-4.2.0-Linux-x86_64.sh
>>> (ENTER)
>>> yes
[/home/ubuntu/anaconda3] >>> (ENTER)
PATH in your /home/ubuntu/.bashrc ? [yes|no]
[no] >>> yes
```
After installation, log out and log back in so the PATH is picked up, then check that Anaconda works.
```
$ conda -V
conda 4.2.9
```
** TensorFlow installation ** Create a tensorflow environment in Anaconda.
```
$ conda create -n tensorflow python=3.5
Proceed ([y]/n)? y
```
Install TensorFlow in Anaconda's tensorflow environment.
```
$ source activate tensorflow
(tensorflow)$ conda install -c conda-forge tensorflow
The following NEW packages will be INSTALLED:
mkl: 11.3.3-0
mock: 2.0.0-py35_0 conda-forge
numpy: 1.11.2-py35_0
pbr: 1.10.0-py35_0 conda-forge
protobuf: 3.0.0b2-py35_0 conda-forge
six: 1.10.0-py35_0 conda-forge
tensorflow: 0.10.0-py35_0 conda-forge
Proceed ([y]/n)? y
```
By the way, here is the command to deactivate Anaconda's tensorflow environment.
```
(tensorflow)$ source deactivate
```
To activate Anaconda's tensorflow environment again next time, use this command.
```
$ source activate tensorflow
```
If you do not switch to the tensorflow environment, the following error occurs when importing TensorFlow.
```
>>> import tensorflow as tf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'tensorflow'
```
Check the directory where TensorFlow is installed.
```
(tensorflow)$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow
```
** TensorFlow operation check ** Let's print "Hello, TensorFlow!" and do a simple calculation with TensorFlow.
```
$ source activate tensorflow
(tensorflow)$ python
...
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42
```
** MNIST operation check **
```
(tensorflow)$ python /home/ubuntu/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py --self_test
Running self-test.
Initialized!
Step 0 (epoch 0.00), 6.7 ms
Minibatch loss: 9.772, learning rate: 0.010000
Minibatch error: 92.2%
Validation error: 0.0%
Test error: 0.0%
test_error 0.0
```
Now you are ready to go.
Let's run it.
```
(tensorflow)$ python /home/ubuntu/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 6.8 ms
Minibatch loss: 12.053, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 432.6 ms
Minibatch loss: 3.276, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.2%
Step 200 (epoch 0.23), 435.2 ms
Minibatch loss: 3.457, learning rate: 0.010000
Minibatch error: 14.1%
Validation error: 3.9%
Step 300 (epoch 0.35), 430.3 ms
Minibatch loss: 3.204, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 3.1%
Step 400 (epoch 0.47), 431.9 ms
Minibatch loss: 3.211, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 2.5%
```
Training proceeds smoothly, with the status printed every 100 steps.
item | meaning |
---|---|
Step | Number of training steps so far |
epoch | Number of passes through all the training data |
ms | Average time per training step |
Minibatch loss | Loss value on the current minibatch |
learning rate | Step size controlling how cautiously learning proceeds |
Minibatch error | Error rate on the current training minibatch |
Validation error | Error rate on the validation data |
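For reference, the minibatch and validation errors are simply the percentage of wrong predictions; convolutional.py computes them with a helper roughly like this:

```python
import numpy

def error_rate(predictions, labels):
    """Percent of samples whose argmax prediction differs from the label."""
    return 100.0 - (
        100.0 * numpy.sum(numpy.argmax(predictions, 1) == labels) /
        predictions.shape[0])
```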
Ultimately, the goal is to reduce the validation error.
The parameters of the program are as follows.
item | Setting |
---|---|
Termination condition | epoch > 10 |
Batch size | 64 |
Activation function | ReLU |
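These settings correspond to module-level constants near the top of convolutional.py (shown roughly as they appear in the TensorFlow 0.10 sample); lowering NUM_EPOCHS is how we will shorten the run later:

```python
# Constants in convolutional.py that control the run (TensorFlow 0.10 sample).
BATCH_SIZE = 64        # training minibatch size
NUM_EPOCHS = 10        # stop after 10 passes over the training data
EVAL_FREQUENCY = 100   # print status every 100 steps
```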
Training proceeds smoothly for a while, but partway through, the speed drops to about 1/10.
```
Step 5000 (epoch 5.82), 434.0 ms
Step 5100 (epoch 5.93), 431.1 ms
Step 5200 (epoch 6.05), 430.0 ms
Step 5300 (epoch 6.17), 434.3 ms
Step 5400 (epoch 6.28), 533.1 ms
Step 5500 (epoch 6.40), 581.7 ms
Step 5600 (epoch 6.52), 581.4 ms
Step 5700 (epoch 6.63), 580.6 ms
Step 5800 (epoch 6.75), 582.4 ms
Step 5900 (epoch 6.87), 785.4 ms
Step 6000 (epoch 6.98), 975.2 ms
Step 6100 (epoch 7.10), 969.0 ms
Step 6200 (epoch 7.21), 2485.7 ms
Step 6300 (epoch 7.33), 4477.5 ms
Step 6400 (epoch 7.45), 4492.2 ms
Step 6500 (epoch 7.56), 3791.0 ms
Step 6600 (epoch 7.68), 4414.7 ms
Step 6700 (epoch 7.80), 4485.0 ms
Step 6800 (epoch 7.91), 4259.3 ms
Step 6900 (epoch 8.03), 3942.3 ms
```
Looking at the monitoring graphs, CPU usage was 100% at first, but partway through it is throttled to 10%.
There is plenty of memory on the t2.micro instance.
```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           990M        374M        372M        4.4M        243M        574M
Swap:            0B          0B          0B
```
If you check the CPU credit balance, you can see that the roughly 30 credits granted at launch have been used up.
** CPU credit balance **
** CPU credit usage **
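The graphs above come from the EC2 console. As an aside, you could also pull the same metric from CloudWatch with boto3; here is a minimal sketch (the instance ID and region are placeholders, not from the original setup):

```python
# Sketch: fetch the CPUCreditBalance metric for a T2 instance via boto3.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUCreditBalance',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
    StartTime=datetime.utcnow() - timedelta(hours=3),
    EndTime=datetime.utcnow(),
    Period=300,                # T2 credit metrics come in 5-minute points
    Statistics=['Average'],
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])
```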
So what does the CPU credit value mean? One credit is essentially one banked minute of 100% CPU usage. Accordingly, while credits are being consumed at full utilization, the CPU credit usage graph shows about 5 credits consumed every 5 minutes.
The CPU used in t2.micro is "Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz".
```
$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
```
And t2.micro is entitled to 10% of the CPU's performance as a baseline. The 10% is allocated as 6 CPU credits per hour: since 1 credit grants 1 minute of 100% CPU, 6 credits per hour (60 minutes) is exactly 10%.
The CPU credit balance can store up to 24 hours of accrual, so a t2.micro caps out at 144 credits (6 credits/hour x 24 hours). Immediately after creating a t2.micro instance, about 30 credits are granted. If you stop the instance, the credit balance is cleared to 0.
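As quick arithmetic, the numbers above fit together like this (a sketch using only the figures from the text):

```python
CREDITS_PER_HOUR = 6           # baseline accrual rate for t2.micro
MINUTES_PER_CREDIT = 1         # 1 credit = 1 minute of 100% CPU

baseline_share = CREDITS_PER_HOUR * MINUTES_PER_CREDIT / 60.0
max_balance = CREDITS_PER_HOUR * 24   # balance caps at 24 hours of accrual

print(baseline_share)  # 0.1 -> the 10% baseline
print(max_balance)     # 144 credits
```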
Running TensorFlow's LeNet-5 MNIST on t2.micro takes about 60 minutes of sustained 100% CPU. If you create an instance and run it right away, the roughly 30 initial credits run out partway through. On a t2.micro that was stopped and had its credits cleared to 0, you are about 60 credits short.
Immediately after creating an instance, it takes about 6 hours either way: you can wait about 5 hours for the missing 30 credits to accumulate and then run at full speed, or keep running slowly through the 30-credit shortfall. For the same instance uptime, the finish time ends up the same.
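Here is a minimal simulation of that equivalence, assuming the rates above (30 initial credits, 6 credits/hour accrual, 10% baseline at zero balance) and ignoring the gradual throttle ramps described below:

```python
ACCRUAL = 6 / 60.0   # credits accrued per minute
WORK = 60.0          # minutes of 100% CPU the job needs

def run_immediately(initial=30.0):
    """Run at full speed until credits hit 0, then crawl at the 10% baseline."""
    balance, done, minutes = initial, 0.0, 0
    while done < WORK:
        speed = 1.0 if balance > 0 else ACCRUAL
        balance = max(balance + ACCRUAL - speed, 0.0)
        done += speed
        minutes += 1
    return minutes

def wait_then_run(initial=30.0):
    """Idle until the balance (plus accrual while running) covers the job."""
    deficit = WORK - initial - ACCRUAL * WORK
    return deficit / ACCRUAL + WORK

print(run_immediately())  # ~290-300 minutes
print(wait_then_run())    # 300 minutes -- effectively the same finish time
```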
Considering the CPU credit restrictions, let's think about how to use the CPU efficiently.
** Create a new instance ** If you just want to earn CPU credits, you can back up to an AMI and recreate the instance from that AMI each time; you get about 30 credits at every launch.
** Use the slowdown period (15 minutes) ** Another way to stretch CPU credits without changing the program much is to exploit the gradual slowdown. When the credit balance hits 0 while CPU utilization is at 100%, utilization is throttled down to 10% over about 15 minutes, not instantly. So if you pause the program whenever the balance reaches 0 and resume once some credits have accumulated, you can briefly use more than 10% CPU even at a 0 balance, and the computation advances a little.

However, this method has a downside. First, if the period of 100% utilization was short, the throttle drops from 100% to 10% much faster: it takes 15 minutes to fall from a sustained 100%, but only about 5 minutes from a momentary 40%. Also, utilization does not jump from 0% to 100% the moment you have credits; ramping from 0% to 100% takes about 6 minutes. To ramp up to 100% at all, you need about 6 minutes' worth of credits accumulated, which means pausing the computation for about 60 minutes. But if you instead kept computing slowly for those 60 minutes, you would make progress equivalent to about 6 minutes at 100% CPU, progress you forfeit by pausing. So repeatedly stopping and restarting the computation turns out to be "** not very effective **": the throttle kicks in quickly and the ramp-up takes time. To try it anyway, add sleep handling like the following.
```python
# ★ marks the additions to convolutional.py; the script already uses
# time and xrange (via six.moves), and the elisions (...) are its original code.
start_time = time.time()
...
cpu_start_time = time.time()  # ★ record when the CPU started working
for step in xrange(int(num_epochs * train_size) // BATCH_SIZE):
  ...
  if step % EVAL_FREQUENCY == 0:
    elapsed_time = time.time() - start_time
    avrg_time = 1000 * elapsed_time / EVAL_FREQUENCY  # average ms per step
    ...
    # ★ Sleep for 50 minutes (3000 s) once the average step time exceeds
    # 3000 ms, i.e. once credit exhaustion has throttled the CPU
    if avrg_time > 3000:
      print("sleep t2.micro cpu. passed time=%d" % (time.time() - cpu_start_time))
      time.sleep(3000)
      print("run t2.micro cpu. passed time=%d" % (time.time() - cpu_start_time))
    start_time = time.time()
```
The resulting graphs look like this: instead of computing continuously at 10%, the CPU computes in short bursts at 100%.
** CPU usage **
** CPU credit usage **
** CPU credit balance **
** AlexNet MNIST ** Next, let's run the AlexNet benchmark on EC2 t2.micro (free tier) and compare it with GPUs. The program is located under the TensorFlow installation directory:
```
tensorflow/models/image/alexnet/alexnet_benchmark.py
```
According to the program's comments, GPUs seem to have this kind of performance.
```
Forward pass:
Run on Tesla K40c: 145 +/- 1.5 ms / batch
Run on Titan X: 70 +/- 0.1 ms / batch
Forward-backward pass:
Run on Tesla K40c: 480 +/- 48 ms / batch
Run on Titan X: 244 +/- 30 ms / batch
```
On the other hand, here is the console log from EC2 t2.micro. Even with t2.micro running at 100% CPU, there is a difference of about 100x between the GPUs and EC2 t2.micro.
```
conv1 [128, 56, 56, 64]
pool1 [128, 27, 27, 64]
conv2 [128, 27, 27, 192]
pool2 [128, 13, 13, 192]
conv3 [128, 13, 13, 384]
conv4 [128, 13, 13, 256]
conv5 [128, 13, 13, 256]
pool5 [128, 6, 6, 256]
2016-10-24 16:18:03.743222: step 10, duration = 9.735
2016-10-24 16:19:40.927811: step 20, duration = 9.675
2016-10-24 16:21:17.593104: step 30, duration = 9.664
2016-10-24 16:22:53.894240: step 40, duration = 9.684
2016-10-24 16:24:29.968737: step 50, duration = 9.597
2016-10-24 16:26:06.527066: step 60, duration = 9.686
2016-10-24 16:27:43.229298: step 70, duration = 9.689
2016-10-24 16:29:19.643403: step 80, duration = 9.679
2016-10-24 16:30:56.202710: step 90, duration = 9.588
2016-10-24 16:32:22.877673: Forward across 100 steps, 9.553 +/- 0.962 sec / batch
2016-10-24 16:42:27.229588: step 10, duration = 28.700
2016-10-24 16:49:33.216683: step 20, duration = 72.885
...
```
Then, around step 20 of the forward-backward pass, the 30 CPU credits run out. After that it gradually slows down to 10% CPU, and the computation crawls on endlessly at roughly 1/1000 the speed of a GPU. Hmm. It's no contest.
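As back-of-the-envelope arithmetic on the numbers above (forward pass, seconds per batch):

```python
t2_micro = 9.553   # t2.micro at 100% CPU, from the log above
titan_x = 0.070    # Titan X, from the program's comments

print(t2_micro / titan_x)        # ~136 -> the "about 100 times" difference
print(t2_micro / titan_x * 10)   # ~1365 -> roughly 1/1000 speed once throttled to 10%
```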
Let's change the viewpoint a little.
How can I get a sense of accomplishment by running TensorFlow's LeNet-5 MNIST (convolutional.py) on AWS EC2 t2.micro (free tier)?
Since there are about 30 CPU credits immediately after creating an instance, let's set things up so the computation finishes in 30 to 40 minutes.
Modified convolutional.py:
```
NUM_EPOCHS = 6
```
That's right, there is only one change. Just set the number of training epochs so that the run fits within the CPU credits.
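As rough arithmetic for why 6 epochs fit (a sketch; the ~6 minutes per epoch is inferred from the ~60-minute, 10-epoch full run):

```python
MINUTES_PER_EPOCH = 6.0   # ~60 minutes / 10 epochs at 100% CPU
epochs = 6

work = epochs * MINUTES_PER_EPOCH   # 36 minutes of full-speed CPU needed
budget = 30 + work * (6 / 60.0)     # initial credits + accrual while running

print(work)    # 36.0 minutes of work
print(budget)  # ~33.6 credits -- close enough that the 15-minute
               # slowdown grace covers the small shortfall
```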
Have a fun deep learning life with TensorFlow!!