Introduction

This is my first post. I had long wanted something like a blog to keep development notes, and ever since I found Qiita I had been meaning to post here someday. This time I hit a wall I couldn't get past on my own, so I decided to write it up.

Problem (Unsolved as of 12/3) (Resolved? As of 12/7)

It's been almost a month since Google released TensorFlow. I had only recently started studying deep learning, so I jumped on it.

TensorFlow has already been covered in plenty of places, and I got through the tutorials without trouble. (I plan to write those up at a later date.)

Next, I wrote and ran a program to recognize image data, and got the following error:

$ python test.py 
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 12
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:05:00.0
Total memory: 11.99GiB
Free memory: 11.47GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 11701021287
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 12
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 256 (256B) Pool: chunks: 64 free: 24 cumulative malloc: 134728 cumulative freed: 134688
Number of chunks: 64, in_use chunks: 40
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 4096 (4.0KiB) Pool: chunks: 8 free: 2 cumulative malloc: 2812 cumulative freed: 2806
Number of chunks: 8, in_use chunks: 6
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 8192 (8.0KiB) Pool: chunks: 8 free: 3 cumulative malloc: 2814 cumulative freed: 2809
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 16384 (16.0KiB) Pool: chunks: 8 free: 3 cumulative malloc: 11233 cumulative freed: 11228
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 65536 (64.0KiB) Pool: chunks: 16 free: 16 cumulative malloc: 44896 cumulative freed: 44896
Number of chunks: 16, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 98304 (96.0KiB) Pool: chunks: 8 free: 8 cumulative malloc: 11224 cumulative freed: 11224
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 131072 (128.0KiB) Pool: chunks: 4 free: 4 cumulative malloc: 14030 cumulative freed: 14030
Number of chunks: 4, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 212992 (208.0KiB) Pool: chunks: 8 free: 3 cumulative malloc: 11232 cumulative freed: 11227
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 229376 (224.0KiB) Pool: chunks: 2 free: 1 cumulative malloc: 2 cumulative freed: 1
Number of chunks: 2, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 262144 (256.0KiB) Pool: chunks: 8 free: 8 cumulative malloc: 16836 cumulative freed: 16836
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 425984 (416.0KiB) Pool: chunks: 1 free: 1 cumulative malloc: 2806 cumulative freed: 2806
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 524288 (512.0KiB) Pool: chunks: 8 free: 8 cumulative malloc: 25254 cumulative freed: 25254
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 1048576 (1.00MiB) Pool: chunks: 8 free: 8 cumulative malloc: 25254 cumulative freed: 25254
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 13631488 (13.00MiB) Pool: chunks: 8 free: 3 cumulative malloc: 2814 cumulative freed: 2809
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 268435456 (256.00MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 369098752 (352.00MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 738197504 (704.00MiB) Pool: chunks: 1 free: 0 cumulative malloc: 1 cumulative freed: 0
Number of chunks: 1, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 1476395008 (1.38GiB) Pool: chunks: 0 free: 0 cumulative malloc: 0 cumulative freed: 0
Number of chunks: 0, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 2952790016 (2.75GiB) Pool: chunks: 3 free: 3 cumulative malloc: 3 cumulative freed: 3
Number of chunks: 3, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:345] Aggregate Region Memory: 11701021287 (10.90GiB)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:347] Aggregate Chunk Memory: 10363027456 (9.65GiB)
W tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:89] Out of GPU memory, see memory state dump above
W tensorflow/core/kernels/conv_ops.cc:162] Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x10426540 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
	 [[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x127a7090 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
	 [[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
	 [[Node: range_1/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_394_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x127a7090 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
	 [[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
	 [[Node: Cast/_13 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_393_Cast", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "img_ditect_train.py", line 229, in <module>
    keep_prob: 1.0})
  File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 345, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 419, in _do_run
    e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
	 [[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
	 [[Node: range_1/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_394_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'conv2/Conv2D', defined at:
  File "test.py", line 196, in <module>
    logits = inference(images_placeholder, keep_prob)
  File "test.py", line 70, in inference
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
  File "test.py", line 46, in conv2d
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
  File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 207, in conv2d
    use_cudnn_on_gpu=use_cudnn_on_gpu, name=name)
  File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
    op_def=op_def)
  File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 988, in __init__
    self._traceback = _extract_stack()

I'm still investigating the cause. Is the GPU simply out of memory? I don't know... I'd be grateful if someone could explain.

Postscript (12/7)

Things look clearer after a few days away. The program now runs without raising the error above, at least for the time being, so I'm adding a note.

MATS pointed out: "shapedim {size: 28060} dim {size: 14} dim {size: 14} dim {size: 64}. Why don't you try reducing the size?" Following that suggestion, I focused on the {size: 28060} part.

Last week I kept going in circles wondering what 28060 was, but looking at it calmly, I realized it is the number of images I'm feeding in (half of them, that is). Why didn't I notice that sooner? For reference, the program is an image recognition program. I will post it when it is completed.
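As a rough check on that reading, the shape in the error message can be turned into a byte count. This is only a back-of-the-envelope sketch: it assumes 4 bytes per element (the log reports T=DT_FLOAT), and the helper names tensor_bytes and to_gib are made up for illustration.

```python
# Rough estimate of the memory needed for the tensor named in the OOM message.
# Assumption: 4 bytes per element, since the log reports T=DT_FLOAT (float32).
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the number of bytes a dense tensor of the given shape occupies."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / float(1024 ** 3)


# Shape taken from the error message: 28060 images, 14 x 14 feature maps, 64 channels.
conv2_all = tensor_bytes([28060, 14, 14, 64])
# Same tensor if only 1000 images are fed at once.
conv2_1000 = tensor_bytes([1000, 14, 14, 64])
# Difference between "Aggregate Region Memory" and "Aggregate Chunk Memory" in the log.
unclaimed = 11701021287 - 10363027456

print(conv2_all, round(to_gib(conv2_all), 2))    # 1407938560 1.31
print(conv2_1000, round(to_gib(conv2_1000), 3))
print(unclaimed, round(to_gib(unclaimed), 2))
```

On these assumptions, the conv2 output alone needs about 1.31 GiB when all 28060 images are fed at once, and under 50 MiB for 1000 images. The difference between the two aggregate figures in the log is about 1.25 GiB, which is smaller than the requested tensor. That seems consistent with the OOM, though I have not checked how the allocator actually uses that space.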

So, when I reduced the number of training images to about 1000, the program ran to completion. The recognition accuracy is not bad either.

However, my understanding is that high accuracy needs a large number of images, so I would like to solve this in the program itself rather than by cutting down the data.

I will add more as soon as there is an update.
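One common way to keep all the images while avoiding a single huge tensor is to feed them in mini-batches and combine the per-batch results. The traceback ends at a call that feeds keep_prob: 1.0, which suggests (but does not confirm) that the failing step feeds every image at once. The following is a minimal sketch of the idea, not the program from this post: iter_batches, batched_accuracy, and eval_batch are hypothetical names, and eval_batch stands in for whatever call computes accuracy on one batch.

```python
def iter_batches(num_examples, batch_size):
    """Yield (start, end) index pairs that cover range(num_examples) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


def batched_accuracy(eval_batch, images, labels, batch_size=100):
    """Return accuracy over all examples, evaluated one mini-batch at a time.

    eval_batch is a hypothetical callable returning the accuracy (0.0 to 1.0)
    of one batch. In a TensorFlow program it might wrap a sess.run call that
    feeds only that batch; the exact tensors and placeholders are assumptions.
    """
    total = len(images)
    if total == 0:
        raise ValueError("no examples to evaluate")
    weighted_sum = 0.0
    for start, end in iter_batches(total, batch_size):
        batch_acc = eval_batch(images[start:end], labels[start:end])
        # Weight by batch size so a short final batch is not over-counted.
        weighted_sum += batch_acc * (end - start)
    return weighted_sum / total


# Toy stand-in for a real model: "predict" x % 2 and compare with the label.
def fake_eval_batch(xs, ys):
    correct = sum(1 for x, y in zip(xs, ys) if x % 2 == y)
    return correct / float(len(xs))


toy_images = list(range(10))
toy_labels = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
print(list(iter_batches(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
print(batched_accuracy(fake_eval_batch, toy_images, toy_labels, batch_size=4))
```

With this approach the largest tensor on the GPU scales with the batch size rather than with the total number of images, so the data set itself does not have to shrink.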

I stumbled on TensorFlow (What is Out of GPU Memory)
