To build a CNN in Theano, I looked into Theano's 2D convolution function `theano.tensor.nnet.conv.conv2d()` and compared it with the N-dimensional convolution function `scipy.signal.fftconvolve()`, which is commonly used in signal processing.
First, let's convolve two simple 2D arrays.
```python
import theano
import theano.tensor as T
import theano.tensor.nnet.conv  # makes theano.tensor.nnet.conv available
from theano.tensor import nnet
import scipy.signal as s
from numpy import arange, ones, float32

m = T.matrix()
w = T.matrix()
# conv2d() needs rank-4 inputs, so add two leading axes with None.
o_full = nnet.conv.conv2d(m[None, None, :, :], w[None, None, :, :],
                          border_mode='full')
o_valid = nnet.conv.conv2d(m[None, None, :, :], w[None, None, :, :],
                           border_mode='valid')

m_arr = arange(25.).reshape((5, 5)).astype(float32)
w_arr = ones((3, 3)).astype(float32)
print("m_arr =")
print(m_arr)
print("w_arr =")
print(w_arr)

# Round the outputs and convert to int for easier reading.
print("Output for Theano.")
print("full:")
print(o_full.eval({m: m_arr, w: w_arr}).round().astype(int))
print("valid:")
print(o_valid.eval({m: m_arr, w: w_arr}).round().astype(int))
print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int))
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int))
```
Here, the array to be convolved, `m_arr`, and the window function it is convolved with (also called a kernel or filter), `w_arr`, are passed to both `theano.tensor.nnet.conv.conv2d()` and `scipy.signal.fftconvolve()`.

In the Theano code, the inputs are written as `m[None, None, :, :]` and `w[None, None, :, :]`. This is because `conv2d()` expects the input in the format `[number of images, number of channels, height, width]` and the kernel in the format `[number of filters, number of channels, height, width]`, while `m` and `w` were defined as rank-2 `T.matrix()` variables. Indexing with `[None, None, :, :]` adds two leading axes, raising the rank by two. This indexing works the same way as in NumPy, so personally I find it very easy to use.
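Since the `[None, None, :, :]` indexing follows NumPy's rules, its effect on the shape can be checked with NumPy alone. A minimal sketch (NumPy only, no Theano involved):

```python
import numpy as np

m_arr = np.arange(25.0).reshape((5, 5)).astype(np.float32)

# Each None inserts a new length-1 axis in front, turning a
# (height, width) matrix into [images, channels, height, width].
m4 = m_arr[None, None, :, :]

print(m_arr.shape)  # (5, 5)
print(m4.shape)     # (1, 1, 5, 5)

# The data is unchanged; m4 is just a view with a different shape.
print(np.array_equal(m4[0, 0], m_arr))  # True
```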
The output looks like this:
```
m_arr =
[[ 0. 1. 2. 3. 4.]
[ 5. 6. 7. 8. 9.]
[ 10. 11. 12. 13. 14.]
[ 15. 16. 17. 18. 19.]
[ 20. 21. 22. 23. 24.]]
w_arr =
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
Output for Theano.
full:
[[[[ 0 1 3 6 9 7 4]
[ 5 12 21 27 33 24 13]
[ 15 33 54 63 72 51 27]
[ 30 63 99 108 117 81 42]
[ 45 93 144 153 162 111 57]
[ 35 72 111 117 123 84 43]
[ 20 41 63 66 69 47 24]]]]
valid:
[[[[ 54 63 72]
[ 99 108 117]
[144 153 162]]]]
Output for scipy.
full:
[[ 0 1 3 6 9 7 4]
[ 5 12 21 27 33 24 13]
[ 15 33 54 63 72 51 27]
[ 30 63 99 108 117 81 42]
[ 45 93 144 153 162 111 57]
[ 35 72 111 117 123 84 43]
[ 20 41 63 66 69 47 24]]
valid:
[[ 54 63 72]
[ 99 108 117]
[144 153 162]]
```
For readability, the convolution outputs are rounded and then converted to `int`.
`theano.tensor.nnet.conv.conv2d()` takes an argument called `border_mode`, which can be set to `full` or `valid`. A convolution slides the filter over the image, multiplying the overlapping elements and summing them. In `full` mode, the output includes every position where at least one element of the filter overlaps the image, even if the filter sticks out past the image border. In `valid` mode, the output includes only the positions where the filter lies entirely inside the image. If the image and the filter have sizes $M$ and $m$ along some axis, the output size along that axis is $M+(m-1)$ for `full` and $M-(m-1)$ for `valid`. In the example above, the height (and width) has $M=5$ and $m=3$, so the output size is 7 for `full` and 3 for `valid`.
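To make the `full` and `valid` definitions concrete, here is a minimal loop-based 2D convolution in NumPy. It is only an illustrative sketch (the name `conv2d_direct` is hypothetical, and neither Theano nor SciPy is implemented this way; `fftconvolve` uses FFTs), but it reproduces the sizes $M+(m-1)$ and $M-(m-1)$ and the values printed above.

```python
import numpy as np


def conv2d_direct(x, k, mode="full"):
    """Direct 2D convolution of x with kernel k (kernel flipped)."""
    kh, kw = k.shape
    if mode == "full":
        # Pad with m-1 zeros on each side so partial overlaps are kept.
        x = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    elif mode != "valid":
        raise ValueError("mode must be 'full' or 'valid'")
    kf = k[::-1, ::-1]  # true convolution flips the kernel
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the overlapping elements and sum them.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kf)
    return out


m_arr = np.arange(25.0).reshape((5, 5))
w_arr = np.ones((3, 3))
full = conv2d_direct(m_arr, w_arr, "full")
valid = conv2d_direct(m_arr, w_arr, "valid")
print(full.shape, valid.shape)  # (7, 7) (3, 3)
print(full[0])  # [0. 1. 3. 6. 9. 7. 4.]
```

Because `w_arr` is all ones, flipping the kernel makes no difference here; it matters only for asymmetric kernels.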
Checking the output confirms that `theano.tensor.nnet.conv.conv2d()` and `scipy.signal.fftconvolve()` produce the same values (apart from the rank of the array).
However, the two outputs mean different things. `scipy.signal.fftconvolve()` returns the result of a plain N-dimensional convolution, whereas `theano.tensor.nnet.conv.conv2d()` returns one result for each combination of image and filter, where each result is the sum over channels of the 2D convolutions of the image's channels with the corresponding filter channels. The output array has the format `[number of images, number of filters, height, width]`. Also, as described later, `theano.tensor.nnet.conv.conv2d()` requires the image and the filter to have the same number of channels.
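The per-image, per-filter behaviour can be sketched in NumPy as a reference. This is a hypothetical illustration (`conv2d_single` and `theano_style_conv2d` are not Theano APIs), assuming each output map is the channel-wise sum of true 2D convolutions. With 2 images of 3 channels and one all-ones 3x3x3 filter, it gives a `[2, 1, H', W']` output.

```python
import numpy as np


def conv2d_single(x, k, mode):
    """2D convolution of one channel x with one kernel slice k (kernel flipped)."""
    kh, kw = k.shape
    if mode == "full":
        x = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    elif mode != "valid":
        raise ValueError("mode must be 'full' or 'valid'")
    kf = k[::-1, ::-1]
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    # Shift-and-add: accumulate each kernel weight times a shifted slice.
    for di in range(kh):
        for dj in range(kw):
            out += kf[di, dj] * x[di:di + oh, dj:dj + ow]
    return out


def theano_style_conv2d(images, filters, mode="valid"):
    """images: [n_images, n_channels, H, W]; filters: [n_filters, n_channels, h, w].
    Returns [n_images, n_filters, H', W'], summing over channels."""
    n_img, n_ch = images.shape[:2]
    n_flt, f_ch = filters.shape[:2]
    if n_ch != f_ch:
        raise ValueError("images and filters must have the same number of channels")
    return np.array([
        [sum(conv2d_single(images[b, c], filters[f, c], mode) for c in range(n_ch))
         for f in range(n_flt)]
        for b in range(n_img)
    ])


m_arr = np.arange(2 * 3 * 5 * 5, dtype=float).reshape((2, 3, 5, 5))
w_arr = np.ones((1, 3, 3, 3))
out_valid = theano_style_conv2d(m_arr, w_arr, "valid")
out_full = theano_style_conv2d(m_arr, w_arr, "full")
print(out_valid.shape, out_full.shape)  # (2, 1, 3, 3) (2, 1, 7, 7)
```

Note that the image axis is never convolved: image `b` only ever contributes to output `b`.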
# Convolution with image-count and channel dimensions
Next, we add the image-count and channel dimensions. We convolve two 5x5 images with 3 channels each with a single 3x3 filter that also has 3 channels. The program looks like this:
```python
# Continues from the first example (same imports).
m = T.tensor4()
w = T.tensor4()
# m and w are already rank 4, so no extra axes are needed.
o_full = nnet.conv.conv2d(m, w,
                          border_mode='full')
o_valid = nnet.conv.conv2d(m, w,
                           border_mode='valid')
m_arr = arange(2*3*5*5).reshape((2, 3, 5, 5)).astype(float32)
w_arr = ones((1, 3, 3, 3)).astype(float32)
print("m_arr =")
print(m_arr)
print("w_arr =")
print(w_arr)
print("Output for Theano.")
print("full:")
print(o_full.eval({m: m_arr, w: w_arr}).round().astype(int))
print("valid:")
print(o_valid.eval({m: m_arr, w: w_arr}).round().astype(int))
print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int))
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int))
```
To make `m` and `w` rank-4 tensors, they are defined with `T.tensor4()`, so no `[None, None, :, :]` indexing is needed. The output looks like this:
```
m_arr =
[[[[ 0. 1. 2. 3. 4.]
[ 5. 6. 7. 8. 9.]
[ 10. 11. 12. 13. 14.]
[ 15. 16. 17. 18. 19.]
[ 20. 21. 22. 23. 24.]]
[[ 25. 26. 27. 28. 29.]
[ 30. 31. 32. 33. 34.]
[ 35. 36. 37. 38. 39.]
[ 40. 41. 42. 43. 44.]
[ 45. 46. 47. 48. 49.]]
[[ 50. 51. 52. 53. 54.]
[ 55. 56. 57. 58. 59.]
[ 60. 61. 62. 63. 64.]
[ 65. 66. 67. 68. 69.]
[ 70. 71. 72. 73. 74.]]]
[[[ 75. 76. 77. 78. 79.]
[ 80. 81. 82. 83. 84.]
[ 85. 86. 87. 88. 89.]
[ 90. 91. 92. 93. 94.]
[ 95. 96. 97. 98. 99.]]
[[ 100. 101. 102. 103. 104.]
[ 105. 106. 107. 108. 109.]
[ 110. 111. 112. 113. 114.]
[ 115. 116. 117. 118. 119.]
[ 120. 121. 122. 123. 124.]]
[[ 125. 126. 127. 128. 129.]
[ 130. 131. 132. 133. 134.]
[ 135. 136. 137. 138. 139.]
[ 140. 141. 142. 143. 144.]
[ 145. 146. 147. 148. 149.]]]]
w_arr =
[[[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]]]
Output for Theano.
full:
[[[[ 75 153 234 243 252 171 87]
[ 165 336 513 531 549 372 189]
[ 270 549 837 864 891 603 306]
[ 315 639 972 999 1026 693 351]
[ 360 729 1107 1134 1161 783 396]
[ 255 516 783 801 819 552 279]
[ 135 273 414 423 432 291 147]]]
[[[ 300 603 909 918 927 621 312]
[ 615 1236 1863 1881 1899 1272 639]
[ 945 1899 2862 2889 2916 1953 981]
[ 990 1989 2997 3024 3051 2043 1026]
[1035 2079 3132 3159 3186 2133 1071]
[ 705 1416 2133 2151 2169 1452 729]
[ 360 723 1089 1098 1107 741 372]]]]
valid:
[[[[ 837 864 891]
[ 972 999 1026]
[1107 1134 1161]]]
[[[2862 2889 2916]
[2997 3024 3051]
[3132 3159 3186]]]]
Output for scipy.
full:
[[[[ 0 1 3 6 9 7 4]
[ 5 12 21 27 33 24 13]
[ 15 33 54 63 72 51 27]
[ 30 63 99 108 117 81 42]
[ 45 93 144 153 162 111 57]
[ 35 72 111 117 123 84 43]
[ 20 41 63 66 69 47 24]]
[[ 25 52 81 87 93 64 33]
[ 60 124 192 204 216 148 76]
[ 105 216 333 351 369 252 129]
[ 135 276 423 441 459 312 159]
[ 165 336 513 531 549 372 189]
[ 120 244 372 384 396 268 136]
[ 65 132 201 207 213 144 73]]
[[ 75 153 234 243 252 171 87]
[ 165 336 513 531 549 372 189]
[ 270 549 837 864 891 603 306]
[ 315 639 972 999 1026 693 351]
[ 360 729 1107 1134 1161 783 396]
[ 255 516 783 801 819 552 279]
[ 135 273 414 423 432 291 147]]
[[ 75 152 231 237 243 164 83]
[ 160 324 492 504 516 348 176]
[ 255 516 783 801 819 552 279]
[ 285 576 873 891 909 612 309]
[ 315 636 963 981 999 672 339]
[ 220 444 672 684 696 468 236]
[ 115 232 351 357 363 244 123]]
[[ 50 101 153 156 159 107 54]
[ 105 212 321 327 333 224 113]
[ 165 333 504 513 522 351 177]
[ 180 363 549 558 567 381 192]
[ 195 393 594 603 612 411 207]
[ 135 272 411 417 423 284 143]
[ 70 141 213 216 219 147 74]]]
[[[ 75 151 228 231 234 157 79]
[ 155 312 471 477 483 324 163]
[ 240 483 729 738 747 501 252]
[ 255 513 774 783 792 531 267]
[ 270 543 819 828 837 561 282]
[ 185 372 561 567 573 384 193]
[ 95 191 288 291 294 197 99]]
[[ 175 352 531 537 543 364 183]
[ 360 724 1092 1104 1116 748 376]
[ 555 1116 1683 1701 1719 1152 579]
[ 585 1176 1773 1791 1809 1212 609]
[ 615 1236 1863 1881 1899 1272 639]
[ 420 844 1272 1284 1296 868 436]
[ 215 432 651 657 663 444 223]]
[[ 300 603 909 918 927 621 312]
[ 615 1236 1863 1881 1899 1272 639]
[ 945 1899 2862 2889 2916 1953 981]
[ 990 1989 2997 3024 3051 2043 1026]
[1035 2079 3132 3159 3186 2133 1071]
[ 705 1416 2133 2151 2169 1452 729]
[ 360 723 1089 1098 1107 741 372]]
[[ 225 452 681 687 693 464 233]
[ 460 924 1392 1404 1416 948 476]
[ 705 1416 2133 2151 2169 1452 729]
[ 735 1476 2223 2241 2259 1512 759]
[ 765 1536 2313 2331 2349 1572 789]
[ 520 1044 1572 1584 1596 1068 536]
[ 265 532 801 807 813 544 273]]
[[ 125 251 378 381 384 257 129]
[ 255 512 771 777 783 524 263]
[ 390 783 1179 1188 1197 801 402]
[ 405 813 1224 1233 1242 831 417]
[ 420 843 1269 1278 1287 861 432]
[ 285 572 861 867 873 584 293]
[ 145 291 438 441 444 297 149]]]]
valid:
[[[[ 837 864 891]
[ 972 999 1026]
[1107 1134 1161]]]
[[[2862 2889 2916]
[2997 3024 3051]
[3132 3159 3186]]]]
```
The output is long and hard to compare, but while the `valid` results are the same, the `full` results differ between the two. So let's look at the shapes of the output arrays.
```python
print("Output for Theano.")
print("full:")
print(o_full.eval({m: m_arr, w: w_arr}).round().astype(int).shape)
print("valid:")
print(o_valid.eval({m: m_arr, w: w_arr}).round().astype(int).shape)
print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int).shape)
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int).shape)
```
```
Output for Theano.
full:
(2, 1, 7, 7)
valid:
(2, 1, 3, 3)
Output for scipy.
full:
(2, 5, 7, 7)
valid:
(2, 1, 3, 3)
```
This is because `scipy.signal.fftconvolve()` also convolves along the image-count and channel axes, whereas `theano.tensor.nnet.conv.conv2d()` convolves only along the height and width axes: each image is processed independently, and the channels are summed over. In addition, the output of `theano.tensor.nnet.conv.conv2d()` has the format `[number of images, number of filters, height, width]`, so its second dimension is 1 here. In `valid` mode, scipy's channel axis also shrinks to $3-(3-1)=1$, which is why the `valid` results match in this example.

Also, as mentioned above, `theano.tensor.nnet.conv.conv2d()` requires the image and the filter to have the same number of channels. For example, if the image has 3 channels and the filter has 1 channel, as in

```python
m_arr = arange(2*3*5*5).reshape((2, 3, 5, 5)).astype(float32)
w_arr = ones((1, 1, 3, 3)).astype(float32)
```

then `theano.tensor.nnet.conv.conv2d()` raises the following error:

```
ValueError: GpuDnnConv images and kernel must have the same stack size
```
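`scipy.signal.fftconvolve()`, however, raises no error and returns arrays of the following shapes:

```
Output for scipy.
full:
(2, 3, 7, 7)
valid:
(2, 3, 3, 3)
```

In both modes the size rule is applied to every axis: the channel axis stays at 3 because the filter has a single channel ($3 \pm (1-1) = 3$), while the height and width become 7 for `full` and 3 for `valid`.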
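All of the shapes above follow from applying the size rule per axis. The sketch below predicts them without computing any values; `scipy_shape` and `theano_shape` are hypothetical helper names, and `scipy_shape` assumes the first array is at least as large as the second in every axis when `mode="valid"`.

```python
def scipy_shape(a_shape, b_shape, mode):
    # fftconvolve convolves along every axis.
    if mode == "full":
        return tuple(M + (m - 1) for M, m in zip(a_shape, b_shape))
    return tuple(M - (m - 1) for M, m in zip(a_shape, b_shape))


def theano_shape(img_shape, flt_shape, mode):
    # conv2d keeps the image axis, replaces channels with filters,
    # and convolves only height and width.
    n_img, n_ch, H, W = img_shape
    n_flt, f_ch, h, w = flt_shape
    if n_ch != f_ch:
        raise ValueError("images and filters must have the same number of channels")
    if mode == "full":
        return (n_img, n_flt, H + h - 1, W + w - 1)
    return (n_img, n_flt, H - h + 1, W - w + 1)


print(scipy_shape((2, 3, 5, 5), (1, 3, 3, 3), "full"))    # (2, 5, 7, 7)
print(scipy_shape((2, 3, 5, 5), (1, 3, 3, 3), "valid"))   # (2, 1, 3, 3)
print(theano_shape((2, 3, 5, 5), (1, 3, 3, 3), "full"))   # (2, 1, 7, 7)
print(scipy_shape((2, 3, 5, 5), (1, 1, 3, 3), "full"))    # (2, 3, 7, 7)
print(scipy_shape((2, 3, 5, 5), (1, 1, 3, 3), "valid"))   # (2, 3, 3, 3)
```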
Finally, let's try 2 images, 3 filters, and 1 channel. We also make the arrays smaller: 3x3 images and 1x1 filters.
```python
m = T.tensor4()
w = T.tensor4()
# Must be rank 4.
o_full = nnet.conv.conv2d(m, w,
                          border_mode='full')
o_valid = nnet.conv.conv2d(m, w,
                           border_mode='valid')
m_arr = arange(2*1*3*3).reshape((2, 1, 3, 3)).astype(float32)
w_arr = ones((3, 1, 1, 1)).astype(float32)
print("m_arr =")
print(m_arr)
print("w_arr =")
print(w_arr)
print("Output for Theano.")
print("full:")
print(o_full.eval({m: m_arr, w: w_arr}).round().astype(int))
print("valid:")
print(o_valid.eval({m: m_arr, w: w_arr}).round().astype(int))
print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int))
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int))
```
```
m_arr =
[[[[ 0. 1. 2.]
[ 3. 4. 5.]
[ 6. 7. 8.]]]
[[[ 9. 10. 11.]
[ 12. 13. 14.]
[ 15. 16. 17.]]]]
w_arr =
[[[[ 1.]]]
[[[ 1.]]]
[[[ 1.]]]]
Output for Theano.
full:
[[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]]
[[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]]]
valid:
[[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]]
[[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]]]
Output for scipy.
full:
[[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]]
[[[ 9 11 13]
[15 17 19]
[21 23 25]]]
[[[ 9 11 13]
[15 17 19]
[21 23 25]]]
[[[ 9 10 11]
[12 13 14]
[15 16 17]]]]
valid:
ValueError: For 'valid' mode, one must be at least as large as the other in every dimension
```
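`scipy.signal.fftconvolve()` raised an error in `valid` mode. In `valid` mode, it seems that either the image or the filter must be at least as large as the other in every dimension. Here the image has shape (2, 1, 3, 3) and the filter has shape (3, 1, 1, 1), so neither condition holds: the filter is larger along the first axis, but the image is larger along the height and width.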
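The condition stated in the error message can be written as a short check. This is a sketch with a hypothetical helper name that mirrors the wording of the error message, not SciPy's own source:

```python
def valid_mode_allowed(a_shape, b_shape):
    # 'valid' needs one array to be at least as large as the other in every dimension.
    a_ge_b = all(x >= y for x, y in zip(a_shape, b_shape))
    b_ge_a = all(y >= x for x, y in zip(a_shape, b_shape))
    return a_ge_b or b_ge_a


print(valid_mode_allowed((2, 3, 5, 5), (1, 3, 3, 3)))  # True
print(valid_mode_allowed((2, 1, 3, 3), (3, 1, 1, 1)))  # False: 2 < 3, but 3 > 1
```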
The shapes of the output arrays are as follows:

```
Output for Theano.
full:
(2, 3, 3, 3)
valid:
(2, 3, 3, 3)
Output for scipy.
full:
(4, 1, 3, 3)
valid:
```

For scipy, `valid` raises the `ValueError` shown above, so no shape is printed.
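For `theano.tensor.nnet.conv.conv2d()`, the first and second dimensions are the number of images and the number of filters, respectively, and the remaining height and width are 3x3 in both `full` and `valid` modes, because a 1x1 filter gives $M+(m-1) = M-(m-1) = M$. `scipy.signal.fftconvolve()`, on the other hand, performs a `full` convolution over every axis, so the image axis grows to $2+(3-1)=4$ while the channel axis stays at $1+(1-1)=1$.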
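With 1x1 filters, both behaviours can be reproduced in a few lines of NumPy, which makes the difference easy to see. This is an illustrative sketch, not how either library computes it: the Theano-style result is written as an `einsum` over channels (valid only because the kernel is 1x1), and the scipy-style `full` result is written as a 1D `np.convolve` along the image axis (valid only because the kernel has length 1 on every other axis).

```python
import numpy as np

m_arr = np.arange(2 * 1 * 3 * 3, dtype=float).reshape((2, 1, 3, 3))
w_arr = np.ones((3, 1, 1, 1))

# Theano-style: with 1x1 kernels, each output map is a channel-weighted sum,
# out[b, f] = sum_c m_arr[b, c] * w_arr[f, c, 0, 0].
theano_like = np.einsum("bchw,fc->bfhw", m_arr, w_arr[:, :, 0, 0])
print(theano_like.shape)  # (2, 3, 3, 3)

# scipy-style full: the image stack [A, B] is convolved with [1, 1, 1]
# along axis 0, giving [A, A + B, A + B, B].
scipy_like = np.apply_along_axis(lambda v: np.convolve(v, np.ones(3)), 0, m_arr)
print(scipy_like.shape)  # (4, 1, 3, 3)
```

The middle two slices of `scipy_like` mix the two images, which is exactly the `[[ 9 11 13] ...]` block in scipy's `full` output above.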
# Stride