--The method "** Score-CAM " of Paper submitted to arxiv on October 3, 2019 has been implemented in keras. - Compared with Grad-CAM and Grad-CAM ++ **. --Implemented ** Faster Score-CAM **, which has been speeded up independently. -Applied to the model trained with ** DAGM dataset **. --The code can be found on github.
- Score-CAM is one of the methods for visualizing the evidence behind a CNN's decisions.
- There is an excellent summary here for understanding decision-basis visualization.
- Prior work includes Grad-CAM and Grad-CAM++.
- Compared to those, Score-CAM reduces noise, improves stability, and **no longer depends on gradient computation**.
```python
import cv2
import numpy as np
from keras.models import Model

def ScoreCam(model, img_array, layer_name, max_N=-1):
    cls = np.argmax(model.predict(img_array))
    act_map_array = Model(inputs=model.input, outputs=model.get_layer(layer_name).output).predict(img_array)

    # extract effective maps
    if max_N != -1:
        act_map_std_list = [np.std(act_map_array[0,:,:,k]) for k in range(act_map_array.shape[3])]
        unsorted_max_indices = np.argpartition(-np.array(act_map_std_list), max_N)[:max_N]
        max_N_indices = unsorted_max_indices[np.argsort(-np.array(act_map_std_list)[unsorted_max_indices])]
        act_map_array = act_map_array[:,:,:,max_N_indices]

    input_shape = model.layers[0].output_shape[1:]  # get input shape
    # 1. upsample each activation map to the original input size
    act_map_resized_list = [cv2.resize(act_map_array[0,:,:,k], input_shape[:2], interpolation=cv2.INTER_LINEAR) for k in range(act_map_array.shape[3])]
    # 2. normalize the raw activation value in each activation map into [0, 1]
    act_map_normalized_list = []
    for act_map_resized in act_map_resized_list:
        if np.max(act_map_resized) - np.min(act_map_resized) != 0:
            act_map_normalized = act_map_resized / (np.max(act_map_resized) - np.min(act_map_resized))
        else:
            act_map_normalized = act_map_resized
        act_map_normalized_list.append(act_map_normalized)
    # 3. project the highlighted area of each activation map onto the input space by multiplying the input by the normalized map
    masked_input_list = []
    for act_map_normalized in act_map_normalized_list:
        masked_input = np.copy(img_array)
        for k in range(3):
            masked_input[0,:,:,k] *= act_map_normalized
        masked_input_list.append(masked_input)
    masked_input_array = np.concatenate(masked_input_list, axis=0)
    # 4. feed the masked inputs into the CNN model and apply softmax
    pred_from_masked_input_array = softmax(model.predict(masked_input_array))
    # 5. define the weight of each map as the score of the target class
    weights = pred_from_masked_input_array[:, cls]
    # 6. final class-discriminative localization map = linear weighted combination of all activation maps
    cam = np.dot(act_map_array[0,:,:,:], weights)
    cam = np.maximum(0, cam)  # passing through ReLU
    cam /= np.max(cam)        # scale to [0, 1]
    return cam

def softmax(x):
    f = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
    return f
```
Let's apply it to a VGG16 trained on ImageNet.
```python
from keras.preprocessing.image import load_img
import matplotlib.pyplot as plt

orig_img = np.array(load_img('./image/hummingbird.jpg'), dtype=np.uint8)
plt.imshow(orig_img)
plt.show()
```
```python
from keras.applications.vgg16 import VGG16
from gradcamutils import read_and_preprocess_img
import matplotlib.pyplot as plt

model = VGG16(include_top=True, weights='imagenet')
layer_name = 'block5_conv3'
img_array = read_and_preprocess_img('./image/hummingbird.jpg', size=(224,224))

score_cam = ScoreCam(model, img_array, layer_name)

plt.imshow(score_cam)
plt.show()
```
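`read_and_preprocess_img` is a helper from my gradcamutils module. As a rough sketch, it does something like the following (assuming VGG16-style preprocessing; the actual helper may differ):

```python
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.vgg16 import preprocess_input

def read_and_preprocess_img(path, size=(224, 224)):
    # Load an image, resize it, and shape it into a (1, H, W, 3) array
    # ready to be passed straight to model.predict.
    img = load_img(path, target_size=size)
    img_array = img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    return preprocess_input(img_array)
```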
The arguments of `ScoreCam` are:

- model: a Keras model instance.
- img_array: the preprocessed image whose gaze region you want to determine. It should be in a form that can be passed directly to `model.predict(img_array)`.
- layer_name: the name of the activation layer immediately after the final convolution layer. If the activation is included in the convolution layer itself, the convolution layer's name can be used. Layer names can be checked with `model.summary()`.
- max_N: the knob for the speed-up I added on my own. With `max_N=-1` it is the original Score-CAM. Specifying a natural number reduces the number of CNN inferences to that number; about 10 is recommended. A large value only increases processing time without much effect on the heat map, while too small a value makes the heat map strange.
- A model that takes 3-channel images (RGB or BGR) as input is assumed.
- For models with many layers, **the coordinates in the final convolution layer's feature map may no longer correspond well to vertical and horizontal positions in the input image**, and a good heat map may not appear. When using well-known models such as ResNet, Xception, or MobileNet, **pay attention to the depth of the layer** you choose, e.g. by checking feature-map shapes as shown below.
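For example, to decide which layer to pass as `layer_name`, you can list the candidate layers of the model built above together with their output shapes (a small snippet I am adding purely for illustration):

```python
# List candidate layers and their feature-map shapes; the coarser the map,
# the less precisely its coordinates correspond to positions in the input image.
for layer in model.layers:
    if 'conv' in layer.name or 'act' in layer.name:
        print(layer.name, layer.output_shape)
```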
Let's overlay the heat map obtained above on the original image.
For Grad-CAM and Grad-CAM++, I used the code from "gradcam++ for keras".
The executable code can be found on GitHub.
- The images labeled `emphasized` show the gaze region more clearly by applying threshold processing to the heat map (a sketch of such an overlay helper follows this list).
- Score-CAM seems to pick up the gaze region evenly.
- Guided Backpropagation is included for reference, but as pointed out here, there is **suspicion that it does not actually reflect the network's learned information**.
- If you want to extract the contours being attended to, displaying the image gradient is still better, so the bottom row shows overlays computed by multiplying the heat maps with the image gradient (`grad`).
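`superimpose` is a helper in my gradcamutils. As a rough sketch of what such an overlay helper can look like (the `superimpose_sketch` name, the 50/50 blend, and the simple threshold here are my own assumptions, not necessarily the actual implementation):

```python
import cv2
import numpy as np

def superimpose_sketch(img_path, cam, emphasize=False, thresh=0.5):
    # Hypothetical stand-in for gradcamutils.superimpose; the actual helper
    # may blend or threshold differently.
    img = cv2.imread(img_path)
    heatmap = cv2.resize(cam, (img.shape[1], img.shape[0]))
    if emphasize:
        # keep only strong activations so the gaze region stands out
        heatmap = np.where(heatmap > thresh, heatmap, 0.0)
    heatmap = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
    overlaid = cv2.addWeighted(img, 0.5, heatmap, 0.5, 0)
    return cv2.cvtColor(overlaid, cv2.COLOR_BGR2RGB)  # RGB for matplotlib
```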
For the other images, only the results are shown.
Let's measure the processing speed on Google Colaboratory, using a GPU.
print("Grad-CAM")
%timeit grad_cam = GradCam(model, img_array, layer_name)
print("Grad-CAM++")
%timeit grad_cam_plus_plus = GradCamPlusPlus(model, img_array, layer_name)
print("Score-Cam")
%timeit score_cam = ScoreCam(model, img_array, layer_name)
print("Faster-Score-Cam N=10")
%timeit faster_score_cam = ScoreCam(model, img_array, layer_name, max_N=10)
print("Faster-Score-Cam N=3")
%timeit faster_score_cam = ScoreCam(model, img_array, layer_name, max_N=3)
print("Guided-BP}")
%timeit saliency = GuidedBackPropagation(guided_model, img_array, layer_name)
```
Grad-CAM
1 loop, best of 3: 196 ms per loop
Grad-CAM++
1 loop, best of 3: 221 ms per loop
Score-Cam
1 loop, best of 3: 5.24 s per loop
Faster-Score-Cam N=10
1 loop, best of 3: 307 ms per loop
Faster-Score-Cam N=3
The slowest run took 4.45 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 238 ms per loop
Guided-BP
1 loop, best of 3: 415 ms per loop
```
As you can see, **Score-CAM is very heavy**: it takes **more than 25 times as long as Grad-CAM**.
Experimenting with the output of the final convolution layer (512 channels for VGG16), I found that only a few channels dominate the generation of the final heat map. Faster-Score-CAM adds a step that **preferentially uses the activation maps with large spatial variance as the mask images**. (I named it arbitrarily. With `max_N=-1` it becomes the original Score-CAM.)
As described in the **processing-speed comparison** above, this achieves a speed-up of **10x or more**. Still, Grad-CAM++ is faster.
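The heart of this is the `max_N` branch in the `ScoreCam` code above. In isolation, the selection step works like this (a minimal sketch on a dummy activation tensor, not the original code verbatim):

```python
import numpy as np

# Dummy activation maps with the shape produced by VGG16's last conv block: (1, H, W, C).
act_map_array = np.random.rand(1, 14, 14, 512)
max_N = 10

# Spatial standard deviation of each channel's activation map.
act_map_std = np.array([np.std(act_map_array[0, :, :, k])
                        for k in range(act_map_array.shape[3])])

# Keep only the max_N channels with the largest variance, in descending order;
# only these channels are then upsampled, used as masks, and scored.
top = np.argpartition(-act_map_std, max_N)[:max_N]
top = top[np.argsort(-act_map_std[top])]
act_map_array = act_map_array[:, :, :, top]
print(act_map_array.shape)  # (1, 14, 14, 10)
```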
To check its practicality, let's apply Score-CAM to our own model trained on an open dataset.
The DAGM dataset is used as the data, and a ResNet (a shallow one with about 80 layers) as the model.
Rewrite `dagm_path` below to point to the directory where you downloaded and unzipped the DAGM dataset.
```python
from keras.utils import to_categorical
import numpy as np
import glob
from sklearn.model_selection import train_test_split
from gradcamutils import read_and_preprocess_img

num_classes = 2
img_size = (224, 224)
dagm_path = "./DAGM"

def get_dagm_data(names):
    x = []
    y = []
    for i, name in enumerate(names):
        for path in glob.glob(f"{dagm_path}/{name}/*.png"):
            img_array = read_and_preprocess_img(path, size=img_size)
            x.append(img_array)
            y.append(i)
    x = np.concatenate(x, axis=0)
    y = np.array(y)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=111)
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')
    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = get_dagm_data(["Class1", "Class1_def"])
```
Taking a lazy shortcut, I use the ResNet from keras.applications cut off partway through. (It is not pretty code, but it works, so it will do.)
```python
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.optimizers import Adam
from keras.layers import Dense, Input, Activation, GlobalAveragePooling2D
from keras.callbacks import EarlyStopping, ModelCheckpoint

def build_ResNet():
    model = ResNet50(include_top=True, input_tensor=Input(shape=(img_size[0], img_size[1], 3)))
    # cut the network partway through and attach a new classification head
    x = model.layers[-98].output
    x = Activation('relu', name="act_last")(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(2, name="dense_out")(x)
    outputs = Activation('softmax')(x)
    model = Model(model.input, outputs)
    # model.summary()
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(amsgrad=True),
                  metrics=['accuracy'])
    return model

model = build_ResNet()

es_cb = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')
chkpt = './resnet_weight_DAGM.h5'
cp_cb = ModelCheckpoint(filepath=chkpt, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=True, mode='auto')

epochs = 15
batch_size = 32

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test),
                    callbacks=[es_cb, cp_cb],
                    class_weight={0: 1., 1: 6.},
                    shuffle=True)
```
```python
# load the best weights saved by ModelCheckpoint
model.load_weights('./resnet_weight_DAGM.h5')
```
```python
import matplotlib.pyplot as plt
import cv2
import numpy as np
from gradcamutils import GradCam, GradCamPlusPlus, ScoreCam, GuidedBackPropagation, superimpose, read_and_preprocess_img, build_guided_model

def build_ResNet_and_load():
    model = build_ResNet()
    model.load_weights('./resnet_weight_DAGM.h5')
    return model
img_path = f'{dagm_path}/Class1_def/12.png'
orig_img = np.array(load_img(img_path),dtype=np.uint8)
img_array = read_and_preprocess_img(img_path, size=(224,224))
layer_name = "act_last"
grad_cam=GradCam(model,img_array,layer_name)
grad_cam_superimposed = superimpose(img_path, grad_cam)
grad_cam_emphasized = superimpose(img_path, grad_cam, emphasize=True)
grad_cam_plus_plus=GradCamPlusPlus(model,img_array,layer_name)
grad_cam_plus_plus_superimposed = superimpose(img_path, grad_cam_plus_plus)
grad_cam_plus_plus_emphasized = superimpose(img_path, grad_cam_plus_plus, emphasize=True)
score_cam=ScoreCam(model,img_array,layer_name)
score_cam_superimposed = superimpose(img_path, score_cam)
score_cam_emphasized = superimpose(img_path, score_cam, emphasize=True)
faster_score_cam=ScoreCam(model,img_array,layer_name, max_N=10)
faster_score_cam_superimposed = superimpose(img_path, faster_score_cam)
faster_score_cam_emphasized = superimpose(img_path, faster_score_cam, emphasize=True)
guided_model = build_guided_model(build_ResNet_and_load)
saliency = GuidedBackPropagation(guided_model, img_array, layer_name)
saliency_resized = cv2.resize(saliency, (orig_img.shape[1], orig_img.shape[0]))
grad_cam_resized = cv2.resize(grad_cam, (orig_img.shape[1], orig_img.shape[0]))
guided_grad_cam = saliency_resized * grad_cam_resized[..., np.newaxis]
grad_cam_plus_plus_resized = cv2.resize(grad_cam_plus_plus, (orig_img.shape[1], orig_img.shape[0]))
guided_grad_cam_plus_plus = saliency_resized * grad_cam_plus_plus_resized[..., np.newaxis]
score_cam_resized = cv2.resize(score_cam, (orig_img.shape[1], orig_img.shape[0]))
guided_score_cam = saliency_resized * score_cam_resized[..., np.newaxis]
faster_score_cam_resized = cv2.resize(faster_score_cam, (orig_img.shape[1], orig_img.shape[0]))
guided_faster_score_cam = saliency_resized * faster_score_cam_resized[..., np.newaxis]
img_gray = cv2.imread(img_path, 0)
dx = cv2.Sobel(img_gray, cv2.CV_64F, 1, 0, ksize=3)
dy = cv2.Sobel(img_gray, cv2.CV_64F, 0, 1, ksize=3)
grad = np.sqrt(dx ** 2 + dy ** 2)  # image gradient magnitude
grad = cv2.dilate(grad, kernel=np.ones((5, 5)), iterations=1)  # dilate to thicken the edges
grad -= np.min(grad)
grad /= np.max(grad) # scale 0. to 1.
grad_times_grad_cam = grad * grad_cam_resized
grad_times_grad_cam_plus_plus = grad * grad_cam_plus_plus_resized
grad_times_score_cam = grad * score_cam_resized
grad_times_faster_score_cam = grad * faster_score_cam_resized
fig, ax = plt.subplots(nrows=4,ncols=5, figsize=(18, 16))
ax[0,0].imshow(orig_img)
ax[0,0].set_title("input image")
ax[0,1].imshow(grad_cam_superimposed)
ax[0,1].set_title("Grad-CAM")
ax[0,2].imshow(grad_cam_plus_plus_superimposed)
ax[0,2].set_title("Grad-CAM++")
ax[0,3].imshow(score_cam_superimposed)
ax[0,3].set_title("Score-CAM")
ax[0,4].imshow(faster_score_cam_superimposed)
ax[0,4].set_title("Faster-Score-CAM")
ax[1,0].imshow(orig_img)
ax[1,0].set_title("input image")
ax[1,1].imshow(grad_cam_emphasized)
ax[1,1].set_title("Grad-CAM emphasized")
ax[1,2].imshow(grad_cam_plus_plus_emphasized)
ax[1,2].set_title("Grad-CAM++ emphasized")
ax[1,3].imshow(score_cam_emphasized)
ax[1,3].set_title("Score-CAM emphasized")
ax[1,4].imshow(faster_score_cam_emphasized)
ax[1,4].set_title("Faster-Score-CAM emphasized")
ax[2,0].imshow(saliency_resized)
ax[2,0].set_title("Guided-BP")
ax[2,1].imshow(guided_grad_cam)
ax[2,1].set_title("Guided-Grad-CAM")
ax[2,2].imshow(guided_grad_cam_plus_plus)
ax[2,2].set_title("Guided-Grad-CAM++")
ax[2,3].imshow(guided_score_cam)
ax[2,3].set_title("Guided-Score-CAM")
ax[2,4].imshow(guided_faster_score_cam)
ax[2,4].set_title("Guided-Faster-Score-CAM")
ax[3,0].imshow(grad, 'gray')
ax[3,0].set_title("grad")
ax[3,1].imshow(grad_times_grad_cam, 'gray')
ax[3,1].set_title("grad * Grad-CAM")
ax[3,2].imshow(grad_times_grad_cam_plus_plus, 'gray')
ax[3,2].set_title("grad * Grad-CAM++")
ax[3,3].imshow(grad_times_score_cam, 'gray')
ax[3,3].set_title("grad * Score-CAM")
ax[3,4].imshow(grad_times_faster_score_cam, 'gray')
ax[3,4].set_title("grad * Faster-Score-CAM")
for i in range(4):
    for j in range(5):
        ax[i, j].axis('off')
plt.show()
```
- All of the methods seem to detect the position of the defect well.
- It seems difficult to produce a display that highlights only the defect itself. (That cannot really be helped, since we are extracting information from the final conv layer.)
For the remaining classes, only the thresholded (emphasized) results are shown, for 5 images from each class.
Class2
Class3
Class4
Class5
Class6
The defect positions appear to be captured almost correctly.
- For **anomaly detection**, these methods can be used to roughly visualize anomalous regions.
- It is very inconvenient that Grad-CAM, Grad-CAM++, and Score-CAM **restrict the models they can be applied to**: the coordinates of the final conv layer must correspond well to the coordinates of the input image, which makes models with many layers difficult to use.