--The method "** Score-CAM " of Paper submitted to arxiv on October 3, 2019 has been implemented in keras. - Compared with Grad-CAM and Grad-CAM ++ **. --Implemented ** Faster Score-CAM **, which has been speeded up independently. -Applied to the model trained with ** DAGM dataset **. --The code can be found on github.
- Score-CAM is one of the methods for visualizing the evidence behind a CNN's decisions.
- There is an excellent summary here for understanding decision-basis visualization.
- Prior work includes Grad-CAM and Grad-CAM++.
- Compared to those, Score-CAM reduces noise, improves stability, and **no longer depends on gradient computation**.
```python
import cv2
import numpy as np
from keras.models import Model

def ScoreCam(model, img_array, layer_name, max_N=-1):
    cls = np.argmax(model.predict(img_array))
    act_map_array = Model(inputs=model.input, outputs=model.get_layer(layer_name).output).predict(img_array)

    # extract effective maps
    if max_N != -1:
        act_map_std_list = [np.std(act_map_array[0,:,:,k]) for k in range(act_map_array.shape[3])]
        unsorted_max_indices = np.argpartition(-np.array(act_map_std_list), max_N)[:max_N]
        max_N_indices = unsorted_max_indices[np.argsort(-np.array(act_map_std_list)[unsorted_max_indices])]
        act_map_array = act_map_array[:,:,:,max_N_indices]

    input_shape = model.layers[0].output_shape[1:]  # get input shape
    # 1. upsample each activation map to the original input size
    act_map_resized_list = [cv2.resize(act_map_array[0,:,:,k], input_shape[:2], interpolation=cv2.INTER_LINEAR) for k in range(act_map_array.shape[3])]
    # 2. normalize the raw activation value in each activation map into [0, 1]
    act_map_normalized_list = []
    for act_map_resized in act_map_resized_list:
        if np.max(act_map_resized) - np.min(act_map_resized) != 0:
            act_map_normalized = act_map_resized / (np.max(act_map_resized) - np.min(act_map_resized))
        else:
            act_map_normalized = act_map_resized
        act_map_normalized_list.append(act_map_normalized)
    # 3. project the highlighted area of each activation map onto the input space by multiplying the input by the normalized map
    masked_input_list = []
    for act_map_normalized in act_map_normalized_list:
        masked_input = np.copy(img_array)
        for k in range(3):
            masked_input[0,:,:,k] *= act_map_normalized
        masked_input_list.append(masked_input)
    masked_input_array = np.concatenate(masked_input_list, axis=0)
    # 4. feed the masked inputs into the CNN model and apply softmax
    pred_from_masked_input_array = softmax(model.predict(masked_input_array))
    # 5. define the weight of each map as the score of the target class
    weights = pred_from_masked_input_array[:, cls]
    # 6. final class-discriminative localization map = linear weighted combination of all activation maps
    cam = np.dot(act_map_array[0,:,:,:], weights)
    cam = np.maximum(0, cam)  # passing through ReLU
    cam /= np.max(cam)        # scale to [0, 1]
    return cam

def softmax(x):
    f = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
    return f
```
Let's apply it to a VGG16 trained on ImageNet.
```python
from keras.preprocessing.image import load_img
import matplotlib.pyplot as plt

orig_img = np.array(load_img('./image/hummingbird.jpg'), dtype=np.uint8)
plt.imshow(orig_img)
plt.show()
```
```python
from keras.applications.vgg16 import VGG16
from gradcamutils import read_and_preprocess_img
import matplotlib.pyplot as plt

model = VGG16(include_top=True, weights='imagenet')
layer_name = 'block5_conv3'
img_array = read_and_preprocess_img('./image/hummingbird.jpg', size=(224,224))

score_cam = ScoreCam(model, img_array, layer_name)

plt.imshow(score_cam)
plt.show()
```
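`read_and_preprocess_img` is a helper from my gradcamutils module. As a rough sketch, it does something like the following (assuming VGG16-style preprocessing; the actual helper may differ):

```python
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.vgg16 import preprocess_input

def read_and_preprocess_img(path, size=(224, 224)):
    # Load an image, resize it, and shape it into a (1, H, W, 3) array
    # ready to be passed straight to model.predict.
    img = load_img(path, target_size=size)
    img_array = img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    return preprocess_input(img_array)
```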
The arguments of `ScoreCam` are:

- model: a Keras model instance.
- img_array: the preprocessed image whose gaze region you want to determine. It should be in a form that can be passed directly to `model.predict(img_array)`.
- layer_name: the name of the activation layer immediately after the final convolution layer. If the activation is included in the convolution layer itself, the convolution layer's name can be used. Layer names can be checked with `model.summary()`.
- max_N: the knob for the speed-up I added on my own. With `max_N=-1` it is the original Score-CAM. Specifying a natural number reduces the number of CNN inferences to that number; about 10 is recommended. A large value only increases processing time without much effect on the heat map, while too small a value makes the heat map strange.
- A model that takes 3-channel images (RGB or BGR) as input is assumed.
- For models with many layers, **the coordinates in the final convolution layer's feature map may no longer correspond well to vertical and horizontal positions in the input image**, and a good heat map may not appear. When using well-known models such as ResNet, Xception, or MobileNet, **pay attention to the depth of the layer** you choose, e.g. by checking feature-map shapes as shown below.
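For example, to decide which layer to pass as `layer_name`, you can list the candidate layers of the model built above together with their output shapes (a small snippet I am adding purely for illustration):

```python
# List candidate layers and their feature-map shapes; the coarser the map,
# the less precisely its coordinates correspond to positions in the input image.
for layer in model.layers:
    if 'conv' in layer.name or 'act' in layer.name:
        print(layer.name, layer.output_shape)
```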
Let's overlay the heat map obtained above on the original image.
For Grad-CAM and Grad-CAM++, I used the code from "gradcam++ for keras".
The executable code can be found on GitHub.
- The images labeled `emphasized` show the gaze region more clearly by applying threshold processing to the heat map (a sketch of such an overlay helper follows this list).
- Score-CAM seems to pick up the gaze region evenly.
- Guided Backpropagation is included for reference, but as pointed out here, there is **suspicion that it does not actually reflect the network's learned information**.
- If you want to extract the contours being attended to, displaying the image gradient is still better, so the bottom row shows overlays computed by multiplying the heat maps with the image gradient (`grad`).
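`superimpose` is a helper in my gradcamutils. As a rough sketch of what such an overlay helper can look like (the `superimpose_sketch` name, the 50/50 blend, and the simple threshold here are my own assumptions, not necessarily the actual implementation):

```python
import cv2
import numpy as np

def superimpose_sketch(img_path, cam, emphasize=False, thresh=0.5):
    # Hypothetical stand-in for gradcamutils.superimpose; the actual helper
    # may blend or threshold differently.
    img = cv2.imread(img_path)
    heatmap = cv2.resize(cam, (img.shape[1], img.shape[0]))
    if emphasize:
        # keep only strong activations so the gaze region stands out
        heatmap = np.where(heatmap > thresh, heatmap, 0.0)
    heatmap = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
    overlaid = cv2.addWeighted(img, 0.5, heatmap, 0.5, 0)
    return cv2.cvtColor(overlaid, cv2.COLOR_BGR2RGB)  # RGB for matplotlib
```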
For the other images, only the results are shown.
Let's measure the processing speed on Google Colaboratory, using a GPU.
print("Grad-CAM")
%timeit grad_cam = GradCam(model, img_array, layer_name)
print("Grad-CAM++")
%timeit grad_cam_plus_plus = GradCamPlusPlus(model, img_array, layer_name)
print("Score-Cam")
%timeit score_cam = ScoreCam(model, img_array, layer_name)
print("Faster-Score-Cam N=10")
%timeit faster_score_cam = ScoreCam(model, img_array, layer_name, max_N=10)
print("Faster-Score-Cam N=3")
%timeit faster_score_cam = ScoreCam(model, img_array, layer_name, max_N=3)
print("Guided-BP}")
%timeit saliency = GuidedBackPropagation(guided_model, img_array, layer_name)
```
Grad-CAM
1 loop, best of 3: 196 ms per loop
Grad-CAM++
1 loop, best of 3: 221 ms per loop
Score-Cam
1 loop, best of 3: 5.24 s per loop
Faster-Score-Cam N=10
1 loop, best of 3: 307 ms per loop
Faster-Score-Cam N=3
The slowest run took 4.45 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 238 ms per loop
Guided-BP
1 loop, best of 3: 415 ms per loop
```
As you can see, **Score-CAM is very heavy**: it takes **more than 25 times as long as Grad-CAM**.
Experimenting with the output of the final convolution layer (512 channels for VGG16), I found that only a few channels dominate the generation of the final heat map. Faster-Score-CAM adds a step that **preferentially uses the activation maps with large spatial variance as the mask images**. (I named it arbitrarily. With `max_N=-1` it becomes the original Score-CAM.)
As described in the **processing-speed comparison** above, this achieves a speed-up of **10x or more**. Still, Grad-CAM++ is faster.
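The heart of this is the `max_N` branch in the `ScoreCam` code above. In isolation, the selection step works like this (a minimal sketch on a dummy activation tensor, not the original code verbatim):

```python
import numpy as np

# Dummy activation maps with the shape produced by VGG16's last conv block: (1, H, W, C).
act_map_array = np.random.rand(1, 14, 14, 512)
max_N = 10

# Spatial standard deviation of each channel's activation map.
act_map_std = np.array([np.std(act_map_array[0, :, :, k])
                        for k in range(act_map_array.shape[3])])

# Keep only the max_N channels with the largest variance, in descending order;
# only these channels are then upsampled, used as masks, and scored.
top = np.argpartition(-act_map_std, max_N)[:max_N]
top = top[np.argsort(-act_map_std[top])]
act_map_array = act_map_array[:, :, :, top]
print(act_map_array.shape)  # (1, 14, 14, 10)
```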
To check its practicality, let's apply Score-CAM to our own model trained on an open dataset.
The DAGM dataset is used as the data, and a ResNet (a shallow one with about 80 layers) as the model.
Rewrite `dagm_path` below to point to the directory where you downloaded and unzipped the DAGM dataset.
```python
from keras.utils import to_categorical
import numpy as np
import glob
from sklearn.model_selection import train_test_split
from gradcamutils import read_and_preprocess_img

num_classes = 2
img_size = (224, 224)
dagm_path = "./DAGM"

def get_dagm_data(names):
    x = []
    y = []
    for i, name in enumerate(names):
        for path in glob.glob(f"{dagm_path}/{name}/*.png"):
            img_array = read_and_preprocess_img(path, size=img_size)
            x.append(img_array)
            y.append(i)
    x = np.concatenate(x, axis=0)
    y = np.array(y)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=111)
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')
    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = get_dagm_data(["Class1", "Class1_def"])
```
Taking a lazy shortcut, I use the ResNet from keras.applications cut off partway through. (It is not pretty code, but it works, so it will do.)
```python
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.optimizers import Adam
from keras.layers import Dense, Input, Activation, GlobalAveragePooling2D
from keras.callbacks import EarlyStopping, ModelCheckpoint

def build_ResNet():
    model = ResNet50(include_top=True, input_tensor=Input(shape=(img_size[0], img_size[1], 3)))
    # cut the network partway through and attach a new classification head
    x = model.layers[-98].output
    x = Activation('relu', name="act_last")(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(2, name="dense_out")(x)
    outputs = Activation('softmax')(x)
    model = Model(model.input, outputs)
    # model.summary()
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(amsgrad=True),
                  metrics=['accuracy'])
    return model

model = build_ResNet()

es_cb = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')
chkpt = './resnet_weight_DAGM.h5'
cp_cb = ModelCheckpoint(filepath=chkpt, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=True, mode='auto')

epochs = 15
batch_size = 32

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test),
                    callbacks=[es_cb, cp_cb],
                    class_weight={0: 1., 1: 6.},
                    shuffle=True)
```
```python
# load the best weights saved by ModelCheckpoint
model.load_weights('./resnet_weight_DAGM.h5')
```
```python
import matplotlib.pyplot as plt
import cv2
import numpy as np
from gradcamutils import GradCam, GradCamPlusPlus, ScoreCam, GuidedBackPropagation, superimpose, read_and_preprocess_img, build_guided_model

def build_ResNet_and_load():
    model = build_ResNet()
    model.load_weights('./resnet_weight_DAGM.h5')
    return model
img_path = f'{dagm_path}/Class1_def/12.png'
orig_img = np.array(load_img(img_path),dtype=np.uint8)
img_array = read_and_preprocess_img(img_path, size=(224,224))
layer_name = "act_last"
grad_cam=GradCam(model,img_array,layer_name)
grad_cam_superimposed = superimpose(img_path, grad_cam)
grad_cam_emphasized = superimpose(img_path, grad_cam, emphasize=True)
grad_cam_plus_plus=GradCamPlusPlus(model,img_array,layer_name)
grad_cam_plus_plus_superimposed = superimpose(img_path, grad_cam_plus_plus)
grad_cam_plus_plus_emphasized = superimpose(img_path, grad_cam_plus_plus, emphasize=True)
score_cam=ScoreCam(model,img_array,layer_name)
score_cam_superimposed = superimpose(img_path, score_cam)
score_cam_emphasized = superimpose(img_path, score_cam, emphasize=True)
faster_score_cam=ScoreCam(model,img_array,layer_name, max_N=10)
faster_score_cam_superimposed = superimpose(img_path, faster_score_cam)
faster_score_cam_emphasized = superimpose(img_path, faster_score_cam, emphasize=True)
guided_model = build_guided_model(build_ResNet_and_load)
saliency = GuidedBackPropagation(guided_model, img_array, layer_name)
saliency_resized = cv2.resize(saliency, (orig_img.shape[1], orig_img.shape[0]))
grad_cam_resized = cv2.resize(grad_cam, (orig_img.shape[1], orig_img.shape[0]))
guided_grad_cam = saliency_resized * grad_cam_resized[..., np.newaxis]
grad_cam_plus_plus_resized = cv2.resize(grad_cam_plus_plus, (orig_img.shape[1], orig_img.shape[0]))
guided_grad_cam_plus_plus = saliency_resized * grad_cam_plus_plus_resized[..., np.newaxis]
score_cam_resized = cv2.resize(score_cam, (orig_img.shape[1], orig_img.shape[0]))
guided_score_cam = saliency_resized * score_cam_resized[..., np.newaxis]
faster_score_cam_resized = cv2.resize(faster_score_cam, (orig_img.shape[1], orig_img.shape[0]))
guided_faster_score_cam = saliency_resized * faster_score_cam_resized[..., np.newaxis]
img_gray = cv2.imread(img_path, 0)
dx = cv2.Sobel(img_gray, cv2.CV_64F, 1, 0, ksize=3)
dy = cv2.Sobel(img_gray, cv2.CV_64F, 0, 1, ksize=3)
grad = np.sqrt(dx ** 2 + dy ** 2)  # image gradient magnitude
grad = cv2.dilate(grad, kernel=np.ones((5, 5)), iterations=1)  # dilate to thicken the edges
grad -= np.min(grad)
grad /= np.max(grad) # scale 0. to 1.
grad_times_grad_cam = grad * grad_cam_resized
grad_times_grad_cam_plus_plus = grad * grad_cam_plus_plus_resized
grad_times_score_cam = grad * score_cam_resized
grad_times_faster_score_cam = grad * faster_score_cam_resized
fig, ax = plt.subplots(nrows=4,ncols=5, figsize=(18, 16))
ax[0,0].imshow(orig_img)
ax[0,0].set_title("input image")
ax[0,1].imshow(grad_cam_superimposed)
ax[0,1].set_title("Grad-CAM")
ax[0,2].imshow(grad_cam_plus_plus_superimposed)
ax[0,2].set_title("Grad-CAM++")
ax[0,3].imshow(score_cam_superimposed)
ax[0,3].set_title("Score-CAM")
ax[0,4].imshow(faster_score_cam_superimposed)
ax[0,4].set_title("Faster-Score-CAM")
ax[1,0].imshow(orig_img)
ax[1,0].set_title("input image")
ax[1,1].imshow(grad_cam_emphasized)
ax[1,1].set_title("Grad-CAM emphasized")
ax[1,2].imshow(grad_cam_plus_plus_emphasized)
ax[1,2].set_title("Grad-CAM++ emphasized")
ax[1,3].imshow(score_cam_emphasized)
ax[1,3].set_title("Score-CAM emphasized")
ax[1,4].imshow(faster_score_cam_emphasized)
ax[1,4].set_title("Faster-Score-CAM emphasized")
ax[2,0].imshow(saliency_resized)
ax[2,0].set_title("Guided-BP")
ax[2,1].imshow(guided_grad_cam)
ax[2,1].set_title("Guided-Grad-CAM")
ax[2,2].imshow(guided_grad_cam_plus_plus)
ax[2,2].set_title("Guided-Grad-CAM++")
ax[2,3].imshow(guided_score_cam)
ax[2,3].set_title("Guided-Score-CAM")
ax[2,4].imshow(guided_faster_score_cam)
ax[2,4].set_title("Guided-Faster-Score-CAM")
ax[3,0].imshow(grad, 'gray')
ax[3,0].set_title("grad")
ax[3,1].imshow(grad_times_grad_cam, 'gray')
ax[3,1].set_title("grad * Grad-CAM")
ax[3,2].imshow(grad_times_grad_cam_plus_plus, 'gray')
ax[3,2].set_title("grad * Grad-CAM++")
ax[3,3].imshow(grad_times_score_cam, 'gray')
ax[3,3].set_title("grad * Score-CAM")
ax[3,4].imshow(grad_times_faster_score_cam, 'gray')
ax[3,4].set_title("grad * Faster-Score-CAM")
for i in range(4):
    for j in range(5):
        ax[i, j].axis('off')
plt.show()
```
- All of the methods seem to detect the position of the defect well.
- It seems difficult to produce a display that highlights only the defect itself. (That cannot really be helped, since we are extracting information from the final conv layer.)
For the remaining classes, only the thresholded (emphasized) results are shown, for 5 images from each class.
Class2
Class3
Class4
Class5
Class6
The defect positions appear to be captured almost correctly.
- For **anomaly detection**, these methods can be used to roughly visualize anomalous regions.
- It is very inconvenient that Grad-CAM, Grad-CAM++, and Score-CAM **restrict the models they can be applied to**: the coordinates of the final conv layer must correspond well to the coordinates of the input image, which makes models with many layers difficult to use.