[Survey] Kaggle - Data Science Bowl 2017, 2nd place solution

This is a survey of the 2nd place solution to Data Science Bowl 2017 [^1] (hereinafter DSB2017), Kaggle's lung cancer detection competition.

Names: Julian & Daniel
Title: Very quick 1st summary of julian's part of 2nd place solution.
URL: https://www.kaggle.com/c/data-science-bowl-2017/discussion/31551

What is DSB2017

- Lung cancer detection contest
- Only the presence or absence of cancer has to be predicted for each case (the location does not have to be specified)

The detection flow is as follows. A whitish mass called a lung nodule [^7] appears on the CT image, and if it is malignant, the patient may have lung cancer. Therefore, for a computer to detect lung cancer, it needs to:

1. Detect nodule shadows
2. Determine whether the patient has lung cancer

Many methods seem to take this two-stage approach.

External dataset

- LUNA16^[1]
- LIDC-IDRI^[2]

LUNA16

- Lung nodule dataset (whether a nodule is malignant is not recorded)
- 888 CT scans
- Lung nodules are labeled in 3 categories: "None", "3 mm or less", and "3 mm or more"
- Excludes CT scans with a slice thickness greater than 2.5 mm
- A challenge on automatic lung nodule detection using the large LIDC-IDRI dataset
- Includes the diagnoses of 4 radiologists

LIDC-IDRI

- Lung image dataset
- 1018 CT scans
- Lung nodules are labeled in 3 categories: "None", "3 mm or less", and "3 mm or more"
- Also includes malignancy information

Julian

Code: https://github.com/juliandewit/kaggle_ndsb2017/
Paper: http://juliandewit.github.io/kaggle-ndsb2017/

Environment

- GTX-980
- Keras
- Windows (64bit)

Approach

- Use the malignancy labels assigned by the LIDC doctors
- Use the negative labels of LUNA16 (normal patients). The positive data cannot be used for this contest, because LUNA16 only has lung nodule information and does not record whether a nodule is malignant
- Resample every scan so that 1 voxel is 1 mm isotropic
- Train on the original images rather than on data cropped to the lung field, so that lung nodules at the boundary are not overlooked
- Scale coordinates so that they fall within 0 to 1
- Check in a self-made viewer whether the positive and negative examples are really correct
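
The resampling and coordinate scaling steps above can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the `(z, y, x)` axis order, and the use of nearest-neighbour sampling are all assumptions made to keep the example small, and the interpolation actually used in the solution is not stated here.

```python
import numpy as np

def resample_isotropic(volume, spacing_mm, target_mm=1.0):
    """Nearest-neighbour resample of a (z, y, x) volume to target_mm voxels.

    spacing_mm is the original voxel size per axis (hypothetical input).
    """
    spacing = np.asarray(spacing_mm, dtype=float)
    old_shape = np.asarray(volume.shape)
    # The physical extent stays the same; only the voxel count changes.
    new_shape = np.maximum(np.round(old_shape * spacing / target_mm).astype(int), 1)
    # For each output index, pick the source index covering that position.
    idx = [
        np.minimum((np.arange(n) * target_mm / s).astype(int), o - 1)
        for n, s, o in zip(new_shape, spacing, old_shape)
    ]
    return volume[np.ix_(*idx)]

def normalize_coord(voxel_zyx, shape_zyx):
    """Scale a voxel coordinate so that every axis falls within 0 to 1."""
    return tuple(c / s for c, s in zip(voxel_zyx, shape_zyx))
```

For a scan with 2.5 mm slices and 0.5 mm pixels, a `(10, 20, 20)` array becomes `(25, 10, 10)` after resampling, and the voxel `(5, 5, 5)` in the result maps to `(0.2, 0.5, 0.5)`.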

Label

- Every region that at least one doctor judged to be positive is treated as positive (LUNA16 provides the judgments of 4 radiologists)
- In LUNA16, lung nodules larger than 3 cm were not labeled (segmentation?), so data with nodules of 3 cm or larger were excluded
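
The two labeling rules can be written down as a small sketch. The data layout is hypothetical: each finding is assumed to carry one boolean vote per radiologist, and each scan a list of nodule diameters in millimetres.

```python
MAX_NODULE_MM = 30.0  # 3 cm, the size limit mentioned in the text

def is_positive(radiologist_votes):
    """A finding is positive if at least one radiologist marked it positive."""
    return any(radiologist_votes)

def keep_scan(nodule_diameters_mm, limit_mm=MAX_NODULE_MM):
    """Drop a scan when it contains any nodule of 3 cm or larger."""
    return all(d < limit_mm for d in nodule_diameters_mm)
```

With four votes `[False, True, False, False]` the finding counts as positive, while a scan containing a 31.5 mm nodule would be excluded.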

Model

- A 32x32x32 3D ConvNet detects lung nodules and estimates whether they are malignant at the same time (Daniel split this into two stages)
- VGG-like C3D [^4] architecture
- No time was spent optimizing the network parameters
- Positives are upsampled so that positive:negative = 1:20 (originally 1:200)
- 3 zoom levels (1.0, 1.5, and 2.0 mm/voxel) are used so that large nodules can be detected
- LUNA16 data used:
  - Candidate points: 400,000
  - Non-lung tissue: 100,000
  - False positives: 10,000 (lung nodules, but not malignant)
  - LIDC positives: 2,000
- Because NDSB does not give the location of malignant tumors, the author marked the locations by hand
- LUNA16 did not cover lung nodules larger than 3 cm, so they were excluded from the dataset to avoid confusing the network
- Malignancy is predicted on crops taken from the CT volume at 12 mm intervals
  - In other words, a 240x240x240 CT volume yields 20x20x20 = 8,000 predictions
- Only the maximum malignancy and its z coordinate are used as features for the final prediction
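
The arithmetic behind the prediction grid, the reduction to a single feature pair, and the upsampling ratio can be checked with a short sketch. All function names are hypothetical. It assumes 1 mm voxels, one prediction per 12 mm step along each axis, and a z coordinate expressed as a fraction of the volume depth; the scores would come from the ConvNet and are plain numbers here.

```python
from itertools import product

STRIDE_MM = 12  # step between predictions, from the text (1 mm voxels assumed)

def grid_positions(shape_zyx, stride=STRIDE_MM):
    """Return the (z, y, x) start of every prediction step in the volume."""
    axes = [range(0, size, stride) for size in shape_zyx]
    return list(product(*axes))

def max_malignancy_feature(positions, scores, depth):
    """Keep only the highest score and the relative z position where it occurred."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], positions[best][0] / depth

def repeats_for_ratio(original_neg_per_pos=200, target_neg_per_pos=20):
    """How many copies of each positive turn a 1:200 ratio into 1:20."""
    return original_neg_per_pos // target_neg_per_pos
```

`len(grid_positions((240, 240, 240)))` gives 8000, matching the 20x20x20 figure, and `repeats_for_ratio()` gives 10, meaning each positive sample appears ten times in the upsampled set.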

Strange tissue detection

- Apart from the judgment by the 3D ConvNet, the author also tried to detect strange tissue
- There were about 10 cases of strange tissue in the training data, and half of them were malignant tumors
- It is unknown how "strange tissue" is defined

Final decision

- XGBoost^[3]
  - Maximum malignancy at each of the 3 zoom levels
  - z coordinate
  - min_child_weight: 60 (to prevent overfitting)
- The final submission is the average with Daniel's solution
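
How these pieces fit together can be sketched as below. The helper names and the dictionary layout are hypothetical, and only `min_child_weight` is taken from the text; every other XGBoost setting is left unspecified. The parameter dictionary is the kind of thing that could be passed to `xgboost.XGBClassifier(**XGB_PARAMS)`, but the model itself is omitted so the sketch has no dependencies.

```python
ZOOM_LEVELS_MM = (1.0, 1.5, 2.0)  # mm per voxel, from the text

# Only min_child_weight comes from the text; other settings are not stated.
XGB_PARAMS = {"min_child_weight": 60}

def build_features(max_malignancy_by_zoom, z_coord):
    """One row per patient: max malignancy at each zoom level, then z."""
    return [max_malignancy_by_zoom[zoom] for zoom in ZOOM_LEVELS_MM] + [z_coord]

def average_predictions(p_julian, p_daniel):
    """Blend two per-patient probability lists with equal weight."""
    return [(a + b) / 2 for a, b in zip(p_julian, p_daniel)]
```

A large `min_child_weight` stops the trees from creating leaves that cover only a few patients, which fits the stated goal of preventing overfitting on a small number of features.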

Daniel

Code: https://github.com/dhammack/DSB2017/
Paper: https://github.com/dhammack/DSB2017/blob/master/dsb_2017_daniel_hammack.pdf

Approach

- Uses a 64x64x64 ResNet-like 3D ConvNet
- Uses the LIDC malignancy and nodule information
- Uses U-Net [^6] to extract suspicious areas

References

1. LUNA16, 2016.
2. LIDC-IDRI, 2016.
3. dmlc, XGBoost.