The other day, it occurred to me: Qiita has all sorts of articles, but there don't seem to be many examples of analyzing **fMRI data** with Python or machine learning. So I searched Qiita for examples that actually analyze fMRI data.
Huh? Almost nothing...?! There are explanatory articles about tools and about the field of neuroscience, but I could not find any analysis that uses actual fMRI data.
So **this time, let's do a machine learning analysis using fMRI data**. A brain image may sound special, but in the end it is just a three-dimensional image: two-dimensional cross-sectional slices of the brain stacked on top of each other, where each pixel carries its information in a unit called a "**voxel**". A 4D brain image simply means that each 3D voxel additionally holds time-series information (recorded as the fMRI scanner repeats its scans).
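As a minimal sketch (assuming nibabel is installed and a local bold.nii.gz exists; the path is illustrative), you can confirm this 4D structure directly:
import nibabel as nib
img = nib.load("bold.nii.gz")  #hypothetical local path
print(img.shape)               #e.g. (40, 64, 64, 1452): one 3D volume per scan
data = img.get_fdata()         #the same data as a plain numpy array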
Brain images are usually published in a special format called the NIfTI format, which has to be loaded first. This time, instead of Matlab, which students and researchers are said to use a lot, I read and analyze the brain images with a Python library called **Nilearn** (so that anyone can analyze them; this is also why I stopped paying for Matlab). The machine learning analysis itself is fitted and used for prediction with the **scikit-learn** SVM, which anyone who has done machine learning has used or at least heard of.
Also, although the approach uses machine learning, this time we take fMRI data masked to a specific brain region and run an analysis that **predicts "what the subject was looking at at that moment" from the brain activity measured by fMRI**. In the brain-analysis world this is apparently called **brain information decoding**, but it is really just machine learning (sorry, people in the field). The older approach was encoding: the behavior performed in some experimental task served as the explanatory variable, and fMRI was used to detect the brain regions that responded in a statistically significant way when the task was performed. I believe the name decoding comes from taking the opposite approach (that is, instead of looking for the brain regions significantly activated by the task, we predict the task from the activity of a given brain region).
This time I analyzed the data by following the decoding tutorial of Nilearn: Machine learning for Neuro-Imaging in Python, which is also used for loading the fMRI data, and used the brain images published from **the Haxby 2001 experiment**.
An introductory tutorial to fMRI decoding
According to the tutorial above, you can easily fetch a subject's dataset from the Haxby 2001 experiment with the following code.
from nilearn import datasets
#By default, the data for the second subject is retrieved
haxby_dataset = datasets.fetch_haxby()
However, there seemed to be a problem with the NITRC database that provides the data and I could not download it this way, so I resorted to downloading the archive directly and saving it locally. You can download subj2-2010.01.14.tar.gz, which contains the data of the second subject, from the URL below, so if you want to analyze the data, please download it from there (data of any other subject number works just as well).
http://data.pymvpa.org/datasets/haxby2001/
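If you prefer to script this manual download, here is a minimal sketch using only the Python standard library (the URL is the one above; the output directory is illustrative):
import os
import tarfile
import urllib.request

url = "http://data.pymvpa.org/datasets/haxby2001/subj2-2010.01.14.tar.gz"
archive = "subj2-2010.01.14.tar.gz"
urllib.request.urlretrieve(url, archive)

#Extract next to the other nilearn files (illustrative path)
out_dir = os.path.expanduser("~/Desktop/nilearn")
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(out_dir)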
In this experiment, subjects were presented with visual stimuli of various categories (scissors, human faces, cats, and so on) inside an fMRI scanner. In this tutorial, we predict which category the subject was viewing from the fMRI brain activity recorded within the ventral temporal cortex (a region involved in representing color and shape).
First, let's load the 4D fMRI data. As mentioned above, this is a format in which the voxel data (three-dimensional data) of the brain image also contains the time series recorded while the experimental task was performed.
However, since it is 4D fMRI data, it cannot be visualized by functions that expect 3D fMRI data (without time-series information), such as nilearn.plotting.plot_epi. So, for visualization, we average the 4D fMRI data over time and convert it into a single 3D fMRI image. This can be done with mean_img, imported via from nilearn.image import mean_img.
fmri_filename = "~/Desktop/nilearn/subj2/bold.nii.gz"
from nilearn import plotting
from nilearn.image import mean_img
#Average the 4D data over time and show it in an interactive viewer
plotting.view_img(mean_img(fmri_filename), threshold=None)
result
First of all, we managed to visualize a brain image. It's great that you can visualize it in just a few lines.
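By the way, instead of averaging you can also pull out a single time point; here is a sketch using nilearn.image.index_img:
#Pick the first scan as a 3D image instead of averaging over time
from nilearn.image import index_img
first_volume = index_img(fmri_filename, 0)
plotting.plot_epi(first_volume)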
Some of you may have been relieved to see the word numpy. Here we use a function called nilearn.input_data.NiftiMasker to mask the fMRI data down to the ventral temporal cortex and convert it into numpy format.
First, let's visualize the fMRI data of the ventral temporal cortex only.
mask_filename = "~/Desktop/nilearn/subj2/mask4_vt.nii.gz"
anat_filename = "~/Desktop/nilearn/subj2/anat.nii.gz"
#Try to display the mask range against the background of the subject's anatomical image
plotting.plot_roi(mask_filename, bg_img=anat_filename,
cmap='Paired')
result
Next, let's create the mask. As a way to improve the accuracy of the machine learning, I also standardize the data. Once ready for machine learning, nilearn extracts the masked data in the form of a 2D array.
from nilearn.input_data import NiftiMasker
masker = NiftiMasker(mask_img=mask_filename, standardize=True) #Standardize each voxel's time series within the mask
fmri_masked = masker.fit_transform(fmri_filename) #Fit the mask and extract the BOLD signal as a 2D array
By the way, by executing masker.generate_report(), you can display the mask on the NIfTI image and check the various parameters.
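For reference, the report object can also be saved to a file (a sketch; save_as_html is assumed from recent nilearn versions):
report = masker.generate_report()
report.save_as_html('masker_report.html') #open the saved file in any browser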
We have now extracted the masked data matrix as fmri_masked in numpy format. Let's actually take a look at it.
# fmri_masked is a plain numpy array
print(fmri_masked)
#Its size is (number of time-series points, i.e. scans) x (number of voxels):
# shape[0] is the number of scans, shape[1] is the number of voxels
print(fmri_masked.shape)
Output result:
[[ 7.6757896e-01 2.3108697e+00 -2.0519443e-01 ... -1.0261143e+00
8.7993503e-02 2.0720518e+00]
[ 5.5640817e-01 1.6833434e+00 -2.4644937e-01 ... -7.0238107e-01
-3.4570047e-01 2.0341001e+00]
[ 7.6757896e-01 1.9186659e+00 1.0802225e-03 ... -9.9374104e-01
-2.7630943e-01 2.1479552e+00]
...
[-4.2905563e-01 -1.6896105e+00 -7.4150854e-01 ... -1.5440876e+00
1.8054217e+00 -1.6709718e-01]
[-1.4749455e-01 -1.8072717e+00 -2.4644937e-01 ... -1.7707009e+00
1.5452052e+00 7.8169477e-01]
[-2.1788482e-01 -1.4542881e+00 1.0802225e-03 ... -1.6412076e+00
1.2676411e+00 8.9554977e-01]]
(1452, 464)
As confirmed with fmri_masked.shape, 464 is the number of voxels in the masked brain region and 1452 is the number of time-series points (fMRI acquires data with a time resolution of one volume per TR, which is normally set to an interval of about 1 to 2.5 seconds; in decoding, which analyzes the information of a specific brain region, and in the method called neurofeedback, which measures brain activity in real time to train the subject, I believe considerably shorter TRs were often used).
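As rough arithmetic (treating a TR of 2.5 seconds as an assumption here), you can sanity-check how long the whole acquisition took:
n_scans = fmri_masked.shape[0] #1452 scans
tr = 2.5 #seconds per volume (assumed value)
print(n_scans * tr / 60.0) #roughly 60 minutes of scanning in total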
As a test, let's extract and plot the time series of the first three voxels as follows.
#Display the first three voxels selected from the matrix.
import matplotlib.pyplot as plt
plt.plot(fmri_masked[5:150, :3])
plt.title('Voxel Time Series')
plt.xlabel('Scan number')
plt.ylabel('Normalized signal')
plt.tight_layout()
By the way, sharp readers may wonder why that part reads fmri_masked[5:150, :3] rather than fmri_masked[:150, :3]. This is because of the **dummy scans**: the first scans at the start of an fMRI acquisition have high signal values and are often excluded from the actual analysis. The signal is high because the T1 effect has not yet stabilized (please google other sites for the details lol).
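If you wanted to drop the dummy scans from the analysis itself (the tutorial below keeps them), a hedged sketch would trim the same rows from the data and labels:
n_dummy = 5 #illustrative value
fmri_trimmed = fmri_masked[n_dummy:]
#the label table read below would need the same trimming, e.g. behavioral[n_dummy:]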
Now that we have extracted the fMRI data of the ventral temporal cortex in numpy format, let's use it for machine learning. **This time we do supervised learning, so we need labels in addition to the data.** So the next step is to extract, as labels, the category the subject was looking at during the experiment from the downloaded data.
import pandas as pd
#Read the behavioral information; there are as many labels as there are scans
behavioral = pd.read_csv('~/Desktop/nilearn/subj2/labels.txt', delimiter=' ')
Output result:
labels chunks
0 rest 0
1 rest 0
2 rest 0
3 rest 0
4 rest 0
5 rest 0
6 scissors 0
7 scissors 0
8 scissors 0
9 scissors 0
10 scissors 0
11 scissors 0
12 scissors 0
13 scissors 0
14 scissors 0
... ... ...
1426 cat 11
1427 cat 11
1428 cat 11
1429 cat 11
1430 cat 11
1431 cat 11
1432 rest 11
1433 rest 11
1434 rest 11
1435 rest 11
1436 rest 11
1437 scissors 11
1438 scissors 11
1439 scissors 11
1440 scissors 11
1441 scissors 11
1442 scissors 11
1443 scissors 11
1444 scissors 11
1445 scissors 11
1446 rest 11
1447 rest 11
1448 rest 11
1449 rest 11
1450 rest 11
1451 rest 11
1452 rows × 2 columns
Since this task is a visual recognition task as described above, labels shows the experimental conditions (the category of the object shown to the subject). chunks is the session number (if you made the subject do the task endlessly, the physical strain would be uneven, so the same kind of task is usually repeated over multiple sessions, with the subject's condition checked at the breaks between sessions).
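A quick way to see this session structure with pandas alone (a sketch):
print(behavioral['chunks'].nunique()) #number of sessions (12 here)
print(behavioral.groupby('chunks')['labels'].nunique()) #conditions per session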
conditions = behavioral['labels']
Output result:
0 rest
1 rest
2 rest
3 rest
4 rest
5 rest
6 scissors
7 scissors
8 scissors
9 scissors
10 scissors
11 scissors
12 scissors
13 scissors
14 scissors
15 rest
16 rest
17 rest
18 rest
19 rest
20 rest
21 face
22 face
...
1431 cat
1432 rest
1433 rest
1434 rest
1435 rest
1436 rest
1437 scissors
1438 scissors
1439 scissors
1440 scissors
1441 scissors
1442 scissors
1443 scissors
1444 scissors
1445 scissors
1446 rest
1447 rest
1448 rest
1449 rest
1450 rest
1451 rest
Name: labels, Length: 1452, dtype: object
This time, following the tutorial, we use only the cat and face experimental conditions. There are many other labels, so if you are interested, please try the other categories as well.
#Keep only the scans on which a face or a cat was presented to the subject
condition_mask = conditions.isin(['face', 'cat'])
fmri_masked = fmri_masked[condition_mask]
print(fmri_masked.shape)
#Apply the same condition mask to the labels
conditions = conditions[condition_mask]
print(conditions.shape)
We now apply scikit-learn's machine learning toolbox to the extracted fmri_masked data. As the decoder I use an SVM with a linear kernel. Since it is scikit-learn, everything from fitting to prediction takes just a few lines.
from sklearn.svm import SVC
svc = SVC(kernel='linear')
#Learn and make predictions
svc.fit(fmri_masked, conditions)
prediction = svc.predict(fmri_masked)
However, evaluating on the very data we trained on is completely useless as an analysis, so we perform cross-validation (obviously).
from sklearn.model_selection import KFold
import numpy as np

cv = KFold(n_splits=5)
scores = []
for train, test in cv.split(X=fmri_masked):
    svc.fit(fmri_masked[train], conditions.values[train])
    prediction = svc.predict(fmri_masked[test])
    scores.append((prediction == conditions.values[test]).sum()
                  / float(len(conditions.values[test])))

#Average predictive performance
print(np.mean(scores))
Output result:
0.7628964059196617
You can also use the cross_val_score function, which can distribute the evaluation of each subset across multiple CPUs of the machine (specify it as n_jobs=n; with -1, all CPUs available on the machine are used in parallel).
from sklearn.model_selection import cross_val_score
cv_score = cross_val_score(svc, fmri_masked, conditions)
print(cv_score)
Output result:
[0.86363636 0.76744186 0.74418605 0.69767442 0.81395349]
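The same call with the parallelization mentioned above looks like this (a sketch; the scores are identical, only the wall-clock time changes):
cv_score = cross_val_score(svc, fmri_masked, conditions, n_jobs=-1) #use all available CPUs
print(cv_score)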
**However, this is still insufficient as validation. The reason is that this performance evaluation does not take the effect of each session into account.**
fMRI data are acquired session by session, and the noise is autocorrelated within each session. When cross-validating with this in mind, predictions need to be made across sessions.
from sklearn.model_selection import LeaveOneGroupOut

#Each scan's session number (the chunks column restricted to the face/cat scans)
session_label = behavioral['chunks'][condition_mask]

cv = LeaveOneGroupOut()
cv_score = cross_val_score(svc,
                           fmri_masked,
                           conditions,
                           cv=cv,
                           groups=session_label,
                           )
#Get the prediction accuracy on the held-out session, for each session
print(cv_score)
Output result:
[0.55555556 1. 0.66666667 0.66666667 0.77777778 0.72222222
0.88888889 0.38888889 0.66666667 0.5 0.77777778 0.66666667]
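Averaging the per-session scores gives the overall leave-one-session-out accuracy (about 0.69 from the numbers above):
print(cv_score.mean())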
Finally, let's estimate and display the model weights. To do this, we first convert the weights into a NIfTI image.
coef_ = svc.coef_
#A numpy array with one coefficient (weight) per voxel
print(coef_.shape)
Output result:
(1, 464)
coef_img = masker.inverse_transform(coef_)
print(coef_img)
Output result:
<class 'nibabel.nifti1.Nifti1Image'>
data shape (40, 64, 64, 1)
affine:
[[ -3.5 0. 0. 68.25 ]
[ 0. 3.75 0. -118.125]
[ 0. 0. 3.75 -118.125]
[ 0. 0. 0. 1. ]]
metadata:
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<'
sizeof_hdr : 348
data_type : b''
db_name : b''
extents : 0
session_error : 0
regular : b''
dim_info : 0
dim : [ 4 40 64 64 1 1 1 1]
intent_p1 : 0.0
intent_p2 : 0.0
intent_p3 : 0.0
intent_code : none
datatype : float64
bitpix : 64
slice_start : 0
pixdim : [-1. 3.5 3.75 3.75 1. 1. 1. 1. ]
vox_offset : 0.0
scl_slope : nan
scl_inter : nan
slice_end : 0
slice_code : unknown
xyzt_units : 0
cal_max : 0.0
cal_min : 0.0
slice_duration : 0.0
toffset : 0.0
glmax : 0
glmin : 0
descrip : b''
aux_file : b''
qform_code : unknown
sform_code : aligned
quatern_b : 0.0
quatern_c : 1.0
quatern_d : 0.0
qoffset_x : 68.25
qoffset_y : -118.125
qoffset_z : -118.125
srow_x : [-3.5 0. 0. 68.25]
srow_y : [ 0. 3.75 0. -118.125]
srow_z : [ 0. 0. 3.75 -118.125]
intent_name : b''
magic : b'n+1'
Save the created NIfTI image in nii.gz format.
#Save in .nii.gz file format
coef_img.to_filename('haxby_svc_weights.nii.gz')
Finally, let's visualize the weight coefficients as a brain image.
#Actually plot the SVM weights
from nilearn.plotting import plot_stat_map, show
plot_stat_map(coef_img, bg_img=anat_filename,
title="SVM weights", display_mode='yx')
show()
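The interactive viewer from the beginning of the article works here as well (a sketch):
#Browse the SVM weights interactively, over the anatomical image
plotting.view_img(coef_img, bg_img=anat_filename, title="SVM weights")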
With the above, we have walked through a typical fMRI data analysis using machine learning.
I was planning to post this analysis on my personal blog, but since I could not find anyone analyzing fMRI data on Qiita, I posted it here (my personal blog is linked from my profile).
fMRI data analysis... it may sound difficult just hearing the words, but in the end, once you set aside the special file formats, it is just image analysis with more noise and one more dimension than ordinary images. One of my goals is for anyone to be able to analyze such niche data, and I keep posting through my personal blog and Qiita in the hope that these posts get you interested in analyzing data you rarely get to touch.
If you don't mind, an LGTM would be appreciated! lol
This time we analyzed raw fMRI time-series data directly, but strictly speaking the task presentation (neural activity) should be converted to hemodynamics (the BOLD response) by convolving it with what is called the HRF function. Furthermore, to obtain a trial-by-trial activity map (beta map), a first-level fit with a general linear model is needed, which makes it possible to predict between related conditions on a per-trial basis; a rough sketch of that step follows below. Real fMRI data analysis requires many such processing steps, but since this is an introduction aimed at applying machine learning to fMRI data, I decided to skip them (honestly, covering all those steps would not fit in one article (sweat)).
Please keep that in mind (I would like to do a serious analysis if the opportunity arises).
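For the curious, here is a rough sketch of that first-level GLM step using nilearn's FirstLevelModel (API assumed from nilearn >= 0.7; the events table, the TR of 2.5 s, and the HRF model are illustrative assumptions, not the actual Haxby design):
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

#Hypothetical events table: onsets/durations in seconds, one row per block
events = pd.DataFrame({
    'onset': [0.0, 30.0, 60.0],
    'duration': [22.5, 22.5, 22.5],
    'trial_type': ['face', 'cat', 'scissors'],
})

#The HRF convolution happens inside the model (TR assumed to be 2.5 s)
glm = FirstLevelModel(t_r=2.5, hrf_model='spm')
glm = glm.fit(fmri_filename, events=events)

#A beta (effect size) map for the face condition
beta_map = glm.compute_contrast('face', output_type='effect_size')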