The other day, it occurred to me: Qiita has all sorts of articles, but there don't seem to be many examples of analyzing **fMRI data** with Python or machine learning. So I searched Qiita for examples that actually analyze fMRI data.
Huh? Almost nothing...?! There are explanatory articles about tools and about the field of neuroscience, but I could not find any analysis that uses actual fMRI data.
So **this time, let's do a machine learning analysis using fMRI data**. A brain image may sound special, but in the end it is just a three-dimensional image: two-dimensional cross-sectional slices of the brain stacked on top of each other, where each pixel carries its information in a unit called a "**voxel**". A 4D brain image simply means that each 3D voxel additionally holds time-series information (recorded as the fMRI scanner repeats its scans).
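As a minimal sketch (assuming nibabel is installed and a local bold.nii.gz exists; the path is illustrative), you can confirm this 4D structure directly:
import nibabel as nib
img = nib.load("bold.nii.gz")  #hypothetical local path
print(img.shape)               #e.g. (40, 64, 64, 1452): one 3D volume per scan
data = img.get_fdata()         #the same data as a plain numpy array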
Brain images are usually published in a special format called the NIfTI format, which has to be loaded first. This time, instead of Matlab, which students and researchers are said to use a lot, I read and analyze the brain images with a Python library called **Nilearn** (so that anyone can analyze them; this is also why I stopped paying for Matlab). The machine learning analysis itself is fitted and used for prediction with the **scikit-learn** SVM, which anyone who has done machine learning has used or at least heard of.
Also, although the approach uses machine learning, this time we take fMRI data masked to a specific brain region and run an analysis that **predicts "what the subject was looking at at that moment" from the brain activity measured by fMRI**. In the brain-analysis world this is apparently called **brain information decoding**, but it is really just machine learning (sorry, people in the field). The older approach was encoding: the behavior performed in some experimental task served as the explanatory variable, and fMRI was used to detect the brain regions that responded in a statistically significant way when the task was performed. I believe the name decoding comes from taking the opposite approach (that is, instead of looking for the brain regions significantly activated by the task, we predict the task from the activity of a given brain region).
This time I analyzed the data by following the decoding tutorial of Nilearn: Machine learning for Neuro-Imaging in Python, which is also used for loading the fMRI data, and used the brain images published from **the Haxby 2001 experiment**.
An introductory tutorial to fMRI decoding
According to the tutorial above, you can easily fetch a subject's dataset from the Haxby 2001 experiment with the following code.
from nilearn import datasets
#By default, the data for the second subject is retrieved
haxby_dataset = datasets.fetch_haxby()
However, there seemed to be a problem with the NITRC database that provides the data and I could not download it this way, so I resorted to downloading the archive directly and saving it locally. You can download subj2-2010.01.14.tar.gz, which contains the data of the second subject, from the URL below, so if you want to analyze the data, please download it from there (data of any other subject number works just as well).
http://data.pymvpa.org/datasets/haxby2001/
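If you prefer to script this manual download, here is a minimal sketch using only the Python standard library (the URL is the one above; the output directory is illustrative):
import os
import tarfile
import urllib.request

url = "http://data.pymvpa.org/datasets/haxby2001/subj2-2010.01.14.tar.gz"
archive = "subj2-2010.01.14.tar.gz"
urllib.request.urlretrieve(url, archive)

#Extract next to the other nilearn files (illustrative path)
out_dir = os.path.expanduser("~/Desktop/nilearn")
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(out_dir)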
In this experiment, subjects were presented with visual stimuli of various categories (scissors, human faces, cats, and so on) inside an fMRI scanner. In this tutorial, we predict which category the subject was viewing from the fMRI brain activity recorded within the ventral temporal cortex (a region involved in representing color and shape).
First, let's load the 4D fMRI data. As mentioned above, this is a format in which the voxel data (three-dimensional data) of the brain image also contains the time series recorded while the experimental task was performed.
However, since it is 4D fMRI data, it cannot be visualized by functions that expect 3D fMRI data (without time-series information), such as nilearn.plotting.plot_epi. So, for visualization, we average the 4D fMRI data over time and convert it into a single 3D fMRI image. This can be done with mean_img, imported via from nilearn.image import mean_img.
fmri_filename = "~/Desktop/nilearn/subj2/bold.nii.gz"
from nilearn import plotting
from nilearn.image import mean_img
#Average the 4D data over time and show it in an interactive viewer
plotting.view_img(mean_img(fmri_filename), threshold=None)
result
First of all, we managed to visualize a brain image. It's great that you can visualize it in just a few lines.
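By the way, instead of averaging you can also pull out a single time point; here is a sketch using nilearn.image.index_img:
#Pick the first scan as a 3D image instead of averaging over time
from nilearn.image import index_img
first_volume = index_img(fmri_filename, 0)
plotting.plot_epi(first_volume)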
Some of you may have been relieved to see the word numpy. Here we use a function called nilearn.input_data.NiftiMasker to mask the fMRI data down to the ventral temporal cortex and convert it into numpy format.
First, let's visualize the fMRI data of the ventral temporal cortex only.
mask_filename = "~/Desktop/nilearn/subj2/mask4_vt.nii.gz"
anat_filename = "~/Desktop/nilearn/subj2/anat.nii.gz"
#Try to display the mask range against the background of the subject's anatomical image
plotting.plot_roi(mask_filename, bg_img=anat_filename,
cmap='Paired')
result
Next, let's create the mask. As a way to improve the accuracy of the machine learning, I also standardize the data. Once ready for machine learning, nilearn extracts the masked data in the form of a 2D array.
from nilearn.input_data import NiftiMasker
masker = NiftiMasker(mask_img=mask_filename, standardize=True) #Standardize each voxel's time series within the mask
fmri_masked = masker.fit_transform(fmri_filename) #Fit the mask and extract the BOLD signal as a 2D array
By the way, by executing masker.generate_report(), you can display the mask on the NIfTI image and check the various parameters.
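For reference, the report object can also be saved to a file (a sketch; save_as_html is assumed from recent nilearn versions):
report = masker.generate_report()
report.save_as_html('masker_report.html') #open the saved file in any browser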
We have now extracted the masked data matrix as fmri_masked in numpy format. Let's actually take a look at it.
# fmri_masked is a plain numpy array
print(fmri_masked)
#Its size is (number of time-series points, i.e. scans) x (number of voxels):
# shape[0] is the number of scans, shape[1] is the number of voxels
print(fmri_masked.shape)
Output result:
[[ 7.6757896e-01 2.3108697e+00 -2.0519443e-01 ... -1.0261143e+00
8.7993503e-02 2.0720518e+00]
[ 5.5640817e-01 1.6833434e+00 -2.4644937e-01 ... -7.0238107e-01
-3.4570047e-01 2.0341001e+00]
[ 7.6757896e-01 1.9186659e+00 1.0802225e-03 ... -9.9374104e-01
-2.7630943e-01 2.1479552e+00]
...
[-4.2905563e-01 -1.6896105e+00 -7.4150854e-01 ... -1.5440876e+00
1.8054217e+00 -1.6709718e-01]
[-1.4749455e-01 -1.8072717e+00 -2.4644937e-01 ... -1.7707009e+00
1.5452052e+00 7.8169477e-01]
[-2.1788482e-01 -1.4542881e+00 1.0802225e-03 ... -1.6412076e+00
1.2676411e+00 8.9554977e-01]]
(1452, 464)
As confirmed with fmri_masked.shape, 464 is the number of voxels in the masked brain region and 1452 is the number of time-series points (fMRI acquires data with a time resolution of one volume per TR, which is normally set to an interval of about 1 to 2.5 seconds; in decoding, which analyzes the information of a specific brain region, and in the method called neurofeedback, which measures brain activity in real time to train the subject, I believe considerably shorter TRs were often used).
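As rough arithmetic (treating a TR of 2.5 seconds as an assumption here), you can sanity-check how long the whole acquisition took:
n_scans = fmri_masked.shape[0] #1452 scans
tr = 2.5 #seconds per volume (assumed value)
print(n_scans * tr / 60.0) #roughly 60 minutes of scanning in total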
As a test, let's extract and plot the time series of the first three voxels as follows.
#Display the first three voxels selected from the matrix.
import matplotlib.pyplot as plt
plt.plot(fmri_masked[5:150, :3])
plt.title('Voxel Time Series')
plt.xlabel('Scan number')
plt.ylabel('Normalized signal')
plt.tight_layout()
By the way, sharp readers may wonder why that part reads fmri_masked[5:150, :3] rather than fmri_masked[:150, :3]. This is because of the **dummy scans**: the first scans at the start of an fMRI acquisition have high signal values and are often excluded from the actual analysis. The signal is high because the T1 effect has not yet stabilized (please google other sites for the details lol).
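If you wanted to drop the dummy scans from the analysis itself (the tutorial below keeps them), a hedged sketch would trim the same rows from the data and labels:
n_dummy = 5 #illustrative value
fmri_trimmed = fmri_masked[n_dummy:]
#the label table read below would need the same trimming, e.g. behavioral[n_dummy:]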
Now that we have extracted the fMRI data of the ventral temporal cortex in numpy format, let's use it for machine learning. **This time we do supervised learning, so we need labels in addition to the data.** So the next step is to extract, as labels, the category the subject was looking at during the experiment from the downloaded data.
import pandas as pd
#Read the behavioral information; there are as many labels as there are scans
behavioral = pd.read_csv('~/Desktop/nilearn/subj2/labels.txt', delimiter=' ')
Output result:
labels chunks
0 rest 0
1 rest 0
2 rest 0
3 rest 0
4 rest 0
5 rest 0
6 scissors 0
7 scissors 0
8 scissors 0
9 scissors 0
10 scissors 0
11 scissors 0
12 scissors 0
13 scissors 0
14 scissors 0
... ... ...
1426 cat 11
1427 cat 11
1428 cat 11
1429 cat 11
1430 cat 11
1431 cat 11
1432 rest 11
1433 rest 11
1434 rest 11
1435 rest 11
1436 rest 11
1437 scissors 11
1438 scissors 11
1439 scissors 11
1440 scissors 11
1441 scissors 11
1442 scissors 11
1443 scissors 11
1444 scissors 11
1445 scissors 11
1446 rest 11
1447 rest 11
1448 rest 11
1449 rest 11
1450 rest 11
1451 rest 11
1452 rows × 2 columns
Since this task is a visual recognition task as described above, labels shows the experimental conditions (the category of the object shown to the subject). chunks is the session number (if you made the subject do the task endlessly, the physical strain would be uneven, so the same kind of task is usually repeated over multiple sessions, with the subject's condition checked at the breaks between sessions).
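A quick way to see this session structure with pandas alone (a sketch):
print(behavioral['chunks'].nunique()) #number of sessions (12 here)
print(behavioral.groupby('chunks')['labels'].nunique()) #conditions per session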
conditions = behavioral['labels']
Output result:
0 rest
1 rest
2 rest
3 rest
4 rest
5 rest
6 scissors
7 scissors
8 scissors
9 scissors
10 scissors
11 scissors
12 scissors
13 scissors
14 scissors
15 rest
16 rest
17 rest
18 rest
19 rest
20 rest
21 face
22 face
...
1431 cat
1432 rest
1433 rest
1434 rest
1435 rest
1436 rest
1437 scissors
1438 scissors
1439 scissors
1440 scissors
1441 scissors
1442 scissors
1443 scissors
1444 scissors
1445 scissors
1446 rest
1447 rest
1448 rest
1449 rest
1450 rest
1451 rest
Name: labels, Length: 1452, dtype: object
This time, following the tutorial, we use only the cat and face experimental conditions. There are many other labels, so if you are interested, please try the other categories as well.
#Keep only the scans on which a face or a cat was presented to the subject
condition_mask = conditions.isin(['face', 'cat'])
fmri_masked = fmri_masked[condition_mask]
print(fmri_masked.shape)
#Apply the same condition mask to the labels
conditions = conditions[condition_mask]
print(conditions.shape)
We now apply scikit-learn's machine learning toolbox to the extracted fmri_masked data. As the decoder I use an SVM with a linear kernel. Since it is scikit-learn, everything from fitting to prediction takes just a few lines.
from sklearn.svm import SVC
svc = SVC(kernel='linear')
#Learn and make predictions
svc.fit(fmri_masked, conditions)
prediction = svc.predict(fmri_masked)
However, evaluating on the very data we trained on is completely useless as an analysis, so we perform cross-validation (obviously).
from sklearn.model_selection import KFold
import numpy as np

cv = KFold(n_splits=5)
scores = []
for train, test in cv.split(X=fmri_masked):
    svc.fit(fmri_masked[train], conditions.values[train])
    prediction = svc.predict(fmri_masked[test])
    scores.append((prediction == conditions.values[test]).sum()
                  / float(len(conditions.values[test])))

#Average predictive performance
print(np.mean(scores))
Output result:
0.7628964059196617
You can also use the cross_val_score function, which can distribute the evaluation of each subset across multiple CPUs of the machine (specify it as n_jobs=n; with -1, all CPUs available on the machine are used in parallel).
from sklearn.model_selection import cross_val_score
cv_score = cross_val_score(svc, fmri_masked, conditions)
print(cv_score)
Output result:
[0.86363636 0.76744186 0.74418605 0.69767442 0.81395349]
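The same call with the parallelization mentioned above looks like this (a sketch; the scores are identical, only the wall-clock time changes):
cv_score = cross_val_score(svc, fmri_masked, conditions, n_jobs=-1) #use all available CPUs
print(cv_score)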
**However, this is still insufficient as validation. The reason is that this performance evaluation does not take the effect of each session into account.**
fMRI data are acquired session by session, and the noise is autocorrelated within each session. When cross-validating with this in mind, predictions need to be made across sessions.
from sklearn.model_selection import LeaveOneGroupOut

#Each scan's session number (the chunks column restricted to the face/cat scans)
session_label = behavioral['chunks'][condition_mask]

cv = LeaveOneGroupOut()
cv_score = cross_val_score(svc,
                           fmri_masked,
                           conditions,
                           cv=cv,
                           groups=session_label,
                           )
#Get the prediction accuracy on the held-out session, for each session
print(cv_score)
Output result:
[0.55555556 1. 0.66666667 0.66666667 0.77777778 0.72222222
0.88888889 0.38888889 0.66666667 0.5 0.77777778 0.66666667]
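Averaging the per-session scores gives the overall leave-one-session-out accuracy (about 0.69 from the numbers above):
print(cv_score.mean())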
Finally, let's estimate and display the model weights. To do this, we first convert the weights into a NIfTI image.
coef_ = svc.coef_
#A numpy array with one coefficient (weight) per voxel
print(coef_.shape)
Output result:
(1, 464)
coef_img = masker.inverse_transform(coef_)
print(coef_img)
Output result:
<class 'nibabel.nifti1.Nifti1Image'>
data shape (40, 64, 64, 1)
affine:
[[ -3.5 0. 0. 68.25 ]
[ 0. 3.75 0. -118.125]
[ 0. 0. 3.75 -118.125]
[ 0. 0. 0. 1. ]]
metadata:
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<'
sizeof_hdr : 348
data_type : b''
db_name : b''
extents : 0
session_error : 0
regular : b''
dim_info : 0
dim : [ 4 40 64 64 1 1 1 1]
intent_p1 : 0.0
intent_p2 : 0.0
intent_p3 : 0.0
intent_code : none
datatype : float64
bitpix : 64
slice_start : 0
pixdim : [-1. 3.5 3.75 3.75 1. 1. 1. 1. ]
vox_offset : 0.0
scl_slope : nan
scl_inter : nan
slice_end : 0
slice_code : unknown
xyzt_units : 0
cal_max : 0.0
cal_min : 0.0
slice_duration : 0.0
toffset : 0.0
glmax : 0
glmin : 0
descrip : b''
aux_file : b''
qform_code : unknown
sform_code : aligned
quatern_b : 0.0
quatern_c : 1.0
quatern_d : 0.0
qoffset_x : 68.25
qoffset_y : -118.125
qoffset_z : -118.125
srow_x : [-3.5 0. 0. 68.25]
srow_y : [ 0. 3.75 0. -118.125]
srow_z : [ 0. 0. 3.75 -118.125]
intent_name : b''
magic : b'n+1'
Save the created NIfTI image in nii.gz format.
#Save in .nii.gz file format
coef_img.to_filename('haxby_svc_weights.nii.gz')
Finally, let's visualize the weight coefficients as a brain image.
#Actually plot the SVM weights
from nilearn.plotting import plot_stat_map, show
plot_stat_map(coef_img, bg_img=anat_filename,
title="SVM weights", display_mode='yx')
show()
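The interactive viewer from the beginning of the article works here as well (a sketch):
#Browse the SVM weights interactively, over the anatomical image
plotting.view_img(coef_img, bg_img=anat_filename, title="SVM weights")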
With the above, we have walked through a typical fMRI data analysis using machine learning.
I was planning to post this analysis on my personal blog, but since I could not find anyone analyzing fMRI data on Qiita, I posted it here (my personal blog is linked from my profile).
fMRI data analysis... it may sound difficult just hearing the words, but in the end, once you set aside the special file formats, it is just image analysis with more noise and one more dimension than ordinary images. One of my goals is for anyone to be able to analyze such niche data, and I keep posting through my personal blog and Qiita in the hope that these posts get you interested in analyzing data you rarely get to touch.
If you don't mind, an LGTM would be appreciated! lol
This time we analyzed raw fMRI time-series data directly, but strictly speaking the task presentation (neural activity) should be converted to hemodynamics (the BOLD response) by convolving it with what is called the HRF function. Furthermore, to obtain a trial-by-trial activity map (beta map), a first-level fit with a general linear model is needed, which makes it possible to predict between related conditions on a per-trial basis; a rough sketch of that step follows below. Real fMRI data analysis requires many such processing steps, but since this is an introduction aimed at applying machine learning to fMRI data, I decided to skip them (honestly, covering all those steps would not fit in one article (sweat)).
Please keep that in mind (I would like to do a serious analysis if the opportunity arises).
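For the curious, here is a rough sketch of that first-level GLM step using nilearn's FirstLevelModel (API assumed from nilearn >= 0.7; the events table, the TR of 2.5 s, and the HRF model are illustrative assumptions, not the actual Haxby design):
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

#Hypothetical events table: onsets/durations in seconds, one row per block
events = pd.DataFrame({
    'onset': [0.0, 30.0, 60.0],
    'duration': [22.5, 22.5, 22.5],
    'trial_type': ['face', 'cat', 'scissors'],
})

#The HRF convolution happens inside the model (TR assumed to be 2.5 s)
glm = FirstLevelModel(t_r=2.5, hrf_model='spm')
glm = glm.fit(fmri_filename, events=events)

#A beta (effect size) map for the face condition
beta_map = glm.compute_contrast('face', output_type='effect_size')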