Finding and Deleting Missing Values (Kaggle Memorandum)

Purpose

A memo on how to delete columns that contain missing values, and on how to impute missing values instead.

Environment / usage data

Data used: Kaggle Courses: Rent data for Intermediate Machine Learning, Missing Values

Environment: Kaggle notebook

What I did

Importing the modules and reading the data

`DropColumn.py`


# Import os and pandas
import os
import pandas as pd

# Read the data, using the 'Id' column as the index
X_full = pd.read_csv('../input/train.csv', index_col='Id')

X_full has the following columns:

`DropColumn.py`


X_full.columns

Among them, the columns that contain missing values are:

`DropColumn.py`


cols_missing=[col for col in X_full.columns
            if X_full[col].isnull().any()]
cols_missing
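
As a rough illustration of what the list comprehension above does, here is a minimal sketch on a small made-up DataFrame. The name `toy` and its values are hypothetical and are not taken from the Kaggle data. It also counts the missing values per column with `isnull().sum()`, which can help when deciding whether a column is worth keeping.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Kaggle dataset
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, np.nan, 68.0],
    "GarageYrBlt": [2003.0, 1976.0, np.nan],
})

# Same check as in the memo: keep the columns with at least one missing value
toy_cols_missing = [col for col in toy.columns if toy[col].isnull().any()]

# Number of missing values in each of those columns
toy_missing_counts = toy[toy_cols_missing].isnull().sum()

print(toy_cols_missing)
print(toy_missing_counts)
```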

Delete all of the columns in `cols_missing` at once.

`DropColumn.py`


reduced_X_full=X_full.drop(cols_missing,axis=1)
reduced_X_full
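
The drop step can be tried out on a small example. The sketch below uses a hypothetical DataFrame named `toy` (not the Kaggle data). It shows that `drop` returns a new DataFrame and leaves the original untouched, and that `dropna(axis=1)` should give the same result in a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Kaggle dataset
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, np.nan, 68.0],
})

toy_cols_missing = [col for col in toy.columns if toy[col].isnull().any()]

# drop returns a new DataFrame; toy itself is left unchanged
reduced_toy = toy.drop(toy_cols_missing, axis=1)

# dropna(axis=1) removes every column that contains a missing value
reduced_toy_alt = toy.dropna(axis=1)

print(reduced_toy.columns.tolist())
print(reduced_toy.equals(reduced_toy_alt))
```

Dropping is simple, but it also discards the non-missing values in those columns, which is why the next section looks at imputation.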

Deletion completed.

2. Imputing missing values

How to use scikit-learn's SimpleImputer

SimpleImputer fills in missing values with a statistic of each column, such as the median or the mean.

For example, to impute with the median, specify `imputer = SimpleImputer(strategy='median')`.

`ImputeValue.py`


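# Define the imputer
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='median')

# Impute the missing values in X_full
# Note: strategy='median' works only on numeric data, so this assumes
# X_full contains no string (object) columns
imputed_X_full = pd.DataFrame(imputer.fit_transform(X_full))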
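
To make the median imputation concrete, here is a small sketch on hypothetical numeric data (the name `toy` and its values are made up). It shows that `fit_transform` returns a NumPy array by default, and that the learned per-column medians are available in the `statistics_` attribute.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with numeric columns only
toy = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0],
    "LotFrontage": [65.0, np.nan, 68.0],
})

toy_imputer = SimpleImputer(strategy="median")
toy_values = toy_imputer.fit_transform(toy)

print(type(toy_values))          # a NumPy array, so the column names are gone
print(toy_imputer.statistics_)   # per-column medians learned during fit
print(toy_values[1, 1])          # the missing LotFrontage value, now filled in
```

Here the missing `LotFrontage` entry is replaced by the median of the two observed values, 65.0 and 68.0, that is 66.5.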

As it stands, the column names of `imputed_X_full` are just sequential integers.

`ImputeValue.py`


imputed_X_full.columns

Restore the column names

`ImputeValue.py`


imputed_X_full.columns=X_full.columns
imputed_X_full.columns
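
As an alternative, the column names can be passed when the DataFrame is built, together with the original index, which is otherwise replaced by a default integer index. The sketch below uses hypothetical data (`toy`, with an `Id` index) and assumes that no column is entirely missing, since SimpleImputer drops such columns by default and the column count would then no longer match.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with an 'Id' index, numeric columns only
toy = pd.DataFrame(
    {
        "LotArea": [8450.0, 9600.0, 11250.0],
        "LotFrontage": [65.0, np.nan, 68.0],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

toy_imputer = SimpleImputer(strategy="median")

# Pass columns and index directly so neither has to be restored afterwards
imputed_toy = pd.DataFrame(
    toy_imputer.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)

print(imputed_toy)
```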

Imputation completed.