[Hands-on for beginners] Read kaggle's "Predicting Home Prices" line by line (Part 3: Preparation for missing value complementation)

Theme

This is the third installment of a project to keep notes on a hands-on exercise in which everyone takes on Kaggle's famous "House Prices" problem. It is more of a memo than a commentary, but I hope it helps someone somewhere.

Today's work

Filling in missing values (preparation only)

To start with the conclusion: there were quite a few missing values. On closer inspection, however, many of them are not really gaps in the data. In many cases the fact that "there is no value" is itself meaningful.

Understanding and handling the missing values

The following is an excerpt from a reference article.

When you download the data from Kaggle, you will notice that it also contains a file called "data_description.txt". This file describes in detail what each variable stores. From it we learn that most missing values do not mean that information is absent; the missing value itself is information. For example, take PoolQC (pool quality), the variable with the most missing values. A missing PoolQC means that the house has no pool, so the gap itself tells us something. The same holds for the other categorical variables: a missing value simply means that the facility or equipment does not exist. For numeric variables, a missing value only means that the corresponding area is zero, which is again information. Therefore, missing values are filled in differently for categorical variables and for numeric variables.
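The idea in the excerpt can be sketched as follows. This is only a minimal illustration on a made-up toy table, not the code from the reference article: the values, the "NA" label for categorical gaps, and the use of 0 for numeric gaps are assumptions chosen to match the description above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Kaggle data (all values are made up for illustration).
toy = pd.DataFrame({
    "PoolQC": ["Ex", np.nan, np.nan],       # categorical: NaN means "no pool"
    "GarageArea": [480.0, np.nan, 300.0],   # numeric: NaN is read as "area is zero"
})

# Assumption: categorical gaps become an explicit "NA" label,
# and numeric gaps become 0.
filled = toy.copy()
filled["PoolQC"] = filled["PoolQC"].fillna("NA")
filled["GarageArea"] = filled["GarageArea"].fillna(0.0)

print(filled)
```

After this step the table has no missing values left, and "no pool" has become an ordinary category that a model can use.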

Categorical variables

Apparently, these are items whose stored value is a code that stands for a category rather than a quantity, for example 1: Male, 2: Female. See https://www1.doshisha.ac.jp/~mjin/R/Chap_45/45.html

Numeric variable

Conversely, this appears to be data that simply represents a quantity, the opposite of a categorical variable.
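As a rough sketch of how the two kinds of variables can be told apart in pandas, the snippet below splits the columns of a made-up table by dtype. The column names and values are hypothetical; `is_numeric_dtype` is an existing helper in `pandas.api.types`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy table: one categorical column, one numeric column.
toy = pd.DataFrame({
    "Sex": ["Male", "Female", "Male"],
    "LotArea": [8450.0, 9600.0, 11250.0],
})

# Split the column names by whether pandas stores them as numbers.
numeric_cols = [c for c in toy.columns if is_numeric_dtype(toy[c])]
categorical_cols = [c for c in toy.columns if not is_numeric_dtype(toy[c])]

print(numeric_cols)
print(categorical_cols)
```

Note that a dtype check is only a first guess. A categorical variable stored as codes such as 1 and 2 would be classified as numeric here, so the data description still has to be read.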

Filling missing values for categorical variables

First, in preparation for filling, extract the column names (the index values) for each data type. (I am trying to arrange this so that the meaning becomes clear once it is finished, even if I do not fully understand what I am saying right now.)

# Names of the columns with missing values whose dtype is float64
na_float_cols = alldata[na_col_list].dtypes[alldata[na_col_list].dtypes=='float64'].index.tolist()
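Because the one-liner packs several steps together, here it is unpacked on a small made-up table. This is a sketch under assumptions: the contents of `alldata` and the way `na_col_list` is built are invented here for illustration, and `na_other_cols` is a hypothetical name that does not appear in the original.

```python
import numpy as np
import pandas as pd

# Toy stand-in for alldata (values are made up).
alldata = pd.DataFrame({
    "PoolQC": ["Ex", np.nan, np.nan],
    "GarageArea": [480.0, np.nan, 300.0],
    "LotArea": [8450.0, 9600.0, 11250.0],
})

# Assumption: na_col_list holds the columns with at least one missing value.
na_counts = alldata.isnull().sum()
na_col_list = na_counts[na_counts > 0].index.tolist()

# Step 1: dtypes is a Series mapping column name -> dtype.
dtypes = alldata[na_col_list].dtypes

# Step 2: comparing it with 'float64' gives a boolean mask per column.
is_float = dtypes == "float64"

# Step 3: filter by the mask, keep the index (column names), make a list.
na_float_cols = dtypes[is_float].index.tolist()

# Hypothetical counterpart: the remaining (non-float) columns.
na_other_cols = dtypes[~is_float].index.tolist()

print(na_float_cols)
print(na_other_cols)
```

The key point is that `.dtypes` returns a Series whose index is the column names, so filtering that Series and taking `.index` yields the names of the matching columns.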

Numeric type completion preparation

That's all for this time.

It has been about two weeks since the last update, but I'll do my best to keep updating. (It's about time I studied Python from the basics and reorganized what I know... Python seems to cram everything into one line...)
