Easy-to-understand [Pandas] practice for high school graduates: how to inspect your data

**Summary of ways to inspect data with pandas**

I want to take a quick look

Check the shape of the data

Data shape (number of rows x number of columns). In the code throughout, replace 〇〇 with the name of your DataFrame.


print(〇〇.shape)

Check the columns (column names)

print(〇〇.columns)
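
As a minimal sketch of the two checks above, the following builds a small made-up DataFrame (the variable name `df` and its contents are assumptions for illustration, not the Titanic data) and reads its shape and column names.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [22, 35, 58],
})

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = df.shape
print(df.shape)          # (3, 2)

# columns is an Index object; list() turns it into a plain list
print(list(df.columns))  # ['name', 'age']
```

Note that `shape` and `columns` are attributes, not methods, so they are written without parentheses.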

Show only the first 5 rows

If you want to display 5 rows.


print(〇〇.head())

To see a different number of rows, pass that number in the parentheses of head()

If you want to display 10 rows.


print(〇〇.head(10))
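
The difference between the two calls can be sketched on a made-up 20-row DataFrame (the name `df` and the `value` column are assumptions for illustration).

```python
import pandas as pd

# Hypothetical frame with 20 rows: value = 0, 1, ..., 19
df = pd.DataFrame({"value": range(20)})

first_five = df.head()    # with no argument, head() returns 5 rows
first_ten = df.head(10)   # pass a number to choose how many rows

print(len(first_five), len(first_ten))  # 5 10
```

head() returns a new DataFrame, so the result can be stored in a variable or printed directly.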

The examples here use train_data from Kaggle's **Titanic: Machine Learning from Disaster**.

I want to see more detail (check the number of rows, columns, non-null values, types, memory usage)

Use info() for more details

Get information.


〇〇.info()  # info() prints its report itself; wrapping it in print() would also print "None"

The RangeIndex shows 891 entries, while Age, Cabin, and Embarked have fewer non-null values than that, so you can see that **data is missing** in those columns.
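
The same comparison (total rows versus non-null values per column) can also be done in code. This is a minimal sketch on made-up data; the name `df` and the values are assumptions for illustration, and only the column names are borrowed from the Titanic example.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame with some missing values
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

df.info()  # prints the summary itself and returns None

non_null = df.count()         # non-null values in each column
missing = len(df) - non_null  # total rows minus non-null values
print(missing)
```

A column whose non-null count is smaller than the number of rows has missing data, which is what the info() output shows for Age, Cabin, and Embarked.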

I want to know the descriptive statistics (check the tendencies and properties of the data)

What are descriptive statistics? A statistical method that summarizes collected data by calculating the mean, variance, standard deviation, and so on, in order to clarify its distribution and grasp the tendencies and properties of the data.

Use describe() for descriptive statistics. NaN values are excluded from the calculation, and by default string columns are not included.

Calculation of descriptive statistics (excluding strings).


print(〇〇.describe())

Descriptive statistics for numeric columns are reported in the following form:
**count**: number of non-missing values
**mean**: mean
**std**: standard deviation
**min**: minimum value
**25%**: first quartile
**50%**: second quartile (median)
**75%**: third quartile
**max**: maximum value
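
To make these labels concrete, here is a minimal sketch on made-up data (the name `df` and its columns are assumptions for illustration). It also shows that NaN is skipped and that the string column is left out by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column with a NaN, one string column
df = pd.DataFrame({
    "score": [10.0, 20.0, 30.0, 40.0, np.nan],
    "label": ["a", "b", "a", "c", "a"],
})

stats = df.describe()
print(stats)

# Individual values can be looked up by row label and column name
print(stats.loc["count", "score"])  # 4.0 (the NaN is not counted)
print(stats.loc["50%", "score"])    # 25.0 (median of 10, 20, 30, 40)
```

Because describe() returns a DataFrame, any single statistic can be picked out with `.loc` as shown.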

Descriptive statistics for categorical columns: count, **unique** (number of distinct values), **top** (the mode, i.e. the most frequent value), **freq** (how many times the top value appears).

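Descriptive statistics for timestamp columns: count, unique, top, freq, plus **first** (the earliest value) and **last** (the most recent value). This form applies to older pandas versions; since pandas 2.0, datetime columns are summarized like numeric ones (mean, min, quartiles, max).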

By the way, you can also compute descriptive statistics for string columns

Calculation of descriptive statistics (strings only).


print(〇〇.describe(include=['O']))  # Uppercase letter O, not the digit zero!
print(〇〇.describe(include=['object']))  # In lowercase, spell out 'object' in full

Descriptive statistics for strings take the form: count, unique, top, freq.
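
A minimal sketch of the string-only summary, using made-up data (the name `df` and the values are assumptions for illustration). The string column is created with `dtype=object` explicitly, on the assumption that this keeps the `'O'` / `'object'` selector matching it regardless of how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical data; the string column is stored explicitly as object dtype
df = pd.DataFrame({
    "Embarked": pd.Series(["S", "C", "S", "Q", "S"], dtype=object),
    "Fare": [7.25, 71.28, 7.92, 8.46, 53.10],
})

by_letter = df.describe(include=["O"])      # 'O' is shorthand for object
by_name = df.describe(include=["object"])   # same selection, spelled out

print(by_letter)
# Only Embarked is summarized: count 5, unique 3, top 'S', freq 3
```

Both spellings select the same columns, so the two results are identical.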

It is also possible to get the descriptive statistics for every column with include='all'

Calculation of descriptive statistics (all columns).


〇〇.describe(include='all')
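
As a rough sketch of what include='all' produces, the following uses made-up data (the name `df` and the values are assumptions for illustration). Statistics that do not apply to a column's type are filled with NaN.

```python
import pandas as pd

# Hypothetical data with one string column and one numeric column
df = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", "S"],
    "Fare": [10.0, 20.0, 30.0, 40.0, 50.0],
})

all_stats = df.describe(include="all")
print(all_stats)

# String column: mean is NaN. Numeric column: unique is NaN.
print(pd.isna(all_stats.loc["mean", "Embarked"]))  # True
print(pd.isna(all_stats.loc["unique", "Fare"]))    # True
```

The result combines the string rows (unique, top, freq) and the numeric rows (mean, std, quartiles and so on) in a single table.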
