**Summary of ways to inspect data with pandas**
Check the shape of the data
Data shape (number of rows x number of columns). In the code below, 〇〇 stands for the name of your DataFrame.
print(〇〇.shape)
Check the columns (column names)
print(〇〇.columns)
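As a minimal sketch of the two checks above, the following builds a small DataFrame and reads its shape and column names. The sample data and the variable name `df` are made up for illustration and are not from the original dataset.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [22.0, None, 35.0],
    "Fare": [7.25, 71.28, 8.05],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# columns is an Index object; list() turns it into plain column names
column_names = list(df.columns)
print(column_names)
```

Note that `shape` and `columns` are attributes, not methods, so they are written without parentheses.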
Show only the first 5 rows
To display the first 5 rows:
print(〇〇.head())
To see a different number of rows, pass that number inside the parentheses of head().
To display the first 10 rows:
print(〇〇.head(10))
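A short sketch of how the argument to head() changes the result, using a hypothetical 20-row DataFrame named `df` (not the original data):

```python
import pandas as pd

# Hypothetical 20-row frame, used only to show how many rows come back
df = pd.DataFrame({"value": range(20)})

first_five = df.head()    # no argument: the default is 5 rows
first_ten = df.head(10)   # explicit row count

print(len(first_five), len(first_ten))
```

If the DataFrame has fewer rows than requested, head() simply returns all of them.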
The output discussed below comes from train_data in Kaggle's **Titanic: Machine Learning from Disaster** competition.
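Use info() for more details
Get information. info() prints its summary itself and returns None, so wrapping it in print() would only add an extra "None" line.
〇〇.info()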
The RangeIndex shows 891 entries, while Age, Cabin, and Embarked have fewer non-null values than that, so you can see that **data is missing** in those columns.
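To turn that reading of the info() output into explicit numbers, one option is to count missing values per column with isna().sum(). This is a sketch on made-up data; the column names only mimic the Titanic ones and the values are hypothetical.

```python
import pandas as pd

# Hypothetical data with gaps (values are made up for illustration)
df = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 8.05, 53.1],
})

# info() writes its summary to stdout and returns None
df.info()

# Count missing entries per column
missing = df.isna().sum()
print(missing)
```

Columns with a count of 0 are complete; any other value is the number of missing rows in that column.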
Use describe() for descriptive statistics. NaN values are excluded from the calculation, and string columns are not included by default.
Calculation of descriptive statistics (excluding strings).
print(〇〇.describe())
Descriptive statistics for numeric columns are reported in the following form:
**count: number of non-missing values**
**mean: arithmetic mean**
**std: standard deviation**
**min: minimum value**
**25%: first quartile**
**50%: second quartile (median)**
**75%: third quartile**
**max: maximum value**
Descriptive statistics for categorical columns: count, plus
**unique: number of unique values**
**top: most frequent value (mode)**
**freq: number of occurrences of top**
Descriptive statistics for timestamp columns (in older pandas versions): count, unique, top, freq, plus
**first: earliest value**
**last: most recent value**
In pandas 2.0 and later, datetime columns are summarized like numeric ones instead (count, mean, min, quartiles, max).
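The following sketch shows the default describe() behavior on a small made-up DataFrame: the string column is left out and the NaN in Age is not counted. The data and names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: one numeric column with a NaN, one string column
df = pd.DataFrame({
    "Age": [20.0, 30.0, None, 40.0],
    "Name": ["a", "b", "c", "d"],
})

stats = df.describe()
print(stats)

# Individual statistics can be read by row label and column name
age_count = stats.loc["count", "Age"]   # NaN is excluded from the count
age_median = stats.loc["50%", "Age"]    # the 50% row is the median
print(age_count, age_median)
```

Because the result is itself a DataFrame, its rows (count, mean, std, and so on) can be looked up with .loc like any other table.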
By the way, you can also compute descriptive statistics for string columns
Calculation of descriptive statistics (strings only).
print(〇〇.describe(include=['O'])) # Uppercase letter O, not the digit zero!
print(〇〇.describe(include=['object'])) # In lowercase, spell out 'object' in full
Descriptive statistics for strings are reported as count, unique, top, and freq.
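As a sketch of the string-only summary, the example below uses a made-up DataFrame. The string column is created with dtype=object explicitly, which is an assumption made here so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical data; dtype=object is set explicitly for the string column
df = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "male", "male"], dtype=object),
    "Fare": [7.25, 71.28, 8.05, 53.1],
})

# Only object-dtype columns are summarized; Fare is left out
text_stats = df.describe(include=["object"])
print(text_stats)
```

Here top is the most frequent value and freq is how many times it appears.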
You can also get descriptive statistics for every column at once with include='all'.
Calculation of descriptive statistics (all columns).
〇〇.describe(include='all')
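With include='all', numeric and string statistics share one table, and cells that do not apply to a column are filled with NaN. A minimal sketch on made-up data (names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data with one string column and one numeric column
df = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "male", "male"], dtype=object),
    "Fare": [7.25, 71.28, 8.05, 53.1],
})

all_stats = df.describe(include="all")
print(all_stats)

# mean has no meaning for a string column, unique is not computed for numbers
print(pd.isna(all_stats.loc["mean", "Sex"]))
print(pd.isna(all_stats.loc["unique", "Fare"]))
```

The NaN cells in this table are placeholders for statistics that do not apply, not missing values in the data itself.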