We have created **"Pandas 100 Knocks for Python Beginners"**, a set of exercises for learning the Python library Pandas efficiently, and are publishing it here. **The exercises also follow the content of the Python 3 Engineer Certification Data Analysis Exam, so working through these 100 knocks doubles as preparation for that qualification.** The final problems are a survival prediction task for Titanic passengers, which also serves as practice for machine learning competitions such as Kaggle.
No. | Category | Problem |
---|---|---|
1 | Basics | Display the first 5 rows of the data read into df |
2 | Basics | Display the last 5 rows of the data read into df |
3 | Basics | Check the DataFrame size of df |
4 | Basics | Read the data1.csv file in the input folder, store it in df2, and display the first 5 rows |
5 | Basics | Sort df by the fare column in ascending order and display it |
6 | Basics | Copy df to df_copy and display the first 5 rows |
7 | Basics | ① Check the data type of each column of df ② Check the data type of the cabin column of df |
8 | Basics | ① Check the data type of the pclass column of df with dtype ② Convert it from numeric type to string type and check the data type with dtype |
9 | Basics | Check the number of records (rows) in df |
10 | Basics | Check the number of records (rows) in df, the data type of each column, and whether there are missing values |
11 | Basics | Check the elements of the sex and cabin columns of df |
12 | Basics | Display df column name list in list format |
13 | Basics | Display df index list in ndarray format |
14 | Extraction | Display only the name column of df |
15 | Extraction | Display only the name and sex columns of df |
16 | Extraction | Display df up to the 4th row of the index (rows) |
17 | Extraction | Display df from the 4th row to the 10th row of the index (rows) |
18 | Extraction | View entire df using loc |
19 | Extraction | Show all df fare columns using loc |
20 | Extraction | Use loc to display up to the 10th row of the df fare column |
21 | Extraction | Show all df name and ticket columns using loc |
22 | Extraction | Use loc to show all columns from df name to cabin |
23 | Extraction | Display df age column up to 5th row using iloc |
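24 | Extraction | Extract only the name, age, and sex columns of df and store them in df2, then output df2 as a csv file to the output folder |
25 | Extraction | Extract only the data where the age column of df is 30 or more |
26 | Extraction | Extract only the data where the sex column of df is female |
27 | Extraction | Extract only the data where the sex column of df is female and age is 40 or more |
28 | Extraction | Use query to extract only the data where the sex column of df is female and age is 40 or more |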
29 | Extraction | Display data containing the character string "Mrs" in the name column of df |
30 | Extraction | Show only character type columns in df |
31 | Extraction | Count the number of unique elements in each column of df |
32 | Extraction | Check the elements of the embarked column of df and the number of occurrences |
33 | Processing | Change the age column of the row with index name "3" in df from 30 to 40 |
34 | Processing | Change male → 0 and female → 1 in the sex column of df and display the first 5 rows |
35 | Processing | Add 100 to the fare column of df and display the first 5 rows |
36 | Processing | Multiply the fare column of df by 2 and display the first 5 rows |
37 | Processing | Round off the decimal places of the fare column of df |
38 | Processing | Add a column named "test" with all values set to 1 to df and display the first 5 rows |
39 | Processing | Add to df a column that joins the cabin and embarked columns with "_" (column name "test") and display the first 5 rows |
40 | Processing | Add to df a column that joins the age and embarked columns with "_" (column name "test") and display the first 5 rows |
41 | Processing | Remove the body column from df and display the first 5 rows |
42 | Processing | Remove the row with index name "3" from df and display the first 5 rows |
43 | Processing | Change the column names of df2 to 'name', 'class', 'Biology', 'Physics', 'Chemistry' and display the first 5 rows of df2 |
44 | Processing | Change the df2 column name 'English' to 'Biology' and display the first 5 rows of df2 |
45 | Processing | Change the index name "1" of df2 to "10" and display the first 5 rows of df2 |
46 | Processing | Check the number of missing values in every column of df |
47 | Processing | Fill the missing values in the age column of df with 30, then check the number of missing values in age |
48 | Processing | Delete every row of df that has at least one missing value, then check the number of missing values in df |
49 | Processing | Display the survived column of df in array format |
50 | Processing | Shuffle the rows of df and display them |
51 | Processing | Shuffle the rows of df, renumber the index, and display them |
52 | Processing | ① Count the number of duplicate rows in df2 |
53 | Processing | Convert the name column of df to all uppercase and display it |
54 | Processing | Convert the name column of df to all lowercase and display it |
55 | Processing | Replace the word "female" in the sex column of df with "Python" |
56 | Processing | Remove "Elisabeth" from "Allen, Miss. Elisabeth Walton" in the first row of the name column of df (requires import re) |
57 | Processing | Join the prefecture and city columns of df5 with "_" so that no spaces remain (new column name "test2") and display the first 5 rows |
58 | Processing | Swap the rows and columns of df2 |
59 | Merge and concatenate | Left join df3 to df2 and store the result in df2 |
60 | Merge and concatenate | Right join df3 to df2 and store the result in df2 |
61 | Merge and concatenate | Inner join df3 to df2 and store the result in df2 |
62 | Merge and concatenate | Outer join df3 to df2 and store the result in df2 |
63 | Merge and concatenate | Concatenate df2 and df4 in the column direction and store the result in df2 |
64 | Merge and concatenate | Concatenate df2 and df4 in the column direction, delete one of the duplicated name columns, and store the result in df2 |
65 | Merge and concatenate | Concatenate df2 and df2 in the row direction, delete one of the duplicated name columns, and store the result in df2 |
66 | Statistics | Check the mean of the age column of df |
67 | Statistics | Check the median of the age column of df |
68 | Statistics | ① Total score for each student in df2 (sum in the row direction) ② Total score for each subject in df2 (sum in the column direction) |
69 | Statistics | Find the maximum English score in df2 |
70 | Statistics | Find the minimum English score in df2 |
71 | Statistics | Group df2 by class and find the maximum, minimum, and mean of each subject for each class (delete the name column) |
72 | Statistics | Check the basic statistics of df (describe) |
73 | Statistics | Check the (Pearson) correlation coefficient between the columns of df |
74 | Statistics | Use scikit-learn to standardize the English, Mathmatics, and History columns of df2 |
75 | Statistics | Use scikit-learn to standardize the English column of df2 |
76 | Statistics | Use scikit-learn to Min-Max scale the English, Mathmatics, and History columns of df2 |
77 | Statistics | Get the row names of the maximum and minimum values of the fare column of df |
78 | Statistics | Get the 0th, 25th, 50th, 75th, and 100th percentiles of the fare column of df |
79 | Statistics | ① Get the mode of the age column of df ② Check the counts of the elements in the age column with value_counts() and confirm that the result of ① is valid |
80 | Labeling | Label encode the sex column of df and display the first 5 rows of df |
81 | Labeling | One-hot encode the sex column of df and display the first 5 rows of df |
82 | Pandas plot | Show histogram of all numeric columns in df |
83 | Pandas plot | Display the age column of df as a histogram |
84 | Pandas plot | Display the total score of 3 subjects for each name of df2 in a bar graph |
85 | Pandas plot | Display 3 subjects for each element of the name column of df2 side by side in a bar graph |
86 | Pandas plot | Display 3 subjects for each element in the name column of df2 as a stacked bar graph |
87 | Pandas plot | Show scatter plot between each column of df |
88 | Pandas plot | Create a scatter plot with the age and fare columns of df |
89 | Pandas plot | Give the graph drawn in [88] the title "age-fare scatter" |
90 | Titanic Survivor Prediction | Label encode the sex and embarked columns of df_copy |
91 | Titanic Survivor Prediction | Check for missing values in df_copy |
92 | Titanic Survivor Prediction | Fill the missing values in the age and fare columns of df_copy with the mean of each column |
93 | Titanic Survivor Prediction | Delete the unnecessary rows of df_copy that are not used in machine learning |
94 | Titanic Survivor Prediction | ① Extract the pclass, age, sex, fare, and embarked columns of df_copy and convert them to ndarray format ② Extract the survived column of df_copy and convert it to ndarray format |
95 | Titanic Survivor Prediction | Split the features and target created in [94] into training data and test data |
96 | Titanic Survivor Prediction | Train a random forest using the training data (features, target) |
97 | Titanic Survivor Prediction | Predict passenger survival for the test_X data |
98 | Titanic Survivor Prediction | Check how well the prediction results match test_y (the survival answers) (the evaluation metric is accuracy) |
99 | Titanic Survivor Prediction | Show the importance of each column (feature) in training |
100 | Titanic Survivor Prediction | Output the prediction results for test_X as a csv file to the output folder (file name "submission.csv") |
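
To give a feel for how problems 90 to 99 fit together, here is a minimal sketch of that flow. It is not the notebook's answer code: the data is randomly generated as a stand-in for df_copy (the real dataset ships with the notebook), and the split ratio, `random_state`, and `n_estimators` values are assumptions picked for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for df_copy; the values are random, not real passenger data.
rng = np.random.default_rng(0)
n = 200
df_copy = pd.DataFrame(
    {
        "pclass": rng.integers(1, 4, size=n),
        "sex": rng.choice(["male", "female"], size=n),
        "age": rng.uniform(1, 80, size=n),
        "fare": rng.uniform(5, 300, size=n),
        "embarked": rng.choice(["S", "C", "Q"], size=n),
        "survived": rng.integers(0, 2, size=n),
    }
)
# Inject a few missing ages so the fill step has something to do.
df_copy.loc[df_copy.index % 25 == 0, "age"] = np.nan

# Problem 90: label encode the string columns.
for col in ["sex", "embarked"]:
    df_copy[col] = LabelEncoder().fit_transform(df_copy[col])

# Problems 91-92: check missing values, then fill them with the column mean.
print(df_copy.isnull().sum())
df_copy["age"] = df_copy["age"].fillna(df_copy["age"].mean())
df_copy["fare"] = df_copy["fare"].fillna(df_copy["fare"].mean())

# Problem 94: features and target as ndarrays.
feature_cols = ["pclass", "age", "sex", "fare", "embarked"]
features = df_copy[feature_cols].values
target = df_copy["survived"].values

# Problem 95: split into training data and test data (80/20 is an assumption).
train_X, test_X, train_y, test_y = train_test_split(
    features, target, test_size=0.2, random_state=0
)

# Problems 96-97: train a random forest and predict on the test data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train_X, train_y)
pred = model.predict(test_X)

# Problem 98: accuracy against the held-out answers.
acc = accuracy_score(test_y, pred)
print(acc)

# Problem 99: importance of each feature.
importances = pd.Series(model.feature_importances_, index=feature_cols)
print(importances)
```

Because the labels here are random, the accuracy will hover around chance; with the real Titanic data the same steps should give a more meaningful score.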
If you have not installed Python yet, first install Anaconda on your PC. Besides Pandas, the problems also use libraries such as Scikit-learn.
After downloading the ZIP folder from GitHub, extract it to a local folder on your PC.
Open the ipynb file stored in the "notebook" folder with Jupyter Notebook (try opening "01_Pandas_100_Knocks_for_Begginer_v1.0.ipynb" first).
After opening the ipynb file, execute the first cell to load the answer file and the dataset used in the question. The data set used is passenger data for the Titanic.
Enter the code for each question in the cell of each question.
If you do not know the answer, delete the "#" from the "#print(ans[])" line in the question cell and run the cell to display an example answer.
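An answer typed into a question cell is usually only a line or two of pandas. As a rough idea of the scale, the sketch below works a few of the problems (1, 3, 5, 27, 28, 29, 46, 47) on a tiny made-up table. The rows are invented for illustration and are not the notebook's dataset, and the notebook's own example answers may be written differently.

```python
import pandas as pd

# Tiny made-up table; the column names mirror the Titanic columns in the problem list.
df = pd.DataFrame(
    {
        "name": [
            "Allen, Miss. Elisabeth Walton",
            "Brown, Mrs. Margaret",
            "Smith, Mr. John",
            "Jones, Mrs. Ann",
        ],
        "sex": ["female", "female", "male", "female"],
        "age": [29.0, 44.0, None, 52.0],
        "fare": [211.34, 27.72, 8.05, 78.85],
    }
)

# Problems 1 and 3: first rows and the (rows, columns) size.
print(df.head())
print(df.shape)

# Problem 27: combine boolean conditions with & (each side in parentheses).
women_over_40 = df[(df["sex"] == "female") & (df["age"] >= 40)]

# Problem 28: the same filter written with query().
women_over_40_q = df.query("sex == 'female' and age >= 40")

# Problem 29: rows whose name contains the string "Mrs".
mrs_rows = df[df["name"].str.contains("Mrs")]

# Problems 46-47: count missing values, then fill the missing ages with 30.
missing_before = df.isnull().sum()
df["age"] = df["age"].fillna(30)
missing_after = df["age"].isnull().sum()

# Problem 5: sort by the fare column in ascending order.
sorted_by_fare = df.sort_values("fare")
print(sorted_by_fare)
```

Note that the two filters are built before the missing age is filled, so the row with the missing age is excluded from both.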
pandas_100_knocks_v1.0
├ notebook/ … contains the 3 ipynb files
├ input/ … contains the answer files for the 100 problems and the datasets used in the problems
└ output/ … files written out by the problems are saved here
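The input and output folders come into play in problems such as 4, 24, and 100, which read a csv file or write one out. The following is a minimal sketch of that round trip. It uses a temporary directory in place of the real folders, and the file names `sample.csv` and `sample_out.csv` and their contents are made up for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the input/ and output/ folders.
    input_dir = Path(tmp) / "input"
    output_dir = Path(tmp) / "output"
    input_dir.mkdir()
    output_dir.mkdir()

    # Write a small made-up csv so the example has something to read.
    (input_dir / "sample.csv").write_text(
        "name,age,sex\nAnn,31,female\nBob,45,male\nCid,27,male\n",
        encoding="utf-8",
    )

    # Read the csv into a DataFrame and look at the first rows.
    df2 = pd.read_csv(input_dir / "sample.csv")
    print(df2.head())

    # Keep only some columns and write them out without the index column.
    subset = df2[["name", "age"]]
    subset.to_csv(output_dir / "sample_out.csv", index=False)

    # Read the written file back as text to see what was saved.
    written = (output_dir / "sample_out.csv").read_text(encoding="utf-8")
    print(written)
```

Passing `index=False` keeps the row index out of the file; without it, `to_csv` writes the index as an extra first column.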
Ideally, Python beginners will reach level 3 and be able to set problems themselves (I think you can reach level 2 by working through the problems three times).
The content can be downloaded from GitHub.
https://github.com/kunishou/Pandas_100_knocks
Scope of use: Anyone may use it, whether an individual or a corporation. (If you use it for a volunteer study session or in-house training, please let me know; it motivates the author. I would also be happy to hear comments such as "This content helped me with the Python certification exam.")
Notes: The content may not be redistributed or reorganized.
The Scratchpad extension from nbextensions is a convenient add-on for Jupyter Notebook, so we recommend installing it. While working through the 100 knocks, adding a new cell and running df.head() every time you want to check the contents of a data frame is tedious. With Scratchpad, you can call up a disposable cell area with "Ctrl + B".
Please refer to the following for the installation method.
[Python] jupyter notebook extensions ~
If you have any questions or requests regarding this content, please contact us.