A YouTube video commentary is also available.
P-026: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) and the oldest sales date for each customer ID (customer_id), and display 10 rows where the two dates differ.
code
df_sales_ymd = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()
df_sales_ymd.columns = ['customer_id','sales_ymd_max','sales_ymd_min']
df_sales_ymd.query('sales_ymd_max != sales_ymd_min').head(10)
| | customer_id | sales_ymd_max | sales_ymd_min |
|---|---|---|---|
| 1 | CS001114000005 | 20190731 | 20180503 |
| 2 | CS001115000010 | 20190405 | 20171228 |
| 3 | CS001205000004 | 20190625 | 20170914 |
| 4 | CS001205000006 | 20190224 | 20180207 |
| 13 | CS001214000009 | 20190902 | 20170306 |
| 14 | CS001214000017 | 20191006 | 20180828 |
| 16 | CS001214000048 | 20190929 | 20171109 |
| 17 | CS001214000052 | 20190617 | 20180208 |
| 20 | CS001215000005 | 20181021 | 20170206 |
| 21 | CS001215000040 | 20171022 | 20170214 |
- `groupby` is used in a Pandas DataFrame / Series when you want to collect the rows that share the same value (a number or a character string) and apply a common operation (total, average, etc.) to each group.
- `.agg({'sales_ymd': ['max', 'min']})` computes, for each group, the maximum value (= newest sales date) and the minimum value (= oldest sales date) of `sales_ymd`.
- `.reset_index()` turns the group key (`customer_id`), which `groupby` moved into the index, back into a regular column and reassigns the index as serial numbers starting from 0.
- The 2nd and 3rd lines use techniques already covered: the column names are assigned with `.columns` and the condition is specified with `.query`.
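The steps above can be tried on a tiny data frame. This is only a minimal sketch: `df_toy` and its values are hypothetical stand-ins for `df_receipt`, made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (not the real dataset)
df_toy = pd.DataFrame({
    'customer_id': ['A', 'A', 'B', 'C', 'C'],
    'sales_ymd': [20190101, 20180101, 20190505, 20170303, 20190909],
})

# One row per customer, with the max and min of sales_ymd
df_toy_ymd = df_toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']}).reset_index()
df_toy_ymd.columns = ['customer_id', 'sales_ymd_max', 'sales_ymd_min']

# Keep only customers whose newest and oldest dates differ
result = df_toy_ymd.query('sales_ymd_max != sales_ymd_min')
print(result)
```

Customer `B` has a single purchase date, so its maximum and minimum are equal and `.query` drops it. The remaining rows keep their original index labels (0 and 2 here), which is also why the index in the output table above skips numbers.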
code
df_sales_ymd.columns = ["_".join(pair) for pair in df_sales_ymd.columns]
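The line above builds the column names by joining the two levels of the column MultiIndex that `.agg` produces, instead of typing them by hand. The sketch below, again on hypothetical toy data, suggests that the order relative to `.reset_index()` matters: joining after `.reset_index()` appears to leave a trailing underscore on `customer_id`, because its second level is an empty string.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (not the real dataset)
df_toy = pd.DataFrame({
    'customer_id': ['A', 'A', 'B'],
    'sales_ymd': [20190101, 20180101, 20190505],
})

# Flatten first, then reset the index: customer_id keeps its plain name
agg_first = df_toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']})
agg_first.columns = ["_".join(pair) for pair in agg_first.columns]
agg_first = agg_first.reset_index()
flat_before = agg_first.columns.tolist()

# Reset the index first, then flatten: ('customer_id', '') becomes 'customer_id_'
agg_second = df_toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']}).reset_index()
flat_after = ["_".join(pair) for pair in agg_second.columns]

print(flat_before)
print(flat_after)
```

If the join has to run after `.reset_index()`, one option is to strip the trailing underscore, for example with `"_".join(pair).rstrip("_")`.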