A YouTube video commentary is also available.
P-024: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) for each customer ID (customer_id) and display 10 items.
```python
# Newest sales date per customer, first 10 rows
df_receipt.groupby('customer_id').sales_ymd.max().reset_index().head(10)
```
| | customer_id | sales_ymd |
---|---|---|
0 | CS001113000004 | 20190308 |
1 | CS001114000005 | 20190731 |
2 | CS001115000010 | 20190405 |
3 | CS001205000004 | 20190625 |
4 | CS001205000006 | 20190224 |
5 | CS001211000025 | 20190322 |
6 | CS001212000027 | 20170127 |
7 | CS001212000031 | 20180906 |
8 | CS001212000046 | 20170811 |
9 | CS001212000070 | 20191018 |
- `groupby` collects the rows of a pandas DataFrame or Series that share the same value (here, the same `customer_id`) so that a common operation, such as a total, an average, or a maximum, can be applied to each group.
- `.sales_ymd.max()` returns the maximum value of `sales_ymd` within each group, which is the newest sales date.
- `.reset_index()` moves the grouping key `customer_id` out of the index and back into an ordinary column, and renumbers the rows with serial numbers starting from 0.
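The three steps can be tried on a small frame. This is only a sketch: `toy` and its values are made up for illustration and are not part of `df_receipt`, and it assumes `sales_ymd` is stored as an integer in YYYYMMDD form, so that the largest number is also the latest date.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are invented.
toy = pd.DataFrame({
    "customer_id": ["A", "B", "A", "C", "B"],
    "sales_ymd": [20190101, 20180505, 20190308, 20170811, 20181231],
})

# Group rows by customer and keep the largest sales_ymd in each group.
# The result is a Series indexed by customer_id.
latest = toy.groupby("customer_id").sales_ymd.max()

# reset_index() turns customer_id back into a column and
# assigns a fresh integer index starting from 0.
latest_df = latest.reset_index()
print(latest_df)
```

Without `.reset_index()`, `customer_id` would remain the index of the result rather than a column, which is usually less convenient for later merges or filtering.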
```python
# Alternative: pass agg() a mapping from column name to aggregation function
df_receipt.groupby('customer_id').agg({'sales_ymd':'max'}).reset_index().head(10)
```
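Both ways of writing the solution should give the same table. A quick way to check this, again on a made-up frame (`toy` is hypothetical, not the exercise data), is to compare the two results with `pd.testing.assert_frame_equal`:

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are invented.
toy = pd.DataFrame({
    "customer_id": ["A", "B", "A", "C", "B"],
    "sales_ymd": [20190101, 20180505, 20190308, 20170811, 20181231],
})

# Form 1: select the column, then call max() on the grouped Series.
by_attr = toy.groupby("customer_id").sales_ymd.max().reset_index()

# Form 2: pass agg() a dict mapping column name -> aggregation name.
by_agg = toy.groupby("customer_id").agg({"sales_ymd": "max"}).reset_index()

# Raises AssertionError if the two frames differ; passes silently otherwise.
pd.testing.assert_frame_equal(by_attr, by_agg)
```

The `agg` form tends to be the more convenient one when several columns need different aggregations in a single call, since each column gets its own entry in the dict.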