A YouTube video commentary is also available.
P-025: For the receipt detail data frame (df_receipt), find the oldest sales date (sales_ymd) for each customer ID (customer_id) and display 10 items.
code
df_receipt.groupby('customer_id').sales_ymd.min().reset_index().head(10)
| | customer_id | sales_ymd |
---|---|---|
0 | CS001113000004 | 20190308 |
1 | CS001114000005 | 20180503 |
2 | CS001115000010 | 20171228 |
3 | CS001205000004 | 20170914 |
4 | CS001205000006 | 20180207 |
5 | CS001211000025 | 20190322 |
6 | CS001212000027 | 20170127 |
7 | CS001212000031 | 20180906 |
8 | CS001212000046 | 20170811 |
9 | CS001212000070 | 20191018 |
- `groupby` is used on a pandas DataFrame or Series to collect rows that share the same value (a number or a character string) and apply a common operation, such as a total or an average, to each group.
- `.sales_ymd.min()` returns the minimum value of `sales_ymd` within each group, which is the oldest sales date.
- `.reset_index()` turns the `customer_id` labels that `groupby` uses as the index back into a regular column and reassigns the index as serial numbers starting from 0.
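To make the effect of each step concrete, here is a minimal sketch on a toy frame. The name `toy` and its values are made up for illustration and are not taken from `df_receipt`; only the column names match the exercise.

```python
import pandas as pd

# Hypothetical miniature stand-in for df_receipt (values are invented).
toy = pd.DataFrame({
    "customer_id": ["A", "B", "A", "B", "C"],
    "sales_ymd": [20190308, 20180503, 20171228, 20181103, 20170914],
})

# groupby + min returns a Series whose index is customer_id.
oldest = toy.groupby("customer_id").sales_ymd.min()

# reset_index() moves customer_id back into a column
# and assigns a fresh index starting from 0.
oldest_df = oldest.reset_index()
print(oldest_df)
```

Because `sales_ymd` is stored as an integer in YYYYMMDD form, the numeric minimum is also the earliest date, so no date conversion is needed for this comparison.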
code
df_receipt.groupby('customer_id').agg({'sales_ymd':'min'}).reset_index().head(10)
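The `agg` form above should give the same result as the attribute form. The sketch below checks this on a small invented frame (`toy` is a hypothetical stand-in, not the real `df_receipt`) and also shows pandas named aggregation as one optional way to rename the output column; the name `oldest_ymd` is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for df_receipt (values are invented).
toy = pd.DataFrame({
    "customer_id": ["A", "B", "A", "B", "C"],
    "sales_ymd": [20190308, 20180503, 20171228, 20181103, 20170914],
})

# Attribute form: select the column, then take the minimum per group.
by_attr = toy.groupby("customer_id").sales_ymd.min().reset_index()

# Dictionary form: map each column name to the aggregation to apply.
by_agg = toy.groupby("customer_id").agg({"sales_ymd": "min"}).reset_index()

# Both forms are expected to produce identical frames.
same = by_attr.equals(by_agg)

# Named aggregation: output_name=(input_column, function).
# as_index=False keeps customer_id as a column, so reset_index() is not needed.
named = toy.groupby("customer_id", as_index=False).agg(
    oldest_ymd=("sales_ymd", "min")
)
print(same)
print(named)
```

The dictionary form is convenient when several columns need different aggregations in one call, while the attribute form is shorter for a single column.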