[Python] 100 knocks on data science (structured data processing) 026 Explanation

A YouTube video commentary is also available.

Problem

P-026: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) and the oldest sales date for each customer ID (customer_id), and display 10 rows where the two dates differ.

Answer

`code`


df_sales_ymd = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()

df_sales_ymd.columns = ['customer_id','sales_ymd_max','sales_ymd_min']

df_sales_ymd.query('sales_ymd_max != sales_ymd_min').head(10)

Output

	customer_id	sales_ymd_max	sales_ymd_min
1	CS001114000005	20190731	20180503
2	CS001115000010	20190405	20171228
3	CS001205000004	20190625	20170914
4	CS001205000006	20190224	20180207
13	CS001214000009	20190902	20170306
14	CS001214000017	20191006	20180828
16	CS001214000048	20190929	20171109
17	CS001214000052	20190617	20180208
20	CS001215000005	20181021	20170206
21	CS001215000040	20171022	20170214

Commentary

- **`groupby` is used in a pandas DataFrame / Series to collect rows that share the same value or character string and to apply a common operation (total, average, etc.) to each group.**
- **`.agg({'sales_ymd': ['max', 'min']})` computes the maximum value (= newest sales date) and the minimum value (= oldest sales date) of `sales_ymd` for each group.**
- **`.reset_index()` turns the group key (`customer_id`), which `groupby` moved into the index, back into an ordinary column and assigns a new serial index starting from 0.**
- **The second and third lines use code already covered in earlier problems: `.columns` sets the column names and `.query` specifies the condition.**
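The steps above can be traced on a small example. This is a minimal sketch, assuming a toy data frame (`df_toy`, with invented values) in place of the real `df_receipt`; it shows the two-level column labels that `agg` produces and why the columns are renamed afterwards.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are invented for illustration.
df_toy = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C"],
    "sales_ymd": [20180503, 20190731, 20170101, 20171228, 20190405],
})

# Passing a list of functions to agg yields two-level (MultiIndex) column labels.
df_agg = df_toy.groupby("customer_id").agg({"sales_ymd": ["max", "min"]})
cols_after_agg = df_agg.columns.tolist()

# reset_index moves customer_id from the index back into a column.
df_flat = df_agg.reset_index()
cols_after_reset = df_flat.columns.tolist()

# Flatten the labels so that query can refer to them by a simple name.
df_flat.columns = ["customer_id", "sales_ymd_max", "sales_ymd_min"]

# Customer B has a single purchase, so max == min and the row is dropped.
result = df_flat.query("sales_ymd_max != sales_ymd_min")
print(cols_after_agg)
print(cols_after_reset)
print(result)
```

Customers with only one purchase date have identical maximum and minimum values, which is why the `query` condition filters them out.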

The second line can also be written as follows, with the same result. However, `customer_id` becomes `customer_id_`, because after `.reset_index()` the second-level label of that column is an empty string.

`code`


df_sales_ymd.columns = ["_".join(pair) for pair in df_sales_ymd.columns]
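The trailing underscore can be avoided in a couple of ways. The following is a minimal sketch, again assuming a toy data frame (`df_toy`, with invented values) rather than the real `df_receipt`: one variant strips the underscore after joining, and the other uses pandas named aggregation so that the columns are flat from the start. Neither is part of the original answer; they are alternatives offered for comparison.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are invented for illustration.
df_toy = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C"],
    "sales_ymd": [20180503, 20190731, 20170101, 20171228, 20190405],
})

df_multi = (
    df_toy.groupby("customer_id")
    .agg({"sales_ymd": ["max", "min"]})
    .reset_index()
)

# Joining each (level0, level1) pair leaves a trailing "_" on customer_id,
# because its second-level label is an empty string.
joined = ["_".join(pair) for pair in df_multi.columns]

# Variant 1: strip the trailing underscore after joining.
cleaned = ["_".join(pair).rstrip("_") for pair in df_multi.columns]

# Variant 2: named aggregation produces flat column names directly.
df_named = (
    df_toy.groupby("customer_id")
    .agg(
        sales_ymd_max=("sales_ymd", "max"),
        sales_ymd_min=("sales_ymd", "min"),
    )
    .reset_index()
)

print(joined)
print(cleaned)
print(df_named.columns.tolist())
```

Named aggregation requires pandas 0.25 or later; on older versions the join-and-strip variant is the fallback.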
