[Python] 100 knocks on data science (structured data processing) 029 Explanation

A YouTube video commentary is also available.

Problem

P-029: For the receipt details data frame (df_receipt), find the mode of the product code (product_cd) for each store code (store_cd).

Answer

`code`


df_receipt.groupby('store_cd').product_cd.apply(lambda x: x.mode()).reset_index() \
.set_index(['store_cd','level_1','product_cd'])

Output

store_cd	level_1	product_cd
S12007	0	P060303001
S12013	0	P060303001
S12014	0	P060303001
S12029	0	P060303001
S12030	0	P060303001
S13001	0	P060303001
S13002	0	P060303001
S13003	0	P071401001
S13004	0	P060303001
S13005	0	P040503001
S13008	0	P060303001
S13009	0	P060303001
S13015	0	P071401001
S13016	0	P071102001
S13017	0	P060101002
S13018	0	P071401001
S13019	0	P071401001
S13020	0	P071401001
S13031	0	P060303001
S13032	0	P060303001
S13035	0	P040503001
S13037	0	P060303001
S13038	0	P060303001
S13039	0	P071401001
S13041	0	P071401001
S13043	0	P060303001
S13044	0	P060303001
S13051	0	P050102001
	1	P071003001
	2	P080804001
S13052	0	P050101001
S14006	0	P060303001
S14010	0	P060303001
S14011	0	P060101001
S14012	0	P060303001
S14021	0	P060101001
S14022	0	P060303001
S14023	0	P071401001
S14024	0	P060303001
S14025	0	P060303001
S14026	0	P071401001
S14027	0	P060303001
S14028	0	P060303001
S14033	0	P071401001
S14034	0	P060303001
S14036	0	P040503001
	1	P060101001
S14040	0	P060303001
S14042	0	P050101001
S14045	0	P060303001
S14046	0	P060303001
S14047	0	P060303001
S14048	0	P050101001
S14049	0	P060303001
S14050	0	P060303001

Commentary

- Uses pandas DataFrame / Series.
- `groupby` is used when you want to gather rows that share the same value or character string and perform a common operation (total, average, etc.) on each group.
- `.apply(lambda x: ...)` is a method that applies a function to the column specified immediately before it. Writing `lambda x:` defines the function that receives the argument x from the apply method; here x is the product_cd Series of a single store. lambda creates an anonymous function, that is, a function declared without a name. (For more information on the apply method, click [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html); for more information on the mode method, click [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mode.html).)
- The automatically generated `level_1` column numbers the modes of product_cd within each store_cd, starting from 0. It goes beyond 0 only when a store has multiple modes (for example S13051, which has three).
- `.reset_index()` moves the index levels created by `groupby` and `apply` (store_cd and the unnamed inner level) back into ordinary columns and assigns a new serial index starting from 0.
- `.set_index()` is optional. It is used here to turn store_cd, level_1 and product_cd into a MultiIndex, so that a repeated store_cd is displayed only once across multiple rows.
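To make the points above concrete, here is a minimal sketch on a tiny made-up frame. The store and product codes (`S1`, `P1`, and so on) are hypothetical and are not taken from the real df_receipt. Store `S1` has a single mode, while `S2` has a tie, so `S2` gets two rows with `level_1` values 0 and 1.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (not the real data set).
df_receipt = pd.DataFrame({
    'store_cd':   ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'product_cd': ['P1', 'P1', 'P2', 'P3', 'P4', 'P3', 'P4'],
})

# mode() returns a Series, so each store contributes one row per mode.
modes = df_receipt.groupby('store_cd').product_cd.apply(lambda x: x.mode())

# reset_index() turns both index levels into columns; the unnamed inner
# level receives the automatic name 'level_1'.
result = modes.reset_index()
print(result)
```

Adding `.set_index(['store_cd', 'level_1', 'product_cd'])` to `result` only changes how the table is displayed; the modes themselves are the same.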

For details on MultiIndex, see the pandas user guide section on MultiIndex / advanced indexing.
Given the flow of the problems so far, you may be tempted to answer with the following code, but it raises an error. Please note that the mode cannot be calculated by passing the string 'mode' to `.agg`.

`code`


df_receipt.groupby('store_cd').agg({'product_cd':'mode'}).reset_index()
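As a hedged illustration (not part of the original answer), the sketch below reproduces the failure on hypothetical toy data and shows one possible workaround. It assumes that `.agg` looks up a string name as a method of the groupby object and that no `mode` method exists there, which is the case in the pandas versions I am aware of. Passing a callable that collects each store's modes into a list avoids the lookup.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (not the real data set).
df_receipt = pd.DataFrame({
    'store_cd':   ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'product_cd': ['P1', 'P1', 'P2', 'P3', 'P4', 'P3', 'P4'],
})

# Passing the string 'mode' is expected to fail. The exception is caught
# broadly because its exact type may differ between pandas versions.
try:
    df_receipt.groupby('store_cd').agg({'product_cd': 'mode'})
    error = None
except Exception as exc:
    error = exc
print(type(error).__name__ if error else 'no error')

# Workaround sketch: aggregate each store's modes into one list per store.
modes_as_list = (
    df_receipt.groupby('store_cd')
    .agg({'product_cd': lambda x: x.mode().tolist()})
    .reset_index()
)
print(modes_as_list)
```

Unlike the `apply` answer, this keeps one row per store and puts tied modes in the same cell, so choose whichever shape suits the next processing step.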