I recently started working on Kaggle and found methods that simplify column processing I had been struggling to write by hand, so I am summarizing them here as a memo. I only briefly cover the usage I needed in the competition I am working on, so please see the articles I referred to for detailed usage.
In this competition, the data was provided as train_data and train_label, and the two CSV files contained duplicate items. Ultimately the two must be merged and given to the model, so the duplicated content has to be removed before merging.
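As a rough illustration of that preprocessing step, here is a minimal sketch. It assumes that "duplicate items" means columns that appear in both files, and it uses small in-memory DataFrames in place of the real CSVs; the column names (`id`, `shop`, `price`, `label`) and the join key are hypothetical and do not come from the competition.

```python
import pandas as pd

# Hypothetical stand-ins for the contents of train_data and train_label.
train_data = pd.DataFrame({
    "id": [1, 2, 3],
    "shop": ["A", "A", "B"],
    "price": [100, 200, 300],
})
train_label = pd.DataFrame({
    "id": [1, 2, 3],
    "shop": ["A", "A", "B"],  # the same column also exists in train_data
    "label": [0, 1, 0],
})

key = "id"  # assumed join key

# Columns that exist in both frames, excluding the join key itself.
duplicated_cols = [
    c for c in train_label.columns
    if c in train_data.columns and c != key
]

# Drop the duplicated columns from one side, then merge on the key.
merged = train_data.merge(train_label.drop(columns=duplicated_cols), on=key)
print(merged.columns.tolist())
```

Dropping the shared columns from one side first keeps `merge` from producing suffixed copies such as `shop_x` and `shop_y`.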
- `isin(value you want to check)`: Checks whether each element of the DataFrame is one of the given values. The result is boolean: True where the value is included and False otherwise. Prefixing the expression with `~` inverts the result, so True is returned where the value is not included.
- `where(target condition, value if True, value if False, option)`: Applies a different process depending on whether each element matches the target condition. The three-argument form is `numpy.where`; if the 2nd and 3rd arguments are omitted, it returns the indices of the matching elements. The `inplace=True` option belongs to the pandas `DataFrame.where(condition, other)` method, which keeps the values where the condition holds and replaces the rest with `other`; with `inplace=True` the change is reflected in the original DataFrame.
- `groupby(['first column name you want to group', 'second column name you want to group'])` followed by the process you want to apply, such as `.mean()`: Use it, for example, to calculate the average price of each group B within group A. Each combination of the specified columns appears only once in the result.
- `agg({'column name to be processed': ['process 1 (min, max, etc.)', 'process 2']})`: Convenient to use after `groupby`.
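The methods listed above can be tried out on a small table. The following is only a sketch under assumptions: the first method is taken to be `isin`, the DataFrame and its column names (`group_a`, `group_b`, `price`) are made up for illustration, and both `numpy.where` and `DataFrame.where` are shown because the description of `where` mixes the two.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two grouping columns and a price column.
df = pd.DataFrame({
    "group_a": ["x", "x", "y", "y"],
    "group_b": ["p", "p", "p", "q"],
    "price": [100, 300, 200, 400],
})

# isin: boolean mask that is True where the value is in the list.
mask = df["group_a"].isin(["x"])
# ~ inverts the mask, selecting the rows that are NOT in the list.
not_x = df[~mask]

# numpy.where with three arguments: pick a value per element.
df["size"] = np.where(df["price"] >= 300, "high", "low")

# numpy.where with only the condition: positions of the matching rows.
positions = np.where(df["price"] >= 300)[0]

# DataFrame.where: keep values where the condition holds, otherwise use
# the second argument. Passing inplace=True would modify the frame itself.
capped = df[["price"]].where(df[["price"]] <= 300, 300)

# groupby + mean: average price of each group_b within each group_a.
mean_price = df.groupby(["group_a", "group_b"])["price"].mean()

# groupby + agg: several aggregations for one column at once.
stats = df.groupby("group_a").agg({"price": ["min", "max"]})

print(mean_price)
print(stats)
```

Note that `agg` with a dict of lists produces two-level column labels such as `('price', 'min')`, which may need flattening before the result is merged with other data.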
Articles I referred to: note.nkmk.me, CUBE SUGAR CONTAINER