I want to assign a group_id to a pandas data frame

Introduction

If you want to remove duplicates or aggregate with pandas, you can use drop_duplicates or groupby.

Related articles: "How to remove duplicate elements in Pandas DataFrame or Series (Python)", "How to use Pandas groupby"
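For contrast, here is a minimal sketch of what those two methods do, using a small hypothetical frame: both collapse the rows, so neither keeps every original row with a group label attached.

```python
import pandas as pd

# Hypothetical sample data (same columns as the example used later)
df = pd.DataFrame({
    'property_scale': ['large', 'large', 'small', 'small', 'small'],
    'city_code': [1, 1, 1, 2, 1],
})

# drop_duplicates keeps only the first row of each unique combination
unique_rows = df.drop_duplicates(subset=['property_scale', 'city_code'])

# groupby + size collapses each combination into a single aggregated value
counts = df.groupby(['property_scale', 'city_code']).size()

print(unique_rows)
print(counts)
```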

Sometimes, however, I want to assign a group_id to each group defined by the same conditions I would pass to groupby, while keeping all the rows. I didn't know how to do this, so I implemented it myself. (It may not be best practice, but it was easy to implement.)

Assigning group_id

# Import pandas
import pandas as pd

# Prepare the data frame
df = pd.DataFrame({
    'building_name': ['Building A', 'Building A', 'B building', 'C building', 'B building', 'B building', 'D building'],
    'property_scale': ['large', 'large', 'small', 'small', 'small', 'small', 'large'],
    'city_code': [1, 1, 1, 2, 1, 1, 1]
})
df
  building_name property_scale  city_code
0    Building A          large          1
1    Building A          large          1
2    B building          small          1
3    C building          small          2
4    B building          small          1
5    B building          small          1
6    D building          large          1
# Create a GroupBy object
group_info = df.groupby(['property_scale', 'city_code'])
# Take a look at the groups it contains
group_info.groups

{('large', 1): Int64Index([0, 1, 6], dtype='int64'), ('small', 1): Int64Index([2, 4, 5], dtype='int64'), ('small', 2): Int64Index([3], dtype='int64')}

# Look at a single group
group_info.get_group(('large', 1))
  building_name property_scale  city_code
0    Building A          large          1
1    Building A          large          1
6    D building          large          1
# Assign group_id
df = pd.concat([
    group_info.get_group(group_name).assign(group_id=group_id)
    for group_id, group_name
    in enumerate(group_info.groups.keys())])
df
  building_name property_scale  city_code  group_id
0    Building A          large          1         0
1    Building A          large          1         0
6    D building          large          1         0
2    B building          small          1         1
4    B building          small          1         1
5    B building          small          1         1
3    C building          small          2         2
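Note that the concat approach stacks the groups one after another, so the rows come out grouped by group_id rather than in their original order. Because each row keeps its original index label, a sort_index() call should restore the original order. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'building_name': ['Building A', 'Building A', 'B building', 'C building',
                      'B building', 'B building', 'D building'],
    'property_scale': ['large', 'large', 'small', 'small', 'small', 'small', 'large'],
    'city_code': [1, 1, 1, 2, 1, 1, 1],
})

group_info = df.groupby(['property_scale', 'city_code'])

# Same concat approach as above: rows come out grouped, not in original order
grouped = pd.concat([
    group_info.get_group(group_name).assign(group_id=group_id)
    for group_id, group_name in enumerate(group_info.groups.keys())
])

# The original index labels are carried along, so sorting by index restores row order
restored = grouped.sort_index()
print(restored)
```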

I also turned it into a function:

import pandas as pd
from pandas import DataFrame


def add_group_id(df: DataFrame, by: list) -> DataFrame:
    """Assign a group_id to records that share the same values in the `by` columns.

    Args:
        df (DataFrame): Arbitrary data frame
        by (list): Column names to group by

    Returns:
        DataFrame: df with a group_id column (rows are reordered so each group is contiguous)

    """
    # If a group_id column already exists, include it in `by` as well.
    # Build a new list so the caller's `by` list is not modified in place.
    if 'group_id' in df.columns:
        by = list(by) + ['group_id']
    group_info = df.groupby(by=by)
    new_df = pd.concat([
        group_info.get_group(group_name).assign(group_id=group_id)
        for group_id, group_name
        in enumerate(group_info.groups.keys())])
    return new_df
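A quick usage check, as a sketch on the sample data above. The function is repeated so the snippet runs on its own. The second call shows an existing group_id column being folded into `by`, which splits the existing groups further by building_name. It also checks that the caller's list is left untouched; an in-place `by += ['group_id']` would have appended to it.

```python
import pandas as pd


def add_group_id(df: pd.DataFrame, by: list) -> pd.DataFrame:
    """Assign a group_id to rows that share the same values in `by`."""
    if 'group_id' in df.columns:
        by = list(by) + ['group_id']  # copy instead of extending the caller's list
    group_info = df.groupby(by=by)
    return pd.concat([
        group_info.get_group(group_name).assign(group_id=group_id)
        for group_id, group_name in enumerate(group_info.groups.keys())
    ])


df = pd.DataFrame({
    'building_name': ['Building A', 'Building A', 'B building', 'C building',
                      'B building', 'B building', 'D building'],
    'property_scale': ['large', 'large', 'small', 'small', 'small', 'small', 'large'],
    'city_code': [1, 1, 1, 2, 1, 1, 1],
})

keys = ['property_scale', 'city_code']
first = add_group_id(df, keys)

# Second pass: group_id already exists, so it is added to the grouping keys
extra = ['building_name']
second = add_group_id(first, extra)

print(first.sort_index())
print(second.sort_index())
```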

Postscript

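Thanks to a comment from @r_beginners, I learned that groupby already has a built-in way to compute a group id: GroupBy.ngroup(). It numbers the groups in sorted key order, like the version above, and because it returns a Series aligned with the original index, the rows keep their original order.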

from pandas import DataFrame


def add_group_id(df: DataFrame, by: list) -> DataFrame:
    """Assign a group_id to records that share the same values in the `by` columns.

    Args:
        df (DataFrame): Arbitrary data frame
        by (list): Column names to group by

    Returns:
        DataFrame: A copy of df with a group_id column, in the original row order

    """
    # If a group_id column already exists, include it in `by` as well.
    # Build a new list so the caller's `by` list is not modified in place.
    if 'group_id' in df.columns:
        by = list(by) + ['group_id']
    new_df = df.assign(group_id=df.groupby(by).ngroup())
    return new_df
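A minimal sketch of how ngroup numbers the groups, on a hypothetical four-row frame. With groupby's default sort=True the ids follow sorted key order; passing sort=False numbers the groups in order of first appearance instead. In both cases the result stays aligned with the original index.

```python
import pandas as pd

# Hypothetical sample where sorted order and appearance order differ
df = pd.DataFrame({
    'property_scale': ['small', 'large', 'small', 'large'],
    'city_code': [1, 1, 1, 2],
})
by = ['property_scale', 'city_code']

# Default sort=True: (large, 1) -> 0, (large, 2) -> 1, (small, 1) -> 2
sorted_ids = df.groupby(by).ngroup()

# sort=False: ids follow the order in which each key first appears
appearance_ids = df.groupby(by, sort=False).ngroup()

# assign keeps the original row order because ngroup is index-aligned
result = df.assign(group_id=sorted_ids)
print(result)
```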

As @nkay commented, pd.factorize() also works.
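The code from that comment is not reproduced here, so the following is only one possible sketch. It assumes the key columns are first combined into one tuple per row, because pd.factorize works on one-dimensional input. factorize numbers values in order of first appearance, so the ids should match groupby(by, sort=False).ngroup() rather than the sorted numbering.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'property_scale': ['small', 'large', 'small', 'large'],
    'city_code': [1, 1, 1, 2],
})
by = ['property_scale', 'city_code']

# Assumption: combine the key columns into a 1-D object array of tuples
keys = df[by].apply(tuple, axis=1).to_numpy()

# codes[i] is the id of row i; uniques holds each distinct key tuple once
codes, uniques = pd.factorize(keys)

df = df.assign(group_id=codes)
print(df)
```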

I need to study pandas methods more...
