What i did

Even if the ntile function that was often used in SQL during decyl analysis is searched, It didn't come out, so I created a function.

Dedicated to the older brothers who felt "ntile is the best. Why isn't it Python .." in SQL.

ntile function (edited a little)

Well, as a usage, Have df = pandas dataframe inserted, col = Decyl Enter the column name of the target you want to analyze. n = 10 is set, but if you want to make 100 groups, set n = 100 etc. If n_tile = True, return the dataframe given as columns for the number of decades, If set to False, a dataframe with various aggregate functions will be returned. agg = Aggregate function you want to use when n_tile False above, For those who say "just sum", use agg = ["sum"] as an argument.

Comments are in English because I am studying English.

def ntile(df, col, n=10, n_tile=True, ascending=False, aggs=["sum", "count", "min", "max", "median", "mean", "std"]):
    # check pandas dataframe type
    if type(df) == pd.core.frame.DataFrame:
        # check int or float
        if type(df[f"{col}"].max()) == np.int64 or type(df[f"{col}"].max()) == np.float64:
            ntile_df = pd.DataFrame()
            length = len(df)
            df = df.sort_values(col, ascending=ascending).reset_index(drop=True)
            div_n = math.ceil(len(df) / n)

            # mark N group 
            for idx, num in enumerate(range(0, length, div_n)):
                n_df = df[num: num+div_n]
                n_df["decile"] = idx + 1
                ntile_df = ntile_df.append(n_df)
            
            # return ntile_df if need only decile data
            if n_tile:
                return ntile_df
            
            # otherwise aggregate col
            agg_df = ntile_df.groupby("decile").agg({f"{col}": aggs}).reset_index()
            #  clean columns
            agg_df.columns = ["".join(c) for c in agg_df.columns]
            # merge two data
            ntile_df = ntile_df.merge(agg_df, on="decile").copy()
            
            return ntile_df
        
        raise TypeError(f"`{col}` must be int or float type")
    raise TypeError(f"`{type(df)}` must be pandas.core.frame.DataFrame type")

Create sample data

import pandas as pd
import random
#Create a random price guy
price = random.sample(range(10000, 900000), k=10000)
#Create a dataframe using the above
data = pd.DataFrame(price, columns=["price"])

Contents is like this.

スクリーンショット 2019-11-10 13.14.05.png

Use it like this

#Put data in the function. n=Leave the default of 10
data = ntile(data, "price")
#Group by with decile and apply various aggregate functions.
data.groupby("decile").agg({"price": ["sum", "count", "min", "max", "median", "mean"]}).reset_index()

n=10 スクリーンショット 2019-11-10 13.17.16.png

n=100 スクリーンショット 2019-11-10 13.17.52.png

If n_tile = False

data = ntile(data, "price", n=5 , n_tile=False)

data.pricemin.value_counts().sort_index().reset_index().rename(columns={"index": "price_segment", "pricemin": "count"})

And look at this.

スクリーンショット 2019-11-10 14.52.49.png

Have you ever done something like the following? And "Isn't there a function that creates a class with n% of all classes in the target column?" Have you ever felt that? スクリーンショット 2019-11-10 14.48.44.png

I'm a beginner, so I've thought about the above several times. I looked at ntile every time and corrected it by hand, but from now on, it is possible to change it just by changing the part of n = 10. After all, python has a smaller turn than SQL.

At the end

I don't know if the search is insufficient, but it is strange that there is no ntile function in SQL like python, Well, I tried to make it as a practice, so if you have changed jobs from SQLer to pythoner, please use it.