This is a record of my struggle through the Data Science 100 Knocks, written by someone who doesn't even count as a fledgling data scientist. Whether I can make it to the finish line is a mystery. ~~If I vanish partway through, please assume I just haven't posted to Qiita.~~

**This post contains spoilers, so be careful if you plan to try the problems yourself.**

I may not be able to update for a while. If I disappear, sorry.

If something is hard to read or written in a dangerous way, please point it out. ~~I'll take the damage to my heart and use it as nourishment.~~

If a solution is wrong or an interpretation is off, please leave a comment.

This time I cover problems 57 to 62. [Last time] 52-56 [First time, with table of contents]

57th

P-057: Combine the extraction result of the previous question and gender, and create new category data that represents the combination of gender and age. The value of the category representing the combination is arbitrary. The first 10 items should be displayed.

For reference, here is my program from the previous question and just the first row of its output.

df=df_customer.copy()
df_bins=pd.cut(df.age,[10,20,30,40,50,60,150],right=False,labels=[10,20,30,40,50,60])
df=pd.concat([df[['customer_id','birth_day']],df_bins],axis=1)
df.head(10)


|customer_id|birth_day|age|
|--:|--:|--:|
|CS021313000114|1981-04-29|30|


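#### **`mine57.py`**
```py
# df_bins is the binned age Series from my previous answer
df=pd.concat([df_customer[['customer_id','birth_day','gender_cd']],df_bins],axis=1)
# Joined as strings: gender_cd '1' + age '30' -> '130'
df['age_gen']=df.gender_cd.astype('str')+df.age.astype('str')
df.head(10)

'''Model answer'''
# df_customer_era comes from the model answer to the previous question
df_customer_era['era_gender'] = df_customer['gender_cd'] + df_customer_era['age'].astype('str')
df_customer_era.head(10)
```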

Since I ended up calling pd.concat again anyway, I felt there was no real need to reuse last time's result.

Anyway, the goal this time is to prefix the gender digit to the value in the `age` column, e.g. `1 (female) + 30 (age) = 130`. This is string concatenation, not arithmetic.
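To make the goal concrete, here is a minimal sketch on a made-up three-row table. `df_toy` and its values are assumptions for illustration only, not the real `df_customer`; the bin edges are the ones from my previous answer.

```python
import pandas as pd

# Toy stand-in for df_customer (all values are made up for illustration).
df_toy = pd.DataFrame({
    "customer_id": ["A001", "A002", "A003"],
    "gender_cd": ["1", "0", "9"],  # kept as strings, as the model answer assumes
    "age": [34, 58, 71],
})

# Bin ages into decades; everything from 60 up falls into the "60" bin.
era = pd.cut(df_toy["age"], [10, 20, 30, 40, 50, 60, 150],
             right=False, labels=[10, 20, 30, 40, 50, 60])

# String concatenation, not arithmetic: "1" + "30" gives "130".
df_toy["era_gender"] = df_toy["gender_cd"] + era.astype(str)
print(df_toy)
```

Because the result is a string, a male customer in his fifties becomes `050` rather than the number 50, which keeps every category three characters long.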

Below is what I wrote before I understood the problem.

`miss57.py`


df=pd.concat([df_customer[['customer_id','birth_day','gender_cd']],df_bins],axis=1)
df=df.groupby(['age','gender_cd']).agg({'customer_id':'count'})
pd.pivot_table(df,index='age',columns='gender_cd')

~~I accidentally built a cross-tabulation.~~

58th

P-058: Make the gender code (gender_cd) of the customer data frame (df_customer) a dummy variable and extract it together with the customer ID (customer_id). You can display 10 results.

`mine58.py`


df=df_customer.copy()
pd.concat([df['customer_id'],pd.get_dummies(df['gender_cd'])],axis=1).head(10)

'''Model answer'''
pd.get_dummies(df_customer[['customer_id', 'gender_cd']], columns=['gender_cd']).head(10)

I wondered what a dummy variable was, so I looked it up. It seems that one column is created for each category value, and each row marks the presence or absence of that value with `true` or `false` (shown as 1 or 0 in the table).

Actually, it's quicker to just look at the table.

|male|female|unknown|
|--:|--:|--:|
|0|1|0|
|0|0|1|
|0|1|0|
|0|1|0|
|0|1|0|

Something like this.
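Here is a small sketch of the same idea on made-up data, following the model answer's `columns=` style. `df_toy` is a hypothetical stand-in for `df_customer`, and `dtype=int` is my own addition so the output shows 0/1 regardless of pandas version.

```python
import pandas as pd

# Toy stand-in for df_customer (all values are made up for illustration).
df_toy = pd.DataFrame({
    "customer_id": ["A001", "A002", "A003"],
    "gender_cd": ["1", "9", "0"],
})

# One new column per distinct gender_cd value. dtype=int forces 0/1 output;
# newer pandas versions would otherwise show True/False.
dummies = pd.get_dummies(df_toy, columns=["gender_cd"], dtype=int)
print(dummies)
```

Each row has exactly one 1 among the `gender_cd_*` columns, which is what makes them dummy variables.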

59th

P-059: Total the sales amount (amount) of the receipt detail data frame (df_receipt) for each customer ID (customer_id), **standardize** the totals to a mean of 0 and a standard deviation of 1, and display them together with the customer ID and the total sales amount. The standard deviation used for standardization may be either the unbiased standard deviation or the sample standard deviation. However, customer IDs starting with "Z" represent non-members, so exclude them from the calculation. Displaying 10 results is enough.

…

……

………

What is standardization?

I read various sites trying to understand it, but ~~I skipped math back then~~ my understanding couldn't keep up.

Should I ask my senior colleagues to point me to a reference site? In the meantime, I tried writing it again:

df['hyou1'] =df['amount_sum'] - df.amount_sum.mean()

In other words, I computed (total - average) / 1, plugging in 1 for the standard deviation.

That was a mistake.

To understand it, I worked backwards from `preprocessing.scale` in the model answer and looked that up.

https://note.nkmk.me/python-list-ndarray-dataframe-normalize-standardize/ (second half)

Normalization / standardization of pandas.DataFrame and pandas.Series: Use pandas methods (rest omitted)

The program there includes `print( (df.T - df.T.mean()) / df.T.std() )`, which prints -1.0, 0.0, 1.0 under col1, col2, col3 for each of rows a, b, c.

This is it. In other words, it is (data - data.mean()) / data.std(). So:

`mine59.py`


df=df_receipt.copy()
df=df.query('not customer_id.str.startswith("Z")',engine='python')
df=df.groupby('customer_id').agg({'amount':'sum'}).reset_index()

df['hyou1'] =(df['amount'] - df.amount.mean()) / df.amount.std()
df.head(10)

'''Model answer'''
#Uses sklearn's preprocessing.scale, so this is calculated with the sample standard deviation
df_sales_amount = df_receipt.query('not customer_id.str.startswith("Z")', engine='python'). \
    groupby('customer_id').agg({'amount':'sum'}).reset_index()
df_sales_amount['amount_ss'] = preprocessing.scale(df_sales_amount['amount'])
df_sales_amount.head(10)

It matched.
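A small caveat on "It matched.", as far as I understand it: pandas `std()` defaults to the unbiased standard deviation (`ddof=1`), while `preprocessing.scale` uses the `ddof=0` definition, so the two results agree only approximately. With thousands of customers the gap is tiny. The sketch below uses made-up totals (`amount` here is a hypothetical Series, not the real data) to show the difference on a small sample.

```python
import numpy as np
import pandas as pd

# Made-up totals standing in for the per-customer amount sums.
amount = pd.Series([100.0, 200.0, 300.0, 400.0])

# pandas default: unbiased standard deviation (ddof=1).
z_unbiased = (amount - amount.mean()) / amount.std()
# ddof=0: the definition numpy.std uses by default.
z_population = (amount - amount.mean()) / amount.std(ddof=0)

print(z_unbiased.tolist())
print(z_population.tolist())
```

Either version satisfies the problem statement, since it allows both definitions. Passing `ddof=0` to `std()` is one way to line the manual calculation up with the model answer.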

60th

P-060: Total the sales amount (amount) of the receipt detail data frame (df_receipt) for each customer ID (customer_id), normalize the totals to a minimum of 0 and a maximum of 1, and display them together with the customer ID and the total sales amount. However, customer IDs starting with "Z" represent non-members, so exclude them from the calculation. Displaying 10 results is enough.

The same site has this:

`print((df - df.min()) / (df.max() - df.min()))` (in its example, every column comes out as 0.0, 0.5, 1.0 for rows a, b, c)

so I adapted it.

`mine60.py`


df=df_receipt.copy()
df=df.query('not customer_id.str.startswith("Z")',engine='python')
df=df.groupby('customer_id').agg({'amount':'sum'}).reset_index()
df['minmax'] =(df['amount'] - df.amount.min()) / (df.amount.max()-df.amount.min())
df.head(10)

'''Model answer'''
#Uses sklearn's preprocessing.minmax_scale
df_sales_amount = df_receipt.query('not customer_id.str.startswith("Z")', engine='python'). \
    groupby('customer_id').agg({'amount':'sum'}).reset_index()
df_sales_amount['amount_mm'] = preprocessing.minmax_scale(df_sales_amount['amount'])
df_sales_amount.head(10)
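For a quick sanity check of the min-max formula, here is a sketch on three made-up totals (`amount` is a hypothetical Series, not the real data).

```python
import pandas as pd

# Made-up totals standing in for the per-customer amount sums.
amount = pd.Series([100.0, 250.0, 400.0])

# Smallest value maps to 0, largest to 1, everything else in between.
minmax = (amount - amount.min()) / (amount.max() - amount.min())
print(minmax.tolist())  # [0.0, 0.5, 1.0]
```

One thing to watch for with the manual version: if every value is identical, `max - min` is 0 and the division no longer gives a meaningful result.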

61st and 62nd

P-061: Total the sales amount (amount) of the receipt detail data frame (df_receipt) for each customer ID (customer_id), convert the totals to the common logarithm (base = 10), and display them together with the customer ID and the total sales amount. However, customer IDs starting with "Z" represent non-members, so exclude them from the calculation. Displaying 10 results is enough.

P-062: Total the sales amount (amount) of the receipt detail data frame (df_receipt) for each customer ID (customer_id), convert the totals to the natural logarithm (base = e), and display them together with the customer ID and the total sales amount. However, customer IDs starting with "Z" represent non-members, so exclude them from the calculation. Displaying 10 results is enough.

A logarithm is the inverse of an exponential function, so I used the log functions in `math`.

`mine61_62.py`


import math

df=df_receipt.copy()
df=df.query('not customer_id.str.startswith("Z")',engine='python')
df=df.groupby('customer_id').agg({'amount':'sum'}).reset_index()

#P-061: common logarithm (base 10)
df['jouyou']=df.amount.apply(lambda x: math.log10(x))
#P-062: natural logarithm (base e)
df['shizen']=df.amount.apply(lambda x: math.log(x))

df.head(10)

With this, both can be computed.

`mohan61_62.py`


#P-061: common logarithm (base 10) of amount + 1
df_sales_amount = df_receipt.query('not customer_id.str.startswith("Z")', engine='python'). \
    groupby('customer_id').agg({'amount':'sum'}).reset_index()
df_sales_amount['amount_log10'] = np.log10(df_sales_amount['amount'] + 1)
df_sales_amount.head(10)

#P-062: natural logarithm (base e) of amount + 1
df_sales_amount = df_receipt.query('not customer_id.str.startswith("Z")', engine='python'). \
    groupby('customer_id').agg({'amount':'sum'}).reset_index()
df_sales_amount['amount_loge'] = np.log(df_sales_amount['amount'] + 1)
df_sales_amount.head(10)

………

What is + 1?

That's all for this time.

Logarithms are where I started getting lost in high school math, so I'm truly out of my depth here. If you know what this + 1 is for, please leave a comment. I really don't get it.
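One possible reason for the + 1, which is my guess and not something the model answer states: the logarithm of 0 is undefined, so a total of exactly 0 would break the conversion. `math.log10(0)` raises a `ValueError`, and NumPy returns `-inf`. Adding 1 first maps 0 to log(1) = 0. The values below are made up to show the effect.

```python
import numpy as np

# Made-up totals, including a zero to trigger the problem case.
amounts = np.array([0.0, 9.0, 99.0])

# log10(0) is -inf in NumPy (it also emits a RuntimeWarning, silenced here).
with np.errstate(divide="ignore"):
    raw = np.log10(amounts)

# Shifting by 1 keeps every result finite: 0 -> 0, 9 -> 1, 99 -> 2.
shifted = np.log10(amounts + 1)

print(raw)
print(shifted)

# For the natural logarithm, NumPy provides log1p, which computes log(1 + x).
print(np.allclose(np.log(amounts + 1), np.log1p(amounts)))
```

Whether the real totals ever contain a 0 is something I have not checked, so treat this as an explanation of the general technique.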

Data Science 100 Knock ~ Battle for less than beginners part11
