Before Join csv normalized by Python pandas and process it for easy confirmation
I introduced the article. In case of practical use
I was quite addicted to the following data I will post it.
I'm sure there is a smarter way, Once the purpose was achieved.
Suppose you have the following two data. each Main key (required) Subkey (optional) There is.
The transaction data is in Excel format, and the abbreviation of the subkey is "" There is a certain space. The master data is in CSV format, and the subkey abbreviation is stored without "" spaces.
If you store it as an RDBMS and join with a trim, it will be one shot. Realization with Python alone
Python 3.8.2 Visual Studio Code 1.45.0
As shown below, join and finally output Excel for confirmation
subkey_join.py
#Join the main key (required) and subkey (optional).
import pandas as pd
import re
def main():
#Body processing
#Read data
new_acc_master = pd.read_csv(".\\subkey_join\\Master.csv",encoding="cp932", dtype=str)
value_trn = pd.read_excel(".\\subkey_join\\transaction.xlsx", dtype=str)
#Delete without sub value
value_trn['Subcode'] = value_trn['Subcode'].str.strip()
new_acc_master['Subcode'] = new_acc_master['Subcode'].str.strip()
#Replaced Tran's no sub-complete value with no value. Data with a blank source was OK.
value_trn.Subcode[value_trn.Subcode == ""] = "No value"
print(type(value_trn))
print(value_trn.dtypes)
#☆☆ NG I tried to do the same on the master side, but it is not changed by the following logic
#new_acc_masternew_acc_master.sub[new_acc_master.sub == ""] = "No value"
#Master's subcode without subvalues
#new_acc_master = new_acc_master.replace('^$',{'Subcode':'No value'},regex=True)
print(new_acc_master['Subcode'].dtype)
#Specify the Python character type, astype
#The type on pandas does not change with object, but where there is no value,"nan"To change to
new_acc_master['Subcode'] = new_acc_master['Subcode'].astype(str)
print(new_acc_master['Subcode'].dtype)
##☆☆ This is OK I think there is a smarter way. ..
new_acc_master.Subcode[new_acc_master.Subcode == "nan"] = "No value"
print(type(new_acc_master))
print(new_acc_master.dtypes)
#Left join with the main key and sub key.
jpin_pd = pd.merge(value_trn, new_acc_master, how="left", left_on=['Main code','Subcode'], right_on=['Main code','Subcode'],suffixes=('_trn','_new'))
#Excel output
jpin_pd.to_excel(".\\subkey_join\\view_data.xlsx")
if __name__ == '__main__': main()
Recommended Posts