Join data with main key (required) and subkey (optional) in Python pandas

Join data with main key (required) and subkey (optional) in Python pandas

Overview

Before Join csv normalized by Python pandas and process it for easy confirmation

I introduced the article. In case of practical use

I was quite addicted to the following data I will post it.

I'm sure there is a smarter way, Once the purpose was achieved.

Suppose you have the following two data. each Main key (required) Subkey (optional) There is.

The transaction data is in Excel format, and the abbreviation of the subkey is "" There is a certain space. The master data is in CSV format, and the subkey abbreviation is stored without "" spaces.

If you store it as an RDBMS and join with a trim, it will be one shot. Realization with Python alone

Usage environment

Python 3.8.2 Visual Studio Code 1.45.0

Source code

As shown below, join and finally output Excel for confirmation

subkey_join.py



#Join the main key (required) and subkey (optional).

import pandas as pd
import re

def main():
    #Body processing
    #Read data
    new_acc_master = pd.read_csv(".\\subkey_join\\Master.csv",encoding="cp932", dtype=str)
    value_trn = pd.read_excel(".\\subkey_join\\transaction.xlsx", dtype=str)
        
    #Delete without sub value
    value_trn['Subcode'] = value_trn['Subcode'].str.strip()
    new_acc_master['Subcode'] = new_acc_master['Subcode'].str.strip()

    #Replaced Tran's no sub-complete value with no value. Data with a blank source was OK.
    value_trn.Subcode[value_trn.Subcode == ""] = "No value"
    print(type(value_trn))
    print(value_trn.dtypes)

    
    #☆☆ NG I tried to do the same on the master side, but it is not changed by the following logic
    #new_acc_masternew_acc_master.sub[new_acc_master.sub == ""] = "No value"
    
    #Master's subcode without subvalues
    #new_acc_master = new_acc_master.replace('^$',{'Subcode':'No value'},regex=True)
    print(new_acc_master['Subcode'].dtype)
    
    #Specify the Python character type, astype
    #The type on pandas does not change with object, but where there is no value,"nan"To change to
    new_acc_master['Subcode'] = new_acc_master['Subcode'].astype(str)
    print(new_acc_master['Subcode'].dtype)

    ##☆☆ This is OK I think there is a smarter way. ..
    new_acc_master.Subcode[new_acc_master.Subcode == "nan"] = "No value"
    
    print(type(new_acc_master))
    print(new_acc_master.dtypes)
    
    #Left join with the main key and sub key.
    jpin_pd = pd.merge(value_trn, new_acc_master, how="left", left_on=['Main code','Subcode'], right_on=['Main code','Subcode'],suffixes=('_trn','_new'))
    

    #Excel output
    jpin_pd.to_excel(".\\subkey_join\\view_data.xlsx")
    

if __name__ == '__main__': main()


Recommended Posts

Join data with main key (required) and subkey (optional) in Python pandas
[Python] Join two tables with pandas
[Python3] Save the mean and covariance matrix in json with pandas
Get additional data in LDAP with python
Data pipeline construction with Python and Luigi
Try working with binary data in Python
Dealing with "years and months" in Python
Easily graph data in shell and Python
Working with 3D data structures in pandas
Python variables and data types learned in chemoinformatics
Read Python csv data with Pandas ⇒ Graph with Matplotlib
Receive and display HTML form data in Python
[Python] Swapping rows and columns in Numpy data
Read table data in PDF file with Python
Interactively visualize data with TreasureData, Pandas and Jupyter.
[Python] Read Japanese csv with pandas without garbled characters (and extract columns written in Japanese)
Full-width and half-width processing of CSV data in Python
Data analysis environment construction with Python (IPython notebook + Pandas)
Process csv data with python (count processing using pandas)
Start numerical calculation in Python (with Homebrew and pip)
Investigate Java and python data exchange with Apache Arrow
Data analysis with python 2
Key input in Python
Data visualization with pandas
Data manipulation with Pandas!
Key input in Python
Data analysis with Python
Create a decision tree from 0 with Python and understand it (3. Data analysis library Pandas edition)
Try scraping the data of COVID-19 in Tokyo with Python
Design and test Verilog in Python only with Veriloggen and cocotb.
Notes on handling large amounts of data with python + pandas
Ingenuity to handle data with Pandas in a memory-saving manner
Get rid of dirty data with Python and regular expressions
[Python] Plotly draws Pandas dataframes in one shot with Cufflinks
Solve the spiral book (algorithm and data structure) with python!
[Python] Strengths and weaknesses of DataFrame in terms of time required
Comparison of data frame handling in Python (pandas), R, Pig
Play with Mastodon's archive in Python 2 Count replies and favourites
[Python] Random data extraction / combination from DataFrame using random and pandas
How to create dataframes and mess with elements in pandas
Data science companion in python, how to specify elements in pandas
Install CaboCha in Ubuntu environment and call it with Python.
Get additional data to LDAP with python (Writer and Reader)
How to log in to AtCoder with Python and submit automatically
"Introduction to data analysis by Bayesian statistical modeling starting with R and Stan" implemented in Python
Sample data created with python
Programming with Python and Tkinter
Handle Ambient data in Python
Read csv with python pandas
Python and hardware-Using RS232C with Python-
Working with LibreOffice in Python
Scraping with chromedriver in python
Display UTM-30LX data in Python
Get Youtube data with python
Working with sounds in Python
Scraping with Selenium in Python
Key additions to pandas 1.1.0 and 1.0.0
[Python] Change dtype with pandas
Scraping with Tor in Python
Combined with permutations in Python
python with pyenv and venv