Have you ever run into this? You have obtained B2B company information, but the records in your CRM have no corporate number. All you have is basic information such as the company name and the year of establishment. In that situation, what should you use as the join key when joining with a dataset from another list? I suspect many people struggle with this.
As long as you have the corporate number assigned by the government, all information about a company can be tied together into one record. For example, suppose you want to join table A and table B as follows:
Table A (company information)
company name | Year of establishment | Prefectures |
---|---|---|
hoge | fuga | 3 |
Table B (contact information)
company name | Estimated amount | Order status |
---|---|---|
hoge | 3000 | First connection |
Anyone who has worked with a database language such as SQL will know that joining on character strings puts load on the DB. If you have a corporate number, you can use it as a common join key. gBizINFO, run by the Ministry of Economy, Trade and Industry, is a convenient way to obtain corporate numbers. The service provides a REST API, so getting a corporate number is very easy. https://info.gbiz.go.jp/api/index.html
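To make the join-key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`company`, `contact`), the column names, and the 13-digit corporate number are all made up for illustration and do not come from any real CRM.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (corporate_number TEXT PRIMARY KEY, name TEXT, prefecture TEXT);
    CREATE TABLE contact (corporate_number TEXT, estimated_amount INTEGER, order_status TEXT);
""")

# Dummy rows that share the same (made-up) corporate number.
conn.execute("INSERT INTO company VALUES ('0000000000001', 'hoge', 'Tokyo')")
conn.execute("INSERT INTO contact VALUES ('0000000000001', 3000, 'First contact')")


def join_on_corporate_number(connection):
    """Join the two tables on the corporate number instead of the name string."""
    query = """
        SELECT c.name, c.prefecture, t.estimated_amount, t.order_status
        FROM company AS c
        JOIN contact AS t ON c.corporate_number = t.corporate_number
    """
    return connection.execute(query).fetchall()


rows = join_on_corporate_number(conn)
print(rows)
```

Because both tables carry the same fixed-length identifier, the join no longer depends on how each source happens to spell the company name.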
Note that the request headers must include X-hojinInfo-api-token, so you need to apply for API access in advance.
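One way to avoid hard-coding the token is to read it from an environment variable. The sketch below assumes a variable named `GBIZINFO_API_TOKEN`, which is a name chosen here for illustration, not something the API defines. Only the header name itself comes from this post.

```python
import os


def build_headers(token=None):
    """Build the request headers, taking the token from the argument or the environment."""
    # GBIZINFO_API_TOKEN is a hypothetical variable name used only in this sketch.
    token = token or os.environ.get("GBIZINFO_API_TOKEN")
    if not token:
        raise RuntimeError("API token is not set")
    return {
        "Accept": "application/json",
        "X-hojinInfo-api-token": token,
    }
```

Calling `build_headers()` without an argument raises an error when the variable is missing, so a forgotten token shows up before any request is sent.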
Suppose you have data that contains only the company name and the year of establishment, as shown below. I would like to add the corporate number to this data. The request method is GET.
company name | Year of establishment |
---|---|
Rakuten Mobile, Inc. | 2018 |
Matsuya Foods Co., Ltd. | 2018 |
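As a rough illustration of what one GET request per row looks like, the sketch below turns each row of the table above into a query URL without sending anything. The parameter names `name` and `founded` follow the code in this post. The helper `build_query_url` is a hypothetical name introduced here.

```python
from urllib.parse import urlencode

ENDPOINT_URL = "https://info.gbiz.go.jp/hojin/v1/hojin"

# The two sample rows from the table above.
companies = [
    {"name": "Rakuten Mobile, Inc.", "founded": 2018},
    {"name": "Matsuya Foods Co., Ltd.", "founded": 2018},
]


def build_query_url(company):
    """Encode one row as the query string of a GET request (hypothetical helper)."""
    return f"{ENDPOINT_URL}?{urlencode(company)}"


urls = [build_query_url(company) for company in companies]
for url in urls:
    print(url)
```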
request.py
import pandas as pd
import requests


class CorporateNumbers:
    def __init__(self):
        self.headers = {
            "Accept": "application/json",
            "X-hojinInfo-api-token": "###token###",  # replace with your own API token
        }
        self.endpoint_url = 'https://info.gbiz.go.jp/hojin/v1/hojin'

    def _create_taeger_company_dataframe(self):
        # Read the copied table from the clipboard. The code below expects
        # the columns to be named `name` and `founded`.
        df = pd.read_clipboard()
        return df

    def _get_corporate_number(self, df):
        results = []
        # Query the API once per company.
        for name, founded in zip(df['name'], df['founded']):
            params = {
                'name': name,
                'founded': founded
            }
            res = requests.get(
                url=self.endpoint_url,
                headers=self.headers,
                params=params
            )
            res.raise_for_status()
            # Matching companies are listed under 'hojin-infos'.
            results.extend(res.json().get('hojin-infos', []))
        return pd.json_normalize(results)

    def _merge_dataframe(self):
        df1 = self._create_taeger_company_dataframe()
        df2 = self._get_corporate_number(df=df1)
        # A left join keeps every input company, even when the API found no match.
        df3 = pd.merge(df1, df2, on='name', how='left')
        return df3
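Before trusting the merged result, it may be worth checking whether a name matched more than one company, since a left merge on the name would then duplicate rows. The sketch below works on a hand-written sample payload with dummy values, not real API output. It assumes each entry under `hojin-infos` carries `name` and `corporate_number` fields, and `index_corporate_numbers` is a hypothetical helper.

```python
def index_corporate_numbers(payload):
    """Map each company name in a response payload to the corporate numbers found for it."""
    numbers_by_name = {}
    for info in payload.get("hojin-infos", []):
        numbers_by_name.setdefault(info["name"], []).append(info["corporate_number"])
    return numbers_by_name


# Hand-written sample payload; the field names are assumed and the values are dummies.
sample_payload = {
    "hojin-infos": [
        {"corporate_number": "0000000000001", "name": "hoge"},
        {"corporate_number": "0000000000002", "name": "hoge"},
        {"corporate_number": "0000000000003", "name": "fuga"},
    ]
}

numbers = index_corporate_numbers(sample_payload)
# Names with more than one candidate need a manual check before joining.
ambiguous = {name for name, found in numbers.items() if len(found) > 1}
print(ambiguous)
```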