Purpose

Perform financial and non-financial analysis using "CoARiJ"
Changes over time in business performance and ESG-related writing
Verification of the Ito Report
Actual ESG investment, rather than its correlation with business performance

For details on "CoARiJ", see:
https://www.tis.co.jp/news/2019/tis_news/20191114_1.html
https://github.com/chakki-works/CoARiJ/blob/master/README.md

Previous post

https://qiita.com/vbnshin/items/09be86b4793c68f70172

Things to do

Check dataset

Summary

The financial data may be incorrect. Please be careful when analyzing it!

Data

The data provided by "CoARiJ" is as follows.

Financial data
Financial data (from financial results information)
Is there an error?

Stock data (from the monthly market price list (domestic stocks))

The list is as follows

Non-financial data

Annual report (from EDINET, XBRL file format)

Files in which the above is parsed item by item (txt format)

CSR report (pdf format)

Not available in txt format

The types of documents obtained from EDINET are as follows (FY 2018)

Points to note in analysis

There is duplicate data

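    import pandas as pd

    # Load the FY2014 document list (tab-separated despite the .csv extension)
    df_14 = pd.read_csv('../data/finance_reports/2014/2014/documents.csv', sep='\t')
    # Take the filer name of the first fully duplicated row and show all of its rows
    dup_name = df_14[df_14.duplicated()].iloc[0]['filer_name']
    df_14[df_14['filer_name'] == dup_name]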

        edinet_code  sec_code  jcn  filer_name  fiscal_year  fiscal_period  submit_date  period_start  period_end  doc_id  ...  operating_income_on_sales  ordinary_income_on_sales  capital_ratio  dividend_payout_ratio  doe  open  high  low  close  average
    55  E00091  19710  2010001034861  Chuo Built Industry Co., Ltd.  2014  FY  2015-06-24  2014-04-01  2015-03-31  S10053TB  ...  7.78  7.41  31.99  14.01  1.69  139.0  208.0  108.0  118.0  139.25
    56  E00091  19710  2010001034861  Chuo Built Industry Co., Ltd.  2014  FY  2015-06-24  2014-04-01  2015-03-31  S10053TB  ...  7.78  7.41  31.99  14.01  1.69  139.0  208.0  108.0  118.0  139.25
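A minimal sketch of how fully identical rows like these could be counted and dropped before any analysis. The small frame is made-up stand-in data that only reuses column names from documents.csv, and `drop_exact_duplicates` is a hypothetical helper, not something CoARiJ provides.

```python
import pandas as pd


def drop_exact_duplicates(df: pd.DataFrame) -> tuple[pd.DataFrame, int]:
    """Return the frame without fully identical rows, plus the number of rows dropped."""
    n_dup = int(df.duplicated().sum())  # rows identical to an earlier row
    return df.drop_duplicates().reset_index(drop=True), n_dup


# Stand-in data (not real CoARiJ values): the second row repeats the first.
sample = pd.DataFrame(
    {
        "edinet_code": ["E00001", "E00001", "E00002"],
        "fiscal_year": [2014, 2014, 2014],
        "roe": [5.0, 5.0, 3.2],
    }
)

deduped, n_dropped = drop_exact_duplicates(sample)
print(n_dropped, len(deduped))
```

Reporting the number of dropped rows makes it easier to notice if a given year has many more duplicates than expected.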

The EDINET code is not stable (the same filer name appears under different codes)

    import pandas as pd

    df_14 = pd.read_csv('../data/finance_reports/2014/2014/documents.csv', sep='\t')
    # Collapse to one row per EDINET code
    df_14 = df_14.groupby('edinet_code').max().reset_index()
    df_14_part = df_14[['filer_name', 'fiscal_year', 'roa']]
    # Pick the first filer name that still occurs more than once
    dup_name = df_14_part[df_14_part['filer_name'].duplicated()].iloc[0]['filer_name']
    df_14[df_14_part['filer_name'] == dup_name][['edinet_code', 'sec_code', 'jcn', 'filer_name', 'fiscal_year', 'fiscal_period', 'submit_date']]

        edinet_code  sec_code            jcn                   filer_name  fiscal_year fiscal_period submit_date
    245      E00484     28140  5180001075845  Sato Foods Industries, Ltd.         2014            FY  2015-06-26
    263      E00510     29230  8110001002068  Sato Foods Industries, Ltd.         2014            FY  2015-07-24
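To see how widespread this is, one option is to count the distinct EDINET codes per filer name. The following is only a sketch on made-up stand-in data; `names_with_multiple_codes` is a hypothetical helper name.

```python
import pandas as pd


def names_with_multiple_codes(df: pd.DataFrame) -> pd.Series:
    """Count distinct EDINET codes per filer name, keeping names with more than one."""
    codes_per_name = df.groupby("filer_name")["edinet_code"].nunique()
    return codes_per_name[codes_per_name > 1]


# Stand-in data (not real CoARiJ values): "Company A" appears under two codes.
sample = pd.DataFrame(
    {
        "filer_name": ["Company A", "Company A", "Company B"],
        "edinet_code": ["E00001", "E00002", "E00003"],
    }
)

ambiguous = names_with_multiple_codes(sample)
print(ambiguous)
```

In the output above the sec_code and jcn differ as well, so it may be worth checking whether such rows are one company whose code changed or two separate companies that share a name.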

No companies with negative ROE (a mistake?)

    import pandas as pd

    # Load each fiscal year and collapse it to one row per EDINET code
    frames = []
    for year in range(2014, 2019):
        df_year = pd.read_csv(f'../data/finance_reports/{year}/{year}/documents.csv', sep='\t')
        frames.append(df_year.groupby('edinet_code').max().reset_index())

    df = pd.concat(frames)
    df = df[~df.duplicated()]
    df[df['filer_name'].isin(['Sato Foods Industry Co., Ltd.', 'Alpha Corporation', 'FUJI CORPORATION'])]

    # Count the rows with a negative ROE
    print(len(df[df['roe'] < 0]))
    # >>> 0

...

Cross-checking against the correct data

ROE (Return on Equity) of Japan Display

[Securities Report-16th Term (April 1, 2017-March 31, 2018)] (https://disclosure.edinet-fsa.go.jp/E01EW/download?uji.verb=W0EZA104CXP001003Action&uji.bean=ee.bean.parent.EECommonSearchBean&PID=W1E63011&SESSIONKEY=1575770510504&lgKbn=2&pkbn=0&skbn=1&dskbxxxaskb= = & preId = 1 & mul = Japan Display & fls = on & cal = 2 & yer = 2018 & mon = & pfs = 5 & row = 100 & idx = 0 & str = & kbn = 1 & flg = & syoruiKanriNo = & s = S100D87L)

Values in "CoARiJ"

    df[df['edinet_code'] == 'E30481'][['edinet_code', 'filer_name', 'fiscal_year', 'roe']]

         edinet_code               filer_name  fiscal_year     roe
    3160      E30481  Japan Display Co., Ltd.         2014    4.13
    3196      E30481  Japan Display Co., Ltd.         2015    2.92
    3270      E30481  Japan Display Co., Ltd.         2016   10.64
    2884      E30481  Japan Display Co., Ltd.         2018  734.39

All ROE values are positive, and there is no FY2017 data in the first place.
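A gap like the missing FY2017 row can be looked for across all companies. This is a rough sketch on made-up stand-in data, assuming only the edinet_code and fiscal_year columns; `missing_years` is a hypothetical helper.

```python
import pandas as pd


def missing_years(df: pd.DataFrame, first: int, last: int) -> dict[str, list[int]]:
    """Map each EDINET code to the fiscal years in [first, last] it has no row for."""
    expected = set(range(first, last + 1))
    gaps = {}
    for code, years in df.groupby("edinet_code")["fiscal_year"]:
        absent = sorted(expected - set(years))
        if absent:
            gaps[code] = absent
    return gaps


# Stand-in data (not real CoARiJ values): E00001 has no 2017 row.
sample = pd.DataFrame(
    {
        "edinet_code": ["E00001"] * 4 + ["E00002"] * 5,
        "fiscal_year": [2014, 2015, 2016, 2018, 2014, 2015, 2016, 2017, 2018],
    }
)

gaps = missing_years(sample, 2014, 2018)
print(gaps)
```

Companies that were listed or delisted during the period would also show up here, so a reported gap is not necessarily a data error.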

Does the value differ between consolidated and non-consolidated statements?

Even so, it is strange that no company has a negative ROE.
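Before concluding anything, it may help to look at how the ROE values split by sign, including missing values. This is just one possible check: I am only guessing that negative figures could have ended up as missing values. The sample series is made up apart from the two positive values quoted above, and `roe_sign_summary` is a hypothetical helper.

```python
import pandas as pd


def roe_sign_summary(roe: pd.Series) -> dict[str, int]:
    """Count negative, zero, positive, and missing ROE values."""
    return {
        "negative": int((roe < 0).sum()),
        "zero": int((roe == 0).sum()),
        "positive": int((roe > 0).sum()),
        "missing": int(roe.isna().sum()),
    }


# Stand-in series: None becomes NaN; -1.5 and 0.0 are made-up values.
sample_roe = pd.Series([4.13, 2.92, None, -1.5, 0.0])

summary = roe_sign_summary(sample_roe)
print(summary)
```

On the real data this would be called as `roe_sign_summary(df['roe'])`. A large "missing" count next to zero negatives would point at preprocessing rather than at the companies themselves.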

Next steps

The data does not seem accurate enough, so I will not analyze it further for now.

Since the CSR reports are in PDF format, several steps are needed before they can be used for analysis.

Helpfully, the file names include the EDINET code, which makes it easy to link the reports with other information.
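As a sketch of that linking step, the code can be pulled out of a file name with a regular expression. The pattern assumes an EDINET code is "E" followed by five digits, as in the codes shown in this post, and the example path is hypothetical, not an actual CoARiJ file name.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: an EDINET code is "E" followed by five digits (e.g. E00091).
EDINET_CODE = re.compile(r"E\d{5}")


def edinet_code_from_path(path: str) -> Optional[str]:
    """Return the first EDINET-like code in the file name, or None if there is none."""
    match = EDINET_CODE.search(Path(path).name)
    return match.group(0) if match else None


# Hypothetical path used only for illustration.
code = edinet_code_from_path("csr_reports/2018/E00091_2018.pdf")
print(code)
```

The extracted code can then be used as the join key against the edinet_code column of documents.csv.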

I considered extracting information from the CSR reports' color usage, number of photos, character counts, and so on, but I wonder how much that would cost on GCP.

In any case, I cannot tell whether the performance data to match against is correct, so I will stop the analysis here.

Please let me know if there is a mistake in my analysis.

I would not expect TIS, of all companies, to have made a mistake...
