For details on "CoARiJ", see:
https://www.tis.co.jp/news/2019/tis_news/20191114_1.html
https://github.com/chakki-works/CoARiJ/blob/master/README.md
https://qiita.com/vbnshin/items/09be86b4793c68f70172
The data provided by "CoARiJ" is as follows
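import pandas as pd
# Load the 2014 data (tab-separated) and show every row for the first fully duplicated filer.
df_14 = pd.read_csv('../data/finance_reports/2014/2014/documents.csv', sep='\t')
dup_name = df_14[df_14.duplicated()].iloc[0]['filer_name']
df_14[df_14['filer_name'] == dup_name]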
    edinet_code  sec_code            jcn                     filer_name  fiscal_year  fiscal_period  submit_date  period_start  period_end    doc_id  ...  operating_income_on_sales  ordinary_income_on_sales  capital_ratio  dividend_payout_ratio   doe   open   high    low  close  average
55       E00091     19710  2010001034861  Chuo Built Industry Co., Ltd.         2014             FY   2015-06-24    2014-04-01  2015-03-31  S10053TB  ...                       7.78                      7.41          31.99                  14.01  1.69  139.0  208.0  108.0  118.0   139.25
56       E00091     19710  2010001034861  Chuo Built Industry Co., Ltd.         2014             FY   2015-06-24    2014-04-01  2015-03-31  S10053TB  ...                       7.78                      7.41          31.99                  14.01  1.69  139.0  208.0  108.0  118.0   139.25
# Collapse to one row per EDINET code, then check whether any filer name still repeats.
df_14 = pd.read_csv('../data/finance_reports/2014/2014/documents.csv', sep='\t')
df_14 = df_14.groupby('edinet_code').max().reset_index()
df_14_part = df_14[['filer_name', 'fiscal_year', 'roa']]
dup_name = df_14_part[df_14_part['filer_name'].duplicated()].iloc[0]['filer_name']
# Show the identifying columns for the repeated name.
df_14[df_14['filer_name'] == dup_name][['edinet_code', 'sec_code', 'jcn', 'filer_name', 'fiscal_year', 'fiscal_period', 'submit_date']]
     edinet_code  sec_code            jcn                   filer_name  fiscal_year  fiscal_period  submit_date
245       E00484     28140  5180001075845  Sato Foods Industries, Ltd.         2014             FY   2015-06-26
263       E00510     29230  8110001002068  Sato Foods Industries, Ltd.         2014             FY   2015-07-24
# Load fiscal years 2014-2018, keeping one row per EDINET code in each year.
frames = []
for year in range(2014, 2019):
    df_year = pd.read_csv(f'../data/finance_reports/{year}/{year}/documents.csv', sep='\t')
    frames.append(df_year.groupby('edinet_code').max().reset_index())
# Stack all years and drop rows that are exact duplicates.
df = pd.concat(frames)
df = df[~df.duplicated()]
df[df['filer_name'].isin(['Sato Foods Industries, Ltd.', 'Alpha Corporation', 'FUJI CORPORATION'])]
print(len(df[df['roe'] < 0]))
>>> 0
ROE (Return on Equity) of Japan Display
df[df['edinet_code'] == 'E30481'][['edinet_code', 'filer_name', 'fiscal_year', 'roe']]
      edinet_code               filer_name  fiscal_year     roe
3160       E30481  Japan Display Co., Ltd.         2014    4.13
3196       E30481  Japan Display Co., Ltd.         2015    2.92
3270       E30481  Japan Display Co., Ltd.         2016   10.64
2884       E30481  Japan Display Co., Ltd.         2018  734.39
The data does not look reliable: no row has a negative ROE, and Japan Display's 2018 ROE is listed as 734.39. I will therefore not analyze it further for now.
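The ROE checks above can be wrapped into a small reusable sanity check. The following is only a sketch: the rows are copied from the Japan Display output above, while the `ROE_LIMIT` threshold and the `flag_suspicious_roe` helper are my own assumptions (including the assumption that `roe` is expressed in percent), not part of CoARiJ.

```python
import pandas as pd

# Rows copied from the Japan Display output shown above.
roe = pd.DataFrame({
    'edinet_code': ['E30481'] * 4,
    'fiscal_year': [2014, 2015, 2016, 2018],
    'roe': [4.13, 2.92, 10.64, 734.39],
})

# Hypothetical threshold: treat an ROE magnitude above 100 as suspicious,
# assuming the column is expressed in percent.
ROE_LIMIT = 100.0


def flag_suspicious_roe(df, limit=ROE_LIMIT):
    """Return the rows whose absolute ROE exceeds the given limit."""
    return df[df['roe'].abs() > limit]


suspicious = flag_suspicious_roe(roe)
negative_count = int((roe['roe'] < 0).sum())

print(suspicious)
print(negative_count)
```

Running the same two checks on the full `df` would show how many rows are affected before deciding whether the data is usable.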
The CSR reports are in PDF format, so several preprocessing steps are needed before they can be used for analysis.
Helpfully, the file names include the EDINET code, which makes it easy to link the reports with other information.
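As a rough illustration of that linking step, the EDINET code can be pulled out of a file name with a regular expression. This is a sketch under assumptions: the pattern ("E" followed by five digits) is inferred from codes such as E30481 and E00091 seen above, and the file names and the `edinet_code_from_filename` helper are hypothetical, since the actual naming scheme of the CSR files is not shown here.

```python
import re
from pathlib import Path

# Assumed pattern: "E" followed by exactly five digits (e.g. E30481).
EDINET_CODE = re.compile(r'E\d{5}(?!\d)')


def edinet_code_from_filename(path):
    """Return the first EDINET-code-like token in the file name, or None."""
    match = EDINET_CODE.search(Path(path).name)
    return match.group(0) if match else None


# Hypothetical file names; the real CSR report names may differ.
files = ['E30481_2018_csr.pdf', 'reports/E00091.pdf', 'readme.txt']
codes = {name: edinet_code_from_filename(name) for name in files}

print(codes)
```

The extracted code could then serve as the join key against the `edinet_code` column of the financial data.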
I considered extracting features from the CSR reports, such as color usage, the number of photos, and the character count, but I wonder how much that would cost on GCP.
In any case, I cannot tell whether the performance data to match against is correct, so I will stop the analysis here.
Please let me know if you find an error in my analysis.
I would not expect a mistake from TIS, of all companies, though...