Convert excel data to JSON with python.
The excel data to be converted is from Standard Tables of Food Composition in Japan. I would like to use the following fruit excel file. (Click to download the file.) https://www.mext.go.jp/component/a_menu/science/detail/__icsFiles/afieldfile/2016/01/15/1365344_1-0207r.xlsx
I will try two methods, ʻexcel2json and ʻopenpyxl
.
(ʻExcel2json` is not working well.)
excel2json is a module that converts excel data to JSON with python, but its functions are very limited. There is no choice but to read excel and create a json file.
I will install it for the time being. Note that if you do not include the -3
at the end, another module will be installed.
$ pip install excel2json-3
Let's use it once it is installed. This is the only code
excel2json.py
from excel2json import convert_from_file
convert_from_file("PATH of excel file you want to convert")
When executed, a json file will be generated in the same directory as the excel file. The name of the json file is the name of the excel sheet (fixed).
When I put the downloaded excel file as it is and execute it, the json file could not be created successfully, so delete the extra part on the table and try again.
The modified excel file looks like this ↓
When I open the created json file and see ...
{
"\u98df\u54c1\u7fa4": "07",
"\u98df\u54c1\u756a\u53f7": "07001",
"\u7d22\u5f15\u756a\u53f7": 751.0,
...
}
It has become unicode like this. When I tried to fix it with extensions called "encode decode" of vscode, it became like this.
{
"Food group": "07",
"Food number": "07001",
"Index number": 751.0,
...
}
If the amount of code is small and the data is already in place, it can be done very easily, but it is a little inconvenient because you cannot specify the data to be acquired. Regarding the character code, I could not find any method other than the above, so I will try other methods for the time being.
openpyxl is a module for operating excel in python. First, install openpyxl
$ pip install openpyxl
I will try using it once it is installed.
Please note that the excell file is the same as the downloaded one. This time, we will get 7 items of food number, food name, dietary fiber, potassium, iron, vitamin B1 and vitamin C.
import openpyxl
import json
load_book = openpyxl.load_workbook('excel file PATH')
sheet = load_book['07 Fruits']
json_path = 'json file PATH'
fruits_list = [{
"food_id": 0,
"name": ""
}]
for i in range(9, 183):
food_id = sheet.cell(row = i, column = 2).value
name = sheet.cell(row = i, column = 4).value
dietary_fiber = sheet.cell(row = i, column = 21).value
potassium = sheet.cell(row = i, column = 24).value
iron = sheet.cell(row = i, column = 28).value
vitamin_b1 = sheet.cell(row = i, column = 48).value
vitamin_c = sheet.cell(row = i, column = 56).value
food_id = int(food_id)
if dietary_fiber == 'Tr':
dietary_fiber = 0
if potassium == 'Tr':
potassium = 0
if iron == 'Tr':
iron = 0
if vitamin_b1 == 'Tr':
vitamin_b1 = 0
if vitamin_c == 'Tr':
vitamin_c = 0
if name.split(" ")[0][0] == '(' or name.split(" ")[0][0] == '(':
name = name.split(" ")[1]
else:
name = name.split(" ")[0]
if fruits_list[-1]['name'] != name:
fruits_list.append({
"food_id": food_id,
"name": name,
"dietary_fiber": dietary_fiber,
"potassium": potassium,
"iron": iron,
"vitamin_b1": vitamin_b1,
"vitamin_c": vitamin_c,
})
fruits_list.pop(0)
data_dict = {
"data": "fruits",
"fruits": fruits_list
}
with open(json_path, mode = 'w', encoding = 'utf-8') as f:
f.write(json.dumps(data_dict, ensure_ascii = False, indent = 4))
I will explain the part related to the article.
import openpyxl
import json
load_book = openpyxl.load_workbook('excel file PATH')
sheet = load_book['sheet name']
json_path = 'json file PATH'
Since I want to operate openpyxl and json, I also import the json module. Read the excel file with ʻopenpyxl.load_workbook ()and get the sheet with
load_book ['sheet name']`.
food_id = sheet.cell(row = 1, column = 2).value
Get the data by specifying the cell of excel by row and column. With this code, the data of 1 row and 2 columns is acquired.
fruits_list.append({
"food_id": food_id,
"name": name,
"dietary_fiber": dietary_fiber,
"potassium": potassium,
"iron": iron,
"vitamin_b1": vitamin_b1,
"vitamin_c": vitamin_c,
})
Collect the acquired data in a dictionary and add it to the array.
data_dict = {
"data": "fruits",
"fruits": fruits_list
}
with open(json_path, mode = 'w', encoding = 'utf-8') as f:
f.write(json.dumps(data_dict, ensure_ascii = False, indent = 4))
Finally, make the previous data into a dictionary and write it to the json file.
mode ='a'
in ʻopen ()specifies that the file should be opened in write mode. By the way, if
mode is
r, it will be in read mode, and if it is ʻa
, it will be in append mode.
Recommended Posts