When converting a CSV file containing Japanese to a JSON file, I ran into a character-encoding problem:
UnicodeDecodeError: 'ascii' codec can't decode byte...
I solved it with the script below (csvToJson.py), on the assumption that the CSV file to be read has already been converted to UTF-8 without BOM (UTF-8N).
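For reference, that pre-conversion can be done with a small snippet like the one below. This is only a sketch: I am assuming the original file is Shift_JIS (a guess, typical of Japanese CSVs exported from Excel), and the file names are placeholders.
# -*- coding: utf-8 -*-
import codecs
#Read the original CSV, assuming it is Shift_JIS
src = codecs.open('Specify original csv file', 'r', 'shift_jis')
text = src.read()
src.close()
#Write it back out as UTF-8; the plain utf_8 codec writes no BOM (UTF-8N)
dst = codecs.open('Specify converted csv file', 'w', 'utf_8')
dst.write(text)
dst.close()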
csvToJson.py
# -*- coding: utf-8 -*-
#module
import codecs
import csv
import json
#Open CSV file
f = open('Specify csv file to read', 'r')
reader = csv.DictReader(f, fieldnames = ("Field name 1", "Field name 2", ...,"Field name N"))
print "CSV loaded."
#Convert to JSON format
out = json.dumps( [row for row in reader], sort_keys=False, ensure_ascii=False, encoding='utf_8' )
f.close()
print "JSON parsed."
#Save to JSON file
f = codecs.open('Specify json file to export', 'w', 'utf_8')
f.write(out)
f.close()
print "JSON saved."
As a Python beginner there is a lot I do not understand, but is my understanding of the internal processing that prevents garbled characters correct? The file is opened with open() without specifying a character code, and csv.DictReader builds dictionaries of unicode objects from it. Setting ensure_ascii=False in json.dumps allows those unicode objects to be handled, and specifying utf_8 for encoding makes the contents of out UTF-8, so the Japanese is output without garbled characters. Is that right?
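For what it's worth, the effect of ensure_ascii can be checked with a small snippet like this (the row dictionary is made-up example data, standing in for what csv.DictReader returns):
# -*- coding: utf-8 -*-
import json
#Made-up example row; in Python 2, csv.DictReader returns UTF-8 byte strings like this value
row = {"name": "山田"}
#Default: non-ASCII characters are escaped to \uXXXX sequences
print json.dumps(row, encoding='utf_8')
#ensure_ascii=False: the Japanese text is kept as-is (the result is a unicode object)
print json.dumps(row, ensure_ascii=False, encoding='utf_8')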