Character-encoding handling could fairly be called one of Python's demon gates. When I first started using Python, I got so fed up with dealing with character codes that I seriously wondered whether I would ever use the language again. I'm used to it now, though.
What I was trying to do
It's just this.
If the query log contained a Japanese comment, the UnicodeEncodeError mentioned at the beginning could occur when writing it out.
I'll write up the situation and the solution. The Python here is 2.7, sorry about that.
When I fetched the query log through the DB API, I could retrieve it without any problem and strip out the unneeded strings. Printing the log as a trial worked too. So I thought, "OK, then I just have to write it to a file!"
Now let's write that to a file.
# -*- coding: utf-8 -*-
# Actually the log comes from an API, but here we use a string literal to check the behavior.
log = "aaa 日本語"
with open("test.txt", "a") as f:
    f.write(log + "\r\n")
Looking at the file test.txt:
aaa 日本語
It is written properly. So now, I thought, all I have to do is fetch the query log from the API and run the same thing. But when I fetched the log from the API and wrote it, the following error occurred (sorry, the API part is omitted):
Traceback (most recent call last):
  File "writetest.py", line 11, in <module>
    f.write(log + "\r\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-6: ordinal not in range(128)
In short, the string obtained from the API was of type unicode. In Python 2, when you pass a unicode string to write(), it is implicitly encoded with the default codec, ascii, and that encoding fails on non-ASCII characters. I haven't investigated it in great detail, but that seems to be what happens.
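The failure can be reproduced directly by forcing the ascii codec, which is what Python 2's write() does implicitly for unicode strings. A minimal sketch (this snippet runs on both Python 2 and 3):

```python
# -*- coding: utf-8 -*-
# Explicitly encoding with ascii reproduces the error that Python 2's
# file.write() raises implicitly when handed a unicode string.
log = u"aaa 日本語"
try:
    log.encode("ascii")
except UnicodeEncodeError as e:
    print(e)  # 'ascii' codec can't encode characters in position 4-6: ordinal not in range(128)
```

The reported positions 4-6 are exactly the three Japanese characters after "aaa ".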
How I solved it
log = log.encode("utf_8")
I encoded it to UTF-8 first, then wrote it:
# -*- coding: utf-8 -*-
# Get the log via the API
log = (Get logs with API)
log = log.encode("utf_8")
with open("test.txt", "a") as f:
    f.write(log + "\r\n")
Incidentally,
f.write(log.encode("utf_8") + "\r\n")
gave me a similar error. (Likely because log was already an encoded str at that point: in Python 2, calling .encode() on a str first decodes it with the default ascii codec, which fails the same way on non-ASCII bytes.)
Besides the solution above, there seem to be other approaches as well.
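One such approach (my own sketch, not what the article used) is to let the file object do the encoding by opening it with codecs.open and an explicit encoding, so you can write unicode directly:

```python
# -*- coding: utf-8 -*-
# codecs.open returns a file object that encodes unicode on write,
# so no manual .encode() call is needed (works on Python 2 and 3).
import codecs

log = u"aaa 日本語"  # stand-in for the string the API would return
with codecs.open("test.txt", "a", encoding="utf-8") as f:
    f.write(log + u"\r\n")
```

Note that codecs.open opens the file in binary mode internally, so the "\r\n" is written through unchanged.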
By the way, let's check the basic behavior of the unicode type.
# coding: utf-8
str_1 = "日本語"
str_2 = u"日本語"
print str_1
print str_2
print type(str_1)
print type(str_2)
print len(str_1)
print len(str_2)
print ("本" in str_1)
print (u"本" in str_2)
print str_1.find("本")
print str_2.find(u"本")
Running this gives:
日本語 # print str_1
日本語 # print str_2
<type 'str'> # print type(str_1)
<type 'unicode'> # print type(str_2)
9 # print len(str_1)
3 # print len(str_2)
True # print ("本" in str_1)
True # print (u"本" in str_2)
3 # print str_1.find("本")
1 # print str_2.find(u"本")
At a glance you can see that the str type handles the string as bytes (hence len 9 and a byte offset from find), while the unicode type handles it as characters, the units a person would intuitively count.
Because of this, unicode is easier to work with when you are manipulating strings as text. It also explains why the log fetched from the API at the beginning came back as unicode.
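The two types are connected by encode/decode: encoding a unicode string yields its byte representation (a str in Python 2), and decoding those bytes yields the characters back. A small sketch of the round trip (runs on Python 2 and 3):

```python
# -*- coding: utf-8 -*-
# Round trip between bytes and characters.
raw = b"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  # the UTF-8 bytes for 日本語
text = raw.decode("utf-8")                      # bytes -> characters

print(len(raw))                     # 9 -- byte length
print(len(text))                    # 3 -- character length
print(text.encode("utf-8") == raw)  # True -- encoding restores the bytes
```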
By the way, I didn't know about this behavior at all.