In the Python 2.x series, text handling is confusing because str objects and unicode objects are separate types. After looking into various approaches, I ended up with the code below. The Python 3.x series seems easier because all text is handled as unicode.
Environment: Mac OS X 10.6.8, Python 2.6.1
python
# coding: UTF-8
import codecs
import string
import re

# Open both files as UTF-8 so that lines are read and written as unicode objects
f_in = codecs.open('test.txt', 'r', 'utf-8')
f_out = codecs.open('test_out.txt', 'w', 'utf-8')

lines = f_in.readlines()  # Read all lines
lines2 = []
for line in lines:
    line = string.replace(line, u'text', u'text')  # Text replacement
    # Regular expression replacement: insert a comma every 3 digits
    line = re.sub(r'(\d)(?=(\d{3})+(?!\d))', r'\1,', line)
    lines2.append(line)  # Collect the results in a separate list
f_out.write(string.join(lines2, ''))  # Write

f_in.close()
f_out.close()
test.txt
This is sample text.
Insert a comma every 3 digits.
iPad mini 36800 yen
test_out.txt
This is sample text.
Insert a comma every 3 digits.
iPad mini 36,800 yen
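The regular expression can also be tried on its own, without any files. The following is a minimal sketch, assuming a hypothetical helper named `add_commas` that is not part of the original script. The lookahead `(?=(\d{3})+(?!\d))` matches a digit only when the digits that follow it form complete groups of three, and the replacement `\1,` puts a comma after that digit.

```python
import re

# Same pattern as in the script: a digit followed by one or more complete
# groups of three digits, with no further digit after the last group.
THOUSANDS = re.compile(r'(\d)(?=(\d{3})+(?!\d))')


def add_commas(text):
    """Hypothetical helper: insert a comma after each matched digit."""
    return THOUSANDS.sub(r'\1,', text)


print(add_commas('iPad mini 36800 yen'))
print(add_commas('1234567 and 999'))
```

Note that the pattern knows nothing about decimal points, so a run of four or more digits after a decimal point would be grouped as well.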
Postscript: I also wrote code that works with Python 3.3. Python 3 still uses the codecs module here; as far as I can tell, the only differences are that replace is called as a method of the str object and that the u'' literals are not needed.
python
from __future__ import unicode_literals
If you add this line, all string literals are treated as unicode even without the u'' prefix, so the same code also works normally with Python 2.6. That might be the best approach at the moment.
python
# coding: UTF-8
from __future__ import unicode_literals  # Treat all string literals as unicode. Not required in the 3.x series
import codecs
import re

# Open both files as UTF-8
f_in = codecs.open('test.txt', 'r', 'utf-8')
f_out = codecs.open('test_out.txt', 'w', 'utf-8')

lines = f_in.readlines()  # Read all lines
lines2 = []
for line in lines:
    line = line.replace('text', 'text')  # Text replacement
    # Regular expression replacement: insert a comma every 3 digits
    line = re.sub(r'(\d)(?=(\d{3})+(?!\d))', r'\1,', line)
    lines2.append(line)  # Collect the results in a separate list
f_out.write(''.join(lines2))  # Write

f_in.close()
f_out.close()
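If the code only needs to run on Python 3, the codecs module may not be necessary, because the built-in `open()` accepts an `encoding` argument. The following is a sketch of that variant rather than the script above: the function name `convert` and the temporary directory are assumptions made for this illustration, and the literal text replacement step is left out to keep it short.

```python
import os
import re
import tempfile


def convert(src_path, dst_path):
    """Hypothetical helper: copy src to dst, adding thousands separators."""
    # In Python 3 the built-in open() decodes and encodes text itself,
    # and the with statement closes both files even if an error occurs.
    with open(src_path, 'r', encoding='utf-8') as f_in, \
            open(dst_path, 'w', encoding='utf-8') as f_out:
        for line in f_in:
            line = re.sub(r'(\d)(?=(\d{3})+(?!\d))', r'\1,', line)
            f_out.write(line)


# Small demonstration in a temporary directory instead of the real test.txt.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'test.txt')
dst = os.path.join(tmp_dir, 'test_out.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write('iPad mini 36800 yen\n')

convert(src, dst)

with open(dst, 'r', encoding='utf-8') as f:
    result = f.read()
print(result)
```

Iterating over the file object processes one line at a time, so the intermediate list is not needed in this version.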