A record of my solutions to the problems in the first half of Chapter 2. The output of the equivalent UNIX commands is also shown.
The target file is hightemp.txt as shown on the web page.
hightemp.txt is a file that stores records of the highest temperatures observed in Japan in a tab-delimited format with the columns "prefecture", "point", "℃", and "day". Create programs that perform the following processing and run them with hightemp.txt as the input file. Furthermore, perform the same processing with UNIX commands and check the programs' results against them.
Count the number of lines. Use the wc command for confirmation.
# -*- coding: utf-8 -*-
__author__ = 'todoroki'
f = open('hightemp.txt')
lines = f.readlines()
print len(lines)
f.close()
#=> 24
Read all lines of the target file into a list with readlines() and count them.
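As a point of comparison, here is a minimal Python 3 sketch (the scripts in this post are Python 2) that counts lines by iterating over a file-like object instead of building a list of every line. The function name count_lines and the two-line in-memory sample are assumptions made for illustration, not part of the original solution.

```python
import io


def count_lines(stream):
    """Count lines by iterating, without keeping them all in memory."""
    return sum(1 for _ in stream)


# Hypothetical two-line sample standing in for hightemp.txt.
sample = io.StringIO(
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n"
    "Saitama Prefecture\tKumagaya\t40.9\t2007-08-16\n"
)
print(count_lines(sample))  # prints 2
```

To run it on the real data, pass an open file object, for example: with open('hightemp.txt') as f: print(count_lines(f)).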
cat hightemp.txt | grep -c ""
#=> 24
Display the text with cat and pipe it to grep -c "", which counts the lines: the empty pattern matches every line, and -c prints the number of matching lines.
By the way, you can count the lines the same way with wc -l instead of grep -c "" after the pipe, but wc can pad its output with spaces. It is often better to use grep -c "", because those spaces can be a nuisance when the line count is piped to another process.
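If a padded count does reach a script anyway, it is easy to normalize. The following Python 3 sketch assumes the count arrives as a string; parse_count is a hypothetical helper name, and the sample strings are illustrative rather than captured output.

```python
def parse_count(output):
    """Return the leading integer of wc-style output, ignoring padding."""
    # split() with no arguments discards leading and trailing whitespace,
    # and also drops a trailing file name such as "24 hightemp.txt".
    return int(output.split()[0])


print(parse_count("      24\n"))        # prints 24 (space-padded count)
print(parse_count("24\n"))              # prints 24 (unpadded count)
print(parse_count("24 hightemp.txt"))   # prints 24 (count followed by a file name)
```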
Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.
# -*- coding: utf-8 -*-
__author__ = 'todoroki'
import re
inputfile = 'hightemp.txt'
outputfile = 'hightemp_tab2space.txt'
f = open(inputfile)
lines = f.readlines()
g = open(outputfile, 'w')
for line in lines:
    line = re.sub('\t', ' ', line)
    g.write(line)
    print line
f.close()
g.close()
#=>Kochi Prefecture Ekawasaki 41 2013-08-12
#=>Saitama Prefecture Kumagaya 40.9 2007-08-16
#=>Gifu Prefecture Tajimi 40.9 2007-08-16
#=>Yamagata Prefecture Yamagata 40.8 1933-07-25
#=>Yamanashi Prefecture Kofu 40.7 2013-08-10
#=>Wakayama Prefecture Katsuragi 40.6 1994-08-08
#=>Shizuoka Prefecture Tenryu 40.6 1994-08-04
#=>Yamanashi Prefecture Katsunuma 40.5 2013-08-10
#=>Saitama Prefecture Koshigaya 40.4 2007-08-16
#=>Gunma Prefecture Tatebayashi 40.3 2007-08-16
#=>Gunma Prefecture Kamisatomi 40.3 1998-07-04
#=>Aichi Prefecture Aisai 40.3 1994-08-05
#=>Chiba Prefecture Ushiku 40.2 2004-07-20
#=>Shizuoka Prefecture Sakuma 40.2 2001-07-24
#=>Ehime Prefecture Uwajima 40.2 1927-07-22
#=>Yamagata Prefecture Sakata 40.1 1978-08-03
#=>Gifu Prefecture Mino 40 2007-08-16
#=>Gunma Prefecture Maebashi 40 2001-07-24
#=>Chiba Prefecture Mobara 39.9 2013-08-11
#=>Saitama Prefecture Hatoyama 39.9 1997-07-05
#=>Osaka Prefecture Toyonaka 39.9 1994-08-08
#=>Yamanashi Prefecture Otsuki 39.9 1990-07-19
#=>Yamagata Prefecture Tsuruoka 39.9 1978-08-03
#=>Aichi Prefecture Nagoya 39.9 1942-08-02
Replace each tab character (\t) with a space using re.sub.
cat hightemp.txt | tr "\t" " " > hightemp_tr.txt
#=> (Output is the same as above)
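Since the pattern here is a fixed single character, a regular expression is not strictly needed. The Python 3 sketch below swaps in str.replace for re.sub; the function name tabs_to_spaces and the sample line are assumptions for illustration.

```python
def tabs_to_spaces(line):
    """Replace every tab character with exactly one space."""
    return line.replace("\t", " ")


# Hypothetical sample line in the same layout as hightemp.txt.
sample = "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n"
print(tabs_to_spaces(sample), end="")
# prints: Kochi Prefecture Ekawasaki 41 2013-08-12
```

Note that str.expandtabs is not a drop-in substitute: it pads to the next tab stop rather than inserting a single space, so its result differs from the tr command above.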
Save only the first column of each line as col1.txt and only the second column as col2.txt. Use the cut command for confirmation.
# -*- coding: utf-8 -*-
__author__ = 'todoroki'
inputfile = 'hightemp.txt'
outputfile1 = 'col1.txt'
outputfile2 = 'col2.txt'
f = open(inputfile)
lines = f.readlines()
g = open(outputfile1, "w")
h = open(outputfile2, "w")
for line in lines:
    line = line.split('\t')
    g.write(line[0].strip('\n') + '\n')
    h.write(line[1].strip('\n') + '\n')
f.close()
g.close()
h.close()
# (col1.txt)
#=>Kochi Prefecture
#=>Saitama Prefecture
#=>Gifu Prefecture
#=>Yamagata Prefecture
#=>Yamanashi Prefecture
#=>Wakayama Prefecture
#=>Shizuoka Prefecture
#=>Yamanashi Prefecture
#=>Saitama Prefecture
#=>Gunma Prefecture
#=>Gunma Prefecture
#=>Aichi Prefecture
#=>Chiba Prefecture
#=>Shizuoka Prefecture
#=>Ehime Prefecture
#=>Yamagata Prefecture
#=>Gifu Prefecture
#=>Gunma Prefecture
#=>Chiba Prefecture
#=>Saitama Prefecture
#=>Osaka Prefecture
#=>Yamanashi Prefecture
#=>Yamagata Prefecture
#=>Aichi Prefecture
# (col2.txt)
#=>Ekawasaki
#=>Kumagaya
#=>Tajimi
#=>Yamagata
#=>Kofu
#=>Katsuragi
#=>Tenryu
#=>Katsunuma
#=>Koshigaya
#=>Tatebayashi
#=>Kamisatomi
#=>Aisai
#=>Ushiku
#=>Sakuma
#=>Uwajima
#=>Sakata
#=>Mino
#=>Maebashi
#=>Mobara
#=>Hatoyama
#=>Toyonaka
#=>Otsuki
#=>Tsuruoka
#=>Nagoya
Split each line on the tab delimiter and write the first and second fields to their respective files.
cut -f 1 hightemp.txt > hightemp_cut1.txt
cut -f 2 hightemp.txt > hightemp_cut2.txt
#=> (Same as above, so output is omitted)
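The column extraction can also be written as one reusable function rather than one write call per column. This is a Python 3 sketch under stated assumptions: extract_column is a hypothetical name, the index is 0-based (unlike cut -f, which counts from 1), and the sample rows are illustrative.

```python
def extract_column(lines, index, sep="\t"):
    """Return field number `index` (0-based) from every line."""
    return [line.rstrip("\n").split(sep)[index] for line in lines]


# Hypothetical sample rows in the same layout as hightemp.txt.
sample = [
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n",
    "Saitama Prefecture\tKumagaya\t40.9\t2007-08-16\n",
]
print(extract_column(sample, 0))  # first column, like cut -f 1
print(extract_column(sample, 1))  # second column, like cut -f 2
```

Writing the result out is then a matter of joining the list with newlines, for example g.write('\n'.join(column) + '\n').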
Combine the col1.txt and col2.txt created in problem 12 to create a text file in which the first and second columns of the original file are arranged side by side, tab-delimited. Use the paste command for confirmation.
# -*- coding: utf-8 -*-
__author__ = 'todoroki'
inputfile1 = 'col1.txt'
inputfile2 = 'col2.txt'
outputfile = 'col_merge.txt'
f = open(inputfile1)
g = open(inputfile2)
h = open(outputfile, "w")
lines1 = f.readlines()
lines2 = g.readlines()
for a, b in zip(lines1, lines2):
    h.write(a.strip() + '\t' + b.strip() + '\n')
f.close()
g.close()
h.close()
#=>Kochi Prefecture Ekawasaki
#=>Saitama Prefecture Kumagaya
#=>Gifu Prefecture Tajimi
#=>Yamagata Prefecture Yamagata
#=>Yamanashi Prefecture Kofu
#=>Wakayama Prefecture Katsuragi
#=>Shizuoka Prefecture Tenryu
#=>Yamanashi Prefecture Katsunuma
#=>Saitama Prefecture Koshigaya
#=>Gunma Prefecture Tatebayashi
#=>Gunma Prefecture Kamisatomi
#=>Aichi Prefecture Aisai
#=>Chiba Prefecture Ushiku
#=>Shizuoka Prefecture Sakuma
#=>Ehime Prefecture Uwajima
#=>Yamagata Prefecture Sakata
#=>Gifu Prefecture Mino
#=>Gunma Prefecture Maebashi
#=>Chiba Prefecture Mobara
#=>Saitama Prefecture Hatoyama
#=>Osaka Prefecture Toyonaka
#=>Yamanashi Prefecture Otsuki
#=>Yamagata Prefecture Tsuruoka
#=>Aichi Prefecture Nagoya
Read the two files and walk through both lists of lines in parallel with the zip function.
paste col1.txt col2.txt > hightemp_paste.txt
#=> (Output is the same as above)
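One difference is worth keeping in mind: zip stops at the end of the shorter input, whereas paste keeps going and leaves the missing field empty. The two files here have the same length, so the results match, but a closer imitation of paste might look like the Python 3 sketch below. The name paste_columns is hypothetical, and itertools.zip_longest is swapped in for the zip used in the script above.

```python
from itertools import zip_longest


def paste_columns(left, right, sep="\t"):
    """Join two sequences of lines side by side, padding the shorter one."""
    return [
        a.rstrip("\n") + sep + b.rstrip("\n")
        for a, b in zip_longest(left, right, fillvalue="")
    ]


# Hypothetical inputs of unequal length to show the padding behaviour.
col1 = ["Kochi Prefecture\n", "Saitama Prefecture\n"]
col2 = ["Ekawasaki\n"]
for row in paste_columns(col1, col2):
    print(row)
```

The second row printed ends with a tab and an empty second field, which is the padding that plain zip would have dropped along with the whole row.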
Receive the natural number N by means such as a command line argument and display only the first N lines of the input. Use the head command for confirmation.
# -*- coding: utf-8 -*-
__author__ = 'todoroki'
import sys
if len(sys.argv) == 3:
    N = int(sys.argv[1])
    f = open(sys.argv[2])
    lines = f.readlines()
    # Slicing avoids an IndexError when N exceeds the number of lines.
    for line in lines[:N]:
        print line.strip()
    f.close()
else:
    print "please input 'N' and 'FILENAME'"
# (python problem14.py 5 hightemp.txt)
#=>Kochi Prefecture Ekawasaki 41 2013-08-12
#=>Saitama Prefecture Kumagaya 40.9 2007-08-16
#=>Gifu Prefecture Tajimi 40.9 2007-08-16
#=>Yamagata Prefecture Yamagata 40.8 1933-07-25
#=>Yamanashi Prefecture Kofu 40.7 2013-08-10
Of the lines read, print only as many as the number N received on the command line.
head -n 5 hightemp.txt
#=> (Output is the same as above)
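The script above reads the entire file before printing anything, while head stops after N lines. A Python 3 sketch that behaves more like head could use itertools.islice; the function name head and the sample list are assumptions for illustration.

```python
from itertools import islice


def head(lines, n):
    """Return at most the first n lines without consuming the rest."""
    return list(islice(lines, n))


# Hypothetical sample standing in for the lines of hightemp.txt.
sample = ["line 1\n", "line 2\n", "line 3\n"]
for line in head(sample, 2):
    print(line, end="")
# prints "line 1" and "line 2"
```

Because islice simply stops early, asking for more lines than the input contains returns everything without raising an error. With a real file, pass the open file object itself, for example head(f, 5).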