Chapter 2
The code is now posted on GitHub.
The text file used in this chapter is as follows (in the actual file the columns are separated by tabs).
hightemp.txt
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Count the number of lines. Use the wc command for confirmation.
10.py
#Try using list comprehension
num_lines = len([line for line in open('./hightemp.txt')])
print(num_lines)
#For checking results
import subprocess
output = subprocess.check_output(['wc','-l','./hightemp.txt'])
print(output)
Code execution result and command execution result
$ python 10.py
24
b'24 ./hightemp.txt\n'
After finishing Chapter 1 I looked at other people's code and thought, "Their for loops are so short!" When I looked it up, I found there is a notation called list comprehension.
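For comparison, here is a minimal sketch of the same operation written as an explicit for loop and as a list comprehension. The `rows` sample and the variable names are made up for illustration; they stand in for lines read from a file.

```python
# Hypothetical sample data standing in for lines read from a file.
rows = ["a\t1\n", "b\t2\n", "c\t3\n"]

# Explicit for loop: build the list step by step.
first_cols_loop = []
for row in rows:
    first_cols_loop.append(row.split("\t")[0])

# List comprehension: the same result in a single expression.
first_cols_comp = [row.split("\t")[0] for row in rows]

print(first_cols_loop == first_cols_comp)
```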
Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.
11.py
#Convert tabs on each line to single character spaces in list comprehension
space_text = [line.expandtabs(1) for line in open('./hightemp.txt')]
[print(line) for line in space_text]
#Confirmation command
# $ cat ./hightemp.txt | tr '\t' ' '
$ python 11.py
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
$ cat ./hightemp.txt | tr '\t' ' '
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
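One thing bothers me a little: each line already ends with a newline and print adds another, so the Python output has an extra blank line after every row (omitted from the listing above).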
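As a small check of what `expandtabs(1)` does, the sketch below compares it with a plain `replace` on one sample row (the row is written out by hand here, not read from the file). It also shows one possible way to avoid the doubled line break, by passing `end=""` to `print`.

```python
# One tab-separated row, typed by hand for illustration.
line = "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n"

# With a tab size of 1 every tab stop is one column wide,
# so each tab becomes exactly one space.
via_expandtabs = line.expandtabs(1)
via_replace = line.replace("\t", " ")

print(via_expandtabs == via_replace)

# The line still ends with "\n"; end="" suppresses the extra
# line break that print() would otherwise append.
print(via_expandtabs, end="")
```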
Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.
12.py
#Split each row on tabs in a list comprehension to get the first and second columns
col1 = [line.split('\t')[0] for line in open('./hightemp.txt')]
col2 = [line.split('\t')[1] for line in open('./hightemp.txt')]
#Open the output files in write mode
f1 = open('./col1.txt','w')
f2 = open('./col2.txt','w')
#Join with newlines and write
f1.write('\n'.join(col1))
f2.write('\n'.join(col2))
#Close the files so that the data is flushed to disk
f1.close()
f2.close()
#Confirmation command
# $ cat ./hightemp.txt | cut -f 1
# $ cat ./hightemp.txt | cut -f 2
$ python 12.py ← col1.txt and col2.txt are created
$ cat ./hightemp.txt | cut -f 1
Kochi Prefecture
Saitama Prefecture
Gifu Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Wakayama Prefecture
Shizuoka Prefecture
Yamanashi Prefecture
Saitama Prefecture
Gunma Prefecture
Gunma Prefecture
Aichi Prefecture
Chiba Prefecture
Shizuoka Prefecture
Ehime Prefecture
Yamagata Prefecture
Gifu Prefecture
Gunma Prefecture
Chiba Prefecture
Saitama Prefecture
Osaka Prefecture
Yamanashi Prefecture
Yamagata Prefecture
Aichi Prefecture
$ cat ./hightemp.txt | cut -f 2
Ekawasaki
Kumagaya
Tajimi
Yamagata
Kofu
Katsuragi
Tenryu
Katsunuma
Koshigaya
Tatebayashi
Kamisatomi
Aisai
Ushiku
Sakuma
Uwajima
Sakata
Mino
Maebashi
Mobara
Hatoyama
Toyonaka
Otsuki
Tsuruoka
Nagoya
col1.txt
Kochi Prefecture
Saitama Prefecture
Gifu Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Wakayama Prefecture
Shizuoka Prefecture
Yamanashi Prefecture
Saitama Prefecture
Gunma Prefecture
Gunma Prefecture
Aichi Prefecture
Chiba Prefecture
Shizuoka Prefecture
Ehime Prefecture
Yamagata Prefecture
Gifu Prefecture
Gunma Prefecture
Chiba Prefecture
Saitama Prefecture
Osaka Prefecture
Yamanashi Prefecture
Yamagata Prefecture
Aichi Prefecture
col2.txt
Ekawasaki
Kumagaya
Tajimi
Yamagata
Kofu
Katsuragi
Tenryu
Katsunuma
Koshigaya
Tatebayashi
Kamisatomi
Aisai
Ushiku
Sakuma
Uwajima
Sakata
Mino
Maebashi
Mobara
Hatoyama
Toyonaka
Otsuki
Tsuruoka
Nagoya
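12.py reads hightemp.txt twice and leaves closing the files to the interpreter. As an alternative sketch, the columns can be collected in one pass and written inside `with` blocks. The `split_columns` helper, the two sample rows, and the temporary output directory are assumptions made for this example.

```python
import os
import tempfile

def split_columns(lines):
    """Return (first_column, second_column) lists from tab-separated lines."""
    col1, col2 = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        col1.append(fields[0])
        col2.append(fields[1])
    return col1, col2

# Two sample rows typed by hand; a real run would pass an open file instead.
sample = [
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n",
    "Yamanashi Prefecture\tKofu\t40.7\t2013-08-10\n",
]
col1, col2 = split_columns(sample)

# Write into a temporary directory; the with statement closes each file.
out_dir = tempfile.mkdtemp()
for name, column in (("col1.txt", col1), ("col2.txt", col2)):
    with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
        f.write("\n".join(column) + "\n")
```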
Combine the col1.txt and col2.txt created in 12 into a text file in which the first and second columns of the original file are separated by a tab. Use the paste command for confirmation.
13.py
#Read both files first
col1 = [line for line in open('./col1.txt')]
col2 = [line for line in open('./col2.txt')]
new_file = open('./new_file.txt','w')
#Process the two lists in parallel with zip
#Join each pair with a tab and remove the newline characters so it becomes one line
#Then write a newline to move on to the next line
for col in zip(col1, col2):
    new_file.write('\t'.join(col).replace('\n',''))
    new_file.write('\n')
new_file.close()
#Confirmation command
# $ paste ./col1.txt ./col2.txt
$ python 13.py
$ cat new_file.txt
Kochi Prefecture Ekawasaki
Saitama Prefecture Kumagaya
Gifu Prefecture Tajimi
Yamagata Prefecture Yamagata
Yamanashi Prefecture Kofu
Wakayama Prefecture Katsuragi
Shizuoka Prefecture Tenryu
Yamanashi Prefecture Katsunuma
Saitama Prefecture Koshigaya
Gunma Prefecture Tatebayashi
Gunma Prefecture Kamisatomi
Aichi Prefecture Aisai
Chiba Prefecture Ushiku
Shizuoka Prefecture Sakuma
Ehime Prefecture Uwajima
Yamagata Prefecture Sakata
Gifu Prefecture Mino
Gunma Prefecture Maebashi
Chiba Prefecture Mobara
Saitama Prefecture Hatoyama
Osaka Prefecture Toyonaka
Yamanashi Prefecture Otsuki
Yamagata Prefecture Tsuruoka
Aichi Prefecture Nagoya
$ paste ./col1.txt ./col2.txt
Kochi Prefecture Ekawasaki
Saitama Prefecture Kumagaya
Gifu Prefecture Tajimi
Yamagata Prefecture Yamagata
Yamanashi Prefecture Kofu
Wakayama Prefecture Katsuragi
Shizuoka Prefecture Tenryu
Yamanashi Prefecture Katsunuma
Saitama Prefecture Koshigaya
Gunma Prefecture Tatebayashi
Gunma Prefecture Kamisatomi
Aichi Prefecture Aisai
Chiba Prefecture Ushiku
Shizuoka Prefecture Sakuma
Ehime Prefecture Uwajima
Yamagata Prefecture Sakata
Gifu Prefecture Mino
Gunma Prefecture Maebashi
Chiba Prefecture Mobara
Saitama Prefecture Hatoyama
Osaka Prefecture Toyonaka
Yamanashi Prefecture Otsuki
Yamagata Prefecture Tsuruoka
Aichi Prefecture Nagoya
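The zip-and-join step in 13.py can also be written as a small function that strips the trailing newline from each side before joining. This is only a sketch: `paste` here is a hypothetical helper name, and unlike the real paste command, `zip` stops at the shorter input instead of padding it.

```python
def paste(left_lines, right_lines, sep="\t"):
    """Join corresponding lines of two sequences with sep (stops at the shorter one)."""
    return [
        left.rstrip("\n") + sep + right.rstrip("\n")
        for left, right in zip(left_lines, right_lines)
    ]

# Sample inputs typed by hand, standing in for col1.txt and col2.txt.
col1 = ["Kochi Prefecture\n", "Yamanashi Prefecture\n"]
col2 = ["Ekawasaki\n", "Kofu\n"]

merged = paste(col1, col2)
print("\n".join(merged))
```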
Receive the natural number N by means such as command line arguments, and display only the first N lines of the input. Use the head command for confirmation.
14.py
# input() reads a line from standard input
# int() converts it to an integer
input = int(input())
lines = [line for line in open('./hightemp.txt')]
print(''.join(lines[:input]))
#Confirmation command
# $ head -n N ./hightemp.txt
$ python 14.py
4 ← Standard input
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
$ head ./hightemp.txt -n 4
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
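14.py loads every line into a list before slicing. If only the first N lines are needed, `itertools.islice` can stop reading early. The sketch below assumes a hypothetical `head` helper and a generated 24-line input. In a real script N could come from `sys.argv[1]` instead of `input()`.

```python
from itertools import islice

def head(lines, n):
    """Return the first n items of any iterable of lines without reading the rest."""
    return list(islice(lines, n))

# Hypothetical 24-line input produced by a generator.
sample = (f"row {i}\n" for i in range(1, 25))

first_four = head(sample, 4)
print("".join(first_four), end="")
```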
Receive the natural number N by means such as a command line argument, and display only the last N lines of the input. Use the tail command for confirmation.
15.py
#Same idea as 14
input = int(input())
lines = [line for line in open('./hightemp.txt')]
#Slicing recap
#A negative index counts from the end of the list
print(''.join(lines[-input:]))
#Confirmation command
# $ tail -n N ./hightemp.txt
$ python 15.py
3 ← Standard input
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
$ tail ./hightemp.txt -n 3
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
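For the last N lines, one alternative to slicing a full list is `collections.deque` with `maxlen`, which keeps at most N lines in memory while reading. The `tail` helper and the generated input are assumptions for this sketch. Note also that `lines[-0:]` returns the whole list, so the slicing version would print everything if N were 0.

```python
from collections import deque

def tail(lines, n):
    """Return the last n items of an iterable, holding at most n items at a time."""
    return list(deque(lines, maxlen=n))

# Hypothetical 24-line input produced by a generator.
sample = (f"row {i}\n" for i in range(1, 25))

last_three = tail(sample, 3)
print("".join(last_three), end="")
```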
Receive the natural number N by means such as command line arguments, and split the input file into N parts at line boundaries. Achieve the same processing with the split command.
16.py
input = int(input())
lines = [line for line in open('./hightemp.txt')]
#Join every N lines into one chunk
sublist = [''.join(lines[i:i+input]) for i in range(0,len(lines),input)]
#Print each chunk to check the result
for i in sublist:
    print(i)
#Confirmation command
# $ split -l N ./hightemp.txt
$ python 16.py
9 ← Standard input
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16

Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24

Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
$ split -l 9 ./hightemp.txt
$ ls
10.py 12.py 14.py 16.py 18.py col1.txt hightemp.txt xaa xac
11.py 13.py 15.py 17.py 19.py col2.txt new_file.txt xab
$ cat xaa
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
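16.py and `split -l N` both cut the file every N lines. The task can also be read as "split into N pieces", so here is a rough sketch of that reading. `split_into_parts` is a hypothetical helper that makes N consecutive chunks whose sizes differ by at most one line. The 24-line input is generated for the example.

```python
def split_into_parts(lines, n_parts):
    """Split a list into n_parts consecutive chunks whose sizes differ by at most one."""
    quotient, remainder = divmod(len(lines), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `remainder` chunks each take one extra line.
        size = quotient + (1 if i < remainder else 0)
        parts.append(lines[start:start + size])
        start += size
    return parts

# Hypothetical 24-line input.
lines = [f"row {i}\n" for i in range(1, 25)]

parts = split_into_parts(lines, 5)
print([len(part) for part in parts])
```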
Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands for confirmation.
17.py
#Extract the data in the first column
col1 = [line.split('\t')[0] for line in open('./hightemp.txt')]
output = []
#Append to another list, skipping values that are already in it
for c in col1:
    if c not in output:
        output.append(c)
print(output)
# $ cat hightemp.txt | cut -f 1 | sort -k1 | uniq
$ python 17.py
['Kochi Prefecture', 'Saitama Prefecture', 'Gifu Prefecture', 'Yamagata Prefecture', 'Yamanashi Prefecture', 'Wakayama Prefecture', 'Shizuoka Prefecture', 'Gunma Prefecture', 'Aichi Prefecture', 'Chiba Prefecture', 'Ehime Prefecture', 'Osaka Prefecture']
$ cat hightemp.txt | cut -f 1 | sort -k1 | uniq
Aichi Prefecture
Ehime Prefecture
Gifu Prefecture
Gunma Prefecture
Kochi Prefecture
Saitama Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Shizuoka Prefecture
Chiba Prefecture
Osaka Prefecture
Wakayama Prefecture
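The `c not in output` check in 17.py scans the list each time. Two shorter alternatives are sketched below, using a hand-written sample list in place of the real first column. `dict.fromkeys` keeps first-seen order (dicts preserve insertion order in Python 3.7 and later), while `sorted(set(...))` gives a sorted result closer to `sort | uniq`.

```python
# Hand-written sample standing in for the first column of the file.
col1 = [
    "Kochi Prefecture",
    "Gifu Prefecture",
    "Kochi Prefecture",
    "Gunma Prefecture",
    "Gifu Prefecture",
]

# Distinct values in the order they first appear.
unique_in_order = list(dict.fromkeys(col1))

# Distinct values in sorted order.
unique_sorted = sorted(set(col1))

print(unique_in_order)
print(unique_sorted)
```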
Sort the rows in descending order of the number in the third column (note: reorder the rows without changing the contents of each row). Use the sort command for confirmation (for this problem the result does not have to match the output of the command).
18.py
#Build a list of the third-column values and a list of the whole rows
values = [line.split('\t')[2] for line in open('./hightemp.txt')]
keys = [line for line in open('./hightemp.txt')]
dic = dict(zip(keys,values))
#Sort by the value in the third column
sort_dic = sorted(dic.items(), key=lambda x:x[1])
for k,v in sort_dic:
    print(k)
#Confirmation command
# $ cat hightemp.txt | sort -k3
$ python 18.py
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Yamagata Prefecture Sakata 40.1 1978-08-03
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Saitama Prefecture Koshigaya 40.4 2007-08-16
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Kofu 40.7 2013-08-10
Yamagata Prefecture Yamagata 40.8 1933-07-25
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Kochi Prefecture Ekawasaki 41 2013-08-12
$ cat hightemp.txt | sort -k3
Aichi Prefecture Nagoya 39.9 1942-08-02
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Osaka Prefecture Toyonaka 39.9 1994-08-08
Saitama Prefecture Hatoyama 39.9 1997-07-05
Chiba Prefecture Mobara 39.9 2013-08-11
Gunma Prefecture Maebashi 40 2001-07-24
Gifu Prefecture Mino 40 2007-08-16
Yamagata Prefecture Sakata 40.1 1978-08-03
Ehime Prefecture Uwajima 40.2 1927-07-22
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Chiba Prefecture Ushiku 40.2 2004-07-20
Aichi Prefecture Aisai 40.3 1994-08-05
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Saitama Prefecture Koshigaya 40.4 2007-08-16
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Yamanashi Prefecture Kofu 40.7 2013-08-10
Yamagata Prefecture Yamagata 40.8 1933-07-25
Gifu Prefecture Tajimi 40.9 2007-08-16
Saitama Prefecture Kumagaya 40.9 2007-08-16
Kochi Prefecture Ekawasaki 41 2013-08-12
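Two things to note about 18.py: it compares the third column as strings, and it sorts in ascending order even though the task asks for descending order. String comparison happens to work for this data, but '9.5' would sort after '41'. Using the rows as dict keys would also drop any duplicate rows. A possible variant, sketched on four hand-picked rows, converts the column with `float` and passes `reverse=True`.

```python
# Four rows typed by hand; a real run would read them from the file.
rows = [
    "Gifu Prefecture\tMino\t40\t2007-08-16\n",
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n",
    "Aichi Prefecture\tNagoya\t39.9\t1942-08-02\n",
    "Yamanashi Prefecture\tKofu\t40.7\t2013-08-10\n",
]

# Compare the third column as a number, largest first.
sorted_rows = sorted(
    rows,
    key=lambda row: float(row.split("\t")[2]),
    reverse=True,
)

print("".join(sorted_rows), end="")
```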
Find the frequency of occurrence of the character string in the first column of each line, and display them in descending order. Use the cut, uniq, and sort commands for confirmation.
19.py
from collections import Counter
#Extract the data in the first column
col1 = [line.split('\t')[0] for line in open('./hightemp.txt')]
#Use collections.Counter
counter = Counter(col1)
for word,count in counter.most_common():
    print(word+', '+str(count))
#Confirmation command
# $ cat ./hightemp.txt | cut -f 1 | sort | uniq -c | sort -r
$ python 19.py
Saitama Prefecture, 3
Yamagata Prefecture, 3
Yamanashi Prefecture, 3
Gunma Prefecture, 3
Gifu Prefecture, 2
Shizuoka Prefecture, 2
Aichi Prefecture, 2
Chiba Prefecture, 2
Kochi Prefecture, 1
Wakayama Prefecture, 1
Ehime Prefecture, 1
Osaka Prefecture, 1
$ cat ./hightemp.txt | cut -f 1 | sort | uniq -c | sort -r
3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Saitama Prefecture
3 Gunma Prefecture
2 Chiba Prefecture
2 Shizuoka Prefecture
2 Gifu Prefecture
2 Aichi Prefecture
1 Wakayama Prefecture
1 Osaka Prefecture
1 Kochi Prefecture
1 Ehime Prefecture
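The Python and shell results agree on the counts but order the ties differently: `Counter.most_common()` keeps equal counts in first-seen order, while the shell pipeline sorts the text. If a fixed order is wanted, one option is an explicit sort key, as in this sketch on a hand-written sample list.

```python
from collections import Counter

# Hand-written sample standing in for the first column of the file.
col1 = [
    "Gifu Prefecture",
    "Kochi Prefecture",
    "Gunma Prefecture",
    "Gifu Prefecture",
    "Gunma Prefecture",
    "Ehime Prefecture",
]

counter = Counter(col1)

# Highest count first; ties broken alphabetically by name.
ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))

for word, count in ranked:
    print(f"{count} {word}")
```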
I received comments on this post, so I revised the code as follows.
10.py
#Try using list comprehension
# num_lines = len(list(open('./hightemp.txt')))
#Generator expression
#Read line by line and count the number of lines
#Uses less memory because, unlike a list comprehension, no list is built
num_lines = sum(1 for line in open('./hightemp.txt'))
print(num_lines)
With a list comprehension, all of the file's data is temporarily held in memory. Since that builds a list for nothing, it seems better to use a generator expression wherever possible.
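To see the two forms side by side without touching hightemp.txt, this sketch counts the lines of an in-memory `io.StringIO` object (an assumption made so the example is self-contained). With a real file, the same `with` pattern also closes the file when the block ends.

```python
import io

# Three lines of text held in memory instead of a real file.
text = "a\nb\nc\n"

# List comprehension: builds a list of every line, then measures it.
with io.StringIO(text) as f:
    count_list = len([line for line in f])

# Generator expression: yields one line at a time and only adds up 1s.
with io.StringIO(text) as f:
    count_gen = sum(1 for line in f)

print(count_list, count_gen)
```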
11.py
#Convert tabs on each line to single character spaces in list comprehension
space_text = [line.expandtabs(1) for line in open('./hightemp.txt')]
#In Python 3, print takes an end keyword argument that sets the string printed at the end
#With end='' no line break is added
print(''.join(space_text),end='')
Passing end='' as a keyword argument to the print function sets the terminating string, which eliminates the strange line breaks.
12.py
#Split each row on tabs to get the first and second columns
# col1 = [line.split('\t')[0] for line in open('./hightemp.txt')]
# col2 = [line.split('\t')[1] for line in open('./hightemp.txt')]
#Generator expression
col1 = '\n'.join(line.split('\t')[0] for line in open('./hightemp.txt'))
col2 = '\n'.join(line.split('\t')[1] for line in open('./hightemp.txt'))
#Open the output files in write mode
f1 = open('./col1.txt','w')
f2 = open('./col2.txt','w')
#Write the newline-joined columns
f1.write(col1)
f2.write(col2)
#Close the files so that the data is flushed to disk
f1.close()
f2.close()
14.py
# input() reads a line from standard input
# int() converts it to an integer
input_num = int(input())
lines = [line for line in open('./hightemp.txt')]
print(''.join(lines[:input_num]))
15.py
#Same idea as 14
input_num = int(input())
lines = [line for line in open('./hightemp.txt')]
#Slicing recap
#A negative index counts from the end of the list
print(''.join(lines[-input_num:]))
16.py
input_num = int(input())
lines = [line for line in open('./hightemp.txt')]
sublist = [''.join(lines[i:i+input_num]) for i in range(0,len(lines),input_num)]
#Print each chunk to check the result
for i in sublist:
    print(i)
After input = int(input()), the name input refers to an integer, so the input function can no longer be called afterwards. I changed the variable name to input_num.
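The effect can be reproduced without reading from standard input. In this sketch the rebinding is done inside a hypothetical function (`call_after_shadowing`, with the value 9 chosen arbitrarily) so that the built-in stays usable in the rest of the program.

```python
def call_after_shadowing():
    """Rebind the name input to an integer and then try to call it."""
    input = 9  # arbitrary value; this hides the built-in inside the function
    try:
        input()  # attempts to call the integer 9
    except TypeError as exc:
        return str(exc)
    return None

message = call_after_shadowing()
print(message)
```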
18.py
#Build a list of the third-column values and a list of the whole rows
values = [line.split('\t')[2] for line in open('./hightemp.txt')]
keys = [line for line in open('./hightemp.txt')]
dic = dict(zip(keys,values))
#Sort by the value in the third column
sort_dic = sorted(dic.items(), key=lambda x:x[1])
for k,v in sort_dic:
    print(k,end='')
I removed the strange line breaks with print(k, end='').
19.py
from collections import Counter
#Extract the data in the first column
# col1 = [line.split('\t')[0] for line in open('./hightemp.txt')]
# collections.Use Counter
# counter = Counter(col1)
#Is it better to write the generator expression on one line?
counter = Counter(line.split('\t')[0] for line in open('./hightemp.txt'))
for word,count in counter.most_common():
    print(word+', '+str(count))
Except where the whole file is read into a list and then processed, I changed everything to generator expressions.