Please refer to the first post.
Added 9/24.
hightemp.txt is a file that stores records of the highest temperatures in Japan in tab-delimited format, with the columns "prefecture", "point", "℃", and "day". Write programs that perform the following processing with hightemp.txt as the input file. In addition, run the same processing with UNIX commands and compare the results against the program's output.
Receive a natural number N by means such as a command-line argument, and display only the last N lines of the input. Use the tail command for confirmation.
tail_015.py
# -*- coding:utf-8 -*-
import codecs
import subprocess


def tail(data, N):
    # Print the last N lines of the input
    num_lines = len(data)
    print(''.join(data[num_lines - N:]))


if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename, 'r', 'utf-8')
    N = 3
    tail(f.readlines(), N)
    # Confirm with the tail command
    output = subprocess.check_output(["tail", "-n", str(N), basepath + filename])
    print(output.decode('utf-8'))
result
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Impression: The tricky part was working out the index at which to start the join.
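As an aside, Python's negative slicing can express "the last N lines" directly, so the start index does not have to be computed from len(data). A minimal sketch, assuming the same hightemp.txt sits in the current directory:

# -*- coding:utf-8 -*-
import codecs

def tail(data, N):
    # data[-N:] already means "the last N elements" (assumes N >= 1)
    print(''.join(data[-N:]))

if __name__ == "__main__":
    with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
        tail(f.readlines(), 3)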
Receive a natural number N by means such as a command-line argument, and split the input file into separate files of N lines each. Achieve the same processing with the split command.
split_016.py
# -*- coding:utf-8 -*-
import codecs
import subprocess
import math


def split(data, N):
    index = 0
    # Calculate the number of files to write out
    page = math.ceil(len(data) / N)
    for i in range(0, page):
        # Join the next N lines of the list into a single string and write it out
        write_data = ''.join(data[index:N + index])
        index += N
        f = codecs.open('write_data' + str(index), 'w', 'utf-8')
        f.write(write_data)


if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    N = 15
    f = codecs.open(filename, 'r', 'utf-8')
    split(f.readlines(), N)
    output = subprocess.check_output(["split", "-l", str(N), basepath + filename])
result
The split function output the files write_data15 and write_data30.
write_data15
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)
write_data30
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
(Omitted because the result is long)
The xaa and xab files were output by the split command.
xaa
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)
xab
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
(Omitted because the result is long)
Process finished with exit code 0
Impressions: When writing the split function, I had to think about how to calculate the number of output files and how to name each file when writing N lines at a time.
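In hindsight, stepping through the list with range(0, len(data), N) sidesteps both of those worries: the step takes care of the page count, and enumerate provides a running file number. A minimal sketch, assuming the same hightemp.txt in the current directory; the function and output file names (split_by_lines, write_data_part1, ...) are made up for illustration:

# -*- coding:utf-8 -*-
import codecs

def split_by_lines(data, N):
    # range() with a step of N walks through the list N lines at a time,
    # and enumerate() numbers the output files, so no math.ceil or manual index is needed
    for i, start in enumerate(range(0, len(data), N), start=1):
        with codecs.open('write_data_part' + str(i), 'w', 'utf-8') as out:
            out.write(''.join(data[start:start + N]))

if __name__ == "__main__":
    with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
        split_by_lines(f.readlines(), 15)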
Find the set of distinct strings in the first column (the set of unique values). Use the sort and uniq commands for confirmation.
sort_uniq_017.py
# -*- coding:utf-8 -*-
import codecs
import subprocess


def sort_uniq(data):
    cut_temp = []
    sort_temp = []
    uniq_temp = []
    # Equivalent of cut -f 1
    for temp in data:
        cut_temp.append(temp.split()[:1])
    # Equivalent of sort
    sort_temp = sorted(cut_temp)
    # Equivalent of uniq
    for temp in sort_temp:
        if temp not in uniq_temp:
            uniq_temp.append(temp)
    # Convert each list to str, strip the surrounding characters, and print
    sort_uniq_data = map(str, uniq_temp)
    for temp in sort_uniq_data:
        print(''.join(temp).lstrip("['").rstrip("']"))


if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename, 'r', 'utf-8')
    sort_uniq(f.readlines())
    print('\n')
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq"], stdin=sort.stdout, stdout=subprocess.PIPE)
    end_of_pipe = uniq.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').rstrip('\n'))
result
Chiba
Wakayama Prefecture
Saitama
Osaka
(Omitted because the result is long)
Chiba
Wakayama Prefecture
Saitama
Osaka
Yamagata Prefecture
(Omitted because the result is long)
Process finished with exit code 0
Impressions: I didn't initially know how to build a pipe with the subprocess module. On Linux you only need |, but writing it as a program makes it clear what parameters each stage actually needs.
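If a one-to-one mapping onto the command pipeline is not required, there are also shorter ways to get the same result. A minimal sketch, assuming hightemp.txt is tab-delimited and sits in the current directory:

# -*- coding:utf-8 -*-
import codecs
import subprocess

# Passing shell=True lets the whole pipeline be written as one string,
# at the cost of going through the shell
output = subprocess.check_output("cut -f 1 hightemp.txt | sort | uniq", shell=True)
print(output.decode('utf-8'))

# The pure-Python counterpart of cut -f 1 | sort | uniq is a sorted set of the first column
with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
    print('\n'.join(sorted({line.split('\t')[0] for line in f})))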
Sort the lines in descending order of the numbers in the third column (Note: sort without changing the contents of each line). Use the sort command for confirmation (this problem does not have to match the result of running the command).
r_sort_018.py
# -*- coding:utf-8 -*-
import codecs
import subprocess
import operator


def r_sort(data):
    cut_temp = []
    sort_temp = []
    # Split each line into a list of columns
    for temp in data:
        cut_temp.append(temp.split())
    # Equivalent of sort: order by the third column, descending (compared as strings)
    sort_temp = sorted(cut_temp, key=operator.itemgetter(2), reverse=True)
    # Convert each list to str, strip the surrounding characters, and print
    sort_data = map(str, sort_temp)
    for temp in sort_data:
        print(''.join(temp).lstrip("['").rstrip("']"))


if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(filename, 'r', 'utf-8') as f:
        r_sort(f.readlines())
    print('\n')
    sort = subprocess.check_output(["sort", "-r", "-k", "3", basepath + filename])
    print(sort.decode('utf-8'))
result
Kochi Prefecture', 'Ekawasaki', '41', '2013-08-12
Saitama', 'Kumagaya', '40.9', '2007-08-16
Gifu Prefecture', 'Tajimi', '40.9', '2007-08-16
(Omitted because the result is long)
Kochi Prefecture Ekawasaki 41 2013-08-12
Gifu Prefecture Tajimi 40.9 2007-08-16
Saitama Prefecture Kumagaya 40.9 2007-08-16
(Omitted because the result is long)
Process finished with exit code 0
Impressions: The itemgetter function of the operator module was useful.
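One caveat is that itemgetter(2) compares the third column as strings; converting the value to float in the sort key makes the ordering numeric, and the lines can be printed unchanged. A minimal sketch, assuming the same tab-delimited hightemp.txt in the current directory:

# -*- coding:utf-8 -*-
import codecs

with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
    lines = f.readlines()

# Sort by the numeric value of the third column, descending, without altering each line
for line in sorted(lines, key=lambda row: float(row.split('\t')[2]), reverse=True):
    print(line.rstrip('\n'))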
Find the frequency of occurrence of each string in the first column, and display them in descending order of frequency. Use the cut, uniq, and sort commands for confirmation.
frequency_019.py
# -*- coding:utf-8 -*-
import codecs
import subprocess
import collections
import operator


def frequency(data):
    cut_temp = []
    sort_temp = []
    count_dict = {}
    # Equivalent of cut -f 1
    for temp in data:
        cut_temp.append(temp.split()[:1])
    # Equivalent of sort
    sort_temp = sorted(cut_temp)
    # Equivalent of uniq -c + sort: count the occurrences of each element
    count_dict = collections.Counter(map(str, sort_temp))
    for value, count in sorted(count_dict.items(), key=operator.itemgetter(1), reverse=True):
        print(count, str(value).lstrip("['").rstrip("']"))


if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(filename, 'r', 'utf-8') as f:
        frequency(f.readlines())
    print('\n')
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort1 = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort1.stdout, stdout=subprocess.PIPE)
    sort2 = subprocess.Popen(["sort", "-r"], stdin=uniq.stdout, stdout=subprocess.PIPE)
    end_of_pipe = sort2.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').lstrip(' ').rstrip('\n'))
result
3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Gunma Prefecture
3 Saitama Prefecture
2 Gifu Prefecture
2 Chiba
(Omitted because the result is long)
3 Gunma Prefecture
3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Saitama Prefecture
2 Shizuoka Prefecture
2 Aichi Prefecture
(Omitted because the result is long)
Process finished with exit code 0
Impressions: It was difficult to handle and sort dictionaries.
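For what it's worth, collections.Counter also has a most_common() method that returns (value, count) pairs already sorted in descending order of count, which removes the manual dictionary sorting. A minimal sketch, assuming the same tab-delimited hightemp.txt in the current directory:

# -*- coding:utf-8 -*-
import codecs
import collections

with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
    # Count how often each first-column value appears
    counter = collections.Counter(line.split('\t')[0] for line in f)

# most_common() yields (value, count) pairs already sorted by count, descending
for value, count in counter.most_common():
    print(count, value)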