Please refer to First Post
9/24 added
hightemp.txt is a file that stores records of the highest temperatures in Japan in tab-delimited format, with the columns "prefecture", "point", "℃", and "day". Create a program that performs the following processing and run it with hightemp.txt as the input file. In addition, run the same processing with UNIX commands and check the program's results against them.
Count the number of lines. Use the wc command for confirmation.
wc_010.py
# -*- coding:utf-8 -*-
import subprocess
import codecs

if __name__=="__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename,'r','utf-8')
    # Count the lines. enumerate starts at 0, so add 1 at the end
    for index,data in enumerate(f):
        pass
    f.close()
    print("The number of lines in the file",index+1)
    # Check the output with the wc command
    output = subprocess.check_output(["wc","-l",basepath+filename])
    print(output.decode('utf-8'))
result
The number of lines in the file 24
24 /Users/masassy/PycharmProjects/Pywork/training/hightemp.txt
Impressions: I opened the file and counted the lines using the index from enumerate. The codecs module is convenient because it can read a file with a specified character encoding.
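The counting loop above can also be written without keeping the index around. The following is only a minimal sketch, not the method used in this post: `count_lines` is a hypothetical helper name, and an in-memory stream stands in for hightemp.txt so the example runs on its own.

```python
import io


def count_lines(stream):
    """Return the number of lines in an already opened text stream."""
    # Iterating over a text stream yields one line at a time,
    # so adding 1 per iteration counts the lines.
    return sum(1 for _ in stream)


# In-memory sample standing in for hightemp.txt (assumption for the demo)
sample = io.StringIO("a\tb\nc\td\ne\tf\n")
print(count_lines(sample))  # 3
```

With a real file, the same function could be called inside `with open('hightemp.txt', encoding='utf-8') as f:`. Unlike the `index+1` approach, it returns 0 for an empty file instead of failing. Note that `wc -l` counts newline characters, so the two can differ by one when the last line has no trailing newline.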
Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.
tab2space_011.py
# -*- coding:utf-8 -*-
import subprocess
import codecs

if __name__=="__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename,'r','utf-8')
    # read() returns the whole file as one string, readline() returns one line,
    # readlines() returns a list of all lines
    r = f.read()
    f.close()
    space_data=''
    # Copy character by character, replacing each tab with one space
    for tab_data in r:
        if(tab_data=='\t'):
            space_data += " "
        else:
            space_data += tab_data
    print(space_data)
    # Check the output with the sed command
    # (Python turns "\t" into a real tab character before sed sees the pattern)
    output =subprocess.check_output(["sed","-e" ,"s/\t/ /g",basepath+filename])
    print(output.decode('utf-8'))
result
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)
Process finished with exit code 0
Impressions: I was able to confirm the difference between read(), readline(), and readlines(). The subprocess module, which lets you run commands from Python, is really convenient.
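For comparison, the character-by-character loop can be expressed with the built-in `str.replace`. This is a sketch rather than the approach taken in this post; `tabs_to_spaces` is a hypothetical helper name and the sample line is made up to match the file's column layout.

```python
def tabs_to_spaces(text):
    """Replace every tab character in text with a single space."""
    # str.replace returns a new string with all occurrences substituted.
    return text.replace('\t', ' ')


# One tab-delimited sample line standing in for the contents of hightemp.txt
sample = "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n"
print(tabs_to_spaces(sample), end='')
```

Because strings are immutable, building the result with `+=` one character at a time creates many intermediate strings, so a single `replace` call is usually the simpler choice.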
Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.
cut_012.py
# -*- coding:utf-8 -*-
import codecs
import subprocess

if __name__ == "__main__":
    filename = 'hightemp.txt'
    writename1='col1.txt'
    writename2='col2.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename,'r','utf-8')
    r = f.readlines()
    f.close()
    word_list1= []
    word_list2= []
    # Split each line on \t and add the first column to the list
    for temp1 in r:
        word_list1.append(temp1.split('\t')[0])
    f = codecs.open(writename1,'w','utf-8')
    for word in word_list1:
        f.write(word+'\n')
    f.close()
    # Do the same for the second column
    for temp2 in r:
        word_list2.append(temp2.split('\t')[1])
    f = codecs.open(writename2,'w','utf-8')
    for word in word_list2:
        f.write(word+'\n')
    f.close()
    # Check the output with the cut command
    output = subprocess.check_output(["cut","-f","1,2",basepath+filename])
    print(output.decode('utf-8'))
result
*The cut command outputs the 1st and 2nd columns at the same time.
Kochi Prefecture Ekawasaki
Saitama Prefecture Kumagaya
Gifu Prefecture Tajimi
(Omitted because the result is long)
Process finished with exit code 0
col1.txt
Kochi Prefecture
Saitama Prefecture
Gifu Prefecture
(Omitted because the result is long)
col2.txt
Ekawasaki
Kumagaya
Tajimi
(Omitted because the result is long)
Impression: I split the processing into separate passes for col1.txt and col2.txt, but there is probably a better way to do it ...
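One possible way to avoid the two separate passes is to split each line once and collect both columns together. The sketch below is only an illustration under that assumption; `split_columns` is a hypothetical name and the sample lines stand in for the result of `readlines()`.

```python
def split_columns(lines):
    """Return (first column values, second column values) for tab-delimited lines."""
    col1, col2 = [], []
    for line in lines:
        # Drop the trailing newline, then split the line into its fields once.
        fields = line.rstrip('\n').split('\t')
        col1.append(fields[0])
        col2.append(fields[1])
    return col1, col2


# Sample lines standing in for hightemp.txt
sample = [
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n",
    "Gifu Prefecture\tTajimi\t40.9\t2007-08-16\n",
]
first, second = split_columns(sample)
print(first)
print(second)
```

Each returned list could then be written out with something like `f.write('\n'.join(first) + '\n')`, once for col1.txt and once for col2.txt.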
Combine the col1.txt and col2.txt created in 12, and create a text file in which the first and second columns of the original file are arranged by tab delimiters. Use the paste command for confirmation.
merge_013.py
# -*- coding:utf-8 -*-
import codecs
import subprocess

basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
filename1 = 'col1.txt'
filename2 = 'col2.txt'
filename3 = 'col3.txt'
# Read the files with readlines to get lists of lines
f1 = codecs.open(filename1,'r','utf-8')
r1 = f1.readlines()
f1.close()
f2 = codecs.open(filename2,'r','utf-8')
r2 = f2.readlines()
f2.close()
s_r1=''
s_r2=''
# Join the list into a string; for r1, replace \n with \t (\t acts as a sentinel)
for data in r1:
    s_r1 += str(data)
s_r1=s_r1.replace('\n','\t')
# Join the list into a string (\n is left as it is because it is the sentinel)
for data in r2:
    s_r2 += str(data)
address=''
i=0
# Walk through s_r1 character by character, adding to address until the sentinel (\t)
for temp in s_r1:
    if(temp!='\t'):
        address+=temp
    else:
        # Then add s_r2 data to address until its sentinel (\n)
        address+='\t'
        while(s_r2[i]!='\n'):
            address+=s_r2[i]
            i+=1
        else:
            address+='\n'
            i+=1
        continue
f3=codecs.open(filename3,'w','utf-8')
f3.write(address)
f3.close()
# Check the output with the paste command
output=subprocess.check_output(["paste",basepath+filename1,basepath+filename2])
print(output.decode('utf-8'))
result
Kochi Prefecture Ekawasaki
Saitama Prefecture Kumagaya
Gifu Prefecture Tajimi
(Omitted because the result is long)
Process finished with exit code 0
Impressions: I appended the data using a double loop.
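The sentinel-based double loop can be compared with pairing the lines directly. The following sketch assumes both inputs are lists of lines as returned by `readlines()`; the function name `paste` is chosen here only to mirror the command and is not from this post.

```python
def paste(lines1, lines2):
    """Join corresponding lines of two line lists with a tab, one pair per output line."""
    # zip pairs the first line of each list, then the second, and so on.
    return ''.join(
        a.rstrip('\n') + '\t' + b.rstrip('\n') + '\n'
        for a, b in zip(lines1, lines2)
    )


# Sample lines standing in for col1.txt and col2.txt
col1 = ["Kochi Prefecture\n", "Gifu Prefecture\n"]
col2 = ["Ekawasaki\n", "Tajimi\n"]
print(paste(col1, col2), end='')
```

One difference to keep in mind: `zip` stops at the shorter input, whereas the paste command keeps going and treats the exhausted file as empty lines. For col1.txt and col2.txt, which have the same number of lines, the results should match.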
Receive the natural number N by means such as command line arguments, and display only the first N lines of the input. Use the head command for confirmation.
head_014.py
# -*- coding:utf-8 -*-
import codecs
import subprocess

def head(data,N):
    # data is the whole file as one string; i counts lines, j is the character position
    i=0
    j=0
    msg=''
    # Stop after N lines, or at the end of the data if it has fewer than N lines
    while(i<N and j<len(data)):
        for temp in data[j]:
            if(temp!='\n'):
                msg += temp
                j+=1
            else:
                msg += '\n'
                i+=1
                j+=1
                break
    else:
        return msg

if __name__=="__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename,'r','utf-8')
    r=f.read()
    f.close()
    # N is fixed here rather than read from the command line
    N=4
    msg = head(r,N)
    print(msg)
    # Confirm with the head command
    output=subprocess.check_output(["head","-n",str(N),basepath+filename])
    print(output.decode('utf-8'))
result
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Process finished with exit code 0
Impressions: It ended up looking like C code...
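A less C-like way to take the first N lines is to let the standard library stop the iteration. This is a sketch under the assumption that the input is an open text stream; the in-memory stream below stands in for hightemp.txt, and this `head` is a separate illustration, not a drop-in replacement for the function above.

```python
import io
import itertools


def head(stream, n):
    """Return the first n lines of a text stream as a single string."""
    # islice stops after n lines, or earlier if the stream runs out.
    return ''.join(itertools.islice(stream, n))


# In-memory sample standing in for hightemp.txt
sample = io.StringIO("line1\nline2\nline3\n")
print(head(sample, 2), end='')
```

To follow the task statement more closely, N could be taken from the command line, for example with `int(sys.argv[1])` after `import sys`, instead of being fixed in the script.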