About the history so far

Please refer to the first post.

Knock status

9/24 added

Chapter 2: UNIX Command Basics

hightemp.txt is a tab-delimited file that records the highest temperatures observed in Japan, with the columns "prefecture", "point", "℃", and "day". Create a program that performs each of the following tasks, and run it with hightemp.txt as the input file. Then run the same task with UNIX commands and check the program's result against them.

010. Counting the number of lines

Count the number of lines. Use the wc command for confirmation.

`wc_010.py`


# -*- coding: utf-8 -*-

import subprocess
import codecs

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'

    # Count the lines. enumerate() starts at 0, so add 1 at the end.
    index = -1  # stays -1 (a count of 0) if the file is empty
    with codecs.open(filename, 'r', 'utf-8') as f:
        for index, data in enumerate(f):
            pass

    print("The number of lines in the file", index + 1)

    # Check the output with the wc command
    output = subprocess.check_output(["wc", "-l", basepath + filename])
    print(output.decode('utf-8'))

`result`


The number of lines in the file 24
      24 /Users/masassy/PycharmProjects/Pywork/training/hightemp.txt

Impressions: I opened the file and counted the lines using the index from enumerate. The codecs module is convenient because it lets you read a file by specifying its character encoding.
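As a possible alternative to tracking the index, the count can be taken directly from the file iterator. This is only a minimal sketch: `count_lines` is a hypothetical helper name, and the in-memory `sample` stands in for hightemp.txt so that the snippet runs on its own.

```python
import io


def count_lines(stream):
    """Count lines by exhausting the iterator, one item per line."""
    return sum(1 for _ in stream)


# Hypothetical three-line sample standing in for hightemp.txt.
sample = io.StringIO("a\t1\nb\t2\nc\t3\n")
print(count_lines(sample))  # prints 3
```

With a real file on Python 3, the same helper should work with the built-in `open('hightemp.txt', encoding='utf-8')`, which also accepts an encoding.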

011. Replace tabs with spaces

Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.

`tab2space_011.py`


# -*- coding: utf-8 -*-

import subprocess
import codecs

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    # read() returns the whole file as one string, readline() returns one line,
    # and readlines() returns all lines as a list
    with codecs.open(filename, 'r', 'utf-8') as f:
        r = f.read()
    space_data = ''
    # Copy the text character by character, swapping each tab for a space
    for tab_data in r:
        if tab_data == '\t':
            space_data += " "
        else:
            space_data += tab_data

    print(space_data)
    # Check the output with the sed command
    output = subprocess.check_output(["sed", "-e", "s/\t/ /g", basepath + filename])
    print(output.decode('utf-8'))

`result`


Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)

Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)

Process finished with exit code 0

Impressions: I was able to confirm the difference between read(), readline(), and readlines(). The subprocess module, which lets you run commands from Python, is really convenient.
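The difference between the three read methods, and a shorter way to do the replacement, can be sketched as follows. This is an illustration under assumptions rather than the script used in this post: `tabs_to_spaces` is a hypothetical helper name, and the two-line `text` is a small sample standing in for hightemp.txt.

```python
import io

# Two sample rows standing in for hightemp.txt.
text = (
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n"
    "Gifu Prefecture\tTajimi\t40.9\t2007-08-16\n"
)

whole = io.StringIO(text).read()       # one str holding the entire content
first = io.StringIO(text).readline()   # one str: the first line, newline included
lines = io.StringIO(text).readlines()  # list of str, one element per line


def tabs_to_spaces(s):
    """Replace every tab with a single space using str.replace."""
    return s.replace('\t', ' ')


print(tabs_to_spaces(whole), end='')
```

`str.replace` does the same job as the character-by-character loop, without building the string one character at a time.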

012. Save the first column in col1.txt and the second column in col2.txt

Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.

`cut_012.py`


# -*- coding: utf-8 -*-

import codecs
import subprocess

if __name__ == "__main__":
    filename = 'hightemp.txt'
    writename1 = 'col1.txt'
    writename2 = 'col2.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    with codecs.open(filename, 'r', 'utf-8') as f:
        r = f.readlines()
    word_list1 = []
    word_list2 = []

    # Split each line on \t and add the first column to its list
    for temp1 in r:
        word_list1.append(temp1.split('\t')[0])
    # The with block closes the file (the original f.close lacked parentheses, so it never ran)
    with codecs.open(writename1, 'w', 'utf-8') as f:
        for word in word_list1:
            f.write(word + '\n')

    # Do the same for the second column
    for temp2 in r:
        word_list2.append(temp2.split('\t')[1])
    with codecs.open(writename2, 'w', 'utf-8') as f:
        for word in word_list2:
            f.write(word + '\n')

    # Check the output with the cut command
    output = subprocess.check_output(["cut", "-f", "1,2", basepath + filename])
    print(output.decode('utf-8'))

`result`


* The cut command outputs the 1st and 2nd columns together.
Kochi Prefecture	Ekawasaki
Saitama Prefecture	Kumagaya
Gifu Prefecture	Tajimi
(Omitted because the result is long)

Process finished with exit code 0

col1.txt
Kochi Prefecture
Saitama Prefecture
Gifu Prefecture
(Omitted because the result is long)

col2.txt
Ekawasaki
Kumagaya
Tajimi
(Omitted because the result is long)

Impressions: I handled col1.txt and col2.txt in separate passes, but there is probably a better way to do this...
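One possible way to avoid the two separate passes is to split each line once and collect both columns together. The sketch below is a suggestion, not the approach used in this post: `split_columns` and `write_lines` are hypothetical helper names, the two sample rows stand in for hightemp.txt, and the output goes to a temporary directory so the snippet does not touch real files.

```python
import os
import tempfile


def split_columns(lines):
    """Return (col1, col2) lists from tab-delimited lines in a single pass."""
    col1, col2 = [], []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        col1.append(fields[0])
        col2.append(fields[1])
    return col1, col2


def write_lines(path, items):
    """Write one item per line; the with block closes the file."""
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(item + '\n' for item in items)


# Two sample rows standing in for hightemp.txt.
sample = [
    "Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n",
    "Gifu Prefecture\tTajimi\t40.9\t2007-08-16\n",
]
col1, col2 = split_columns(sample)

with tempfile.TemporaryDirectory() as tmp:
    write_lines(os.path.join(tmp, 'col1.txt'), col1)
    write_lines(os.path.join(tmp, 'col2.txt'), col2)
    with open(os.path.join(tmp, 'col2.txt'), encoding='utf-8') as f:
        print(f.read(), end='')
```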

013. Merge col1.txt and col2.txt

Combine the col1.txt and col2.txt created in 012 into a text file in which the first and second columns of the original file are arranged side by side, tab-delimited. Use the paste command for confirmation.

`merge_013.py`


# -*- coding: utf-8 -*-

import codecs
import subprocess
basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
filename1 = 'col1.txt'
filename2 = 'col2.txt'
filename3 = 'col3.txt'

# Read each file with readlines() to get a list of lines
f1 = codecs.open(filename1, 'r', 'utf-8')
r1 = f1.readlines()
f1.close()

f2 = codecs.open(filename2, 'r', 'utf-8')
r2 = f2.readlines()
f2.close()

s_r1 = ''
s_r2 = ''

# Join the r1 list into one string and change each \n to \t (\t acts as the sentinel)
for data in r1:
    s_r1 += str(data)
s_r1 = s_r1.replace('\n', '\t')

# Join the r2 list into one string (\n is kept as is because it acts as the sentinel)
for data in r2:
    s_r2 += str(data)

address = ''
i = 0
# Scan s_r1 character by character, adding to address until the sentinel (\t)
for temp in s_r1:
    if temp != '\t':
        address += temp
    else:
        # Then add s_r2 data to address until its sentinel (\n)
        address += '\t'
        while s_r2[i] != '\n':
            address += s_r2[i]
            i += 1
        address += '\n'
        i += 1

f3 = codecs.open(filename3, 'w', 'utf-8')
f3.write(address)
f3.close()

# Check the output with the paste command
output = subprocess.check_output(["paste", basepath + filename1, basepath + filename2])
print(output.decode('utf-8'))

`result`


Kochi Prefecture	Ekawasaki
Saitama Prefecture	Kumagaya
Gifu Prefecture	Tajimi
(Omitted because the result is long)
Process finished with exit code 0

Impressions: I added the data using a double loop.
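The double loop can probably be avoided by pairing the lines of the two files with `zip`. The following is a minimal sketch under that assumption; `merge_columns` is a hypothetical helper name, and the short lists stand in for the result of calling readlines() on col1.txt and col2.txt.

```python
def merge_columns(col1_lines, col2_lines):
    """Join corresponding lines with a tab, one merged row per line."""
    return ''.join(
        a.rstrip('\n') + '\t' + b.rstrip('\n') + '\n'
        for a, b in zip(col1_lines, col2_lines)
    )


# Sample lines standing in for readlines() on col1.txt and col2.txt.
r1 = ["Kochi Prefecture\n", "Gifu Prefecture\n"]
r2 = ["Ekawasaki\n", "Tajimi\n"]
print(merge_columns(r1, r2), end='')
```

Note that `zip` stops at the shorter input, whereas paste keeps going until every file is exhausted, so the two only agree when both files have the same number of lines.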

014. Output N lines from the beginning

Receive the natural number N by means such as command line arguments, and display only the first N lines of the input. Use the head command for confirmation.

`head_014.py`


# -*- coding: utf-8 -*-

import codecs
import subprocess

def head(data, N):
    i = 0  # number of complete lines copied so far
    j = 0  # current position in data
    msg = ''
    # Copy characters until N newlines have been seen or the data runs out
    while i < N and j < len(data):
        temp = data[j]
        msg += temp
        if temp == '\n':
            i += 1
        j += 1
    return msg

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    with codecs.open(filename, 'r', 'utf-8') as f:
        r = f.read()
    N = 4
    msg = head(r, N)
    print(msg)

    # Check the output with the head command
    output = subprocess.check_output(["head", "-n", str(N), basepath + filename])
    print(output.decode('utf-8'))

`result`


Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
Gifu Prefecture	Tajimi	40.9	2007-08-16
Yamagata Prefecture	Yamagata	40.8	1933-07-25

Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
Gifu Prefecture	Tajimi	40.9	2007-08-16
Yamagata Prefecture	Yamagata	40.8	1933-07-25

Process finished with exit code 0

Impressions: It ended up looking a lot like C...
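A less C-like version might lean on `itertools.islice`, which takes the first N items from an iterator, and could also read N from the command line as the task suggests. This is only a sketch under assumptions: `parse_n` and `head` are hypothetical helper names, the default of 4 mirrors the hard-coded N above, and the argument list is passed in explicitly so the snippet runs without real command-line arguments.

```python
import io
from itertools import islice


def parse_n(argv, default=4):
    """Return N from argv[1], or `default` when no argument is given."""
    if len(argv) < 2:
        return default
    n = int(argv[1])
    if n < 1:
        raise ValueError("N must be a natural number")
    return n


def head(stream, n):
    """Return the first n lines of a file-like object as one string."""
    return ''.join(islice(stream, n))


# Hypothetical sample and argument list; a real script would pass sys.argv.
sample = io.StringIO("line1\nline2\nline3\n")
n = parse_n(["head_sketch.py", "2"])
print(head(sample, n), end='')
```

Because `islice` simply stops when the input ends, asking for more lines than the file contains returns the whole file instead of raising an error.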

[Python] Challenge 100 knocks! (010-014)
