Please refer to the first post.
Added 9/24.
hightemp.txt is a file that stores records of the highest temperatures in Japan in tab-delimited format, with the columns "prefecture", "point", "℃", and "day". Write programs that perform the following processing with hightemp.txt as the input file. In addition, run the same processing with UNIX commands and compare the results against the program's output.
Receive a natural number N by means such as a command-line argument, and display only the last N lines of the input. Use the tail command for confirmation.
tail_015.py
# -*- coding:utf-8 -*-
import codecs
import subprocess


def tail(data, N):
    # Print the last N lines of the input
    num_lines = len(data)
    print(''.join(data[num_lines - N:]))


if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename, 'r', 'utf-8')
    N = 3
    tail(f.readlines(), N)
    # Confirm with the tail command
    output = subprocess.check_output(["tail", "-n", str(N), basepath + filename])
    print(output.decode('utf-8'))
result
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Impression: The tricky part was working out the index at which to start the join.
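As an aside, Python's negative slicing can express "the last N lines" directly, so the start index does not have to be computed from len(data). A minimal sketch, assuming the same hightemp.txt sits in the current directory:

# -*- coding:utf-8 -*-
import codecs

def tail(data, N):
    # data[-N:] already means "the last N elements" (assumes N >= 1)
    print(''.join(data[-N:]))

if __name__ == "__main__":
    with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
        tail(f.readlines(), 3)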
Receive a natural number N by means such as a command-line argument, and split the input file into separate files of N lines each. Achieve the same processing with the split command.
split_016.py
# -*- coding:utf-8 -*-
import codecs
import subprocess
import math


def split(data, N):
    index = 0
    # Calculate the number of files to write out
    page = math.ceil(len(data) / N)
    for i in range(0, page):
        # Join the next N lines of the list into a single string and write it out
        write_data = ''.join(data[index:N + index])
        index += N
        f = codecs.open('write_data' + str(index), 'w', 'utf-8')
        f.write(write_data)


if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    N = 15
    f = codecs.open(filename, 'r', 'utf-8')
    split(f.readlines(), N)
    output = subprocess.check_output(["split", "-l", str(N), basepath + filename])
result
The split function output the files write_data15 and write_data30.
write_data15
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)
write_data30
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
(Omitted because the result is long)
The xaa and xab files were output by the split command.
xaa
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
(Omitted because the result is long)
xab
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
(Omitted because the result is long)
Process finished with exit code 0
Impressions: When writing the split function, I had to think about how to calculate the number of output files and how to name each file when writing N lines at a time.
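In hindsight, stepping through the list with range(0, len(data), N) sidesteps both of those worries: the step takes care of the page count, and enumerate provides a running file number. A minimal sketch, assuming the same hightemp.txt in the current directory; the function and output file names (split_by_lines, write_data_part1, ...) are made up for illustration:

# -*- coding:utf-8 -*-
import codecs

def split_by_lines(data, N):
    # range() with a step of N walks through the list N lines at a time,
    # and enumerate() numbers the output files, so no math.ceil or manual index is needed
    for i, start in enumerate(range(0, len(data), N), start=1):
        with codecs.open('write_data_part' + str(i), 'w', 'utf-8') as out:
            out.write(''.join(data[start:start + N]))

if __name__ == "__main__":
    with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
        split_by_lines(f.readlines(), 15)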
Find the set of distinct strings in the first column (the set of unique values). Use the sort and uniq commands for confirmation.
sort_uniq_017.py
# -*- coding:utf-8 -*-
import codecs
import subprocess


def sort_uniq(data):
    cut_temp = []
    sort_temp = []
    uniq_temp = []
    # Equivalent of cut -f 1
    for temp in data:
        cut_temp.append(temp.split()[:1])
    # Equivalent of sort
    sort_temp = sorted(cut_temp)
    # Equivalent of uniq
    for temp in sort_temp:
        if temp not in uniq_temp:
            uniq_temp.append(temp)
    # Convert each list to str, strip the surrounding characters, and print
    sort_uniq_data = map(str, uniq_temp)
    for temp in sort_uniq_data:
        print(''.join(temp).lstrip("['").rstrip("']"))


if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    f = codecs.open(filename, 'r', 'utf-8')
    sort_uniq(f.readlines())
    print('\n')
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq"], stdin=sort.stdout, stdout=subprocess.PIPE)
    end_of_pipe = uniq.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').rstrip('\n'))
result
Chiba
Wakayama Prefecture
Saitama
Osaka
(Omitted because the result is long)
Chiba
Wakayama Prefecture
Saitama
Osaka
Yamagata Prefecture
(Omitted because the result is long)
Process finished with exit code 0
Impressions: I didn't initially know how to build a pipe with the subprocess module. On Linux you only need |, but writing it as a program makes it clear what parameters each stage actually needs.
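If a one-to-one mapping onto the command pipeline is not required, there are also shorter ways to get the same result. A minimal sketch, assuming hightemp.txt is tab-delimited and sits in the current directory:

# -*- coding:utf-8 -*-
import codecs
import subprocess

# Passing shell=True lets the whole pipeline be written as one string,
# at the cost of going through the shell
output = subprocess.check_output("cut -f 1 hightemp.txt | sort | uniq", shell=True)
print(output.decode('utf-8'))

# The pure-Python counterpart of cut -f 1 | sort | uniq is a sorted set of the first column
with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
    print('\n'.join(sorted({line.split('\t')[0] for line in f})))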
Sort the lines in descending order of the numbers in the third column (Note: sort without changing the contents of each line). Use the sort command for confirmation (this problem does not have to match the result of running the command).
r_sort_018.py
# -*- coding:utf-8 -*-
import codecs
import subprocess
import operator


def r_sort(data):
    cut_temp = []
    sort_temp = []
    # Split each line into a list of columns
    for temp in data:
        cut_temp.append(temp.split())
    # Equivalent of sort: order by the third column, descending (compared as strings)
    sort_temp = sorted(cut_temp, key=operator.itemgetter(2), reverse=True)
    # Convert each list to str, strip the surrounding characters, and print
    sort_data = map(str, sort_temp)
    for temp in sort_data:
        print(''.join(temp).lstrip("['").rstrip("']"))


if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(filename, 'r', 'utf-8') as f:
        r_sort(f.readlines())
    print('\n')
    sort = subprocess.check_output(["sort", "-r", "-k", "3", basepath + filename])
    print(sort.decode('utf-8'))
result
Kochi Prefecture', 'Ekawasaki', '41', '2013-08-12
Saitama', 'Kumagaya', '40.9', '2007-08-16
Gifu Prefecture', 'Tajimi', '40.9', '2007-08-16
(Omitted because the result is long)
Kochi Prefecture Ekawasaki 41 2013-08-12
Gifu Prefecture Tajimi 40.9 2007-08-16
Saitama Prefecture Kumagaya 40.9 2007-08-16
(Omitted because the result is long)
Process finished with exit code 0
Impressions: The itemgetter function of the operator module was useful.
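One caveat is that itemgetter(2) compares the third column as strings; converting the value to float in the sort key makes the ordering numeric, and the lines can be printed unchanged. A minimal sketch, assuming the same tab-delimited hightemp.txt in the current directory:

# -*- coding:utf-8 -*-
import codecs

with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
    lines = f.readlines()

# Sort by the numeric value of the third column, descending, without altering each line
for line in sorted(lines, key=lambda row: float(row.split('\t')[2]), reverse=True):
    print(line.rstrip('\n'))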
Find the frequency of occurrence of each string in the first column, and display them in descending order of frequency. Use the cut, uniq, and sort commands for confirmation.
frequency_019.py
# -*- coding:utf-8 -*-
import codecs
import subprocess
import collections
import operator


def frequency(data):
    cut_temp = []
    sort_temp = []
    count_dict = {}
    # Equivalent of cut -f 1
    for temp in data:
        cut_temp.append(temp.split()[:1])
    # Equivalent of sort
    sort_temp = sorted(cut_temp)
    # Equivalent of uniq -c + sort: count the occurrences of each element
    count_dict = collections.Counter(map(str, sort_temp))
    for value, count in sorted(count_dict.items(), key=operator.itemgetter(1), reverse=True):
        print(count, str(value).lstrip("['").rstrip("']"))


if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(filename, 'r', 'utf-8') as f:
        frequency(f.readlines())
    print('\n')
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort1 = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort1.stdout, stdout=subprocess.PIPE)
    sort2 = subprocess.Popen(["sort", "-r"], stdin=uniq.stdout, stdout=subprocess.PIPE)
    end_of_pipe = sort2.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').lstrip(' ').rstrip('\n'))
result
3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Gunma Prefecture
3 Saitama Prefecture
2 Gifu Prefecture
2 Chiba
(Omitted because the result is long)
3 Gunma Prefecture
3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Saitama Prefecture
2 Shizuoka Prefecture
2 Aichi Prefecture
(Omitted because the result is long)
Process finished with exit code 0
Impressions: It was difficult to handle and sort dictionaries.
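For what it's worth, collections.Counter also has a most_common() method that returns (value, count) pairs already sorted in descending order of count, which removes the manual dictionary sorting. A minimal sketch, assuming the same tab-delimited hightemp.txt in the current directory:

# -*- coding:utf-8 -*-
import codecs
import collections

with codecs.open('hightemp.txt', 'r', 'utf-8') as f:
    # Count how often each first-column value appears
    counter = collections.Counter(line.split('\t')[0] for line in f)

# most_common() yields (value, count) pairs already sorted by count, descending
for value, count in counter.most_common():
    print(count, value)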