This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of past knocks is here (http://qiita.com/segavvy/items/fb50ba8097d59475f760).
hightemp.txt is a file that stores records of the highest temperatures in Japan in a tab-delimited format with the columns "prefecture", "point", "℃", and "day". Create programs that perform the following processing, using hightemp.txt as the input file. In addition, run the same processing with UNIX commands and check the program's results against them.
Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands for confirmation.
main.py
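# coding: utf-8
fname = 'hightemp.txt'
with open(fname) as data_file:

    # Collect the first column into a set, which drops duplicates
    set_ken = set()
    for line in data_file:
        cols = line.split('\t')
        set_ken.add(cols[0])

# Print the distinct values (the iteration order of a set is arbitrary)
for n in set_ken:
    print(n)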
Execution result 1
Yamagata Prefecture
Wakayama Prefecture
Gifu Prefecture
Osaka
Ehime Prefecture
Saitama
Aichi Prefecture
Kochi Prefecture
Gunma Prefecture
Chiba
Yamanashi Prefecture
Shizuoka Prefecture
Execution result 2
Gifu Prefecture
Yamagata Prefecture
Shizuoka Prefecture
Aichi Prefecture
Osaka
Kochi Prefecture
Gunma Prefecture
Chiba
Yamanashi Prefecture
Ehime Prefecture
Wakayama Prefecture
Saitama
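When I ran it twice, the order of the output changed. I think this is because string hashes are randomized for each Python process, as described under the -R command-line option in the Python documentation, so the iteration order of a set of strings can differ from run to run.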
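To check that explanation, here is a minimal sketch that runs the same one-liner in fresh interpreters while setting the PYTHONHASHSEED environment variable. The helper name `set_order` and the sample strings are made up for illustration and are not part of the original program. With a fixed seed the order should be reproducible, while different seeds may (but need not) give a different order.

```python
import os
import subprocess
import sys

# One-liner executed in a child interpreter: print the iteration order of a
# small set of strings (illustrative values, not read from hightemp.txt).
SNIPPET = "print(' '.join(set(['Gifu', 'Osaka', 'Chiba', 'Saitama', 'Kochi'])))"


def set_order(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed twice: the order is the same in both runs.
    print(set_order(0))
    print(set_order(0))
    # Different seeds: the order may differ between these runs.
    print(set_order(1))
    print(set_order(2))
```

Setting PYTHONHASHSEED=0 disables hash randomization, which is handy for reproducing a run, but for this task it is simpler not to rely on set order at all and to sort the output instead.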
test.sh
#!/bin/sh
#Cut out the first column, sort, and remove duplicates
cut --fields=1 hightemp.txt | sort | uniq > result_test.txt
#Run the Python program and sort its output for the diff comparison
python main.py | sort > result.txt
#Check the result
diff --report-identical-files result.txt result_test.txt
Terminal
segavvy@ubuntu:~/document/100 language processing knock 2015/17$ ./test.sh
Files result.txt and result_test.txt are identical
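The test script pipes the Python output through sort because the set has no fixed order. As an alternative, the program itself could emit the values in sorted order. The following is only a sketch of that idea: the function name `unique_first_column` and the sample rows are hypothetical, not taken from hightemp.txt.

```python
def unique_first_column(lines):
    """Return the distinct first-column values of tab-delimited lines, sorted."""
    values = set()
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        values.add(line.split('\t')[0])
    # sorted() orders strings by code point, independent of hash randomization
    return sorted(values)


if __name__ == "__main__":
    # Hypothetical rows in the same four-column layout as hightemp.txt
    sample = [
        "ken_b\tpoint_1\t40.0\t2000-01-01\n",
        "ken_a\tpoint_2\t39.5\t2000-01-02\n",
        "ken_b\tpoint_3\t39.0\t2000-01-03\n",
    ]
    for name in unique_first_column(sample):
        print(name)
```

Note that sorted() compares strings by code point, while the sort command follows the current locale's collation, so for Japanese text the two orders may not match unless sort is run with something like LC_ALL=C. Sorting both sides with the same sort command, as test.sh does, avoids that issue.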
That's all for the 18th knock. If you notice any mistakes, I would appreciate it if you could point them out.