100 Amateur Language Processing Knock: 17

This is a record of my attempt at the 100 Language Processing Knock 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). Click here for a list of past knocks (http://qiita.com/segavvy/items/fb50ba8097d59475f760).

Chapter 2: UNIX Command Basics

hightemp.txt is a file that stores records of the highest temperatures in Japan in tab-delimited format, with the columns "prefecture", "point", "℃", and "day". Create programs that perform the following processing and run them with hightemp.txt as the input file. Furthermore, perform the same processing with UNIX commands and use them to check the programs' execution results.

17. Distinct strings in the first column

Find the kinds of strings in the first column (the set of distinct strings). Use the sort and uniq commands to check the result.
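The task pairs sort with uniq because uniq only removes duplicate lines that are adjacent to each other. The following is a minimal Python sketch that mimics that behavior with itertools.groupby to show why sorting has to come first. The function names and sample values are hypothetical, and Python's code-point ordering may differ from the locale-dependent ordering used by the sort command.

```python
from itertools import groupby


def uniq(lines):
    """Mimic uniq: drop a line only if it equals the line just before it."""
    return [key for key, _ in groupby(lines)]


def sort_uniq(lines):
    """Mimic `sort | uniq`: sorting first makes all duplicates adjacent."""
    return uniq(sorted(lines))


if __name__ == '__main__':
    # Hypothetical first-column values; the duplicates are not adjacent.
    column = ['Kochi', 'Saitama', 'Kochi', 'Gifu', 'Saitama']
    print(uniq(column))       # ['Kochi', 'Saitama', 'Kochi', 'Gifu', 'Saitama']
    print(sort_uniq(column))  # ['Gifu', 'Kochi', 'Saitama']
```

Without the sort, nothing is removed here because no two equal values sit next to each other.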

The finished code:

main.py


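# coding: utf-8

fname = 'hightemp.txt'
with open(fname) as data_file:

	# Collect the first column (prefecture) of each line; a set keeps only distinct values
	set_ken = set()
	for line in data_file:
		cols = line.split('\t')
		set_ken.add(cols[0])

# Print each distinct prefecture (the iteration order of a set is not fixed)
for n in set_ken:
	print(n)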

Execution result:

Execution result 1


Yamagata Prefecture
Wakayama Prefecture
Gifu Prefecture
Osaka Prefecture
Ehime Prefecture
Saitama Prefecture
Aichi Prefecture
Kochi Prefecture
Gunma Prefecture
Chiba Prefecture
Yamanashi Prefecture
Shizuoka Prefecture

Execution result 2


Gifu Prefecture
Yamagata Prefecture
Shizuoka Prefecture
Aichi Prefecture
Osaka Prefecture
Kochi Prefecture
Gunma Prefecture
Chiba Prefecture
Yamanashi Prefecture
Ehime Prefecture
Wakayama Prefecture
Saitama Prefecture

When I ran it twice, the output order changed. I think this is because string hashes are randomized, as described for the -R option in the Python command line documentation.
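Since Python 3.3, string hashing is randomized per interpreter run by default, and the iteration order of a set of strings depends on those hashes. A minimal sketch to observe this, assuming the PYTHONHASHSEED environment variable (which fixes the hash seed of a newly started interpreter): it runs a hypothetical child program several times with different seeds. The child program and its sample strings are illustrative only, not read from hightemp.txt.

```python
import os
import subprocess
import sys

# Hypothetical child program: build a set of strings and print its order.
# The strings are sample values, not read from hightemp.txt.
CHILD = "print(list({'Kochi', 'Saitama', 'Gifu', 'Yamagata', 'Chiba'}))"


def set_order(seed):
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED fixed to seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, '-c', CHILD],
                            env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.strip()


if __name__ == '__main__':
    # The same seed reproduces the same order, so this prints True
    print(set_order(1) == set_order(1))
    # Different seeds may produce different orders
    for seed in range(3):
        print(seed, set_order(seed))
```

If stable output is all that is needed, printing sorted(set_ken) instead of iterating over the set directly avoids the dependence on hash order altogether.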

Shell script for checking with UNIX commands:

test.sh


#!/bin/sh

# Cut out the first column, sort it, and remove duplicates
cut --fields=1 hightemp.txt | sort | uniq > result_test.txt

# Run the Python program and sort its output so that diff can compare the two files
python main.py | sort > result.txt

# Check the result
diff --report-identical-files result.txt result_test.txt

Confirmation of results:

Terminal


segavvy@ubuntu:~/document/100 language processing knock 2015/17$ ./test.sh
Files result.txt and result_test.txt are identical
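Instead of sorting both outputs before running diff, the two results could also be compared as sets inside Python, where order does not matter at all. A minimal sketch under that assumption: the function names are hypothetical, and the sample lines are placeholders in the hightemp.txt layout, not the real data.

```python
import io


def first_column_set(lines):
    """Distinct first-column values of tab-delimited lines."""
    return {line.rstrip('\n').split('\t')[0] for line in lines}


def same_as_reference(lines, reference_lines):
    """Compare as sets, so the order of either input does not matter."""
    reference = {line.rstrip('\n') for line in reference_lines}
    return first_column_set(lines) == reference


if __name__ == '__main__':
    # Placeholder data in the "prefecture, point, temperature, day" layout
    data = io.StringIO('PrefA\tpoint1\t40.0\t2000-01-01\n'
                       'PrefB\tpoint2\t39.5\t2000-01-02\n'
                       'PrefA\tpoint3\t39.0\t2000-01-03\n')
    # Reference in a different order, as `cut | sort | uniq` might produce
    reference = io.StringIO('PrefB\nPrefA\n')
    print(same_as_reference(data, reference))  # True
```

With the real files, the same function could be called on open('hightemp.txt') and open('result_test.txt') passed in as the two iterables.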

That's all for the 17th knock. If you spot any mistakes, I would appreciate it if you could point them out.
