It has been almost a month since the last update ... I am a classic three-day quitter.
As the chapter title suggests, these tasks are usually done with UNIX commands, so writing them in Python may be a little cumbersome.
First, download the dataset ...
Count the number of lines. Use the wc command for confirmation.
python
filename = 'hightemp.txt'
with open(filename, 'r') as f:
    # Count the lines by iterating over the file object
    print(sum(1 for _ in f))
#>>> 24
There seem to be various ways to do this ... http://551sornwmc.blog109.fc2.com/blog-entry-387.html http://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python
According to those pages, a memory-mapped file seems to be the better choice in terms of memory usage and execution speed.
python
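# using memory mapped file
import mmap

def mapcount(filename):
    # Open read-only and map the whole file (length 0 means the entire file)
    with open(filename, 'rb') as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        lines = 0
        readline = buf.readline
        # readline() returns an empty bytes object at the end of the map
        while readline():
            lines += 1
        buf.close()
        return lines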
Here is the confirmation with a UNIX command.
bash
wc -l hightemp.txt
#>>> 24
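One detail to keep in mind when comparing the two results: wc -l counts newline characters, while iterating over a file in Python counts lines, so the numbers differ by one when the last line has no trailing newline. The following is a minimal sketch for Python 3. The helper names count_lines and count_newlines and the temporary sample file are made up for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file, as in the Python code above."""
    with open(path, 'rb') as f:
        return sum(1 for _ in f)


def count_newlines(path, chunk_size=1 << 16):
    """Count newline bytes chunk by chunk, which is what wc -l reports."""
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b'\n')
    return total


# Hypothetical sample: three lines, the last one without a trailing newline
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'a\tb\nc\td\ne\tf')
    sample_path = tmp.name

n_lines = count_lines(sample_path)
n_newlines = count_newlines(sample_path)
os.remove(sample_path)  # clean up the temporary file

print(n_lines, n_newlines)  # 3 2
```

For a file that ends with a newline, such as one written line by line by most tools, the two counts agree.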
Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.
python
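#import re
filename = 'hightemp.txt'
with open(filename, 'r') as f:
    for line in f:
        #line_replaced = re.sub(r'\t', ' ', line)
        # expandtabs(1) turns every tab into exactly one space
        line_replaced = line.expandtabs(1)
        # The line already ends with a newline, so suppress print's own
        print(line_replaced, end='')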
It turns out that strings have an expandtabs method, so no regular expression is needed.
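To see why the argument matters: expandtabs pads each tab out to the next tab stop, and with a tab size of 1 every column is a tab stop, so each tab becomes exactly one space. A small sketch for Python 3, using a made-up sample line, compares it with str.replace:

```python
# Made-up sample line in the same tab-separated layout
line = 'ab\tc\td\n'

# Tab size 1: every tab becomes exactly one space
one_space = line.expandtabs(1)

# Default tab size 8: tabs are padded to the next multiple of 8 columns
padded = line.expandtabs()

# str.replace gives the same result as expandtabs(1)
replaced = line.replace('\t', ' ')

print(repr(one_space))        # 'ab c d\n'
print(repr(padded))
print(one_space == replaced)  # True
```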
Here is the confirmation with UNIX commands.
bash
cat hightemp.txt | tr '\t' ' '
The tr version above seems to be the simplest.
bash
# GNU sed: \t in the pattern matches a tab
sed -e 's/\t/ /g' hightemp.txt
# That doesn't work on Mac, so try again and let the shell expand the tab with $'\t'
sed -e s/$'\t'/' '/g hightemp.txt
http://mattintosh.hatenablog.com/entry/2013/01/16/143323
The BSD sed included in Mac OS X and similar systems does not expand \t in a script to a tab, unlike echo and printf.
Oh...
Save only the first column of each line as col1.txt and only the second column as col2.txt. Use the cut command for confirmation.
python
filename = 'hightemp.txt'
filename_col1 = 'col1.txt'
filename_col2 = 'col2.txt'
with open(filename, 'r') as f:
    lines = f.readlines()
# Split on tabs (like cut -f) and append '\n', since writelines adds no line breaks
content_col1 = [line.split('\t')[0] + '\n' for line in lines]
content_col2 = [line.split('\t')[1] + '\n' for line in lines]
with open(filename_col1, 'w') as f_col1:
    f_col1.writelines(content_col1)
with open(filename_col2, 'w') as f_col2:
    f_col2.writelines(content_col2)
One thing to note is that the writelines method does not add line breaks, so I appended them myself.
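A quick way to check this behaviour is to write to an in-memory buffer. This sketch for Python 3 uses io.StringIO as a stand-in for the real files, with made-up values, and shows that writelines simply concatenates its items:

```python
import io

items = ['Kochi', 'Saitama', 'Gifu']  # made-up first-column values

# Without line breaks the items run together
buf_plain = io.StringIO()
buf_plain.writelines(items)

# Appending '\n' to each item gives one value per line
buf_lines = io.StringIO()
buf_lines.writelines(item + '\n' for item in items)

print(repr(buf_plain.getvalue()))  # 'KochiSaitamaGifu'
print(repr(buf_lines.getvalue()))  # 'Kochi\nSaitama\nGifu\n'
```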
Here is the confirmation with UNIX commands. It's so easy that it makes me feel sick.
bash
cut -f1 hightemp.txt > col1.txt
cut -f2 hightemp.txt > col2.txt
Combine the col1.txt and col2.txt created in the previous task into a text file in which the first and second columns of the original file are arranged tab-delimited. Use the paste command for confirmation.
python
filename_col1 = 'col1.txt'
filename_col2 = 'col2.txt'
filename_col1_col2 = 'col1_col2.txt'
with open(filename_col1, 'r') as f_col1, open(filename_col2, 'r') as f_col2:
    lines_1 = f_col1.readlines()
    lines_2 = f_col2.readlines()
# readlines keeps the trailing '\n', so strip it before joining with a tab
content = [line1.rstrip('\n') + '\t' + line2.rstrip('\n') + '\n'
           for line1, line2 in zip(lines_1, lines_2)]
with open(filename_col1_col2, 'w') as f_col1_col2:
    f_col1_col2.writelines(content)
Here is the confirmation with UNIX commands. It was so easy that it made me sick.
bash
paste col1.txt col2.txt > col1_col2.txt
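One difference worth noting: paste keeps going until the longest input is exhausted and leaves the missing fields empty, whereas zip stops at the shortest input. If the two files could differ in length, a sketch along these lines mimics paste with itertools.zip_longest. It is written for Python 3, and the function name paste_lines and the sample data are hypothetical.

```python
from itertools import zip_longest


def paste_lines(lines_1, lines_2, delimiter='\t'):
    """Join corresponding lines with a delimiter, padding the shorter input with ''."""
    merged = []
    for line1, line2 in zip_longest(lines_1, lines_2, fillvalue=''):
        # Drop the trailing newline that readlines() keeps on each line
        merged.append(line1.rstrip('\n') + delimiter + line2.rstrip('\n') + '\n')
    return merged


col1 = ['Kochi\n', 'Saitama\n', 'Gifu\n']
col2 = ['Ekawasaki\n', 'Kumagaya\n']  # deliberately one line short

result = paste_lines(col1, col2)
print(result)
```

For col1.txt and col2.txt, which come from the same file and have the same number of lines, plain zip gives the same result.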
Receive the natural number N by means such as a command line argument and display only the first N lines of the input. Use the head command for confirmation.
knock014.py
# -*- coding: utf-8 -*-
import argparse
from itertools import islice

parser = argparse.ArgumentParser(description='Head command. Accepts an integer and a file name.')
# Number of lines
parser.add_argument(
    '-l', '--line',
    type=int,
    dest='line',
    default=10,
    help='Number of lines to display, as with the head command'
)
# File name
parser.add_argument(
    '-f', '--filename',
    type=str,          # Type of the value to receive
    dest='filename',   # Attribute name used to store the value
    required=True,     # Mandatory argument
    help='File name given as input'  # Text shown by --help
)
args = parser.parse_args()
N = args.line
filename = args.filename

# Display the first N lines; islice stops early if the file is shorter than N
with open(filename) as f:
    for line in islice(f, N):
        print(line.rstrip('\n'))
Running the above gives the following.
bash
python knock014.py -l 3 -f hightemp.txt
# >>>Kochi Prefecture Ekawasaki 41 2013-08-12
# >>>Saitama Prefecture Kumagaya 40.9 2007-08-16
# >>>Gifu Prefecture Tajimi 40.9 2007-08-16
python knock014.py -f hightemp.txt
# >>>Kochi Prefecture Ekawasaki 41 2013-08-12
# >>>Saitama Prefecture Kumagaya 40.9 2007-08-16
# >>>Gifu Prefecture Tajimi 40.9 2007-08-16
# >>>Yamagata Prefecture Yamagata 40.8 1933-07-25
# >>>Yamanashi Prefecture Kofu 40.7 2013-08-10
# >>>Wakayama Prefecture Katsuragi 40.6 1994-08-08
# >>>Shizuoka Prefecture Tenryu 40.6 1994-08-04
# >>>Yamanashi Prefecture Katsunuma 40.5 2013-08-10
# >>>Saitama Prefecture Koshigaya 40.4 2007-08-16
# >>>Gunma Prefecture Tatebayashi 40.3 2007-08-16
Here is the confirmation with UNIX commands.
bash
head -n 3 hightemp.txt
head hightemp.txt
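If N is larger than the number of lines, head simply prints the whole file. A loop that calls next() a fixed N times would instead raise StopIteration, whereas itertools.islice stops quietly. A minimal sketch for Python 3 follows. The function name head and the in-memory sample are assumptions for illustration.

```python
import io
from itertools import islice


def head(lines, n=10):
    """Return the first n lines of an iterable of lines, without trailing newlines."""
    return [line.rstrip('\n') for line in islice(lines, n)]


sample = io.StringIO('line1\nline2\nline3\n')  # stand-in for an open file
first_two = head(sample, 2)

sample.seek(0)
too_many = head(sample, 10)  # asks for more lines than exist

print(first_two)  # ['line1', 'line2']
print(too_many)   # ['line1', 'line2', 'line3']
```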
Receive the natural number N by means such as command line arguments and display only the last N lines of the input. Use the tail command for confirmation.
knock015.py
# -*- coding: utf-8 -*-
import argparse

parser = argparse.ArgumentParser(description='Tail command. Accepts an integer and a file name.')
# Number of lines
parser.add_argument(
    '-l', '--line',
    type=int,
    dest='line',
    default=10,
    help='Number of lines to display, as with the tail command'
)
# File name
parser.add_argument(
    '-f', '--filename',
    type=str,          # Type of the value to receive
    dest='filename',   # Attribute name used to store the value
    required=True,     # Mandatory argument
    help='File name given as input'  # Text shown by --help
)
args = parser.parse_args()
N = args.line
filename = args.filename

# Show the last N lines
with open(filename) as f:
    lines = f.readlines()
M = len(lines)
for i, line in enumerate(lines):
    # Lines whose index is within N of the end belong to the tail
    if i + N >= M:
        print(line.rstrip('\n'))
Basically, I only changed the final step of knock014.py. Here is the confirmation with UNIX commands.
bash
tail -n 3 hightemp.txt
tail hightemp.txt
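Reading the whole file with readlines is fine for 24 lines, but for large inputs a bounded collections.deque keeps only the last N lines in memory. Here is a sketch under that assumption, written for Python 3. The function name tail and the sample data are made up.

```python
import io
from collections import deque


def tail(lines, n=10):
    """Return the last n lines of an iterable of lines, without trailing newlines."""
    # A deque with maxlen=n discards the oldest item once it is full
    last = deque(lines, maxlen=n)
    return [line.rstrip('\n') for line in last]


sample = io.StringIO('line1\nline2\nline3\nline4\n')  # stand-in for an open file
last_three = tail(sample, 3)
print(last_three)  # ['line2', 'line3', 'line4']
```

Passing an open file object instead of the StringIO works the same way, since both yield one line per iteration.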