Receive a natural number N via, e.g., a command-line argument, and split the input file into N parts line by line. Achieve the same processing with the split command.
The import and argparse setup are omitted. If the number of lines M in the file is not exactly divisible by N, the specification gives one extra line to each of the split parts in order from the first.
knock016.py
args = parser.parse_args()
N = args.line
filename = args.filename
#Split the file into N parts, line by line
f = open(filename)
lines = f.readlines()
M = len(lines)
#Quotient and remainder
quotient = M / N
remainder = M - quotient * N
#Find the lines at which to split the file
num_of_lines = [quotient + 1 if i < remainder else quotient for i in xrange(N)]
num_of_lines_cumulative = [sum(num_of_lines[:i + 1]) for i in xrange(N)]
for i, line in enumerate(lines):
    if i in num_of_lines_cumulative:
        #Insert a blank line at each split boundary
        print
        print line.strip()
    else:
        print line.strip()
f.close()
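For reference, the same chunk-size logic can be packaged as a self-contained helper that returns the N chunks instead of printing them. This is a minimal sketch; the `split_file` name and the return-a-list design are my own, not from the original:

```python
def split_file(lines, n):
    """Split lines into n chunks; the first (len(lines) % n) chunks get one extra line."""
    quotient, remainder = divmod(len(lines), n)
    chunks = []
    start = 0
    for i in range(n):
        # the first `remainder` chunks are one line longer
        size = quotient + 1 if i < remainder else quotient
        chunks.append(lines[start:start + size])
        start += size
    return chunks

# 5 lines into 3 parts -> chunk sizes 2, 2, 1
chunks = split_file(["a\n", "b\n", "c\n", "d\n", "e\n"], 3)
```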
UNIX command... After adding some option validation (though still not enough), the code got longer.
knock016.sh
#!/bin/sh
#Receive the natural number N by means such as command line arguments, and divide the input file into N line by line.
#Achieve the same processing with the split command.
# ex.
# sh knock016.sh -f hightemp.txt -n 7
while getopts f:n: OPT
do
    case $OPT in
        "f" ) FLG_F="TRUE" ; INPUT_FILE=$OPTARG ;;
        "n" ) FLG_N="TRUE" ; N=$OPTARG ;;
        * ) echo "Usage: $0 [-f file name] [-n split number]" 1>&2
            exit 1 ;;
    esac
done
if [ ! "$FLG_F" = "TRUE" ]; then
    echo 'file name is not set.'
    exit 1
fi
if [ ! "$FLG_N" = "TRUE" ]; then
    echo 'split number is not set.'
    exit 1
fi
#The split/ output directory must exist; create it if necessary
mkdir -p split
TMP_HEAD="split/tmphead.$INPUT_FILE"
TMP_TAIL="split/tmptail.$INPUT_FILE"
SPLITHEAD_PREFIX="split/splithead."
SPLITTAIL_PREFIX="split/splittail."
M=$( wc -l < $INPUT_FILE )
quotient=`expr $M / $N`
remainder=`expr $M - $quotient \* $N`
if [ $quotient -eq 0 ]; then
    echo "cannot divide: N is larger than the number of lines in the input file."
    exit 1
fi
if [ $remainder -eq 0 ]; then
    #If the remainder is 0, split so that each file contains $quotient lines
    split -l $quotient $INPUT_FILE $SPLITHEAD_PREFIX
else
    #If the remainder is non-zero, divide the file into two parts:
    # (a) the first (($quotient + 1) * $remainder) lines, and (b) the rest
    split_head=`expr \( $quotient + 1 \) \* $remainder`
    split_tail=`expr $M - $split_head`
    head -n $split_head $INPUT_FILE > $TMP_HEAD
    tail -n $split_tail $INPUT_FILE > $TMP_TAIL
    #Split so that each file from (a) contains ($quotient + 1) lines,
    #and each file from (b) contains $quotient lines
    split -l `expr $quotient + 1` $TMP_HEAD $SPLITHEAD_PREFIX
    split -l $quotient $TMP_TAIL $SPLITTAIL_PREFIX
    rm -iv split/tmp*
fi
Since split is a command used by specifying the number of lines contained in one output file, it took a little ingenuity.
Find the set of distinct strings in the first column. Use the sort and uniq commands for confirmation.
python
if __name__ == '__main__':
    f = open(filename)
    lines = f.readlines()
    # unlike problem 12, "+ '\n'" is not necessary
    content_col1 = [line.split()[0] for line in lines]
    content_col1_set = set(content_col1)
    print len(content_col1_set)
    for x in content_col1_set:
        print x
    f.close()
#>>>
#12
#Aichi prefecture
#Yamagata Prefecture
#Gifu Prefecture
#Chiba
#Saitama
#Kochi Prefecture
#Gunma Prefecture
#Yamanashi Prefecture
#Wakayama Prefecture
#Ehime Prefecture
#Osaka
#Shizuoka Prefecture
UNIX command. The output doesn't have to be in the same order, does it...?
sh
awk -F'\t' '{print $1;}' hightemp.txt | sort | uniq
#>>>
#Chiba
#Wakayama Prefecture
#Saitama
#Osaka
#Yamagata Prefecture
#Yamanashi Prefecture
#Gifu Prefecture
#Ehime Prefecture
#Aichi prefecture
#Gunma Prefecture
#Shizuoka Prefecture
#Kochi Prefecture
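On the ordering question above: `sort | uniq` prints the distinct values in sorted order, so sorting the Python set should reproduce it (for Japanese text, the exact order also depends on the locale/encoding sort uses). A small sketch with made-up column values:

```python
col1 = ["Chiba", "Saitama", "Chiba", "Osaka", "Saitama"]
# set() gives the distinct values, but in arbitrary order
distinct = set(col1)
# sorting the set matches what `sort | uniq` would print
print(sorted(distinct))  # ['Chiba', 'Osaka', 'Saitama']
```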
Sort the rows in descending order by the numbers in the third column (note: rearrange the rows without changing their contents). Use the sort command for confirmation (the result of this problem does not have to match the output of the command exactly).
python
if __name__ == '__main__':
    f = open(filename)
    lines = f.readlines()
    # reverse=True gives a descending sort
    sorted_lines = sorted(lines, key=lambda line: float(line.split()[2]), reverse=True)
    for sorted_line in sorted_lines:
        print sorted_line,
    f.close()
#>>>
#Kochi Prefecture Ekawasaki 41 2013-08-12
#Saitama Prefecture Kumagaya 40.9 2007-08-16
#Gifu Prefecture Tajimi 40.9 2007-08-16
#Yamagata Prefecture Yamagata 40.8 1933-07-25
#Yamanashi Prefecture Kofu 40.7 2013-08-10
#Wakayama Prefecture Katsuragi 40.6 1994-08-08
#Shizuoka Prefecture Tenryu 40.6 1994-08-04
#Yamanashi Prefecture Katsunuma 40.5 2013-08-10
#Saitama Prefecture Koshigaya 40.4 2007-08-16
#Gunma Prefecture Tatebayashi 40.3 2007-08-16
#Gunma Prefecture Kamisatomi 40.3 1998-07-04
#Aichi Prefecture Aisai 40.3 1994-08-05
#Chiba Prefecture Ushiku 40.2 2004-07-20
#Shizuoka Prefecture Sakuma 40.2 2001-07-24
#Ehime Prefecture Uwajima 40.2 1927-07-22
#Yamagata Prefecture Sakata 40.1 1978-08-03
#Gifu Prefecture Mino 40 2007-08-16
#Gunma Prefecture Maebashi 40 2001-07-24
#Chiba Prefecture Mobara 39.9 2013-08-11
#Saitama Prefecture Hatoyama 39.9 1997-07-05
#Osaka Toyonaka 39.9 1994-08-08
#Yamanashi Prefecture Otsuki 39.9 1990-07-19
#Yamagata Prefecture Tsuruoka 39.9 1978-08-03
#Aichi Prefecture Nagoya 39.9 1942-08-02
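The `key=lambda line: float(...)` part matters: in this dataset all values happen to be 39.x–41, but in general, comparing the numbers as strings mis-orders values with different digit counts. A minimal illustration (toy values, not from the original):

```python
temps = ["9.5", "40", "10.2"]

# plain string sort compares character by character: "9..." > "4..." > "1..."
print(sorted(temps, reverse=True))             # ['9.5', '40', '10.2']

# converting to float compares numerically
print(sorted(temps, key=float, reverse=True))  # ['40', '10.2', '9.5']
```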
UNIX command.
sh
sort -k3r hightemp.txt
Specify the key column with the -k option; adding r reverses the order. (Note this compares the field as a string; for a strict numeric sort, add n as well, e.g. sort -k3,3nr.)
Find the frequency of occurrence of the strings in the first column of each line, and display them in descending order of frequency. Use the cut, uniq, and sort commands for confirmation.
python
from collections import defaultdict
from collections import Counter
...
if __name__ == '__main__':
    f = open(filename)
    lines = f.readlines()
    # extract 1st column
    content_col1 = [line.split()[0] for line in lines]
    # (1) defaultdict
    # http://docs.python.jp/2/library/collections.html#collections.defaultdict
    d = defaultdict(int)
    for col1 in content_col1:
        d[col1] += 1
    for word, cnt in sorted(d.items(), key=lambda x: x[1], reverse=True):
        print word, cnt
    print
    # (2) Counter
    # http://docs.python.jp/2/library/collections.html#collections.Counter
    counter = Counter(content_col1)
    for word, cnt in counter.most_common():
        print word, cnt
    f.close()
#>>>
#Yamagata Prefecture 3
#Saitama Prefecture 3
#Gunma Prefecture 3
#Yamanashi 3
#Aichi 2
#Gifu prefecture 2
#Chiba 2
#Shizuoka Prefecture 2
#Kochi Prefecture 1
#Wakayama Prefecture 1
#Ehime Prefecture 1
#Osaka 1
#Yamagata Prefecture 3
#Saitama Prefecture 3
#Gunma Prefecture 3
#Yamanashi 3
#Aichi 2
#Gifu prefecture 2
#Chiba 2
#Shizuoka Prefecture 2
#Kochi Prefecture 1
#Wakayama Prefecture 1
#Ehime Prefecture 1
#Osaka 1
Do you count with a defaultdict as in (1), or use Counter itself as in (2)? Counter comes with the handy most_common() method...
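A side-by-side sketch of the two approaches from the code above, with toy data instead of the file contents:

```python
from collections import Counter, defaultdict

words = ["Gunma", "Chiba", "Gunma", "Osaka", "Gunma", "Chiba"]

# (1) defaultdict(int): missing keys default to 0, so += 1 just works
d = defaultdict(int)
for w in words:
    d[w] += 1

# (2) Counter does the counting and the descending sort in one place
counter = Counter(words)
print(counter.most_common())  # [('Gunma', 3), ('Chiba', 2), ('Osaka', 1)]
```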
Then UNIX command.
sh
cut -f 1 hightemp.txt | sort | uniq -c | sort -nr
#>>>
#3 Gunma Prefecture
#3 Yamanashi Prefecture
#3 Yamagata Prefecture
#3 Saitama Prefecture
#2 Shizuoka Prefecture
#2 Aichi prefecture
#2 Gifu Prefecture
#2 Chiba
#1 Kochi prefecture
#1 Ehime prefecture
#1 Osaka
#1 Wakayama Prefecture
It's an idiom-like pipeline that I use often, so I want to remember it well: sort brings identical lines next to each other, uniq merges adjacent duplicate lines, its -c option counts those duplicates, and "sort -nr" then sorts the rows as numbers in descending order.
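The reason for the first sort is that uniq only merges *adjacent* duplicates. Python's itertools.groupby behaves the same way, which makes the point easy to see (a sketch with toy data):

```python
from itertools import groupby

data = ["b", "a", "b", "a", "a"]

# like `uniq -c` on unsorted input: only adjacent runs are merged
unsorted_counts = [(k, len(list(g))) for k, g in groupby(data)]
print(unsorted_counts)  # [('b', 1), ('a', 1), ('b', 1), ('a', 2)]

# like `sort | uniq -c`: sorting first brings all duplicates together
sorted_counts = [(k, len(list(g))) for k, g in groupby(sorted(data))]
print(sorted_counts)    # [('a', 3), ('b', 2)]
```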