Suppose each job that loads a table generates a log file like the following:
TABLENAME1_load.log
# (various other lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:03:00 DATA LOAD NORMAL END !!!
# (various other lines)
TABLENAME2_load.log
# (various other lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:10:00 DATA LOAD NORMAL END !!!
# (various other lines)
I want to collect these logs into a single summary like the following:
result
JOBNAME START END TIME(s) TIME(m)
TABLENAME1 00:01:00 00:03:00 120 2
TABLENAME2 00:01:00 00:10:00 540 9
...
I will do my best to get all of this done with a shell script!
INPUT
--Date (yyyymmdd)
--File with the list of load jobs (jobs.txt)
jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3
--Log file for each load job (TABLENAME_load.
TABLENAME1_load.20200828000100.log
# (various other lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:10:00 DATA LOAD NORMAL END !!!
# (various other lines)
TABLENAME2_load.20200828000100.log
# (various other lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:13:00 DATA LOAD NORMAL END !!!
# (various other lines)
TABLENAME3_load.20200828000200.log
# (various other lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:20:00 DATA LOAD NORMAL END !!!
# (various other lines)
OUTPUT
--Table of JOBNAME, START, END, TIME(s), TIME(m)
--Sorted by TIME so that the bottleneck job can be identified
P0. Create a temporary text file with the header (JOBNAME, START, END, TIME(s), TIME(m))
P1. For each load job listed (one per line) in jobs.txt
P1.1. Check whether a log file exists
P1.2. Extract the lines with the START and END times from the log file
P1.3. Extract the time part and assign it to a variable in the form start\tend
P1.4. Store start and end in variables
P1.5. Store the elapsed time (seconds) in a variable
P1.6. Store the elapsed time (minutes) in a variable
P1.7. Write the values from 4, 5, and 6 as one line of the text file
P1.8. When all load jobs have been processed, output the table in descending order of elapsed time
P1.9. Delete the temporary text file
terminal
$ ls -l
sample.sh
jobs.txt
log/
$ cat jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3
$ ls -l log/
20200828/
20200829/
$ ls -l log/20200828/
TABLENAME1_load.20200828000100.log
TABLENAME2_load.20200828000100.log
TABLENAME3_load.20200828000100.log
TABLENAME3_load.20200828000200.log
sample.sh
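#!/bin/bash
# Get the job names from the txt file (the array and <<< used below need bash, not plain sh)
jobs=($(cat "$2"))
# Create tmp file with table header (overwrite any stale file left by an earlier run)
echo JOBNAME START END 'TIME(s)' 'TIME(m)' > tmp_result.txt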
for x in ${jobs[@]};
do
if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
# Extract START and END from log file
result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
start=$(cut -d' ' -f 1 <<< $result)
end=$(cut -d' ' -f 2 <<< $result)
# Calc processing time
time_s=$(expr `date -d$end +%s` - `date -d$start +%s`)
time_m=$((time_s/60))
# Save into txt file
echo $x $start $end $time_s $time_m >> tmp_result.txt
else
echo 'there is no log file'
fi
done 2> /dev/null
# Display result
(head -n +1 tmp_result.txt && tail -n +2 tmp_result.txt | sort -n -r -k 5) | column -t
# Remove tmp file
rm tmp_result.txt
--Now the elapsed time of each job is visible, so we can tell which job should be improved.
terminal
$ bash sample.sh 20200828 jobs.txt
JOBNAME START END TIME(s) TIME(m)
TABLENAME3 00:01:00 00:20:00 1140 19
TABLENAME2 00:01:00 00:13:00 720 12
TABLENAME1 00:01:00 00:10:00 540 9
sample.sh
#!/bin/bash
# Get the job names from the txt file (arrays need bash, not plain sh)
jobs=($(cat "$1"))
for x in "${jobs[@]}";
do
echo $x
done
terminal
$ bash sample.sh jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3
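As a possible alternative to `jobs=($(cat $1))`, here is a minimal sketch that reads the job list line by line with `while read`, so the file contents are not subject to word splitting or wildcard expansion. The temporary file and the `job_count` variable are made up for this illustration only.

```shell
# Hypothetical stand-in for jobs.txt so the sketch runs on its own.
jobs_file=$(mktemp)
printf '%s\n' TABLENAME1 TABLENAME2 TABLENAME3 > "$jobs_file"

job_count=0
# Read one job name per line; IFS= and -r keep each line exactly as written.
while IFS= read -r job; do
    [ -n "$job" ] || continue   # skip blank lines
    echo "$job"
    job_count=$((job_count + 1))
done < "$jobs_file"

rm -f "$jobs_file"
```

Because the loop reads from a redirect rather than a pipe, variables set inside it (such as `job_count`) are still available afterwards.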
sample.sh
echo JOBNAME START END 'TIME(s)' 'TIME(m)' > tmp_result.txt
--Changed the first argument to the date (yyyymmdd) and the second argument to jobs.txt
--The log file name contains a timestamp, so I want to check for its existence with a wildcard
--Testing it with -e or similar fails with an "unexpected operator" error when the wildcard matches more than one file
--So use the output of ls to make the judgment instead
sample.sh
#!/bin/bash
# -- (omitted: same as above)
for x in ${jobs[@]};
do
if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
echo 'log file: ' $x
else
echo 'there is no log file'
fi
done
terminal
$ bash sample.sh 20200828 jobs.txt
log file: TABLENAME1
log file: TABLENAME2
log file: TABLENAME3
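The `[ -e ... ]` test breaks because the wildcard can expand to several file names, and `[` then receives more arguments than it expects. One way around this without parsing `ls` output is to loop over the expansion and test each match. The following is only a sketch: `has_log` is a hypothetical helper name, and the temporary directory imitates log/20200828/.

```shell
# has_log DIR JOB: succeed if at least one file matches DIR/JOB_load.*
# (has_log is a hypothetical helper name, not part of the original script).
has_log() {
    for f in "$1"/"$2"_load.*; do
        # If nothing matched, the pattern stays literal and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Throwaway directory that imitates log/20200828/ with two logs for one job.
logdir=$(mktemp -d)
: > "$logdir/TABLENAME3_load.20200828000100.log"
: > "$logdir/TABLENAME3_load.20200828000200.log"

if has_log "$logdir" TABLENAME3; then found=yes; else found=no; fi
if has_log "$logdir" TABLENAME9; then missing=yes; else missing=no; fi
echo "TABLENAME3: $found, TABLENAME9: $missing"

rm -r "$logdir"
```

This version also produces no error message when there is no match, so nothing has to be redirected to /dev/null.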
--List the log files
- ls log/$1/${x}_load.*
--If there are multiple log files, take the latest one
- tail -n 1
--Cat the contents
- xargs cat
--Extract only the "DATA LOAD" lines
- grep "DATA LOAD"
--Extract the second column, using a space as the separator
- cut -d' ' -f2
sample.sh
ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2
terminal
$ bash sample.sh 20200828 jobs.txt
00:01:00 # TABLE1 START
00:10:00 # TABLE1 END
00:01:00 # TABLE2 START
00:13:00 # TABLE2 END
00:01:00 # TABLE3 START
00:20:00 # TABLE3 END
Now we can get the HH:mm:ss of START and END for each job.
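The same extraction can be sketched without `ls`, `xargs`, and `cut`. Wildcard matches come back in sorted order, so with a yyyymmddhhmmss timestamp in the file name the last match is the newest log, and `awk` can both filter the lines and print the second column. The directory and the log contents below are sample data invented for the sketch.

```shell
# Build a throwaway log directory with two logs for the same job (sample data).
logdir=$(mktemp -d)
printf '%s\n' '# other lines' \
    '2020-08-28 00:01:00 DATA LOAD START !!!' \
    '2020-08-28 00:05:00 DATA LOAD NORMAL END !!!' \
    > "$logdir/TABLENAME3_load.20200828000100.log"
printf '%s\n' '# other lines' \
    '2020-08-28 00:01:00 DATA LOAD START !!!' \
    '2020-08-28 00:20:00 DATA LOAD NORMAL END !!!' \
    > "$logdir/TABLENAME3_load.20200828000200.log"

# Matches are expanded in sorted order, so the last one has the newest timestamp.
latest=''
for f in "$logdir"/TABLENAME3_load.*; do
    latest=$f
done

# Print the time column (field 2) of every "DATA LOAD" line.
times=$(awk '/DATA LOAD/ { print $2 }' "$latest")
echo "$times"

rm -r "$logdir"
```

Note that the sketch assumes at least one log exists; if nothing matches, `latest` would hold the unexpanded pattern, so an existence check should come first.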
--Convert line breaks to tabs
- tr '\n' '\t'
--Assign the result of the pipeline to a variable
- result=$(process | process | process)
sample.sh
for x in ${jobs[@]};
do
if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
echo $result
else
echo 'there is no log file'
fi
done
terminal
$ bash sample.sh 20200828 jobs.txt
00:01:00 00:10:00 # TABLENAME1
00:01:00 00:13:00 # TABLENAME2
00:01:00 00:20:00 # TABLENAME3
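It is worth noting what `echo $(tr '\n' '\t')` really produces. The command substitution is unquoted, so the shell splits the tab-separated text into words and `echo` joins them with single spaces. That is why the value can later be cut on a space. The sketch below shows this, next to `paste` as one possible way to get the same line more directly; the two sample times stand in for the pipeline output.

```shell
# Two sample lines standing in for the output of the grep/cut pipeline.
times=$(printf '%s\n' 00:01:00 00:10:00)

# The idiom from the script: tr emits "00:01:00<TAB>00:10:00<TAB>", and the
# unquoted $(...) is word-split, so echo prints the words joined by spaces.
joined_echo=$(echo "$times" | echo $(tr '\n' '\t'))

# Same result with paste, which joins all input lines with one delimiter.
joined_paste=$(echo "$times" | paste -s -d ' ' -)

echo "$joined_echo"
echo "$joined_paste"
```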
--Assign the first space-separated field of the result to start and the second to end
- start=$(cut -d' ' -f 1 <<< $result)
- end=$(cut -d' ' -f 2 <<< $result)
sample.sh
if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
start=$(cut -d' ' -f 1 <<< $result)
end=$(cut -d' ' -f 2 <<< $result)
else
echo 'there is no log file'
fi
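As a rough alternative to calling `cut` twice, the builtin `read` can split the line into both variables in one step. This sketch uses a here-document, which also works in plain sh, whereas the `<<<` here-string is a bash feature. The value of `result` is sample data.

```shell
# Sample value in the same "start end" form that the pipeline produces.
result='00:01:00 00:10:00'

# read splits the line on whitespace and fills the variables in order.
read -r start end <<EOF
$result
EOF

echo "start=$start end=$end"
```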
--Convert to UNIX time to calculate the difference between two HH:mm:ss values
- date -d
sample.sh
time_s=$(expr `date -d$end +%s` - `date -d$start +%s`)
time_m=$((time_s/60))
terminal
$ expr `date -d'00:01:01' +%s` - `date -d'00:00:01' +%s`
60
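Passing only HH:mm:ss to `date -d` means both times are interpreted as today, so a job that runs past midnight would give a negative result. Since each log line also carries the date, one option is to pass the date and time together. This is a sketch assuming GNU date (the `-d` option); `elapsed_seconds` is a hypothetical helper name and the timestamps are made-up sample values.

```shell
# elapsed_seconds START END: both in "YYYY-MM-DD HH:MM:SS" form.
# Assumes GNU date; -u evaluates both timestamps in UTC so that a
# daylight-saving change cannot distort the difference.
elapsed_seconds() {
    start_epoch=$(date -u -d "$1" +%s)
    end_epoch=$(date -u -d "$2" +%s)
    echo $(( end_epoch - start_epoch ))
}

# A made-up job that starts before midnight and ends after it.
time_s=$(elapsed_seconds '2020-08-28 23:59:00' '2020-08-29 00:10:00')
time_m=$(( time_s / 60 ))
echo "$time_s $time_m"
```

To use this with the logs, the first two columns of each "DATA LOAD" line would have to be extracted instead of only the second.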
sample.sh
echo $x $start $end $time_s $time_m >> tmp_result.txt
--The first line is the header, so exclude it from the sort
- head -n +1 tmp_result.txt &&
--Sort from the 2nd row onward in descending order, using the 5th column TIME(m) as the key
- tail -n +2 tmp_result.txt | sort -n -r -k 5
--Display in table format
- column -t
sample.sh
(head -n +1 tmp_result.txt && tail -n +2 tmp_result.txt | sort -n -r -k 5) | column -t
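Sorting on TIME(m) cannot separate jobs that finish within the same minute. A small variation is to sort on the 4th column, TIME(s), and to limit the key to that column with `-k4,4`. The sketch below uses an invented temporary file with the same layout as tmp_result.txt.

```shell
# Sample data in the same layout as tmp_result.txt.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
JOBNAME START END TIME(s) TIME(m)
TABLENAME1 00:01:00 00:10:00 540 9
TABLENAME3 00:01:00 00:20:00 1140 19
TABLENAME2 00:01:00 00:13:00 720 12
EOF

# Keep the header first, then sort the rest by column 4 only,
# numerically (n) and in descending order (r).
sorted=$( { head -n 1 "$tmp"; tail -n +2 "$tmp" | sort -k4,4nr; } )
echo "$sorted"

rm -f "$tmp"
```

Piping the result through `column -t`, as in the original line, aligns the columns in the same way.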
sample.sh
rm tmp_result.txt
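If the script stops halfway, the final `rm` never runs and tmp_result.txt is left behind. One common pattern, shown here only as a sketch, is to create the file with `mktemp` and register the cleanup with `trap` so that it runs whenever the script exits. The variable names are made up for the example.

```shell
# Create a uniquely named temporary file instead of a fixed tmp_result.txt.
tmp_result=$(mktemp)

# Remove the file when the script exits, whether it succeeds or fails.
trap 'rm -f "$tmp_result"' EXIT

# Write the table header as before, but into the temporary file.
echo JOBNAME START END 'TIME(s)' 'TIME(m)' > "$tmp_result"
header=$(head -n 1 "$tmp_result")
echo "$header"
```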
I am still getting started with shell scripts, so if you have any suggestions on how to write this better, I would appreciate it if you could let me know.
--I want to use wildcards to check for the existence of files in a shell script
--About standard output / standard error output and /dev/null
--When comparing strings in the shell, I get the error "=: unary operator expected"