I want to extract the start and end times from the log file and put them together in a table

Suppose that a log file like the following is generated for each job that loads a table

`TABLENAME1_load.log`


# (various other log lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:03:00 DATA LOAD NORMAL END !!!
# (various other log lines)

`TABLENAME2_load.log`


# (various other log lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:10:00 DATA LOAD NORMAL END !!!
# (various other log lines)

I want to consolidate these logs into a single table like the following

`result`


JOBNAME      START     END       TIME(s)  TIME(m)
TABLENAME1   00:01:00  00:03:00  120      2
TABLENAME2   00:01:00  00:10:00  540      9
...

Method

Overview

I will do my best to complete everything with shell!

INPUT

- Date (yyyymmdd)
- File with the list of load jobs (jobs.txt)

`jobs.txt`


TABLENAME1
TABLENAME2
TABLENAME3

- Log file for each load job (TABLENAME_load. .log)

`TABLENAME1_load.20200828000100.log`


# (various other log lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:10:00 DATA LOAD NORMAL END !!!
# (various other log lines)

`TABLENAME2_load.20200828000100.log`


# (various other log lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:13:00 DATA LOAD NORMAL END !!!
# (various other log lines)

`TABLENAME3_load.20200828000200.log`


# (various other log lines)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:20:00 DATA LOAD NORMAL END !!!
# (various other log lines)

OUTPUT

- Table of JOBNAME, START, END, TIME(s), TIME(m)
- Sorted by TIME so that the job that is the bottleneck can be identified

Process flow

P0. Create a temporary text file (with JOBNAME, START, END, TIME(s), TIME(m) as the header)
P1. For each load job listed one per line in jobs.txt
    P1.1. Check whether a log file exists
    P1.2. Extract the lines with the START and END times from the log file
    P1.3. Extract the time part and assign it to a variable in the form START\tEND
    P1.4. Store start and end in variables
    P1.5. Store the elapsed time (seconds) in a variable
    P1.6. Store the elapsed time (minutes) in a variable
    P1.7. Write the values from P1.4, P1.5, and P1.6 as one line of the text file
P1.8. When all load jobs are done, output a table sorted in descending order of elapsed time
P1.9. Delete the temporary text file

What to use

- sample.sh: the shell script
- jobs.txt: file listing the target job names
- log: assumes one folder per day (yyyymmdd), each holding the logs of the jobs run on that day

`terminal`


$ ls -l
sample.sh
jobs.txt
log/

$ cat jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3

$ ls -l log/
20200828/
20200829/

$ ls -l log/20200828/
TABLENAME1_load.20200828000100.log
TABLENAME2_load.20200828000100.log
TABLENAME3_load.20200828000100.log
TABLENAME3_load.20200828000200.log

Whole code

`sample.sh`


#!/bin/bash
# Usage: bash sample.sh <yyyymmdd> <jobs file>
# bash is required: the script uses arrays and here-strings (<<<)

# Get the job names from the txt file
jobs=($(cat "$2"))

# Create tmp file with table header (">" overwrites a file left over from an earlier run)
echo JOBNAME START END 'TIME(s)' 'TIME(m)'  > tmp_result.txt

for x in "${jobs[@]}";
do
	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		# Extract START and END from the latest log file
		result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
		start=$(cut -d' ' -f 1 <<< $result)
		end=$(cut -d' ' -f 2 <<< $result)
		# Calc processing time (date -d is a GNU date option)
		time_s=$(expr `date -d$end +%s` - `date -d$start +%s`)
		time_m=$((time_s/60))
		# Save into txt file
		echo $x $start $end $time_s $time_m >> tmp_result.txt
	else
		echo 'there is no log file'
	fi
done 2> /dev/null

# Display result: keep the header first, sort the rest by TIME(m) in descending order
(head -n +1 tmp_result.txt && tail -n +2 tmp_result.txt | sort -n -r -k 5) | column -t

# Remove tmp file
rm tmp_result.txt

Result

- The table shows the elapsed time of each job, so you can see which job should be improved.

`terminal`


$ bash sample.sh 20200828 jobs.txt
JOBNAME     START     END       TIME(s)  TIME(m)
TABLENAME3  00:01:00  00:20:00  1140     19
TABLENAME2  00:01:00  00:13:00  720      12
TABLENAME1  00:01:00  00:10:00  540      9

Processing details

Read jobs.txt and loop over it with a for statement

`sample.sh`


#!/bin/bash
# get the job names from the txt file (arrays need bash, not plain sh)
jobs=($(cat "$1"))
for x in "${jobs[@]}";
do
        echo $x
done

`terminal`


$ bash sample.sh jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3
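As a side note, the job list can also be read line by line instead of going through an array. The following is only a minimal sketch, assuming bash; `list_jobs` is a hypothetical helper name that does not appear in sample.sh. It skips blank lines and still handles a last line that has no trailing newline.

```shell
#!/bin/bash
# Hypothetical helper (not part of sample.sh): print each non-empty line of the given file.
list_jobs() {
    local job
    # "|| [ -n "$job" ]" keeps a final line that lacks a trailing newline
    while IFS= read -r job || [ -n "$job" ]; do
        if [ -n "$job" ]; then
            printf '%s\n' "$job"
        fi
    done < "$1"
}
```

It could be used as `for x in $(list_jobs jobs.txt); do echo "$x"; done`, which behaves like the loop above for job names without spaces.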

P0. Create a temporary text file (with JOBNAME, START, END, TIME (s), TIME (m) as header)

`sample.sh`


echo JOBNAME START END 'TIME(s)' 'TIME(m)'  > tmp_result.txt

P1.1 Check if there is a log file

- Changed the first argument to the date (yyyymmdd) and the second argument to jobs.txt
- The log file name includes a timestamp, so I want to check for existence with a wildcard
- Testing with -e and the like gives an "unexpected operator" error, because the wildcard can expand to more than one file name
- So the check uses the output of ls instead

`sample.sh`


#!/bin/bash
# (reading jobs from $2 is omitted here)
for x in ${jobs[@]};
do
	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		echo 'log file: ' $x
	else
		echo 'there is no log file'
	fi
done

`terminal`


$ bash sample.sh 20200828 jobs.txt
log file:  TABLENAME1
log file:  TABLENAME2
log file:  TABLENAME3
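The same existence check can probably be done without going through ls. The following is only a sketch, assuming bash (`compgen` is a bash builtin); `has_log` is a hypothetical helper name, and the `log/<date>/<job>_load.*` layout is the one used in this article.

```shell
#!/bin/bash
# Hypothetical helper (not part of sample.sh):
# succeeds if at least one file matches log/<date>/<job>_load.*
has_log() {
    local date_dir=$1 job=$2
    # compgen -G expands the glob and returns non-zero when nothing matches
    compgen -G "log/${date_dir}/${job}_load.*" > /dev/null
}
```

Inside the loop it would replace the ls test, for example `if has_log "$1" "$x"; then ... fi`, and it does not print an error when no file matches.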

P1.2. Extract the line with the START and END times from the log file

- List the log files
- If there are multiple log files, take the latest one
  - tail -n 1
- Output the contents
  - xargs cat
- Extract only the "DATA LOAD" lines
  - grep "DATA LOAD"
- Extract the second column, using a space as the separator
  - cut -d' ' -f2

`sample.sh`


ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2

`terminal`


$ bash sample.sh 20200828 jobs.txt
00:01:00 # TABLENAME1 START
00:10:00 # TABLENAME1 END
00:01:00 # TABLENAME2 START
00:13:00 # TABLENAME2 END
00:01:00 # TABLENAME3 START
00:20:00 # TABLENAME3 END

Now you can get the HH:mm:ss of START and END for each job.
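For comparison, the cat, grep and cut steps could be folded into a single awk call. This is only a sketch; `extract_times` is a hypothetical helper name, and it assumes the log lines look exactly like the samples in this article (time in the second space-separated column).

```shell
#!/bin/bash
# Hypothetical helper (not part of sample.sh): print "<start> <end>" for one log file.
extract_times() {
    awk '/DATA LOAD START/      { s = $2 }   # remember the START time
         /DATA LOAD NORMAL END/ { e = $2 }   # remember the END time
         END                    { print s, e }' "$1"
}
```

The latest log file would still be chosen as before, for example `extract_times "$(ls log/$1/${x}_load.* | tail -n 1)"`.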

P1.3. Extract the time part and assign it to a variable in the format START\tEND

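- Convert line breaks to tabs
  - tr '\n' '\t'
- Assign the result of the pipeline to a variable
  - result=$(process | process | process)
- Because the inner $(tr ...) is unquoted, echo prints the two times separated by a single space, which is why the next step can split on a space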

`sample.sh`


for x in ${jobs[@]};
do
	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
		echo $result
	else
		echo 'there is no log file'
	fi
done

`terminal`


$ bash sample.sh 20200828 jobs.txt
00:01:00 00:10:00 # TABLENAME1
00:01:00 00:13:00 # TABLENAME2
00:01:00 00:20:00 # TABLENAME3
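Joining the two lines into one can also be written with paste, which may be easier to read than the `echo $(tr ...)` combination. This is only a sketch; `join_lines` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not part of sample.sh):
# join all lines on stdin into one line, separated by a single space.
join_lines() {
    paste -s -d ' ' -
}

printf '00:01:00\n00:10:00\n' | join_lines   # prints "00:01:00 00:10:00"
```

In the pipeline of this step it would take the place of the final `echo $(tr '\n' '\t')`.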

P1.4. Store start and end in variables

- Split the result on ' ' and assign the first field to start and the second to end
  - start=$(cut -d' ' -f 1 <<< $result)
  - end=$(cut -d' ' -f 2 <<< $result)

`sample.sh`


	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
		start=$(cut -d' ' -f 1 <<< $result)
		end=$(cut -d' ' -f 2 <<< $result)
	else
		echo 'there is no log file'
	fi
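The two cut calls can likely be replaced by a single read, which splits the string on whitespace inside the shell itself. A minimal sketch, assuming bash (for the here-string) and a `result` value in the form produced in P1.3:

```shell
#!/bin/bash
# Example value in the same form as the result variable from P1.3
result="00:01:00 00:10:00"

# read splits on whitespace: first field goes to start, second to end
read -r start end <<< "$result"

echo "start=$start end=$end"   # prints "start=00:01:00 end=00:10:00"
```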

P1.5. Elapsed time (seconds) is stored in a variable, P1.6. Elapsed time (minutes) is stored in a variable

- Convert to UNIX time to calculate the difference between two HH:mm:ss values
  - date -d$start +%s
  - date -d$end +%s
- Evaluate the expression
  - expr

This is where I got stuck the most

`sample.sh`


time_s=$(expr `date -d$end +%s` - `date -d$start +%s`)
time_m=$((time_s/60))

`terminal`


$ expr `date -d'00:01:01' +%s` - `date -d'00:00:01' +%s`
60
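`date -d` is a GNU date option, so the calculation above depends on GNU date being installed. As an alternative, the difference can be computed with shell arithmetic only. This is only a sketch, assuming bash; `to_seconds` is a hypothetical helper name. Like the original, it assumes START and END fall on the same day.

```shell
#!/bin/bash
# Hypothetical helper (not part of sample.sh): convert HH:MM:SS to seconds since midnight.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that "08" and "09" are not rejected as invalid octal numbers
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

start=00:01:00
end=00:10:00
time_s=$(( $(to_seconds "$end") - $(to_seconds "$start") ))
time_m=$(( time_s / 60 ))
echo "$time_s $time_m"   # prints "540 9"
```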

P1.7. Write the values from P1.4, P1.5, and P1.6 as one line of the text file

`sample.sh`


echo $x $start $end $time_s $time_m >> tmp_result.txt

P1.8. When all load jobs are completed, output in tabular format in descending order of elapsed time.

- The first line is the header, so it is excluded from the sort
  - head -n +1 tmp_result.txt &&
- From the second line onward, sort in descending order using the 5th column, TIME(m), as the key
  - tail -n +2 tmp_result.txt | sort -n -r -k 5
- Display in table format
  - column -t

`sample.sh`


(head -n +1 tmp_result.txt && tail -n +2 tmp_result.txt | sort -n -r -k 5) | column -t
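Since TIME(m) is rounded down, two jobs that took 61 and 119 seconds both show 1 minute and tie on the 5th column. A variation that sorts on TIME(s) instead might look like the following. This is only a sketch, assuming bash; `sort_by_seconds` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not part of sample.sh):
# reads a header line followed by data rows on stdin, prints the header unchanged,
# then prints the rows sorted by column 4 (seconds), largest first.
sort_by_seconds() {
    local header
    IFS= read -r header
    printf '%s\n' "$header"
    sort -k4,4nr
}
```

It would be used as `sort_by_seconds < tmp_result.txt | column -t`.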

P1.9. Delete the temporary text file

`sample.sh`


rm tmp_result.txt
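If the script stops partway, the final rm never runs and tmp_result.txt is left behind. One possible way around this is mktemp together with an EXIT trap. This is only a sketch, assuming bash; `build_report` is a hypothetical wrapper that receives the data rows on stdin.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of sample.sh): reads data rows on stdin,
# prints the header followed by the rows, and removes its temp file on exit.
build_report() (
    tmp_result=$(mktemp) || exit 1
    # The EXIT trap runs when this subshell exits, so the file is removed
    # even if one of the commands below fails
    trap 'rm -f "$tmp_result"' EXIT
    echo "JOBNAME START END TIME(s) TIME(m)" > "$tmp_result"
    cat >> "$tmp_result"
    cat "$tmp_result"
)
```

mktemp also gives each run its own file name, so two runs at the same time do not write to the same file.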

Conclusion

I am just getting started with shell scripting, so if you have any suggestions on how to write this better, I would appreciate it if you could let me know.

