Hello. I'm Tanaka from NETS1.
I want to identify the alert email: XXXXX is a wildcard! (A continuation of the previous article.)
It seems that no one has signed up for the last day, so I decided to write this.
One day I received an alert email like this.
Dec 14 12:59:12 app001 2020/12/14 12:59:12.449 app001 ERROR ERROR002 ecnxeci-1349 Great thing happened. sugoi[10223]Error in Oh, what!
Looking at the response list, a person can clearly see that this alert calls for telephone contact.
| Error code | Error message | Response |
|---|---|---|
| ERROR002 | app00x ERROR ERROR002 xxxxx Great thing happened. XXXXX[10223]Error in | Telephone contact |
| ERROR002 | Great thing happened. XXXXX[YYYYY]Error in | Email contact |
| ERROR002 | Great thing happened. | Ignore |
But look at the result produced by the code from the previous article ...
ecn was judged as delete. Ideally, the whole of ecnxeci-1349 should be judged as replace, but the x contained in ecnxeci-1349 was mistakenly matched against an x in the xxxxx part on the list side.
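The mismatch can be reproduced in isolation. The following is a minimal sketch, assuming only the two fragments quoted above; the variable names mail_token and wildcard are made up for this illustration and are not part of the original tool.

```python
import difflib

mail_token = 'ecnxeci-1349'  # random-looking token from the mail side
wildcard = 'xxxxx'           # wildcard as written on the list (manual) side

matcher = difflib.SequenceMatcher(None, mail_token, wildcard)
opcodes = matcher.get_opcodes()

# Print each opcode in the "tag : mail side | manual side" layout
for tag, i1, i2, j1, j2 in opcodes:
    print(f'{tag:8}: {mail_token[i1:i2]} | {wildcard[j1:j2]}')
```

difflib anchors on the single x the two strings share, so the token is split into a delete (ecn), an equal (x), and a replace (eci-1349 against xxxx) instead of one replace covering the whole token.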
If the x is what matches, just make it something other than x!
That is easy to say. You only have to replace the x that matches xxxxx with some other character. But how do you tell that an x is an unexpected match against a wildcard ...
If you don't define what a wildcard is, you can't judge it as a wildcard.
When a list is written so that people can read it easily, I think a wildcard looks like this: a single x or X, or the same character repeated (xxxxx, XXXXX, YYYYY). A span is treated as a wildcard when difflib evaluates it as replace and the string on the manual (list) side meets these conditions.
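As a rough illustration of that definition, here is a small predicate. It is a sketch based on the conditions that the is_replacement function in the listing further down checks (a lone x or X, or a run of one repeated character); the name looks_like_wildcard is hypothetical.

```python
def looks_like_wildcard(span):
    """True for a lone x/X, or for a run of one repeated character."""
    if len(span) <= 1:
        # A single character only counts when it is x or X
        return span.lower() == 'x'
    # Two or more characters count only when they are all identical
    return len(set(span)) == 1


for candidate in ['x', 'X', 'xxxxx', 'YYYYY', '1', 'xeci', '']:
    print(repr(candidate), looks_like_wildcard(candidate))
```

Note that under this rule any repeated character qualifies, not just x, which is what lets a placeholder such as YYYYY be treated as a wildcard too.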
As for how to judge, I decided to utilize the evaluation value of difflib.
In this example, the difflib result (tag : mail side | manual side) is:
: Dec 14 12:59:12 app001 2020/12/14 12:59:12.449 |
equal : app00 | app00
replace : 1 | x
equal : ERROR ERROR002 | ERROR ERROR002
delete : ecn |
equal : x | x
replace : eci-1349 | xxxx
equal : Great thing happened. | Great thing happened.
replace : sugoi | XXXXX
equal : [10223]Error in | [10223]Error in
: Oh, what! |
From this result, step 1 fills the wildcard (replace) parts of the manual-side string with the corresponding mail text, creating the new evaluation string below, and outputs the evaluation value.
app001 ERROR ERROR002 xeci-1349 Great thing happened. sugoi[10223]Error in
At this point the xeci-1349 part naturally does not match the mail-side string (ecnxeci-1349), so the evaluation value does not reach the maximum of 1.0.
Next, steps 2 and 3 replace the x in the equal : x | x part with some other arbitrary character and create the following mail-side string.
(The unneeded beginning and end of the email are removed in the pre-processing stage.)
app001 ERROR ERROR002 ecnieci-1349 Great thing happened. sugoi[10223]Error in
This mail string is then re-evaluated from step 1. In step 1 of the re-evaluation, the following evaluation string is newly created and its evaluation value is output.
app001 ERROR ERROR002 ecnieci-1349 Great thing happened. sugoi[10223]Error in
This time the re-evaluation value reaches the maximum of 1.0. Because the evaluation value rose, we can tell that the original match was an unexpected one.
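The effect on the evaluation value can be checked on the token alone. This is a simplified sketch, not the author's diff_analyzer: the helper names fill_wildcards and score are hypothetical, and for brevity every replace span is treated as a wildcard without checking what the manual side contains.

```python
import difflib


def fill_wildcards(mail, manual):
    """Copy the mail text into every span that difflib tags as 'replace'."""
    parts = []
    matcher = difflib.SequenceMatcher(None, mail, manual)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        parts.append(mail[i1:i2] if tag == 'replace' else manual[j1:j2])
    return ''.join(parts)


def score(mail, manual):
    """Evaluation value of the mail against the wildcard-filled manual."""
    return difflib.SequenceMatcher(None, mail, fill_wildcards(mail, manual)).ratio()


mail = 'ecnxeci-1349'
manual = 'xxxxx'

before = score(mail, manual)           # filled manual is 'xeci-1349'
perturbed = mail.replace('x', 'i')     # 'ecnieci-1349', as in the text above
after = score(perturbed, manual)       # the whole token is now one replace

print(before, after)
```

Before the substitution the filled manual string is xeci-1349 and the value stays below 1.0; after it, the whole token falls into a single replace and the value reaches 1.0, which is the signal used to detect the unexpected match.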
The main part is the same as before, so it is omitted
import difflib


def search_space(message):
    '''Return the positions just after each space (position 0 is included for convenience).'''
    space_pos = [0]
    for index, c in enumerate(message):
        if c == ' ':
            space_pos.append(index + 1)
    return space_pos
def is_replacement(string):
    '''Return True if the manual-side string looks like a wildcard.'''
    # A single character (or an empty string) is a wildcard only if it is x or X
    if len(string) <= 1:
        return string.lower() == 'x'
    # Longer strings are wildcards only if every character is the same
    pre_char = string[0]
    for char in string:
        if pre_char != char:
            return False
    return True
def diff_analyzer(skip_seek, msg, man_msg):
    fix = 0
    fix_man_msg = man_msg
    fix_msg = msg
    # Guard: keeps finish_seek defined even if difflib returns no opcodes
    finish_seek = skip_seek
    opcodes = [('', 0, skip_seek, 0, 0)]
    seq = difflib.SequenceMatcher(None, msg, man_msg)
    ratio = seq.ratio()
    for tag, i1, i2, j1, j2 in seq.get_opcodes():
        # Manual-side offsets shift as wildcards are filled in
        fj1 = j1 + fix
        fj2 = j2 + fix
        if tag == 'replace':
            # If the manual side is a wildcard, fill it with the mail text
            # and mark the span with the new tag fix_equal
            if is_replacement(fix_man_msg[fj1:fj2]):
                fix_man_msg = fix_man_msg[:fj1] + msg[i1:i2] + fix_man_msg[fj2:]
                fix = fix + i2 - i1 - (fj2 - fj1)
                tag = 'fix_equal'
        elif tag == 'equal':
            # Assume a random string on the mail side may happen to match an x on the manual side.
            # Force a change when a single non-space character (or a wildcard-like run) is equal;
            # if the change was wrong, the evaluation value should drop on re-evaluation.
            if (fj2 - fj1 == 1 and fix_man_msg[fj1:fj2] != ' ') or is_replacement(fix_man_msg[fj1:fj2]):
                replace_msg = ''
                for letter in msg[i1:i2]:
                    # Add 100 to the code point to get a different character (very crude)
                    replace_msg += chr(ord(letter) + 100)
                fix_msg = fix_msg[:i1] + replace_msg + fix_msg[i2:]
        opcodes.append((tag, skip_seek + i1, skip_seek + i2, j1, j2))
        finish_seek = skip_seek + i2
    # When wildcards were filled in, re-evaluate the ratio only
    if fix_man_msg != man_msg:
        ratio = difflib.SequenceMatcher(None, msg, fix_man_msg).ratio()
    # Re-evaluate when there may have been an unexpected match
    if fix_msg != msg:
        f_seek, f_opcodes, f_ratio = diff_analyzer(skip_seek, fix_msg, man_msg)
        print(f_ratio, ':', fix_msg)
        print(ratio, ':', msg)
        if ratio < f_ratio:
            finish_seek = f_seek
            opcodes = f_opcodes
            ratio = f_ratio
        else:
            # The re-evaluation did not improve, so it was not an unexpected match: discard fix_msg
            fix_msg = msg
    return (finish_seek, opcodes, ratio)
def check_message_by_difflib(manual, message):
    space_pos = search_space(message)
    ratio = 0
    # Try every position after a space as the start of the message
    for i in space_pos:
        msg = message[i:]
        delete_flag = False
        # If the last opcode is a delete, cut that tail off before evaluating
        tag, i1, i2, j1, j2 = difflib.SequenceMatcher(None, msg, manual).get_opcodes()[-1]
        if tag == 'delete':
            msg = msg[:i1]
            delete_flag = True
        finish_seek, tmp_opcodes, tmp_ratio = diff_analyzer(i, msg, manual)
        if ratio <= tmp_ratio:
            if delete_flag:
                # Record the trimmed tail as an untagged span
                tmp_opcodes.append(('', finish_seek, len(message), 0, 0))
            ratio = tmp_ratio
            opcodes = tmp_opcodes
    return opcodes, ratio
... (rest omitted) ...
: Dec 14 12:59:12 app001 2020/12/14 12:59:12.449 |
equal : app00 | app00
fix_equal : 1 | x
equal : ERROR ERROR002 | ERROR ERROR002
fix_equal : ecnxeci-1349 | xxxxx
equal : Great thing happened. | Great thing happened.
fix_equal : sugoi | XXXXX
equal : [10223]Error in | [10223]Error in
: Oh, what! |
ecnxeci-1349 is now evaluated as a wildcard part (fix_equal), which looks good. But I wondered whether only the wildcard parts are really being affected, so I also compared strings like the following.
Mail side: app001 ERROR fix_data.sh error
Manual side: app00x ERROR boxdata.sh error
Execution result
0.8813559322033898 : app001 ERROR fiÜ_data.sh error
0.9152542372881356 : app001 ERROR fix_data.sh error
0.7307692307692307 : ERROR fiÜ_data.sh error
0.7692307692307693 : ERROR fix_data.sh error
0.5652173913043478 : fiÜ_data.sh error
0.6086956521739131 : fix_data.sh error
app001 ERROR fix_data.sh error
: |
equal : app00 | app00
fix_equal : 1 | x
equal : ERROR | ERROR
replace : fi | bo
equal : x | x
delete : _ |
equal : data.sh error | data.sh error
The x is still correctly judged as equal : x | x.
Looking at the first and second lines of the output, the evaluation value after the replacement has dropped (0.881 versus 0.915), as intended.
This looks promising.
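The Ü in the output comes from shifting the code point of x by 100. The sketch below reproduces the first two evaluation values under one assumption: manual_filled is the manual-side string after its leading wildcard x has been filled with 1, written out by hand here. The helper name perturb is hypothetical.

```python
import difflib


def perturb(text, start, end, offset=100):
    """Shift the code points of text[start:end] by offset."""
    shifted = ''.join(chr(ord(c) + offset) for c in text[start:end])
    return text[:start] + shifted + text[end:]


mail = 'app001 ERROR fix_data.sh error'
# Assumption: manual side with the wildcard in app00x already filled in
manual_filled = 'app001 ERROR boxdata.sh error'

x_pos = mail.index('x')
perturbed = perturb(mail, x_pos, x_pos + 1)

original_ratio = difflib.SequenceMatcher(None, mail, manual_filled).ratio()
perturbed_ratio = difflib.SequenceMatcher(None, perturbed, manual_filled).ratio()

print(perturbed_ratio, ':', perturbed)
print(original_ratio, ':', mail)
```

Here the x in fix_data.sh is a genuine match with the x in boxdata.sh, so perturbing it removes one matching character and the value falls from 54/59 to 52/59. That is the opposite of the wildcard case, where the value rises.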
I think main should be changed so that, among the candidates whose evaluation value is 1.0, the one with the longest match becomes the judgment result. Like this.
mail = 'Dec 14 12:59:12 app001 2020/12/14 12:59:12.449 app001 ERROR ERROR002 ecnxeci-1349 Great thing happened. sugoi[10223]Error in Oh, what!'
manual1 = 'app00x ERROR ERROR002 xxxxx Great thing happened. XXXXX[10223]Error in'
manual2 = 'Great thing happened. XXXXX[YYYYY]Error in'
manual3 = 'Great thing happened.'
# The list of candidates was not shown in the original; it is assumed to hold the three manuals above
manuals = [manual1, manual2, manual3]
max_match_length = 0
result = 'Not applicable'
for manual in manuals:
    opcodes, ratio = check_message_by_difflib(manual, mail)
    # Total length of the mail-side spans judged equal or fix_equal
    match_length = sum(opcode[2] - opcode[1] for opcode in opcodes if opcode[0] in ('fix_equal', 'equal'))
    # Only a perfect evaluation value qualifies; among those, keep the longest match
    if ratio == 1:
        if max_match_length < match_length:
            max_match_length = match_length
            result = manual
print('result:' + result)
Execution result
Result: app00x ERROR ERROR002 xxxxx Great thing happened. XXXXX[10223]Error in
I tried my best to make a reasonable mechanical judgment, but in the end a visual check is still needed. There may be bugs in the tool, and it does not handle a rule such as "xxx must have the same number of characters". ~~(Actually, there is a typo on the manual side.)~~ Don't trust the machine too much, or you may misjudge a really important alert ...
In the future, I'd like to be able to correct judgment errors based on the opcodes and the actual judgment results.