Hello. I'm an engineer in charge of the product inspection process in a production engineering department. When ingesting data into Elasticsearch, documents can end up duplicated. The official blog describes how to deal with this, so I will introduce it here.
This is a continuation of Analyzing and visualizing csv logs with Excel Elastic Stack (docker-compose) -- What is Elastic Stack.
This article is intended for those who are new to the Elastic Stack and are thinking about trying it out.
It is a hands-on practice of the official blog post "Effectively prevent event-based data duplication in Elasticsearch".
A unique document ID is generated in the filter stage: the fingerprint filter hashes the message field with MD5, a ruby filter converts the timestamp to hexadecimal and stores it in a prefix metadata field, and the two are concatenated into the document ID at output time.
Note that if you use the blog's configuration as-is, you get the error Invalid FieldReference: '@metadata[prefix]', so I changed it as follows.
logstash/pipeline/filter/filebeat_filter.cfg
fingerprint {
  source => "message"
  target => "[@metadata][fingerprint]"
  method => "MD5"
  key => "test"
}
# The blog's original line fails with the invalid field reference above:
# ruby { code => "event.set('@metadata[prefix]', event.get('@timestamp').to_i.to_s(16))" }
# Use bracket notation for the metadata field instead:
ruby { code => 'event.set("[@metadata][prefix]", event.get("@timestamp").to_i.to_s(16))' }
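For reference, here is a minimal Ruby sketch, outside Logstash, of what the two metadata fields end up containing. The message value and the key are placeholders, and it assumes that setting key makes the fingerprint filter produce a keyed (HMAC) MD5 hex digest.

require 'openssl'

# Placeholder values standing in for the Logstash event fields
message   = '2021-01-01,12:00:00,line-1,OK'   # the raw CSV line ([message])
timestamp = Time.now                          # stands in for [@timestamp]

# What the ruby filter stores in [@metadata][prefix]: epoch seconds in hex
prefix = timestamp.to_i.to_s(16)

# What the fingerprint filter stores in [@metadata][fingerprint]; with key
# set, it is a keyed (HMAC) digest of the chosen method, here MD5
fingerprint = OpenSSL::HMAC.hexdigest('MD5', 'test', message)

# The document_id that the output below builds by concatenation
puts "#{prefix}#{fingerprint}"

When the same event is shipped again (same message and same @timestamp), it reproduces the same ID, so Elasticsearch overwrites the existing document instead of indexing a duplicate.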
As shown above, the output then concatenates the two metadata fields and sets the result as the document_id.
logstash/pipeline/output/filebeat_out.cfg
output {
  elasticsearch {
    hosts => [ 'elasticsearch' ]
    index => "%{[@metadata][beat]}-csv-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][prefix]}%{[@metadata][fingerprint]}"
  }
}
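To check the result, you can look at the _id of an indexed document; it should be the hex timestamp prefix followed by the 32-character fingerprint. Below is a minimal sketch, assuming Elasticsearch is reachable at elasticsearch:9200 from where it runs and that an index matching filebeat-csv-* already exists.

require 'net/http'
require 'json'

# Fetch one document from the index pattern used by the output above
uri  = URI('http://elasticsearch:9200/filebeat-csv-*/_search?size=1')
body = Net::HTTP.get(uri)
hit  = JSON.parse(body)['hits']['hits'].first

# The _id is the hex timestamp prefix plus the fingerprint, so shipping the
# same event again updates this document instead of creating a new one
puts hit['_id'] unless hit.nil?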
It's a simple technique, but I think it's useful because it can be applied in a wide range of situations.