Hello. I'm an engineer in charge of the product inspection process in a production engineering department. When ingesting data into Elasticsearch, documents can end up duplicated. The official blog describes how to deal with this, so I will introduce it here.
This is a continuation of Analyzing and visualizing csv logs with Excel Elastic Stack (docker-compose) -- What is Elastic Stack.
This article is intended for those who are new to the Elastic Stack and are thinking about trying it out.
It is a hands-on practice of the official blog post "Effectively prevent event-based data duplication in Elasticsearch".
A unique document ID is generated in the filter stage: the fingerprint filter hashes the message field with MD5, a ruby filter converts the timestamp to hexadecimal and stores it in a prefix metadata field, and the two are concatenated into the document ID at output time.
Note that if you use the blog's configuration as-is, you get the error Invalid FieldReference: '@metadata[prefix]', so I changed it as follows.
logstash/pipeline/filter/filebeat_filter.cfg
fingerprint {
  source => "message"
  target => "[@metadata][fingerprint]"
  method => "MD5"
  key => "test"
}
# The blog's original line fails with the invalid field reference above:
# ruby { code => "event.set('@metadata[prefix]', event.get('@timestamp').to_i.to_s(16))" }
# Use bracket notation for the metadata field instead:
ruby { code => 'event.set("[@metadata][prefix]", event.get("@timestamp").to_i.to_s(16))' }
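For reference, here is a minimal Ruby sketch, outside Logstash, of what the two metadata fields end up containing. The message value and the key are placeholders, and it assumes that setting key makes the fingerprint filter produce a keyed (HMAC) MD5 hex digest.

require 'openssl'

# Placeholder values standing in for the Logstash event fields
message   = '2021-01-01,12:00:00,line-1,OK'   # the raw CSV line ([message])
timestamp = Time.now                          # stands in for [@timestamp]

# What the ruby filter stores in [@metadata][prefix]: epoch seconds in hex
prefix = timestamp.to_i.to_s(16)

# What the fingerprint filter stores in [@metadata][fingerprint]; with key
# set, it is a keyed (HMAC) digest of the chosen method, here MD5
fingerprint = OpenSSL::HMAC.hexdigest('MD5', 'test', message)

# The document_id that the output below builds by concatenation
puts "#{prefix}#{fingerprint}"

When the same event is shipped again (same message and same @timestamp), it reproduces the same ID, so Elasticsearch overwrites the existing document instead of indexing a duplicate.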
As shown above, the output then concatenates the two metadata fields and sets the result as the document_id.
logstash/pipeline/output/filebeat_out.cfg
output {
  elasticsearch {
    hosts => [ 'elasticsearch' ]
    index => "%{[@metadata][beat]}-csv-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][prefix]}%{[@metadata][fingerprint]}"
  }
}
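To check the result, you can look at the _id of an indexed document; it should be the hex timestamp prefix followed by the 32-character fingerprint. Below is a minimal sketch, assuming Elasticsearch is reachable at elasticsearch:9200 from where it runs and that an index matching filebeat-csv-* already exists.

require 'net/http'
require 'json'

# Fetch one document from the index pattern used by the output above
uri  = URI('http://elasticsearch:9200/filebeat-csv-*/_search?size=1')
body = Net::HTTP.get(uri)
hit  = JSON.parse(body)['hits']['hits'].first

# The _id is the hex timestamp prefix plus the fingerprint, so shipping the
# same event again updates this document instead of creating a new one
puts hit['_id'] unless hit.nil?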
It's a simple technique, but I think it's useful because it can be applied in a wide range of situations.