Analyzing and visualizing csv logs with Excel Elastic Stack (docker-compose) --How to deal with data duplication errors in Elasticsearch

Introduction

Hello! I am an engineer in charge of the product inspection process in the production engineering department. When you load data into Elasticsearch, you may end up with duplicate documents. The official Elastic blog describes how to prevent this, so I will introduce that method here.

This article is a continuation of "Analyzing and visualizing csv logs with Excel Elastic Stack (docker-compose) --What is Elastic Stack?".

Target audience

This article is intended for those who are new to Elastic Stack and who are thinking about trying it out.

Content of this article

This article puts into practice the method described in the official Elastic blog post "Effectively prevent event-based data duplication in Elasticsearch".

Filter settings

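Filters generate a unique document ID for each event. The fingerprint filter hashes the message field with MD5 and stores the result in [@metadata][fingerprint]. A ruby filter converts the @timestamp field to epoch seconds in hexadecimal and stores it in [@metadata][prefix]. The output stage then concatenates the two values to form the ID.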

Note that if you copy the blog's ruby filter as is, Logstash fails with the error Invalid FieldReference: '@metadata[prefix]'. I fixed this by writing the field reference in the fully bracketed form [@metadata][prefix], as shown below.

logstash/pipeline/filter/filebeat_filter.cfg


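filter {
  # Hash the message field. Fields under @metadata are not sent to Elasticsearch.
  # Because key is set, the fingerprint filter computes a keyed (HMAC) MD5 hash.
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MD5"
    key => "test"
  }

  # The blog's original line, which fails with "Invalid FieldReference":
  # ruby { code => "event.set('@metadata[prefix]', event.get('@timestamp').to_i.to_s(16))" }

  # Fixed version with a fully bracketed field reference: stores @timestamp as hex epoch seconds.
  ruby { code => 'event.set("[@metadata][prefix]", event.get("@timestamp").to_i.to_s(16))' }
}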
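The Python sketch below reproduces the ID scheme outside Logstash, which lets you check which ID a given event will receive. It is an approximation under stated assumptions, not Logstash's own code. It assumes that the fingerprint filter, with key set and default options, emits a hex-encoded HMAC-MD5 of the message field. It also assumes that to_i truncates @timestamp to whole epoch seconds. The function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def fingerprint(message: str, key: str = "test") -> str:
    """Hex-encoded HMAC-MD5 of the message (assumed equivalent of the fingerprint filter)."""
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.md5).hexdigest()


def prefix(timestamp: datetime) -> str:
    """Whole epoch seconds in lowercase hex, like Ruby's to_i.to_s(16)."""
    return format(int(timestamp.timestamp()), "x")


def document_id(message: str, timestamp: datetime, key: str = "test") -> str:
    """Concatenate prefix and fingerprint, as the output's document_id setting does."""
    return prefix(timestamp) + fingerprint(message, key)


if __name__ == "__main__":
    ts = datetime(2021, 1, 1, tzinfo=timezone.utc)
    # Hypothetical CSV line used as the message field.
    print(document_id("2021/01/01 00:00:00,OK,1.23", ts))
```

The prefix has one-second resolution. As a result, two events with identical messages and timestamps in the same second receive the same ID and are treated as one document.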

Output settings

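As described above, the output concatenates the prefix and the fingerprint and uses the result as document_id. When Elasticsearch indexes a document whose ID already exists, it overwrites the existing document instead of adding a new one. A re-sent event therefore replaces its earlier copy rather than creating a duplicate.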

logstash/pipeline/output/filebeat_out.cfg


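output {
  elasticsearch {
    hosts    => [ 'elasticsearch' ]
    # One index per beat and per day, based on @timestamp
    index    => "%{[@metadata][beat]}-csv-%{+YYYY.MM.dd}"
    # Hex timestamp prefix + MD5 fingerprint: the same event always gets the same ID
    document_id => "%{[@metadata][prefix]}%{[@metadata][fingerprint]}"
  }
}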
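The following Python sketch simulates why a deterministic ID prevents duplicates, without needing a cluster. A plain dict stands in for an Elasticsearch index. This is an assumption made for illustration, not the Elasticsearch client API. Indexing with an existing ID overwrites the stored document. The helper returns "created" or "updated", mirroring the result values that the index API reports. The event data is hypothetical, and the ID scheme assumes a hex-encoded HMAC-MD5 fingerprint.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def document_id(message: str, timestamp: datetime, key: str = "test") -> str:
    # Same scheme as the pipeline: hex epoch seconds + hex HMAC-MD5 of the message.
    prefix = format(int(timestamp.timestamp()), "x")
    digest = hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.md5).hexdigest()
    return prefix + digest


def index(store: dict, doc_id: str, source: dict) -> str:
    """Mimic an index request with an explicit ID: overwrite if the ID already exists."""
    result = "updated" if doc_id in store else "created"
    store[doc_id] = source
    return result


if __name__ == "__main__":
    ts = datetime(2021, 1, 1, 9, 0, 0, tzinfo=timezone.utc)
    # Hypothetical events; the second one is a re-send of the first.
    events = [
        {"message": "2021/01/01 09:00:00,A001,OK", "@timestamp": ts},
        {"message": "2021/01/01 09:00:00,A001,OK", "@timestamp": ts},
        {"message": "2021/01/01 09:00:00,A002,NG", "@timestamp": ts},
    ]
    store = {}
    for event in events:
        doc_id = document_id(event["message"], event["@timestamp"])
        print(index(store, doc_id, event))
    print(len(store), "documents")
```

If document_id is not set, Elasticsearch generates a new ID for every request. The re-sent event would then be stored as a second document.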

Finally

The method is simple, but it is useful because it applies to a wide range of situations.
