Analyze and visualize csv logs with Excel Elastic Stack (docker-compose) - Add the date on the 1st line as a timestamp field to the csv data on the 2nd and subsequent lines

Introduction

Hello! I am an engineer in charge of the product inspection process in the production engineering department. This article is a continuation of Analyzing and visualizing csv logs with Excel Elastic Stack (docker-compose) - What is Elastic Stack?.

Target audience

This article is intended for those who are new to Elastic Stack and who are thinking about trying it out.

Content of this article

I will explain how to treat the date on the first line of csv data like the following as the timestamp of each subsequent line.

Date,2020/10/28,20:19:18
10,Test1,130.1,OK
20,Test2,1321,OK
30,Test3,50.2,OK
End
Date,2020/10/29,10:30:50
10,Test1,140.4,OK
20,Test2,1300,OK
30,Test3,50.0,OK
End
Date,2020/10/29,11:40:50
10,Test1,141.1,OK
20,Test2,1310,OK
30,Test3,55.8,NG
End

I have put a set of configuration files on GitLab, so please refer to them. Click here for the repository -> elastic-stack

Policy

I examined two approaches.

Policy 1: Read line by line with logstash

In conclusion, this method did not work. Because the aggregate filter allows you to share information between multiple events, it is possible in some cases.

In the official example, the numbers appearing between TASK_START and TASK_END are summed, and the result is set in a field of the TASK_END event.

 INFO - 12345 - TASK_START - start
 INFO - 12345 - SQL - sqlQuery1 - 12
 INFO - 12345 - SQL - sqlQuery2 - 34
 INFO - 12345 - TASK_END - end

In that example, data can be shared between events by using the common task_id "12345" in the aggregate filter. In my situation, however, no task_id equivalent existed in the data, so this approach could not be used.
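For reference, a sketch along the lines of the official aggregate filter example: events are correlated by the taskid extracted with grok, SQL durations are accumulated in a shared map, and the total is written to the TASK_END event. The grok pattern and field names follow the official documentation; adapt them to your own log format.

```conf
filter {
  # Extract the task id and duration from each log line
  grok {
    match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:taskid} - %{NOTSPACE:logger} - %{WORD:label}( - %{INT:duration:int})?" ]
  }

  if [logger] == "TASK_START" {
    # Create a shared map keyed by taskid
    aggregate {
      task_id => "%{taskid}"
      code => "map['sql_duration'] = 0"
      map_action => "create"
    }
  }

  if [logger] == "SQL" {
    # Accumulate durations into the shared map
    aggregate {
      task_id => "%{taskid}"
      code => "map['sql_duration'] += event.get('duration')"
      map_action => "update"
    }
  }

  if [logger] == "TASK_END" {
    # Write the total into the final event and close the task
    aggregate {
      task_id => "%{taskid}"
      code => "event.set('sql_duration', map['sql_duration'])"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}
```

Without a field like taskid that is common to all related lines, the map has no key, which is exactly why this policy failed for the csv data above.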

Policy 2: Combine multiple lines with filebeat

This is the method that succeeded. By using filebeat's multiline feature, multiple lines of text can be combined, joined by \n delimiters, into a single event. The three lines of settings below do the combining.

  multiline.pattern: (End)
  multiline.negate: true
  multiline.match: before

@datake913's article Multiline setting summary that handles multiple lines with Filebeat presents the settings in a table and was easy to understand. Quoting its explanation of this pattern: "Consecutive lines that do not match (End) are appended before the next matching line." The log above can thus be combined into one line as shown below.

Date,2020/10/28,20:19:18\n10,Test1,130.1,OK\n20,Test2,1321,OK\n30,Test3,50.2,OK\nEnd
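For context, a minimal filebeat.yml sketch showing where the multiline settings above sit in a log input. The paths and the Logstash host are assumptions for illustration; adjust them to your environment.

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/inspection/*.csv   # assumed location of the csv logs
    # Combine lines up to and including "End" into one event
    multiline.pattern: (End)
    multiline.negate: true
    multiline.match: before

# Forward the combined events to Logstash (assumed docker-compose service name)
output.logstash:
  hosts: ["logstash:5044"]
```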

Policy 2 continued: Decompose the multiline event with logstash

To parse the timestamp, I use the grok filter from Analyze and visualize csv logs with Excel Elastic Stack (docker-compose) - Parse "year/month/day, hour:minute:second" in multiline with grok filter and treat it as Japan time.

If you apply the csv filter to the multiline event as is, only the part up to the first \n (Date,2020/10/28,20:19:18) is parsed. By using the split filter, the multiline event can be decomposed again and divided into multiple events.

As a final step, lines that do not start with a number are deleted with the drop filter, and the mutate filter converts the remaining String fields to numeric types.

logstash.conf


filter {
  # Extract "yyyy/MM/dd,HH:mm:ss" from the Date line with a custom pattern (TIMESTAMP_JP)
  grok {
    patterns_dir => ["/opt/logstash/extra_patterns"]
    match => { "message" => "%{TIMESTAMP_JP:read_timestamp}" }
  }

  # Parse it as Japan time and set it as @timestamp
  date {
    match => ["read_timestamp", "yyyy/MM/dd,HH:mm:ss"]
    timezone => "Asia/Tokyo"
    target => "@timestamp"
  }

  # Split the multiline message back into one event per line (default terminator: "\n");
  # every resulting event inherits the @timestamp set above
  split{}

  # Parse each line as csv
  csv {
    columns => ["Step","TestName","Value1","Judge"]
    separator => ","
  }

  # Drop the "Date,..." and "End" lines, whose first column is not a number
  if [Step] !~ /\d+/ {
    drop{}
  }

  # Convert the String fields to numeric types
  mutate {
    convert => {
      "Step" => "integer"
      "Value1" => "float"
    }
  }
}

Finally

You can now assign the same timestamp to multiple lines using multiline and split. In future articles, I would like to introduce countermeasures for the following heap errors.

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
Heap dump file created [3178685347 bytes in 34.188 secs]
warning: thread "[main]>worker11" terminated with exception (report_on_exception is true):
warning: thread "[main]>worker4" terminated with exception (report_on_exception is true):
java.lang.OutOfMemoryError: Java heap space
