Analyze and visualize csv logs with Excel Elastic Stack (docker-compose) - Add the date on the 1st line as a timestamp field to the csv data on the 2nd and subsequent lines

Introduction

Hello! I am an engineer in charge of the product inspection process in the production engineering department. This article is a continuation of Analyzing and visualizing csv logs with Excel Elastic Stack (docker-compose) - What is Elastic Stack?.

Target audience

This article is intended for those who are new to Elastic Stack and who are thinking about trying it out.

Content of this article

I will explain how to treat the date on the first line of csv data like the following as the timestamp of each subsequent line.

Date,2020/10/28,20:19:18
10,Test1,130.1,OK
20,Test2,1321,OK
30,Test3,50.2,OK
End
Date,2020/10/29,10:30:50
10,Test1,140.4,OK
20,Test2,1300,OK
30,Test3,50.0,OK
End
Date,2020/10/29,11:40:50
10,Test1,141.1,OK
20,Test2,1310,OK
30,Test3,55.8,NG
End

I have put a set of configuration files on GitLab, so please refer to them. Click here for the repository -> elastic-stack

Policy

I examined two approaches.

Policy 1: Read line by line with logstash

In conclusion, this method did not work. Because the aggregate filter allows you to share information between multiple events, it is possible in some cases.

In the official example, the numbers appearing between TASK_START and TASK_END are summed, and the result is set in a field of the TASK_END event.

 INFO - 12345 - TASK_START - start
 INFO - 12345 - SQL - sqlQuery1 - 12
 INFO - 12345 - SQL - sqlQuery2 - 34
 INFO - 12345 - TASK_END - end

In that example, data can be shared between events by using the common task_id "12345" in the aggregate filter. In my situation, however, no task_id equivalent existed in the data, so this approach could not be used.
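For reference, a sketch along the lines of the official aggregate filter example: events are correlated by the taskid extracted with grok, SQL durations are accumulated in a shared map, and the total is written to the TASK_END event. The grok pattern and field names follow the official documentation; adapt them to your own log format.

```conf
filter {
  # Extract the task id and duration from each log line
  grok {
    match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:taskid} - %{NOTSPACE:logger} - %{WORD:label}( - %{INT:duration:int})?" ]
  }

  if [logger] == "TASK_START" {
    # Create a shared map keyed by taskid
    aggregate {
      task_id => "%{taskid}"
      code => "map['sql_duration'] = 0"
      map_action => "create"
    }
  }

  if [logger] == "SQL" {
    # Accumulate durations into the shared map
    aggregate {
      task_id => "%{taskid}"
      code => "map['sql_duration'] += event.get('duration')"
      map_action => "update"
    }
  }

  if [logger] == "TASK_END" {
    # Write the total into the final event and close the task
    aggregate {
      task_id => "%{taskid}"
      code => "event.set('sql_duration', map['sql_duration'])"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}
```

Without a field like taskid that is common to all related lines, the map has no key, which is exactly why this policy failed for the csv data above.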

Policy 2: Combine multiple lines with filebeat

This is the method that succeeded. By using filebeat's multiline feature, multiple lines of text can be combined, joined by \n delimiters, into a single event. The three lines of settings below do the combining.

  multiline.pattern: (End)
  multiline.negate: true
  multiline.match: before

@datake913's article Multiline setting summary that handles multiple lines with Filebeat presents the settings in a table and was easy to understand. Quoting its explanation of this pattern: "Consecutive lines that do not match (End) are appended before the next matching line." The log above can thus be combined into one line as shown below.

Date,2020/10/28,20:19:18\n10,Test1,130.1,OK\n20,Test2,1321,OK\n30,Test3,50.2,OK\nEnd
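For context, a minimal filebeat.yml sketch showing where the multiline settings above sit in a log input. The paths and the Logstash host are assumptions for illustration; adjust them to your environment.

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/inspection/*.csv   # assumed location of the csv logs
    # Combine lines up to and including "End" into one event
    multiline.pattern: (End)
    multiline.negate: true
    multiline.match: before

# Forward the combined events to Logstash (assumed docker-compose service name)
output.logstash:
  hosts: ["logstash:5044"]
```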

Policy 2 continued: Decompose the multiline event with logstash

To parse the timestamp, I use the grok filter from Analyze and visualize csv logs with Excel Elastic Stack (docker-compose) - Parse "year/month/day, hour:minute:second" in multiline with grok filter and treat it as Japan time.

If you apply the csv filter to the multiline event as is, only the part up to the first \n (Date,2020/10/28,20:19:18) is parsed. By using the split filter, the multiline event can be decomposed again and divided into multiple events.

As a final step, lines that do not start with a number are deleted with the drop filter, and the mutate filter converts the remaining String fields to numeric types.

logstash.conf


filter {
  # Extract "yyyy/MM/dd,HH:mm:ss" from the Date line with a custom pattern (TIMESTAMP_JP)
  grok {
    patterns_dir => ["/opt/logstash/extra_patterns"]
    match => { "message" => "%{TIMESTAMP_JP:read_timestamp}" }
  }

  # Parse it as Japan time and set it as @timestamp
  date {
    match => ["read_timestamp", "yyyy/MM/dd,HH:mm:ss"]
    timezone => "Asia/Tokyo"
    target => "@timestamp"
  }

  # Split the multiline message back into one event per line (default terminator: "\n");
  # every resulting event inherits the @timestamp set above
  split{}

  # Parse each line as csv
  csv {
    columns => ["Step","TestName","Value1","Judge"]
    separator => ","
  }

  # Drop the "Date,..." and "End" lines, whose first column is not a number
  if [Step] !~ /\d+/ {
    drop{}
  }

  # Convert the String fields to numeric types
  mutate {
    convert => {
      "Step" => "integer"
      "Value1" => "float"
    }
  }
}

Finally

You can now assign the same timestamp to multiple lines using multiline and split. In future articles, I would like to introduce countermeasures for the following heap errors.

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
Heap dump file created [3178685347 bytes in 34.188 secs]
warning: thread "[main]>worker11" terminated with exception (report_on_exception is true):
warning: thread "[main]>worker4" terminated with exception (report_on_exception is true):
java.lang.OutOfMemoryError: Java heap space
