Log aggregation and analysis (working with AWS Athena in Java)

Background

I was considering collecting and analyzing the logs of a web application and visualizing the results, and I wondered what the configuration would look like if I built it myself. This post covers how that evaluation went and which AWS services I actually ended up adopting.

Initial considerations

I thought Kibana would be enough ...

At first, I wondered whether Elasticsearch → Kibana would be enough. It is certainly possible to search and aggregate logs with Elasticsearch and visualize them with Kibana. However, it is difficult to view the aggregated results along a different axis, and with Kibana other people can see the raw logs. Because of these problems, the Kibana plan was dropped.

Plan: store the aggregated results in an RDB

Aggregation itself seemed to be no problem with Elasticsearch, so I thought: what if I store the aggregated results in an RDB and fetch the data needed for visualization from there?

However, even with the aggregated results in an RDB, performance was not very good when extracting them for analysis, so I decided to use a time-series DB instead.

Summary of the flow so far

  1. Push the web app logs into Elasticsearch
  2. Aggregate with Elasticsearch and store the results in a time-series DB (the candidate at the time was InfluxDB)
  3. Query InfluxDB along the axis you want to analyze and graph the result

That was the system configuration I settled on.

However, after thinking it over, I concluded that this setup would be expensive in terms of infrastructure, so I decided to consider another plan.

That led to the proposal to build it in the AWS environment.

Designing the configuration by combining AWS services

Clarifying the goal

What I want to do is aggregate and analyze the logs output by the application, and visualize the results.

Services to use

Looking through the AWS services, the following flow seemed feasible:

  1. Place application logs on S3 with fluentd
  2. Call Athena from Lambda and aggregate the logs placed on S3 (when Athena runs a query, it outputs the result as a CSV)
  3. For analysis, target the CSV output in step 2 and re-aggregate with Athena along the axis you want

The flow is as simple as that.
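
To make step 2 a little more concrete, here is a minimal sketch of how the aggregation query for one day of logs might be built in Java. The table name `app_logs`, the columns `status` and `dt`, and the class name `AggregationQuery` are all assumptions made up for illustration; the actual schema depends on how the logs on S3 are defined as an Athena table.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class AggregationQuery {

    // Hypothetical table name; replace with your own Athena table.
    private static final String TABLE = "app_logs";

    /** Builds a query that counts log lines per status for one day. */
    public static String forDay(String dateTime) {
        // Parsing rejects anything that is not a yyyy-MM-dd date,
        // so the value is safe to embed in the SQL text.
        LocalDate day = LocalDate.parse(dateTime, DateTimeFormatter.ISO_LOCAL_DATE);
        // "status" and "dt" are hypothetical column names.
        return "SELECT status, COUNT(*) AS hits"
             + " FROM " + TABLE
             + " WHERE dt = '" + day + "'"
             + " GROUP BY status";
    }

    public static void main(String[] args) {
        System.out.println(forDay("2018-11-01"));
    }
}
```

Validating the date before concatenating it keeps malformed input out of the query string, which matters when the value comes from a Lambda event.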

Sample code for operating Athena

To operate Athena from Java, use the Athena JDBC driver provided by AWS. As of November 2018, the latest version is AthenaJDBC42-2.0.5.jar, which you can [download here](https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html).

This is the Java code that calls Athena, to be registered with Lambda for step 2 of the flow.


    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.Properties;

    public class AthenaService {

        // Athena endpoint for the Ohio region (us-east-2)
        private static final String CONNECTION_URL = "jdbc:awsathena://athena.us-east-2.amazonaws.com:443";
        private static final String S3_BUCKET = "test-bucket";

        public void execute(String dateTime) throws ClassNotFoundException, SQLException {

            Properties info = new Properties();

            // AWS access key and secret key
            info.put("UID", "XXXXXXXX");
            info.put("PWD", "XXXXXXXX");
            // Where Athena writes the query result CSV; S3 paths always use "/"
            info.put("S3OutputLocation", "s3://" + S3_BUCKET + "/" + "test-dir" + "/");

            // Load the Athena JDBC driver
            Class.forName("com.simba.athena.jdbc.Driver");

            String query = "SELECT xxxxxxxxxxxxxxxxxxxx";

            // try-with-resources closes everything even if the query fails
            try (Connection connection = DriverManager.getConnection(CONNECTION_URL, info);
                 Statement statement = connection.createStatement();
                 ResultSet result = statement.executeQuery(query)) {

                while (result.next()) {
                    System.out.println(result.getString("Key name"));
                }
            }
        }
    }

You can easily connect by simply setting the required information in Properties in this way.
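
As a small illustration of the Properties setup, the sketch below pulls it into a helper so that the keys are not hard-coded in the source. The class name `AthenaConnectionProperties` and the environment variable names `ATHENA_ACCESS_KEY` and `ATHENA_SECRET_KEY` are assumptions for this example, not something the driver requires. Note that S3 paths are always separated by `/`, whereas `File.separator` is `\` on Windows, so the output location is built with a literal `/`.

```java
import java.util.Properties;

public class AthenaConnectionProperties {

    /**
     * Builds the Properties passed to DriverManager.getConnection.
     * The S3 output location uses "/" regardless of the operating system.
     */
    public static Properties build(String accessKey, String secretKey,
                                   String bucket, String directory) {
        Properties info = new Properties();
        info.put("UID", accessKey);
        info.put("PWD", secretKey);
        info.put("S3OutputLocation", "s3://" + bucket + "/" + directory + "/");
        return info;
    }

    public static void main(String[] args) {
        // Hypothetical environment variable names; empty string if unset.
        String accessKey = System.getenv().getOrDefault("ATHENA_ACCESS_KEY", "");
        String secretKey = System.getenv().getOrDefault("ATHENA_SECRET_KEY", "");

        Properties info = build(accessKey, secretKey, "test-bucket", "test-dir");
        System.out.println(info.getProperty("S3OutputLocation"));
    }
}
```

With the bucket and directory from the sample, this prints `s3://test-bucket/test-dir/`.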

Bonus

Deleting the metadata files created when running Athena

As a bonus: I don't need the files with the .metadata extension that Athena creates when it runs a query, so I delete them. Here is the sample.

    import com.amazonaws.auth.AWSStaticCredentialsProvider;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ObjectListing;

    public class S3Handler {

        private static final String S3_BUCKET = "test-bucket";

        private AmazonS3 s3;

        // dateTime is not used in this simplified sample
        public void deleteAthenaMetadata(String dateTime) {
            // Replace "UID" and "PWD" with your access key and secret key
            BasicAWSCredentials awsCreds = new BasicAWSCredentials("UID", "PWD");

            s3 = AmazonS3ClientBuilder.standard()
                    .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                    .withRegion(Regions.US_EAST_2)
                    .build();

            // List the objects under the target directory of the bucket
            ObjectListing objectList = s3.listObjects(S3_BUCKET, "test-dir/");

            deleteObject(objectList);
            s3.shutdown();
        }

        private void deleteObject(ObjectListing objectList) {
            objectList.getObjectSummaries().forEach(i -> {
                // Delete objects whose key ends with .metadata or .txt
                if (i.getKey().endsWith(".metadata") || i.getKey().endsWith(".txt")) {
                    this.s3.deleteObject(S3_BUCKET, i.getKey());
                }
            });

            // If the listing was truncated, fetch the next batch and repeat
            if (objectList.isTruncated()) {
                ObjectListing remainsObject = this.s3.listNextBatchOfObjects(objectList);
                this.deleteObject(remainsObject);
            }
        }
    }
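
The deletion rule itself can be checked without touching S3. The following is a minimal sketch that isolates the same suffix check in a plain method and applies it to a list of keys; the class name `AthenaOutputFilter` and the sample keys are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AthenaOutputFilter {

    /** True for keys the cleanup should delete (.metadata or .txt). */
    public static boolean isDeletable(String key) {
        return key.endsWith(".metadata") || key.endsWith(".txt");
    }

    /** Returns only the keys that would be deleted, in their original order. */
    public static List<String> deletableKeys(List<String> keys) {
        return keys.stream()
                .filter(AthenaOutputFilter::isDeletable)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical keys under the output directory
        List<String> keys = Arrays.asList(
                "test-dir/abc123.csv",
                "test-dir/abc123.csv.metadata",
                "test-dir/def456.txt");
        System.out.println(deletableKeys(keys));
    }
}
```

The result CSV is left alone because its key ends with `.csv`, while the matching `.csv.metadata` file is picked up for deletion.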
