Working with huge JSON in Java Lambda

I previously wrote an article about using an InputStream as the input to a Java Lambda function.

I tried Java Lambda input / output type ~ Stream version ~ https://qiita.com/kazfuku/items/6f0f55ffa3a88d76cfaa

The advantage of using an InputStream is that it can handle huge JSON. In this article, I measure how much difference it actually makes.

Test data

First, I prepared a huge JSON document. It contains 6000 objects, each with a name and a text field, and is about 5.9 MB in total, which is close to Lambda's input size limit of 6 MB.

{
  "data":[
    {
      "name":"vrZPwIw3T7","text":"Ku7aQqW3WzUeiRdXnNB26iVElWdOUj8mQhvHksvN1sMmQ2fT3M8navvbTJuspda2q0bY3FWvsDoguE33tTNtoxuiHjdkUIHmylIezYGitmhJ2bbgcHhcHPzGr4eg3Ger9EijFU82Sq4WS9G5UVW62Cw1rDMNdIld2yxn1Zd3DXqE26iOf1IaBQTzEG7Pld03hkXIkAdTdeAjXlAJlGrwnQgjMh1FohW1bUAYeaLi52qLnbgQd7lZAJuOlitfGUyUbP0BjbsPflOLGQwInPjr2Mt3mG4HDokWj2JJgRkXkRYxq34AxQGNWXjlfWKViDxk2InIP6oMsir5YmTL1oO58dzmBGCoYV7e0PTGQHXJbgPJUFUoCmv3mATCEg1xhOa4IUcP7vC7dMvydS3Qt1QHteYajeCvXiW0HjuHkm2oJ61yEg6JocqLVMQ75RaU0Wjb2KvbAwQmggSel5E6mMl0BacZwBXw7OaYHkHO1p1hQup2hhNkaAkN7B8NS8QJ3oSRPQsM6QsETC3x1ErrN0jZZVqupjDvPEr9xj0fDOpqCo7XqTuSPbf3UhHQgjPyikbc2JaqeMdJf1R0RojlqWmf2STGH8HTuGJTQG3vEP04BkrNLaKNVoXE49tPyePO6EqRAKWNxVZoQmw34Xv6yGzMfOLPcSRhML0rYk1FEaBDmgGNpQIPdYjbT3MC08eEY9cHa813iWvm42XmG5LaiIt2z4IcGaWnLwCRytYJJsdqphSEhyvyOpIKM4i02t9rx3Pkt0704EhFo3SD8gVVIE2y1coFUJqy2GxVqptZrKpFv56c4SsWSPqdLqTH9Gh09Y5Cph6eOKg0JXvip1GoONZ80oBUeRudMvsl32m23fYZlG1dNFnGUZSkz2TiGP9baIfLyPWCcPZGEvVaP9FR64FW0yOLvpKyTNYXg21ZsgEkYo3tbcn3AS5R3Ai4eg5hYMaBoVAsMBK1BPZAncDoqOs97nLa2DZFyqgNSz8Asgmh"
    },
    {
      "name":"OXJIXTP9dz","text":"HkWP0PumYHQZxiGNhGWASXOPrygri7cKXs2hrWx0WaumM8OEVc9UKs2EIknzCsBmAMFRER5YNIUs5oz30LjDrjn1PsoKuh60KPOEaHnBNTt8PivYx0hIfmLoLk56ad6LSLNpUVMCP26WiPyont6OjfD1c4sxtKn3qlg6SaNSs2B1tGoReVb3pwOVvPH2BaULL5rzYyDFfAqFqo4D2UevrdoUOeXK3Ks3tav92wHnECM9pbXdsCbWyr1BNulJ5elcZm8HALPgBeX9dg0vaqpgITfz3klyYIWynzPOJC92t0vMao2tL7lr6uxuQvldWgdhlzGjYP123pdWb2h0zItg8NyyK58tCKy2t2YqtdP23fvmVpmOFygiM6kF9LvDRfnu3mz0X2SvcsQh8UqB84dHiOXwicmnI6DX47OuPXOUZc0wICql8zit6WvbEmDchKy9M74u9mPaiIxGXBy8FvLEptqqGytywwC3GGYXEpLYZlbxDycrSTtCq6PUuWoUbfsJmZT4iZSvM0aoyVKBE2l23oXhFZpM4fxyyziIVAHP9YsQbHQlvr8adtD3voumsGKcklt4mnNQclQdSLKPKSIGdUlkvhcCO4MZcEpKcmSrFU6naOYGL1geB1CuTYHYuw0x6tc7JudQAEB6IWE8xwTgPWQUM15xTsqsLrBIwZ70MGpCGW8JCw6sqJExsXi6wpJ1I3L43TUG4hJOnEPIHeXTco06zaDiSrqG3LsLuCiHIkqYui1N0fJBRJhVcn2X8dXMnQKxqhISGrnP7TeBBcAhI8qrmNK0k9EV6mECQtN2g8qaRYVqwOqC4kwzMpvPWkUnNQuUZbknLlWOKuVeh0mrjTzIxQkMShqhdt21o75h9rz0DPxvNHkS6jLw7TBprYieZwcO8iIRy1zYFedSXyVktczdEczIebkfDFmjGtDeZw5RuuFUYKDk4U3J5lpmfmf9K3G6LuPeV5soPxL54l8ZxlJGNpP1kZftZeadtms7"
    },
...
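
For anyone who wants to reproduce the test, data of this shape could be generated with something like the sketch below. The class name TestDataGenerator, the output file name huge.json, and the text length of 950 characters are my own assumptions, chosen so that 6000 objects come out at roughly 5.9 MB; the original data may have been produced differently.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

public class TestDataGenerator {

    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Returns a random alphanumeric string of the given length.
    static String randomString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(random.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    // Writes {"data":[{"name":...,"text":...},...]} with objectCount elements.
    // Only alphanumeric characters are generated, so no JSON escaping is needed.
    static void write(Writer out, int objectCount, int textLength, Random random) throws IOException {
        out.write("{\"data\":[");
        for (int i = 0; i < objectCount; i++) {
            if (i > 0) {
                out.write(",");
            }
            out.write("{\"name\":\"" + randomString(random, 10)
                    + "\",\"text\":\"" + randomString(random, textLength) + "\"}");
        }
        out.write("]}");
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("huge.json"); // hypothetical output file name
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            write(out, 6000, 950, new Random()); // 950 is an assumed text length
        }
        System.out.println(Files.size(path) + " bytes");
    }
}
```

The output is written without whitespace, so it is more compact than the pretty-printed sample above. Adjust the text length if you need to land closer to the 6 MB limit.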

Try with Map method

First, here is an implementation using the Map method. It counts the number of elements in data.

What is the Map method? -> See my earlier article, "I tried the input / output type of Java Lambda ~ Map edition ~".

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class MapHugeJsonFunction implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    @Override
    @SuppressWarnings("unchecked")
    public Map<String, Object> handleRequest(Map<String, Object> event, Context context) {
        // By the time this method is called, the runtime has already
        // deserialized the whole JSON payload into the event Map.
        List<Map<String, Object>> data = (List<Map<String, Object>>) event.get("data");
        int count = data.size();

        return Collections.singletonMap("count", count);
    }
}

Execution result (not a cold start)

REPORT RequestId: ac45a98c-be93-49b5-8813-c30ae7d731c9	Duration: 2030.51 ms	Billed Duration: 2100 ms	Memory Size: 128 MB	Max Memory Used: 118 MB

Max memory used is 118 MB, which is close to the 128 MB limit. The whole JSON is already in memory by the time the Map passed to handleRequest has been built. If you add more business logic on top of this, 128 MB will not be enough.

Try with Stream method

I tried the same process with the Stream method. With this implementation, you can process the JSON without loading all of it into memory.

To parse the JSON as a stream, I use the Jackson Streaming API (https://github.com/FasterXML/jackson-databind#5-minute-tutorial-streaming-parser-generator). It is the same idea as using a SAX parser for XML.
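
For readers who have not used SAX, here is a minimal sketch of that analogy using only the JDK. It counts item elements in an XML document by reacting to parser events, without building a document tree. The class name SaxItemCounter and the sample XML are made up for illustration and are not part of the Lambda functions in this article.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class SaxItemCounter {

    // Counts <item> elements. The parser calls startElement for every opening
    // tag as it reads the stream, so the document is never held in memory as a tree.
    static int countItems(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("item".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document with two <item> elements.
        String xml = "<data><item><name>a</name></item><item><name>b</name></item></data>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countItems(in)); // prints 2
    }
}
```

The main difference is the direction of control: SAX pushes events to a handler, while the Jackson parser is pulled one token at a time with nextToken().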

What is the Stream method? -> See my earlier article, "I tried Java Lambda input / output type ~ Stream version ~".

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Map;

public class StreamHugeJsonFunction implements RequestStreamHandler {

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
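        JsonParser parser = OBJECT_MAPPER.createParser(inputStream);

        // Count the objects in the "data" array by reading one token at a time.
        // This assumes flat elements: a nested object inside an element would also be counted.
        int count = 0;
        boolean inData = false;
        boolean inDataArray = false;
        JsonToken token;
        while ((token = parser.nextToken()) != null) {
            // The "data" field name has been seen; its value comes next.
            if (token == JsonToken.FIELD_NAME && "data".equals(parser.getCurrentName())) {
                inData = true;
            }
            // The value of "data" is an array: start counting.
            if (inData && token == JsonToken.START_ARRAY) {
                inDataArray = true;
            }
            // The array has ended: stop counting.
            if (inDataArray && token == JsonToken.END_ARRAY) {
                inDataArray = false;
            }
            if (inData && token == JsonToken.END_OBJECT) {
                inData = false;
            }
            // Each START_OBJECT inside the array is one element.
            if (inDataArray && token == JsonToken.START_OBJECT) {
                count++;
            }
        }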

        Map<String, Integer> response = Collections.singletonMap("count", count);
        OBJECT_MAPPER.writeValue(outputStream, response);
    }
}

Execution result (not a cold start)

REPORT RequestId: 41f93e2e-e1db-4775-b296-8b280d2696f9	Duration: 871.01 ms	Billed Duration: 900 ms	Memory Size: 128 MB	Max Memory Used: 80 MB

Max memory used dropped from 118 MB to 80 MB. The processing time also improved, from 2030 ms to 871 ms.

Summary

Writing the parsing code by hand is more tedious, but if you don't need all of the JSON data in memory at once, the Stream method may save both processing time and memory.
