I previously wrote an article about using an InputStream as the input to a Lambda function:
I tried Java Lambda input / output type ~ Stream version ~ https://qiita.com/kazfuku/items/6f0f55ffa3a88d76cfaa
The advantage of using an InputStream is that it can handle huge JSON payloads, so I decided to verify what that actually looks like.
First, I prepared a huge JSON payload. The data contains 6,000 objects, each with a name and a text field, totaling about 5.9 MB, which is close to Lambda's input size limit of 6 MB.
{
"data":[
{
"name":"vrZPwIw3T7","text":"Ku7aQqW3WzUeiRdXnNB26iVElWdOUj8mQhvHksvN1sMmQ2fT3M8navvbTJuspda2q0bY3FWvsDoguE33tTNtoxuiHjdkUIHmylIezYGitmhJ2bbgcHhcHPzGr4eg3Ger9EijFU82Sq4WS9G5UVW62Cw1rDMNdIld2yxn1Zd3DXqE26iOf1IaBQTzEG7Pld03hkXIkAdTdeAjXlAJlGrwnQgjMh1FohW1bUAYeaLi52qLnbgQd7lZAJuOlitfGUyUbP0BjbsPflOLGQwInPjr2Mt3mG4HDokWj2JJgRkXkRYxq34AxQGNWXjlfWKViDxk2InIP6oMsir5YmTL1oO58dzmBGCoYV7e0PTGQHXJbgPJUFUoCmv3mATCEg1xhOa4IUcP7vC7dMvydS3Qt1QHteYajeCvXiW0HjuHkm2oJ61yEg6JocqLVMQ75RaU0Wjb2KvbAwQmggSel5E6mMl0BacZwBXw7OaYHkHO1p1hQup2hhNkaAkN7B8NS8QJ3oSRPQsM6QsETC3x1ErrN0jZZVqupjDvPEr9xj0fDOpqCo7XqTuSPbf3UhHQgjPyikbc2JaqeMdJf1R0RojlqWmf2STGH8HTuGJTQG3vEP04BkrNLaKNVoXE49tPyePO6EqRAKWNxVZoQmw34Xv6yGzMfOLPcSRhML0rYk1FEaBDmgGNpQIPdYjbT3MC08eEY9cHa813iWvm42XmG5LaiIt2z4IcGaWnLwCRytYJJsdqphSEhyvyOpIKM4i02t9rx3Pkt0704EhFo3SD8gVVIE2y1coFUJqy2GxVqptZrKpFv56c4SsWSPqdLqTH9Gh09Y5Cph6eOKg0JXvip1GoONZ80oBUeRudMvsl32m23fYZlG1dNFnGUZSkz2TiGP9baIfLyPWCcPZGEvVaP9FR64FW0yOLvpKyTNYXg21ZsgEkYo3tbcn3AS5R3Ai4eg5hYMaBoVAsMBK1BPZAncDoqOs97nLa2DZFyqgNSz8Asgmh"
},
{
"name":"OXJIXTP9dz","text":"HkWP0PumYHQZxiGNhGWASXOPrygri7cKXs2hrWx0WaumM8OEVc9UKs2EIknzCsBmAMFRER5YNIUs5oz30LjDrjn1PsoKuh60KPOEaHnBNTt8PivYx0hIfmLoLk56ad6LSLNpUVMCP26WiPyont6OjfD1c4sxtKn3qlg6SaNSs2B1tGoReVb3pwOVvPH2BaULL5rzYyDFfAqFqo4D2UevrdoUOeXK3Ks3tav92wHnECM9pbXdsCbWyr1BNulJ5elcZm8HALPgBeX9dg0vaqpgITfz3klyYIWynzPOJC92t0vMao2tL7lr6uxuQvldWgdhlzGjYP123pdWb2h0zItg8NyyK58tCKy2t2YqtdP23fvmVpmOFygiM6kF9LvDRfnu3mz0X2SvcsQh8UqB84dHiOXwicmnI6DX47OuPXOUZc0wICql8zit6WvbEmDchKy9M74u9mPaiIxGXBy8FvLEptqqGytywwC3GGYXEpLYZlbxDycrSTtCq6PUuWoUbfsJmZT4iZSvM0aoyVKBE2l23oXhFZpM4fxyyziIVAHP9YsQbHQlvr8adtD3voumsGKcklt4mnNQclQdSLKPKSIGdUlkvhcCO4MZcEpKcmSrFU6naOYGL1geB1CuTYHYuw0x6tc7JudQAEB6IWE8xwTgPWQUM15xTsqsLrBIwZ70MGpCGW8JCw6sqJExsXi6wpJ1I3L43TUG4hJOnEPIHeXTco06zaDiSrqG3LsLuCiHIkqYui1N0fJBRJhVcn2X8dXMnQKxqhISGrnP7TeBBcAhI8qrmNK0k9EV6mECQtN2g8qaRYVqwOqC4kwzMpvPWkUnNQuUZbknLlWOKuVeh0mrjTzIxQkMShqhdt21o75h9rz0DPxvNHkS6jLw7TBprYieZwcO8iIRy1zYFedSXyVktczdEczIebkfDFmjGtDeZw5RuuFUYKDk4U3J5lpmfmf9K3G6LuPeV5soPxL54l8ZxlJGNpP1kZftZeadtms7"
},
...
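For reference, a payload of this shape can be generated with a short program. The following is a minimal sketch (not part of the original article) that writes roughly 6,000 objects with random 10-character names and 1,000-character texts to a file using Jackson's streaming JsonGenerator; the file name huge.json and the string lengths are assumptions chosen to approximate the payload above.
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.FileWriter;
import java.util.Random;
public class HugeJsonGenerator {
    // Alphanumeric characters used for the random name/text values.
    private static final String CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final Random RANDOM = new Random();

    public static void main(String[] args) throws Exception {
        // Stream the JSON straight to a file so the generator itself uses little memory.
        try (JsonGenerator gen = new JsonFactory().createGenerator(new FileWriter("huge.json"))) {
            gen.writeStartObject();
            gen.writeArrayFieldStart("data");
            for (int i = 0; i < 6000; i++) {
                gen.writeStartObject();
                gen.writeStringField("name", randomString(10));
                gen.writeStringField("text", randomString(1000));
                gen.writeEndObject();
            }
            gen.writeEndArray();
            gen.writeEndObject();
        }
    }

    private static String randomString(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(CHARS.charAt(RANDOM.nextInt(CHARS.length())));
        }
        return sb.toString();
    }
}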
First, here is an implementation using the Map method; it simply counts the number of objects in the data array.
What is the Map method? -> I tried the input / output type of Java Lambda ~ Map edition ~
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Collections;
import java.util.List;
import java.util.Map;
public class MapHugeJsonFunction implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> event, Context context) {
        // By the time this method is called, the Lambda runtime has already
        // deserialized the entire JSON payload into the event Map.
        List<Map<String, Object>> data = (List<Map<String, Object>>) event.get("data");
        int count = data.size();
        return Collections.singletonMap("count", count);
    }
}
Execution result (warm start, not a cold start)
REPORT RequestId: ac45a98c-be93-49b5-8813-c30ae7d731c9 Duration: 2030.51 ms Billed Duration: 2100 ms Memory Size: 128 MB Max Memory Used: 118 MB
Max memory used was 118 MB, close to the 128 MB limit. This is because the entire JSON payload is held in memory when the Map passed to handleRequest is built. If you add any real business logic on top of this, 128 MB will not be enough.
Next, I tried the same processing with the Stream method. With this implementation, you can process the JSON without holding all of it in memory at once.
The JSON is parsed with the Jackson Streaming API (https://github.com/FasterXML/jackson-databind#5-minute-tutorial-streaming-parser-generator). It is the same idea as using a SAX parser for XML.
What is the Stream method? -> I tried Java Lambda input / output type ~ Stream version ~
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Map;
public class StreamHugeJsonFunction implements RequestStreamHandler {

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        JsonParser parser = OBJECT_MAPPER.createParser(inputStream);
        int count = 0;
        boolean inData = false;
        boolean inDataArray = false;
        JsonToken token;
        // Read the JSON token by token instead of building the whole tree in memory.
        while ((token = parser.nextToken()) != null) {
            if (token == JsonToken.FIELD_NAME && "data".equals(parser.getCurrentName())) {
                inData = true;
            }
            if (inData && token == JsonToken.START_ARRAY) {
                inDataArray = true;
            }
            if (inDataArray && token == JsonToken.END_ARRAY) {
                inDataArray = false;
            }
            if (inData && token == JsonToken.END_OBJECT) {
                inData = false;
            }
            if (inDataArray && token == JsonToken.START_OBJECT) {
                // Each START_OBJECT inside the "data" array is one element.
                count++;
            }
        }
        Map<String, Integer> response = Collections.singletonMap("count", count);
        OBJECT_MAPPER.writeValue(outputStream, response);
    }
}
Execution result (warm start, not a cold start)
REPORT RequestId: 41f93e2e-e1db-4775-b296-8b280d2696f9 Duration: 871.01 ms Billed Duration: 900 ms Memory Size: 128 MB Max Memory Used: 80 MB
Max memory used improved from 118 MB to 80 MB, and the duration improved from 2,030 ms to 871 ms.
Parsing JSON with the Streaming API takes more work, but if you don't need to hold all of the JSON data in memory at once, the Stream method can save both processing time and memory.
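For reference, the stream handler can also be exercised locally without deploying it, by feeding it a file directly. Below is a minimal sketch (not from the original article), assuming the huge.json file from the generator shown earlier; since this handler never reads the Context, null is passed for it here.
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
public class LocalStreamCheck {
    public static void main(String[] args) throws Exception {
        StreamHugeJsonFunction function = new StreamHugeJsonFunction();
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        // Read the test payload from disk instead of receiving it from the Lambda runtime.
        try (InputStream input = new FileInputStream("huge.json")) {
            // The handler does not touch the Context, so null is enough for a local check.
            function.handleRequest(input, output, null);
        }
        System.out.println(output.toString("UTF-8")); // expected: {"count":6000}
    }
}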