I previously wrote an article about using an InputStream as the input to a Lambda function:
I tried Java Lambda input / output type ~ Stream version ~ https://qiita.com/kazfuku/items/6f0f55ffa3a88d76cfaa
The advantage of using an InputStream is that it can handle huge JSON payloads, so I decided to verify what that actually looks like.
First, I prepared a huge JSON payload. The data contains 6,000 objects, each with a name and a text field, totaling about 5.9 MB, which is close to Lambda's input size limit of 6 MB.
{
"data":[
{
"name":"vrZPwIw3T7","text":"Ku7aQqW3WzUeiRdXnNB26iVElWdOUj8mQhvHksvN1sMmQ2fT3M8navvbTJuspda2q0bY3FWvsDoguE33tTNtoxuiHjdkUIHmylIezYGitmhJ2bbgcHhcHPzGr4eg3Ger9EijFU82Sq4WS9G5UVW62Cw1rDMNdIld2yxn1Zd3DXqE26iOf1IaBQTzEG7Pld03hkXIkAdTdeAjXlAJlGrwnQgjMh1FohW1bUAYeaLi52qLnbgQd7lZAJuOlitfGUyUbP0BjbsPflOLGQwInPjr2Mt3mG4HDokWj2JJgRkXkRYxq34AxQGNWXjlfWKViDxk2InIP6oMsir5YmTL1oO58dzmBGCoYV7e0PTGQHXJbgPJUFUoCmv3mATCEg1xhOa4IUcP7vC7dMvydS3Qt1QHteYajeCvXiW0HjuHkm2oJ61yEg6JocqLVMQ75RaU0Wjb2KvbAwQmggSel5E6mMl0BacZwBXw7OaYHkHO1p1hQup2hhNkaAkN7B8NS8QJ3oSRPQsM6QsETC3x1ErrN0jZZVqupjDvPEr9xj0fDOpqCo7XqTuSPbf3UhHQgjPyikbc2JaqeMdJf1R0RojlqWmf2STGH8HTuGJTQG3vEP04BkrNLaKNVoXE49tPyePO6EqRAKWNxVZoQmw34Xv6yGzMfOLPcSRhML0rYk1FEaBDmgGNpQIPdYjbT3MC08eEY9cHa813iWvm42XmG5LaiIt2z4IcGaWnLwCRytYJJsdqphSEhyvyOpIKM4i02t9rx3Pkt0704EhFo3SD8gVVIE2y1coFUJqy2GxVqptZrKpFv56c4SsWSPqdLqTH9Gh09Y5Cph6eOKg0JXvip1GoONZ80oBUeRudMvsl32m23fYZlG1dNFnGUZSkz2TiGP9baIfLyPWCcPZGEvVaP9FR64FW0yOLvpKyTNYXg21ZsgEkYo3tbcn3AS5R3Ai4eg5hYMaBoVAsMBK1BPZAncDoqOs97nLa2DZFyqgNSz8Asgmh"
},
{
"name":"OXJIXTP9dz","text":"HkWP0PumYHQZxiGNhGWASXOPrygri7cKXs2hrWx0WaumM8OEVc9UKs2EIknzCsBmAMFRER5YNIUs5oz30LjDrjn1PsoKuh60KPOEaHnBNTt8PivYx0hIfmLoLk56ad6LSLNpUVMCP26WiPyont6OjfD1c4sxtKn3qlg6SaNSs2B1tGoReVb3pwOVvPH2BaULL5rzYyDFfAqFqo4D2UevrdoUOeXK3Ks3tav92wHnECM9pbXdsCbWyr1BNulJ5elcZm8HALPgBeX9dg0vaqpgITfz3klyYIWynzPOJC92t0vMao2tL7lr6uxuQvldWgdhlzGjYP123pdWb2h0zItg8NyyK58tCKy2t2YqtdP23fvmVpmOFygiM6kF9LvDRfnu3mz0X2SvcsQh8UqB84dHiOXwicmnI6DX47OuPXOUZc0wICql8zit6WvbEmDchKy9M74u9mPaiIxGXBy8FvLEptqqGytywwC3GGYXEpLYZlbxDycrSTtCq6PUuWoUbfsJmZT4iZSvM0aoyVKBE2l23oXhFZpM4fxyyziIVAHP9YsQbHQlvr8adtD3voumsGKcklt4mnNQclQdSLKPKSIGdUlkvhcCO4MZcEpKcmSrFU6naOYGL1geB1CuTYHYuw0x6tc7JudQAEB6IWE8xwTgPWQUM15xTsqsLrBIwZ70MGpCGW8JCw6sqJExsXi6wpJ1I3L43TUG4hJOnEPIHeXTco06zaDiSrqG3LsLuCiHIkqYui1N0fJBRJhVcn2X8dXMnQKxqhISGrnP7TeBBcAhI8qrmNK0k9EV6mECQtN2g8qaRYVqwOqC4kwzMpvPWkUnNQuUZbknLlWOKuVeh0mrjTzIxQkMShqhdt21o75h9rz0DPxvNHkS6jLw7TBprYieZwcO8iIRy1zYFedSXyVktczdEczIebkfDFmjGtDeZw5RuuFUYKDk4U3J5lpmfmf9K3G6LuPeV5soPxL54l8ZxlJGNpP1kZftZeadtms7"
},
...
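For reference, a payload of this shape can be generated with a short program. The following is a minimal sketch (not part of the original article) that writes roughly 6,000 objects with random 10-character names and 1,000-character texts to a file using Jackson's streaming JsonGenerator; the file name huge.json and the string lengths are assumptions chosen to approximate the payload above.
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.FileWriter;
import java.util.Random;
public class HugeJsonGenerator {
    // Alphanumeric characters used for the random name/text values.
    private static final String CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final Random RANDOM = new Random();

    public static void main(String[] args) throws Exception {
        // Stream the JSON straight to a file so the generator itself uses little memory.
        try (JsonGenerator gen = new JsonFactory().createGenerator(new FileWriter("huge.json"))) {
            gen.writeStartObject();
            gen.writeArrayFieldStart("data");
            for (int i = 0; i < 6000; i++) {
                gen.writeStartObject();
                gen.writeStringField("name", randomString(10));
                gen.writeStringField("text", randomString(1000));
                gen.writeEndObject();
            }
            gen.writeEndArray();
            gen.writeEndObject();
        }
    }

    private static String randomString(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(CHARS.charAt(RANDOM.nextInt(CHARS.length())));
        }
        return sb.toString();
    }
}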
First, here is an implementation using the Map method; it simply counts the number of objects in the data array.
What is the Map method? -> I tried the input / output type of Java Lambda ~ Map edition ~
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Collections;
import java.util.List;
import java.util.Map;
public class MapHugeJsonFunction implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> event, Context context) {
        // By the time this method is called, the Lambda runtime has already
        // deserialized the entire JSON payload into the event Map.
        List<Map<String, Object>> data = (List<Map<String, Object>>) event.get("data");
        int count = data.size();
        return Collections.singletonMap("count", count);
    }
}
Execution result (warm start, not a cold start)
REPORT RequestId: ac45a98c-be93-49b5-8813-c30ae7d731c9 Duration: 2030.51 ms Billed Duration: 2100 ms Memory Size: 128 MB Max Memory Used: 118 MB
Max memory used was 118 MB, close to the 128 MB limit. This is because the entire JSON payload is held in memory when the Map passed to handleRequest is built. If you add any real business logic on top of this, 128 MB will not be enough.
Next, I tried the same processing with the Stream method. With this implementation, you can process the JSON without holding all of it in memory at once.
The JSON is parsed with the Jackson Streaming API (https://github.com/FasterXML/jackson-databind#5-minute-tutorial-streaming-parser-generator). It is the same idea as using a SAX parser for XML.
What is the Stream method? -> I tried Java Lambda input / output type ~ Stream version ~
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Map;
public class StreamHugeJsonFunction implements RequestStreamHandler {

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        JsonParser parser = OBJECT_MAPPER.createParser(inputStream);
        int count = 0;
        boolean inData = false;
        boolean inDataArray = false;
        JsonToken token;
        // Read the JSON token by token instead of building the whole tree in memory.
        while ((token = parser.nextToken()) != null) {
            if (token == JsonToken.FIELD_NAME && "data".equals(parser.getCurrentName())) {
                inData = true;
            }
            if (inData && token == JsonToken.START_ARRAY) {
                inDataArray = true;
            }
            if (inDataArray && token == JsonToken.END_ARRAY) {
                inDataArray = false;
            }
            if (inData && token == JsonToken.END_OBJECT) {
                inData = false;
            }
            if (inDataArray && token == JsonToken.START_OBJECT) {
                // Each START_OBJECT inside the "data" array is one element.
                count++;
            }
        }
        Map<String, Integer> response = Collections.singletonMap("count", count);
        OBJECT_MAPPER.writeValue(outputStream, response);
    }
}
Execution result (warm start, not a cold start)
REPORT RequestId: 41f93e2e-e1db-4775-b296-8b280d2696f9 Duration: 871.01 ms Billed Duration: 900 ms Memory Size: 128 MB Max Memory Used: 80 MB
Max memory used improved from 118 MB to 80 MB, and the duration improved from 2,030 ms to 871 ms.
Parsing JSON with the Streaming API takes more work, but if you don't need to hold all of the JSON data in memory at once, the Stream method can save both processing time and memory.
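For reference, the stream handler can also be exercised locally without deploying it, by feeding it a file directly. Below is a minimal sketch (not from the original article), assuming the huge.json file from the generator shown earlier; since this handler never reads the Context, null is passed for it here.
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
public class LocalStreamCheck {
    public static void main(String[] args) throws Exception {
        StreamHugeJsonFunction function = new StreamHugeJsonFunction();
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        // Read the test payload from disk instead of receiving it from the Lambda runtime.
        try (InputStream input = new FileInputStream("huge.json")) {
            // The handler does not touch the Context, so null is enough for a local check.
            function.handleRequest(input, output, null);
        }
        System.out.println(output.toString("UTF-8")); // expected: {"count":6000}
    }
}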