[Java] Data processing using stream API from Java 8

In this blog post, I’ll show you how to process data declaratively from Java 8 using the Stream API.

*This blog is a translation from the English version. You can check the original from here. We use some machine translation. If you find any mistakes in the translation, we would appreciate it if you could point them out.*

In Java, collections and arrays are two common data structures on which many operations are performed regularly: adding, deleting, modifying, querying, aggregating, computing statistics, filtering, and so on. These operations also exist in relational databases. But before Java 8, working with collections and arrays was not very convenient.

This problem has been significantly mitigated in Java 8 by introducing a new abstraction called the Stream API, which allows you to work with data in a declarative way. This article will show you how to use Stream. Please note that stream performance and principles are not the focus of this article.

Stream introduction

Streams provide a high-level abstraction over Java collection operations, letting you express them in a way that resembles querying a database with SQL statements.

The Stream API greatly increases the productivity of Java programmers and allows them to write effective, clean, and concise code.

The set of elements to process is treated as a stream flowing through a pipeline. These elements can be processed by pipeline nodes such as filters, sorts, and aggregations.

Java stream features and benefits

  • No storage. A stream is not a data structure; it is just a view of the data source.
  • Functional in nature. Operations on a stream do not modify its data source. For example, filtering a stream does not remove elements from the source; it produces a new stream that does not contain the filtered-out elements.
  • Lazy evaluation. Operations on the stream are not performed immediately; they are executed only when the result is actually needed.
  • Consumable. The elements of a stream are visited only once during the life of the stream. Once traversed, the stream is invalid, like an iterator over a container. To traverse the data again, you need to create a new Stream.

Let’s use an example to see what a Stream can do.

image.png

In the figure above, we take some plastic balls as the data source, filter out the red balls, melt them, and turn them into random triangles. Another filter removes the small triangles. Finally, a reducer sums the perimeters of the remaining triangles.

As shown in the previous figure, working with a stream involves three kinds of operations: stream creation, intermediate operations, and terminal operations.
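The three stages can be sketched in code with the plastic-ball scenario. This is only an illustration: the `Ball` class, the `perimeterOfRedTriangles` method, and the rule that each ball becomes an equilateral triangle whose side equals the ball's size are all assumptions made for the example, not something defined by the figure.

```java
import java.util.Arrays;
import java.util.List;

class BallPipeline {
    // Hypothetical data type for illustration: a ball with a color and a size.
    static class Ball {
        final String color;
        final double size;

        Ball(String color, double size) {
            this.color = color;
            this.size = size;
        }
    }

    // Assumption: each red ball is recast as an equilateral triangle
    // whose side length equals the ball's size.
    static double perimeterOfRedTriangles(List<Ball> balls, double minSide) {
        return balls.stream()                        // stream creation
                .filter(b -> "red".equals(b.color))  // intermediate: keep red balls
                .mapToDouble(b -> b.size)            // intermediate: ball -> triangle side
                .filter(side -> side >= minSide)     // intermediate: drop small triangles
                .map(side -> 3 * side)               // intermediate: side -> perimeter
                .sum();                              // terminal: reduce to a single number
    }

    public static void main(String[] args) {
        List<Ball> balls = Arrays.asList(
                new Ball("red", 2.0), new Ball("blue", 5.0),
                new Ball("red", 0.5), new Ball("red", 3.0));
        System.out.println(perimeterOfRedTriangles(balls, 1.0)); // 15.0
    }
}
```

Nothing is computed until `sum()` is called; the calls before it only describe the pipeline.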

Stream creation

In Java 8, you can use many methods to create a Stream.

1. Create a stream using an existing collection

In addition to adding many stream-related classes, Java 8 also enhances the collection classes themselves. The stream method added to collections in Java 8 turns a collection into a Stream.

List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hello", "HelloWorld", "Hollis");
Stream<String> stream = strings.stream();

In the example above, we are creating a stream from an existing list. Also, the parallelStream method can create a parallel stream for a collection.
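As a minimal sketch of the parallelStream method mentioned above, the following sums the lengths of the strings in the same list. The class and method names are made up for the example, and whether a parallel stream is actually faster depends on the data size and workload, which this article does not cover.

```java
import java.util.Arrays;
import java.util.List;

class ParallelStreamSketch {
    // Sums the string lengths using a parallel stream.
    // The result is the same as with a sequential stream; only the execution strategy differs.
    static int totalLength(List<String> strings) {
        return strings.parallelStream()
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hello", "HelloWorld", "Hollis");
        System.out.println(strings.parallelStream().isParallel()); // true
        System.out.println(strings.stream().isParallel());         // false
        System.out.println(totalLength(strings));                  // 45
    }
}
```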

Creating a Stream from a collection is the most common approach in practice.

2. Create a stream using a Stream method

The of method provided by Stream can be used to directly return a Stream consisting of the specified elements.

Stream<String> stream = Stream.of("Hollis", "HollisChuang", "hollis", "Hello", "HelloWorld", "Hollis");

The code above uses the of method to create a stream and return it.

Stream intermediate operations

Streams have many intermediate operations that can be combined to form a pipeline. Each intermediate operation is like a worker on the pipeline. Each worker can process a Stream. Intermediate operations return a new Stream.
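To make the "new Stream" point concrete, here is a small sketch that records when the filter lambda actually runs. The class name and the log messages are invented for the example. It suggests that building the pipeline does no work by itself, and that the source list is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluationSketch {
    // Returns a log of events in the order they happened.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<String> strings = Arrays.asList("Hollis", "", "hollis");

        // Intermediate operation: returns a new Stream, but nothing is filtered yet.
        Stream<String> pipeline = strings.stream()
                .filter(s -> {
                    log.add("filter:" + s);
                    return !s.isEmpty();
                });
        log.add("pipeline built");

        // Terminal operation: only now does the filter run on each element.
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("result=" + result);

        // The source list still has all three elements.
        log.add("source size=" + strings.size());
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // pipeline built
        // filter:Hollis
        // filter:
        // filter:hollis
        // result=[Hollis, hollis]
        // source size=3
    }
}
```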

image.png

Below is a list of common intermediate operations.

image.png

filter The filter method keeps only the elements that match the specified criteria. The following code snippet uses the filter method to filter out empty strings.

List<String> strings = Arrays.asList("Hollis", "", "HollisChuang", "H", "hollis");
strings.stream().filter(string -> !string.isEmpty()).forEach(System.out::println);
//Hollis, HollisChuang, H, hollis

map The map method maps each element to the corresponding result. The following code snippet uses the map method to generate the squared number of the corresponding elements.

List<Integer> numbers = Arrays.asList(3, 2, 2, 3, 7, 3, 5);
numbers.stream().map( i -> i*i).forEach(System.out::println);
//9,4,4,9,49,9,25

limit/skip The limit method returns the first N elements of a Stream. The skip method discards the first N elements of a Stream. The following code snippet uses the limit method to keep the first four elements.

List<Integer> numbers = Arrays.asList(3, 2, 2, 3, 7, 3, 5);
numbers.stream().limit(4).forEach(System.out::println);
//3,2,2,3
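The snippet above only shows limit. As a complementary sketch, skip can be used alone or combined with limit to take a slice out of the middle of a stream. The `page` helper is a hypothetical name introduced only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SkipLimitSketch {
    // Hypothetical helper: discard the first `offset` elements, then keep at most `size` elements.
    static List<Integer> page(List<Integer> numbers, long offset, long size) {
        return numbers.stream()
                .skip(offset)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 2, 2, 3, 7, 3, 5);

        // skip alone: drop the first four elements
        System.out.println(numbers.stream().skip(4).collect(Collectors.toList())); // [7, 3, 5]

        // skip followed by limit: drop two elements, then keep three
        System.out.println(page(numbers, 2, 3)); // [2, 3, 7]
    }
}
```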

sorted The sorted method sorts the elements of a Stream. The following code snippet uses the sorted method to sort the elements of a Stream.

List<Integer> numbers = Arrays.asList(3, 2, 2, 3, 7, 3, 5);
numbers.stream().sorted().forEach(System.out::println);
//2,2,3,3,3,5,7
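The snippet above sorts in natural order. As a hedged aside, sorted also has an overload that takes a Comparator; the sketch below uses it to sort in descending order and also shows that the source list keeps its original order. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedSketch {
    // Sorts a copy of the data in descending order; the input list is not modified.
    static List<Integer> descending(List<Integer> numbers) {
        return numbers.stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 2, 2, 3, 7, 3, 5);
        System.out.println(descending(numbers)); // [7, 5, 3, 3, 3, 2, 2]
        System.out.println(numbers);             // [3, 2, 2, 3, 7, 3, 5]
    }
}
```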

distinct To remove duplicates, use the distinct method. The following code snippet uses the distinct method to deduplicate elements.

List<Integer> numbers = Arrays.asList(3, 2, 2, 3, 7, 3, 5);
numbers.stream().distinct().forEach(System.out::println);
//3,2,7,5

Next, I will use an example and a figure to explain what happens to the Stream after the filter, map, sorted, limit, and distinct operations.

The code is shown below.

List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hello", "HelloWorld", "Hollis");
// map(String::length) turns the Stream<String> into a Stream<Integer>
Stream<Integer> s = strings.stream().filter(string -> string.length() <= 6).map(String::length).sorted().limit(3)
            .distinct();

The following figure shows each step and the result.

image.png
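For readers who cannot see the figure, the same pipeline can be traced in code. The sketch below materializes each stage into a list purely so that it can be printed; a real pipeline would chain the operations as in the snippet above. The class and method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineStagesSketch {
    // Runs the five operations one at a time, printing each intermediate result,
    // and returns the final result.
    static List<Integer> trace(List<String> strings) {
        List<String> filtered = strings.stream()
                .filter(string -> string.length() <= 6)
                .collect(Collectors.toList());
        System.out.println("filter:   " + filtered);

        List<Integer> mapped = filtered.stream().map(String::length).collect(Collectors.toList());
        System.out.println("map:      " + mapped);

        List<Integer> sorted = mapped.stream().sorted().collect(Collectors.toList());
        System.out.println("sorted:   " + sorted);

        List<Integer> limited = sorted.stream().limit(3).collect(Collectors.toList());
        System.out.println("limit:    " + limited);

        List<Integer> distinct = limited.stream().distinct().collect(Collectors.toList());
        System.out.println("distinct: " + distinct);
        return distinct;
    }

    public static void main(String[] args) {
        trace(Arrays.asList("Hollis", "HollisChuang", "hollis", "Hello", "HelloWorld", "Hollis"));
        // filter:   [Hollis, hollis, Hello, Hollis]
        // map:      [6, 6, 5, 6]
        // sorted:   [5, 6, 6, 6]
        // limit:    [5, 6, 6]
        // distinct: [5, 6]
    }
}
```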

Stream terminal operations

Intermediate operations on streams still return Streams. How can you convert a stream to the desired type, for example to count the elements in a Stream or to convert the Stream into a collection? For this, you need a terminal operation.

The terminal operation consumes the Stream and produces the final result. That is, after a terminal operation has been performed on a stream, that stream cannot be reused and no intermediate operations are allowed on that stream. Otherwise an exception will be thrown.

java.lang.IllegalStateException: stream has already been operated upon or closed

This is the same as the saying, “You cannot step on the same river twice.”
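A minimal sketch of this behavior, with an illustrative class name: the second terminal operation on the same stream object fails, while creating a fresh stream from the source works.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamReuseSketch {
    // Returns true if reusing a consumed stream throws IllegalStateException.
    static boolean reuseFails() {
        Stream<String> stream = Stream.of("Hollis", "Hello");
        stream.forEach(s -> { });   // first terminal operation consumes the stream
        try {
            stream.count();         // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reuseFails()); // true

        // To traverse the data again, create a new stream from the source each time.
        List<String> strings = Arrays.asList("Hollis", "Hello");
        strings.stream().forEach(s -> { });
        System.out.println(strings.stream().count()); // 2
    }
}
```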

The following table shows common terminal operations.

image.png

forEach The forEach method iterates over the elements in the stream. The following code snippet uses forEach to print 10 random numbers.

Random random = new Random();
random.ints().limit(10).forEach(System.out::println);

count The count method counts the elements in the Stream.

List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hollis666", "Hello", "HelloWorld", "Hollis");
System.out.println(strings.stream().count());
//7

collect The collect operation is a reduction operation that accepts a collector and accumulates the stream elements into a summary result, such as a list.

List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hollis666", "Hello", "HelloWorld", "Hollis");
strings = strings.stream().filter(string -> string.startsWith("Hollis")).collect(Collectors.toList());
System.out.println(strings);
//[Hollis, HollisChuang, Hollis666, Hollis]
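Collectors.toList() is only one of the collectors that collect can take. As a hedged sketch going slightly beyond the snippet above, the following uses Collectors.joining and Collectors.groupingBy from the standard java.util.stream.Collectors class; the class and method names around them are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectorsSketch {
    // Joins the distinct strings into one comma-separated string.
    static String joinDistinct(List<String> strings) {
        return strings.stream().distinct().collect(Collectors.joining(", "));
    }

    // Counts how many strings there are of each length.
    static Map<Integer, Long> countByLength(List<String> strings) {
        return strings.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hollis666", "Hello", "HelloWorld", "Hollis");
        System.out.println(joinDistinct(strings));
        // Hollis, HollisChuang, hollis, Hollis666, Hello, HelloWorld

        System.out.println(countByLength(strings).get(6)); // 3
    }
}
```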

Next, we continue with the pipeline from the previous example (filter, map, sorted, limit, and distinct) and look at the results of applying different terminal operations to the resulting Stream.

The following figure uses an example to show the inputs and outputs of all the operations described in this article.

image.png
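The same idea can be sketched in code by applying the three terminal operations covered in this article to the earlier pipeline. Because a stream can be consumed only once, the hypothetical `pipeline` method below builds a fresh stream for every terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TerminalOpsSketch {
    // Builds a new instance of the example pipeline on every call.
    static Stream<Integer> pipeline() {
        List<String> strings = Arrays.asList("Hollis", "HollisChuang", "hollis", "Hello", "HelloWorld", "Hollis");
        return strings.stream()
                .filter(string -> string.length() <= 6)
                .map(String::length)
                .sorted()
                .limit(3)
                .distinct();
    }

    public static void main(String[] args) {
        pipeline().forEach(System.out::println);                        // prints 5, then 6
        System.out.println(pipeline().count());                         // 2
        System.out.println(pipeline().collect(Collectors.toList()));    // [5, 6]
    }
}
```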

Summary

This article explains how to use streams and their characteristics in Java 8. This article also describes stream creation, stream intermediate operations, and terminal operations.

This article covered two ways to create a stream: using a collection’s stream method and using Stream’s of method.

Intermediate operations process the stream. Both the input and the output of an intermediate operation are Streams. Intermediate operations include filter, map, and sorted.

Terminal operations turn a stream into some other result, for example by counting the elements in the stream, converting the stream into a collection, or iterating over the elements in the stream.