What are Java 8 Streams

Streams are sequences of elements from a source, such as a collection, that support aggregate operations. They were introduced in Java 8 and provide a functional-style programming interface that follows the monad design pattern. Basically, they get rid of the iteration boilerplate and let you chain query methods to obtain the desired set of elements. The Streams API also supports parallel processing of collection elements, which can improve performance for large collections.

Streams

Let’s start with an example. Given a list of persons, the result should be a list of the first names of all persons who are at least 18 years old, sorted alphabetically. Implemented with a for-each loop, it looks like the following snippet:

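import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

List<String> nameList = new ArrayList<>();
// keep the first names of all persons aged 18 or older
for (Person person : PERSON_LIST) {
    if (person.getAge() >= 18) {
        nameList.add(person.getFirstName());
    }
}

// sort the collected names alphabetically with an anonymous Comparator
Collections.sort(nameList, new Comparator<String>() {

    @Override
    public int compare(String o1, String o2) {
        return o1.compareTo(o2);
    }
});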

Now, let’s implement the same behavior with the streams API:

import java.util.List;
import java.util.stream.Collectors;

List<String> list = PERSON_LIST.stream()
                    .filter(person -> person.getAge() >= 18)
                    .map(person -> person.getFirstName())
                    .sorted(String::compareTo)
                    .collect(Collectors.toList());

As we can see, the pipeline itself is a single five-line statement, and no iteration boilerplate is needed because the iteration is done internally. Lambda expressions and method references make the code even shorter.
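To run both versions side by side, here is a minimal self-contained sketch. The article does not show the Person class or the contents of PERSON_LIST, so both are hypothetical stand-ins here; only the getAge() and getFirstName() accessors used above are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class AdultNames {
    // Hypothetical Person type; the article only relies on getAge() and getFirstName().
    static class Person {
        private final String firstName;
        private final int age;

        Person(String firstName, int age) {
            this.firstName = firstName;
            this.age = age;
        }

        String getFirstName() { return firstName; }
        int getAge() { return age; }
    }

    // Hypothetical sample data standing in for the article's PERSON_LIST.
    static final List<Person> PERSON_LIST = Arrays.asList(
            new Person("Tom", 34),
            new Person("Anna", 17),
            new Person("Mia", 18),
            new Person("Ben", 42));

    // Imperative version: explicit loop, then an in-place sort.
    static List<String> withLoop(List<Person> persons) {
        List<String> names = new ArrayList<>();
        for (Person person : persons) {
            if (person.getAge() >= 18) {
                names.add(person.getFirstName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Stream version: the same filter, map and sort expressed as one pipeline.
    static List<String> withStream(List<Person> persons) {
        return persons.stream()
                .filter(person -> person.getAge() >= 18)
                .map(Person::getFirstName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withLoop(PERSON_LIST));   // [Ben, Mia, Tom]
        System.out.println(withStream(PERSON_LIST)); // [Ben, Mia, Tom]
    }
}
```

Calling sorted() without arguments uses the natural ordering of String, which is the same ordering as passing String::compareTo.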

A stream consumes data from sources such as collections, arrays and I/O resources. A stream does not modify its source; its operations produce new results instead. Once a stream has been obtained, plenty of different operations are available. They fall into two kinds:

  • Intermediate operations
    • They return a stream object
    • They can be chained into a stream pipeline, like the filter, map and sorted operations in our example
    • They won’t be executed until a terminal operation is called
  • Terminal operations
    • They don’t return a stream object, for example the collect operation in our example
    • They trigger the execution of the intermediate operations and produce a result
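A minimal sketch of this laziness, using only the JDK: a counter records how often the filter predicate runs. Building the pipeline from intermediate operations does not evaluate anything; only the terminal collect does. The stream is created from an array with Arrays.stream, one of the non-collection sources mentioned above. The sample strings and field names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {
    static int callsBeforeCollect;
    static int callsAfterCollect;
    static List<String> result;

    public static void main(String[] args) {
        // Counts how many times the filter predicate has been evaluated.
        AtomicInteger filterCalls = new AtomicInteger();
        String[] source = {"alpha", "beta", "gamma"};

        // Intermediate operations only describe the pipeline; nothing is evaluated yet.
        Stream<String> pipeline = Arrays.stream(source)
                .filter(s -> {
                    filterCalls.incrementAndGet();
                    return s.length() > 4;
                })
                .map(String::toUpperCase);
        callsBeforeCollect = filterCalls.get();

        // The terminal operation triggers the traversal of the source.
        result = pipeline.collect(Collectors.toList());
        callsAfterCollect = filterCalls.get();

        System.out.println("filter calls before collect: " + callsBeforeCollect); // 0
        System.out.println("filter calls after collect: " + callsAfterCollect);   // 3
        System.out.println(result);                                              // [ALPHA, GAMMA]
    }
}
```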

The intermediate operations are put into a pipeline and executed when the terminal operation is called. In our example, we obtain the stream from the PERSON_LIST collection. The filter operation checks whether the person is old enough. If not, the element is not processed further and the next element is checked. Otherwise, the person object goes to the map operation, which maps it to the person’s first name. The last intermediate operation sorts the first names in alphabetical order; because sorting needs every element, it buffers all first names before passing them on. The collect operation triggers the iteration process and runs through the elements. At this point the intermediate operations take action, and the matching elements are put into the list in the proper order.
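To observe this order, the following sketch records each step in a log. The names and the length-based filter are hypothetical, and the lambdas write to a shared list purely for illustration on a sequential stream. Each name passes through filter and map before the next one is read, while the sorted step only emits elements after all of them have arrived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineOrder {
    // Records which step touched which element, in execution order.
    static final List<String> LOG = new ArrayList<>();

    static List<String> run(List<String> names) {
        LOG.clear();
        return names.stream()
                .filter(name -> {
                    LOG.add("filter " + name);
                    return name.length() > 2;
                })
                .map(name -> {
                    LOG.add("map " + name);
                    return name.toUpperCase();
                })
                .sorted()
                // peek after sorted shows when elements leave the sorting step
                .peek(name -> LOG.add("sorted " + name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = run(Arrays.asList("Tom", "Al", "Ben"));
        LOG.forEach(System.out::println);
        System.out.println(result);
    }
}
```

For the input above, the log reads filter Tom, map Tom, filter Al, filter Ben, map Ben, and only then sorted BEN and sorted TOM: "Al" is dropped by the filter and never reaches map.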

The order of the intermediate operations matters. For instance, sorting the list first and filtering afterwards takes more time, because the sort then also has to process the elements that the filter discards. For a full list of operations, check the Stream Javadoc.
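A rough way to see the cost difference, using only the JDK, is to count how often a comparator is invoked when the sort runs before versus after the filter. The random ages (0 to 35, fixed seed) are hypothetical sample data, and comparator calls are only a stand-in for the actual cost. The exact counts depend on the JDK's sorting algorithm, but the sort-first variant has to compare every element, including those the filter drops afterwards.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OperationOrder {
    // Returns the number of comparator calls made by the pipeline's sort step.
    static int comparisons(List<Integer> ages, boolean sortFirst) {
        AtomicInteger counter = new AtomicInteger();
        Comparator<Integer> counting = (a, b) -> {
            counter.incrementAndGet();
            return Integer.compare(a, b);
        };
        if (sortFirst) {
            ages.stream().sorted(counting).filter(age -> age >= 18).collect(Collectors.toList());
        } else {
            ages.stream().filter(age -> age >= 18).sorted(counting).collect(Collectors.toList());
        }
        return counter.get();
    }

    // Hypothetical sample data: 1,000 ages between 0 and 35, fixed seed for reproducibility.
    static List<Integer> sampleAges() {
        Random random = new Random(42);
        return IntStream.range(0, 1_000)
                .mapToObj(i -> random.nextInt(36))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = sampleAges();
        System.out.println("sort, then filter: " + comparisons(ages, true));
        System.out.println("filter, then sort: " + comparisons(ages, false));
    }
}
```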

Parallel Streams

Before we start with parallel streams, we need to dig deeper into the intermediate operations. As mentioned already, streams use lambda expressions and method references. The functions passed to stream operations need to follow these rules:

  • non-interference
    • The data source must not be modified while the pipeline is executing (concurrent collections are an exception).
    • This rule applies to sequential streams, too.
  • stateless
    • Intermediate operations are further divided into stateful and stateless operations. A stateless operation, like filter or map, processes each element independently of the others. In contrast, a stateful operation depends on other elements of the stream. In our example, the sorted operation has to consume the entire stream before it can emit the first name in the correct order.
    • Likewise, the lambdas passed to stream operations should not depend on state that may change while the pipeline executes.
  • associative
    • Functions that combine elements, such as the operator passed to the terminal reduce operation, must satisfy the associative property:
      (x op y) op z == x op (y op z)

These rules matter especially for parallel streams. If you violate them, the results can become incorrect or nondeterministic, and stateful operations can significantly decrease the performance of processing the elements in parallel.
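A small sketch of the associativity and statelessness rules, using only the JDK. Addition is associative, so a parallel reduce returns the same sum as the sequential one. Subtraction is not associative, so the parallel result of the same reduction may differ from the sequential one; the sketch only prints it rather than predicting a value. The last method gathers results with collect instead of adding them to a shared list from inside the pipeline, which keeps the lambdas stateless. The number ranges are arbitrary examples.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelRules {
    // Numbers 1..last, optionally as a parallel stream.
    static IntStream numbers(int last, boolean parallel) {
        IntStream stream = IntStream.rangeClosed(1, last);
        return parallel ? stream.parallel() : stream;
    }

    // Associative operator: parallel and sequential results agree.
    static int sum(boolean parallel) {
        return numbers(100, parallel).reduce(0, (x, y) -> x + y);
    }

    // Non-associative operator: the parallel result depends on how the work is split.
    static int difference(boolean parallel) {
        return numbers(100, parallel).reduce(0, (x, y) -> x - y);
    }

    // Stateless lambda plus collect: no shared mutable list, encounter order preserved.
    static List<Integer> squares(boolean parallel) {
        return numbers(10, parallel).map(n -> n * n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sum(false) + " / " + sum(true)); // 5050 / 5050
        System.out.println(difference(false) + " / " + difference(true));
        System.out.println(squares(true));
    }
}
```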

So, according to these rules, the sort operation in our example should be called outside the stream. In my test, however, sorting outside the stream pipeline was slower, so I left it inside for this case. This may of course look different for other queries.

// changed call from stream() to parallelStream()
List<String> list = PERSON_LIST.parallelStream()
                    .filter(person -> person.getAge() >= 18)
                    .map(person -> person.getFirstName())
                    .sorted(String::compareTo)
                    .collect(Collectors.toList());
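For comparison, here is a sketch of the alternative discussed above, in which the parallel pipeline only filters and maps and the sorting happens afterwards with Collections.sort. The Person type and sample data are hypothetical stand-ins. The result is collected into an ArrayList explicitly, because Collectors.toList() makes no guarantee about the mutability of the returned list. Which variant is faster depends on the data and the machine, as noted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SortPlacement {
    // Hypothetical minimal Person type with the two accessors the article uses.
    static class Person {
        private final String firstName;
        private final int age;

        Person(String firstName, int age) {
            this.firstName = firstName;
            this.age = age;
        }

        String getFirstName() { return firstName; }
        int getAge() { return age; }
    }

    // Hypothetical sample data.
    static final List<Person> SAMPLE = Arrays.asList(
            new Person("Tom", 34), new Person("Anna", 17),
            new Person("Mia", 18), new Person("Ben", 42));

    // Variant from the article: sorted stays inside the parallel pipeline.
    static List<String> sortInside(List<Person> persons) {
        return persons.parallelStream()
                .filter(person -> person.getAge() >= 18)
                .map(Person::getFirstName)
                .sorted(String::compareTo)
                .collect(Collectors.toList());
    }

    // Alternative: only stateless operations run in parallel; sorting happens afterwards.
    static List<String> sortOutside(List<Person> persons) {
        List<String> names = persons.parallelStream()
                .filter(person -> person.getAge() >= 18)
                .map(Person::getFirstName)
                .collect(Collectors.toCollection(ArrayList::new));
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortInside(SAMPLE));  // [Ben, Mia, Tom]
        System.out.println(sortOutside(SAMPLE)); // [Ben, Mia, Tom]
    }
}
```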

Conclusion

This article is a short tour of the Streams API and gives an overview of the underlying technique. I hope it helps you understand streams. If you want to take a look at the source code, it’s hosted on GitHub. Also check the following links to learn more about streams, and feel free to leave a comment:
