Java 8 practice notes (04): introducing streams

Almost every Java application creates and processes collections.

Although collections are indispensable to almost any Java application, collection operations are far from perfect.

  • Much business logic involves database-like operations, such as grouping dishes by category (for example, all vegetarian dishes) or finding the most expensive dish. How many times have you reimplemented these operations yourself with iterators? Most databases let you specify such operations declaratively. For example, the following SQL query selects the names of low-calorie dishes: SELECT name FROM dishes WHERE calorie < 400. You don't need to implement how to filter on a dish attribute (say, with iterators and accumulators); you only express what you want, and the query engine works out how to run it. Why can't we do the same with collections?

  • What if you have to process a large number of elements? To improve performance, you need to process them in parallel and take advantage of multi-core architectures. But writing parallel code is more complex than using iterators, and debugging it is no fun!

What is a stream

Streams are a new addition to the Java API that let you process collections of data declaratively (you express a query rather than writing an ad hoc implementation for it). For now, you can think of them as high-level iterators over a data set.

In addition, streams can be processed transparently in parallel, and you don't need to write any multithreaded code!

StreamBasic

Before Java 8:

List<Dish> lowCaloricDishes = new ArrayList<>();
for(Dish d: menu){
	if(d.getCalories() < 400){
		lowCaloricDishes.add(d);
	}
}

Collections.sort(lowCaloricDishes, new Comparator<Dish>() {
	public int compare(Dish d1, Dish d2){
		return Integer.compare(d1.getCalories(), d2.getCalories());
	}
});

List<String> lowCaloricDishesName = new ArrayList<>();

for(Dish d: lowCaloricDishes){
	lowCaloricDishesName.add(d.getName());
}

After Java 8:

import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toList;
	List<String> lowCaloricDishesName = menu.stream()
				.filter(d -> d.getCalories() < 400)
				.sorted(comparing(Dish::getCalories))
				.map(Dish::getName)
				.collect(toList());

In order to execute this code in parallel using multi-core architecture, you only need to replace stream() with parallelStream():

List<String> lowCaloricDishesName = menu.parallelStream()
				.filter(d -> d.getCalories() < 400)
				.sorted(comparing(Dish::getCalories))
				.map(Dish::getName)
				.collect(toList());
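
As a rough check of the claim above, the following sketch runs the same pipeline sequentially and in parallel and prints both results. The nested Dish class, the sample menu and the lowCalorieNames helper are stand-ins invented for this example, not the classes used in these notes.

```java
import java.util.Arrays;
import java.util.List;

import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toList;

class ParallelMenuDemo {
    // Minimal stand-in for Dish (assumed shape: a name and a calorie count).
    static class Dish {
        private final String name;
        private final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // Same pipeline as in the text; only the way the stream is obtained differs.
    static List<String> lowCalorieNames(List<Dish> menu, boolean parallel) {
        return (parallel ? menu.parallelStream() : menu.stream())
                .filter(d -> d.getCalories() < 400)
                .sorted(comparing(Dish::getCalories))
                .map(Dish::getName)
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Dish> menu = Arrays.asList(
                new Dish("pork", 800), new Dish("rice", 350),
                new Dish("season fruit", 120), new Dish("prawns", 300));

        System.out.println(lowCalorieNames(menu, false)); // [season fruit, prawns, rice]
        System.out.println(lowCalorieNames(menu, true));  // [season fruit, prawns, rice]
    }
}
```

Both calls return the same list because sorted and collect(toList()) respect the encounter order of the stream. On a menu this small the parallel version is unlikely to be faster; parallel streams tend to pay off only with large data sets or expensive per-element work.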

The new approach has several obvious benefits.

  • The code is written declaratively: it describes what you want to accomplish (filter the low-calorie dishes) rather than how to implement the operation (with control-flow constructs such as loops and if conditions). Together with behavior parameterization, this makes it easy to cope with changing requirements: you can create another version of the code that filters high-calorie dishes just by passing a different lambda expression, without copying and pasting code.

  • You can chain several basic operations to express a complex data processing pipeline (filter is followed by the sorted, map and collect operations) while keeping the code clear and readable. The result of filter is passed to sorted, then to map, and finally to collect.
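
The first point can be sketched as follows: the pipeline is written once and the selection rule is passed in as a lambda. The namesMatching helper, the nested Dish class and the sample data are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

import static java.util.stream.Collectors.toList;

class DishFilters {
    // Minimal stand-in for Dish (assumed shape: a name and a calorie count).
    static class Dish {
        private final String name;
        private final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // The selection rule is a parameter; the pipeline itself is written only once.
    static List<String> namesMatching(List<Dish> menu, Predicate<Dish> rule) {
        return menu.stream()
                .filter(rule)
                .map(Dish::getName)
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Dish> menu = Arrays.asList(
                new Dish("pork", 800), new Dish("rice", 350),
                new Dish("season fruit", 120), new Dish("pizza", 550));

        System.out.println(namesMatching(menu, d -> d.getCalories() < 400));  // [rice, season fruit]
        System.out.println(namesMatching(menu, d -> d.getCalories() >= 400)); // [pork, pizza]
    }
}
```

Switching from low-calorie to high-calorie dishes changes only the lambda at the call site.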

Because the operations such as filter, sorted, map and collect are high-level components independent of the specific thread model, their internal implementation can be single threaded or make full use of your multi-core architecture transparently! In practice, this means that you don't have to worry about threads and locks in order to make some data processing tasks parallel. The Stream API has done it for you!

Map<Dish.Type, List<Dish>> dishesByType =
	menu.stream().collect(groupingBy(Dish::getType));

The code above groups the dishes by type into a Map, which might contain:

{FISH=[prawns, salmon],
OTHER=[french fries, rice, season fruit, pizza],
MEAT=[pork, beef, chicken]}

Other libraries: Guava, Apache, and lambdaj

Several earlier libraries already tried to give Java programmers a better set of operations for working with collections.

To sum up, the Stream API in Java 8 lets you write code that is:

  • Declarative - more concise and easier to read
  • Composable - more flexible
  • Parallelizable - better performance

Data preparation Dish
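
For readers who want to run the snippets in these notes, here is one possible shape for the Dish class and the menu list. The dish names and types match the output shown earlier; the calorie values, the vegetarian flag and the DishData wrapper class are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

class DishData {
    static class Dish {
        enum Type { MEAT, FISH, OTHER }

        private final String name;
        private final boolean vegetarian;
        private final int calories;
        private final Type type;

        Dish(String name, boolean vegetarian, int calories, Type type) {
            this.name = name;
            this.vegetarian = vegetarian;
            this.calories = calories;
            this.type = type;
        }

        String getName() { return name; }
        boolean isVegetarian() { return vegetarian; }
        int getCalories() { return calories; }
        Type getType() { return type; }

        // Printing a dish shows its name, as in the outputs in these notes.
        @Override
        public String toString() { return name; }
    }

    // Sample menu; the calorie values are illustrative assumptions.
    static List<Dish> menu() {
        return Arrays.asList(
                new Dish("pork", false, 800, Dish.Type.MEAT),
                new Dish("beef", false, 700, Dish.Type.MEAT),
                new Dish("chicken", false, 400, Dish.Type.MEAT),
                new Dish("french fries", true, 530, Dish.Type.OTHER),
                new Dish("rice", true, 350, Dish.Type.OTHER),
                new Dish("season fruit", true, 120, Dish.Type.OTHER),
                new Dish("pizza", true, 550, Dish.Type.OTHER),
                new Dish("prawns", false, 300, Dish.Type.FISH),
                new Dish("salmon", false, 450, Dish.Type.FISH));
    }

    public static void main(String[] args) {
        System.out.println(menu());
    }
}
```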

Stream introduction

Stream definition -- a sequence of elements generated from a source that supports data processing operations

  • Sequence of elements -- like a collection, a stream provides an interface to a sequenced set of values of a specific element type. Because collections are data structures, they are mostly about storing and accessing elements with specific time/space complexities (for example, ArrayList versus LinkedList). Streams are about expressing computations, such as the filter, sorted and map operations you saw earlier. Collections are about data; streams are about computations.

  • Source -- streams consume from a data-providing source such as a collection, an array, or an I/O resource. Generating a stream from an ordered collection preserves the ordering: the elements of a stream created from a list are in the same order as in the list.

  • Data processing operations -- streams support database-like operations and common operations from functional programming languages, such as filter, map, reduce, find, match and sort. Stream operations can be executed sequentially or in parallel.
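
A small sketch of the "source" point: the same operation applied to streams built from a list, from an array, and from explicit values, each keeping the order of its source. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

import static java.util.stream.Collectors.toList;

class StreamSources {
    // Works on any Stream<String>, regardless of where the elements come from.
    static List<String> upperCase(Stream<String> words) {
        return words.map(String::toUpperCase).collect(toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("java", "in", "action");
        String[] array = {"streams", "from", "arrays"};

        System.out.println(upperCase(list.stream()));          // [JAVA, IN, ACTION]
        System.out.println(upperCase(Arrays.stream(array)));   // [STREAMS, FROM, ARRAYS]
        System.out.println(upperCase(Stream.of("a", "b")));    // [A, B]
    }
}
```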

In addition, stream operations have two important characteristics.

  • Pipelining -- many stream operations return a stream themselves, so operations can be chained to form a larger pipeline. This enables certain optimizations covered in the next chapter, such as laziness and short-circuiting. A pipeline of operations can be viewed as a database-like query on the data source.
  • Internal iteration -- unlike collections, which are iterated explicitly with an iterator, stream operations do the iteration behind the scenes for you.

import static java.util.stream.Collectors.toList;

List<String> threeHighCaloricDishNames = menu.stream()
				.filter(d -> d.getCalories() > 300)
				.map(Dish::getName)
				.limit(3)
				.collect(toList());

System.out.println(threeHighCaloricDishNames);


Streams and collections

A stream can be traversed only once

Note that, like an iterator, a stream can be traversed only once. After that, the stream is said to be consumed. You can get a new stream from the original data source and traverse it again:

List<String> title = Arrays.asList("Java8", "In", "Action");
Stream<String> s = title.stream();
s.forEach(System.out::println);
s.forEach(System.out::println); // java.lang.IllegalStateException: stream has already been operated upon or closed
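
The behavior can be checked with a short sketch: reusing a consumed stream throws IllegalStateException, while asking the source for a fresh stream each time works. TraverseOnce and its two methods are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class TraverseOnce {
    // Returns true if traversing the same Stream object a second time fails.
    static boolean secondTraversalFails(List<String> source) {
        Stream<String> s = source.stream();
        s.forEach(x -> { });          // first traversal consumes the stream
        try {
            s.forEach(x -> { });      // second traversal of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Asking the source for a new stream each time is fine.
    static long countTwice(List<String> source) {
        long first = source.stream().count();
        long second = source.stream().count();
        return first + second;
    }

    public static void main(String[] args) {
        List<String> title = Arrays.asList("Java8", "In", "Action");
        System.out.println(secondTraversalFails(title)); // true
        System.out.println(countTwice(title));           // 6
    }
}
```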

External iteration and internal iteration

External iteration

List<String> names = new ArrayList<>();

for(Dish d: menu){
	names.add(d.getName());
}

// Alternative: the same loop written with an explicit iterator
List<String> names = new ArrayList<>();
Iterator<Dish> iterator = menu.iterator();

while(iterator.hasNext()) {
	Dish d = iterator.next();
	names.add(d.getName());
}

Internal iteration

List<String> names = menu.stream().map(Dish::getName).collect(toList());

With external iteration, you explicitly take each item out of the collection and process it yourself. With internal iteration, the items can be processed transparently in parallel or in a more optimized order.

These optimizations are difficult with external iteration as you were used to doing it in Java. This may seem like nitpicking, but it is pretty much the reason Java 8 introduced streams: the internal iteration in the Streams library can automatically choose a data representation and a parallel implementation to match your hardware.

Streams use internal iteration: the iteration is done for you. But this helps only if you have a set of predefined operations (such as filter or map) that hide the iteration.

Stream operation

List<String> names = menu.stream()
	.filter(d -> d.getCalories() > 300)
	.map(Dish::getName)
	.limit(3)
	.collect(Collectors.toList());

  • The intermediate operations filter, map and limit are chained into a pipeline;
  • The terminal operation collect triggers execution of the pipeline and closes it.

Intermediate operation

To see what happens inside the pipeline, add a print statement to each lambda:

List<String> names =
menu.stream()
	.filter(d -> {
		System.out.println("filtering " + d.getName());
		return d.getCalories() > 300;
	})
	.map(d -> {
		System.out.println("mapping " + d.getName());
		return d.getName();
	})
	.limit(3)
	.collect(toList());
System.out.println(names);

When this code executes, it prints:

filtering pork
mapping pork
filtering beef
mapping beef
filtering chicken
mapping chicken
[pork, beef, chicken]

This output shows several optimizations that rely on the lazy nature of streams.

  1. Although many dishes have more than 300 calories, only the first three are selected. This is due to the limit operation and a technique called short-circuiting.

  2. Although filter and map are two separate operations, they are merged into the same pass over the data (a technique called loop fusion).
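
Both effects can be observed by counting how many elements the filter lambda actually sees. This is a minimal sketch on plain calorie numbers; the LazyDemo class, its method names and the sample values are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

import static java.util.stream.Collectors.toList;

class LazyDemo {
    // Counts how many elements reach the filter when the pipeline ends with limit + collect.
    static int visitedWithLimit(List<Integer> calories, long maxSize) {
        AtomicInteger visited = new AtomicInteger();
        calories.stream()
                .filter(c -> { visited.incrementAndGet(); return c > 300; })
                .limit(maxSize)
                .collect(toList());
        return visited.get();
    }

    // Builds the same filter step but never calls a terminal operation.
    static int visitedWithoutTerminalOperation(List<Integer> calories) {
        AtomicInteger visited = new AtomicInteger();
        Stream<Integer> pipeline = calories.stream()
                .filter(c -> { visited.incrementAndGet(); return c > 300; });
        return visited.get(); // nothing has run yet
    }

    public static void main(String[] args) {
        List<Integer> calories = Arrays.asList(800, 700, 400, 530, 350, 120, 550, 300, 450);
        System.out.println(visitedWithLimit(calories, 3));             // 3
        System.out.println(visitedWithoutTerminalOperation(calories)); // 0
    }
}
```

With a sequential stream, the filter sees only the first three values because limit(3) stops the traversal once it has enough elements, and without a terminal operation the filter is never invoked at all.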

Terminal operation

A terminal operation produces a result from the stream pipeline. The result is any non-stream value, such as a List, an Integer, or even void. For example, forEach returns void and applies a lambda to each dish in the source:

menu.stream().forEach(System.out::println);

Use stream

In short, working with streams generally involves three things:

  • A data source (such as a collection) to run a query on;
  • A chain of intermediate operations that forms the stream pipeline;
  • A terminal operation that executes the pipeline and produces a result.

The idea behind a stream pipeline is similar to the Builder pattern: a chain of calls sets up a configuration (the intermediate operations), followed by a call that builds the result (the terminal operation).

Intermediate operation

Operation | Return type | Argument       | Function descriptor
map       | Stream<R>   | Function<T, R> | T -> R
filter    | Stream<T>   | Predicate<T>   | T -> boolean
limit     | Stream<T>   | long           | -
sorted    | Stream<T>   | Comparator<T>  | (T, T) -> int
distinct  | Stream<T>   | -              | -
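
The operations in the table can be combined freely. The following sketch chains all five on a list of integers; the class name, method name and input values are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import static java.util.stream.Collectors.toList;

class IntermediateOps {
    // Squares of the smallest distinct even numbers, at most howMany of them.
    static List<Integer> squaresOfSmallestEvens(List<Integer> numbers, long howMany) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)            // Predicate<Integer>
                .distinct()                         // drops duplicates
                .sorted(Comparator.naturalOrder())  // Comparator<Integer>
                .limit(howMany)                     // keeps the first howMany elements
                .map(n -> n * n)                    // Function<Integer, Integer>
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(8, 3, 2, 8, 4, 2, 6);
        System.out.println(squaresOfSmallestEvens(numbers, 2)); // [4, 16]
    }
}
```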

Terminal operation

Operation | Purpose
forEach   | Consumes each element of the stream and applies a lambda to it. Returns void.
count     | Returns the number of elements in the stream as a long.
collect   | Reduces the stream to a collection or other value, such as a List, a Map, or even an Integer.
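
A brief sketch of the three terminal operations from the table, applied to a list of dish names. The TerminalOps class and its helper methods are hypothetical; Collectors.joining is used as one example of a collector that produces a single value rather than a collection.

```java
import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.joining;

class TerminalOps {
    // count returns a long.
    static long countLongNames(List<String> names) {
        return names.stream().filter(n -> n.length() > 4).count();
    }

    // collect can reduce a stream to a single value, here one String.
    static String joinNames(List<String> names) {
        return names.stream().collect(joining(", "));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("pork", "beef", "chicken", "salmon");

        names.stream().forEach(System.out::println);  // forEach returns void
        System.out.println(countLongNames(names));    // 2
        System.out.println(joinNames(names));         // pork, beef, chicken, salmon
    }
}
```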

Summary

  • A stream is "a sequence of elements from a source that supports data processing operations".
  • Streams use internal iteration: the iteration is abstracted away by operations such as filter, map and sorted.
  • There are two types of stream operations: intermediate operations and terminal operations.
  • Intermediate operations such as filter and map return a stream and can be chained together. You use them to set up a pipeline, but they do not produce any result by themselves.
  • Terminal operations such as forEach and count return a non-stream value and process the pipeline to produce a result.
  • The elements of a stream are computed on demand.

Tags: Java gitee

Posted by danieloberg on Tue, 24 May 2022 10:17:38 +0300