[Java] How to apply Stream.distinct by a field, property, etc.

This post shows how to get the effect of Stream.distinct keyed on a field, a property, a computed value, and so on.

Stream.distinct in the Java Stream API takes no arguments, so it cannot be given a key-extracting lambda.

class Item{
	String name,shop;
	int price;
	Item(String n,int p,String s){ name=n; price=p; shop=s; }
	public String toString(){ return name+", "+price+", "+shop; }
}
Item[] items = {
	new Item("item-1",1000,"shop-A"),
	new Item("item-2",1100,"shop-B"),
	new Item("item-3",1200,"shop-C"),
	new Item("item-4",2000,"shop-A"),
	new Item("item-5",2100,"shop-B"),
	new Item("item-6",2200,"shop-C"),
	new Item("item-7",3000,"shop-A"),
	new Item("item-8",3100,"shop-B"),
	new Item("item-9",3200,"shop-C"),
};

To retrieve one item per shop from the array above, if distinct accepted a key-extractor lambda, the code might look like this:

Stream.of(items)
	.distinct(item->item.shop)
	...

But distinct cannot actually be called this way, so combine filter with a Set instead:

Set<String> unique = new HashSet<>();
Stream.of(items)
	.filter(item->unique.add(item.shop))
	.forEach(System.out::println);
	
> item-1, 1000, shop-A
> item-2, 1100, shop-B
> item-3, 1200, shop-C

The reason this is equivalent to distinct is the behavior of Set.add:

If the set does not already contain the specified element, the element is added and add returns true. If the element is already present, the set is left unchanged and add returns false.

In other words, add works as a predicate that returns true only for the first occurrence of each key.
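The return-value behavior of Set.add can be seen directly in a few lines (the class name SetAddDemo is just for this illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class SetAddDemo {
	public static void main(String[] args) {
		Set<String> unique = new HashSet<>();
		System.out.println(unique.add("shop-A")); // true  (first occurrence, set changed)
		System.out.println(unique.add("shop-B")); // true  (first occurrence, set changed)
		System.out.println(unique.add("shop-A")); // false (duplicate, set unchanged)
	}
}
```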

In parallel streams

However, HashSet is not thread-safe, so this approach is unsafe with parallel streams.

Set<String> unique = new HashSet<>(); // Not thread-safe!
Stream.of(items)
	.parallel() // Processed on multiple threads!
	.filter(item->unique.add(item.shop))
	.forEach(System.out::println);

> item-7, 3000, shop-A // Duplicate!
> item-5, 2100, shop-B
> item-1, 1000, shop-A // Duplicate!
> item-6, 2200, shop-C

In my runs, a duplicated result like the above appeared roughly once per 100 executions. To avoid this, use the concurrency-safe ConcurrentHashMap, converted to a Set with Collections.newSetFromMap:

Set<String> unique = Collections.newSetFromMap(new ConcurrentHashMap<>());
Stream.of(items)
	.parallel()
	.filter(item->unique.add(item.shop))
	.forEach(System.out::println);

> item-6, 2200, shop-C
> item-5, 2100, shop-B
> item-7, 3000, shop-A
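The filter-plus-Set pattern can also be wrapped in a reusable helper method, sketched below under the commonly used name distinctByKey (the name and class are my own, not from the original; the backing ConcurrentHashMap keeps it safe in parallel streams):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

public class DistinctByKey {
	// Stateful predicate: returns true only for the first element seen
	// with each extracted key. putIfAbsent returns null exactly when the
	// key was not yet present, and ConcurrentHashMap makes this safe
	// even when the predicate is called from multiple threads.
	static <T, K> Predicate<T> distinctByKey(Function<? super T, ? extends K> keyExtractor) {
		Map<K, Boolean> seen = new ConcurrentHashMap<>();
		return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
	}
}
```

With this helper, the example becomes Stream.of(items).parallel().filter(DistinctByKey.distinctByKey(item -> item.shop)).forEach(System.out::println);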

