Example of using Iterator in Java (Bonus: Super convenient! Composing Iterators, i.e. something like flatMap for Iterator)

Introduction

Iterators are useful when processing large amounts of data. This is especially true when each item is expensive to compute or takes a lot of memory. If you convert all the data up front into a collection such as a List, you can run out of memory and hit an OutOfMemoryError, or frequent garbage collection can slow processing down significantly.

When not using an iterator

For example, suppose you have the following interface that fetches records from a data store as a List. (Assume primary keys are assigned in the order in which records are registered.)

Data store.java


import java.util.List;

interface MyRepository<E> {
  /**
   * Returns a list of the records whose primary key is in the range
   * [fromInclusive, untilInclusive] (both ends inclusive).
   */
  List<E> findByIdRange(long fromInclusive, long untilInclusive);
}
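To make the inclusive-range contract concrete, here is a minimal sketch of a hypothetical in-memory implementation (`InMemoryRepository` and its `save` method are illustrative names, not part of the original). It keeps records in a `TreeMap` and uses `NavigableMap.subMap` with both bounds inclusive, which matches `[fromInclusive, untilInclusive]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

interface MyRepository<E> {
  /** Returns the records whose primary key is in [fromInclusive, untilInclusive]. */
  List<E> findByIdRange(long fromInclusive, long untilInclusive);
}

// Hypothetical in-memory stand-in for a real data store, useful for experiments and tests
class InMemoryRepository<E> implements MyRepository<E> {
  // Keys are kept sorted, mirroring "primary keys in registration order"
  private final NavigableMap<Long, E> records = new TreeMap<>();

  void save(long id, E record) {
    records.put(id, record);
  }

  @Override
  public List<E> findByIdRange(long fromInclusive, long untilInclusive) {
    if (fromInclusive > untilInclusive) {
      return new ArrayList<>(); // subMap would throw IllegalArgumentException for an inverted range
    }
    // Copy the view into a List; both bounds are inclusive, as the contract requires
    return new ArrayList<>(records.subMap(fromInclusive, true, untilInclusive, true).values());
  }
}
```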

Suppose you want to transform all the records written on a given day (a certain range of primary keys) and load them into another data store. With the Stream API you might write it like this:

Straightforward.java


MyRepository<MyRecord> myRepository;

void extractTransferAndLoad(long fromInclusive, long untilInclusive) {
  // Run the heavy transform/load work in parallel
  myRepository.findByIdRange(fromInclusive, untilInclusive)
      .parallelStream()
      .map(this::transfer)
      .forEach(this::load);
}

However, with a large amount of data, the call to myRepository.findByIdRange(long, long) may throw an OutOfMemoryError, or consume so much memory that performance suffers.

Try using an iterator

So let's write our own iterator. There are two interfaces you need to know to build one.

Interfaces you need to know.java


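// java.util.Iterator, simplified: the real interface also has
// the default methods remove() and forEachRemaining()
interface Iterator<A> {
  boolean hasNext();
  A next();
}

// java.lang.Iterable, simplified: the real interface also has
// the default methods forEach() and spliterator()
interface Iterable<A> {
  Iterator<A> iterator();
}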
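To see the two interfaces working together, here is a minimal sketch built around a hypothetical `LongRange` class (not from the original) that yields the longs in [fromInclusive, untilInclusive] lazily. Note that `iterator()` returns a fresh cursor on every call, and that `next()` throws `NoSuchElementException` once the range is exhausted, as the `Iterator` contract requires.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: the longs in [fromInclusive, untilInclusive], produced one at a time
// (assumes untilInclusive < Long.MAX_VALUE, otherwise current++ would overflow)
class LongRange implements Iterable<Long> {
  private final long fromInclusive;
  private final long untilInclusive;

  LongRange(long fromInclusive, long untilInclusive) {
    this.fromInclusive = fromInclusive;
    this.untilInclusive = untilInclusive;
  }

  @Override
  public Iterator<Long> iterator() {
    // Each call returns a new iterator with its own cursor,
    // so the same LongRange can be looped over more than once.
    return new Iterator<Long>() {
      private long current = fromInclusive;

      @Override
      public boolean hasNext() {
        return current <= untilInclusive;
      }

      @Override
      public Long next() {
        if (!hasNext()) {
          throw new NoSuchElementException(); // required by the Iterator contract
        }
        return current++;
      }
    };
  }
}
```

Because `LongRange` implements `Iterable<Long>`, it can be used directly in an enhanced for statement, e.g. `for (long id : new LongRange(1, 3)) { ... }`.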

Let's write a class that wraps findByIdRange in an iterator, so that the range is fetched in chunks rather than all at once.

Iterator implementation.java


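import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class MyIterableRepository<A> implements Iterable<List<A>> {
  private final MyRepository<A> myRepository;
  private final long fromInclusive;
  private final long untilInclusive;
  private final int splitSize; // each List returned by next() covers at most splitSize primary keys

  public MyIterableRepository(MyRepository<A> myRepository, long fromInclusive, long untilInclusive, int splitSize) {
    this.myRepository = myRepository;
    this.fromInclusive = fromInclusive;
    this.untilInclusive = untilInclusive;
    this.splitSize = splitSize;
  }

  @Override
  public Iterator<List<A>> iterator() {
    return new Iterator<List<A>>() {
      // Start of the next chunk to fetch; each iterator() call gets its own cursor
      long currentFromInclusive = fromInclusive;

      @Override
      public boolean hasNext() {
        return untilInclusive >= currentFromInclusive;
      }

      @Override
      public List<A> next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        long currentUntilInclusive = Math.min(untilInclusive, currentFromInclusive + splitSize - 1);
        // Only this chunk is fetched from the data store
        List<A> ret = myRepository.findByIdRange(currentFromInclusive, currentUntilInclusive);
        currentFromInclusive = currentUntilInclusive + 1;
        return ret;
      }
    };
  }
}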
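To check the chunk boundaries without a database, here is a self-contained sketch. It includes a condensed copy of `MyIterableRepository` (calling `findByIdRange` and throwing `NoSuchElementException` when exhausted) and plugs in a hypothetical `FakeRepository` (not from the original) whose "record" for each primary key is simply the key, and which logs every query. With keys [1, 10] and a splitSize of 4, the chunks should be [1, 4], [5, 8] and [9, 10], and nothing is fetched until `next()` is called.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

interface MyRepository<E> {
  List<E> findByIdRange(long fromInclusive, long untilInclusive);
}

// Condensed copy of the chunking iterable
class MyIterableRepository<A> implements Iterable<List<A>> {
  private final MyRepository<A> myRepository;
  private final long fromInclusive;
  private final long untilInclusive;
  private final int splitSize;

  MyIterableRepository(MyRepository<A> myRepository, long fromInclusive, long untilInclusive, int splitSize) {
    this.myRepository = myRepository;
    this.fromInclusive = fromInclusive;
    this.untilInclusive = untilInclusive;
    this.splitSize = splitSize;
  }

  @Override
  public Iterator<List<A>> iterator() {
    return new Iterator<List<A>>() {
      long currentFromInclusive = fromInclusive;

      @Override
      public boolean hasNext() {
        return untilInclusive >= currentFromInclusive;
      }

      @Override
      public List<A> next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        long currentUntilInclusive = Math.min(untilInclusive, currentFromInclusive + splitSize - 1);
        List<A> ret = myRepository.findByIdRange(currentFromInclusive, currentUntilInclusive);
        currentFromInclusive = currentUntilInclusive + 1;
        return ret;
      }
    };
  }
}

// Hypothetical fake store: each primary key is its own "record"; every query is logged
class FakeRepository implements MyRepository<Long> {
  final List<String> queries = new ArrayList<>();

  @Override
  public List<Long> findByIdRange(long fromInclusive, long untilInclusive) {
    queries.add("[" + fromInclusive + ", " + untilInclusive + "]");
    return LongStream.rangeClosed(fromInclusive, untilInclusive).boxed().collect(Collectors.toList());
  }
}
```

Since each `iterator()` call starts again from `fromInclusive`, looping over the same object twice issues the queries twice. That is the trade-off for not keeping the data in memory.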

Now you have an iterator. It's a simple pattern, so once you get used to it, it's easy to remember. Because the class implements Iterable, the caller can use the enhanced for statement:

Caller.java


MyRepository<MyRecord> myRepository;

void extractTransferAndLoad(long fromInclusive, long untilInclusive) {
  MyIterableRepository<MyRecord> myIterableRepository =
      new MyIterableRepository<>(myRepository, fromInclusive, untilInclusive, MagicNumbers.SPLIT_SIZE);
  for (List<MyRecord> myDataList : myIterableRepository) {
    // Run the heavy transform/load work in parallel, one chunk at a time
    myDataList.parallelStream().map(this::transfer).forEach(this::load);
  }
}

With this change, fewer objects are alive at any one time: once an iteration of the for loop finishes, myDataList is no longer referenced and becomes eligible for garbage collection, so you can avoid running into an OutOfMemoryError.
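For reference, the enhanced for statement over an `Iterable` is shorthand for an explicit `Iterator` loop (JLS §14.14.2). The sketch below writes the same loop both ways; the `EnhancedForDemo` class is hypothetical and uses plain in-memory lists in place of `MyIterableRepository`. In both forms the loop holds a reference only to the current chunk, so earlier chunks become unreachable unless something else keeps them.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo: the same loop with and without the enhanced for statement
class EnhancedForDemo {
  /** Counts elements with an enhanced for statement. */
  static int countWithEnhancedFor(Iterable<List<String>> chunks) {
    int n = 0;
    for (List<String> chunk : chunks) {
      n += chunk.size();
    }
    return n;
  }

  /** The same loop written out by hand, roughly what the compiler generates. */
  static int countWithExplicitIterator(Iterable<List<String>> chunks) {
    int n = 0;
    for (Iterator<List<String>> it = chunks.iterator(); it.hasNext(); ) {
      List<String> chunk = it.next(); // only the current chunk is referenced here
      n += chunk.size();
    }
    return n;
  }
}
```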

Bonus: Super convenient! Composing iterators

Sometimes you want to compose iterators, for example when you need to fetch data from multiple data sources. Say you have user information, each user's purchase history, and each user's posting history, and a batch job builds an integrated static page from them. If you write the processing yourself, it is just a nested enhanced for statement (below is a sketch of the processing you want to achieve):

What I want to do.java


Iterable<MyRepository2> repositories;

void extractTransferAndLoad(MyTime fromInclusive, MyTime untilInclusive) {
  for (MyRepository2 repository : repositories) {
    Iterable<List<UserId>> userIdsIterable =
        new MyIterableRepository2(repository, fromInclusive, untilInclusive, 42).getUserIdsIterable();
    for (List<UserId> userIds : userIdsIterable) {
      userIds.parallelStream().filter( /* skip IDs that were already processed */ ).map( /* convert */ ).forEach( /* export */ );
    }
  }
}

The trouble starts when the processing is handed to you as a method that takes an Iterable and converts it, and you have to call that method. (Example of the callee below.)

Callee.java


// The method you have to call
void extractTransferAndLoad2(Iterable<List<UserId>> userIdsListIterable) {
  for (List<UserId> userIds : userIdsListIterable) {
    userIds.parallelStream().filter( /* skip IDs that were already processed */ ).map( /* convert */ ).forEach( /* export */ );
  }
}

// What the caller would like to write
extractTransferAndLoad2(repositories.flatMap(repository -> new MyIterableRepository2(repository, fromInclusive, untilInclusive, 42).getUserIdsIterable()));
// Iterable has no flatMap, so this does not compile.

So here is a method I came up with for composing iterators, and it is super convenient:

Iterator composition.java


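import java.util.Collections;
import java.util.Iterator;
import java.util.function.Function;

public class IteratorUtils {
  /**
   * Returns an iterator over every B produced by applying func to each A,
   * in order: a flatMap for Iterator.
   */
  public static <A, B> Iterator<B> composedIterator(Iterator<A> aittr, Function<A, Iterator<B>> func) {
    return new Iterator<B>() {
      // Inner iterator currently being drained; starts out empty
      Iterator<B> bittr = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        // Advance through the outer iterator, skipping inner iterators that are empty
        while (!bittr.hasNext() && aittr.hasNext()) {
          bittr = func.apply(aittr.next());
        }
        return bittr.hasNext();
      }

      @Override
      public B next() {
        hasNext(); // make sure bittr points at an element if one remains
        return bittr.next(); // throws NoSuchElementException when everything is exhausted
      }
    };
  }
}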

With it, the call looks like this:

When calling.java


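// composedIterator works on Iterators: pass repositories.iterator(), call .iterator() on each
// inner Iterable, and wrap the result in a lambda to get the Iterable the callee expects
extractTransferAndLoad2(() -> IteratorUtils.composedIterator(
    repositories.iterator(),
    repository -> new MyIterableRepository2(repository, fromInclusive, untilInclusive, 42).getUserIdsIterable().iterator()));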

You can compose iterators and pass them along like this!!
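Here is a minimal self-contained check that the composition behaves like flatMap, using plain lists in place of the repositories (the `ComposeDemo` class is hypothetical). The empty inner lists exercise the `while` loop in `hasNext()`: with a single `if`, the iterator would stop at the first empty inner iterator and wrongly report that nothing is left.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class IteratorUtils {
  // Same composition as above: flatMap for Iterator
  static <A, B> Iterator<B> composedIterator(Iterator<A> aittr, Function<A, Iterator<B>> func) {
    return new Iterator<B>() {
      Iterator<B> bittr = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        // "while", not "if": keep pulling from aittr until a non-empty inner iterator turns up
        while (!bittr.hasNext() && aittr.hasNext()) {
          bittr = func.apply(aittr.next());
        }
        return bittr.hasNext();
      }

      @Override
      public B next() {
        hasNext();
        return bittr.next();
      }
    };
  }
}

// Hypothetical demo: flatten nested lists through composedIterator
class ComposeDemo {
  static String flatten(List<List<String>> nested) {
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = IteratorUtils.composedIterator(nested.iterator(), List::iterator);
    while (it.hasNext()) {
      sb.append(it.next());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<List<String>> nested = Arrays.asList(
        Arrays.<String>asList(), Arrays.asList("a", "b"), Arrays.<String>asList(), Arrays.asList("c"));
    System.out.println(flatten(nested)); // abc
  }
}
```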

By the way, I also thought that if I dropped Iterator and returned everything as Streams, a single flatMap would do the job. However, at least in Java 8, I couldn't find a way to build a lazy, finite Stream over a source like this other than wrapping an Iterator in a Spliterator, so I didn't go that route. (Java 9 later added Stream.iterate(seed, hasNext, next), which covers some of these cases.)
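For comparison, here is a minimal sketch of that Spliterator route. `Iterable.spliterator()` (a default method since Java 8) together with `StreamSupport.stream(spliterator, false)` turns any `Iterable` into a sequential `Stream` without copying it into a collection, after which `flatMap` composes the sources. `StreamComposeDemo` and `flatMapIterables` are hypothetical names, and plain lists stand in for the repositories.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class StreamComposeDemo {
  // Wrap an Iterable as a sequential Stream; elements are pulled from its Iterator on demand
  static <T> Stream<T> stream(Iterable<T> iterable) {
    return StreamSupport.stream(iterable.spliterator(), false);
  }

  // flatMap for Iterables: each inner Iterable is requested only when the stream reaches it
  static <A, B> Stream<B> flatMapIterables(Iterable<A> outer, Function<A, Iterable<B>> func) {
    return stream(outer).flatMap(a -> stream(func.apply(a)));
  }

  public static void main(String[] args) {
    List<List<String>> nested = Arrays.asList(Arrays.asList("a", "b"), Arrays.<String>asList(), Arrays.asList("c"));
    List<String> flat = flatMapIterables(nested, inner -> inner).collect(Collectors.toList());
    System.out.println(flat); // [a, b, c]
  }
}
```

The default `Iterable.spliterator()` reports an unknown size and generally does not split well for parallel work, which is why this sketch keeps the outer stream sequential; per-chunk parallelism can still come from `parallelStream()` on each List, as in the article.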
