[JAVA] Efficiently extract multiple elements from the list randomly and without duplication

Introduction

(As pointed out in Comment, there is some room for efficiency improvement.)

Comment of Hokuhoku no Imo This is the method I thought of for.

This section describes how to efficiently extract multiple elements randomly from the list without duplication. The "list" here is a variable-length, randomly accessible list [^ variable-length, randomly accessible list].

[^ Variable-length randomly accessible list]: java.util.ArrayList for Java, ʻArray for JavaScript, std: vector` for C ++

The explanation is given in Java, but the final code that has been functionalized for reuse is provided in Java, JavaScript, and Kotlin.

In the Java code example, suppose the following ʻimport` statement is declared at the beginning of the file.

Java


import java.util.*;

Randomly retrieve elements from the list

Only one

It's easy to randomly pick just one element from the list.

Java


List<Character> list = Arrays.asList('A', 'B', 'C', 'D'); // A, B, C,List with 4 elements of D

int index = new Random().nextInt(list.size()); //Randomly selected integer greater than or equal to 0 and less than 4
Character result = list.get(index); //Randomly selected elements

System.out.println(result); //Output example> B

all

Retrieving all the elements in a random order is also easy if the standard library has such a function.

Java


List<Character> list = Arrays.asList('A', 'B', 'C', 'D'); // A, B, C,List with 4 elements of D

List<Character> shuffled = new ArrayList<>(list); //A copy of the list that is then randomly sorted
Collections.shuffle(shuffled); //Randomly sort the elements of shuffled.

System.out.println(shuffled); //Output example> [C, A, D, B]

Multiple, not all

So how about retrieving multiple elements, not all elements, in a random order?

Java (Note: Inefficient)


List<Character> list = Arrays.asList('A', 'B', 'C', 'D'); // A, B, C,List with 4 elements of D

List<Character> shuffled = new ArrayList<>(list); //A copy of the list that is then randomly sorted
Collections.shuffle(shuffled); //Randomly sort the elements of shuffled.
List result = shuffled.subList(0, 2); //New list with two elements from the front of shuffled

System.out.println(result); //Output example> [C, B]

No, it's easy to write this, There is a lot of waste because all the elements are shuffled. If you shuffle 10,000 elements when you retrieve 100 elements from a list of 10,000 elements, 9900 elements will be wasted.

Then what do you do?

Avoid unnecessary shuffle

Randomly select only one element, remove the selected element from the list, and so on.

Java (Note: Inefficient)


List<Character> list = Arrays.asList('A', 'B', 'C', 'D'); // A, B, C,List with 4 elements of D

List<Character> result = new ArrayList<>(); //A list with randomly selected elements
List<Character> remaining = new ArrayList<>(list); //List of remaining elements
Random random = new Random(); //Random number generator
for (int i = 0; i < 2; i++) { //Repeat twice.
    int remainingCount = remaining.size(); //Number of remaining elements
    int index = random.nextInt(remainingCount); //Randomly selected index

    Character element = remaining.remove(index); //Fetch randomly selected elements.
    result.add(element); //Have Randomly Selected Elements Add a randomly selected element to the end of the list.
}

System.out.println(result); //Output example> [B, D]

There is no useless shuffle. However, this is also inefficient.

Randomly accessible lists internally use arrays to hold elements If you delete an element in the middle, the process of stuffing the elements after that will be performed.

|A|B|C|D| | | | |
↓ Delete B
|A|C|D| | | | | |

Therefore, the earlier the element to be deleted, the longer the processing time.

Then what should I do?

Avoid deleting elements in the middle

It's slow because it deletes the elements in the middle. Delete only the end. Let's replace the elements in the middle without deleting them.

When deleting B


|A|B|C|D| | | | |
↓ Delete the D at the end and
|A|B|C| | | | | |
↓ Replace B with D.
|A|D|C| | | | | |

The code looks like this:

Java


List<Character> list = Arrays.asList('A', 'B', 'C', 'D'); // A, B, C,List with 4 elements of D

List<Character> result = new ArrayList<>(); //A list with randomly selected elements

List<Character> remaining = new ArrayList<>(list); //List of remaining elements
Random random = new Random(); //Random number generator
for (int i = 0; i < 2; i++) { //Repeat twice.
    int remainingCount = remaining.size(); //Number of remaining elements
    int index = random.nextInt(remainingCount); //Randomly selected index

    Character element = remaining.get(index); //Randomly selected elements
    result.add(element); //Have Randomly Selected Elements Add a randomly selected element to the end of the list.

    int lastIndex = remainingCount - 1; //Index at the end of the list of remaining elements
    Character lastElement = remaining.remove(lastIndex); //Remove the end from the list of remaining elements.
    if (index < lastIndex) { //If the randomly selected element is other than the end ...
        remaining.set(index, lastElement); //Replace it with the last element.
    }
}

System.out.println(result); //Output example> [D, A]

measurement

Let's easily measure the difference in processing speed. Compare the one that shuffles all the elements to get the first partial list with the last one that is the least wasteful.

If you set the number of elements in the list to 100,000 and randomly acquire 50,000 from it, Approximately 590ms for all element shuffle If there is no waste, it will be about 370ms.

Code used for measurement and output example

Java


public class TakeAtRandom {
    public static void main(String... args) {
        //List size
        final int LIST_SIZE = 10_000_000;
        //Number of elements to get from the list
        final int N = 5_000_000;
        //Number of repetitions
        final int REPETITION = 100;

        final List<Integer> list = new ArrayList<>(LIST_SIZE);
        for (int i = 0; i < LIST_SIZE; i++) {
            list.add(i);
        }
        final Random random = new Random();

        for (int i = 0; i < REPETITION; i++) {
            {
                final long startAt = System.currentTimeMillis();
                takeFromShuffled(list, N, random);
                final long finishedAt = System.currentTimeMillis();
                System.out.print(String.format("Shuffle all elements: %,d ms", finishedAt - startAt));
            }
            {
                final long startAt = System.currentTimeMillis();
                takeAtRandom(list, N, random);
                final long finishedAt = System.currentTimeMillis();
                System.out.print(String.format("No waste: %,d ms", finishedAt - startAt));
            }
            System.out.println();
        }
    }

    public static <T> List<T> takeFromShuffled(List<T> list, int n, Random random) {
        final List<T> workList = new ArrayList<>(list);
        Collections.shuffle(workList);
        return workList.subList(0, n);
    }

    public static <T> List<T> takeAtRandom(Collection<T> list, int n, Random random) {
        final List<T> taken = new ArrayList<>(n); //A list with randomly selected elements

        final List<T> remaining = new ArrayList<>(list); //List of remaining elements
        for (int i = 0; i < n; i++) { //Repeat n times.
            final int remainingCount = remaining.size(); //Number of remaining elements
            final int index = random.nextInt(remainingCount); //Randomly selected index

            final T element = remaining.get(index); //Randomly selected elements
            taken.add(element); //Add a randomly selected element to the end of the list of randomly selected elements.

            final int lastIndex = remainingCount - 1; //Index at the end of the list of remaining elements
            final T lastElement = remaining.remove(lastIndex); //Remove the end from the list of remaining elements.
            if (index < lastIndex) { //If the randomly selected element is other than the end ...
                remaining.set(index, lastElement); //Replace it with the last element.
            }
        }

        return taken;
    }
}

Output example


Shuffle all elements:608 ms no waste: 705 ms
Shuffle all elements:659 ms no waste: 1,985 ms
Shuffle all elements:585 ms no waste: 372 ms
Shuffle all elements:598 ms no waste: 378 ms
Shuffle all elements:598 ms no waste: 397 ms
Shuffle all elements:586 ms no waste: 451 ms
Shuffle all elements:583 ms no waste: 381 ms
Shuffle all elements:584 ms no waste: 391 ms
Shuffle all elements:580 ms no waste: 383 ms
Shuffle all elements:578 ms no waste: 383 ms
Shuffle all elements:584 ms no waste: 377 ms
Shuffle all elements:585 ms no waste: 378 ms
Shuffle all elements:598 ms no waste: 381 ms
Shuffle all elements:591 ms no waste: 359 ms
Shuffle all elements:605 ms no waste: 354 ms
Shuffle all elements:590 ms no waste: 373 ms
Shuffle all elements:584 ms no waste: 379 ms
Shuffle all elements:586 ms no waste: 359 ms
Shuffle all elements:603 ms no waste: 357 ms
Shuffle all elements:589 ms no waste: 376 ms
Shuffle all elements:579 ms no waste: 385 ms
Shuffle all elements:581 ms no waste: 352 ms
Shuffle all elements:610 ms no waste: 350 ms
Shuffle all elements:587 ms no waste: 372 ms
Shuffle all elements:603 ms no waste: 381 ms
Shuffle all elements:595 ms no waste: 353 ms
Shuffle all elements:606 ms no waste: 348 ms
Shuffle all elements:579 ms no waste: 376 ms
Shuffle all elements:576 ms no waste: 390 ms
Shuffle all elements:582 ms no waste: 359 ms
Shuffle all elements:602 ms no waste: 357 ms
Shuffle all elements:583 ms no waste: 375 ms
Shuffle all elements:585 ms no waste: 379 ms
Shuffle all elements:586 ms no waste: 353 ms
Shuffle all elements:603 ms no waste: 350 ms
Shuffle all elements:590 ms no waste: 370 ms
Shuffle all elements:587 ms no waste: 376 ms
Shuffle all elements:586 ms no waste: 351 ms
Shuffle all elements:600 ms no waste: 359 ms
Shuffle all elements:585 ms no waste: 379 ms
Shuffle all elements:579 ms no waste: 382 ms
Shuffle all elements:580 ms no waste: 352 ms
Shuffle all elements:605 ms no waste: 349 ms
Shuffle all elements:588 ms no waste: 372 ms
Shuffle all elements:586 ms no waste: 375 ms
Shuffle all elements:584 ms no waste: 351 ms
Shuffle all elements:598 ms no waste: 358 ms
Shuffle all elements:579 ms no waste: 376 ms
Shuffle all elements:579 ms no waste: 381 ms
Shuffle all elements:578 ms no waste: 357 ms
Shuffle all elements:601 ms no waste: 348 ms
Shuffle all elements:593 ms no waste: 371 ms
Shuffle all elements:585 ms no waste: 376 ms
Shuffle all elements:584 ms no waste: 352 ms
Shuffle all elements:602 ms no waste: 349 ms
Shuffle all elements:586 ms no waste: 372 ms
Shuffle all elements:586 ms no waste: 376 ms
Shuffle all elements:578 ms no waste: 359 ms
Shuffle all elements:598 ms no waste: 357 ms
Shuffle all elements:583 ms no waste: 379 ms
Shuffle all elements:578 ms no waste: 382 ms
Shuffle all elements:592 ms no waste: 356 ms
Shuffle all elements:605 ms no waste: 350 ms
Shuffle all elements:587 ms no waste: 374 ms
Shuffle all elements:586 ms no waste: 377 ms
Shuffle all elements:586 ms no waste: 353 ms
Shuffle all elements:600 ms no waste: 357 ms
Shuffle all elements:581 ms no waste: 380 ms
Shuffle all elements:580 ms no waste: 380 ms
Shuffle all elements:580 ms no waste: 352 ms
Shuffle all elements:605 ms no waste: 349 ms
Shuffle all elements:592 ms no waste: 373 ms
Shuffle all elements:585 ms no waste: 374 ms
Shuffle all elements:585 ms no waste: 352 ms
Shuffle all elements:598 ms no waste: 357 ms
Shuffle all elements:580 ms no waste: 377 ms
Shuffle all elements:582 ms no waste: 383 ms
Shuffle all elements:578 ms no waste: 362 ms
Shuffle all elements:611 ms no waste: 358 ms
Shuffle all elements:581 ms no waste: 375 ms
Shuffle all elements:594 ms no waste: 377 ms
Shuffle all elements:587 ms no waste: 362 ms
Shuffle all elements:603 ms no waste: 350 ms
Shuffle all elements:587 ms no waste: 373 ms
Shuffle all elements:585 ms no waste: 376 ms
Shuffle all elements:586 ms no waste: 353 ms
Shuffle all elements:601 ms no waste: 358 ms
Shuffle all elements:593 ms no waste: 378 ms
Shuffle all elements:580 ms no waste: 384 ms
Shuffle all elements:577 ms no waste: 351 ms
Shuffle all elements:606 ms no waste: 349 ms
Shuffle all elements:589 ms no waste: 380 ms
Shuffle all elements:590 ms no waste: 352 ms
Shuffle all elements:605 ms no waste: 348 ms
Shuffle all elements:590 ms no waste: 375 ms
Shuffle all elements:581 ms no waste: 359 ms
Shuffle all elements:599 ms no waste: 357 ms
Shuffle all elements:581 ms no waste: 381 ms
Shuffle all elements:579 ms no waste: 353 ms
Shuffle all elements:607 ms no waste: 349 ms

Functionalization

I made it a function so that it can be reused.

Java

Java


import java.util.*;

public class TakeAtRandom {
    /**
     *Randomly retrieves the specified number of elements from the list without duplication.
     *
     * @param list Extracts elements from this list. This list does not change.
     * @param n Number of elements to retrieve.
     * @param <T>Element type.
     * @return A list with the retrieved elements.
     */
    public static <T> List<T> takeAtRandom(Collection<T> list, int n) {
        return takeAtRandom(list, n, new Random());
    }

    /**
     *Randomly retrieves the specified number of elements from the list without duplication.
     * 
     * @param list Extracts elements from this list. This list does not change.
     * @param n Number of elements to retrieve.
     * @param random The random number generator to use.
     * @param <T>Element type.
     * @return A list with the retrieved elements.
     */
    public static <T> List<T> takeAtRandom(Collection<T> list, int n, Random random) {
        if (list == null) {
            throw new IllegalArgumentException("The value of the argument list is null.");
        }
        if (n > list.size()) {
            throw new IllegalArgumentException(
                    String.format("The value of the argument n%d is the size of the argument list%Greater than d.",
                            n, list.size()));
        }
        if (random == null) {
            throw new IllegalArgumentException("The value of the argument random is null.");
        }

        final List<T> taken = new ArrayList<>(n); //A list with randomly selected elements

        final List<T> remaining = new ArrayList<>(list); //List of remaining elements
        for (int i = 0; i < n; i++) { //Repeat n times.
            final int remainingCount = remaining.size(); //Number of remaining elements
            final int index = random.nextInt(remainingCount); //Randomly selected index

            final T element = remaining.get(index); //Randomly selected elements
            taken.add(element); //Add a randomly selected element to the end of the list of randomly selected elements.

            final int lastIndex = remainingCount - 1; //Index at the end of the list of remaining elements
            final T lastElement = remaining.remove(lastIndex); //Remove the end from the list of remaining elements.
            if (index < lastIndex) { //If the randomly selected element is other than the end ...
                remaining.set(index, lastElement); //Replace it with the last element.
            }
        }

        return taken;
    }
}

JavaScript

JavaScript


/**
 *Randomly retrieve the specified number of elements from the array without duplication.
 *
 * @param {Array}array Extracts elements from this array. This array does not change.
 * @param {number}n Number of elements to retrieve.
 * @return {Array}An array with the retrieved elements.
 */
function takeAtRandom(array, n) {
  if (array == null) {
    throw new Error("The value of the argument array is null or undefined.");
  }
  if (n > array.length) {
    throw new Error(`The value of the argument n${n}Is the size of the argument array${array.length}Is larger.`);
  }
  
  const taken = [];
  
  const remaining = array.concat();
  for (let i = 0; i < n; i++) {
    const remainingCount = remaining.length;
    const index = Math.floor(Math.random() * remainingCount);
    
    const element = remaining[index];
    taken.push(element);
    
    const lastIndex = remainingCount - 1;
    const lastElement = remaining.pop(lastIndex);
    if (index < lastIndex) {
      remaining[index] = lastElement;
    }
  }
  
  return taken;
}

Generator version

JavaScript


/**
 *Returns a generator that randomly picks elements from an array without duplication.
 *
 * @param {Array}array Extracts elements from this array. This array does not change.
 */
function* takeAtRandom(array) {
  if (array == null) {
    throw new Error("The value of the argument array is null or undefined.");
  }
  
  const remaining = array.concat();
  while(true) {
    const remainingCount = remaining.length;
    if (remainingCount === 0) {
      break;
    }
    
    const index = Math.floor(Math.random() * remainingCount);
    yield remaining[index];
    
    const lastIndex = remainingCount - 1;
    const lastElement = remaining.pop();
    if (index < lastIndex) {
      remaining[index] = lastElement;
    }
  }
}

Kotlin

Kotlin


import kotlin.random.Random

/**
 *Randomly retrieves the specified number of elements from the list without duplication.
 *
 * @receiver Extracts an element from this list.
 * @param n Number of elements to retrieve.
 * @param random The random number generator to use.
 * @param <T>Element type.
 * @return A list with the retrieved elements.
 */
fun <T> Collection<T>.takeAtRandom(n: Int, random: Random = Random): List<T> {
    require(n <= size) { "The value of the argument n$n is the size of the receiver${size}Is larger." }

    val taken = mutableListOf<T>() //A list with randomly selected elements

    val remaining = toMutableList() //List of remaining elements
    //Repeat n times.
    repeat(n) {
        val remainingCount = remaining.size //Number of remaining elements
        val index = random.nextInt(remainingCount) //Randomly selected index

        taken += remaining[index] //Randomly selected elements

        val lastIndex = remainingCount - 1 //Index at the end of the list of remaining elements
        val lastElement = remaining.removeAt(lastIndex) //Remove the end from the list of remaining elements.
        if (index < lastIndex) { //If the randomly selected element is other than the end ...
            remaining[index] = lastElement //Replace it with the last element.
        }
    }

    return taken
}

Sequence version

Kotlin


import kotlin.random.Random

/**
 *Returns a sequence that randomly retrieves the specified number of elements from the list without duplication.
 *
 * @receiver Extracts an element from this list.
 * @param random The random number generator to use.
 * @param <T>Element type.
 * @return sequence.
 */
fun <T> Collection<T>.toRandomSequence(random: Random = Random): Sequence<T> = sequence {
    val remaining = toMutableList() //List of remaining elements
    while (true) {
        val remainingCount = remaining.size //Number of remaining elements
            .takeIf { it > 0 }
            ?: break
        val index = random.nextInt(remainingCount) //Randomly selected index

        remaining[index].also {
            yield(it)
        }

        val lastIndex = remainingCount - 1 //Index at the end of the list of remaining elements
        val lastElement = remaining.removeAt(lastIndex) //Remove the end from the list of remaining elements.
        if (index < lastIndex) { //If the randomly selected element is other than the end ...
            remaining[index] = lastElement //Replace it with the last element.
        }
    }
}

/that's all

Recommended Posts

Efficiently extract multiple elements from the list randomly and without duplication
ArrayList and the role of the interface seen from List
Extract a specific element from the list of objects
[Java] Delete the elements of List
Extract the required information from pom.xml
Choose the first non-empty one from multiple Optional and call that method
[Java] Generate a narrowed list from multiple lists using the Stream API