Iterator that can scan forward from the middle of the sequence

Introduction

The iterator I wrote earlier that can scan forward from the middle of the sequence

sslice.py


def sslice(sequence, *slice_args):
    return (sequence[index]
            for index in range(*slice(*slice_args).indices(len(sequence))))

It is quite convenient and useful for its short length.

If you are always

I want an iterator that can be used like ʻitertools.islice ()` for sequences, is fast even if it scans from the middle, and can scan forward as well. !! !!

If you are anxious, feel free to use the above code now.

I have no idea why such a function is prepared ...

If you think, please see the continuation.

How to use sslice ()

First of all, as a concrete situation

I want to extract from the list L a list containing the nth element and the maximum m elements before and after it.

Let's assume the situation.

For example, if you have a list of some rankings, your ranking is 12,345, and you want to display up to 5th above and below each.

I wanted to make it a little more common, but I'm sorry I couldn't find a good one.

Note that we want to keep the sample code simple, so let's say that the first element in the list L is the 0th element instead of the 1st.

In this case, the 12,345th place is the 12,344th element.

In addition, it should be easy to handle ranks below you, so here we will only consider processing ranks above you.

Therefore, this process is

From the list L, retrieve the list containing the largest m element before the nth element

It became a requirement. The method I thought of at this stage was to use slicing,

>>> L = list(range(1, 50000+1))
>>> n = 12345 - 1
>>> m = 5
>>> L[max(0, n - m): n]
[12340, 12341, 12342, 12343, 12344]

It was a way to get a slice of the list with code like.

Of course, there is no particular problem with this method.

But what if the requirement for this process is not the "maximum m element ahead "but the" maximum m element ahead" that meets certain criteria?

The judgment condition this time is simply

>>> def is_odd(number):
...     return (number % 2 == 1)
...
>>> is_odd(1)
True
>>> is_odd(2)
False

I will say. The point is odd.

In this case, we have to determine whether each element meets the conditions, so at least slicing alone cannot solve it, and we need to scan forward from the nth element of the list.

Just in case, once I summarize the story so far,

>>> L = list(range(1, 50000+1))
>>> n = 12345 - 1
>>> m = 5
>>> L[max(0, n - m): n]
>>> def is_odd(number):
...     return (number % 2 == 1)
...
>>> 

Under the condition

[12343, 12341, 12339, 12337, 12335]

The purpose is to write the process to get the result.

In the end, you have to flip the list of results to [12335, 12337, 12339, 12341, 12343] in order to display them in ranking order, but skip that process.

Slices eat up too much memory

The first thing we considered was how to combine slices and list comprehensions.

>>> [rank for rank in L[n - 1::-1] if is_odd(rank)][:m]
[12343, 12341, 12339, 12337, 12335]

The correct result was displayed.

However, this method does not make a fool of the memory consumption of the temporarily generated list when the number of elements in the list L is large.

Therefore, I will reject this method.

ʻIslice ()` cannot scan forward

The next thing I considered was to use the ʻislice () module of the ʻitertools module that everyone loves to handle with an iterator.

>>> from itertools import islice
>>> islice(L, n - 1, None, -1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Step for islice() must be a positive integer or None.
>>>

For some reason, I got an error.

I was a little wondering why this would be an error, but I finally figured out why by reading the help for ʻislice ()`.

>>> help(itertools.islice)
Help on class islice in module itertools:

class islice(builtins.object)
 |  islice(iterable, stop) --> islice object
 |  islice(iterable, start, stop[, step]) --> islice object
 |
 |  Return an iterator whose next() method returns selected values from an
 |  iterable.  If start is specified, will skip all preceding elements;
 |  otherwise, start defaults to zero.  Step defaults to one.  If
 |  specified as another value, step determines how many values are
 |  skipped between successive calls.  Works like a slice() on a list
 |  but returns an iterator.
...

In short, ʻislice ()was supposed to have an iterable, oriter` method, or a sequence object as the first argument.

Python's list objects are sequences, but iterable objects aren't always sequences, rather they are often not sequences, so you can't expect to be able to scan in the opposite direction from the middle.

So, the step of ʻislice ()` was a specification that could not be a negative number.

I'm not giving up, so

Shouldn't it be possible to specify a negative number for step only when the first argument of ʻislice ()` is a sequence?

It's not surprising, but apparently I didn't make it that way because it's not as beautiful as Python to change the behavior depending on the type of the argument.

I'll leave it to the Pythonistas to worry about "why our Python is so correct and too beautiful", and I'll return to the story of achieving this goal.

Create ʻislice (), sslice ()` for the sequence

I thought about it in various ways, but in the end I simply wrote ʻislice ()andsslice ()` for the sequence.

def sslice(sequence, *slice_args):
    return (sequence[index]
            for index in range(*slice(*slice_args).indices(len(sequence))))

The code below was written with care not to expand slices using sslice ().

>>> list(islice((rank for rank in sslice(L, n - 1, None, -1) if is_odd(rank)), 0, m))
[12343, 12341, 12339, 12337, 12335]

With this, I finally achieved this goal.

By the way, ʻislice () seems to mean the iterator (ʻi) of a slice of an iterable object (slice), so if you name the iterator of a sequence slice tosslice (), it's a devout Pythonista. If you want slices of iterator, ʻislice (), and if you want slices of sequence, sslice ()` is very easy to remember, so please forgive me.

Case where sslice () is faster than ʻislice ()`

sslice () accesses the sequence via subscripts, so it's usually faster to use the variously optimized ʻislice () if you can get the same result with ʻislice (). However, it may be faster to use sslice () if the argument start has a large value.

The reason is that ʻislice ()has to process sequentially from the beginning, whereassslice ()` can be accessed as a subscript.

Earlier, I wrote that "it should be easy to handle ranks below you", but it is better to use sslice () for that as well.

It seems that there have been several requests for improvement regarding this limitation or specification of ʻislice ()`, but apparently mentioned above.

It is not beautiful in Python to change the behavior depending on the argument type

It seems that it has been rejected because of that.

If you're a well-trained Python user, you'll be weeping and weeping as soon as you hear the explanation.

As expected Pythonista! It's easy to do things that prioritize correctness over ease of use. I yearn for it!

I may have to take off my clothes and cover my head with the ice water in the bucket, but I'm a little weak and I'm not good at that, but please let me pass.

Summary

I wrote an iterator that can scan forward from the middle of the sequence.

sslice.py


def sslice(sequence, *slice_args):
    return (sequence[index]
            for index in range(*slice(*slice_args).indices(len(sequence))))

It's quite convenient and useful for its short length, so if you want an iterator that can be used like ʻitertools.islice ()` for sequences, is fast even if you scan from the middle, and can scan forward, please use it. Please give me.

It's so short that I thought maybe someone had written a similar function in the past, but I just found the same code except for the variable name so I won't claim the license.

Please feel free to use it.

(Bonus) How to use slice.indices ()

By the way, sslice () uses slice.indices () because

>>> sslice(range(20), None, None, -1)

This is to correspond to arguments such as.

There are many parts of the Python help documentation that are so confusing that you might wonder, "Is this written on purpose?", And the help for slice.indices () is one of them. think.

>>> help(slice.indices)
Help on method_descriptor:

indices(...)
    S.indices(len) -> (start, stop, stride)

    Assuming a sequence of length len, calculate the start and stop
    indices, and the stride length of the extended slice described by
    S. Out of bounds indices are clipped in a manner consistent with the
    handling of normal slices.
(END)

I'm not the only one who didn't understand at first glance what the purpose of this method was after reading the above help, right? (Sweat

Such slice.indices () is recommended because it will be decided in one shot if you use it when you want to generate the argument of range () from any slice object.

Perhaps there are no other situations where this method is useful. (Lol)

Recommended Posts

Iterator that can scan forward from the middle of the sequence
Obtain the sequence information of the translated protein from the mutation information of CDS
From a book that programmers can learn (Python): Find the mode
From a book that programmers can learn ... (Python): Review of arrays
Existence from the viewpoint of Python
I made a simple timer that can be started from the terminal
Find out the name of the method that called it from the method that is python
From a book that makes the programmer's way of thinking interesting (Python)
This and that of the inclusion notation.
Algorithm Gymnastics 24 Middle of the Linked List
Learning notes from the beginning of Python 1
Omit BOM from the beginning of the string
Get the value of the middle layer of NN
Learning notes from the beginning of Python 2
[Python] The movement of the decorator that can be understood this time ② The decorator that receives the argument
Tensorflow, it seems that even the eigenvalues of the matrix can be automatically differentiated
From a book that programmers can learn: Verification of rune checksum (fixed length)