rollingrank
I wrote a Python library called rollingrank that computes ranks within a rolling window, so here is a short introduction. Please try it on Kaggle. If you find a bug, please report it as a GitHub issue.
Repository https://github.com/contribu/rollingrank
In the problem I was working on, using the rank within a rolling window as a feature was the key to achieving good generalization performance. I think the point is that the distribution of the feature does not shift depending on the time.
Although it seems useful (I only know my own case, so I can't say whether it is useful in general), I don't see it used much in the Kaggle community.
The reason may be that it is not easy to compute. This library makes rolling rank easy to use, so why not try it on the problem you are working on now?
pip install rollingrank
import numpy as np
import rollingrank
# Passing a numpy array returns a numpy array of the same length.
x = np.array([0.1, 0.2, 0.3, 0.25, 0.1, 0.2, 0.3])
y = rollingrank.rollingrank(x, window=3)
print(y)
# [nan nan 2. 1. 0. 1. 2.]
# With pct=True, the ranks are returned scaled to [0, 1].
y = rollingrank.rollingrank(x, window=3, pct=True)
print(y)
# [nan nan 1. 0.5 0. 0.5 1. ]
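To make the outputs above concrete, here is a minimal brute-force sketch in plain Python. It is not the library's implementation; the function name `naive_rolling_rank` is hypothetical, and the 0-based rank and the division by `window - 1` for `pct` are assumptions chosen only because they reproduce the outputs printed above.

```python
import math

def naive_rolling_rank(values, window, pct=False):
    """Brute-force 0-based rank of each value within its trailing window.

    Hypothetical helper for illustration; not part of rollingrank.
    """
    out = []
    for i, v in enumerate(values):
        if i < window - 1:
            out.append(math.nan)  # not enough history for a full window
            continue
        win = values[i - window + 1 : i + 1]
        less = sum(1 for u in win if u < v)    # values strictly smaller than v
        equal = sum(1 for u in win if u == v)  # ties, including v itself
        rank = less + (equal - 1) / 2          # 'average' tie handling
        if pct:
            # Assumed scaling: 0 for the smallest, 1 for the largest.
            rank = rank / (window - 1) if window > 1 else 0.0
        out.append(rank)
    return out

x = [0.1, 0.2, 0.3, 0.25, 0.1, 0.2, 0.3]
print(naive_rolling_rank(x, window=3))
# [nan, nan, 2.0, 1.0, 0.0, 1.0, 2.0]
print(naive_rolling_rank(x, window=3, pct=True))
# [nan, nan, 1.0, 0.5, 0.0, 0.5, 1.0]
```

Each output position scans its whole window, so this sketch costs O(n * w) and is only meant for checking results on small inputs.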
Of the methods defined for pandas.Series.rank (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html), you can use average, min, max, and first by passing one of them via the method option.
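The four tie-handling methods only differ when the newest value in the window equals an older one. The sketch below shows what each would give for the last element of a single window, following the pandas definitions. The helper `rank_of_last` is hypothetical, and the 0-based numbering is an assumption made to stay consistent with the outputs shown earlier; I have not verified that the library agrees on every tie.

```python
def rank_of_last(win, method="average"):
    """0-based rank of the last element of win under a tie-handling method.

    Hypothetical helper for illustration; not part of rollingrank.
    """
    v = win[-1]
    less = sum(1 for u in win if u < v)    # values strictly smaller than v
    equal = sum(1 for u in win if u == v)  # ties, including v itself
    if method == "min":
        return float(less)                 # lowest rank in the tie group
    if method in ("max", "first"):
        # 'first' breaks ties by order of appearance, so the newest value
        # is placed last within its tie group, which coincides with 'max'.
        return float(less + equal - 1)
    if method == "average":
        return less + (equal - 1) / 2      # midpoint of the tie group
    raise ValueError(f"unknown method: {method}")

for m in ("average", "min", "max", "first"):
    print(m, rank_of_last([0.2, 0.1, 0.2], m))
# average 1.5
# min 1.0
# max 2.0
# first 2.0
```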
I wanted to make it O(n * log(w)), where n is the length of the input and w is the window size, but when I looked closely after implementing it, it turned out to be O(n * w). With a well-designed balanced tree, it should be possible to reach O(n * log(w)). If someone can do this, please do.
Modifying something like the following looks like a good starting point: https://github.com/mpaland/avl_array
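As a rough illustration of where the O(n * w) comes from, here is a sketch that keeps the current window in a sorted Python list. This is not the library's C++ code, and `sorted_window_rolling_rank` is a hypothetical name. Finding the rank by binary search is O(log w), but inserting into and deleting from a list shifts elements, which is O(w). Replacing the list with a balanced tree that tracks subtree sizes would make every step O(log w).

```python
import bisect
import math

def sorted_window_rolling_rank(values, window):
    """0-based 'average' rolling rank using a sorted list as the window.

    Hypothetical sketch: search is O(log w), but list insert/delete is O(w).
    """
    sorted_win = []
    out = []
    for i, v in enumerate(values):
        bisect.insort(sorted_win, v)  # O(log w) search + O(w) shift
        if i >= window:
            old = values[i - window]  # value that just left the window
            del sorted_win[bisect.bisect_left(sorted_win, old)]  # O(w) shift
        if i < window - 1:
            out.append(math.nan)      # window not full yet
            continue
        lo = bisect.bisect_left(sorted_win, v)   # count of values < v
        hi = bisect.bisect_right(sorted_win, v)  # lo + count of values == v
        out.append(lo + (hi - lo - 1) / 2)
    return out

x = [0.1, 0.2, 0.3, 0.25, 0.1, 0.2, 0.3]
print(sorted_window_rolling_rank(x, window=3))
# [nan, nan, 2.0, 1.0, 0.0, 1.0, 2.0]
```

In practice the list shifts are fast memory moves, so this is often quick for moderate window sizes even though the worst case is unchanged.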
How to make a pip library: https://blog.amedama.jp/entry/packaging-python
rolling rank: https://github.com/pandas-dev/pandas/issues/9481
pybind11 was convenient for interfacing with C++.