memo.
import numpy as np
import pandas as pd
np.random.seed(99)
df = pd.DataFrame(index=pd.date_range('2020-01-01', periods=60, freq='D'),
data=np.array(list('Sunny cloudy rain snow'))[np.random.randint(0, 4, 60)],
columns=['weather'], dtype='string')
df
weather
2020-01-01 Cloudy
2020-01-02 Snow
2020-01-03 Cloudy
2020-01-04 fine
2020-01-05 Cloudy
2020-01-06 fine
2020-01-07 Rain
2020-01-08 fine
2020-01-09 Cloudy
2020-01-10 fine
2020-01-11 Cloudy
2020-01-12 snow
2020-01-13 rain
2020-01-14 snow
2020-01-15 snow
2020-01-16 cloudy
2020-01-17 Cloudy
2020-01-18 fine
2020-01-19 cloudy
2020-01-20 snow
2020-01-21 rain
2020-01-22 snow
2020-01-23 fine
2020-01-24 snow
2020-01-25 rain
2020-01-26 fine
2020-01-27 fine
2020-01-28 fine
2020-01-29 fine
2020-01-30 fine
2020-01-31 cloudy
2020-02-01 Snow
2020-02-02 Snow
2020-02-03 Snow
2020-02-04 Rain
2020-02-05 Snow
2020-02-06 Snow
2020-02-07 Rain
2020-02-08 Cloudy
2020-02-09 rain
2020-02-10 fine
2020-02-11 fine
2020-02-12 cloudy
2020-02-13 rain
2020-02-14 cloudy
2020-02-15 cloudy
2020-02-16 fine
2020-02-17 Cloudy
2020-02-18 snow
2020-02-19 cloudy
2020-02-20 snow
2020-02-21 snow
2020-02-22 rain
2020-02-23 rain
2020-02-24 rain
2020-02-25 snow
2020-02-26 cloudy
2020-02-27 fine
2020-02-28 fine
2020-02-29 fine
It is a record when you want to extract the period showing a specific weather pattern when you have the above data.
Judgment on a daily basis is easy.
df[df['weather'] == 'Fine']
weather
2020-01-04 fine
2020-01-06 fine
2020-01-08 fine
2020-01-10 fine
2020-01-18 fine
2020-01-23 fine
2020-01-26 fine
2020-01-27 fine
2020-01-28 fine
2020-01-29 fine
2020-01-30 fine
2020-02-10 fine
2020-02-11 fine
2020-02-16 fine
2020-02-27 fine
2020-02-28 fine
2020-02-29 fine
I just want to finish it.
df[(df['weather'] == 'Fine') & (df['weather'].shift(-1, fill_value='') == 'Fine')]
weather
2020-01-26 fine
2020-01-27 fine
2020-01-28 fine
2020-01-29 fine
2020-02-10 fine
2020-02-27 fine
2020-02-28 fine
This has extracted the day of "the day offine
and the next day of fine
", and failed to extract the "period offine
". The last day of the period, specifically January 30, February 11, and February 29, is missed in this example. The day after the day when the result of the conditional expression is True
must also be True
.
So the following is the correct answer.
cond = (df['weather'] == 'Fine') & (df['weather'].shift(-1, fill_value='') == 'Fine')
cond |= cond.shift(fill_value=False)
out_df = df[cond]
out_df
weather
2020-01-26 fine
2020-01-27 fine
2020-01-28 fine
2020-01-29 fine
2020-01-30 fine
2020-02-10 fine
2020-02-11 fine
2020-02-27 fine
2020-02-28 fine
2020-02-29 fine
cond |= cond.shift(fill_value=False)
Iscond = cond | cond.shift(fill_value=False)
is what it means.
By the way, in order to assign a group number for each period, you can see if it will be fine the next day, so use diff ()
to
ngroup = cond.diff()[cond].cumsum()
out_df.assign(ngroup=ngroup)
Weather ngroup
2020-01-26 fine 1
2020-01-27 fine 1
2020-01-28 fine 1
2020-01-29 fine 1
2020-01-30 fine 1
2020-02-10 fine 2
2020-02-11 fine 2
2020-02-27 fine 3
2020-02-28 fine 3
2020-02-29 fine 3
It is an application of the previous section.
cond = ((df['weather'] == 'Fine') & (df['weather'].shift(-1, fill_value='') == 'Fine')
& (df['weather'].shift(-2, fill_value='') == 'Fine'))
cond |= cond.shift(1, fill_value=False) | cond.shift(2, fill_value=False)
out_df = df[cond]
out_df
weather
2020-01-26 fine
2020-01-27 fine
2020-01-28 fine
2020-01-29 fine
2020-01-30 fine
2020-02-27 fine
2020-02-28 fine
2020-02-29 fine
How to write as follows
cond = ((df['weather'] == 'Fine')
& (df['weather'].shift(-1, fill_value='') == 'Fine')
& (df['weather'].shift(-2, fill_value='') == 'Fine')
& (df['weather'].shift(-3, fill_value='') == 'Fine')
& (df['weather'].shift(-4, fill_value='') == 'Fine'))
cond |= (cond.shift(1, fill_value=False) | cond.shift(2, fill_value=False)
| cond.shift(3, fill_value=False) | cond.shift(4, fill_value=False))
out_df = df[cond]
out_df
Is painful to go this far, so use scipy.ndimage.binary_closing ()
to
from scipy.ndimage import binary_closing
days = 5
cond = binary_closing((df['weather'] != 'Fine').to_numpy(dtype=bool),
np.ones(days, dtype=bool), border_value=True)
df[~cond]
weather
2020-01-26 fine
2020-01-27 fine
2020-01-28 fine
2020-01-29 fine
2020-01-30 fine
If you want to get rain [rainy] sunny sunny sunny rain
when there is rain fine sunny sunny sunny rain
.
Applying the diff ()
used when assigning the group number,
cond = (df['weather'] == 'Fine') & (df['weather'].shift(-1, fill_value='') == 'Fine')
cond &= cond.diff()
cond |= cond.shift(fill_value=False)
out_df = df[cond]
out_df
weather
2020-01-26 fine
2020-01-27 fine
2020-02-10 fine
2020-02-11 fine
2020-02-27 fine
2020-02-28 fine
If you want to get rain, sunny, sunny, sunny, rain
when you have rain, fine, sunny, sunny, rain
.
It is an application of the previous section.
cond = (df['weather'] == 'Fine') & (df['weather'].shift(-1, fill_value='') == 'Fine')
cond = ~cond & cond.diff()
cond |= cond.shift(-1, fill_value=False)
out_df = df[cond]
out_df
weather
2020-01-29 fine
2020-01-30 fine
2020-02-10 fine
2020-02-11 fine
2020-02-28 fine
2020-02-29 fine
It's easy to think of it as an application of the previous section, and add "not sunny the day before" and "not sunny the day after tomorrow" as the conditions. .. ..
cond = ((df['weather'] == 'Fine') & (df['weather'].shift(-1, fill_value='') == 'Fine')
& (df['weather'].shift(1, fill_value='') != 'Fine')
& (df['weather'].shift(-2, fill_value='') != 'Fine'))
cond |= cond.shift(fill_value=False)
out_df = df[cond]
out_df
weather
2020-02-10 fine
2020-02-11 fine
For the time being, you can also do this.
is_sunny = (df['weather'] == 'Fine').to_numpy(dtype=int)
# array([0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
# 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
# 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
base = 2**np.arange(4)
#Weather base
# 2020-02-09 rain* 1
# 2020-02-10 fine* 2
# 2020-02-11 fine* 4
# 2020-02-12 cloudy* 8
# 0*1 + 1*2 + 1*4 + 0*8 = 6
target_number = 6
cond = np.convolve(is_sunny, base) == target_number
out_df = df[cond[2:-1] | cond[3:]]
First, extract only the sunny and rainy days, and get the sunny day and the next rainy day. Then use cumsum ()
to separate the groups and extract.
is_sun_rain = df['weather'].isin(set('Sunny rain'))
subdf = df[is_sun_rain]
subcond = ((subdf['weather'] == 'Fine')
& (subdf['weather'].shift(-1, fill_value='') == 'rain'))
subcond |= subcond.shift(fill_value=False)
cond = ((is_sun_rain & subcond).cumsum() % 2).astype('boolean')
cond |= cond.shift(fill_value=False)
out_df = df[cond]
ngroup = cond.diff()[cond].cumsum()
out_df.assign(ngroup=ngroup)
Weather ngroup
2020-01-06 fine 1
2020-01-07 rain 1
2020-01-10 fine 2
2020-01-11 cloudy 2
2020-01-12 snow 2
2020-01-13 rain 2
2020-01-18 fine 3
2020-01-19 cloudy 3
2020-01-20 snow 3
2020-01-21 rain 3
2020-01-23 fine 4
2020-01-24 snow 4
2020-01-25 rain 4
2020-01-30 fine 5
2020-01-31 cloudy 5
2020-02-01 snow 5
2020-02-02 Snow 5
2020-02-03 snow 5
2020-02-04 rain 5
2020-02-11 fine 6
2020-02-12 cloudy 6
2020-02-13 rain 6
2020-02-16 fine 7
2020-02-17 cloudy 7
2020-02-18 snow 7
2020-02-19 cloudy 7
2020-02-20 snow 7
2020-02-21 snow 7
2020-02-22 rain 7
It is an application of the previous section.
is_sun_rain = df['weather'].isin(set('Sunny rain'))
subdf = df[is_sun_rain]
subcond = ((subdf['weather'] == 'Fine')
& (subdf['weather'].shift(-1, fill_value='') == 'Cloudy')
& (subdf['weather'].shift(-2, fill_value='') == 'rain'))
subcond |= subcond.shift(2, fill_value=False)
cond = ((is_sun_rain & subcond).cumsum() % 2).astype('boolean')
cond |= cond.shift(fill_value=False)
out_df = df[cond]
ngroup = cond.diff()[cond].cumsum()
out_df.assign(ngroup=ngroup)
Weather ngroup
2020-01-10 fine 1
2020-01-11 cloudy 1
2020-01-12 snow 1
2020-01-13 rain 1
2020-01-18 fine 2
2020-01-19 cloudy 2
2020-01-20 snow 2
2020-01-21 rain 2
2020-01-30 fine 3
2020-01-31 cloudy 3
2020-02-01 snow 3
2020-02-02 Snow 3
2020-02-03 snow 3
2020-02-04 rain 3
2020-02-11 fine 4
2020-02-12 cloudy 4
2020-02-13 rain 4
Or
arr, _ = df['weather'].factorize()
# 0: 'Cloudy', 1: 'snow', 2: 'Fine', 3: 'rain'
subarr, = (arr != 1).nonzero()
subcond = np.convolve(arr[subarr], [1, 10, 100]) == 203
flags = np.zeros(arr.size, dtype=int)
flags[subarr[subcond[2:]]] = 1
flags[subarr[subcond[:-2]]+1] = -1
cond = flags.cumsum().astype(bool)
out_df = df[cond]
ngroup = flags[cond].cumsum()
out_df.assign(ngroup=ngroup)
Recommended Posts