pandas – Python – rolling functions for GroupBy object
For the Googlers who come upon this old question:
Regarding @kekert's comment on @Garrett's answer to use the new
df.groupby('id')['x'].rolling(2).mean()
rather than the now-deprecated
df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
note that the new .rolling().mean() approach returns a multi-indexed series, indexed first by the group-by column and then by the original index. The old approach instead returned a series indexed only by the original df index, which perhaps makes less sense but made it very convenient to add that series as a new column in the original dataframe.
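So I think I've figured out a solution that uses the new rolling() method and still returns a series on the original index:
df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
which should give you the series below (min_periods=1 keeps the first row of each group from being NaN):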
0 0.0
1 0.5
2 1.5
3 3.0
4 3.5
5 4.5
which you can add as a column:
df['x'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
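As a minimal sketch of the steps above, assuming the six-row example frame used elsewhere on this page and min_periods=1 so the values match the series shown. The column name x_rolling_mean is only an illustration; it is used here so that the original x column is not overwritten.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

# Per-group rolling mean; the result is indexed by (id, original row label).
rolled = df.groupby("id")["x"].rolling(2, min_periods=1).mean()

# Drop the outer "id" level so only the original row labels remain.
flat = rolled.reset_index(level=0, drop=True)

# Assignment aligns on the index, so each value lands on its own row.
df["x_rolling_mean"] = flat  # hypothetical column name
print(df)
```

This relies on the original index labels being unique, since the assignment matches rows by label rather than by position.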
cumulative sum
To answer the question directly, the cumsum method produces the desired series:
In [17]: df
Out[17]:
  id  x
0  a  0
1  a  1
2  a  2
3  b  3
4  b  4
5  b  5
In [18]: df.groupby('id').x.cumsum()
Out[18]:
0 0
1 1
2 3
3 3
4 7
5 12
Name: x, dtype: int64
pandas rolling functions per group
More generally, any rolling function can be applied to each group as follows (using the new .rolling method, as commented by @kekert). Note that the return type is a multi-indexed series, which differs from the previous (deprecated) pd.rolling_* functions.
In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
Out[10]:
id
a   0    0.00
    1    1.00
    2    3.00
b   3    3.00
    4    7.00
    5    9.00
Name: x, dtype: float64
To apply the per-group rolling function and receive the result in the original dataframe order, use transform instead:
In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
Out[16]:
0 0
1 1
2 3
3 3
4 7
5 9
Name: x, dtype: int64
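The difference in ordering is easier to see when the groups are interleaved. The sketch below assumes a made-up frame in which rows of a and b alternate (this data is not from the original question) and compares transform against the multi-indexed rolling result.

```python
import pandas as pd

# Hypothetical data: rows of the two groups alternate.
df = pd.DataFrame({"id": ["a", "b", "a", "b", "a", "b"], "x": range(6)})

# transform returns one value per input row, on the original index.
via_transform = df.groupby("id")["x"].transform(
    lambda s: s.rolling(2, min_periods=1).sum()
)

# groupby().rolling() returns a (id, row label) MultiIndex; drop the
# group level and sort by row label to compare it with the transform result.
via_rolling = (
    df.groupby("id")["x"]
    .rolling(2, min_periods=1)
    .sum()
    .reset_index(level=0, drop=True)
    .sort_index()
)

print(via_transform)
print(via_rolling)
```

Both series should hold the same values; transform simply saves the step of flattening the index.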
deprecated approach
For reference, here's how the now-deprecated (and since removed) pandas.rolling_mean behaved:
In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]:
0 0.0
1 0.5
2 1.5
3 3.0
4 3.5
5 4.5
Here is another way that generalizes well and uses pandas' expanding method.
The same transform pattern also works for rolling calculations with fixed windows, such as for time series: replace expanding() with rolling(...).
# Import pandas library
import pandas as pd
# Prepare columns
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
# Create dataframe from columns above
df = pd.DataFrame({'id': id, 'x': x})
# Calculate rolling sum with infinite window size (i.e. all rows in group) using expanding
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda s: s.expanding().sum())
# Output as desired by original poster
print(df)
  id  x  rolling_sum
0  a  0            0
1  a  1            1
2  a  2            3
3  b  3            3
4  b  4            7
5  b  5           12
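For the fixed-window time series case mentioned above, a rough sketch might look like the following. It assumes a daily DatetimeIndex and a two-day window; the dates and the column name x_2d_sum are made up for illustration.

```python
import pandas as pd

# Hypothetical daily data: one row per day, three days per group.
dates = pd.date_range("2021-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {"id": ["a", "a", "a", "b", "b", "b"], "x": list(range(6))},
    index=dates,
)

# A "2D" offset window covers the current day and the previous day.
# The dates must be sorted within each group for an offset window to work.
df["x_2d_sum"] = df.groupby("id")["x"].transform(
    lambda s: s.rolling("2D").sum()
)
print(df)
```

An integer window such as rolling(2, min_periods=1) can be swapped in the same way when the rows are evenly spaced and a row count is all that is needed.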