# How to calculate rolling / moving average using python + NumPy / SciPy?

If you just want a straightforward non-weighted moving average, you can easily implement it with `np.cumsum`, which ~~may be~~ **is** faster than FFT-based methods:

**EDIT** Corrected an off-by-one indexing error in the code, spotted by Bean. **EDIT**

```
import numpy as np

def moving_average(a, n=3):
    # ret[i] holds the running total a[0] + ... + a[i]
    ret = np.cumsum(a, dtype=float)
    # subtracting the total from n steps back leaves each window's sum
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

>>> a = np.arange(20)
>>> moving_average(a)
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.,
       12., 13., 14., 15., 16., 17., 18.])
>>> moving_average(a, n=4)
array([ 1.5,  2.5,  3.5,  4.5,  5.5,  6.5,  7.5,  8.5,  9.5,
       10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5])
```
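The `cumsum` trick works because the sum of any window is the difference of two running totals. Here is a minimal sketch that checks the result against a brute-force reference; the helper name `moving_average_naive` is my own, introduced only for the comparison:

```python
import numpy as np

def moving_average(a, n=3):
    # Same cumsum-based function as in the answer
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

def moving_average_naive(a, n=3):
    # Hypothetical reference implementation: average every window explicitly
    a = np.asarray(a, dtype=float)
    return np.array([a[i:i + n].mean() for i in range(len(a) - n + 1)])

a = np.arange(20)
# True when both approaches give the same values
agree = np.allclose(moving_average(a, 4), moving_average_naive(a, 4))
```

One caveat worth checking on your own data: every output depends on a long running total, so rounding error can build up for very long or large-magnitude inputs.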

So I guess the answer is: it is really easy to implement, and maybe numpy is already a little bloated with specialized functionality.

A simple way to achieve this is by using `np.convolve`.

The idea behind this is to leverage the way the discrete convolution is computed and use it to return a *rolling mean*. This can be done by convolving with a sequence of `np.ones` of a length equal to the sliding window length we want.

In order to do so we could define the following function:

```
import numpy as np

def moving_average(x, w):
    # 'valid' keeps only positions where the window fully overlaps x
    return np.convolve(x, np.ones(w), 'valid') / w
```

This function will be taking the convolution of the sequence `x` and a sequence of ones of length `w`. Note that the chosen `mode` is `valid` so that the convolution product is only given for points where the sequences overlap completely.
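To make the effect of `mode` concrete, here is a small sketch comparing the output length of the three modes `np.convolve` accepts. The array and window length are arbitrary illustrative values:

```python
import numpy as np

x = np.array([5, 3, 8, 10, 2, 1, 5, 1, 0, 2])
w = 4
kernel = np.ones(w)

# Output length for each mode, with len(x) = 10 and w = 4:
#   'full'  -> len(x) + w - 1 = 13 (includes partial overlaps at both ends)
#   'same'  -> len(x)         = 10 (centered, still partial at the edges)
#   'valid' -> len(x) - w + 1 = 7  (complete overlaps only)
lengths = {
    mode: len(np.convolve(x, kernel, mode=mode))
    for mode in ("full", "same", "valid")
}
```

With `'full'` or `'same'`, the edge values come from windows that only partly cover `x`, so dividing them by `w` would bias them toward zero. That is why `'valid'` is the natural choice for a plain moving average.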

Some examples:

```
x = np.array([5,3,8,10,2,1,5,1,0,2])
```

For a moving average with a window of length `2` we would have:

```
moving_average(x, 2)
# array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])
```

And for a window of length `4`:

```
moving_average(x, 4)
# array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2. ])
```

## How does `convolve` work?

Let's have a more in-depth look at the way the discrete convolution is being computed.

The following function aims to replicate the way `np.convolve` is computing the output values:

```
def mov_avg(x, w):
    for m in range(len(x)-(w-1)):
        yield sum(np.ones(w) * x[m:m+w]) / w
```

Which, for the same example as above, would also yield:

```
list(mov_avg(x, 2))
# [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]
```

So what is being done at each step is to take the inner product between the array of ones and the current *window*. In this case the multiplication by `np.ones(w)` is superfluous given that we are directly taking the `sum` of the sequence.

Below is an example of how the first outputs are computed so that it is a little clearer. Let's suppose we want a window of `w=4`:

```
[1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5
```

And the following output would be computed as:

```
  [1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75
```

And so on, returning a moving average of the sequence once all overlaps have been performed.
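The same window-by-window inner product can be written without an explicit loop. This is a sketch rather than what `np.convolve` does internally, and it assumes NumPy 1.20 or newer, where `sliding_window_view` is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([5, 3, 8, 10, 2, 1, 5, 1, 0, 2])
w = 4

# Each row is one window of x: [5, 3, 8, 10], [3, 8, 10, 2], ...
windows = sliding_window_view(x, w)

# Inner product of every window with the array of ones, divided by w
result = windows @ np.ones(w) / w
```

Here `result` matches `np.convolve(x, np.ones(w), 'valid') / w`. Replacing `np.ones(w)` with other weights gives a weighted moving average, with one difference to keep in mind: `np.convolve` reverses its second argument, which goes unnoticed for a kernel of ones.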

NumPy's lack of a particular domain-specific function is perhaps due to the Core Team's discipline and fidelity to NumPy's prime directive: *provide an N-dimensional array type*, as well as functions for creating and indexing those arrays. Like many foundational objectives, this one is not small, and NumPy does it brilliantly.

The (much) larger **SciPy** contains a much larger collection of domain-specific libraries (called *subpackages* by SciPy devs), for instance, numerical optimization (*optimize*), signal processing (*signal*), and integral calculus (*integrate*).

My guess is that the function you are after is in at least one of the SciPy subpackages (*scipy.signal* perhaps); however, I would look first in the collection of **SciPy scikits**, identify the relevant scikit(s) and look for the function of interest there.

Scikits are independently developed packages based on NumPy/SciPy and directed to a particular technical discipline (e.g., *scikits-image*, *scikits-learn*, etc.). Several of these (in particular, the awesome OpenOpt for numerical optimization) were highly regarded, mature projects long before choosing to reside under the relatively new *scikits* rubric. The *Scikits* homepage linked to above lists about 30 such *scikits*, though at least several of those are no longer under active development.

Following this advice would lead you to *scikits-timeseries*; however, that package is no longer under active development. In effect, **Pandas** has become, AFAIK, the *de facto* *NumPy*-based time series library.

*Pandas* has several functions that can be used to calculate a *moving average*; the simplest of these is probably **rolling_mean** (spelled `Series.rolling(window).mean()` in current pandas releases), which you use like so:

```
>>> # the recommended syntax to import pandas
>>> import pandas as PD
>>> import numpy as NP
>>> # prepare some fake data:
>>> # the date-time indices:
>>> t = PD.date_range('1/1/2010', '12/31/2012', freq='D')
>>> # the data:
>>> x = NP.arange(0, t.shape[0])
>>> # combine the data & index into a Pandas Series object
>>> D = PD.Series(x, t)
```

Now, just call the function **rolling_mean** passing in the Series object and a *window size*, which in my example below is *10 days*.

```
>>> # older pandas API; current releases spell this D.rolling(10).mean()
>>> d_mva = PD.rolling_mean(D, 10)
>>> # d_mva is the same size as the original Series
>>> d_mva.shape
(1096,)
>>> # though obviously the first w - 1 values are NaN, where w is the window size
>>> d_mva[:3]
2010-01-01 NaN
2010-01-02 NaN
2010-01-03 NaN
```

Verify that it worked, e.g., compare values 10-15 in the original Series versus the new Series smoothed with the rolling mean:

```
>>> D[10:15]
2010-01-11    10
2010-01-12    11
2010-01-13    12
2010-01-14    13
2010-01-15    14
Freq: D
>>> d_mva[10:15]
2010-01-11    5.5
2010-01-12    6.5
2010-01-13    7.5
2010-01-14    8.5
2010-01-15    9.5
Freq: D
```

The function rolling_mean, along with about a dozen or so other functions, is informally grouped in the Pandas documentation under the rubric *moving window* functions; a second, related group of functions in Pandas is referred to as exponentially-weighted functions (e.g., *ewma*, which calculates an exponentially weighted moving average). The fact that this second group is not included in the first (*moving window* functions) is perhaps because the exponentially-weighted transforms don't rely on a fixed-length window.
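For readers on a recent pandas release, both groups are reached through methods on the Series itself. A minimal sketch, assuming a current pandas version; the variable names and `span=10` are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Same kind of fake daily data as in the example above
t = pd.date_range("2010-01-01", "2012-12-31", freq="D")
s = pd.Series(np.arange(len(t)), index=t)

# Fixed-length window: mean of the current value and the 9 before it;
# the first 9 entries are NaN because the window is not yet full
rolling = s.rolling(window=10).mean()

# Exponentially weighted: no fixed window, older values decay smoothly,
# so every entry has a value
ewm = s.ewm(span=10).mean()
```

The difference in the leading values (NaN for `rolling`, defined for `ewm`) reflects the distinction drawn above: only the first group needs a full fixed-length window before it can produce a result.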