# python – Moving average or running mean

UPDATE: more efficient solutions have been proposed, `uniform_filter1d` from `scipy` being probably the best among the standard 3rd-party libraries, and some newer or specialized libraries are available too.

You can use `np.convolve` for that:

```python
np.convolve(x, np.ones(N)/N, mode='valid')
```

### Explanation

The running mean is a case of the mathematical operation of convolution. For the running mean, you slide a window along the input and compute the mean of the window's contents. For discrete 1D signals, convolution is the same thing, except instead of the mean you compute an arbitrary linear combination, i.e., multiply each element by a corresponding coefficient and add up the results. Those coefficients, one for each position in the window, are sometimes called the convolution kernel. The arithmetic mean of N values is `(x_1 + x_2 + ... + x_N) / N`, so the corresponding kernel is `(1/N, 1/N, ..., 1/N)`, and that's exactly what we get by using `np.ones(N)/N`.
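As a quick check of that equivalence, here is a minimal sketch that compares an explicit sliding-window loop with the `np.convolve` call. The helper name `running_mean_loop` and the sample data are made up for illustration.

```python
import numpy as np

def running_mean_loop(x, N):
    """Mean of every full window of N consecutive samples (explicit loop)."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i:i + N].mean() for i in range(len(x) - N + 1)])

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
N = 3
kernel = np.ones(N) / N  # the kernel (1/N, 1/N, ..., 1/N)

by_loop = running_mean_loop(x, N)
by_convolve = np.convolve(x, kernel, mode='valid')

print(by_loop)       # window means: 7/3, 13/3, 22/3
print(by_convolve)   # same values, up to rounding
print(np.allclose(by_loop, by_convolve))  # True
```

Strictly speaking, convolution reverses the kernel before sliding it, but a constant kernel looks the same in both directions, so the reversal has no effect here.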

### Edges

The `mode` argument of `np.convolve` specifies how to handle the edges. I chose the `valid` mode here because I think that's how most people expect the running mean to work, but you may have other priorities. Here is a plot that illustrates the difference between the modes:

```python
import numpy as np
import matplotlib.pyplot as plt

modes = ['full', 'same', 'valid']
for m in modes:
    # average a block of 200 ones with a 50-point window in each mode
    plt.plot(np.convolve(np.ones(200), np.ones(50)/50, mode=m))
plt.axis([-10, 251, -.1, 1.1])
plt.legend(modes, loc='lower center')
plt.show()
```

## Efficient solution

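Convolution is much better than the straightforward approach, but `np.convolve` evaluates the sum directly rather than via an FFT, so its cost grows with the window size N and it is still fairly slow for large windows. For the running mean specifically, the following cumulative-sum approach works well: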

```python
import numpy

def running_mean(x, N):
    # cumsum[i] holds the sum of the first i elements (a zero is prepended)
    cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)
```
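The trick relies on the fact that the sum of any window is the difference of two cumulative sums taken N positions apart. A small worked example, with made-up data, may make the indexing easier to follow:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5])
N = 2

# c[i] is the sum of the first i samples, so c[0] = 0
c = np.cumsum(np.insert(x, 0, 0))
print(c)  # cumulative sums: 0, 3, 4, 8, 9, 14

# the sum of x[i:i+N] equals c[i+N] - c[i]
window_sums = c[N:] - c[:-N]
print(window_sums)      # window sums: 4, 5, 5, 6
print(window_sums / N)  # running means: 2.0, 2.5, 2.5, 3.0
```

Only one pass over the data plus one subtraction per output is needed, which is why the cost does not depend on N.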

Code to check the speed-up:

```
In: x = numpy.random.random(100000)
In: N = 1000
In: %timeit result1 = numpy.convolve(x, numpy.ones((N,))/N, mode='valid')
10 loops, best of 3: 41.4 ms per loop
In: %timeit result2 = running_mean(x, N)
1000 loops, best of 3: 1.04 ms per loop
```

Note that `numpy.allclose(result1, result2)` is `True`, so the two methods are equivalent.
The larger N is, the greater the difference in running time.

### Warning: although cumsum is faster, it accumulates more floating-point error, which may make your results invalid, incorrect, or unacceptable

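```python
import numpy as np

# demonstrate loss of precision with only 100,000 points
np.random.seed(42)
x = np.random.randn(100000) + 1e6
N = 10
# running mean via convolution
y1 = np.convolve(x, np.ones(N) / N, mode='valid')
# running mean via cumulative sum
cumsum = np.cumsum(np.insert(x, 0, 0))
y2 = (cumsum[N:] - cumsum[:-N]) / float(N)
# this assertion is expected to fail: the results differ by more than rtol
assert np.allclose(y1, y2, rtol=1e-12, atol=0)
```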
• the more points you accumulate over, the greater the floating-point error (with 1e5 points it is noticeable, with 1e6 points it is more significant, and beyond 1e6 you may want to reset the accumulators)
• you can cheat by using `np.longdouble`, but the floating-point error will still become significant for a relatively large number of points (around >1e5, depending on your data)
• you can plot the error and see it increasing relatively fast
• the convolve solution is slower but does not have this floating-point loss of precision
• the `uniform_filter1d` solution is faster than this cumsum solution AND does not have this floating-point loss of precision
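
To put rough numbers on the first and fourth points, the sketch below measures both methods against a reference computed with `math.fsum`, which sums each window accurately. The variable names and the choice of `math.fsum` as the yardstick are assumptions made for this illustration, and the exact figures will depend on your data.

```python
import math
import numpy as np

np.random.seed(42)
x = np.random.randn(100000) + 1e6
N = 10

# the two approaches discussed above
by_convolve = np.convolve(x, np.ones(N) / N, mode='valid')
c = np.cumsum(np.insert(x, 0, 0))
by_cumsum = (c[N:] - c[:-N]) / N

# reference: sum each window separately with math.fsum (no long accumulation)
xs = x.tolist()
reference = np.array([math.fsum(xs[i:i + N]) / N
                      for i in range(len(xs) - N + 1)])

err_convolve = np.max(np.abs(by_convolve - reference))
err_cumsum = np.max(np.abs(by_cumsum - reference))
print("max abs error, convolve:", err_convolve)
print("max abs error, cumsum:  ", err_cumsum)
```

The cumsum error is larger because the running total grows to about 1e11 here, so each addition is rounded at a much coarser scale than the individual window sums.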

## pandas solution

Update: The example below shows the old `pandas.rolling_mean` function, which has been removed in recent versions of pandas. A modern equivalent of that function call would use `pandas.Series.rolling`:

```
In : pd.Series(x).rolling(window=N).mean().iloc[N-1:].values
Out:
array([ 0.49815397,  0.49844183,  0.49840518, ...,  0.49488191,
        0.49456679,  0.49427121])
```

pandas is more suitable for this than NumPy or SciPy. Its function `rolling_mean` does the job conveniently. It also returns a NumPy array when the input is an array.

It is difficult to beat `rolling_mean` in performance with any custom pure Python implementation. Here is an example performance comparison against two of the proposed solutions:

```
In : import numpy as np

In : import pandas as pd

In : def running_mean(x, N):
...:     cumsum = np.cumsum(np.insert(x, 0, 0))
...:     return (cumsum[N:] - cumsum[:-N]) / N
...:

In : x = np.random.random(100000)

In : N = 1000

In : %timeit np.convolve(x, np.ones((N,))/N, mode='valid')
10 loops, best of 3: 172 ms per loop

In : %timeit running_mean(x, N)
100 loops, best of 3: 6.72 ms per loop

In : %timeit pd.rolling_mean(x, N)[N-1:]
100 loops, best of 3: 4.74 ms per loop

In : np.allclose(pd.rolling_mean(x, N)[N-1:], running_mean(x, N))
Out: True
```

There are also nice options as to how to deal with the edge values.
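
As an illustration of those edge options with the current `Series.rolling` API, here is a minimal sketch using the `min_periods` and `center` parameters. The sample series and variable names are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# default: NaN until a full window of 3 samples is available
default = s.rolling(window=3).mean()
print(default.tolist())   # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1: average whatever part of the window exists at the start
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())   # [1.0, 1.5, 2.0, 3.0, 4.0]

# center=True: label each window by its middle sample instead of its last
centered = s.rolling(window=3, center=True).mean()
print(centered.tolist())  # [nan, 2.0, 3.0, 4.0, nan]
```

Dropping the leading NaN values from the default result, as in the `.iloc[N-1:]` call above, gives the same output as the `valid` mode of `np.convolve`.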