python – Most efficient way to map function over numpy array

python – Most efficient way to map function over numpy array

Ive tested all suggested methods plus np.array(map(f, x)) with perfplot (a small project of mine).

Message #1: If you can use numpys native functions, do that.

If the function youre trying to vectorize already is vectorized (like the x**2 example in the original post), using that is much faster than anything else (note the log scale):

enter

If you actually need vectorization, it doesnt really matter much which variant you use.

enter


Code to reproduce the plots:

import numpy as np
import perfplot
import math


def f(x):
    # return math.sqrt(x)
    return np.sqrt(x)


vf = np.vectorize(f)


def array_for(x):
    return np.array([f(xi) for xi in x])


def array_map(x):
    return np.array(list(map(f, x)))


def fromiter(x):
    return np.fromiter((f(xi) for xi in x), x.dtype)


def vectorize(x):
    return np.vectorize(f)(x)


def vectorize_without_init(x):
    return vf(x)


b = perfplot.bench(
    setup=np.random.rand,
    n_range=[2 ** k for k in range(20)],
    kernels=[
        f,
        array_for,
        array_map,
        fromiter,
        vectorize,
        vectorize_without_init,
    ],
    xlabel=len(x),
)
b.save(out1.svg)
b.show()

How about using numpy.vectorize.

import numpy as np
x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2
vfunc = np.vectorize(squarer)
vfunc(x)
# Output : array([ 1,  4,  9, 16, 25])

python – Most efficient way to map function over numpy array

TL;DR

As noted by @user2357112, a direct method of applying the function is always the fastest and simplest way to map a function over Numpy arrays:

import numpy as np
x = np.array([1, 2, 3, 4, 5])
f = lambda x: x ** 2
squares = f(x)

Generally avoid np.vectorize, as it does not perform well, and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.

Comparison of methods

Here are some simple tests to compare three methods to map a function, this example using with Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:

import timeit
import numpy as np

f = lambda x: x ** 2
vf = np.vectorize(f)

def test_array(x, n):
    t = timeit.timeit(
        np.array([f(xi) for xi in x]),
        from __main__ import np, x, f, number=n)
    print(array: {0:.3f}.format(t))

def test_fromiter(x, n):
    t = timeit.timeit(
        np.fromiter((f(xi) for xi in x), x.dtype, count=len(x)),
        from __main__ import np, x, f, number=n)
    print(fromiter: {0:.3f}.format(t))

def test_direct(x, n):
    t = timeit.timeit(
        f(x),
        from __main__ import x, f, number=n)
    print(direct: {0:.3f}.format(t))

def test_vectorized(x, n):
    t = timeit.timeit(
        vf(x),
        from __main__ import x, vf, number=n)
    print(vectorized: {0:.3f}.format(t))

Testing with five elements (sorted from fastest to slowest):

x = np.array([1, 2, 3, 4, 5])
n = 100000
test_direct(x, n)      # 0.265
test_fromiter(x, n)    # 0.479
test_array(x, n)       # 0.865
test_vectorized(x, n)  # 2.906

With 100s of elements:

x = np.arange(100)
n = 10000
test_direct(x, n)      # 0.030
test_array(x, n)       # 0.501
test_vectorized(x, n)  # 0.670
test_fromiter(x, n)    # 0.883

And with 1000s of array elements or more:

x = np.arange(1000)
n = 1000
test_direct(x, n)      # 0.007
test_fromiter(x, n)    # 0.479
test_array(x, n)       # 0.516
test_vectorized(x, n)  # 0.945

Different versions of Python/NumPy and compiler optimization will have different results, so do a similar test for your environment.

Leave a Reply

Your email address will not be published. Required fields are marked *