parallel processing – How do I parallelize a simple Python loop?
Using multiple threads on CPython won't give you better performance for pure-Python code, due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:
import multiprocessing

pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
Note that this won't work in the interactive interpreter.
To avoid the usual FUD around the GIL: there wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole bunch of problems.
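For a fully self-contained version of the snippet above (a minimal sketch; calc_stuff and offset are hypothetical stand-ins for whatever your actual loop body computes), wrap the pool in a __main__ guard so it also works when Python spawns the worker processes:
import multiprocessing

def calc_stuff(x):
    # hypothetical stand-in for the real per-iteration work;
    # returns three values per input, matching the snippet above
    return x, x ** 2, x ** 3

if __name__ == "__main__":  # required on platforms that spawn worker processes
    offset = 1
    with multiprocessing.Pool(4) as pool:
        out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
    print(out1, out2, out3)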
from joblib import Parallel, delayed
def process(i):
    return i * i
results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results) # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The above works beautifully on my machine (Ubuntu; the joblib package was pre-installed, but can be installed via pip install joblib).
Taken from https://blog.dominodatalab.com/simple-parallelization/
Edit on Mar 31, 2021: on joblib, multiprocessing, threading and asyncio
- joblib in the above code uses import multiprocessing under the hood (and thus multiple processes), which is typically the best way to run CPU-bound work across cores, because of the GIL.
- You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. reading/writing to disk, sending an HTTP request). For I/O work, the GIL does not block the execution of other threads; see the thread-based sketch after the benchmark below.
- Since Python 3.7, as an alternative to threading, you can parallelise I/O work with asyncio; the same advice as for import threading applies (though, in contrast to the latter, only one thread is used). On the plus side, asyncio has a lot of nice features that are helpful for async programming. A sketch also follows the benchmark below.
- Using multiple processes incurs overhead: typically, each process needs to initialise/load everything it needs to run your calculation. You should check yourself whether the above code snippet improves your wall time. Here is another snippet, for which I confirmed that joblib produces better results:
import time
from joblib import Parallel, delayed

def countdown(n):
    while n > 0:
        n -= 1
    return n

t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)
# takes ~10.5 seconds on a medium-sized MacBook Pro

t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)
# takes ~6.3 seconds on a medium-sized MacBook Pro
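As promised above, here is a thread-based sketch for I/O-bound work (fetch_url and the URL list are hypothetical; prefer="threads" switches joblib to its thread-based backend):
import urllib.request
from joblib import Parallel, delayed

def fetch_url(url):
    # I/O-bound work: the GIL is released while waiting on the network
    with urllib.request.urlopen(url) as response:
        return len(response.read())

urls = ["https://example.com"] * 5  # hypothetical work list
sizes = Parallel(n_jobs=4, prefer="threads")(delayed(fetch_url)(u) for u in urls)
print(sizes)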
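And a minimal asyncio sketch (Python 3.7+; fake_io is a hypothetical coroutine whose asyncio.sleep stands in for a real network or disk wait):
import asyncio

async def fake_io(i):
    await asyncio.sleep(0.1)  # stands in for a real I/O wait
    return i * i

async def main():
    # schedule all coroutines concurrently on a single thread and gather results
    return await asyncio.gather(*(fake_io(i) for i in range(10)))

print(asyncio.run(main()))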
To parallelize a simple for loop, joblib adds a lot of value over raw use of multiprocessing: not only the short syntax, but also features like transparent batching of iterations when they are very fast (to amortize the overhead) and capturing the traceback of the child process, for better error reporting.
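For example (a minimal sketch; batch_size="auto" is joblib's default and is what provides the transparent batching of very fast iterations):
from joblib import Parallel, delayed

# batch_size="auto" (the default) groups very short tasks into batches,
# so the per-task dispatch overhead is amortized across each batch
results = Parallel(n_jobs=2, batch_size="auto")(delayed(abs)(-i) for i in range(1000))
print(results[:5])  # [0, 1, 2, 3, 4]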
Disclaimer: I am the original author of joblib.