Python multiprocessing PicklingError: Can't pickle
The `pickle` documentation lists what can be pickled. In particular, functions are only picklable if they are defined at the top level of a module.
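A quick way to check the top-level rule is to call `pickle.dumps` directly. The sketch below assumes Python 3, and the helper names `can_pickle` and `make_local` are hypothetical. It tries a function defined at the top level of a standard-library module, a nested function and a lambda. The exact exception raised for the last two differs between Python versions, so the helper catches several types.

```python
import os.path
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_local():
    # A function defined inside another function is not reachable as a
    # module-level attribute, so pickle cannot look it up by name.
    def local_helper():
        return 42
    return local_helper


# os.path.join is a plain function defined at the top level of a module.
print(can_pickle(os.path.join))     # True
print(can_pickle(make_local()))     # False
print(can_pickle(lambda x: x + 1))  # False
```

Functions are pickled by reference (module name plus qualified name), which is why only functions that can be found again by that name at import time survive the round trip.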
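This piece of code:

```python
import multiprocessing as mp


class Foo():
    @staticmethod
    def work(self):
        pass


if __name__ == '__main__':
    pool = mp.Pool()
    foo = Foo()
    # foo.work is not a top-level function, so on Python 2.7 it
    # cannot be pickled and sent to the worker processes.
    pool.apply_async(foo.work)
    pool.close()
    pool.join()
```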
yields an error almost identical to the one you posted:

```
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
```
The problem is that the `pool` methods all use a `mp.SimpleQueue` to pass tasks to the worker processes. Everything that goes through the `mp.SimpleQueue` must be picklable, and `foo.work` is not picklable since it is not defined at the top level of the module.
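It can be fixed by defining a function at the top level which calls `foo.work`:

```python
def work(foo):
    # Top-level functions are pickled by reference, so the pool can send this one.
    foo.work(foo)  # Foo.work is a staticmethod that expects one argument
pool.apply_async(work, args=(foo,))
```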
Notice that `foo` is picklable, since `Foo` is defined at the top level and `foo.__dict__` is picklable.
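Put together, the fix might look like the script below. This is a sketch under a few assumptions: it targets Python 3 (where `Pool` can be used as a context manager), `Foo.work` is changed to return a hypothetical value so the result is visible, and the pool size of 2 is arbitrary. Calling `.get()` on the result matters, because `apply_async` otherwise lets an exception raised in the worker go unnoticed.

```python
import multiprocessing as mp


class Foo(object):
    @staticmethod
    def work(self):
        # Hypothetical placeholder for real work; returns a value so it is visible.
        return "done"


def work(foo):
    """Top-level wrapper: picklable by reference, so the pool can send it."""
    return foo.work(foo)


def main():
    foo = Foo()  # picklable: Foo is a top-level class
    with mp.Pool(2) as pool:
        # .get() re-raises any exception from the worker instead of hiding it.
        return pool.apply_async(work, args=(foo,)).get(timeout=30)


if __name__ == "__main__":
    print(main())  # done
```

The `if __name__ == "__main__":` guard is required on platforms that start workers with the `spawn` method (the default on Windows and macOS), because each worker re-imports the main module.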
I'd use `pathos.multiprocessing` instead of `multiprocessing`. `pathos.multiprocessing` is a fork of `multiprocessing` that uses `dill`. `dill` can serialize almost anything in Python, so you are able to send a lot more around in parallel. The `pathos` fork also has the ability to work directly with multiple-argument functions, as you need for class methods.
```python
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool(4)
>>> class Test(object):
...     def plus(self, x, y):
...         return x+y
...
>>> t = Test()
>>> x = [0, 1, 2, 3]
>>> y = [4, 5, 6, 7]
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
>>>
>>> class Foo(object):
...     @staticmethod
...     def work(self, x):
...         return x+1
...
>>> f = Foo()
>>> p.apipe(f.work, f, 100)
<processing.pool.ApplyResult object at 0x10504f8d0>
>>> res = _
>>> res.get()
101
```
Get `pathos` (and if you like, `dill`) here: https://github.com/uqfoundation
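For comparison, on Python 3.3+ the standard library's `Pool.starmap` also unpacks argument tuples, and `multiprocessing` can pickle bound methods of instances of top-level classes, so the `Test.plus` call can be sketched without `pathos` (reusing the same `x` and `y` lists). This is a stdlib alternative, not what `pathos` does internally, and it does not give you `dill`'s broader serialization: lambdas and nested functions still fail.

```python
import multiprocessing as mp


class Test(object):
    def plus(self, x, y):
        return x + y


def main():
    x = [0, 1, 2, 3]
    y = [4, 5, 6, 7]
    t = Test()
    with mp.Pool(4) as pool:
        # starmap calls t.plus(x_i, y_i) for each pair produced by zip(x, y).
        return pool.starmap(t.plus, zip(x, y))


if __name__ == "__main__":
    print(main())  # [4, 6, 8, 10]
```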
When this problem comes up with `multiprocessing`, a simple solution is to switch from `Pool` to `ThreadPool`. This can be done with no change of code other than the import:

```python
from multiprocessing.pool import ThreadPool as Pool
```
This works because `ThreadPool` runs tasks in threads that share memory with the main thread, rather than in new processes, so pickling is not required.
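As a small illustration, the sketch below hands `ThreadPool.map` a nested function, which a process-based `Pool` could not pickle. The names `main` and `add_offset` are hypothetical, and the sketch assumes Python 3.

```python
from multiprocessing.pool import ThreadPool as Pool


def main():
    offset = 10

    def add_offset(x):
        # A nested function like this cannot be pickled, which is fine here:
        # ThreadPool passes tasks to threads in the same process.
        return x + 1 + offset

    with Pool(4) as pool:
        return pool.map(add_offset, range(5))


if __name__ == "__main__":
    print(main())  # [11, 12, 13, 14, 15]
```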
The downside to this method is that Python isn't the greatest language for handling threads: CPython uses something called the Global Interpreter Lock (GIL) to stay thread-safe, which lets only one thread execute Python bytecode at a time and can slow down some use cases. However, if you're primarily interacting with other systems (sending HTTP requests, talking to a database, writing to the filesystem), your code is likely not CPU-bound and won't take much of a hit. In fact, I've found when writing HTTP/HTTPS benchmarks that the threaded model used here has less overhead and delay than processes, because creating a new process costs much more than creating a new thread, and the program was otherwise just waiting for HTTP responses.
So if you're doing a lot of CPU-bound processing in pure Python code, this might not be the best method.
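To illustrate the I/O-bound case, the sketch below uses `time.sleep` as a hypothetical stand-in for waiting on an HTTP response. Like a blocking socket read, sleeping releases the GIL, so eight 0.2-second waits on eight threads should finish in roughly 0.2 s rather than 1.6 s. On a standard CPython build, replacing the sleep with a pure-Python CPU loop would show little or no speedup, which is the case the paragraph above warns about.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_request(i):
    # Hypothetical stand-in for an HTTP call: sleeping releases the GIL,
    # just as blocking on a socket does.
    time.sleep(0.2)
    return i


def timed_run(n_tasks=8, n_threads=8):
    """Run n_tasks fake requests on n_threads threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPool(n_threads) as pool:
        results = pool.map(fake_request, range(n_tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run()
    print(results, "in %.2f s" % elapsed)  # roughly 0.2 s, not 8 * 0.2 s
```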