python – Manager dict in Multiprocessing

Here is what you wrote:

# from here, code executes in the main process and in all child processes
# every process performs all these imports
from multiprocessing import Process, Manager

# every process creates its own manager and its own d
manager = Manager()
# BTW, the Manager is itself a child process, and when a new process
# re-imports this module, it creates a new Manager, and that new Manager
# creates a new one, and so on
# Did you check how many Python processes were running on your system? A lot!
d = manager.dict()

def f():
    # d is the d defined in the globals of the current process
    d[1].append(4)
    print d

if __name__ == '__main__':
# from here, code executes ONLY in the main process
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Here is what you should have written:

from multiprocessing import Process, Manager
def f(d):
    d[1] = d[1] + [4]
    print d

if __name__ == '__main__':
    manager = Manager() # create only 1 mgr
    d = manager.dict() # create only 1 dict
    d[1] = []
    p = Process(target=f, args=(d,)) # tell f which d it should append to
    p.start()
    p.join()

The reason that the new item appended to d[1] is not printed is stated in Python's official documentation:

Modifications to mutable values or items in dict and list proxies will
not be propagated through the manager, because the proxy has no way of
knowing when its values or items are modified. To modify such an item,
you can re-assign the modified object to the container proxy.

Therefore, this is actually what happens:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # invoke d.__getitem__(), returning a local copy of the empty list assigned by the main process
    # (consider that a KeyError exception wasn't raised, so a list was definitely returned),
    # and append 4 to it; however, this change is not propagated through the manager,
    # as it's performed on an ordinary list with which the manager has no interaction
    d[1].append(4)
    # convert d to string via d.__str__() (see https://docs.python.org/2/reference/datamodel.html#object.__str__),
    # returning the remote string representation of the object (see https://docs.python.org/2/library/multiprocessing.html#multiprocessing.managers.SyncManager.list),
    # to which the change above was not propagated
    print d

if __name__ == '__main__':
    # invoke d.__setitem__(), propagating this assignment (mapping 1 to an empty list) through the manager
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Re-assigning d[1] with a new list, or even with the same list once again after it was updated, triggers the manager to propagate the change:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # perform the exact same steps as explained in the comments to the previous code snippet above,
    # but in addition, invoke d.__setitem__() with the changed item in order to propagate the change
    l = d[1]
    l.append(4)
    d[1] = l
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The line d[1] += [4] would have worked as well.
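
For completeness, here is a minimal sketch of that variant; note that d[1] += [4] expands into a d.__getitem__() followed by a d.__setitem__(), which is exactly the re-assignment the manager needs:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # += fetches a local copy of the list via d.__getitem__(),
    # extends it, and then pushes it back through d.__setitem__(),
    # so the change is propagated to the manager
    d[1] += [4]
    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()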


EDIT for Python 3.6 or later:

Since Python 3.6, per this changeset following this issue, it's also possible to use nested proxy objects, which automatically propagate any changes performed on them to the containing proxy object. Thus, replacing the line d[1] = [] with d[1] = manager.list() would correct the issue as well:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    # the __str__() method of a dict object invokes __repr__() on each of its items,
    # so explicitly invoking __str__() is required in order to print the actual list items
    print({k: str(v) for k, v in d.items()})

if __name__ == '__main__':
    d[1] = manager.list()
    p = Process(target=f)
    p.start()
    p.join()

Unfortunately, this bug fix was not ported to Python 2.7 (as of Python 2.7.13).


NOTE (running under the Windows operating system):

Although the described behaviour applies to the Windows operating system as well, the attached code snippets would fail when executed under Windows due to the different process creation mechanism, which relies on the CreateProcess() API rather than the fork() system call (the latter isn't supported on Windows).

Whenever a new process is created via the multiprocessing module, Windows creates a fresh Python interpreter process that imports the main module, with potentially hazardous side effects. In order to circumvent this issue, the following programming guideline is recommended:

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

Therefore, executing the attached code snippets as-is under Windows would try to create an infinite number of processes because of the module-level manager = Manager() line. This can easily be fixed by creating the Manager and Manager.dict objects inside the if __name__ == '__main__' clause and passing the Manager.dict object as an argument to f(), as done in this answer and as sketched below.
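
For example, a Windows-safe version of the Python 3.6 snippet above would look like this (a minimal sketch following that guideline; the print formatting is the same as in the earlier snippet):

from multiprocessing import Process, Manager

def f(d):
    # d is received as an argument, and the nested list proxy (Python 3.6+)
    # propagates the append automatically
    d[1].append(4)
    print({k: str(v) for k, v in d.items()})

if __name__ == '__main__':
    # the Manager (and its server process) is created only in the main process
    manager = Manager()
    d = manager.dict()
    d[1] = manager.list()
    p = Process(target=f, args=(d,))
    p.start()
    p.join()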

More details on the issue may be found in this answer.

python – Manager dict in Multiprocessing

I think this is a bug in the manager proxy calls. You can work around it by avoiding calls to the methods of the shared list, like this:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # get the shared list
    shared_list = d[1]

    shared_list.append(4)

    # force the shared list to
    # be serialized back to the manager
    d[1] = shared_list

    print d

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

    print d
