Releasing memory in Python

I'm guessing the question you really care about here is:

Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?

No, there is not. But there is an easy workaround: child processes.

If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won't touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.

This isn't completely trivial and free, but it's pretty easy and cheap, which is usually good enough for the trade to be worthwhile.

First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(func, *args, **kwargs).result()

If you need a little more control, use the multiprocessing module.

The costs are:

  • Process startup is kind of slow on some platforms, notably Windows. We're talking milliseconds here, not minutes, and if you're spinning up one child to do 300 seconds' worth of work, you won't even notice it. But it's not free.
  • If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you're saving time in the long run, because if that memory hung around forever it would have to lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
  • Sending large amounts of data between processes can be slow. Again, if you're talking about sending over 2K of arguments and getting back 64K of results, you won't even notice it, but if you're sending and receiving large amounts of data, you'll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
  • Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).
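
As a rough illustration of the "file" option in the list above, here is a minimal sketch. It swaps in subprocess and the array module (neither is named above) so the child's result travels as raw doubles in a temporary file instead of being pickled. The function name squares_mod_1000 and the toy workload are made up for the example.

```python
import array
import os
import subprocess
import sys
import tempfile

# Program run by the child: builds a large temporary list, then writes only
# the compact result to the file named on its command line.
CHILD_CODE = """
import array, sys
n, path = int(sys.argv[1]), sys.argv[2]
scratch = [i * i for i in range(n)]            # the big temporary working set
result = array.array('d', (float(v % 1000) for v in scratch))
with open(path, 'wb') as f:
    result.tofile(f)
"""

def squares_mod_1000(n):
    """Hypothetical helper: do the heavy work in a child, read the result back."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        # The child's memory goes back to the OS when the child exits.
        subprocess.run([sys.executable, "-c", CHILD_CODE, str(n), path], check=True)
        out = array.array("d")
        with open(path, "rb") as f:
            out.fromfile(f, n)                 # raw doubles, no pickling involved
        return out
    finally:
        os.remove(path)

values = squares_mod_1000(100)
print(len(values), values[5])
```

Only the n doubles of the result ever live in the parent; the scratch list exists solely in the child.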

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python's internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes, up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.
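
To make those figures concrete, here is a small arithmetic sketch. The constants are just the numbers quoted above (other CPython versions may use different ones), and the helper names size_class and describe are made up for the example.

```python
KIB = 1024
ARENA_SIZE = 256 * KIB   # arena size quoted above
POOL_SIZE = 4 * KIB      # pool size quoted above
ALIGNMENT = 8            # size classes are multiples of 8 bytes

def size_class(nbytes, alignment=ALIGNMENT):
    """Round a request up to the size class that would serve it."""
    return -(-nbytes // alignment) * alignment

def describe(small_request_threshold):
    """Return (pools per arena, number of size classes) for a threshold."""
    pools_per_arena = ARENA_SIZE // POOL_SIZE
    n_classes = small_request_threshold // ALIGNMENT
    return pools_per_arena, n_classes

print(describe(256))   # threshold of 256 bytes
print(describe(512))   # threshold of 512 bytes (3.3)
print(size_class(20))  # a 20-byte request is served from the 24-byte class
```

With 64 pools per arena, a single live block in any one of them is enough to pin the whole 256 KiB.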

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.

Try it like this (Python 2), and tell me what you get. See the psutil documentation for psutil.Process.memory_info.

import os
import gc
import psutil

proc = psutil.Process(os.getpid())
mem0 = proc.memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)


Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%


I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn't surprising if the heap shrinks by more, even a lot more, than the block that you free.
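
If you want to experiment with that threshold from Python, a minimal sketch using ctypes might look like the following. It assumes glibc: the constant -1 is M_TRIM_THRESHOLD from glibc's malloc.h, and the helper name set_trim_threshold is made up. On other C libraries the helper simply returns None.

```python
import ctypes
import ctypes.util

# Value of M_TRIM_THRESHOLD in glibc's <malloc.h>; an assumption that only
# holds for glibc.
M_TRIM_THRESHOLD = -1

def set_trim_threshold(nbytes):
    """Ask glibc to trim the heap once `nbytes` are free at its top.

    Returns True or False as reported by mallopt, or None when the C
    library cannot be loaded or does not export mallopt.
    """
    name = ctypes.util.find_library("c")
    if name is None:
        return None
    try:
        libc = ctypes.CDLL(name)
        mallopt = libc.mallopt
    except (OSError, AttributeError):
        return None
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    return mallopt(M_TRIM_THRESHOLD, nbytes) == 1

print(set_trim_threshold(128 * 1024))
```

This only influences malloc's own heap. Memory that the small object allocator obtains through anonymous memory maps is not governed by this setting.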

In 3.x range doesn't create a list, so the test above won't create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn't implement a freelist.
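
A quick way to see the range difference for yourself in 3.x (a small sketch; the variable names are arbitrary):

```python
import sys

n = 10**7
lazy = range(n)   # 3.x: a small range object, no list of ints is built
print(type(lazy).__name__, sys.getsizeof(lazy))

# To actually materialize int objects, as 2.x range did, ask for a list.
# A small count is used here just to keep the demo cheap.
materialized = list(range(1000))
print(type(materialized).__name__, len(materialized))
```

So a 3.x port of the test would need list(range(10**7)) to allocate anything comparable.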

eryksun has answered question #1, and I've answered question #3 (the original #4), but now let's answer question #2:

Why does it release 50.5mb in particular – what is the amount that is released based on?

What it's based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.

First, depending on how you're measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as freed, even though it hasn't been freed.

Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like Linux), pages that are allocated but tagged MADV_FREE, etc.

If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you're asking about), and pages have really been deallocated, there are two circumstances in which this can happen: either you've used brk or equivalent to shrink the data segment (very rare nowadays), or you've used munmap or similar to release a mapped segment. (There's also theoretically a minor variant of the latter, in that there are ways to release part of a mapped segment, e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)

But most programs don't directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There's no way your application can reasonably predict this, or even detect that it happened in advance.

CPython makes this even more complicated: it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don't even directly control when the top-level objects are deallocated.

So, when you release an object, how do you know whether it's going to release memory to the OS? Well, first you have to know that you've released the last reference (including any internal references you didn't know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it's allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you're releasing the PyString object, and the string buffer).
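
The "last reference" point can be observed directly. Here is a minimal sketch, assuming CPython's reference counting (other implementations may run the finalizer later); the Payload class and the events list are made up for the demo.

```python
import weakref

class Payload:
    """Hypothetical stand-in for any object whose lifetime we want to watch."""

events = []

obj = Payload()
# Record a marker when the object is actually deallocated.
weakref.finalize(obj, events.append, "deallocated")

alias = obj                              # a second reference to the same object
del obj
events_after_first_del = list(events)    # alias still keeps the object alive

del alias
events_after_last_del = list(events)     # on CPython the finalizer has already run

print(events_after_first_del, events_after_last_del)
```

Dropping a name only deallocates the object if nothing else, including references you did not know about, still points at it.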

If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it's implemented. (It obviously can't happen unless you're deallocating the last thing in the block, and even then, it may not happen.)

If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it's implemented. (Again, you have to be deallocating the last in-use block within a malloc'ed region, and even then, it may not happen.)

If you do free a malloc'ed region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it's implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)

So, if you want to understand why it happened to release exactly 50.5mb, you're going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You'd have to read your platform's malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn't usually a problem.) And then you have to do the same thing at the 3 levels above that.

So, the only useful answer to the question is "Because."

Unless you're doing resource-limited (e.g., embedded) development, you have no reason to care about these details.

And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).
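
To give a flavour of that approach without leaving Python, here is a minimal sketch of a bump-style zone over a single anonymous mapping, built on the standard mmap module. The Zone class and its methods are made up for illustration; real resource-limited code would more likely do this in C.

```python
import mmap

class Zone:
    """Hypothetical bump allocator over one anonymous mapping."""

    def __init__(self, size):
        self._map = mmap.mmap(-1, size)     # anonymous, not backed by a file
        self._base = memoryview(self._map)
        self._views = []
        self._used = 0

    def alloc(self, nbytes):
        """Hand out the next nbytes of the zone as a writable memoryview."""
        if self._used + nbytes > len(self._base):
            raise MemoryError("zone exhausted")
        view = self._base[self._used:self._used + nbytes]
        self._used += nbytes
        self._views.append(view)
        return view

    def release(self):
        """Invalidate every view, then unmap the whole zone in one step."""
        for view in self._views:
            view.release()
        self._base.release()
        self._map.close()

zone = Zone(1024 * 1024)
buf = zone.alloc(16)
buf[:5] = b"hello"
first = bytes(buf)[:5]
zone.release()          # the mapping is gone; buf can no longer be used
print(first)
```

Because the zone owns exactly one mapping, releasing it does not depend on the state of malloc or of CPython's object allocator.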
