Debugging a memory leak

Pasha Stetsenko wrote at 2020-10-22 17:51 -0700:
> ...
>I'm a maintainer of a python library "datatable" (can be installed from
>PyPi), and i've been recently trying to debug a memory leak that occurs in
>my library.
>The program that exposes the leak is quite simple:
>import datatable as dt
>import gc  # just in case
>def leak(n=10**7):
>    for i in range(n):
>        z = dt.update()
>leak()
>input("Press enter")
>Note that despite the name, the `dt.update` is actually a class, though it
>is defined via Python C API. Thus, this script is expected to create and
>then immediately destroy 10 million simple python objects.
>The observed behavior, however, is that the script consumes more and more
>memory, eventually ending up at about 500M. The amount of memory the
>program ends up consuming is directly proportional to the parameter `n`.
>The `gc.get_objects()` does not show any extra objects however.

For efficiency reasons, the garbage collector tracks only
objects whose types are known to be potentially involved in reference cycles.
A type implemented in "C" must define `tp_traverse` (in its type
structure, together with the `Py_TPFLAGS_HAVE_GC` flag) to indicate
this possibility.
`tp_traverse` also tells the garbage collector how to find the objects
referenced by an instance.
You will never find an object in the result of `gc.get_objects()` if
its type does not define `tp_traverse`.
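
The effect can be observed from pure Python. The following is only a
small sketch: `Node` is a made-up class used for illustration, and a
float stands in for an object whose type does not take part in cyclic
garbage collection.

```python
import gc


class Node:
    """Hypothetical ordinary class; its instances are tracked by the GC."""


untracked = 12345.678   # float: atomic type, cannot be part of a cycle
tracked = Node()        # instance of a normal Python class

print(gc.is_tracked(untracked))  # False
print(gc.is_tracked(tracked))    # True

# gc.get_objects() lists only tracked objects, so the float never shows up.
node_listed = any(obj is tracked for obj in gc.get_objects())
float_listed = any(obj is untracked for obj in gc.get_objects())
print(node_listed, float_listed)  # True False
```

If `gc.is_tracked(dt.update())` returns False, the absence of these
objects from `gc.get_objects()` says nothing about whether they leak.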

> ...
>Thus, the object didn't actually "leak" in the normal sense: its refcount
>is 0 and it was reclaimed by the Python runtime (when i print a debug
>message in tp_dealloc, i see that the destructor gets called every time).
>Still, Python keeps requesting more and more memory from the system instead
>of reusing the memory that was supposed to be freed.

I would try to debug what happens further in `tp_dealloc` and its callers.
You should eventually see a call to `PyMem_Free` (or a related function
such as `PyObject_Free`) which gives the memory back to the Python
memory management (built on top of the C memory management).

Note that your `tp_dealloc` should not call the "C" library's `free`.
Python builds its own memory management (the `PyMem_*` functions) on top
of the "C" library. It handles all "small" memory requests itself
and, if necessary, requests big chunks via `malloc` and splits
them into the smaller sizes.
Should you release small memory blocks directly via `free`, that memory
becomes effectively unusable by Python (unless you obtained it through
a matching special allocation as well).
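
One way to narrow this down from the Python side is `tracemalloc`,
which sees only allocations made through Python's own allocators. The
sketch below is an assumption about how one might use it here, not a
tested recipe for datatable: `Stub` and `traced_growth` are hypothetical
names, and `Stub` merely stands in for the extension type.

```python
import gc
import tracemalloc


class Stub:
    """Hypothetical stand-in for the extension type (e.g. dt.update)."""
    __slots__ = ()


def traced_growth(factory, n=100_000):
    """Create and immediately drop n objects; return the growth in bytes
    of memory allocated through Python's allocators."""
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(n):
        factory()          # the object is destroyed right away
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


growth = traced_growth(Stub)
print(f"traced growth: {growth} bytes")
```

With the real type one would call `traced_growth(dt.update)`. If the
traced growth stays near zero while the process size still rises with
`n`, the memory is probably allocated or released outside the `PyMem_*`
functions, for example via plain `malloc`/`free` or C++ `new`/`delete`.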