At one time or another, we all have to deal with Python modules which are
simply wrappers around code written in other languages such as C or Fortran.
Particularly when it comes to scientific data, there are simply too many
maintained and pre-existing code libraries to ignore.
That Python is great glue for such libraries is one of the great things about
the language.
However, code libraries written in static languages without garbage collection
often have memory leaks.
Sometimes the memory leaks are even well known.
Python prevents its own objects from leaking memory through its garbage
collector.
But extension libraries can leave unfreed objects which are unreachable by
Python's garbage collector.
These leaks can eventually crash your Python program if allowed to build up by
repeated invocations of the leaky code within the same process.
If you can isolate the offending leaky extension objects into a
function, then you are in luck.
Python will allow you to call that function in its own process.
Then when the process ends, your operating system will reclaim all that leaked
memory for you.
The multiprocessing module is your friend.
Use it to run leaky extensions in their own processes:
import multiprocessing

def sir_leaks_a_lot(datum):
    # Put your leaky extension code here.
    # For instance, matplotlib functions which
    # crash with "std::bad_alloc" errors when
    # called repeatedly.
    ...

for datum in data:
    p = multiprocessing.Process(target=sir_leaks_a_lot, args=(datum,))
    p.start()
    p.join()
    assert not p.exitcode, \
        "Exitcode %s from processing %s" % (p.exitcode, datum)
The start() method will run sir_leaks_a_lot with its parameters bound to the
elements of the args tuple in the call to Process().
The join() method will wait for the process to finish.
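The exitcode attribute is what makes the assertion above useful: after join() returns, it tells you whether the child died cleanly. Here is a minimal, self-contained sketch of that behavior; crashing_worker is a hypothetical stand-in for leaky extension code, and the "fork" start method is used so the example runs as a single script (with the default "spawn" method the worker must live in an importable module behind an `if __name__ == "__main__":` guard):

```python
import multiprocessing

def crashing_worker(datum):
    # Hypothetical stand-in for extension code that dies
    # with an unhandled exception.
    raise RuntimeError("simulated crash while processing %r" % datum)

# "fork" keeps this sketch self-contained on Linux/macOS.
ctx = multiprocessing.get_context("fork")
p = ctx.Process(target=crashing_worker, args=(42,))
p.start()
p.join()
# An unhandled exception in the child yields exitcode 1;
# a hard crash (e.g. a segfault) yields the negative signal number,
# and a clean return yields 0.
print(p.exitcode)
```

So the `assert not p.exitcode` idiom catches both Python exceptions and hard crashes in the child, without taking the parent down with it.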
You could run multiple sir_leaks_a_lot processes at once instead of waiting
for each to finish one at a time.
But then the leaks could build up across the live processes and crash your
program again.
Calling join() ensures each leaky process is cleaned up before the next one
starts.
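If you do want some parallelism without letting leaks pile up, one compromise is multiprocessing.Pool with maxtasksperchild=1, which recycles a fresh worker process for every task while capping how many run at once. A minimal sketch, where leaky_work is a hypothetical placeholder for your leaky function (again using the "fork" start method so the example is a single self-contained script):

```python
import multiprocessing

def leaky_work(datum):
    # Hypothetical placeholder for a leaky extension call;
    # returns something checkable for the example.
    return datum * 2

ctx = multiprocessing.get_context("fork")
# maxtasksperchild=1 means each worker process handles exactly one
# task and then exits, so leaked memory never accumulates, while
# up to 4 workers still run concurrently.
with ctx.Pool(processes=4, maxtasksperchild=1) as pool:
    results = pool.map(leaky_work, range(10))
print(results)
```

The trade-off is process startup cost per task, so this only pays off when each task is heavy enough to dwarf that overhead.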
Now you are ready to generate tens of thousands of large matplotlib plots in a
single cron job!