sábado, 17 de março de 2007

deleting instance -memory leak

Qt objects, Python objects and shadow objects

What happens when you create a Qt object from Python? A Qt object is an instance of a C++ class, but in order to be able to manipulate that object from Python, there needs to be a Python object, too. The answer is that both are created: a C++ object that contains the real functionality, and a Python object that "wraps" the C++ object.

Thus, when you call QWidget() from Python, three things are created:

*

The C++ QWidget instance. Actually it will be a sipQWidget instance which is a sub-class of QWidget - it is needed to be a catcher for QWidget's virtual methods and to expose its protected methods and enums.
*

the Python object shadow/proxy which is a thin wrapper around the C++ instance and which has a unique Python type.
*

the Python class instance that you actually deal with in your script. Its instance dictionary contains a reference (named __sipThis__) to the shadow object.

The two Python objects are needed because the programmer wants a class instance, but a Python class instance doesn't provide you with the capability of wrapping a C++ pointer.

In fact, both the Qt library and the PyQt library can create C++ objects. These are passed between them. For instance, an object could be created by PyQt and passed to Qt, or it could be created by Qt and passed to Python. Both Qt and Python have the concept of deleting things — so there has to be a method of making sure that C++ instances are deleted properly. If both try to delete the same instance your program crashes, if neither do then you get memory leaks. Therefore there is the concept of ownership: the current owner (PyQt or Qt) is responsible for deleting the C++ instance. Ownership of an instance may be transferred between PyQt and C++ during the life of the instance.

Mostly you won't need to concern yourself with this problem, since PyQt knows exactly when to transfer ownership of C++ instances automatically.. Complications arise if you create QObject derived objects that ‘own', through the QObject parent-child mechanism, other objects. (This ownership of objects by other objects is one of the places where Qt deviates from the C++ standard practice, where the object that creates another object should also take care of deleting it.)

http://www.commandprompt.com/community/pyqt/x2393

=====================================

Deletion of a name removes the binding of that name from the local or
global namespace, depending on whether the name occurs in a global
statement in the same code block. If the name is unbound, a NameError
exception will be raised.

It is illegal to delete a name from the local namespace if it occurs as
a free variable in a nested block.

Deletion of attribute references, subscriptions and slicings is passed
to the primary object involved; deletion of a slicing is in general
equivalent to assignment of an empty slice of the right type (but even
this is determined by the sliced object).

Essentially, the del statement deletes a reference to an object. There may
be other references to the object, and even if there aren't the garbage
collector might not collect the object immediately, if at all.

http://www.thescripts.com/forum/thread30655.html

=========================

You can explicitly call the '__del__' method, but that doesn't
'del' the object. But if your expect to explicitly call your
__del__ function, then maybe it shouldn't be *called* "__del__".
Consider objects like sockets and files ( and similar user
written classes ) where there is an explicit close() method,
which closes the file or connection, but keeps the object
in existance, and an implicit destructor method which will
call the close method before the object goes away.

http://www.python.org/search/hypermail/python-1994q3/0084.html

=====================================

[ Path: > Evan Jones' Scratch Pad | Written by Evan Jones ]
[ 2004-October-19 19:50 ]

[Note: This problem is fixed in Python 2.5] I really enjoy working with Python. One benefit is that it automatically manages memory, as any high level language worth the name should. Unfortunately, there are times when abstractions break, leaving you wondering what the heck is actually going on under the covers. This week, I broke Python's automatic memory management. I'm working with a program that does lengthy searches through a graph. It places objects in a queue to track each path through the search space. With some graphs, there can be millions of paths, which eats up a ton of memory. It is not unusual to see my Python process eating up a gigabyte of RAM. The problem is that after each search I do a few hours of computation on the result. This computation needs very little memory, but for some reason, my program was not releasing any memory, which slowed down the whole system. I thought I had a memory leak, but the actual problem was much more interesting. [Updated 2005-05-28: Follow-up post with details about my patch] [Note: This article has been translated into Korean and Chinese]

Even though some high level languages handle memory automatically doesn't mean that memory leaks are impossible. A leak in C or C++ is created when you allocate memory using malloc or new, and never deallocate it. In Python or Java, a memory leak is created when you accidentally leave a reference to some massive data structure hanging around, so the entire thing cannot be freed. The solution is to find this reference and get rid of it.

I began looking for my memory leak by creating a minimal test that ran a big search. Next, I added "del reference" statements one variable at a time, trying to find which one was holding on to memory. I deleted all my variables and still my program's size did not change. Next, I added calls to force Python's cyclic garbage collector to run, in case I had circular references that were keeping the objects from being deleted. Python still consumed a gigabyte of memory. Finally, I turned to Google to see if anyone else has had a similar problem. I turned up a mailing list thread about Python never returning memory to the operating system. It turns out that this is a flaw with the Python interpreter. People work around it by spawning multiple processes or using data structures on disk. Then I was curious, why would Python choose to never free memory? Many hours later, I now know far more than I ever wanted to about how Python deals with memory.

Python does a lot of allocations and deallocations. All objects, including "simple" types like integers and floats, are stored on the heap. Calling malloc and free for each variable would be very slow. Hence, the Python interpreter uses a variety of optimized memory allocation schemes. The most important one is a malloc implementation called pymalloc, designed specifically to handle large numbers of small allocations. Any object that is smaller than 256 bytes uses this allocator, while anything larger uses the system's malloc. This implementation never returns memory to the operating system. Instead, it holds on to it in case it is needed again. This is efficient when it is used again in a short time, but is wasteful if a long time passes before it is needed.

So how can Python's memory usage sometimes decrease if it never releases small objects? Common Python objects, such as integers, floats, and lists, maintain their own, private memory pools which are not shared with any other type. Integers and floats use a policy similar to pymalloc: They never get released. Lists, on the other hand, are freed. Python only keeps a small, fixed number of unused list objects (currently 80), which limits the wasted memory. Most importantly, the actual array stored in the list is freed immediately, which is a significant amount of memory for big lists.

There are some serious inefficiencies in the interpreter's memory allocation policies. For example, applications like mine, always hold on to the peak amount of memory, even though the average memory consumed is much lower. This can be an issue for long running servers, such as Twisted, Zope or Plone servers, or applications where memory is critical. For example, Plone recommends using a caching server with lots of RAM, so having Python release memory that is not currently being used would be helpful in that configuration. Another problem is that the private memory pools are not shared between types. For example, if you allocate then free a large number of integers, the cached available memory cannot be used to allocate floats. I am attempting to rewrite Python's memory allocator, to try and make Python use memory more efficiently. I'll keep you posted on how it goes.



http://evanjones.ca/python-memory.html