Recently, I tried to build a trie in Python. My data has 3M+ distinct words, and the trie ate 2G+ of my memory. That was quite disappointing, so I searched online for solutions, and the greatest improvement I found was using __slots__. It saved me about 50% of the memory, so I think leaving it out is a damn pitfall. Watch out, really.

 

Example:

 

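NULL = object()  # sentinel meaning "no value stored"; assumed here, the original snippet never defines NULL

class Node(object):
    '''Trie node without __slots__: every instance carries its own __dict__.'''

    ChildrenFactory = dict

    def __init__(self, value=NULL):
        self.value = value
        self.children = self.ChildrenFactory()
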
The same node with __slots__:

class Node(object):
    '''
    :ivar value: The value of the key corresponding to this node or :const:`NULL`
        if there is no such key.
    :ivar children: A ``{key-part : child-node}`` mapping.
    '''
    __slots__ = ('value', 'children')

    ChildrenFactory = dict

    def __init__(self, value=NULL):
        self.value = value
        self.children = self.ChildrenFactory()
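To see the effect on a trie like the one described above, here is a minimal sketch that builds the same trie twice, once with each node variant, and measures the memory allocated while building it with the standard-library tracemalloc module. The class names PlainNode and SlotNode, the helpers insert, lookup and build_and_measure, the NULL sentinel and the synthetic word list are all hypothetical, chosen for illustration. The exact saving depends on the Python version (recent CPython releases store ordinary instance attributes more compactly than older ones) and on how much memory goes to the children dicts, which __slots__ does not shrink, so the ratio may well differ from the 50% I saw.

```python
import tracemalloc

NULL = object()  # hypothetical "no value" sentinel


class PlainNode(object):
    '''Trie node without __slots__ (each instance gets a __dict__).'''
    ChildrenFactory = dict

    def __init__(self, value=NULL):
        self.value = value
        self.children = self.ChildrenFactory()


class SlotNode(object):
    '''Same node with __slots__ (no per-instance __dict__).'''
    __slots__ = ('value', 'children')
    ChildrenFactory = dict

    def __init__(self, value=NULL):
        self.value = value
        self.children = self.ChildrenFactory()


def insert(root, word, value):
    '''Walk (creating as needed) one node per character; store value at the end.'''
    node = root
    for ch in word:
        child = node.children.get(ch)
        if child is None:
            # Create children of the same class as the root.
            child = node.children[ch] = type(root)()
        node = child
    node.value = value


def lookup(root, word):
    '''Return the value stored for word, or NULL if the word is absent.'''
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return NULL
    return node.value


def build_and_measure(node_cls, words):
    '''Build a trie from words; return (root, bytes still allocated after building).'''
    tracemalloc.start()
    root = node_cls()
    for i, w in enumerate(words):
        insert(root, w, i)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return root, current


# Synthetic stand-in for a real word list.
words = ['w%d' % i for i in range(20000)]
plain_root, plain_bytes = build_and_measure(PlainNode, words)
slot_root, slot_bytes = build_and_measure(SlotNode, words)
print('without __slots__:', plain_bytes, 'bytes')
print('with __slots__:   ', slot_bytes, 'bytes')
```

Both tries hold identical data, so any difference in the two numbers comes from the node objects themselves.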

 

 

So what is __slots__ doing here?

 

Quoting Jacob Hallen:

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots. While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.
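A minimal sketch of the "static structure" described in the quote, using two hypothetical classes, WithDict and WithSlots: the slotted instance has no __dict__ at all, and assigning an attribute that is not listed in __slots__ fails.

```python
class WithDict(object):
    def __init__(self):
        self.value = None


class WithSlots(object):
    __slots__ = ('value',)

    def __init__(self):
        self.value = None


plain = WithDict()
slotted = WithSlots()

# The plain instance carries its own attribute dictionary...
print(hasattr(plain, '__dict__'))    # True
# ...the slotted one does not: 'value' lives in a fixed slot on the instance.
print(hasattr(slotted, '__dict__'))  # False

# New attributes can be added to the plain object at any time...
plain.extra = 1

# ...but not to the slotted one, because there is no dict to put them in.
try:
    slotted.extra = 1
except AttributeError as exc:
    print('AttributeError:', exc)
```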

Unfortunately there is a side effect to slots. They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies. This is bad, because the control freaks should be abusing the metaclasses and the static typing weenies should be abusing decorators, since in Python, there should be only one obvious way of doing something.
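Refusing new attributes is not the only behavior change. Two others that are easy to trip over are sketched below with hypothetical classes: instances of a slotted class cannot be weakly referenced unless '__weakref__' is listed in __slots__, and a subclass that does not declare its own __slots__ gets a __dict__ back, along with the per-instance overhead.

```python
import weakref


class Slotted(object):
    __slots__ = ('value',)


class SlottedWeak(object):
    # Listing '__weakref__' explicitly restores weak-reference support.
    __slots__ = ('value', '__weakref__')


class Child(Slotted):
    # No __slots__ here, so Child instances get a __dict__ again
    # and the memory saving is lost for them.
    pass


# Weak references to a plain slotted instance are rejected.
try:
    weakref.ref(Slotted())
except TypeError as exc:
    print('TypeError:', exc)

obj = SlottedWeak()
r = weakref.ref(obj)
print(r() is obj)                  # True

child = Child()
child.anything = 42                # allowed: Child instances have a __dict__
print(hasattr(child, '__dict__'))  # True
```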

Making CPython smart enough to handle saving space without __slots__ is a major undertaking, which is probably why it is not on the list of changes for P3k (yet).

Tags: python
