The magic behind Attribute Access in Python

Most people know just one thing when it comes to attribute access: the dot ‘.’ (as in x.some_attribute). In simple terms, attribute access is the way you retrieve an object linked to the one you already have. To someone who uses Python without delving too much into the details, it may seem pretty straightforward. However, under the hood, there's a lot going on for this seemingly trivial task.

Let's look at each of the components one by one.

The __dict__ attribute

Most objects in Python have an attribute called __dict__ (instances of classes that use __slots__, covered below, are a notable exception). This dictionary or dictionary-like object (I will explain the difference shortly) contains the attributes defined on the object itself, mapping each attribute name to its value.

Here's an example:


>>> class C(object):
	x = 4

>>> c = C()
>>> c.y = 5
>>> c.__dict__
{'y': 5}

Notice how 'x' is not in c.__dict__. The reason is simple: y was defined on the object c, whereas x was defined on its class (C), so x appears in the __dict__ of C instead. In fact, C's __dict__ contains a lot of other keys too (including '__dict__'):


>>> c.__class__.__dict__['x']
4
>>> c.__class__.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'C' objects>, 'x': 4, 
'__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'C' objects>, 
'__doc__': None})

We will look at what dictproxy means soon.

The __dict__ of an object is simple enough to understand. It behaves like a Python dict, and is one too.


>>> c.__dict__
{'y': 5}
>>> c.__dict__.__class__
<type 'dict'>
>>> c.__dict__ = {}
>>> c.y

Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    c.y
AttributeError: 'C' object has no attribute 'y'
>>> c.__dict__['y'] = 5
>>> c.y
5

The __dict__ of a class, however, is not that straightforward. It's actually an object of a class called dictproxy: a special class whose objects behave like normal dicts but differ in some key behaviours.


>>> C.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'C' objects>, 'x': 4, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'C' objects>, '__doc__': None})
>>> C.__dict__.__class__
<type 'dictproxy'>
>>> C.__dict__['x']
4
>>> C.__dict__['x'] = 6

Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    C.__dict__['x'] = 6
TypeError: 'dictproxy' object does not support item assignment
>>> C.x = 6
>>> C.__dict__ = {}

Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    C.__dict__ = {}
AttributeError: attribute '__dict__' of 'type' objects is not writable

Notice how you cannot set a key in a dictproxy directly (C.__dict__['x'] = 6 does not work). You can accomplish the same thing with C.x = 6, however, because assignment on the class goes through the class's own attribute-setting machinery (type.__setattr__) rather than through the proxy. Also notice that you cannot replace the __dict__ attribute itself either (C.__dict__ = {} does not work).
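The transcripts above come from Python 2, where the class namespace shows up as dictproxy. In Python 3 the same read-only view is called mappingproxy and is exposed as types.MappingProxyType. A quick check of the same behaviour, assuming Python 3:

```python
import types

class C:
    x = 4

# The class namespace is a read-only proxy, not a plain dict.
print(type(C.__dict__).__name__)
print(isinstance(C.__dict__, types.MappingProxyType))

# Writing through the proxy is rejected...
try:
    C.__dict__['x'] = 6
except TypeError as exc:
    print("TypeError:", exc)

# ...but ordinary attribute assignment on the class goes through
# type.__setattr__, which updates the underlying namespace.
C.x = 6
print(C.__dict__['x'])
```
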

There's a reason behind this odd implementation. If you don't want to get into the details, just know that it's there to keep the Python interpreter working properly and to enable some optimizations. If you want a more detailed explanation, have a look at Scott H's answer to this StackOverflow question.

Descriptors

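A descriptor is an object that has at least one of the following magic methods: __get__, __set__ or __delete__ (remember, methods are ultimately objects in Python). Mind you, it's the object (an instance placed in a class) that acts as the descriptor. However, Python looks these methods up on the descriptor's class rather than on the instance, so they must be defined in that class; attaching a __get__ function to a single instance does not make it a descriptor.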

Descriptors let you define how an object's attribute behaves in Python. With each of the magic methods just mentioned, you implement how the attribute (‘described’ by the descriptor) is retrieved, set and deleted, respectively. There are two types of descriptors: Data Descriptors and Non-Data Descriptors.

Non-Data Descriptors only have __get__ defined. All others (those that define __set__ or __delete__) are Data Descriptors. You might wonder why the two types are named this way. The answer is intuitive: it is usually data-related attributes that we tend to ‘set’ or ‘delete’ on an object. Other attributes, like methods, we don't, so their descriptors are called Non-Data Descriptors. As with a lot of other things in Python, this is a convention rather than a hard-and-fast rule. You could just as well describe a method with a Data Descriptor, as long as its __get__ returns a callable.
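To make the classification concrete, here is a small sketch (assuming Python 3; describe, Example, Plain and Host are hypothetical names for illustration) that sorts objects the way attribute lookup does. It inspects type(attr) rather than attr, because the protocol methods are looked up on the descriptor's class:

```python
def describe(attr):
    """Classify attr by checking its *type* for the descriptor methods."""
    kind = type(attr)
    if hasattr(kind, '__set__') or hasattr(kind, '__delete__'):
        return 'data descriptor'
    if hasattr(kind, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'

class Example:
    def method(self):
        return 'called'

    @property
    def prop(self):
        return 'computed'

    @staticmethod
    def static():
        return 'static'

# Read the raw objects from the class namespace so no descriptor is triggered.
for name in ('method', 'prop', 'static'):
    print(name, '->', describe(Example.__dict__[name]))
print(42, '->', describe(42))

# Attaching __get__ to a single instance does not make it a descriptor.
class Plain:
    pass

p = Plain()
p.__get__ = lambda *args: 'never used'

class Host:
    attr = p

print(describe(p), Host().attr is p)
```

Plain functions come out as non-data descriptors and properties as data descriptors, which is why an instance attribute can shadow a method but not a property.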

Here's an example of two classes whose instances are data and non-data descriptors, respectively:


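class DataDesc(object):
    """Data descriptor: defines __get__, __set__ and __delete__."""
    def __init__(self, name):
        self._name = name  # label that tells descriptor objects apart in messages

    def __get__(self, obj, objclass=None):
        # obj is the instance being accessed (None for class access);
        # objclass is the class through which the attribute was looked up.
        try:
            print("Retrieving attr " + self._name + " from " +
                  str(obj) + "...")
            return objclass.x + " + " + obj.y
        except (AttributeError, TypeError):
            raise AttributeError("Attr " + self._name + " could not be " +
                                 "retrieved from " + str(obj))

    def __set__(self, obj, value):
        # Refuse every write, making the attribute read-only on instances.
        raise AttributeError("Attr " + self._name + " cannot be " +
                             "set in " + str(obj))

    def __delete__(self, obj):
        # Refuse deletion as well.
        raise AttributeError("Attr " + self._name + " cannot be " +
                             "deleted in " + str(obj))

class NonDataDesc(object):
    """Non-data descriptor: defines __get__ only."""
    def __init__(self, name):
        self._name = name

    def __get__(self, obj, objclass=None):
        try:
            print("Retrieving attr " + self._name + " from " +
                  str(obj) + "...")
            return objclass.x + " + " + obj.y
        except (AttributeError, TypeError):
            raise AttributeError("Attr " + self._name + " could not be " +
                                 "retrieved from " + str(obj))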

Notice how __get__ takes the instance obj and the class objclass through which the attribute was accessed. Similarly, __set__ takes obj and the candidate value, and __delete__ just needs obj. These parameters tell a single descriptor object which instance it is working on, while the name passed to __init__ helps you tell apart different objects of the same descriptor class. Mind you, it's these objects (not the classes DataDesc and NonDataDesc) that are the descriptors.
(P.S. If a descriptor doesn't define __get__, looking the attribute up returns the descriptor object itself, unless the instance's own __dict__ has an entry of the same name.)
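To see exactly what Python passes in, here is a minimal sketch (assuming Python 3; Recorder and Owner are hypothetical names) of a descriptor that simply returns its obj and objclass arguments:

```python
class Recorder:
    """Hypothetical non-data descriptor that reports the arguments it receives."""
    def __get__(self, obj, objclass=None):
        return (obj, objclass)

class Owner:
    attr = Recorder()   # one descriptor object shared by every Owner instance

o = Owner()
o2 = Owner()
print(o.attr)       # obj is o, objclass is Owner (instance access)
print(Owner.attr)   # obj is None, objclass is Owner (class access)
print(o2.attr[0] is o2)
```

The single Recorder object serves every instance; only the obj argument tells it which one is being accessed.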

Let's use these classes in some code.


class ParentClass(object):
    x = "x1"
    y = "y1"
    data_attr_parent = DataDesc("desc1")
    data_attr_child = DataDesc("desc2")

class ChildClass(ParentClass):
    x = "x2"
    y = "y2"
    data_attr_child = DataDesc("desc3")
    non_data_attr_child = NonDataDesc("desc4")

some_object = ChildClass()

That's it! You can access the ‘described’ attributes as usual in Python.


>>> some_object.data_attr_parent
Retrieving attr desc1 from <__main__.ChildClass object at 0x1062c5790>...
'x2 + y2'

Descriptors power a lot of attribute- and method-related functionality in Python, including static methods, class methods and properties. With descriptors, you gain finer control over how the attributes and methods of a class and its objects are accessed, including ‘behind the scenes’ functionality like logging.
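As an illustration of the logging use case, here is a sketch of a data descriptor that records every read and write (assuming Python 3.6+ for __set_name__; Logged, Account and the events list are hypothetical). It stores the value in the instance's __dict__ under the same name, which works only because a data descriptor on the class takes precedence over the instance dict on lookup:

```python
class Logged:
    """Hypothetical data descriptor that logs reads and writes of one attribute."""
    def __init__(self, log):
        self.log = log                  # any list used as a log sink

    def __set_name__(self, owner, name):
        self.name = name                # called automatically at class creation

    def __get__(self, obj, objclass=None):
        if obj is None:
            return self                 # class access returns the descriptor itself
        self.log.append(('get', self.name))
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        self.log.append(('set', self.name, value))
        obj.__dict__[self.name] = value

events = []

class Account:
    balance = Logged(events)

acct = Account()
acct.balance = 100
acct.balance += 50      # one read followed by one write
print(acct.balance)
print(events)
```
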

Now let's look at the high-level rules governing attribute access in Python.

The Rules

Quoting Shalabh Chaturvedi's book verbatim, the workflow for looking up objectname.attrname is as follows:

  1. If attrname is a special (i.e. Python-provided) attribute for objectname, return it.
  2. Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, return the descriptor result. Search all bases of objectname.__class__ for the same case.
  3. Check objectname.__dict__ for attrname, and return if found. If objectname is a class, search its bases too. If it is a class and a descriptor exists in it or its bases, return the descriptor result.
  4. Check objectname.__class__.__dict__ for attrname. If it exists and is a non-data descriptor, return the descriptor result. If it exists, and is not a descriptor, just return it. If it exists and is a data descriptor, we shouldn’t be here because we would have returned at point 2. Search all bases of objectname.__class__ for same case.
  5. Raise AttributeError
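Rules 2 to 5 can be modelled in plain Python for ordinary instances. The sketch below is a simplification, assuming Python 3: lookup and the demo classes are hypothetical, Rule 1's special attributes and __getattr__ fallbacks are ignored, and it follows CPython in stopping at the first class in the MRO that defines the name:

```python
def lookup(obj, name):
    """Simplified model of instance attribute lookup (Rules 2-5 above)."""
    objtype = type(obj)
    # Find the first class in the MRO whose namespace defines `name`.
    class_attr, found = None, False
    for klass in objtype.__mro__:
        if name in klass.__dict__:
            class_attr, found = klass.__dict__[name], True
            break
    kind = type(class_attr)

    # Rule 2: a data descriptor on the class wins over everything else.
    if found and hasattr(kind, '__get__') and (
            hasattr(kind, '__set__') or hasattr(kind, '__delete__')):
        return kind.__get__(class_attr, obj, objtype)

    # Rule 3: the instance's own __dict__.
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]

    # Rule 4: a non-data descriptor, or a plain class attribute.
    if found:
        if hasattr(kind, '__get__'):
            return kind.__get__(class_attr, obj, objtype)
        return class_attr

    # Rule 5.
    raise AttributeError(name)

class Data:
    def __get__(self, obj, objclass=None):
        return 'from data descriptor'
    def __set__(self, obj, value):
        raise AttributeError('read-only')

class NonData:
    def __get__(self, obj, objclass=None):
        return 'from non-data descriptor'

class Parent:
    inherited = Data()

class Child(Parent):
    data = Data()
    non_data = NonData()
    plain = 'class attribute'

    def method(self):
        return 'method called'

child = Child()
# Write straight into the instance dict, bypassing Data.__set__.
child.__dict__.update(data='ignored', non_data='shadowed', own='instance attribute')

for attr in ('inherited', 'data', 'non_data', 'plain', 'own'):
    print(attr, '->', lookup(child, attr), '|', getattr(child, attr))
```

The model agrees with the built-in getattr on each case, including the shadowed non-data descriptor and the ignored instance entry for the data descriptor.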

 

To make things clearer, here's some tinkering with the code we wrote in the Descriptors section (have a look at it again to be clear about things):

data_attr_child is a data descriptor in some_object's class, so you can't write over it. Also, the version in ChildClass (‘desc3’) is used, not the one in ParentClass.


>>> some_object.data_attr_child
Retrieving attr desc3 from <__main__.ChildClass object at 0x1110c9790>...
'x2 + y2'
>>> some_object.data_attr_child = 'xyz'

Traceback (most recent call last):
  File "<pyshell#112>", line 1, in <module>
    some_object.data_attr_child = 'xyz'
  File "/Users/srjoglekar/metaclasses.py", line 16, in __set__
    "set in " + str(obj))
AttributeError: Attr desc3 cannot be set in <__main__.ChildClass object at 0x10883f790>

In fact, even if you make an entry directly in some_object's __dict__, it still won't matter (as per Rule 2).


>>> some_object.__dict__['data_attr_child'] = 'xyz'
>>> some_object.data_attr_child
Retrieving attr desc3 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'

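The non-data descriptor attribute, on the other hand, can easily be overwritten, because the instance's __dict__ (Rule 3) is checked before non-data descriptors on the class (Rule 4).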


>>> some_object.non_data_attr_child
Retrieving attr desc4 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
>>> some_object.non_data_attr_child = 'xyz'
>>> some_object.non_data_attr_child
'xyz'
>>> some_object.__dict__
{'data_attr_child': 'xyz', 'non_data_attr_child': 'xyz'}

You can, however, change the behaviour of data_attr_child by rebinding it on some_object's class, which replaces the entry in that class's dictproxy.


>>> some_object.__class__.data_attr_child = 'abc'
>>> some_object.data_attr_child
'xyz'

Notice how, the moment you replace the data descriptor in the class with a non-data descriptor (or, as in this case, a plain object such as a string), the entry we made earlier in some_object's __dict__ comes into play, since Rule 3 now applies before Rule 4. Therefore, some_object.data_attr_child returns 'xyz', not 'abc'.

The data_attr_parent attribute behaves similarly to data_attr_child.


>>> some_object.data_attr_parent
Retrieving attr desc1 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
>>> some_object.data_attr_parent = 'xyz'

Traceback (most recent call last):
  File "<pyshell#127>", line 1, in <module>
    some_object.data_attr_parent = 'xyz'
  File "/Users/srjoglekar/metaclasses.py", line 16, in __set__
    "set in " + str(obj))
AttributeError: Attr desc1 cannot be set in <__main__.ChildClass object at 0x10883f790>
>>> some_object.__class__.data_attr_parent = 'xyz'
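>>> some_object.data_attr_parent
'xyz'

Notice how you can, however, ‘write over’ data_attr_parent in ChildClass itself. Once you do that, looking it up on some_object goes through Rules 1-2-3 and stops at 4 to get the result 'xyz': the plain string in ChildClass shadows the data descriptor inherited from ParentClass.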

Rules for Setting Attributes

These are much simpler than the rules for getting attributes. Quoting Shalabh's book again:

  1. Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, use the descriptor to set the value. Search all bases of objectname.__class__ for the same case.
  2. Insert something into objectname.__dict__ for key "attrname".
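The two setting rules can be modelled the same way. This is a simplified sketch, assuming Python 3 and an ordinary instance with a __dict__ (no custom __setattr__, no __slots__); assign, Positive and Item are hypothetical names. Like CPython, it only consults the first class in the MRO that defines the name:

```python
def assign(obj, name, value):
    """Simplified model of instance attribute assignment (the two rules above)."""
    # Rule 1: a data descriptor found on the class handles the write.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            class_attr = klass.__dict__[name]
            if hasattr(type(class_attr), '__set__'):
                type(class_attr).__set__(class_attr, obj, value)
                return
            break   # the first match is not a data descriptor: it doesn't intercept writes
    # Rule 2: otherwise store the value in the instance's own namespace.
    obj.__dict__[name] = value

class Positive:
    """Hypothetical data descriptor that rejects non-positive values."""
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objclass=None):
        return self if obj is None else obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(self.name + ' must be positive')
        obj.__dict__[self.name] = value

class Item:
    price = Positive()
    label = 'default label'

item = Item()
assign(item, 'price', 10)        # routed through Positive.__set__
assign(item, 'label', 'widget')  # no data descriptor, so it lands in item.__dict__
print(item.__dict__)
```
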

That's it! :-)

__slots__

To put it concisely, __slots__ is a way to stop objects from having their own __dict__ in Python. This means that if you define __slots__ in a class (and every base class either defines __slots__ too or is object), you cannot set arbitrary attributes (apart from the ones named in __slots__) on its objects.

Here's an example of such a class:


class SomeClass(object):
    __slots__ = ['x', 'y']

obj = SomeClass()

Now see how this behaves:


>>> obj.x = 4
>>> obj.y = 5
>>> obj.x
4
>>> obj.y
5
>>> obj.z = 6

Traceback (most recent call last):
  File "<pyshell#135>", line 1, in <module>
    obj.z = 6
AttributeError: 'SomeClass' object has no attribute 'z'

You can, of course, do this:


>>> obj.__class__.z = 6
>>> obj.z
6

But remember that you have now defined z in SomeClass's __dict__, not in obj's.

As Guido van Rossum himself mentions in his blog post, __slots__ was added to Python for efficiency, not for ‘stricter’ attribute setting. The basic intuition is this: suppose you have a class whose objects you intend to create in large numbers. You don't really need the flexibility of ‘dynamic’ attributes on the objects themselves, but you do want efficiency. Since __slots__ eliminates the per-object __dict__, you save a lot of memory this way.
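A rough way to observe the saving, assuming Python 3 on CPython (WithDict and WithSlots are hypothetical names): sys.getsizeof is shallow, so the per-instance dict has to be counted separately. The exact byte counts vary between Python versions and platforms, so treat the numbers as indicative only:

```python
import sys

class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

a = WithDict(1, 2)
b = WithSlots(1, 2)

# Shallow sizes: add the instance dict explicitly for the regular class.
dict_total = sys.getsizeof(a) + sys.getsizeof(a.__dict__)
slots_total = sys.getsizeof(b)
print('with __dict__: ', dict_total, 'bytes')
print('with __slots__:', slots_total, 'bytes')
print(hasattr(b, '__dict__'))   # False: no per-instance namespace at all
```
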

Interestingly, slots are implemented using descriptors in Python.
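You can see this directly, assuming Python 3: each name in __slots__ becomes a member_descriptor in the class namespace, and since it defines __set__ it is a data descriptor. The same sketch shows why the class-level z from the earlier example cannot be reassigned on obj: the class attribute is not a data descriptor, and there is no instance __dict__ to fall back on.

```python
class SomeClass:
    __slots__ = ['x', 'y']

slot = SomeClass.__dict__['x']
print(type(slot).__name__)             # member_descriptor
print(hasattr(type(slot), '__set__'))  # True: slots are data descriptors

obj = SomeClass()
slot.__set__(obj, 4)                   # same effect as obj.x = 4
print(obj.x, slot.__get__(obj, SomeClass))

# A plain class attribute gives instances nowhere to store their own value.
SomeClass.z = 6
try:
    obj.z = 7
except AttributeError as exc:
    print('AttributeError:', exc)
```
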

 

Further Reading

Have a look at this book, which I have already quoted in the post. It goes into a lot of detail regarding attribute access in Python, including method resolution.

That's all for now. Cheers!
