Most people know just one thing when it comes to attribute access – the dot ‘.’ (as in x.some_attribute). In simple terms, attribute access is the way you retrieve an object linked to the one you already have. To someone who uses Python without delving too much into the details, it may seem pretty straightforward. However, under the hood, there's a lot that goes on for this seemingly trivial task.
Let's look at each of the components one by one.
The __dict__ attribute
Most objects in Python have an attribute denoted by __dict__. This dictionary/dictionary-like (I will explain this shortly) object contains all the attributes defined for the object itself. It maps the attribute name to its value.
Here's an example:
>>> class C(object):
...     x = 4
...
>>> c = C()
>>> c.y = 5
>>> c.__dict__
{'y': 5}
Notice how 'x' is not in c.__dict__. The reason for this is simple enough. While y was defined for the object c, x was defined for its class (C). Therefore, it will appear in the __dict__ of C. In fact, C's __dict__ contains a lot of other keys too (including '__dict__'):
>>> c.__class__.__dict__['x']
4
>>> c.__class__.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'C' objects>, 'x': 4,
'__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'C' objects>,
'__doc__': None})
We will look at what dict_proxy (the repr of a dictproxy object) means soon.
The __dict__ of an object is simple enough to understand. It behaves like a Python dict, and is one too.
>>> c.__dict__
{'y': 5}
>>> c.__dict__.__class__
<type 'dict'>
>>> c.__dict__ = {}
>>> c.y
Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    c.y
AttributeError: 'C' object has no attribute 'y'
>>> c.__dict__['y'] = 5
>>> c.y
5
The __dict__ of a class, however, is not that straightforward. It's actually an object of a class called dictproxy. dictproxy is a special class whose objects behave like normal dicts, but they differ in some key behaviours.
>>> C.__dict__
dict_proxy({'__dict__': <attribute '__dict__' of 'C' objects>, 'x': 4, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'C' objects>, '__doc__': None})
>>> C.__dict__.__class__
<type 'dictproxy'>
>>> C.__dict__['x']
4
>>> C.__dict__['x'] = 6
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    C.__dict__['x'] = 6
TypeError: 'dictproxy' object does not support item assignment
>>> C.x = 6
>>> C.__dict__ = {}
Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    C.__dict__ = {}
AttributeError: attribute '__dict__' of 'type' objects is not writable
Notice how you cannot set a key in a dictproxy directly (C.__dict__['x'] = 6 does not work). You can accomplish the same using C.x = 6, however, since the internal behaviour then is different. Also notice how you cannot set the __dict__ attribute itself either (C.__dict__ = {} does not work).
There's a reason behind this weird implementation. If you don’t want to get into the details, just know that it's for the Python interpreter to keep working properly, and to enforce some optimizations. If you want a more detailed explanation, have a look at Scott H’s answer to this StackOverflow question.
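The sessions above come from Python 2. As a rough sketch of the same behaviour on Python 3, where (as far as I know) the proxy type is called mappingproxy and is exposed as types.MappingProxyType, the snippet below redefines the class C from the example and records what happens instead of printing tracebacks. The variable names proxy_error, rebind_error and value_seen_through_proxy are made up for this illustration.

```python
import types


class C(object):
    x = 4


# On Python 3 the class namespace is exposed through a read-only proxy.
is_proxy = isinstance(C.__dict__, types.MappingProxyType)

# Writing a key through the proxy is rejected ...
try:
    C.__dict__['x'] = 6
except TypeError as exc:
    proxy_error = exc
else:
    proxy_error = None

# ... but ordinary attribute assignment on the class works, and the
# new value is visible through the proxy afterwards.
C.x = 6
value_seen_through_proxy = C.__dict__['x']

# Rebinding __dict__ itself is refused as well.
try:
    C.__dict__ = {}
except (AttributeError, TypeError) as exc:
    rebind_error = exc
else:
    rebind_error = None
```

The proxy is only a view: it always reflects the current class namespace, which is why the value set through C.x shows up in it.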
Descriptors
A descriptor is an object that has at least one of the following magic methods in its attributes: __get__, __set__ or __delete__ (remember, methods are ultimately objects in Python). Mind you, it's the object we are talking about, not the class that produced it. Python does look these methods up on the object's type, though, so in practice they are defined in its class (or inherited from a base).
Descriptors can help you define the behaviour of an object’s attribute in Python. With each of the magic methods just mentioned, you implement how the attribute (‘described’ by the descriptor) will be retrieved, set and deleted in the object respectively. There are two types of descriptors – Data Descriptors, and Non-Data Descriptors.
Non-Data Descriptors only have __get__ defined. All others are Data Descriptors. You would naturally wonder why these two types are called so. The answer is intuitive. Usually, it's data-related attributes that we tend to ‘set’ or ‘delete’ with respect to an object. Other attributes, like methods themselves, we don’t. So their descriptors are called Non-Data Descriptors. As with a lot of other things in Python, this is not a hard-and-fast rule, but a convention. You could just as well describe a method with a Data Descriptor. But then, its __get__ should return a function.
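To make the two categories concrete, here is a small sketch of a helper that sorts objects into them. The name classify is hypothetical (it is not a standard library function), and the check is deliberately simplified: it only asks whether the object's type provides the protocol methods.

```python
def classify(attr):
    """Return 'data', 'non-data' or 'not a descriptor' for attr."""
    kind = type(attr)  # the protocol methods are looked up on the type
    if hasattr(kind, "__set__") or hasattr(kind, "__delete__"):
        return "data"
    if hasattr(kind, "__get__"):
        return "non-data"
    return "not a descriptor"


def some_function():
    pass


# A few built-in objects, classified with the helper above.
labels = {
    "property": classify(property()),
    "function": classify(some_function),
    "staticmethod": classify(staticmethod(some_function)),
    "string": classify("xyz"),
}
```

Plain functions land in the non-data group, which matches the convention described above: methods are usually not something you ‘set’ or ‘delete’ through a descriptor.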
Here's an example of two classes that will come up with data and non-data descriptor objects respectively:
class DataDesc(object):
    # Defines __get__, __set__ and __delete__: its objects are data descriptors.
    def __init__(self, name):
        self._name = name

    def __get__(self, obj, objclass):
        try:
            print("Retrieving attr " + self._name + " from " +
                  str(obj) + "...")
            return objclass.x + " + " + obj.y
        except Exception:
            raise AttributeError("Attr " + self._name + " could not be " +
                                 "retrieved from " + str(obj))

    def __set__(self, obj, value):
        # Refuse every assignment made through the instance.
        raise AttributeError("Attr " + self._name + " cannot be " +
                             "set in " + str(obj))

    def __delete__(self, obj):
        # Refuse every deletion made through the instance.
        raise AttributeError("Attr " + self._name + " cannot be " +
                             "deleted in " + str(obj))


class NonDataDesc(object):
    # Defines only __get__: its objects are non-data descriptors.
    def __init__(self, name):
        self._name = name

    def __get__(self, obj, objclass):
        try:
            print("Retrieving attr " + self._name + " from " +
                  str(obj) + "...")
            return objclass.x + " + " + obj.y
        except Exception:
            raise AttributeError("Attr " + self._name + " could not be " +
                                 "retrieved from " + str(obj))
Notice how the __get__ function takes in an object obj and (its) class objclass. Similarly, setting the value requires obj and some candidate value. Deletion just needs obj. Taking these parameters in (along with the initializer __init__) helps you differentiate between objects of the same descriptor class. Mind you, it's the objects that are intended to be the descriptors.
(P.S. If you don’t define the __get__ method for a descriptor, the descriptor object itself will get returned).
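A quick way to see the P.S. in action is a descriptor that only defines __set__. The classes SetOnly and Holder below are invented for this sketch; the point is only that, with no __get__ and no matching entry in the instance's __dict__, reading the attribute hands back the descriptor object.

```python
class SetOnly(object):
    """Hypothetical data descriptor with __set__ but no __get__."""

    def __set__(self, obj, value):
        # Store the value under a different key in the instance dict.
        obj.__dict__["_stored"] = value


class Holder(object):
    attr = SetOnly()


holder = Holder()
descriptor = Holder.__dict__["attr"]

# Assignment is routed through SetOnly.__set__ ...
holder.attr = 5

# ... but reading finds no __get__, so the descriptor itself comes back,
# both through the instance and through the class.
result_on_instance = holder.attr
result_on_class = Holder.attr
```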
Let's use these classes in some code.
class ParentClass(object):
    x = "x1"
    y = "y1"
    data_attr_parent = DataDesc("desc1")
    data_attr_child = DataDesc("desc2")


class ChildClass(ParentClass):
    x = "x2"
    y = "y2"
    data_attr_child = DataDesc("desc3")
    non_data_attr_child = NonDataDesc("desc4")


some_object = ChildClass()
That's it! You can access the ‘described’ objects as usual in Python.
>>> some_object.data_attr_parent
Retrieving attr desc1 from <__main__.ChildClass object at 0x1062c5790>...
'x2 + y2'
Descriptors are used for a lot of attribute and method related functionality in Python, including static methods, class methods and properties. Using descriptors, you can gain better control over how attributes and methods of a class/its objects are accessed – including defining some ‘behind the scenes’ functionality like logging.
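As a small illustration of descriptors at work in everyday Python, the sketch below pulls a method and a property out of a class __dict__ and calls their __get__ by hand. The Greeter class is made up for this example; the behaviour shown is the standard one for functions and property objects.

```python
class Greeter(object):
    def __init__(self, name):
        self._name = name

    def greet(self):
        return "Hello, " + self._name

    @property
    def name(self):
        return self._name


g = Greeter("Ada")

# A plain function stored in the class is a non-data descriptor:
# calling its __get__ by hand produces a method bound to g.
raw_function = Greeter.__dict__["greet"]
bound = raw_function.__get__(g, Greeter)

# A property is a data descriptor; its __get__ runs the getter.
raw_property = Greeter.__dict__["name"]
name_via_descriptor = raw_property.__get__(g, Greeter)
```

Here bound() gives the same result as g.greet(), and name_via_descriptor is what g.name would return.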
Now let's look at the high-level rules governing attribute access in Python.
The Rules
Quoting Shalabh Chaturvedi’s book verbatim, the workflow is as follows:
1. If attrname is a special (i.e. Python-provided) attribute for objectname, return it.
2. Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, return the descriptor result. Search all bases of objectname.__class__ for the same case.
3. Check objectname.__dict__ for attrname, and return if found. If objectname is a class, search its bases too. If it is a class and a descriptor exists in it or its bases, return the descriptor result.
4. Check objectname.__class__.__dict__ for attrname. If it exists and is a non-data descriptor, return the descriptor result. If it exists, and is not a descriptor, just return it. If it exists and is a data descriptor, we shouldn’t be here because we would have returned at point 2. Search all bases of objectname.__class__ for same case.
5. Raise AttributeError.
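The workflow above can be sketched in plain Python for the common case where objectname is an ordinary instance. This is a simplified model, not the interpreter's actual code: it skips the special attributes of the first rule, ignores hooks such as __getattr__, and the names find_in_mro, instance_getattr and Demo are invented for the illustration.

```python
_MISSING = object()  # sentinel: "no such name on the class or its bases"


def find_in_mro(cls, name):
    """Return the first class-level entry for name, searching the bases."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def instance_getattr(obj, name):
    """Simplified attribute lookup for an ordinary instance."""
    cls = type(obj)
    class_attr = find_in_mro(cls, name)
    attr_type = type(class_attr)

    # Data descriptors found on the class (or its bases) win.
    if class_attr is not _MISSING:
        is_data = (hasattr(attr_type, "__set__") or
                   hasattr(attr_type, "__delete__"))
        if is_data and hasattr(attr_type, "__get__"):
            return attr_type.__get__(class_attr, obj, cls)

    # Next comes the instance's own __dict__.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]

    # Then non-data descriptors and plain class attributes.
    if class_attr is not _MISSING:
        if hasattr(attr_type, "__get__"):
            return attr_type.__get__(class_attr, obj, cls)
        return class_attr

    # Nothing matched.
    raise AttributeError(name)


class Demo(object):
    plain = "class value"

    @property
    def locked(self):
        return "from property"

    def method(self):
        return "called"


d = Demo()
d.__dict__["locked"] = "instance value"  # ignored: the property wins
d.__dict__["plain"] = "instance value"   # used: beats a plain class attr
```

With this setup, instance_getattr(d, "locked") agrees with d.locked, and instance_getattr(d, "plain") returns the instance entry rather than the class one.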
To make things clearer, here's some tinkering using the code we wrote in the Descriptors section (have a look at it again just to be clear about things):
data_attr_child is a Data descriptor in some_object’s class. So you can't write over it. Also, the version in ChildClass (‘desc3’) is used, not the one in ParentClass.
>>> some_object.data_attr_child
Retrieving attr desc3 from <__main__.ChildClass object at 0x1110c9790>...
'x2 + y2'
>>> some_object.data_attr_child = 'xyz'
Traceback (most recent call last):
  File "<pyshell#112>", line 1, in <module>
    some_object.data_attr_child = 'xyz'
  File "/Users/srjoglekar/metaclasses.py", line 16, in __set__
    "set in " + str(obj))
AttributeError: Attr desc3 cannot be set in <__main__.ChildClass object at 0x10883f790>
In fact, even if you make an appropriate entry in some_object’s __dict__, it still won’t matter (as per Rule 2, the data descriptor in the class is consulted before the object's own __dict__).
>>> some_object.__dict__['data_attr_child'] = 'xyz'
>>> some_object.data_attr_child
Retrieving attr desc3 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
The Non-Data Descriptor attribute, on the other hand, can be easily overwritten.
>>> some_object.non_data_attr_child
Retrieving attr desc4 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
>>> some_object.non_data_attr_child = 'xyz'
>>> some_object.non_data_attr_child
'xyz'
>>> some_object.__dict__
{'data_attr_child': 'xyz', 'non_data_attr_child': 'xyz'}
You can, however, change the behaviour of data_attr_child if you go to some_object’s class and replace the attribute there.
>>> some_object.__class__.data_attr_child = 'abc'
>>> some_object.data_attr_child
'xyz'
Notice how the moment you replace the Data-Descriptor in the class with some non-data descriptor (or some object like a String in this case), the entry that we initially made in some_object’s __dict__ comes into play. Therefore, some_object.data_attr_child returns 'xyz', not 'abc'.
The data_attr_parent attribute behaves similarly to data_attr_child.
>>> some_object.data_attr_parent
Retrieving attr desc1 from <__main__.ChildClass object at 0x10883f790>...
'x2 + y2'
>>> some_object.data_attr_parent = 'xyz'
Traceback (most recent call last):
  File "<pyshell#127>", line 1, in <module>
    some_object.data_attr_parent = 'xyz'
  File "/Users/srjoglekar/metaclasses.py", line 16, in __set__
    "set in " + str(obj))
AttributeError: Attr desc1 cannot be set in <__main__.ChildClass object at 0x10883f790>
>>> some_object.__class__.data_attr_parent = 'xyz'
>>> some_object.__class__.data_attr_parent
'xyz'
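Notice how you can ‘write over’ data_attr_parent in ChildClass itself. Once you do that, a lookup through some_object goes through Rules 1-2-3 and stops at 4, to get the result 'xyz'. The lookup on the class shown above is resolved at Rule 3, since ChildClass is then the object being searched.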
Rules for Setting Attributes
These are way simpler than the rules for ‘getting’ them. Quoting Shalabh’s book again,
1. Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, use the descriptor to set the value. Search all bases of objectname.__class__ for the same case.
2. Insert something into objectname.__dict__ for key "attrname".
That's it! :-).
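The two setting rules fit in a few lines of Python. As before, this is a simplified sketch for ordinary instances rather than the interpreter's real implementation (for instance, it does not handle descriptors that define __delete__ without __set__), and the names instance_setattr and Account are hypothetical.

```python
def instance_setattr(obj, name, value):
    """Simplified model of `obj.name = value` for an ordinary instance."""
    for klass in type(obj).__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            if hasattr(type(attr), "__set__"):
                # A data descriptor on the class handles the assignment.
                type(attr).__set__(attr, obj, value)
                return
            break  # first match is not a data descriptor: fall through
    # Otherwise the value simply lands in the instance's own dict.
    obj.__dict__[name] = value


class Account(object):
    def __init__(self):
        self._balance = 0

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        self._balance = int(value)


acct = Account()
instance_setattr(acct, "balance", "42")  # routed through the property
instance_setattr(acct, "owner", "Ada")   # stored in acct.__dict__
```

After these two calls, acct.balance has gone through the property setter, while owner is a plain entry in the instance's __dict__.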
__slots__
To put it concisely, __slots__ is a way to disallow objects from having their own __dict__ in Python. This means that if you define __slots__ in a class, then you cannot set arbitrary attributes (apart from the ones mentioned in the ‘slots’) on its objects.
Here's an example of such a class:
class SomeClass(object):
    __slots__ = ['x', 'y']

obj = SomeClass()
Now see how this behaves:
>>> obj.x = 4
>>> obj.y = 5
>>> obj.x
4
>>> obj.y
5
>>> obj.z = 6
Traceback (most recent call last):
  File "<pyshell#135>", line 1, in <module>
    obj.z = 6
AttributeError: 'SomeClass' object has no attribute 'z'
You can of course do this:
>>> obj.__class__.z = 6
>>> obj.z
6
But then, remember you have now defined z in SomeClass’s __dict__, not on obj itself.
As Guido van Rossum himself mentions in his blog post, __slots__ was implemented in Python to introduce efficiency, not ‘stricter’ attribute-setting. The basic intuition is this: suppose you have a class whose objects you intend to construct in large numbers. You don’t really need the flexibility of having ‘dynamic’ attributes on the objects themselves, but you want efficiency. Since __slots__ essentially eliminates the __dict__ attribute in each one of the objects, you get a lot of memory savings this way.
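A minimal sketch of the difference, using two made-up classes (WithDict and WithSlots) that are identical apart from the __slots__ declaration. It only checks for the presence of a per-object __dict__; actual memory numbers depend on the interpreter and version, so none are claimed here.

```python
class WithDict(object):
    def __init__(self):
        self.x = 4
        self.y = 5


class WithSlots(object):
    __slots__ = ['x', 'y']

    def __init__(self):
        self.x = 4
        self.y = 5


plain = WithDict()
slotted = WithSlots()

# The ordinary object carries its own dictionary; the slotted one does not.
has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")
```

If you want to measure the saving yourself, something like sys.getsizeof on the instance and on its __dict__ is a reasonable starting point.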
Interestingly, slots are implemented using descriptors in Python.
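You can peek at those descriptors yourself. The sketch below assumes CPython, where each name listed in __slots__ shows up in the class __dict__ as a small descriptor object; other implementations may represent slots differently. It drives the descriptor for x by hand instead of using the dot.

```python
class SomeClass(object):
    __slots__ = ['x', 'y']


# Each slot name appears in the class namespace as a descriptor object.
slot_x = SomeClass.__dict__['x']
slot_type = type(slot_x)

obj = SomeClass()

# Set and read the slot through the descriptor protocol directly.
slot_type.__set__(slot_x, obj, 4)
value = slot_type.__get__(slot_x, obj, SomeClass)
```

Because the slot descriptor defines __set__, it is a Data Descriptor, which fits the rules for setting attributes described earlier: obj.x = 4 is handled by the descriptor, with no per-object __dict__ involved.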
Further Reading
Have a look at this book, which I have already quoted in the post. It goes into a lot of detail regarding attribute access in Python, including method resolution.
That's all for now. Cheers!