This article is a write-up of the talk presented at PyCon JP 2014, held on September 12-14, 2014.
A descriptor is an object that defines one or more of the following methods.
class Descriptor(object):
    def __get__(self, obj, type=None): pass
    def __set__(self, obj, value): pass
    def __delete__(self, obj): pass
In Python, a set of methods that an object with a particular property should implement is called a protocol (a typical example is the Iterator Protocol, http://docs.python.jp/3.4/library/stdtypes.html#typeiter). Descriptors are one such protocol.
Descriptors are used behind basic Python functionality such as properties, methods (static methods, class methods, instance methods), and `super`. The descriptor protocol is also general-purpose, so you can define your own descriptors.
There are two main types of descriptors.
- Data descriptor: defines both `__get__` and `__set__`
- Non-data descriptor: defines only `__get__`
Data descriptors behave like ordinary attribute access; properties are the typical example. Non-data descriptors are typically used for method calls.
Data descriptors that raise `AttributeError` when `__set__` is called are called "read-only data descriptors". Read-only properties for which `fset` is not defined are classified as read-only data descriptors rather than non-data descriptors.
This classification affects the priority of attribute access. Specifically, the order of priority is as follows.
1. Data descriptors
2. The instance's attribute dictionary
3. Non-data descriptors
We will see later why this is so.
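The priority order can be checked with a small experiment. The sketch below uses two hypothetical descriptor classes, `DataDesc` and `NonDataDesc` (names invented for this illustration), and writes competing values directly into the instance's `__dict__`.

```python
class DataDesc(object):
    """Hypothetical data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, type=None):
        return 'data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class NonDataDesc(object):
    """Hypothetical non-data descriptor: defines only __get__."""
    def __get__(self, obj, type=None):
        return 'non-data descriptor'


class C(object):
    a = DataDesc()
    b = NonDataDesc()


obj = C()
# Bypass normal attribute assignment and write into the instance dictionary.
obj.__dict__['a'] = 'instance dict'
obj.__dict__['b'] = 'instance dict'

print(obj.a)  # the data descriptor wins over the instance dictionary
print(obj.b)  # the instance dictionary wins over the non-data descriptor
```

Note that `DataDesc` is also a read-only data descriptor in the sense described above, since its `__set__` always raises `AttributeError`.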
At this point, you may be wondering how descriptors and properties are different.
First, their purposes differ. Properties are usually used as decorators inside a class definition to customize attribute access for instances of that class. Descriptors, on the other hand, are defined independently of any particular class and are used to customize attribute access for other classes.
More fundamentally, a property is one kind of descriptor. In other words, descriptors have a wider range of applications, and properties can be seen as a specialization for the most common use of descriptors.
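To make the contrast concrete, here is a minimal sketch of a descriptor that is defined independently of any class and then reused by two unrelated classes. The `Positive` class, its `name` parameter, and the `Order` and `Circle` classes are all hypothetical names chosen for this illustration.

```python
class Positive(object):
    """Hypothetical descriptor that only accepts positive numbers."""
    def __init__(self, name):
        # Key under which the value is stored in the instance dictionary.
        self.name = name

    def __get__(self, obj, type=None):
        if obj is None:
            return self  # accessed via the class
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError('%s must be positive' % self.name)
        obj.__dict__[self.name] = value


class Order(object):
    quantity = Positive('quantity')


class Circle(object):
    radius = Positive('radius')


order = Order()
order.quantity = 3
circle = Circle()
circle.radius = 1.5
```

With `property`, the same validation logic would have to be written again for each attribute in each class.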
X.Y
If you write `X.Y` in your source code, what happens behind the scenes is complicated, contrary to its simple appearance. In fact, what happens depends on whether `X` is a class or an instance, and whether `Y` is a property, a method, or a regular attribute.
For an instance attribute, it means looking up the value for the given key in the instance's attribute dictionary `__dict__`.
class C(object):
    def __init__(self):
        self.x = 1
obj = C()
assert obj.x == obj.__dict__['x']
For class attributes, it means referencing values from the class's attribute dictionary, both via the class and via the instance.
class C(object):
    x = 1
assert C.x == C.__dict__['x']
obj = C()
assert obj.x == C.__dict__['x']
So far the story is easy.
In the case of a property, you get the property itself when it is referenced from the class, and the return value of the function when it is referenced from an instance.
class C(object):
    @property
    def x(self):
        return 1
assert C.x == C.__dict__['x'].__get__(None, C)
# Property itself when referenced from the class
assert isinstance(C.x, property)
obj = C()
assert obj.x == C.__dict__['x'].__get__(obj, C)
# Function return value when referenced from an instance
assert obj.x == 1
Behind the scenes, the value is looked up in the class's attribute dictionary and the `__get__` method of that object is called. This is where descriptors come in. The first argument of `__get__` is `None` when the access goes through the class and the instance when it goes through an instance, and the value obtained depends on this difference.
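The difference in the arguments can be observed directly. The following sketch uses a hypothetical descriptor named `Spy` that simply returns whatever `__get__` received.

```python
class Spy(object):
    """Hypothetical descriptor that returns the arguments __get__ received."""
    def __get__(self, obj, type=None):
        return (obj, type)


class C(object):
    x = Spy()


obj = C()
via_class = C.x       # calls __get__(None, C)
via_instance = obj.x  # calls __get__(obj, C)
```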
Methods are basically the same as properties. Since the descriptor is called behind the scenes, different values will be obtained when referencing via the class and when referencing via the instance.
class C(object):
    def x(self):
        return 1
assert C.x == C.__dict__['x'].__get__(None, C)
obj = C()
assert obj.x == C.__dict__['x'].__get__(obj, C)
assert C.x != obj.x
__getattribute__
You can customize all attribute access for a class by overriding `__getattribute__`. Descriptors, by contrast, let you customize access to specific attributes only.
In addition, the built-in `__getattribute__` implementation takes descriptors into account, and as a result descriptors behave as intended. This is the essential relationship between the two.
Typical classes that implement `__getattribute__` are `object`, `type`, and `super`. Here we will compare `object` and `type`.
In the Python [source code](http://hg.python.org/cpython/file/v3.4.1/Objects/typeobject.c#l4208), the `PyBaseObject_Type` structure corresponding to the `object` type is defined with the function `PyObject_GenericGetAttr` in its `tp_getattro` slot, so `object.__getattribute__` calls this function.
The definition of this function can be found in `Objects/object.c`. Expressed in Python pseudocode, it looks like this:
def object_getattribute(self, key):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    tp = type(self)
    attr = PyType_Lookup(tp, key)  # None if nothing is found
    if attr is not None:
        if hasattr(attr, '__get__') and hasattr(attr, '__set__'):
            # data descriptor
            return attr.__get__(self, tp)
    if key in self.__dict__:
        # the instance's own attribute dictionary
        return self.__dict__[key]
    if attr is not None:
        if hasattr(attr, '__get__'):
            # non-data descriptor
            return attr.__get__(self, tp)
        # plain class attribute
        return attr
    raise AttributeError(key)
There are three main blocks, each of which makes 1) a data descriptor call, 2) an instance's own attribute dictionary reference, and 3) a non-data descriptor call or a class's attribute dictionary reference.
First, it gets the class of the object and searches for the attribute in that class. Think of `PyType_Lookup` as a function that walks the class and its parent classes and returns the value for the given key from their attribute dictionaries. If the attribute is found here and it is a data descriptor, its `__get__` is called. If no data descriptor is found, the instance's attribute dictionary is consulted and, if the key is there, its value is returned. Finally, the class attribute is checked again: if it is a descriptor, its `__get__` is called; otherwise the value itself is returned. If nothing is found, `AttributeError` is raised.
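The lookup described above can be turned into a runnable sketch. This is only an approximation of the C code under a few assumptions: `type_lookup` is a hypothetical stand-in for `PyType_Lookup` that walks `__mro__`, a sentinel `_MISSING` is used so that falsy class attributes are still found, and the protocol methods are looked up on the type of the attribute.

```python
_MISSING = object()  # sentinel: distinguishes "not found" from falsy values


def type_lookup(tp, key):
    """Hypothetical stand-in for PyType_Lookup(): search the MRO."""
    for base in tp.__mro__:
        if key in base.__dict__:
            return base.__dict__[key]
    return _MISSING


def object_getattribute(self, key):
    tp = type(self)
    attr = type_lookup(tp, key)
    if attr is not _MISSING:
        kind = type(attr)
        if hasattr(kind, '__get__') and hasattr(kind, '__set__'):
            return kind.__get__(attr, self, tp)  # 1) data descriptor
    if key in self.__dict__:
        return self.__dict__[key]                # 2) instance dictionary
    if attr is not _MISSING:
        kind = type(attr)
        if hasattr(kind, '__get__'):
            return kind.__get__(attr, self, tp)  # 3) non-data descriptor
        return attr                              # 3) plain class attribute
    raise AttributeError(key)


class C(object):
    zero = 0

    def __init__(self):
        self.x = 1

    @property
    def prop(self):
        return 'prop'

    def meth(self):
        return 'meth'


obj = C()
obj.__dict__['prop'] = 'shadow'  # must not hide the data descriptor
```

Calling `object_getattribute(obj, name)` for `'x'`, `'prop'`, `'zero'`, and `'meth'` exercises each of the three blocks in turn.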
Similarly, `type.__getattribute__` is defined through the `PyType_Type` structure in `Objects/typeobject.c` (http://hg.python.org/cpython/file/v3.4.1/Objects/typeobject.c#l3122).
This is expressed in Python pseudocode as follows:
def type_getattribute(cls, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    meta = type(cls)
    metaattr = PyType_Lookup(meta, key)  # None if nothing is found
    if metaattr is not None:
        if hasattr(metaattr, '__get__') and hasattr(metaattr, '__set__'):
            # data descriptor
            return metaattr.__get__(cls, meta)
    attr = PyType_Lookup(cls, key)
    if attr is not None:
        if hasattr(attr, '__get__'):
            # descriptor found on the class or one of its parents
            return attr.__get__(None, cls)
        return attr
    if metaattr is not None:
        if hasattr(metaattr, '__get__'):
            # non-data descriptor on the metaclass
            return metaattr.__get__(cls, meta)
        return metaattr
    raise AttributeError(key)
The first and last blocks are processed in the same way as for `object`, so I will skip them (note that what the class is to an instance, the metaclass is to a class). The middle block, however, differs from the `object` case. For `object` it was just a lookup in the instance's attribute dictionary, but for a class it walks up the parent classes to consult their attribute dictionaries, and if the value found is a descriptor, it calls the descriptor's `__get__`.
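Both points can be observed from ordinary Python code. The sketch below assumes Python 3 syntax for the metaclass; `Meta`, `Spy`, `Base`, and `Child` are hypothetical names. It shows that a descriptor found on a parent class is called with `None` as the instance, and that a data descriptor on the metaclass takes priority over an attribute of the class itself.

```python
class Meta(type):
    @property
    def label(cls):
        # A property on the metaclass is a data descriptor for its classes.
        return 'metaclass property'


class Spy(object):
    """Hypothetical non-data descriptor reporting the arguments of __get__."""
    def __get__(self, obj, type=None):
        return (obj, type)


class Base(metaclass=Meta):
    spy = Spy()
    label = 'class attribute'


class Child(Base):
    pass


via_child = Child.spy         # found on Base, called as __get__(None, Child)
class_label = Child.label     # the metaclass data descriptor wins
instance_label = Child().label  # instances never consult the metaclass
```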
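To summarize what we have seen so far:
- Data descriptors take priority over the instance's attribute dictionary, and the instance's attribute dictionary takes priority over non-data descriptors.
- The class passed as the second argument of `__get__` is `type(self)` in the case of `object` and `cls` in the case of `type`.
For example, with code like the following, the property takes precedence even if you put a value directly into `__dict__`.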
class C(object):
    @property
    def x(self):
        return 0
>>> o = C()
>>> o.__dict__['x'] = 1
>>> o.x
0
Now let's look at some specific descriptor examples.
Following the descriptor protocol, properties can be defined in pure Python as follows:
class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"
    def __init__(self, fget=None, fset=None, fdel=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
    def __get__(self, obj, klass=None):
        if obj is None:
            # via class
            return self
        if self.fget is not None:
            return self.fget(obj)
        raise AttributeError("unreadable attribute")
    def __set__(self, obj, value):
        # Raise only when no setter was given
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)
    def __delete__(self, obj):
        # Raise only when no deleter was given
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)
In `__get__`, if `obj` is `None`, that is, when called via a class, it returns itself. Otherwise, if the `fget` passed to the constructor is not `None`, it calls `fget`; if it is `None`, it raises `AttributeError`.
The pseudo code for `staticmethod` is as follows.
class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f
    def __get__(self, obj, klass=None):
        return self.f
This one is simple: it always returns the function itself when `__get__` is called. Therefore, a `staticmethod` behaves the same as the original function, whether it is called via a class or via an instance.
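As a quick check, the pure-Python emulation can be used as a decorator in place of the built-in. The sketch repeats the `StaticMethod` class so that it is self-contained; the class `C` and its `add` function are hypothetical.

```python
class StaticMethod(object):
    """Pure-Python stand-in for staticmethod."""
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        # Neither the instance nor the class is bound to the function.
        return self.f


class C(object):
    @StaticMethod
    def add(x, y):
        return x + y


via_class = C.add(1, 2)
via_instance = C().add(1, 2)
```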
The pseudo code for `classmethod` is as follows.
import types

class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f
    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        # Bind the function to the class rather than to the instance
        return types.MethodType(self.f, klass)
When `__get__` is called, it creates a `MethodType` object from the function and the class and returns it. In practice, this object's `__call__` is then called immediately afterwards.
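The emulation can be tried out as a decorator. The sketch below repeats the `ClassMethod` class so that it runs on its own; the classes `C` and `D` and the `create` method are hypothetical.

```python
import types


class ClassMethod(object):
    """Pure-Python stand-in for classmethod."""
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        # Bind the function to the class, not to the instance.
        return types.MethodType(self.f, klass)


class C(object):
    @ClassMethod
    def create(cls):
        return cls()


class D(C):
    pass


from_c = C.create()
from_d = D.create()
from_d_instance = D().create()
```

Because the class actually used for the lookup is bound, `D.create()` produces a `D` even though `create` is defined on `C`.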
Instance methods are actually functions. For example, suppose we have the following class and function.
class C(object):
    pass
def f(self, x):
    return x
Calling `__get__` on the function `f` returns a `MethodType` object. Calling that object gives the same result as calling an instance method. In this case, `f` is a function that has nothing to do with the class `C`, but it ends up being called as a method.
import types

obj = C()
# Emulate obj.f(1)
meth = f.__get__(obj, C)
assert isinstance(meth, types.MethodType)
assert meth(1) == 1
This is a more extreme example of how a function is a descriptor.
>>> def f(x, y): return x + y
...
>>> f
<function f at 0x10e51b1b8>
>>> f.__get__(1)
<bound method int.f of 1>
>>> f.__get__(1)(2)
3
The function `f` defined here is just a two-argument function, not a method of anything, but when you call its `__get__`, a bound method is returned. If you then call that with an argument, you can see that the function itself is called. As you can see, all functions are descriptors, and when a function is looked up through a class as an attribute of an instance, the descriptor mechanism makes it act as a method.
An instance method, that is, a function as a descriptor, is represented by pseudo code like this.
import types

class Function(object):
    "Emulate PyFunction_Type() in Objects/funcobject.c"
    def __get__(self, obj, klass=None):
        if obj is None:
            # via class
            return self
        # via instance: bind the function to the instance
        return types.MethodType(self, obj)
When called via a class, it returns itself, and when called via an instance, it creates a `MethodType` object from the function and the instance and returns it.
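Real functions implement `__get__` in C, so the behavior cannot be patched in from Python, but it can be imitated with a callable wrapper. In the sketch below, `Function` is a hypothetical wrapper class that stores the wrapped function in an attribute named `f` (an assumption of this example) and binds in the way just described.

```python
import types


class Function(object):
    """Hypothetical callable wrapper that binds like a plain function."""
    def __init__(self, f):
        self.f = f

    def __call__(self, *args, **kw):
        return self.f(*args, **kw)

    def __get__(self, obj, klass=None):
        if obj is None:
            return self  # via class: return the wrapper itself
        return types.MethodType(self, obj)  # via instance: bind it


class C(object):
    @Function
    def double(self, x):
        return 2 * x


obj = C()
result = obj.double(3)
```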
The pseudo code for `MethodType.__call__` is as follows. All it has to do is take `__self__` and `__func__` and call the function with `self` added as the first argument.
def method_call(meth, *args, **kw):
    "Emulate method_call() in Objects/classobject.c"
    self = meth.__self__
    func = meth.__func__
    return func(self, *args, **kw)
To summarize the story so far, the method call
obj.func(x)
is equivalent to the following steps.
func = type(obj).__dict__['func']
meth = func.__get__(obj, type(obj))
meth.__call__(x)
This is ultimately equivalent to a function call like this:
func(obj, x)
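The chain of equivalences can be verified directly. In this small sketch, the class `C` and its method `func` are hypothetical; the method returns its arguments so that the different call paths can be compared.

```python
class C(object):
    def func(self, x):
        return (self, x)


obj = C()
direct = obj.func(1)

# Step by step, following the expansion above.
func = type(obj).__dict__['func']
meth = func.__get__(obj, type(obj))
stepwise = meth.__call__(1)

# What the bound method does internally when it is called.
unpacked = meth.__func__(meth.__self__, 1)

# The plain function call that it all reduces to.
plain = func(obj, 1)
```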
As a slight digression, let's think about why the first argument of a method in Python is `self`. Based on the story so far, the reason can be explained as follows. In Python, an instance method is actually a function, and an instance method call is ultimately converted into a plain function call by the action of the descriptor. Since it is just a function, it is natural to pass the equivalent of `self` as an argument. If the first argument `self` could be omitted, function calls and method calls would need different conventions, which would complicate the language specification. I think Python's mechanism of using descriptors to translate method calls into function calls, rather than treating functions and methods separately, is very clever.
In Python 3, referencing an instance method via a class returns the function itself, but in Python 2 it returns an unbound method. To get at the function itself in Python 2, you need to go through the `__func__` attribute. That spelling raises an error in Python 3, so be careful when porting code like this to Python 3. In Python 3, the concept of an unbound method no longer exists in the first place.
class C(object):
    def f(self):
        pass
$ python3
>>> C.f # == C.__dict__['f']
<function C.f at 0x10356ab00>
$ python2
>>> C.f # != C.__dict__['f']
<unbound method C.f>
>>> C.f.__func__ # == C.__dict__['f']
<function f at 0x10e02d050>
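One possible way to write code that tolerates both behaviors is to fall back to the attribute itself when `__func__` is missing. The helper name `underlying_function` below is hypothetical, and this is only a sketch of the idea rather than an established porting recipe.

```python
def underlying_function(attr):
    """Hypothetical helper: return the plain function behind C.f."""
    # Python 2 unbound methods expose the function as __func__;
    # in Python 3 C.f is the function itself, which has no __func__.
    return getattr(attr, '__func__', attr)


class C(object):
    def f(self):
        pass


func = underlying_function(C.f)
```

The same helper also works for bound methods, which have a `__func__` attribute in both versions.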
super
Another place where descriptors are used is `super`. See the example below.
class C(object):
    def x(self):
        pass
class D(C):
    def x(self):
        pass
class E(D):
    pass
obj = E()
assert super(D, obj).x == C.__dict__['x'].__get__(obj, D)
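In this example, `super(D, obj).x` gets the value corresponding to `x` from the attribute dictionary of class `C` and calls its `__get__` with `obj` and `D` as arguments, as written in the assertion. (Strictly speaking, the pseudo code below passes `type(obj)` as the class argument, but a function's `__get__` does not use that argument, so the resulting bound method is equal.) The point here is that the class the attribute is taken from is `C` rather than `D`. The key lies in the implementation of `__getattribute__` of the `super` class.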
The pseudo code for `super.__getattribute__` is as follows.
def super_getattribute(su, key):
    "Emulate super_getattro() in Objects/typeobject.c"
    starttype = su.__self_class__
    mro = iter(starttype.__mro__)
    for cls in mro:
        # Skip everything up to the class given as the first
        # argument of super(), which is stored in __thisclass__
        if cls is su.__thisclass__:
            break
    # Note: mro is an iterator, so the second loop
    # picks up where the first one left off!
    for cls in mro:
        if key in cls.__dict__:
            attr = cls.__dict__[key]
            if hasattr(attr, '__get__'):
                return attr.__get__(su.__self__, starttype)
            return attr
    raise AttributeError(key)
It first walks the MRO of the instance's class to find the class that comes "next" after (or "above") the class given as the first argument. Then, starting from that point, it consults the attribute dictionaries while continuing along the MRO, and if the attribute found is a descriptor, the descriptor is called. This is the mechanism of `super`.
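This mechanism can be checked against the real `super` with a runnable sketch. It relies on the attributes that `super` objects actually expose (`__thisclass__`, `__self__`, and `__self_class__`) and looks up `__get__` on the type of the attribute; the classes `C`, `D`, and `E` are the same hypothetical hierarchy as in the example above, with return values added for comparison.

```python
def super_getattribute(su, key):
    starttype = su.__self_class__        # type of the bound instance
    mro = iter(starttype.__mro__)
    for cls in mro:
        if cls is su.__thisclass__:      # first argument given to super()
            break
    # mro is an iterator, so this loop resumes after __thisclass__.
    for cls in mro:
        if key in cls.__dict__:
            attr = cls.__dict__[key]
            if hasattr(type(attr), '__get__'):
                return type(attr).__get__(attr, su.__self__, starttype)
            return attr
    raise AttributeError(key)


class C(object):
    def x(self):
        return 'C.x'


class D(C):
    def x(self):
        return 'D.x'


class E(D):
    pass


obj = E()
su = super(D, obj)
emulated = super_getattribute(su, 'x')
```

Here the search starts after `D` in the MRO of `E`, so the `x` defined on `C` is the one that gets bound to `obj`.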
`super` itself is also a descriptor. However, this does not seem to be used much in Python today. The only use I found in the Python source code is in a test: http://hg.python.org/cpython/file/v3.4.1/Lib/test/test_descr.py#l2308
When I looked into it, PEP 367 had proposed a notation like `self.__super__.foo()`, so it may have something to do with this. Incidentally, this PEP was eventually adopted in Python 3 as PEP 3135, but that notation was not adopted; instead, the arguments to `super()` can be omitted.
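The descriptor behavior of `super` shows up with the one-argument form, which creates an unbound `super` object. The following is a minimal sketch with hypothetical classes `C` and `D` and a hypothetical class attribute named `sup`; it is meant as an illustration, not as a recommended idiom.

```python
class C(object):
    def x(self):
        return 'C.x'


class D(C):
    def x(self):
        return 'D.x'


obj = D()
unbound = super(D)               # one-argument form: not bound to anything
bound = unbound.__get__(obj, D)  # bind it through the descriptor protocol

# Stored as a class attribute, the unbound super object is bound
# automatically on instance access, like any non-data descriptor.
D.sup = super(D)
via_attribute = obj.sup.x()
```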
reify
Finally, here is an example of a user-defined descriptor.
http://docs.pylonsproject.org/docs/pyramid/en/latest/_modules/pyramid/decorator.html#reify
This is the code for `reify` in the web framework Pyramid. `reify` is something like a cached property; similar functionality exists in other frameworks, but the Pyramid implementation makes very smart use of descriptors. The key point is the part where `setattr` is executed in the `__get__` method. There, the value obtained from the function call is stored in the instance's attribute dictionary, so the descriptor is no longer called from the next access onward. Because `reify` is a non-data descriptor, the instance's attribute dictionary takes precedence.
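The idea can be reproduced in a few lines. The sketch below is written in the spirit of `reify` but is not Pyramid's actual code; the names `cached`, `Report`, `data`, and `calls` are hypothetical.

```python
class cached(object):
    """Hypothetical reify-like non-data descriptor (not Pyramid's code)."""
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __get__(self, inst, objtype=None):
        if inst is None:
            return self
        value = self.wrapped(inst)
        # Store the result in the instance dictionary under the same name;
        # from now on the instance dictionary shadows this descriptor.
        setattr(inst, self.wrapped.__name__, value)
        return value


calls = []  # records how many times the wrapped function runs


class Report(object):
    @cached
    def data(self):
        calls.append('computed')
        return [1, 2, 3]


report = Report()
first = report.data   # runs the function and caches the result
second = report.data  # served from the instance dictionary
```

This only works because `cached` defines no `__set__`: if it were a data descriptor, it would keep taking priority over the cached value.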