Introduction

This article is a write-up of the talk presented at PyCon JP 2014, held on September 12-14, 2014.

What is a descriptor?

A descriptor is an object that defines the following methods.

class Descriptor(object):
    def __get__(self, obj, type=None): pass
    def __set__(self, obj, value): pass
    def __delete__(self, obj): pass

In Python, a set of methods that an object with a particular property should implement is called a protocol (a typical example is the Iterator Protocol (http://docs.python.jp/3.4/library/stdtypes.html#typeiter)). Descriptors are one such protocol.

Descriptors work behind the scenes of basic Python features such as properties, methods (static methods, class methods, instance methods), and super. Because descriptors are a general-purpose protocol, users can also define their own.

There are two main types of descriptors.

Data descriptor
Descriptor that defines both __get__ and __set__
Non-data descriptor
Descriptor that defines only __get__

Data descriptors, typified by properties, behave like ordinary attribute access. Non-data descriptors are typically used for method calls.

Data descriptors that raise AttributeError when __set__ is called are called "read-only data descriptors". Read-only properties for which fset is not defined are classified as read-only data descriptors rather than non-data descriptors.
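As a rough check of this classification, the sketch below looks at which of the protocol methods the type of an attribute defines. The helper kind is a name made up for this illustration and simply applies the definitions given above.

```python
def kind(attr):
    """Classify attr by the definitions above (hypothetical helper)."""
    tp = type(attr)
    if hasattr(tp, '__get__') and hasattr(tp, '__set__'):
        return 'data descriptor'
    if hasattr(tp, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'


class C(object):
    @property
    def ro(self):      # read-only property: fset is not defined
        return 1

    def method(self):
        pass

    value = 42


assert kind(C.__dict__['ro']) == 'data descriptor'
assert kind(C.__dict__['method']) == 'non-data descriptor'
assert kind(C.__dict__['value']) == 'not a descriptor'

# property always defines __set__; without fset it raises AttributeError
try:
    C().ro = 2
except AttributeError:
    read_only = True
else:
    read_only = False
assert read_only
```

The read-only property still counts as a data descriptor because the property type itself defines __set__, regardless of whether fset was supplied.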

This classification affects the priority of attribute access. Specifically, the order of priority is as follows.

Data descriptor
Instance attribute dictionary
Non-data descriptor

We will see later why this is so.
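The priority order can be observed with a small experiment. The classes Data and NonData below are made up for this illustration; the instance attribute dictionary is written to directly so that __set__ is bypassed.

```python
class NonData(object):
    def __get__(self, obj, type=None):
        return 'non-data descriptor'


class Data(object):
    def __get__(self, obj, type=None):
        return 'data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class C(object):
    a = Data()
    b = NonData()


obj = C()
# Put same-named entries straight into the instance attribute dictionary
obj.__dict__['a'] = 'instance dict'
obj.__dict__['b'] = 'instance dict'

assert obj.a == 'data descriptor'   # data descriptor beats the instance dict
assert obj.b == 'instance dict'     # instance dict beats the non-data descriptor
```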

Difference from property

At this point, you may be wondering how descriptors and properties are different.

First, their purposes differ. Properties are usually used as decorators in class definitions to customize attribute access for instances of that class. Descriptors, on the other hand, are defined independently of any particular class and are used to customize attribute access for other classes.

More essentially, properties are a type of descriptor. In other words, the descriptor has a wider range of applications, and conversely, the property can be said to be specialized for the common usage of the descriptor.
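To make the difference concrete, here is a minimal sketch of a descriptor that is written once and reused by unrelated classes. Positive, Rectangle, and Circle are hypothetical names chosen for this example; the attribute name is passed in explicitly so that the value can be stored in the instance attribute dictionary.

```python
class Positive(object):
    """Hypothetical reusable data descriptor: accepts only positive numbers."""

    def __init__(self, name):
        self.name = name  # key used in the instance attribute dictionary

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError('%s must be positive' % self.name)
        obj.__dict__[self.name] = value


class Rectangle(object):
    width = Positive('width')
    height = Positive('height')

    def __init__(self, width, height):
        self.width = width
        self.height = height


class Circle(object):
    radius = Positive('radius')

    def __init__(self, radius):
        self.radius = radius


r = Rectangle(2, 3)
assert (r.width, r.height) == (2, 3)

try:
    Circle(-1)
except ValueError:
    rejected = True
else:
    rejected = False
assert rejected
```

Doing the same with property would mean repeating the getter and setter for every attribute in every class.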

What's going on behind `X.Y`

If you write X.Y in your source code, what's happening behind the scenes is complicated, contrary to its simple appearance. In fact, what happens depends on whether X is a class or instance, and whether Y is a property, method, or regular attribute.

For instance attributes

For an instance attribute, it means looking up the value for the given key in the instance's attribute dictionary __dict__.

class C(object):
    def __init__(self):
        self.x = 1
  
obj = C()
assert obj.x == obj.__dict__['x']

For class attributes

For class attributes, it means referencing values from the class's attribute dictionary, both via the class and via the instance.

class C(object):
    x = 1

assert C.x == C.__dict__['x']

obj = C()
assert obj.x == C.__dict__['x']

So far the story is easy.

For properties

In the case of a property, it is the property itself when referenced from the class, and the return value of the function when referenced from the instance.

class C(object):
    @property
    def x(self):
        return 1

# The property itself when referenced via the class
assert C.x == C.__dict__['x'].__get__(None, C)
assert isinstance(C.x, property)

obj = C()
# The function's return value when referenced via an instance
assert obj.x == C.__dict__['x'].__get__(obj, C)
assert obj.x == 1

Behind the scenes, the value is looked up in the class's attribute dictionary and that object's __get__ method is called. This is where descriptors come in. The first argument of __get__ is None when going via the class and the instance itself when going via an instance, and the value obtained differs accordingly.
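The arguments passed to __get__ can be made visible with a descriptor that simply returns them. Show is a hypothetical class written only for this illustration.

```python
class Show(object):
    """Hypothetical descriptor that returns the arguments given to __get__."""

    def __get__(self, obj, type=None):
        return (obj, type)


class C(object):
    x = Show()


obj = C()
assert C.x == (None, C)     # via the class: the first argument is None
assert obj.x == (obj, C)    # via an instance: the first argument is the instance
```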

For methods

Methods are basically the same as properties. Since the descriptor is called behind the scenes, different values will be obtained when referencing via the class and when referencing via the instance.

class C(object):
    def x(self):
        return 1

assert C.x == C.__dict__['x'].__get__(None, C)

obj = C()
assert obj.x == C.__dict__['x'].__get__(obj, C)

assert C.x != obj.x

Relationship between descriptor and `__getattribute__`

You can customize all attribute access for your class by overriding __getattribute__. Descriptors, on the other hand, differ in that they let you customize access to specific attributes only.

In addition, the built-in __getattribute__ implementation takes descriptors into account, and as a result the descriptors behave as intended. This is the essential relationship.

Typical classes that implement __getattribute__ are object, type, and super. Here we will compare object and type.

In the Python [source code](http://hg.python.org/cpython/file/v3.4.1/Objects/typeobject.c#l4208), the PyBaseObject_Type structure corresponding to the object type is defined, and the function PyObject_GenericGetAttr is specified in its tp_getattro slot, so object.__getattribute__ calls this function.

The definition of this function can be found in Objects/object.c. Expressed in Python pseudocode, it looks like this:


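def PyType_Lookup(tp, key):
    "Emulate _PyType_Lookup(): search the class and its parents in MRO order"
    for base in tp.__mro__:
        if key in base.__dict__:
            return base.__dict__[key]
    return None  # stands in for NULL in the C code

def object_getattribute(self, key):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    tp = type(self)
    attr = PyType_Lookup(tp, key)
    if attr is not None:
        if hasattr(attr, '__get__') and hasattr(attr, '__set__'):
            # 1) data descriptor
            return attr.__get__(self, tp)
    # 2) the instance's own attribute dictionary
    if key in self.__dict__:
        return self.__dict__[key]
    if attr is not None:
        # 3) non-data descriptor, or a plain class attribute
        if hasattr(attr, '__get__'):
            return attr.__get__(self, tp)
        return attr
    raise AttributeError(key)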

There are three main blocks, which in order perform 1) a data descriptor call, 2) a lookup in the instance's own attribute dictionary, and 3) a non-data descriptor call or a lookup in the class's attribute dictionary.

First, the class of the object is obtained and the attribute is searched for in that class. Think of PyType_Lookup as a function that traverses a class and its parent classes and returns the value corresponding to the specified key from their attribute dictionaries. If the attribute is found here and it is a data descriptor, its __get__ is called. If no data descriptor is found, the instance's attribute dictionary is consulted, and if the key is present its value is returned. Finally, the class attribute is checked again: if it is a descriptor, __get__ is called, otherwise the value itself is returned. If no value is found, AttributeError is raised.

Similarly, type.__getattribute__ is defined in the PyType_Type structure in Objects/typeobject.c (http://hg.python.org/cpython/file/v3.4.1/Objects/typeobject.c#l3122).

This is expressed in Python pseudocode as follows:


def type_getattribute(cls, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    # PyType_Lookup is assumed to return None when nothing is found
    meta = type(cls)
    metaattr = PyType_Lookup(meta, key)
    if metaattr is not None:
        if hasattr(metaattr, '__get__') and hasattr(metaattr, '__set__'):
            # data descriptor
            return metaattr.__get__(cls, meta)
    attr = PyType_Lookup(cls, key)
    if attr is not None:
        if hasattr(attr, '__get__'):
            return attr.__get__(None, cls)
        return attr
    if metaattr is not None:
        if hasattr(metaattr, '__get__'):
            return metaattr.__get__(cls, meta)
        return metaattr
    raise AttributeError(key)

The first and last blocks are processed in the same way as for object, so I will skip them (note that what the class is to an instance, the metaclass is to a class), but the middle block differs from the case of object. For object, it was just a lookup in the attribute dictionary, but for a class, the parent classes are traversed to look up their attribute dictionaries, and if the attribute found is a descriptor, its __get__ is called.
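The behavior of attribute access on a class can be observed with a small metaclass. This is only a sketch: Meta, Base, and C are hypothetical names, and the metaclass keyword is Python 3 syntax.

```python
class Meta(type):
    @property
    def x(cls):
        return 'metaclass data descriptor'

    def y(cls):
        return 'metaclass method'


class Base(object):
    @staticmethod
    def y():
        return 'parent class attribute'


class C(Base, metaclass=Meta):
    x = 'class attribute'


# First block: a data descriptor on the metaclass wins over the class's own attribute
assert C.x == 'metaclass data descriptor'
# Middle block: the parent classes are traversed and the descriptor found there
# is called before a non-data descriptor on the metaclass is considered
assert C.y() == 'parent class attribute'
# Attribute access on an instance never consults the metaclass
assert C().x == 'class attribute'
```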

To summarize what we have seen so far:

Descriptors are always retrieved from the class's attribute dictionary
Descriptors are always called with a class as an argument to __get__ (type(self) for object, cls for type)
Data descriptors take precedence over instance attribute dictionaries

For example, if you have code like this, even if you put the value directly into __dict__, the property will take precedence.

class C(object):
    @property
    def x(self):
        return 0

>>> o = C()
>>> o.__dict__['x'] = 1
>>> o.x
0

Specific example of descriptor

Now let's look at some specific descriptor examples.

Property

Following the descriptor protocol, property can be defined in pure Python as follows:

class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel

    def __get__(self, obj, klass=None):
        if obj is None:
            # via class
            return self
        if self.fget is not None:
            return self.fget(obj)
        raise AttributeError

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

In __get__, if obj is None, that is, when called via a class, it returns itself. Otherwise, if the fget passed to the constructor is not None, it calls fget, and if it is None, it raises AttributeError.

Static method

The pseudo code for staticmethod is as follows.

class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        return self.f

This one is simple: it always returns the function itself when __get__ is called. Therefore, a staticmethod behaves the same as the original function, whether called via a class or an instance.
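The same behavior can be confirmed against the built-in staticmethod. The function f and the class C below are made up for this check.

```python
def f(x):
    return x * 2


class C(object):
    g = staticmethod(f)


# __get__ hands back the wrapped function unchanged, whatever obj and class are
assert C.__dict__['g'].__get__(None, C) is f
assert C.__dict__['g'].__get__(C(), C) is f

# So access via the class and via an instance gives the very same function
assert C.g is f and C().g is f
assert C.g(2) == C().g(2) == 4
```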

Class method

The pseudo code for classmethod is as follows.

import types

class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        return types.MethodType(self.f, klass)

When __get__ is called, it creates and returns a MethodType object from the function and class. In reality, this object's __call__ will be called immediately afterwards.
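As a quick check against the built-in classmethod, the sketch below calls __get__ by hand and inspects the resulting MethodType object. The classes C and D are hypothetical.

```python
import types


class C(object):
    @classmethod
    def name(cls):
        return cls.__name__


class D(C):
    pass


# Call the descriptor by hand, passing D as the class
meth = C.__dict__['name'].__get__(None, D)
assert isinstance(meth, types.MethodType)
assert meth.__self__ is D      # bound to the class, not to an instance
assert meth() == 'D'

# Via an instance, the method is still bound to the class of that instance
assert D().name.__self__ is D
assert D().name() == 'D'
```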

Instance method

Instance methods are actually functions. For example, suppose we have the following class and function.

class C(object):
    pass

def f(self, x):
    return x

Calling __get__ on the function f returns a MethodType object. Calling that object gives the same result as calling an instance method. Here f is a function that has nothing to do with the class C, yet it ends up being called as a method.

import types

obj = C()
# Emulate obj.f(1)
meth = f.__get__(obj, C)
assert isinstance(meth, types.MethodType)
assert meth(1) == 1

This is a more extreme example of how a function is a descriptor.

>>> def f(x, y): return x + y
...
>>> f
<function f at 0x10e51b1b8>
>>> f.__get__(1)
<bound method int.f of 1>
>>> f.__get__(1)(2)
3

The function f defined here is just a two-argument function, not a method of any kind, but calling its __get__ returns a bound method. If you call that with an argument, you can see that the function is invoked. As you can see, all functions are descriptors, and when they are accessed through a class, the descriptor makes them act as methods.

An instance method, that is, a function as a descriptor, is represented by pseudo code like this.

class Function(object):
    "Emulate PyFunction_Type() in Objects/funcobject.c"

    def __get__(self, obj, klass=None):
        if obj is None:
            return self
        return types.MethodType(self, obj)

When called via a class, it returns itself, and when called via an instance, it creates a MethodType object from the function and instance and returns it.

The pseudo code for MethodType.__call__ is as follows. All it has to do is take __self__ and __func__ and call the function with self prepended as the first argument.

def method_call(meth, *args, **kw):
    "Emulate method_call() in Objects/classobject.c"
    self = meth.__self__
    func = meth.__func__
    return func(self, *args, **kw)

To summarize the story so far, the method call

obj.func(x)

is equivalent to the following processing.

func = type(obj).__dict__['func']
meth = func.__get__(obj, type(obj))
meth.__call__(x)

This is ultimately equivalent to a function call like this:

func(obj, x)
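The chain of equivalences can be checked directly. The class C below is a made-up example whose method returns its arguments so that the three spellings can be compared.

```python
class C(object):
    def func(self, x):
        return (self, x)


obj = C()

# Step by step, as the descriptor mechanism does it
func = type(obj).__dict__['func']       # the plain function stored in the class
meth = func.__get__(obj, type(obj))     # the bound method created by __get__
assert meth.__self__ is obj and meth.__func__ is func

# All three spellings give the same result
assert obj.func(1) == meth.__call__(1) == func(obj, 1) == (obj, 1)
```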

Let's digress a little and think about why the first argument of a method in Python is self. Based on the story so far, the reason can be explained as follows. In Python, an instance method is actually a function, and an instance method call is ultimately converted into a plain function call by the descriptor. Since it is just a function, it is natural for the equivalent of self to be passed as an argument. If the first argument self could be omitted, function calls and method calls would need different conventions, complicating the language specification. I think Python's mechanism of using descriptors to translate method calls into function calls, rather than treating functions and methods separately, is very clever.

In Python 3, referencing an instance method via a class returns the function itself, whereas in Python 2 it returns an unbound method. To get at the function itself in Python 2, you need to go through the __func__ attribute. That spelling raises an error in Python 3, so be careful when porting such code. In Python 3, the concept of the unbound method has disappeared altogether.

class C(object):
    def f(self):
        pass

$ python3
>>> C.f  # == C.__dict__['f']
<function C.f at 0x10356ab00>

$ python2
>>> C.f  # != C.__dict__['f']
<unbound method C.f>
>>> C.f.__func__  # == C.__dict__['f']
<function f at 0x10e02d050>

super

Another example of where descriptors are used is super. See the example below.

class C(object):
    def x(self):
        pass

class D(C):
    def x(self):
        pass

class E(D):
    pass

obj = E()
assert super(D, obj).x == C.__dict__['x'].__get__(obj, type(obj))

In this example, super(D, obj).x gets the value corresponding to x from the attribute dictionary of class C and calls its __get__ with obj and type(obj) (here E) as arguments. The point here is that the class from which the attribute is taken is C rather than D. The key is in the implementation of __getattribute__ of the super class.

The pseudo code for super.__getattribute__ is as follows.

def super_getattribute(su, key):
    "Emulate super_getattro() in Objects/typeobject.c"
    starttype = su.__self_class__
    mro = iter(starttype.__mro__)
    for cls in mro:
        if cls is su.__thisclass__:
            break
    # Note: mro is an iterator, so the second loop
    # picks up where the first one left off!
    for cls in mro:
        if key in cls.__dict__:
            attr = cls.__dict__[key]
            if hasattr(attr, '__get__'):
                return attr.__get__(su.__self__, starttype)
            return attr
    raise AttributeError

It walks the MRO of the object's class to find the class that comes "next" after (or "above") the class given as the first argument. Then, starting from that point, it consults each attribute dictionary while following the MRO, and if the attribute found is a descriptor, the descriptor is called. This is the mechanism of super.
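The pieces that super works with can be inspected on a real super object. The sketch below reuses the shape of the earlier example, with return values added so that the result is visible; __thisclass__, __self__, and __self_class__ are attributes of the built-in super object.

```python
class C(object):
    def x(self):
        return 'C.x'


class D(C):
    def x(self):
        return 'D.x'


class E(D):
    pass


obj = E()
su = super(D, obj)

assert su.__thisclass__ is D     # the class given as the first argument
assert su.__self__ is obj        # the object given as the second argument
assert su.__self_class__ is E    # type(obj); its MRO is the one that is walked
assert E.__mro__ == (E, D, C, object)

# The search resumes after D, so C's function is found and bound to obj
assert su.x.__func__ is C.__dict__['x']
assert su.x.__self__ is obj
assert su.x() == 'C.x'
```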

super is also a descriptor. However, this does not seem to be put to much use in Python today. The only code I found in the Python source that uses it is in a test: http://hg.python.org/cpython/file/v3.4.1/Lib/test/test_descr.py#l2308

When I looked it up, PEP 367 proposed a notation of the form self.__super__.foo(), so it may have something to do with this. By the way, this PEP was eventually adopted in Python 3 as PEP 3135, but this notation was not adopted, because the arguments to super() can be omitted instead.

reify

Finally, here is an example of a user-defined descriptor.

http://docs.pylonsproject.org/docs/pyramid/en/latest/_modules/pyramid/decorator.html#reify

This is the code for reify in the web framework Pyramid. reify is like a cached property; similar functionality exists in other frameworks, but the Pyramid implementation makes very smart use of descriptors. The key point is the part where setattr is executed in the __get__ method. Here, the value obtained by the function call is set in the attribute dictionary of the instance, so that the descriptor is not called from the next access onward. Because reify is a non-data descriptor, the instance's attribute dictionary takes precedence.
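The idea can be sketched in a few lines. Note that this is not Pyramid's actual code: cached and Report are hypothetical names, and only the mechanism described above (setattr inside __get__ of a non-data descriptor) is reproduced.

```python
class cached(object):
    """Hypothetical reify-like non-data descriptor (not Pyramid's code)."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        value = self.wrapped(obj)
        # Store the result in the instance under the same name. From now on
        # the instance attribute dictionary shadows this descriptor.
        setattr(obj, self.wrapped.__name__, value)
        return value


class Report(object):
    calls = 0

    @cached
    def total(self):
        Report.calls += 1
        return sum(range(10))


r = Report()
assert 'total' not in r.__dict__
assert r.total == 45               # first access: the descriptor is called
assert r.__dict__['total'] == 45   # the value now lives in the instance
assert r.total == 45               # second access: no descriptor call
assert Report.calls == 1
```

If the descriptor also defined __set__, it would become a data descriptor, setattr would go back through the descriptor, and the value would never be cached this way.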

Summary

Descriptors are a protocol for customizing attribute access.
Descriptors include data descriptors and non-data descriptors, which have different priorities.
Descriptors are a general-purpose protocol: they are used by properties and methods, and you can also define your own descriptors.

Technology that supports Python Descriptor edition #pyconjp