Do you use a dictionary or an ordinary class to store data in Python? Since Python 3.7, the standard library has provided a dataclass decorator that is handy for exactly this.
In this article I explain how to use it, focusing on when it is convenient and why you should use it. These points are hard to grasp from the official documentation and PEP 557 alone.
Among earlier versions, only Python 3.6 is supported, by installing the backport with pip install dataclasses. At the time of writing, Google Colaboratory runs Python 3.6.9, but the dataclasses backport is installed by default.
This article is for:
- People who know dataclass exists but don't know what it is
- People who want to handle data in a readable way
- People who think, "We managed without this feature before, so I don't need to bother using it..."
↓ An ordinary class like this
class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name
person1 = Person(0, 'Alice')
print(person1.number) # 0
print(person1.name) # Alice
↓ can be written like this with dataclass. (The class name is changed to keep the two apart.)
import dataclasses
@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.number) # 0
print(dataclass_person1.name) # Alice
You use it by adding the @dataclasses.dataclass decorator and, instead of writing __init__(), listing the variable names you want to define with type annotations.
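One way to confirm that an __init__() really is generated is to inspect its signature with the standard inspect module. This is a minimal sketch that repeats the DataclassPerson definition from above; the exact formatting of the printed signature may differ slightly between Python versions.

```python
import dataclasses
import inspect

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'

# The decorator generated __init__ from the annotated class variables
signature = inspect.signature(DataclassPerson.__init__)
print(signature)  # e.g. (self, number: int, name: str = 'XXX') -> None

# Parameter names and defaults mirror the field declarations
params = signature.parameters
print(list(params))  # ['self', 'number', 'name']
print(params['name'].default)  # XXX
```

The parameter order follows the order in which the fields are declared in the class body.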
__init__() is generated automatically, and type annotations are required
What has changed is that **you no longer have to assign the arguments to instance variables in __init__() yourself**, because **__init__() is generated automatically**. This saves a lot of boilerplate when there are many variables, and the code looks cleaner. **Other special methods such as __eq__() and __repr__() are also generated automatically**, as described below.
And since type annotations are mandatory, the types are always visible, which I appreciate. (That said, you would want to annotate an ordinary class as well, as in def __init__(self, number: int, name: str = 'XXX').)
**It also states clearly that the class exists to store data**, which is another important factor for readability.
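Two details about the annotations are worth knowing: only class variables that have an annotation become fields, and the annotations document the types but are not checked at runtime. A small sketch; the species attribute is hypothetical and added only for illustration.

```python
import dataclasses

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
    species = 'human'  # hypothetical: no annotation, so a plain class attribute, not a field

# Only annotated names are fields and appear in the generated __init__
field_names = [field.name for field in dataclasses.fields(DataclassPerson)]
print(field_names)  # ['number', 'name']

# Annotations are not enforced: a str is accepted for number without error
odd_person = DataclassPerson('zero', 'Alice')
print(odd_person.number)  # zero
```

If you want the types checked, you need a static type checker such as mypy, or explicit validation in your own code.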
For the example above, a dictionary would do. Why bother with a class, let alone the dataclass decorator? Many people seem to default to dictionaries for input and output.
dict_person1 = {'number': 0, 'name': 'Alice'}
print(dict_person1['number']) # 0
print(dict_person1['name']) # Alice
So what are the disadvantages of dictionaries, which are easy to understand?
Points 3 and 4 matter if you want code that is easy to read and maintain later, and they are reasons to avoid dictionaries even when you don't need methods. However, ordinary classes address these points as well.
Let's look in detail at how a class with the dataclass decorator improves on an ordinary class.
__eq__() is generated automatically, which makes unit tests easy
When you compare instances of an ordinary class, two instances with the same contents are not equal (== returns False) unless they are the same object. That is because, by default, == compares object identity (the value returned by id()), which is not very useful here. **For unit tests, we want == to return True when the fields match.**
↓ If you do nothing special in an ordinary class, you get this:
class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name
person1 = Person(0, 'Alice')
print(person1 == Person(0, 'Alice')) # False
print(person1 == Person(1, 'Bob')) # False
↓ To compare fields in an ordinary class, you have to define __eq__() yourself.
class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name
    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.number == other.number and self.name == other.name
person1 = Person(0, 'Alice')
print(person1 == Person(0, 'Alice')) # True
print(person1 == Person(1, 'Bob')) # False
↓ With the dataclass decorator, this __eq__() is generated automatically. It saves time and keeps the code tidy.
@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1 == DataclassPerson(0, 'Alice')) # True
print(dataclass_person1 == DataclassPerson(1, 'Bob')) # False
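Because == now compares fields, you can pass whole instances to unittest's assertEqual. A minimal sketch: make_person is a hypothetical function standing in for the code under test, and the test is run programmatically so the snippet does not exit the interpreter.

```python
import dataclasses
import unittest

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'

def make_person(number):
    # Hypothetical function under test
    return DataclassPerson(number, 'Alice')

class MakePersonTest(unittest.TestCase):
    def test_make_person(self):
        # One assertion covers every field thanks to the generated __eq__
        self.assertEqual(make_person(0), DataclassPerson(0, 'Alice'))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(MakePersonTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

If the assertion fails, the failure message includes the repr() of both instances, so the differing field is easy to spot.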
Also, with @dataclasses.dataclass(order=True), __lt__(), __le__(), __gt__(), and __ge__() are generated for ordering comparisons as well. They compare the fields in definition order, just like tuples, so the first field that differs decides the result. This can be a little confusing, so you may prefer to define these methods yourself if you need them.
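To make the tuple-like ordering concrete, here is a small sketch. OrderedPerson is a hypothetical class name; sorting compares (number, name) as if it were a tuple.

```python
import dataclasses

@dataclasses.dataclass(order=True)
class OrderedPerson:
    number: int
    name: str = 'XXX'

people = [OrderedPerson(1, 'Bob'), OrderedPerson(0, 'Carol'), OrderedPerson(0, 'Alice')]

# Comparison works like comparing (number, name) tuples: number first, then name
sorted_people = sorted(people)
print(sorted_people)
# [OrderedPerson(number=0, name='Alice'), OrderedPerson(number=0, name='Carol'), OrderedPerson(number=1, name='Bob')]

# number decides here, even though 'Zed' > 'Amy'
print(OrderedPerson(0, 'Zed') < OrderedPerson(1, 'Amy'))  # True
```

Because the field declaration order determines the comparison order, reordering fields silently changes how instances sort, which is one reason to consider writing the comparison methods yourself.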
Use dataclasses.asdict() when you want to convert an instance to a dictionary, for example to output it as JSON. It works even when dataclasses are nested.
@dataclasses.dataclass
class DataclassScore:
    writing: int
    reading: int
    listening: int
    speaking: int
@dataclasses.dataclass
class DataclassPerson:
    score: DataclassScore
    number: int
    name: str = 'Alice'
dataclass_person1 = DataclassPerson(DataclassScore(25, 40, 30, 35), 0, 'Alice')
dict_person1 = dataclasses.asdict(dataclass_person1)
print(dict_person1) # {'score': {'writing': 25, 'reading': 40, 'listening': 30, 'speaking': 35}, 'number': 0, 'name': 'Alice'}
import json
print(json.dumps(dict_person1)) # {"score": {"writing": 25, "reading": 40, "listening": 30, "speaking": 35}, "number": 0, "name": "Alice"}
An ordinary class can also be converted to a dictionary through __dict__, but nested objects take extra effort, because __dict__ is shallow: an attribute that holds another object is not converted.
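To show the extra effort, here is a sketch with ordinary classes. The Score and Person classes and the to_dict helper are hypothetical. to_dict only recurses into objects that have a __dict__ and leaves lists and other containers untouched, which dataclasses.asdict() would handle for you.

```python
class Score:
    def __init__(self, writing, reading):
        self.writing = writing
        self.reading = reading

class Person:
    def __init__(self, score, number):
        self.score = score
        self.number = number

person = Person(Score(25, 40), 0)

# __dict__ is shallow: the nested Score object is left as-is
shallow = person.__dict__
print(isinstance(shallow['score'], Score))  # True

def to_dict(obj):
    # Hypothetical helper: recursively convert objects that have a __dict__
    if hasattr(obj, '__dict__'):
        return {key: to_dict(value) for key, value in vars(obj).items()}
    return obj

print(to_dict(person))  # {'score': {'writing': 25, 'reading': 40}, 'number': 0}
```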
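To go back from a dictionary to the class, unpack it as follows. Nested dataclasses are not restored automatically, though: after this call, score is still a plain dict rather than a DataclassScore.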
DataclassPerson(**dict_person1)
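When the dataclasses are nested, unpacking alone leaves the inner value as a plain dict. A minimal sketch of restoring it by hand, reusing the classes above; person_from_dict is a hypothetical helper, not part of the dataclasses module.

```python
import dataclasses

@dataclasses.dataclass
class DataclassScore:
    writing: int
    reading: int
    listening: int
    speaking: int

@dataclasses.dataclass
class DataclassPerson:
    score: DataclassScore
    number: int
    name: str = 'Alice'

def person_from_dict(d: dict) -> DataclassPerson:
    # Hypothetical helper: rebuild the nested DataclassScore first, then unpack the rest
    return DataclassPerson(**{**d, 'score': DataclassScore(**d['score'])})

original = DataclassPerson(DataclassScore(25, 40, 30, 35), 0, 'Alice')
as_dict = dataclasses.asdict(original)

naive = DataclassPerson(**as_dict)
print(type(naive.score))  # <class 'dict'>

restored = person_from_dict(as_dict)
print(restored == original)  # True
```

For deeply nested structures, writing such helpers by hand gets tedious, so third-party libraries are often used instead.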
A dataclass can easily be made immutable. If data that should never be rewritten is immutable, you don't have to worry that it was changed somewhere along the way.
↓ By default, instances are mutable.
@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.number) # 0
print(dataclass_person1.name) # Alice
dataclass_person1.number = 1
print(dataclass_person1.number) # 1
↓ If you pass frozen=True to the decorator, instances become immutable. In that case __hash__() is also generated automatically, so you can get a hash value with hash().
@dataclasses.dataclass(frozen=True)
class FrozenDataclassPerson:
    number: int
    name: str = 'Alice'
frozen_dataclass_person1 = FrozenDataclassPerson(number=0, name='Alice')
print(frozen_dataclass_person1.number) # 0
print(frozen_dataclass_person1.name) # Alice
print(hash(frozen_dataclass_person1)) # e.g. -4135290249524779415 (the value differs between runs)
frozen_dataclass_person1.number = 1 # FrozenInstanceError: cannot assign to field 'number'
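Being hashable means frozen instances can be used as set members and dictionary keys. The sketch below also catches the FrozenInstanceError, and contrasts it with a regular dataclass: with the default eq=True and frozen=False, __hash__ is set to None, so instances are unhashable.

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class FrozenDataclassPerson:
    number: int
    name: str = 'XXX'

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'

# Hashable instances can be set members and dict keys; equal instances collapse
people = {FrozenDataclassPerson(0, 'Alice'), FrozenDataclassPerson(0, 'Alice'), FrozenDataclassPerson(1, 'Bob')}
print(len(people))  # 2
scores = {FrozenDataclassPerson(0, 'Alice'): 95}
print(scores[FrozenDataclassPerson(0, 'Alice')])  # 95

# Assignment raises FrozenInstanceError, which can be caught explicitly
try:
    FrozenDataclassPerson(0, 'Alice').number = 1
except dataclasses.FrozenInstanceError as error:
    print(error)  # cannot assign to field 'number'

# A regular dataclass (eq=True, not frozen) has __hash__ set to None
try:
    hash(DataclassPerson(0, 'Alice'))
except TypeError as error:
    print(error)  # unhashable type: 'DataclassPerson'
```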
For cases where you want immutability, the standard library also offers collections.namedtuple and typing.NamedTuple.
With them you can create tuples (that is, immutable objects) whose elements can be accessed with dot notation.
from collections import namedtuple
CollectionsNamedTuplePerson = namedtuple('CollectionsNamedTuplePerson', ('number', 'name'))
collections_namedtuple_person1 = CollectionsNamedTuplePerson(number=0, name='Alice')
print(collections_namedtuple_person1.number) # 0
print(collections_namedtuple_person1.name) # Alice
print(collections_namedtuple_person1 == (0, 'Alice')) # True
collections_namedtuple_person1.number = 1 # AttributeError: can't set attribute
↓ typing.NamedTuple additionally supports type annotations.
from typing import NamedTuple
class NamedTuplePerson(NamedTuple):
    number: int
    name: str = 'XXX'
namedtuple_person1 = NamedTuplePerson(0, 'Alice')
print(namedtuple_person1.number) # 0
print(namedtuple_person1.name) # Alice
print(namedtuple_person1 == (0, 'Alice')) # True
namedtuple_person1.number = 1 # AttributeError: can't set attribute
For more details, the Qiita article "Write beautiful python with namedtuple! (Translation)" is easy to understand.
dataclass and typing.NamedTuple look similar but differ in the details. As the code above shows, a NamedTuple compares equal to a plain tuple with the same elements, which can be a drawback.
On the other hand, a convenient feature of typing.NamedTuple is that, being a tuple, it supports unpacking assignment. Depending on the use case, you may be better off not forcing it into a dataclass.
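The difference in unpacking and tuple equality can be seen side by side in this sketch. dataclasses.astuple() is the standard way to get a tuple out of a dataclass instance, at the cost of an extra call.

```python
import dataclasses
from typing import NamedTuple

class NamedTuplePerson(NamedTuple):
    number: int
    name: str = 'XXX'

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'

# A NamedTuple is a tuple, so unpacking works directly
number, name = NamedTuplePerson(0, 'Alice')
print(number, name)  # 0 Alice

# A dataclass instance is not iterable; convert it with astuple() first
dataclass_person = DataclassPerson(1, 'Bob')
number, name = dataclasses.astuple(dataclass_person)
print(number, name)  # 1 Bob

# Equality with a plain tuple differs between the two
print(NamedTuplePerson(0, 'Alice') == (0, 'Alice'))  # True
print(DataclassPerson(0, 'Alice') == (0, 'Alice'))  # False
```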
__repr__() is generated, so you can easily check the contents
Since __repr__() is generated automatically, you can easily inspect an instance's contents with print() and similar functions.
@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1) # DataclassPerson(number=0, name='Alice')
To get the same output from an ordinary class, you need to write the following.
class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name
    def __repr__(self):
        # !r quotes strings the same way the dataclass-generated __repr__ does
        fields = ', '.join(f'{key}={value!r}' for key, value in self.__dict__.items())
        return f'{self.__class__.__name__}({fields})'
person1 = Person(0, 'Alice')
print(person1) # Person(number=0, name='Alice')
__post_init__()
If the __init__() of your ordinary class does more than assign arguments, put that extra work in __post_init__(). This method is called right after the generated __init__() has assigned the fields. To create an instance variable that is not passed as an argument, use dataclasses.field(init=False).
@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)
    def __post_init__(self):
        self.is_even = self.number % 2 == 0
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.number) # 0
print(dataclass_person1.name) # Alice
print(dataclass_person1.is_even) # True
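__post_init__() is also a natural place for validation, since the type annotations themselves are not checked at runtime. A sketch; ValidatedPerson and the non-negative rule are hypothetical.

```python
import dataclasses

@dataclasses.dataclass
class ValidatedPerson:
    number: int
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)

    def __post_init__(self):
        # Runs after the generated __init__ has assigned number and name
        if self.number < 0:
            raise ValueError(f'number must be non-negative, got {self.number}')
        self.is_even = self.number % 2 == 0

print(ValidatedPerson(3, 'Bob'))  # ValidatedPerson(number=3, name='Bob', is_even=False)

try:
    ValidatedPerson(-1)
except ValueError as error:
    print(error)  # number must be non-negative, got -1
```

Note that a field created with field(init=False) still appears in the generated __repr__() unless you also pass repr=False.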
InitVar
As in the example below, there may be values that you want to pass as arguments at initialization but don't want to keep as instance variables.
class Person:
    def __init__(self, number, name='XXX'):
        self.name = name
        self.is_even = number % 2 == 0
person1 = Person(0, 'Alice')
print(person1.name) # Alice
print(person1.is_even) # True
In that case, use InitVar.
@dataclasses.dataclass
class DataclassPerson:
    number: dataclasses.InitVar[int]
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)
    def __post_init__(self, number):
        self.is_even = number % 2 == 0
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.name) # Alice
print(dataclass_person1.is_even) # True
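To confirm that number really is not stored on the instance, you can inspect it. This sketch repeats the InitVar definition above; InitVar pseudo-fields are excluded from fields(), and therefore from the generated __repr__(), __eq__(), and asdict().

```python
import dataclasses

@dataclasses.dataclass
class DataclassPerson:
    number: dataclasses.InitVar[int]
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)

    def __post_init__(self, number):
        # number arrives here as an argument but is never assigned to self
        self.is_even = number % 2 == 0

dataclass_person1 = DataclassPerson(0, 'Alice')

print([field.name for field in dataclasses.fields(DataclassPerson)])  # ['name', 'is_even']
print(dataclass_person1)  # DataclassPerson(name='Alice', is_even=True)
print(hasattr(dataclass_person1, 'number'))  # False
```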
This article is part of an Advent calendar for people in their first year at the company, so I focused on features that tend to be neglected in solo development but that I want to value in team development.
Features you can get by without are easy to neglect, even when they are convenient, but new features are added for a reason. With the introduction of type annotations, Python feels quite different from a few years ago. You may like or dislike the changes, but you can't judge something you don't know, so I want to make sure I keep up!
References
- dataclasses --- Data Classes, Python 3.9.1 documentation
- PEP 557 -- Data Classes, Python.org
DeNA's official Twitter account, @DeNAxTech, publishes not only blog articles but also presentation materials from various study sessions. Please follow us!