This article is the day 16 entry of the Python Part 2 Advent Calendar 2020.
Type hints were introduced in Python 3.5, and writing type information in code is now commonplace even in Python, which was originally a dynamically typed language.
In this article, I'll introduce pydantic, a library that makes the most of this type information to help you write more robust Python code.
Since it is also used by FastAPI, the Python web framework that has been a hot topic recently, many people may already know of it.
In fact, I first learned about pydantic when I started using FastAPI.
pydantic is a library for parsing and validating data using Python type hints.
That description alone may not mean much to many readers, so I will explain with an example below.
GitHub: samuelcolvin/pydantic: Data parsing and validation using Python type hints
Official documentation: pydantic
Example
pydantic works with user-defined classes that inherit from the base class pydantic.BaseModel.
First, consider a class definition that does not use pydantic.
It uses dataclasses.dataclass.
from dataclasses import dataclass
@dataclass
class NonPydanticUser:
    name: str
    age: int
Let's create an instance of this NonPydanticUser class.
In this example, name is of type str and age is of type int.
The instance holds the data types as defined in the class.
Ichiro = NonPydanticUser(name="Ichiro", age=19)
print(Ichiro)
#> NonPydanticUser(name='Ichiro', age=19)
print(type(Ichiro.name))
#> <class 'str'>
print(type(Ichiro.age))
#> <class 'int'>
Let's create another instance.
Samatoki = NonPydanticUser(name="Samatoki", age="25")
print(Samatoki)
#> NonPydanticUser(name='Samatoki', age='25')
print(type(Samatoki.name))
#> <class 'str'>
print(type(Samatoki.age))
#> <class 'str'>
In this example, name is of type str, but age is also of type str.
No exception such as TypeError is raised.
This shows once again that the type information given by type annotations only works at coding time.
Certainly, tools such as mypy or Pylance can detect this kind of type inconsistency at coding time, but if you want an exception to be raised at run time for a type inconsistency or an invalid value, you have to check the input values yourself.
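To make "check the input values yourself" concrete, here is a minimal sketch of what manual run-time checking could look like with a plain dataclass. The class name CheckedUser and the rule that age must not be negative are assumptions made up for this illustration; they are not part of the original example.

```python
from dataclasses import dataclass


@dataclass
class CheckedUser:
    """Hypothetical dataclass that validates its own fields at run time."""

    name: str
    age: int

    def __post_init__(self) -> None:
        # dataclasses call __post_init__ right after the generated __init__.
        if not isinstance(self.name, str):
            raise TypeError(f"name must be str, got {type(self.name).__name__}")
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise TypeError(f"age must be int, got {type(self.age).__name__}")
        # Assumed extra rule, only for illustration.
        if self.age < 0:
            raise ValueError("age must not be negative")


ok = CheckedUser(name="Ichiro", age=19)

try:
    CheckedUser(name="Samatoki", age="25")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
    #> age must be int, got str
```

Every field needs its own hand-written check, and the checks grow with each new rule. This is the boilerplate that pydantic takes over.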
On the other hand, the class definition using pydantic is as follows.
from pydantic import BaseModel
class User(BaseModel):
    name: str
    age: int
At first glance, it looks similar to using dataclasses.dataclass.
But there is a clear difference.
First, let's create an instance using normal field values.
Ramuda = User(name="Ramuda", age=24)
print(Ramuda)
#> name='Ramuda' age=24
print(type(Ramuda.name))
#> <class 'str'>
print(type(Ramuda.age))
#> <class 'int'>
You can't really tell the difference from this alone.
Next, give age a str-type number such as "23" or "45".
Jakurai = User(name="Jakurai", age="35")
print(Jakurai)
#> name='Jakurai' age=35
print(type(Jakurai.name))
#> <class 'str'>
print(type(Jakurai.age))
#> <class 'int'>
**Jakurai.age is cast to int type.**
By the way, what happens if you give age a value that cannot be cast to int, such as hoge or fuga?
Sasara = User(name="Sasara", age="Is it true?")
#> ValidationError: 1 validation error for User
#> age
#> value is not a valid integer (type=type_error.integer)
An exception called ValidationError was raised.
The invalid value was detected even though I did not implement any validation myself.
When pydantic is used in this way, the declared type information is applied not only at coding time but also at run time, and an easy-to-understand exception (described later) is raised for invalid values. So even in Python, a dynamically typed language, you can write type-strict code!
I will give a basic explanation using the following code from the official example.
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
print(user.id)
#> 123
print(repr(user.signup_ts))
#> datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)
#> [1, 2, 3]
print(user.dict())
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""
Define your own class by inheriting from the base class pydantic.BaseModel.
This class definition has four fields: id, name, signup_ts, and friends.
Each field is declared differently. According to the documentation, they have the following meanings.
- id (int): Declaring only a type hint makes the field required. If a str, bytes, or float value is given at instantiation, it is coerced to int where possible. If a value of any other data type (dict, list, etc.) is given, an exception is raised.
- name (str): From the default value 'John Doe', name is inferred to be of type str. Because a default value is declared, name is not a required field.
- signup_ts (datetime, optional): A datetime field that also allows None. Because a default value is declared, signup_ts is not a required field. You can pass an int or float UNIX timestamp (e.g. 1608076800) or a str representing a date and time.
- friends (List[int]): Uses Python's built-in typing system. Because a default value is declared, it is not a required field. As with id, elements such as "123" and "45" are converted to int.
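The rules in the list above can be tried out with a short sketch like the following. It redefines the User model so that the snippet runs on its own. Unlike the official example, name is given an explicit str annotation here (my own choice for the sketch), and the input values are made up for illustration.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = "John Doe"  # explicit annotation added for this sketch
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


# Only the required field is given; the others fall back to their defaults.
minimal = User(id="123")
print(minimal.id, minimal.name, minimal.signup_ts, minimal.friends)

# signup_ts given as a UNIX timestamp and as a date-time string.
from_timestamp = User(id=1, signup_ts=1608076800)
from_string = User(id=2, signup_ts="2020-12-16 09:00")
print(type(from_timestamp.signup_ts), type(from_string.signup_ts))
```

In both cases signup_ts ends up as a datetime object, so the rest of the program does not need to care which input form was used.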
As mentioned, if you try to give an invalid value when instantiating a class that inherits from pydantic.BaseModel, an exception called pydantic.ValidationError is raised.
Let's take a look inside the ValidationError using the code below.
from pydantic import ValidationError
try:
    User(signup_ts='broken', friends=[1, 2, 'not number'])
except ValidationError as e:
    print(e.json())
The contents of the ValidationError for this code are as follows.
You can see what kind of inconsistency occurred in each field.
[
  {
    "loc": [
      "id"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "signup_ts"
    ],
    "msg": "invalid datetime format",
    "type": "value_error.datetime"
  },
  {
    "loc": [
      "friends",
      2
    ],
    "msg": "value is not a valid integer",
    "type": "type_error.integer"
  }
]
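Besides e.json(), the same information is available as Python objects through e.errors(), which is handy when you want to build your own error messages. The following is a minimal sketch; the "friends.2" style of joining the location is just one possible presentation that I chose for illustration, and name again carries an explicit annotation so the snippet stands alone.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str = "John Doe"
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


failed_fields = []
try:
    User(signup_ts="broken", friends=[1, 2, "not number"])
except ValidationError as e:
    for err in e.errors():
        # "loc" is a sequence such as ("friends", 2); join it into "friends.2".
        location = ".".join(str(part) for part in err["loc"])
        failed_fields.append(location)
        print(f"{location}: {err['msg']}")
```

Each entry carries the same loc, msg, and type keys that appear in the JSON output above.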
Tips
This article alone cannot cover all of pydantic, but from here on I would like to introduce some features you can use right away.
pydantic supports a wide variety of data types.
Here are some of them.
Standard Library Types
Of course, you can use primitive data types such as int, str, list, and dict.
Types from standard library modules such as typing, ipaddress, enum, decimal, pathlib, and uuid are also supported.
The following is an example using ipaddress.IPv4Address.
from pydantic import BaseModel
from ipaddress import IPv4Address
class IPNode(BaseModel):
    address: IPv4Address
client = IPNode(address="192.168.0.12")
srv = IPNode(address="hoge")
#> ValidationError: 1 validation error for IPNode
#> address
#> value is not a valid IPv4 address (type=value_error.ipv4address)
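A small follow-up sketch: after validation, the field holds a real ipaddress.IPv4Address object rather than the original string, so its attributes can be used directly. The use of is_private here is only an illustration of that point.

```python
from ipaddress import IPv4Address

from pydantic import BaseModel


class IPNode(BaseModel):
    address: IPv4Address


client = IPNode(address="192.168.0.12")

# The field holds an ipaddress.IPv4Address, not the original string.
print(type(client.address))
print(client.address.is_private)
print(client.address == IPv4Address("192.168.0.12"))
```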
pydantic also supports URLs such as https://example.com and ftp://hogehoge.
from pydantic import BaseModel, HttpUrl, AnyUrl
class Backend(BaseModel):
    url: HttpUrl
bd1 = Backend(url="https://example.com")
bd2 = Backend(url="file://hogehoge")
#> ValidationError: 1 validation error for Backend
#> url
#> URL scheme not permitted (type=value_error.url.scheme; allowed_schemes={'https', 'http'})
You can also handle information that you do not want to appear in output such as logs.
For example, you can use pydantic.SecretStr for passwords.
from pydantic import BaseModel, SecretStr
class Password(BaseModel):
    value: SecretStr
p1 = Password(value="hogehogehoge")
print(p1.value)
#> **********
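The masked output raises the question of how to get the real value back when you actually need it, for example to hash the password. A minimal sketch, using SecretStr's get_secret_value() method:

```python
from pydantic import BaseModel, SecretStr


class Password(BaseModel):
    value: SecretStr


p1 = Password(value="hogehogehoge")

# str() and print() only ever show the mask.
masked = str(p1.value)
print(masked)
#> **********

# The real value has to be requested explicitly.
plain = p1.value.get_secret_value()
```

Because the plain value is only returned by an explicit call, it is harder to leak it by accident through print or logging.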
EmailStr
This is a type for handling email addresses.
However, to use it you need to install a library called email-validator separately from pydantic.
Let's use EmailStr together with the Secret Types from the previous section.
from pydantic import BaseModel, EmailStr, SecretStr, Field
class User(BaseModel):
    email: EmailStr
    password: SecretStr = Field(min_length=8, max_length=16)
# OK
Juto = User(email="[email protected]", password="hogehogehoge")
print(Juto)
#> email='[email protected]' password=SecretStr('**********')
# NG,email is not in the email address format
Rio = User(email="rio", password="hogehogehogehoge")
#> ValidationError: 1 validation error for User
#> email
#> value is not a valid email address (type=value_error.email)
# NG,The number of characters in password exceeds 16 characters
Gentaro = User(email="[email protected]", password="hogehogehogehogehoge")
#> ValidationError: 1 validation error for User
#> password
#> ensure this value has at most 16 characters (type=value_error.any_str.max_length; limit_value=16)
# NG,password has less than 8 characters
Daisu = User(email="[email protected]", password="hoge")
#> ValidationError: 1 validation error for User
#> password
#> ensure this value has at least 8 characters (type=value_error.any_str.min_length; limit_value=8)
from pydantic import BaseModel, conint
# Allow only positive numbers
class PositiveNumber(BaseModel):
    value: conint(gt=0)
# OK
n1 = PositiveNumber(value=334)
#NG,Negative number
n2 = PositiveNumber(value=-100)
#> ValidationError: 1 validation error for PositiveNumber
#> value
#> ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
In the example at the beginning of the article, pydantic helpfully cast str numbers such as "23" and "45" to int and accepted them.
You can also declare stricter fields that do not allow even this cast.
from pydantic import BaseModel, conint, StrictInt
# int that does not allow casting
class StrictNumber(BaseModel):
    value: StrictInt
# OK
n1 = StrictNumber(value=4)
#Even if it is a str type that can be cast and become an int type, it is not an int type, so it is NG
n2 = StrictNumber(value="4")
#> ValidationError: 1 validation error for StrictNumber
#> value
#> value is not a valid integer (type=type_error.integer)
It can also be combined with the Constrained Types in the previous section.
from pydantic import BaseModel, conint
# Allow only natural numbers
class NaturalNumber(BaseModel):
    value: conint(strict=True, gt=0)
# OK
n1 = NaturalNumber(value=334)
# NG,Negative number
n2 = NaturalNumber(value=-45)
#> ValidationError: 1 validation error for NaturalNumber
#> value
#> ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
#Even if it is a str type that can be cast and become an int type, it is not an int type, so it is NG
n3 = NaturalNumber(value="45")
#> ValidationError: 1 validation error for NaturalNumber
#> value
#> value is not a valid integer (type=type_error.integer)
#float type is also not allowed
n4 = NaturalNumber(value=123.4)
#> ValidationError: 1 validation error for NaturalNumber
#> value
#> value is not a valid integer (type=type_error.integer)
Simple validations can be written in the field declaration, but you can also create user-defined validations with pydantic.validator.
Consider a simple example.
Define a validator that accepts the name field only when it contains a half-width (single-byte) space.
from pydantic import BaseModel, validator
# Reject names that do not contain a space
class User(BaseModel):
    name: str
    age: int
    @validator("name")
    def validate_name(cls, v):
        if ' ' not in v:
            raise ValueError("must contain a space")
        return v
# OK
Jiro = User(name="Jiro Yamada", age=17)
# NG, the name contains no half-width space
Saburo = User(name="SaburoYamada", age=14)
#> ValidationError: 1 validation error for User
#> name
#> must contain a space (type=value_error)
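A validator does not only accept or reject: the value it returns is what gets stored in the field, so it can also normalize the input. The following is a minimal sketch under that assumption; the whitespace-collapsing rule and the name normalize_name are made up for illustration.

```python
from pydantic import BaseModel, validator


class User(BaseModel):
    name: str
    age: int

    @validator("name")
    def normalize_name(cls, v):
        # Collapse runs of whitespace, then apply the same check as above.
        cleaned = " ".join(v.split())
        if " " not in cleaned:
            raise ValueError("must contain a space")
        return cleaned


jiro = User(name="  Jiro   Yamada ", age=17)
print(repr(jiro.name))
#> 'Jiro Yamada'
```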
For example, consider an Event class that holds the start and end times of an appointment as begin and end, respectively.
from datetime import datetime
from pydantic import BaseModel
class Event(BaseModel):
    begin: datetime
    end: datetime
event = Event(begin="2020-12-16T09:00:00+09:00", end="2020-12-16T12:00:00+09:00")
Here, I want to guarantee that the time assigned to the end field is later than the time assigned to the begin field.
If begin and end are the same time, that is also considered an invalid value.
There are several ways to do this; I would like to introduce two.
The first is to use pydantic.root_validator instead of pydantic.validator.
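from datetime import datetime
from pydantic import BaseModel, root_validator
class Event(BaseModel):
    begin: datetime
    end: datetime
    # Runs after the fields are parsed, so values holds datetime objects
    # (with pre=True the raw input strings would be compared instead).
    # skip_on_failure=True skips this check if a field failed validation.
    @root_validator(skip_on_failure=True)
    def validate_event_schedule(cls, values):
        _begin: datetime = values["begin"]
        _end: datetime = values["end"]
        if _begin >= _end:
            raise ValueError("Invalid event.")
        return values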
# OK
event1 = Event(begin="2020-12-16T09:00:00+09:00", end="2020-12-16T12:00:00+09:00")
# NG
event2 = Event(begin="2020-12-16T12:00:00+09:00", end="2020-12-16T09:00:00+09:00")
#> ValidationError: 1 validation error for Event
#> __root__
#> Invalid event. (type=value_error)
# NG
event3 = Event(begin="2020-12-16T12:00:00+09:00", end="2020-12-16T12:00:00+09:00")
#> ValidationError: 1 validation error for Event
#> __root__
#> Invalid event. (type=value_error)
The other approach makes use of how validator works.
Here is the code first.
from datetime import datetime
from pydantic import BaseModel, validator
class Event(BaseModel):
    begin: datetime
    end: datetime
    @validator("begin", pre=True)
    def validate_begin(cls, v):
        return v
    @validator("end")
    def validate_end(cls, v, values):
        if values["begin"] >= v:
            raise ValueError("Invalid schedule.")
        return v
This code defines two validators.
When the Event class is instantiated, validate_begin, which has the argument pre=True, is executed first. In validate_begin, the value given for begin at instantiation is passed on to the begin field as it is.
Then validate_end is processed.
Unlike validate_begin, however, validate_end takes a third argument called values.
**As a specification of pydantic.validator, a validator can access the fields already checked by previously executed validators through its third argument, values.**
This argument must be named exactly values; names such as _values or Values will not work. Think of it as a kind of reserved word.
In other words, in this code the input values of the fields are checked in the following order.
1. validate_begin checks the input value for begin.
2. validate_end checks the input value for end. At this point, the begin field can be referenced as values["begin"] from within the scope of validate_end.
Those are the two methods. Please let me know if there is a better way.
List, Dict, Set, etc.
Consider a RepeatedExams class that meets the following specifications.
- It has a List[int] field scores that stores the scores of exactly 10 exams (int type).
- Each score must be 50 points or more.
- The total of the scores must be 800 points or more.
The code is as follows.
If you want a validator to check each element of a field whose type is List, Dict, Set, etc., set each_item=True on that validator.
The code below sets each_item=True on a validator called validate_each_score.
from pydantic import BaseModel, validator
from typing import List
class RepeatedExams(BaseModel):
    scores: List[int]
    # Verify that the number of test results is exactly 10
    @validator("scores", pre=True)
    def validate_num_of_exams(cls, v):
        if len(v) != 10:
            raise ValueError("The number of exams must be 10.")
        return v
    # Verify that the result of each test is 50 points or more
    @validator("scores", each_item=True)
    def validate_each_score(cls, v):
        assert v >= 50, "Each score must be at least 50."
        return v
    # Verify that the total of the test results is 800 points or more
    @validator("scores")
    def validate_sum_score(cls, v):
        if sum(v) < 800:
            raise ValueError("sum of numbers greater than 800")
        return v
# OK
result1 = RepeatedExams(scores=[87, 88, 77, 100, 61, 59, 97, 75, 80, 85])
# NG,I have only taken the test 9 times
result2 = RepeatedExams(scores=[87, 88, 77, 100, 61, 59, 97, 75, 80])
#> ValidationError: 1 validation error for RepeatedExams
#> scores
#> The number of exams must be 10. (type=value_error)
# NG,There are exams with less than 50 points
result3 = RepeatedExams(scores=[87, 88, 77, 100, 32, 59, 97, 75, 80, 85])
#> ValidationError: 1 validation error for RepeatedExams
#> scores -> 4
#> Each score must be at least 50. (type=assertion_error)
# NG,The total of 10 tests is less than 800 points
result4 = RepeatedExams(scores=[87, 88, 77, 100, 51, 59, 97, 75, 80, 85])
#> ValidationError: 1 validation error for RepeatedExams
#> scores
#> sum of numbers greater than 800 (type=value_error)
Instances of classes that inherit from pydantic.BaseModel can be converted to a dictionary or to JSON, and can also be copied.
In addition to converting and copying everything, you can specify target fields and output only those fields.
from pydantic import BaseModel, conint
class User(BaseModel):
    name: str
    age: conint(strict=True, ge=0)
    height: conint(strict=True, ge=0)
    weight: conint(strict=True, ge=0)
Kuko = User(name="Kuko", age=19, height=168, weight=58)
print(Kuko)
#Convert to dict for all fields
Kuko_dict_1 = Kuko.dict()
print(Kuko_dict_1)
#> {'name': 'Kuko', 'age': 19, 'height': 168, 'weight': 58}
#Convert to dict only for name
Kuko_name = Kuko.dict(include={"name"})
print(Kuko_name)
#> {'name': 'Kuko'}
# Copy all fields
Kuko_2 = Kuko.copy()
print(Kuko_2)
#> name='Kuko' age=19 height=168 weight=58
#Copy excluding only age
Kuko_3 = Kuko.copy(exclude={"age"})
print(Kuko_3)
#> name='Kuko' height=168 weight=58
#JSON for all fields
Kuko_json = Kuko.json()
print(Kuko_json)
#> {"name": "Kuko", "age": 19, "height": 168, "weight": 58}
print(type(Kuko_json))
#> <class 'str'>
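The conversion also works in the other direction. As a minimal sketch, a dictionary produced by dict() can be unpacked back into the model, and the JSON string can be restored through the standard json module. The two-field User model here is simplified for illustration.

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


kuko = User(name="Kuko", age=19)

# dict -> model: unpack the dictionary as keyword arguments.
restored_from_dict = User(**kuko.dict())

# JSON string -> dict -> model, using only the standard json module.
restored_from_json = User(**json.loads(kuko.json()))

print(restored_from_dict == kuko, restored_from_json == kuko)
#> True True
```

Because the restored data passes through the model again, it is validated a second time on the way in.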
I left out other topics such as Model Config and Schema because I did not have enough time to write about them. I hope to add them in the future.