I am developing pyserde, a serialization framework built on dataclasses. It is pronounced "pai-serde".
TL;DR
Add the @serialize and @deserialize decorators to an ordinary dataclass:
@deserialize
@serialize
@dataclass
class Foo:
    i: int
    s: str
    f: float
    b: bool
Then serialize to JSON with to_json.
>>> h = Foo(i=10, s='foo', f=100.0, b=True)
>>> print(f"Into Json: {to_json(h)}")
Into Json: {"i": 10, "s": "foo", "f": 100.0, "b": true}
You can deserialize from JSON back into an object with from_json.
>>> s = '{"i": 10, "s": "foo", "f": 100.0, "b": true}'
>>> print(f"From Json: {from_json(Foo, s)}")
From Json: Foo(i=10, s='foo', f=100.0, b=True)
In addition to JSON, it supports MsgPack, YAML, and TOML, along with various other features.
I thought the dataclasses module added in Python 3.7 looked useful, so I read its implementation. I found that when a class decorated with @dataclass is loaded into a module, the decorator uses the exec function to [generate methods](https://github.com/python/cpython/blob/550f30c8f33a2ba844db2ce3da8a897b3e882c9a/Lib/dataclasses.py#L377-L401). I found this interesting, and when I measured the performance of the generated \_\_init\_\_, \_\_repr\_\_, and \_\_eq\_\_, the results were almost the same as for a handwritten class. That made me think it would be possible to generate more methods the same way.
Example of defining a function at runtime with exec:
# Define a function as a string
s = '''
def func():
    print("Hello world!")
'''
# Pass the string to exec
exec(s)
# The function is now defined at runtime!
func()
For details on the mechanism and performance of dataclass, see the explanation here.
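To make the idea of generating extra methods with exec more concrete, here is a minimal sketch that uses only the standard library. The add_to_dict decorator and the Point class are hypothetical names made up for illustration; this is not pyserde's actual implementation, only the same general technique of building method source from the dataclass fields and compiling it once.

```python
from dataclasses import dataclass, fields


def add_to_dict(cls):
    """Hypothetical decorator: generate a to_dict method with exec."""
    # Build the method source once, from the dataclass field names.
    items = ", ".join(f"'{f.name}': self.{f.name}" for f in fields(cls))
    source = f"def to_dict(self):\n    return {{{items}}}\n"
    namespace = {}
    exec(source, namespace)  # Compile the generated function
    cls.to_dict = namespace["to_dict"]  # Attach it to the class
    return cls


@add_to_dict
@dataclass
class Point:
    x: int
    y: int


print(Point(1, 2).to_dict())  # {'x': 1, 'y': 2}
```

Because the generated function accesses each attribute directly, calling it involves no per-call introspection, which is why generated methods can perform close to handwritten ones.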
Rust has a serialization framework called serde. serde is simply superb, and I personally think it accounts for about 20% of what makes Rust great.
I wanted to create a convenient, high-performance, and flexible framework like serde for Python, so I named it pyserde.
Getting started
Install with pip
pip install pyserde
dataclasses was added in Python 3.7; on Python 3.6, the dataclasses backport from PyPI is used instead.
Let's define a class like this. It's just a dataclass, but with the @serialize and @deserialize decorators provided by pyserde.
from serde import serialize, deserialize
from dataclasses import dataclass

@deserialize
@serialize
@dataclass
class Foo:
    i: int
    s: str
    f: float
    b: bool
With @serialize, pyserde generates a serialization method, and with @deserialize, it generates a deserialization method. Method generation runs only once, when the class is loaded into the Python interpreter (this is how decorators behave rather than anything specific to pyserde), so there is no generation overhead when you actually use the class.
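The "runs only once" behavior can be checked with a small standard-library sketch. The generate_methods decorator below is a hypothetical stand-in for a code-generating decorator, not pyserde's code; it just records each time its body executes.

```python
from dataclasses import dataclass

calls = []  # Records every time the decorator body runs


def generate_methods(cls):
    """Hypothetical stand-in for a code-generating decorator."""
    calls.append(cls.__name__)  # Runs when the class statement executes
    cls.describe = lambda self: f"{cls.__name__} instance"
    return cls


@generate_methods
@dataclass
class Foo:
    i: int


# Creating and using many instances does not re-run the decorator.
for n in range(1000):
    Foo(n).describe()

print(calls)  # ['Foo']
```

The decorator body ran once at class definition time, no matter how many instances were created afterwards.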
Now, let's actually serialize and deserialize. pyserde supports JSON, YAML, TOML, and MsgPack as of 0.1.1. Helper functions for each format live in the serde.<format name> module and follow a common naming convention.
For example, for JSON it looks like this.
from serde.json import from_json, to_json
Call to_json to serialize the Foo object to JSON.
f = Foo(i=10, s='foo', f=100.0, b=True)
print(to_json(f))
To deserialize, pass the class Foo as the first argument of from_json and the JSON string as the second argument.
s = '{"i": 10, "s": "foo", "f": 100.0, "b": true}'
print(from_json(Foo, s))
For YAML, TOML, and MsgPack, it looks like this. Each example deserializes the output of its own serializer, because the JSON string s is not valid input for every format.
from serde.yaml import from_yaml, to_yaml
print(to_yaml(f))
print(from_yaml(Foo, to_yaml(f)))

from serde.toml import from_toml, to_toml
print(to_toml(f))
print(from_toml(Foo, to_toml(f)))

from serde.msgpack import from_msgpack, to_msgpack
print(to_msgpack(f))
print(from_msgpack(Foo, to_msgpack(f)))
I measured execution time under the following conditions and compared pyserde with other serialization libraries. If you want to see the benchmark code, see here.
Serialize / Deserialize to JSON, 10,000 times each
[Charts: Serialize, Deserialize]
Conversion to tuple / dict, 10,000 times each
[Charts: astuple, asdict]
In the charts, the horizontal axis shows the libraries being compared and the vertical axis shows latency, so a lower bar means better performance. As the charts show, pyserde's performance is second only to the handwritten raw implementation. The difference appears to come from the raw code making fewer function calls than pyserde.
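As a rough illustration of why fewer function calls matter, the following sketch compares a handwritten conversion with dataclasses.asdict from the standard library. This is not the benchmark code used for the charts; raw_asdict is a hypothetical helper, and timings vary by machine, so no expected numbers are shown.

```python
import timeit
from dataclasses import asdict, dataclass


@dataclass
class Foo:
    i: int
    s: str
    f: float
    b: bool


def raw_asdict(obj):
    # Handwritten conversion: a single function call, no introspection.
    return {"i": obj.i, "s": obj.s, "f": obj.f, "b": obj.b}


foo = Foo(i=10, s="foo", f=100.0, b=True)

# Both approaches produce the same dictionary.
assert raw_asdict(foo) == asdict(foo)

# Run each conversion 10,000 times, as in the conditions above.
raw_time = timeit.timeit(lambda: raw_asdict(foo), number=10_000)
std_time = timeit.timeit(lambda: asdict(foo), number=10_000)
print(f"raw: {raw_time:.4f}s, dataclasses.asdict: {std_time:.4f}s")
```

dataclasses.asdict walks the fields and recurses into values at call time, whereas the handwritten version does all of its work in one function body.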
The comparison targets are listed below.
- raw: Handwritten serialization / deserialization.
- dataclass: Uses astuple and asdict from dataclasses.
- pyserde: This library. (GitHub 7 ⭐)
- dacite: Simple creation of dataclasses from dictionaries. (GitHub 447 ⭐)
- mashumaro: Fast and well tested serialization framework on top of dataclasses. (GitHub 131 ⭐)
- marshmallow: A lightweight library for converting complex objects to and from simple datatypes. (GitHub 4668 ⭐)
- attrs: Python Classes Without Boilerplate. (GitHub 3076 ⭐)
- cattrs: Complex custom class converters for attrs. (GitHub 214 ⭐)

I benchmarked several other libraries, but I didn't put them on the charts because they were incomparably slow.
- dataclasses-json: Easily serialize Data Classes to and from JSON. (GitHub 367 ⭐)
- dataclasses_jsonschema: JSON schema generation from dataclasses. (GitHub 72 ⭐)
- pavlova: A python deserialisation library built on top of dataclasses. (GitHub 28 ⭐)

The GitHub star counts are as of May 21, 2020.
These are straight imitations of the original serde, but pyserde implements the following useful features.
Case Conversion
Convert snake_case to camelCase, kebab-case, etc.
@serialize(rename_all='camelcase')
@dataclass
class Foo:
    int_field: int
    str_field: str

f = Foo(int_field=10, str_field='foo')
print(to_json(f))
The snake_case field names are now camelCase.
{"intField": 10, "strField": "foo"}
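The conversion itself is simple string manipulation. Here is a minimal sketch, assuming plain snake_case input; to_camel_case and to_kebab_case are hypothetical helpers written for illustration and are not pyserde's own functions.

```python
def to_camel_case(name: str) -> str:
    """Hypothetical helper: convert snake_case to camelCase."""
    head, *rest = name.split("_")
    # Keep the first word as is and capitalize each following word.
    return head + "".join(part.capitalize() for part in rest)


def to_kebab_case(name: str) -> str:
    """Hypothetical helper: convert snake_case to kebab-case."""
    return name.replace("_", "-")


print(to_camel_case("int_field"))  # intField
print(to_kebab_case("str_field"))  # str-field
```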
Rename Field
This is useful when you want to use a keyword such as class as a field name.
from dataclasses import dataclass, field

@serialize
@dataclass
class Foo:
    class_name: str = field(metadata={'serde_rename': 'class'})

print(to_json(Foo(class_name='Foo')))
The field is named class_name in the class, but it appears as class in the JSON.
{"class": "Foo"}
Skip
You can exclude a field from serialization / deserialization by adding serde_skip to its metadata.
from dataclasses import dataclass, field
from typing import Dict

@serialize
@dataclass
class Resource:
    name: str
    hash: str
    metadata: Dict[str, str] = field(default_factory=dict, metadata={'serde_skip': True})

resources = [
    Resource("Stack Overflow", "hash1"),
    Resource("GitHub", "hash2", metadata={"headquarters": "San Francisco"}),
]
print(to_json(resources))
The metadata field has been excluded.
[{"name": "Stack Overflow", "hash": "hash1"}, {"name": "GitHub", "hash": "hash2"}]
Conditional Skip
If you want to exclude a field only under a certain condition, pass a predicate function to serde_skip_if.
@serialize
@dataclass
class World:
    player: str
    buddy: str = field(default='', metadata={'serde_skip_if': lambda v: v == 'Pikachu'})

world = World('satoshi', 'Pikachu')
print(to_json(world))
world = World('green', 'Charmander')
print(to_json(world))
The buddy field is now excluded only when its value is "Pikachu".
{"player": "satoshi"}
{"player": "green", "buddy": "Charmander"}
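These field-level options all ride on the metadata mapping that dataclasses attaches to each field. The sketch below shows one way such metadata could be read using only the standard library. The to_plain_dict helper and the extra class_name and secret fields are hypothetical and exist only for illustration; only the metadata keys serde_rename, serde_skip, and serde_skip_if come from the examples above, and this is not pyserde's implementation.

```python
from dataclasses import dataclass, field, fields


def to_plain_dict(obj):
    """Hypothetical helper: build a dict while honoring field metadata."""
    result = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get("serde_skip"):
            continue  # Always excluded
        skip_if = f.metadata.get("serde_skip_if")
        if skip_if is not None and skip_if(value):
            continue  # Excluded only when the predicate is true
        # Use the renamed key when one is given, else the field name.
        result[f.metadata.get("serde_rename", f.name)] = value
    return result


@dataclass
class World:
    player: str
    buddy: str = field(default="", metadata={"serde_skip_if": lambda v: v == "Pikachu"})
    class_name: str = field(default="Trainer", metadata={"serde_rename": "class"})
    secret: str = field(default="", metadata={"serde_skip": True})


print(to_plain_dict(World("satoshi", "Pikachu")))
# {'player': 'satoshi', 'class': 'Trainer'}
print(to_plain_dict(World("green", "Charmander")))
# {'player': 'green', 'buddy': 'Charmander', 'class': 'Trainer'}
```

This version inspects the metadata on every call. A generated method, as described earlier, would resolve these options once at class definition time instead.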
YAML and TOML are commonly used for application configuration files. With pyserde you can easily map a config file to your own class.
from dataclasses import dataclass
from serde import deserialize
from serde.yaml import from_yaml

@deserialize
@dataclass
class App:
    addr: str
    port: int
    secret: str
    workers: int

def main():
    with open('app.yml') as f:
        yml = f.read()
    cfg = from_yaml(App, yml)
    print(cfg)
JSON WebAPI
A JSON Web API is already fairly easy to implement with Flask, and pyserde makes it easy to map requests and responses to your own types.
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[packages]
pyserde = "~=0.1"
flask = "~=1.1"
from dataclasses import dataclass
from flask import Flask, request, Response
from serde import serialize, deserialize
from serde.json import to_json, from_json

@deserialize
@serialize
@dataclass
class ToDo:
    id: int
    title: str
    description: str
    done: bool

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/todos', methods=['GET', 'POST'])
def todos():
    print(request.method)
    if request.method == 'GET':
        body = to_json([ToDo(1, 'Play games', 'Play Holy Sword Legend 3', False)])
        return Response(body, mimetype='application/json')
    else:
        todo = from_json(ToDo, request.get_data())
        return f'A new ToDo {todo} successfully created.'

if __name__ == '__main__':
    app.run(debug=True)
pipenv install
pipenv run python app.py
$ curl http://localhost:5000/todos
[{"id": 1, "title": "Play games", "description": "Play Holy Sword Legend 3", "done": false}]⏎
$ curl -X POST http://localhost:5000/todos -d '{"id": 1, "title": "Play games", "description": "Play Holy Sword Legend 3", "done": false}'
A new ToDo ToDo(id=1, title='Play games', description='Play Holy Sword Legend 3', done=False) successfully created.⏎
RPC
Unfortunately I can't show you the code, but my company has its own RPC framework, which uses pyserde to serialize messages into MsgPack.
References