PyYAML is a YAML library for Python.
This is a note on changing how the library behaves when loading and dumping YAML.
We will customize PyYAML to behave the way we want, using the following two cases as examples.
- Support OrderedDict (reading and writing in a format that other YAML loaders can also read)
- Produce output that is compatible with JSON
As background, it helps to know the following two terms.
- Representer: a hook object that attaches a tag at dump time
- Constructor: a hook object that builds the Python value at load time
To support OrderedDict, add the following code.
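from collections import OrderedDict
import yaml

def represent_odict(dumper, instance):
    # Dump an OrderedDict as a plain YAML map, keeping insertion order.
    return dumper.represent_mapping('tag:yaml.org,2002:map', instance.items())
yaml.add_representer(OrderedDict, represent_odict)

def construct_odict(loader, node):
    # Load every YAML map as an OrderedDict.
    return OrderedDict(loader.construct_pairs(node))
yaml.add_constructor('tag:yaml.org,2002:map', construct_odict)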
The same settings are also shown in the following Qiita article.
- Read with PyYAML in order (Qiita)
This time I happened to look inside PyYAML, so I am writing down the details, such as what the settings above actually mean.
Use yaml.dump() to serialize a Python object as YAML.
import yaml
with open("ok.yaml", "w") as wf:
    yaml.dump({200: "ok"}, wf)
yaml.dump() internally calls Representer.represent(). This turns the Python object into one of YAML's internal Node objects, which Serializer.serialize() then converts to a string.
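The two stages can be observed separately. The following is a minimal sketch, assuming a reasonably recent PyYAML: it calls `SafeRepresenter.represent_data()` (the per-object step that `represent()` builds on) to get the Node, then passes that Node to `yaml.serialize()`. The variable names are only for illustration.

```python
import yaml
from yaml.representer import SafeRepresenter

# Stage 1: Python object -> Node (no text is produced yet).
node = SafeRepresenter().represent_data({200: "ok"})
print(type(node).__name__, node.tag)

# Stage 2: Node -> YAML text.
text = yaml.serialize(node)
print(text)
```

The exact layout of the text (block or flow style) depends on the PyYAML version and its defaults, so the sketch only prints it.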
Internally, type() is called on each Python object, and processing branches on the resulting type.
If no representer is registered for that exact type, the object's MRO is walked to look for a candidate.
(If nothing more specific matches, the search ends at object, and represent_object() is called.)
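The MRO lookup can be seen with a multi-representer, which is the table PyYAML consults while walking the MRO. This is a small sketch, not PyYAML's own code: `Base`, `Child`, `SketchDumper`, and the `!base` tag are hypothetical names made up for the example.

```python
import yaml

class Base:
    def __init__(self, name):
        self.name = name

class Child(Base):
    pass

class SketchDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass so the shared SafeDumper stays untouched."""

def represent_base(dumper, data):
    # '!base' is an illustrative tag, not something PyYAML defines.
    return dumper.represent_scalar("!base", data.name)

# Registered for Base only; the MRO walk makes it apply to subclasses too.
SketchDumper.add_multi_representer(Base, represent_base)

out = yaml.dump(Child("x"), Dumper=SketchDumper)
print(out)
```

Nothing is registered for `Child` itself, so the dump only succeeds because `Base` is found further along `Child.__mro__`.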
PyYAML apparently wants to treat OrderedDict and dict as distinct, so it deliberately registers a dedicated representer function for OrderedDict.
Representer.add_representer(collections.OrderedDict,
                            Representer.represent_ordered_dict)
As a result, take the following Python object.
d = OrderedDict()
d["a"] = 1
d["z"] = 2
It is dumped as follows.
!!python/object/apply:collections.OrderedDict
- - [a, 1]
  - [z, 2]
To prevent this, override the representer setting so that OrderedDict is dumped the same way as a normal dict. The output must still preserve the order, however.
Like OrderedDict, dict has its own Representer setting. Internally it is handled as a map Node.
SafeRepresenter.add_representer(dict,
                                SafeRepresenter.represent_dict)

class SafeRepresenter:
    def represent_dict(self, data):
        return self.represent_mapping('tag:yaml.org,2002:map', data)
That's why you can add the following to output OrderedDict as a map node.
def represent_odict(dumper, instance):
    return dumper.represent_mapping('tag:yaml.org,2002:map', instance.items())
yaml.add_representer(OrderedDict, represent_odict)
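To compare the output before and after, here is a small sketch. It registers the same function on a hypothetical `OrderedDumper` subclass instead of the global Dumper, purely so the default behavior stays available for comparison. The remark about sorting reflects my reading of the PyYAML source and should be taken as an assumption.

```python
from collections import OrderedDict
import yaml

d = OrderedDict()
d["z"] = 2
d["a"] = 1

# Default Dumper: OrderedDict gets a Python-specific tag.
before = yaml.dump(d)

class OrderedDumper(yaml.Dumper):
    """Hypothetical subclass; keeps the registration local to this sketch."""

def represent_odict(dumper, instance):
    # items() is passed instead of the mapping itself. As far as I can tell,
    # represent_mapping only sorts objects that expose .items(), so the
    # insertion order survives.
    return dumper.represent_mapping("tag:yaml.org,2002:map", instance.items())

OrderedDumper.add_representer(OrderedDict, represent_odict)
after = yaml.dump(d, Dumper=OrderedDumper, default_flow_style=False)

print(before)
print(after)
```

The keys are deliberately inserted in non-alphabetical order, so any sorting would be visible in the second output.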
import yaml
with open("ok.yaml") as rf:
    # PyYAML 6 and later require an explicit Loader argument.
    data = yaml.load(rf, Loader=yaml.Loader)
Loading works the same way: inside yaml.load(), Constructor.get_single_data() is called.
There, get_single_node() creates a Node object, and construct_document() converts it to a Python object.
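The two load stages can also be run by hand. The following sketch drives a `SafeLoader` instance directly, roughly mirroring what `get_single_data()` does; the input text is just an example.

```python
import yaml

text = "200: ok\n"

loader = yaml.SafeLoader(text)
try:
    # Stage 1: YAML text -> Node.
    node = loader.get_single_node()
    # Stage 2: Node -> Python object.
    data = loader.construct_document(node)
finally:
    loader.dispose()

print(node.tag)
print(data)
```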
Here too, you can add conversion support per Node. The Node object itself is defined as follows.
class Node(object):
    def __init__(self, tag, value, start_mark, end_mark):
        self.tag = tag
        self.value = value
        self.start_mark = start_mark
        self.end_mark = end_mark
The tag determines how the node is processed.
With the dump-side change above, the data is stored as a node tagged map (that is, treated as a dict) whether or not it was an OrderedDict. So on the load side, it is enough to register a conversion for nodes tagged map.
Incidentally, YAML also has a pairs type, which is handy for preserving order. An OrderedDict is easy to build from it.
A node tagged pairs is converted as follows (each pair becomes a tuple of length 2).
s = """\
foo:
  !!pairs
  - a: b
  - x: y
"""
yaml.safe_load(s)  # {'foo': [('a', 'b'), ('x', 'y')]}
Using the same pairs conversion, add the following setting.
def construct_odict(loader, node):
    return OrderedDict(loader.construct_pairs(node))
yaml.add_constructor('tag:yaml.org,2002:map', construct_odict)
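As a quick check that the constructor applies to nested maps as well, here is a minimal sketch. It registers the function on a hypothetical `OrderedLoader` subclass rather than globally, which is a design choice for the example and not what the code above does.

```python
from collections import OrderedDict
import yaml

class OrderedLoader(yaml.SafeLoader):
    """Hypothetical subclass; keeps the registration local to this sketch."""

def construct_odict(loader, node):
    return OrderedDict(loader.construct_pairs(node))

OrderedLoader.add_constructor("tag:yaml.org,2002:map", construct_odict)

text = "z: 2\na: 1\nnested:\n  q: 1\n  b: 2\n"
data = yaml.load(text, Loader=OrderedLoader)
print(data)
```

Registering on a subclass leaves other code that uses the stock loaders unaffected, at the cost of having to pass `Loader=OrderedLoader` explicitly.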
Managing configuration files in JSON can be a nuisance because comments are not allowed, so they are sometimes written in YAML instead. Things like JSON Schema and Swagger definitions are especially painful to write in JSON, so I sometimes use YAML for them.
Most of the time this causes no problems. Occasionally, though, a difference between JSON and YAML gets in the way: how a dict with numeric keys is represented. Concretely, the situation is as follows.
Suppose we have the following JSON Schema.
schema = {
    "type": "object",
    "patternProperties": {
        r"\d{3}": {"type": "string"}
    },
    "additionalProperties": False
}
Consider the following dict as a value that should match this schema. A slight inconvenience arises (inconvenient, not incorrect). Going through json, the code below is valid; doing the same through yaml makes it invalid.
import json
# validate is assumed here to be jsonschema.validate.
from jsonschema import validate
validate(json.loads(json.dumps({200: "ok"})), schema)
import yaml
from io import StringIO

# It would be nice if yaml offered loads/dumps like the json module does.
def loads(s):
    io = StringIO(s)
    # PyYAML 6 and later require an explicit Loader argument.
    return yaml.load(io, Loader=yaml.Loader)

def dumps(d):
    io = StringIO()
    yaml.dump(d, io)
    return io.getvalue()

validate(loads(dumps({200: "ok"})), schema)  # error
The reason is that the dict above is represented in YAML as follows.
200: ok
Written out more verbosely and precisely, the YAML above looks like this. In YAML, a map Node key that is numeric stays numeric.
!!int 200: ok
# Written like this, it is read as {'200': 'ok'}
'200': ok
# Written like this, it is also read as {'200': 'ok'}
!!str 200: ok
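The four spellings above can be checked in one go. This is a small sketch using `yaml.safe_load`; the `results` dict is only there to collect the key type for each input.

```python
import yaml

results = {}
for text in ("200: ok", "!!int 200: ok", "'200': ok", "!!str 200: ok"):
    data = yaml.safe_load(text)
    key = next(iter(data))
    # Record the Python type that the key was constructed as.
    results[text] = type(key).__name__
    print(repr(text), "->", results[text])
```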
In JSON, only strings are allowed as object keys, so the key is automatically treated as a string. What remains is to add a setting that matches this behavior. This is mostly a review of what we have covered so far: add the following.
def construct_json_compatible_map(loader, node):
    return {str(k): v for k, v in loader.construct_pairs(node)}

yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG  # 'tag:yaml.org,2002:map'
yaml.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_json_compatible_map)
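To confirm that the YAML round trip now matches the JSON one, here is a minimal sketch. It registers the constructor on a hypothetical `JSONCompatLoader` subclass rather than globally, which is a choice made for this example only.

```python
import json
import yaml

class JSONCompatLoader(yaml.SafeLoader):
    """Hypothetical subclass; keeps the registration local to this sketch."""

def construct_json_compatible_map(loader, node):
    return {str(k): v for k, v in loader.construct_pairs(node)}

JSONCompatLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    construct_json_compatible_map,
)

original = {200: "ok"}
via_yaml = yaml.load(yaml.dump(original), Loader=JSONCompatLoader)
via_json = json.loads(json.dumps(original))
print(via_yaml, via_json)
```

Note that str() is a simplification: for a boolean key, json.dumps writes "true" while str(True) gives "True", so the two round trips agree for numeric keys but not for every key type.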