--When writing a YAML file from Python, I needed a small trick to output a string containing line breaks, so I am leaving this as a memo. (Admittedly, I may simply not have known YAML well enough ...)
PyYAML
--To handle YAML in Python, I use PyYAML.
--Suppose there is string data containing line breaks, like this:
{
    'aa': 'bbbb\ncccc\ndddd',
    'bb': 'eeee'
}
--I want to output it in block style, with the line breaks preserved, like this:
aa: |
  bbbb
  cccc
  dddd
bb: eeee
--First, I tried yaml.dump as is. (*What I actually needed was to write a file, but the examples here print the result to keep things easy to follow.)
import yaml


def main():
    test_dict = {
        'aa': 'bbbb\ncccc\ndddd',
        'bb': 'eeee'
    }
    print(
        yaml.dump(test_dict,
                  allow_unicode=True,
                  encoding='utf-8',
                  default_flow_style=False).decode()
    )


if __name__ == '__main__':
    main()
--Something is off...
aa: 'bbbb

  cccc

  dddd'
bb: eeee
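--For reference, the quoted output still represents the same string; only the presentation differs. A minimal sketch to check this, assuming PyYAML is installed. It uses yaml.safe_dump and yaml.safe_load rather than the yaml.dump call above, which is my choice for the illustration.

```python
import yaml

data = {'aa': 'bbbb\ncccc\ndddd', 'bb': 'eeee'}

# safe_dump returns a str when neither a stream nor an encoding is given.
dumped = yaml.safe_dump(data, allow_unicode=True, default_flow_style=False)
print(dumped)

# Loading the quoted form gives back the original string, line breaks included.
restored = yaml.safe_load(dumped)
print(restored == data)
```

--So the default output is not wrong, it is just hard to read for multi-line text.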
--When I looked into it, I found an existing write-up covering a similar case, so I adapted it to handle this.
--Here is the modified code.
import yaml


def represent_str(dumper, instance):
    if "\n" in instance:
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance,
                                       style='|')
    else:
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance)


def main():
    test_dict = {
        'aa': 'bbbb\ncccc\ndddd',
        'bb': 'eeee'
    }
    yaml.add_representer(str, represent_str)
    print(
        yaml.dump(test_dict,
                  allow_unicode=True,
                  encoding='utf-8',
                  default_flow_style=False).decode()
    )


if __name__ == '__main__':
    main()
--The add_representer method registers a function that defines how str values are output.
--style='|' is specified only if the string contains a line break.
--Here is the result:
aa: |-
  bbbb
  cccc
  dddd
bb: eeee
--Looks good!
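--The result shows |- rather than the | I originally had in mind. As far as I can tell, this is because the string does not end with a line break, so PyYAML adds the "strip" indicator. A minimal sketch comparing the two cases; BlockStyleDumper, represent_multiline_str and dump_block are hypothetical names introduced only for this illustration, and the Dumper subclass is an assumption of mine so the representer is not registered globally.

```python
import yaml


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass that keeps the representer local."""


def represent_multiline_str(dumper, value):
    # Request literal block style only for strings that contain a line break.
    style = '|' if '\n' in value else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', value, style=style)


BlockStyleDumper.add_representer(str, represent_multiline_str)


def dump_block(data):
    return yaml.dump(data, Dumper=BlockStyleDumper,
                     allow_unicode=True, default_flow_style=False)


# No final line break in the string: the indicator becomes "|-".
without_final_break = dump_block({'aa': 'bbbb\ncccc\ndddd'})
# The string ends with a line break: the indicator becomes a plain "|".
with_final_break = dump_block({'aa': 'bbbb\ncccc\ndddd\n'})

print(without_final_break)
print(with_final_break)
```

--Either form loads back to the original string, so which one appears depends only on whether the data ends with a line break.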
--While using the implementation above, I ran into cases where the output was not what I expected.
--Investigating, I found that a space immediately before a line break in the string prevents block-style output.
--For example, with data like this:
test_dict = {
    'aa': 'bbbb\ncccc \ndddd',
    'bb': 'eeee'
}
--The lines are not broken...
aa: "bbbb\ncccc \ndddd"
bb: eeee
――I was curious, so I checked the PyYAML code.
--The analyze_scalar method determines the characteristics of the string. If a space is immediately followed by a line break, the variable space_break is set to True.
--In that case, all of the following flags are set to False:
if space_break or special_characters:
    allow_flow_plain = allow_block_plain = \
    allow_single_quoted = allow_block = False
--When the YAML is actually emitted, the choose_scalar_style method decides which style to use.
--Normally the string would be written in the style requested by the function registered with add_representer, as implemented earlier. However, because allow_block is False among the flags above, the requested style (here |) is rejected by the following check.
if self.event.style and self.event.style in '|>':
    if (not self.flow_level and not self.simple_key_context
            and self.analysis.allow_block):
        return self.event.style
--It seems that block style is not used when a space is followed by a line break. The comment in analyze_scalar, next to the code that sets the flags above, says:
Spaces followed by breaks, as well as special character
are only allowed for double quoted scalars.
--Looking further into analyze_scalar, there is another place where allow_block is set to False.
--It seems allow_block is set to False if the string ends with a space, so block style is not used in that case either. The comment there says:
We do not permit trailing spaces for block scalars.
--In summary, if the string contains a space followed by a line break, or if it ends with a space, it is not output in block style (where the string is broken at each line break).
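--The two conditions in the summary can be restated as a small check and compared against what PyYAML actually emits. This is only a sketch under assumptions: blocks_literal_style is a simplified predicate of my own that looks only at ' ' and '\n' (it ignores the special-character case), and BlockStyleDumper and emitted_as_block are hypothetical helpers for the illustration.

```python
import yaml


def blocks_literal_style(text):
    # Simplified restatement of the summary: a space right before a line
    # break, or a space at the very end of the string.
    return ' \n' in text or text.endswith(' ')


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass that keeps the representer local."""


def represent_multiline_str(dumper, value):
    style = '|' if '\n' in value else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', value, style=style)


BlockStyleDumper.add_representer(str, represent_multiline_str)


def emitted_as_block(text):
    # Dump a one-key mapping and check whether the value starts with "|".
    dumped = yaml.dump({'k': text}, Dumper=BlockStyleDumper,
                       default_flow_style=False)
    return dumped.startswith('k: |')


samples = [
    'bbbb\ncccc\ndddd',    # no problematic spaces
    'bbbb\ncccc \ndddd',   # space followed by a line break
    'bbbb\ncccc\ndddd ',   # space at the end of the string
    'bbbb\n cccc\ndddd',   # space after a line break
]

for sample in samples:
    print(repr(sample), blocks_literal_style(sample), emitted_as_block(sample))
```

--In this sketch, a space after a line break (the last sample) does not prevent block style; only the two cases from the summary do.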
--So in the end I implemented it like this.
--It simply strips the offending spaces inside the representer function.
import re

import yaml


def represent_str(dumper, instance):
    if "\n" in instance:
        # Remove spaces at the end of every line (before a line break or at
        # the end of the string), since PyYAML refuses block style otherwise.
        instance = re.sub(r' +$', '', instance, flags=re.MULTILINE)
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance,
                                       style='|')
    else:
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance)


def main():
    test_dict = {
        'aa': 'bbbb\ncccc \ndddd',
        'bb': 'eeee'
    }
    yaml.add_representer(str, represent_str)
    print(
        yaml.dump(test_dict,
                  allow_unicode=True,
                  encoding='utf-8',
                  default_flow_style=False).decode()
    )


if __name__ == '__main__':
    main()
--Even with a space before the line break, the output is now as expected. Note that the spaces at the ends of lines are removed from the data.
aa: |-
  bbbb
  cccc
  dddd
bb: eeee
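--Since what I actually needed was to write a file, here is a rough sketch of that step with a read-back check. It is not a definitive implementation: BlockStyleDumper and the file name out.yaml are hypothetical, the temporary directory is only there to keep the example self-contained, and the regular expression uses re.MULTILINE to drop spaces at the end of each line.

```python
import os
import re
import tempfile

import yaml


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass that keeps the representer local."""


def represent_str(dumper, value):
    if '\n' in value:
        # Drop spaces at the end of every line so block style is allowed.
        value = re.sub(r' +$', '', value, flags=re.MULTILINE)
        return dumper.represent_scalar('tag:yaml.org,2002:str', value,
                                       style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', value)


BlockStyleDumper.add_representer(str, represent_str)

original = {'aa': 'bbbb\ncccc \ndddd', 'bb': 'eeee'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.yaml')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        # Passing a stream makes yaml.dump write to it instead of returning text.
        yaml.dump(original, f, Dumper=BlockStyleDumper,
                  allow_unicode=True, default_flow_style=False)
    with open(path, encoding='utf-8') as f:
        loaded = yaml.safe_load(f)

# The loaded value no longer contains the space that was stripped.
print(repr(loaded['aa']))
print(loaded == original)
```

--The read-back differs from the original only by the stripped space, so this approach fits data where spaces at the ends of lines carry no meaning.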
--I haven't investigated this properly, but the behavior seems to be based on the YAML specification.
--I had never paid much attention to it, but the YAML specification seems to run deep...