This article is the day 7 entry of the WACUL Advent Calendar 2016. I've been working at WACUL since September of this year, mainly writing Go and occasionally Python on the backend.
By the way, this got long while I was writing about this and that, so if you just want to know what I did, read the [Introduction](http://qiita.com/podhmo/items/a12952b07648e42911b4#%E3%81%AF%E3%81%98%E3%82%81%E3%81%AB) and then skip to [Finally the main subject](http://qiita.com/podhmo/items/a12952b07648e42911b4#%E3%82%88%E3%81%86%E3%82%84%E3%81%8F%E6%9C%AC%E9%A1%8C).
Recently I write Go code every day at work. Occasionally I find myself wanting to write Go code in Python. You may think that is strange, but bear with me. It is the result of the following line of thought.
Of course, it is best if things can be settled at one of the earlier steps above.
The difference between *1.* and *2.* is subjective and may be hard to see.
As for *3.*, you are lucky if it applies. If not, it may be good to build it in Go.
That said, I think there are times when it is a curiosity-driven quest, or when you just want to get a small job done in a familiar language. It can be faster to put together something that behaves to your personal taste than to work hard on something general-purpose. I was familiar with Python, so I chose Python.
When you think of Go code generation, you usually think of the following two approaches.
- Generate code with text/template or similar, embedding strings passed in as arguments
- Generate code by walking the AST and reshaping it into the required form
This time the story is about something other than these.
prestring?
Actually, I have been using a library of my own for tasks such as code generation. It is called prestring. I don't think anyone knows it, so I'll explain it a little.
This library is not a transpiler that generates Go code from Python or from some DSL; it really is just a library for hand-writing code in a specific language Y (here, Go) from Python. It occupies the same position as using a template engine for code generation. The difference may be the feeling that you can use the full power of the host language.
As a feature, I put emphasis on making indentation management easier. The original idea itself is derived from a library called srcgen, which appeared in articles such as "D language for victory" on POSTD.
prestring provides a class called Module for each target language. Code is generated when an object of this class, after various operations have been applied to it, is output. By the way, as I recall, the name prestring was meant to suggest the state before becoming a string.
m = Module()
m.stmt("hai")
print(m) # => "hai\n"
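To give a feel for the idea of managing indentation with `with`, here is a minimal sketch. It is not prestring's actual implementation; `MiniModule` and its `block` method are names I made up for illustration, and it only uses the standard library.

```python
from contextlib import contextmanager


class MiniModule:
    """Hypothetical stand-in for a Module-like class (not prestring itself)."""

    def __init__(self, indent="\t"):
        self.lines = []
        self.level = 0
        self.indent = indent

    def stmt(self, text):
        # Record one line at the current indentation level.
        self.lines.append(self.indent * self.level + text)

    @contextmanager
    def block(self, header):
        # Open a brace block, indent everything written inside the with body,
        # then close the brace when the with body ends.
        self.stmt(header + " {")
        self.level += 1
        try:
            yield self
        finally:
            self.level -= 1
            self.stmt("}")

    def __str__(self):
        return "\n".join(self.lines) + "\n"


m = MiniModule()
with m.block("func main()"):
    m.stmt("fmt.Println(`hi`)")
print(m)
```

The point is only that the nesting of `with` statements in Python mirrors the nesting of blocks in the output, so the indentation never has to be tracked by hand.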
Let's write hello world as a simple example. It looks like this:
from prestring.go import Module

m = Module()
m.package("main")

with m.import_group() as im:
    im.import_("fmt")

with m.func("main"):
    m.stmt("fmt.Println(`hello world`)")

print(m)
As for usage, use a with statement wherever you want indentation. You write the code by calling methods whose names resemble the reserved words of the target language. Once you get used to it, you will be able to see the output language straight through the Python. Probably. Surely. Maybe.
In fact, the code above outputs Go code like the following. Even if the output is a little clunky, that is fine because gofmt will format it.
package main

import (
	"fmt"
)

func main() {
	fmt.Println(`hello world`)
}
Let's generate slightly more complicated code. I would like to write code that calculates the Cartesian product of lists. For example, the direct product of two lists xs and ys is as follows.
xs = [1, 2, 3]
ys = ["a", "b", "c"]
[(x, y) for x in xs for y in ys]
# => [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'), (3, 'a'), (3, 'b'), (3, 'c')]
Similarly, consider the cases of two lists, three lists, ..., N lists. Normally you would use recursion to write code that handles any number of lists. This time I would like to write code that outputs the code for each case.
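For reference, the usual recursive version that handles any number of lists at runtime might look like this in Python. The function name `cross` is just one I picked for this sketch; the standard library's `itertools.product` does the same job.

```python
def cross(*lists):
    """Cartesian product of any number of lists, written recursively."""
    if not lists:
        # The product of zero lists is a single empty tuple.
        return [()]
    head, rest = lists[0], lists[1:]
    # Prepend each element of the first list to every product of the rest.
    return [(x,) + tail for x in head for tail in cross(*rest)]


print(cross([1, 2], ["a", "b"]))
# => [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```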
First, let's write code that directly generates the Go code for the two-list case. It looks like the following.
from prestring.go import GoModule  # same as Module


# cross product
def cross2(m):
    with m.func("cross2", "vs0 []string", "vs1 []string", return_="[][]string"):
        m.stmt("var r [][]string")
        with m.for_("_, v0 := range vs0"):
            with m.for_("_, v1 := range vs1"):
                m.stmt("r = append(r, []string{v0, v1})")
        m.return_("r")
    return m


m = GoModule()
m.package("main")
print(cross2(m))
With the with statements in place, the code is reasonably readable as long as you keep the indentation in mind. The output is as follows.
func cross2(vs0 []string, vs1 []string) [][]string {
	var r [][]string
	for _, v0 := range vs0 {
		for _, v1 := range vs1 {
			r = append(r, []string{v0, v1})
		}
	}
	return r
}
Now consider the case where 3, 4, ... or any number of lists are passed. It is a bit annoying that the depth of loop nesting changes with the number of lists. You cannot write that directly, but if you write the generator recursively, you can generate loops nested to any depth while keeping the indentation structure.
def crossN(m, n):
    def rec(m, i, value):
        if i >= n:
            m.stmt("r = append(r, []string{{{value}}})".format(value=", ".join(value)))
        else:
            v = "v{}".format(i)
            vs = "vs{}".format(i)
            with m.for_("_, {} := range {}".format(v, vs)):
                value.append(v)
                rec(m, i + 1, value)

    args = ["vs{} []string".format(i) for i in range(n)]
    with m.func("cross{}".format(n), *args, return_="[][]string"):
        m.stmt("var r [][]string")
        rec(m, 0, [])
        m.return_("r")
    return m
It's a little hard to read. Keep in mind that the inner with is entered before the outer with exits, so the calls nest. The resulting structure is as follows, so please get used to it.
loop of the v0 stage
    loop of the v1 stage
        loop of the v2 stage
            ...
                loop of the vN stage
The output of crossN(m, 5), that is, N = 5, is as follows.
package main

func cross5(vs0 []string, vs1 []string, vs2 []string, vs3 []string, vs4 []string) [][]string {
	var r [][]string
	for _, v0 := range vs0 {
		for _, v1 := range vs1 {
			for _, v2 := range vs2 {
				for _, v3 := range vs3 {
					for _, v4 := range vs4 {
						r = append(r, []string{v0, v1, v2, v3, v4})
					}
				}
			}
		}
	}
	return r
}
There is one more distinctive feature, called submodule. It lets you place a marker at a specific position in the output and later embed whatever string you want at that position. It looks like the following.
m = Module()
m.stmt("begin foo")
sm = m.submodule()
m.stmt("end foo")
m.stmt("bar")
sm.sep()
sm.stmt("** yay! **")
sm.sep()
print(m)
A submodule called sm is created at the position enclosed by the foo lines of m. Line breaks and some text are embedded there afterwards. The output is as follows.
begin foo
** yay! **
end foo
bar
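One way such a "fill it in later" marker could work is to keep a nested buffer inside the list of output lines and only flatten everything when the text is rendered. The sketch below is my own simplification, not how prestring is actually implemented; `MiniBuffer` is a hypothetical name.

```python
class MiniBuffer:
    """Hypothetical buffer whose submodules are rendered in place, lazily."""

    def __init__(self):
        self.items = []  # strings or nested MiniBuffer objects

    def stmt(self, text):
        self.items.append(text)

    def submodule(self):
        # Reserve a position now; its content can be added at any later time.
        sub = MiniBuffer()
        self.items.append(sub)
        return sub

    def render(self):
        out = []
        for item in self.items:
            if isinstance(item, MiniBuffer):
                out.extend(item.render())
            else:
                out.append(item)
        return out

    def __str__(self):
        return "\n".join(self.render())


m = MiniBuffer()
m.stmt("begin foo")
sm = m.submodule()
m.stmt("end foo")
sm.stmt("** yay! **")  # written last, rendered between begin and end
print(m)
```

Because nothing is turned into a string until the end, the order in which statements are written and the order in which they appear can differ.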
This feature is used for the import section, so you can add the import for a package at the point where you find out you need it.
from prestring.go import Module

m = Module()
m.package('main')

with m.import_group() as im:
    pass

with m.func('main'):
    im.import_('log')
    m.stmt('log.Println("hmm")')

print(m)
Although "log" is imported only inside the main function, the output shows it inserted at the position specified by `import_group()`.
package main

import (
	"log"
)

func main() {
	log.Println("hmm")
}
That said, it is better to use goimports instead of gofmt and have the imports inserted automatically, so this may not see much use in Go.
Now that I can generate code, I sometimes feel like doing everything with code generation. In conclusion, I don't recommend that much. Whatever can be computed at runtime is best done at runtime (which is not to say you should reach for reflection).
For example, I am not so happy about things like the following fizzbuzz written by a *serious person*. Being able to do something does not make it a good idea; please judge good from bad according to your own laziness.
Personally, I think the following criteria are good.
- Before you start generating code, think about whether you really need it.
- Generating application code is not recommended. Generate glue code.
- Never edit generated code. The moment it is modified, generated code becomes a liability.
The last one is a slightly generalized version of the generation gap pattern (which I learned about recently).
In addition, trying to make up for missing language features will probably fail.
Taking generics as an example, it is possible to generate a monomorphic, TContainer-like definition corresponding to Container<T>. But you cannot define something like a function that takes Container<T> as an argument.
This is because T, which is treated as a type variable in real generics, disappears at generation time and does not propagate. Getting this to work properly takes a lot of grubby work, and in the end it does not feel worth the effort.
It has been long, but here is finally the main subject. We are about halfway.
JSON-to-GO?
Qiita had the following article.
- Coding pattern to avoid crying in JSON in Go language - Qiita
Reading it, I learned that there is a convenient service called JSON-to-Go. If you pass it the JSON of an API response, it outputs the corresponding Go type definition.
You can see what it looks like by clicking the GitHub API sample link.
By the way, the implementation of the conversion part used in this service seems to be on GitHub. It is written in JavaScript.
Using it as is would be fine, but it is good material, so I'll try porting it to Python. It is less a faithful port than an attempt to build something that produces almost the same output using the prestring library mentioned above.
The ported code will look like the link below.
It is about 120 lines, so it is not difficult code. The original js code is about 230 lines, somewhat larger than the Python code but still not big. The size differs because the original js implementation defines the conversion to Go names itself; without that part, the port comes to about 120 lines. (The name conversion used in the original json-to-go looked convenient, so the same logic was incorporated into prestring.)
Now to the actual porting. If you can do the following operations, you can convert JSON to Go code.
- Convert names to Go names
- Guess the Go type from the value of each JSON field
- Generate the Go type definition
For each of these, I'll look at the original json-to-go.js code and write down what I found interesting.
Here are some examples of the conversion to names suitable for Go:
>>> from prestring.go import goname
>>> goname("foo_id")
'FooID'
>>> goname("404notfund")
'Num404Notfund'
>>> goname("1time")
'OneTime'
>>> goname("2times")
'TwoTimes'
>>> goname("no1")
'No1'
The conversion removes underscores, adds a special prefix to names that start with a digit, and so on. The result is roughly CamelCase.
Specific acronyms such as URL, API, ASCII, ... also get special handling. It tries hard. The list of these special acronyms seems to have been taken from the golint code.
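As a rough illustration of this kind of conversion, here is a much simplified sketch. It is not the real `goname`: the acronym set is a small subset I chose as an assumption, the `Num` prefix is applied to every leading digit, and it does not reproduce all of the outputs shown above (for example `OneTime`).

```python
import re

# Assumption: a small subset of the acronyms that golint treats specially.
ACRONYMS = {"API", "ASCII", "HTTP", "ID", "JSON", "URI", "URL"}


def simple_goname(name):
    """Hypothetical, simplified conversion of a JSON key to a Go-style name."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", name) if p]
    words = []
    for p in parts:
        if p.upper() in ACRONYMS:
            words.append(p.upper())  # keep known acronyms fully upper-case
        else:
            words.append(p[0].upper() + p[1:])
    result = "".join(words)
    if result and result[0].isdigit():
        result = "Num" + result  # Go identifiers cannot start with a digit
    return result


print(simple_goname("foo_id"))     # => FooID
print(simple_goname("clone_url"))  # => CloneURL
```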
Basically, the Go type is chosen by one-to-one if branches on the value obtained by deserializing the JSON.
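A minimal sketch of such one-to-one branching in Python might look like this. The function name `guess_go_type`, the timestamp regular expression, and the choice to look only at the first element of a list are all my own assumptions for illustration, not the original implementation.

```python
import re

# Assumption: treat RFC 3339 style timestamps as time.Time.
_TIME_RX = re.compile(r"^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(\.\d+)?(Z|[+-]\d\d:\d\d)$")


def guess_go_type(value):
    """Guess a Go type name from a value produced by json.loads()."""
    if value is None:
        return "interface{}"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "time.Time" if _TIME_RX.match(value) else "string"
    if isinstance(value, list):
        # Simplification: only the first element decides the element type.
        return "[]" + (guess_go_type(value[0]) if value else "interface{}")
    if isinstance(value, dict):
        return "struct"
    raise TypeError("unsupported value: {!r}".format(value))


print(guess_go_type("2011-01-26T19:01:12Z"))  # => time.Time
```

The order of the checks matters in Python: `True` is also an `int`, so the `bool` branch has to come first.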
I found the following three points interesting.
This part can be written neatly as is. That said, I haven't read the original js code carefully. It's short, so I'll post it here. The code is generated by the following procedure.
1. Load the JSON
2. Generate struct info
3. Generate code
For step 1, the JSON load, just load the passed JSON. As a pre-process, it includes the ".0" to ".1" conversion mentioned above (probably unnecessary in Python, but just in case).
Step 2 is the generation of struct info. "struct info" sounds grand, but it is just a dictionary like the one below. The idea is to parse the entire JSON once and collect the following information.
{"freq": 1, "type": "int", "children": {}, "jsonname": "num"}
freq is the frequency of occurrence, type is the Go type, children holds the child elements if the value is a dictionary (struct), and jsonname is the original JSON name (the key of the dictionary is the Go name).
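To make the shape of this dictionary concrete, here is a small sketch that builds it from a loaded JSON value. `build_struct_info`, `guess_type`, and `to_name` are hypothetical helpers written for this example; lists are not handled, and the name conversion is far simpler than the real one.

```python
def guess_type(value):
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    return "interface{}"  # lists and null are not handled in this sketch


def to_name(key):
    # Simplified stand-in for the Go name conversion (no acronym handling).
    return "".join(part.capitalize() for part in key.split("_"))


def build_struct_info(value, jsonname="", info=None):
    """Create a struct info dict, or merge another sample into an existing one."""
    if info is None:
        info = {"freq": 0, "type": guess_type(value), "children": {}, "jsonname": jsonname}
    info["freq"] += 1
    if isinstance(value, dict):
        for key, sub in value.items():
            name = to_name(key)
            existing = info["children"].get(name)
            info["children"][name] = build_struct_info(sub, key, existing)
    return info


root = build_struct_info({"num": 1})
print(root["children"]["Num"])
# => {'freq': 1, 'type': 'int', 'children': {}, 'jsonname': 'num'}
```

Passing an existing info back in as `info` merges a second sample into it, so a field seen less often than its parent has a lower freq. That is presumably the kind of information an omitempty decision can be based on, though that is my reading rather than something confirmed here.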
Step 3 is the code generation part. This is the part that can be written neatly as is. Isn't it code you can read straight through?
def emit_code(sinfo, name, m):
    def _emit_code(sinfo, name, m, parent=None):
        if sinfo.get("type") == "struct":
            with m.block("{} struct".format(name)):
                for name, subinfo in sorted(sinfo["children"].items()):
                    _emit_code(subinfo, name, m, parent=sinfo)
        else:
            m.stmt('{} {}'.format(name, to_type_struct_info(sinfo)))

        # append tag
        if is_omitempty_struct_info(sinfo, parent):
            m.insert_after(' `json:"{},omitempty"`'.format(sinfo["jsonname"]))
        else:
            m.insert_after(' `json:"{}"`'.format(sinfo["jsonname"]))

    with m.type_(name, to_type_struct_info(sinfo)):
        for name, subinfo in sorted(sinfo["children"].items()):
            _emit_code(subinfo, name, m, parent=sinfo)
    return m
And so I got a JSON to Go struct converter.
The result of converting a [GitHub API response](https://github.com/podhmo/advent2016/blob/master/json/github.json) with the ported code is long, so I put it behind a separate link.
Now that I have a JSON-to-Go converter, let's modify it a little.
From here on, I'll keep checking the output using the JSON of the GitHub API response that JSON-to-Go also used.
You may want to assign strings of a specific form, such as a URI or an email address, to a different type. For example, let's switch to the Uri type of strfmt.
Use strfmt.Uri if the value contains "://". I also made it add the import when strfmt.Uri is used. The change is a few lines.
If you want to handle various types seriously, the mapping would go into the part that guesses the Go type.
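The rule described here can be sketched in a few lines. `guess_string_type` and the `imports` set are hypothetical names for this example; in the actual change the import would presumably be registered through the import group shown earlier.

```python
STRFMT_PKG = "github.com/go-openapi/strfmt"


def guess_string_type(value, imports):
    """Return the Go type for a JSON string and record any import it needs."""
    if "://" in value:
        imports.add(STRFMT_PKG)  # remember that the output needs this package
        return "strfmt.Uri"
    return "string"


imports = set()
print(guess_string_type("https://github.com/octocat/Hello-World.git", imports))
# => strfmt.Uri
print(sorted(imports))
# => ['github.com/go-openapi/strfmt']
```

A substring test like this is deliberately crude; an email address or other formats would each need their own check.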
The following part of the output
package autogen

type AutoGenerated struct {
	CloneURL string `json:"clone_url"`
	CreatedAt time.Time `json:"created_at"`
	...
became as follows.
package autogen

import (
	"github.com/go-openapi/strfmt"
	"time"
)

type AutoGenerated struct {
	CloneURL strfmt.Uri `json:"clone_url"`
	CreatedAt time.Time `json:"created_at"`
	...
From here on the wandering begins. Changes that I casually thought would be easy with a little effort began to cause various troublesome problems.
The [linked article](http://qiita.com/minagoro0522/items/dc524e38073ed8e3831b#%E8%A4%87%E9%9B%91%E3%81%AA%E6%A7%8B%E9%80%A0%E3%81%AE-json-%E5%87%A6%E7%90%86%E3%81%A7%E7%9B%B4%E9%9D%A2%E3%81%99%E3%82%8B%E5%95%8F%E9%A1%8C) was rather critical of it, but now that I have a code generator, I wanted to try a change that significantly alters the output. Let's change the output structs from a nested structure to a flat one. For the time being, I decided to use the field name as the name of each struct.
The following definition
type AutoGenerated struct {
	CloneURL strfmt.Uri `json:"clone_url"`
	...
	Name string `json:"name"`
	OpenIssuesCount int `json:"open_issues_count"`
	Organization struct {
		AvatarURL strfmt.Uri `json:"avatar_url"`
		EventsURL strfmt.Uri `json:"events_url"`
has changed as follows.
type AutoGenerated struct {
	CloneURL strfmt.Uri `json:"clone_url"`
	...
	Name string `json:"name"`
	OpenIssuesCount int `json:"open_issues_count"`
	Organization Organization `json:"organization"`
	...
type Organization struct {
	AvatarURL strfmt.Uri `json:"avatar_url"`
	EventsURL strfmt.Uri `json:"events_url"`
Looking at the output, the nesting structure that the [previous output](https://github.com/podhmo/advent2016/blob/master/dst/jsontogo/github2.go) had is indeed gone, and I feel the parent-child relationships of the values have become harder to see.
Since the parent-child relationships of values are hard to see in flat output, I decided to add a comment before the struct definitions so that the nesting can be seen at a glance.
The change is about 20 lines. The change to the original output function `emit_code()` itself is about 2 lines.
The comment added before the definitions is as follows. Now you can see the nesting.
/* structure
AutoGenerated
    Organization
    Owner
    Parent
        Owner
        Permissions
    Permissions
    Source
        Owner
        Permissions
*/
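A comment like this can be produced by walking the struct info and indenting by depth. The following is a sketch under the assumption that the struct info has the dictionary shape described earlier; `structure_lines` and `structure_comment` are hypothetical names, not the functions from the actual change.

```python
def structure_lines(sinfo, name, depth=0, indent="    "):
    """List the struct names, indented according to nesting depth."""
    lines = [indent * depth + name]
    for child_name, sub in sorted(sinfo["children"].items()):
        if sub["type"] == "struct":  # only structs appear in the outline
            lines.extend(structure_lines(sub, child_name, depth + 1, indent))
    return lines


def structure_comment(sinfo, name):
    return "\n".join(["/* structure"] + structure_lines(sinfo, name) + ["*/"])


leaf = {"type": "string", "children": {}}
perm = {"type": "struct", "children": {"Admin": {"type": "bool", "children": {}}}}
info = {
    "type": "struct",
    "children": {
        "Name": leaf,
        "Owner": {"type": "struct", "children": {"Login": leaf}},
        "Parent": {"type": "struct", "children": {"Permissions": perm}},
    },
}
print(structure_comment(info, "AutoGenerated"))
```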
By the way, as I noticed along the way, and as sharp readers will notice right away, there is a problem: names can conflict. In the original nested output, the structs did not need names because they were defined inline. Once the nested output became flat, each struct definition needed a name, and I was simply using the field name. With the following JSON, for example, the names conflict.
{
"title": "First diary",
"author": "foo",
"content": {
"abbrev": "I will start blogging from today....",
"body": "I will start blogging from today. This article is the X-day of the advent calendar.... ....."
},
"ctime": "2000-01-01T00:00:00Z",
"comments": [
{
"author": "anonymous",
"content": {
"body": "hmm"
},
"ctime": "2000-01-01T00:00:00Z"
}
]
}
The content of the article and the content of the comment collide.
/* structure
Article
    Comments
        Content
    Content
*/

// Content of the article part
type Content struct {
	Abbrev string `json:"abbrev"`
	Body string `json:"body"`
}

// Content of the comment part
type Content struct {
	Body string `json:"body"`
}
For the time being, it doesn't matter if the result is clunky; let's just avoid name conflicts. I use an object called prestring.NameStore.
It is a dictionary-like object: when you store a name as a value, it hands back a name that avoids collisions for duplicates (you can change the naming rule by overriding NameStore.new_name()).
>>> from prestring import NameStore
>>> ns = NameStore()
>>> ob1, ob2 = object(), object()
>>> ns[ob1] = "foo"
>>> ns[ob2] = "foo"
>>> ns[ob1]
'foo'
>>> ns[ob2]
'fooDup1'
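To show the idea behind this behavior, here is a minimal sketch of a store that renames duplicates. It is not the source of prestring.NameStore; `SimpleNameStore` is a hypothetical class, and the signature of its `new_name` method is my own assumption.

```python
class SimpleNameStore:
    """Hypothetical dict-like store that hands out collision-free names."""

    def __init__(self):
        self._names = {}   # key -> assigned name
        self._counts = {}  # requested name -> how many times it was requested

    def __setitem__(self, key, name):
        if key in self._names:
            return  # a key keeps the first name it was given
        n = self._counts.get(name, 0)
        self._counts[name] = n + 1
        self._names[key] = name if n == 0 else self.new_name(name, n)

    def __getitem__(self, key):
        return self._names[key]

    def new_name(self, name, i):
        # Override this to change the naming rule for duplicates.
        return "{}Dup{}".format(name, i)


ns = SimpleNameStore()
ob1, ob2 = object(), object()
ns[ob1] = "foo"
ns[ob2] = "foo"
print(ns[ob1], ns[ob2])  # => foo fooDup1
```

Note that this sketch does not guard against a generated name such as `fooDup1` colliding with a name that is requested directly later.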
The change is about 10 lines. A value that looks unique for the shape of each struct's dictionary is generated as text, and names are managed using it as the key.
/* structure
Article
    Comments
        ContentDup1
    Content
*/

type Content struct {
	Abbrev string `json:"abbrev"`
	Body string `json:"body"`
}

type ContentDup1 struct {
	Body string `json:"body"`
}
ContentDup1 is not a good name, but at least the collision is avoided. Splitting packages just so that structs with the same name can coexist seems silly. It would be convenient to have a feature for easily introducing a namespace, like a Ruby module. Go doesn't seem to have one, so I'll leave it.
Now that the name conflict has been resolved, let's look again at the output for the JSON of the GitHub API.
/* structure
AutoGenerated
    Organization
    Owner
    Parent
        Owner
        Permissions
    Permissions
    Source
        Owner
        Permissions
*/
I feel that Parent and Source have the same shape. Peeking inside, they appear to be the same type definition. In fact, the type definition of Permissions appears several times. The top-level AutoGenerated itself looks like a superset of Source and the like, but I'll leave that for now. I want to merge duplicate definitions into one. Let's do it.
The change is about 10 lines. The traversal that outputs the nesting comment added in the previous change (adding the parent-child relationships of values as a comment) and the traversal that outputs the struct definitions now differ, so I split them into separate functions.
The output shows that the duplicate definitions are gone. But one problem remains. It is fixed in the next step.
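One way to detect duplicate definitions is to compute a key from the shape of each struct and emit a definition only the first time a key is seen. The sketch below assumes the struct info dictionary shape described earlier; `shape_key` and `collect_unique_structs` are hypothetical names, not the functions in the actual code.

```python
def shape_key(sinfo):
    """Hashable key that is equal for structs with the same fields and types."""
    if sinfo["type"] != "struct":
        return sinfo["type"]
    return tuple(
        (name, shape_key(sub)) for name, sub in sorted(sinfo["children"].items())
    )


def collect_unique_structs(sinfo, name, seen=None):
    """Map each distinct struct shape to the first name it was seen under."""
    if seen is None:
        seen = {}
    if sinfo["type"] == "struct":
        seen.setdefault(shape_key(sinfo), name)
        for child_name, sub in sorted(sinfo["children"].items()):
            collect_unique_structs(sub, child_name, seen)
    return seen


def struct(**children):
    return {"type": "struct", "children": children}


flag = {"type": "bool", "children": {}}
root = struct(
    Parent=struct(Permissions=struct(Admin=flag)),
    Source=struct(Permissions=struct(Admin=flag)),
)
print(sorted(collect_unique_structs(root, "AutoGenerated").values()))
# => ['AutoGenerated', 'Parent', 'Permissions']
```

Because the children are visited in sorted order, the shape shared by Parent and Source ends up registered under the name Parent, simply because it comes first.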
One problem remains. You can see it in the Parent and Source parts. The structure is as follows, and Source and Parent have the same shape.
/* structure
AutoGenerated
    Parent
        Owner
        Permissions
    Source
        Owner
        Permissions
*/
Since Parent and Source have the same shape, the same type is used for both. The name of that type is the problem. If anything, Source is the better name, but a type named Parent was defined and used. Sad.
type AutoGenerated struct {
	...
	Parent Parent `json:"parent"`
	...
	Source Parent `json:"source"`
	...
}
In the original JSON-to-Go these are anonymous structs defined inline, so this never comes up. With a flat structure each struct needs a name, and there may not be enough information to choose that name properly.
I will resist a little. It's a bit tricky, but I'll defer the decision on which type name to generate. For such cases, use prestring.PreString.
from prestring import PreString
from prestring.go import GoModule
m = GoModule()
p = PreString("")
m.stmt("foo")
m.stmt(p)
m.stmt("bar")
p.body.append("*inner*")
print(m)
Of course, you can use the submodule introduced earlier. PreString is the most basic object, so if you just want to set a string later, you should use this.
foo
*inner*
bar
It's a little hard to follow, but the type name is decided at the very end. Specifically, it is decided as follows.
# Pass a table of words to be penalized as an argument
name_score_map={"parent": -1, '': -10}
# Determine the type name just before emitting. The new_name calculated here is used as the type name
new_name = max(candidates, key=lambda k: name_score_map.get(k.lower(), 0))
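As a self-contained illustration of this scoring, the snippet below wraps the same expression in a function and runs it on the two candidate names from this example. The function name `choose_type_name` and the `candidates` list are assumptions made for the sketch.

```python
def choose_type_name(candidates, name_score_map):
    """Pick the candidate with the highest score; unknown names score 0."""
    return max(candidates, key=lambda k: name_score_map.get(k.lower(), 0))


name_score_map = {"parent": -1, "": -10}
candidates = ["Parent", "Source"]  # field names that share one struct shape
print(choose_type_name(candidates, name_score_map))  # => Source
```

Since `max` returns the first of several equally scored items, candidates that are not in the table are chosen in the order they were collected.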
The output now uses Source instead of Parent. The implementation here is a matter of taste. For example, you might make it possible to specify the converted name directly. In that case, to narrow down the target, it may be convenient to pass something like the search path, including the top-level type, instead of just the name.
Anyway, I was able to give the type a better name in the output.
type AutoGenerated struct {
	...
	Parent Source `json:"parent"`
	...
	Source Source `json:"source"`
	...
}
There are quite a few tools in Go that generate code. This is just a bit of grumbling, but the code generated by mockgen in golang/mock doesn't put a comment at the beginning of each definition. golang/lint complains about this, which annoys me a lot.
And, as always with generated code, if you add comments to the code generated by golang/mock to satisfy the linter, they all disappear when the original interface definition changes and the code is regenerated. There is no way around it, so we have to do something special such as excluding the file from golint's targets. Tiresome.
Since the output of this code generator also lacks a comment at the beginning of each definition, let's add one.
Changed one line. Because I am using prestring.PreString, I need to use prestring.LazyFormat. Just one added line. At this point it is close to going through the motions.
In addition, let's include the original JSON value in a tag as an example of what value goes in.
The change is a few lines. I'm getting tired of this, so I'll finish here.
It is now output in the following format.
// AutoGenerated : auto generated JSON container
type AutoGenerated struct {
	CloneURL strfmt.Uri `json:"clone_url" example:"https://github.com/octocat/Hello-World.git"`
	CreatedAt time.Time `json:"created_at" example:"2011-01-26T19:01:12Z"`
	DefaultBranch string `json:"default_branch" example:"master"`
	Description string `json:"description" example:"This your first repo!"`
	...
The final code is here.
Only the first and last versions of the code and their outputs are listed below.
- Original ported code and its output (/advent2016/blob/master/dst/jsontogo/github.go)
- Final code and its [output](https://github.com/podhmo/advent2016/blob/master/dst/jsontogo/github9.go)
A small detour. After some wandering I settled on flat output, and in the process added the removal of duplicate definitions. That removal had an unexpected side effect. It is something of a bonus, but I'll introduce it.
For example, take a look at the following JSON.
{
"total": 100,
"left": {
"total": 75,
"left": {
"total": 25,
"left": {
"total": 20
},
"right": {
"total": 5
}
},
"right": {
"total": 50,
"left": {
"total": 25
},
"right": {
"total": 25
}
}
},
"right": {
"total": 25,
"left": {
"total": 20,
"left": {
"total": 10
},
"right": {
"total": 10
}
},
"right": {
"total": 5
}
}
}
This is a neat binary tree.
Output in the original nested format looks like this. Because a type that mirrors the passed JSON value exactly is generated, not only are there redundant struct definitions, but the JSON can no longer be parsed if its structure changes even slightly. Bad.
type Tree struct {
	Left struct {
		Left struct {
			Left struct {
				Total int `json:"total"`
			} `json:"left"`
			Right struct {
				Total int `json:"total"`
			} `json:"right"`
			Total int `json:"total"`
		} `json:"left"`
		Right struct {
			Left struct {
				Total int `json:"total"`
			} `json:"left"`
			Right struct {
				Total int `json:"total"`
			} `json:"right"`
			Total int `json:"total"`
		} `json:"right"`
		Total int `json:"total"`
	} `json:"left"`
	Right struct {
		Left struct {
			Left struct {
				Total int `json:"total"`
			} `json:"left"`
			Right struct {
				Total int `json:"total"`
			} `json:"right"`
			Total int `json:"total"`
		} `json:"left"`
		Right struct {
			Total int `json:"total"`
		} `json:"right"`
		Total int `json:"total"`
	} `json:"right"`
	Total int `json:"total"`
}
On the other hand, output in the flat format with duplicate definitions removed looks like this (the structure comment is omitted because it is noisy). It is just a recursive definition, which feels natural. Good. (However, as it stands the zero value cannot be determined and the type recurses infinitely, so it is an error; the fields have to be *Tree. Should every reference to a struct simply become a pointer, even if that is annoying? Or should I seriously look for cycles in the references to decide where the recursion ends?)
// Tree : auto generated JSON container
type Tree struct {
	Left Tree `json:"left"`
	Right Tree `json:"right"`
	Total int `json:"total" example:"100"`
}
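If one did want to look for cycles seriously, a sketch could go as follows: treat the definitions as a graph from type name to the types of its fields, and mark a field as needing a pointer when its type can reach back to the struct that contains it. This is only an idea for illustration, not something the code in this article does; `reachable` and `pointer_fields` are hypothetical names, and the input format (type name mapped to a field-to-type dictionary) is an assumption.

```python
def reachable(defs, start):
    """Set of struct types reachable from the fields of `start`."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for field_type in defs.get(current, {}).values():
            if field_type in defs and field_type not in seen:
                seen.add(field_type)
                stack.append(field_type)
    return seen


def pointer_fields(defs):
    """(type, field) pairs that must be pointers to break a reference cycle."""
    result = set()
    for type_name, fields in defs.items():
        for field_name, field_type in fields.items():
            if field_type in defs and type_name in reachable(defs, field_type):
                result.add((type_name, field_name))
    return result


defs = {"Tree": {"Left": "Tree", "Right": "Tree", "Total": "int"}}
print(sorted(pointer_fields(defs)))
# => [('Tree', 'Left'), ('Tree', 'Right')]
```

For the Tree example this marks Left and Right, which matches the observation that they have to be *Tree.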
It's been long, but that's it.
In this article I wrote Python code that generates Go code. Specifically, I implemented in Python a process similar to JSON-to-Go's conversion from JSON to Go struct definitions. After that, following the mood of the moment, I kept changing the code and its output while muttering my thoughts.
If I write down something I thought along the way, it is that reinventing the wheel (here, reimplementing) may not be so bad after all. First, you end up with an implementation you understand completely. You know where and what to change, it feels like your own, and you want to tweak it. From there a personal exploration begins: what if I changed this part like so? By making small changes and comparing the results with the original, you may come to see the decisions and ideas of the person who wrote the original implementation. In this case, for example, why nested output was chosen.
Also, this time the code was generated without considering Go's types. Next time, I'd like to write about code generation that takes Go's types into account.
By the way, for this purpose alone there is no need to reimplement anything in Python; the following tools, written in Go, output Go struct definitions from JSON.
Postscript:
Some people mentioned gojson in this year's Advent calendar.
- Easy implementation of API library with go-json (revised) - Qiita