Aidemy 2020/10/
Hello, this is Yope! I studied liberal arts, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I would like to share what I learned there, so I am summarizing it on Qiita. I am very happy that so many people read my previous summary article. Thank you! This is my third post on data handling. I hope you find it useful.
Please note that this chapter is fairly difficult and I do not fully understand it myself.
This article summarizes what I learned at "Aidemy" in my own words. It may contain mistakes or misunderstandings, so please keep that in mind.
What to learn this time
・ About Protocol Buffers
・ About HDF5
・ About TFRecord
Protocol Buffers are used by Google to store data and exchange all kinds of structured information. (Quote: wikipedia Protocol Buffers "https://ja.wikipedia.org/wiki/Protocol_Buffers")
- To handle data with Protocol Buffers, you first define a Message Type.
- A Message Type is like a class and is written in the proto2 language.
・ First, let's see how to write one, using the source code of a Message Type that describes the structure of a family as a reference.
- Code ![Screenshot 2020-10-28 22.39.00.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/c16c44d0-1883-12cd-a3b4-81fd090d4650.png)
- __syntax = "proto2";__ declares that the file uses proto2. Be sure to end each statement with ";".
- __message Person {}__ defines a class-like type called "Person".
- Comments are written with __//__ for a single line and __/* */__ for multiple lines.
- In "required string name = 1;", __string name__ means that "name" is a string (str) type. Together, these two words are called a __field__. The __= 1__ part is called the __tag__; it is used to tell the fields apart when the data is written out. __required__ marks a field that must always be set.
- Similarly, "required int32 age = 2;" means that "age" is an int32 type and its tag is 2.
- __enum Relationship {}__ defines a new type, alongside built-in types such as string and int32. Here it is called the "Relationship type".
- In an enum, each value (MOTHER, etc.) must be given its own number. In this example the numbering starts from 0.
- "required Relationship relationship = 4;" follows the same pattern as "string name": "relationship" is of Relationship type and its tag is 4.
- __message Family {}__ defines a Family class in the same way as Person.
- __repeated__ in "repeated Person person = 1;" works like a list: here it holds any number of Person-type values.
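Since the .proto source above is only available as a screenshot, here is a small sketch of what the tag numbers actually do. It hand-encodes a Person and a Family using only the Python standard library, following the wire-format rules in the Protocol Buffers encoding documentation: each field is written as a key, `(tag << 3) | wire_type`, followed by its value. The field values, and the assumption that MOTHER is the enum value numbered 0, are hypothetical; field 3 is left out because the text above does not say what it is. In practice the generated code does this encoding for you.

```python
# Hand-written sketch of the Protocol Buffers wire format (standard library only).
# Field values below are hypothetical; the generated module does this for you.

VARINT = 0            # wire type for int32, enum, ...
LENGTH_DELIMITED = 2  # wire type for string, bytes, embedded messages

MOTHER = 0  # assumption: MOTHER is the enum value numbered 0


def encode_varint(value: int) -> bytes:
    """Base-128 varint encoding (non-negative integers only in this sketch)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_key(tag: int, wire_type: int) -> bytes:
    # The tag number ("= 1", "= 2", ...) is written in front of every value,
    # which is how a reader tells the fields apart.
    return encode_varint((tag << 3) | wire_type)


def encode_bytes_field(tag: int, payload: bytes) -> bytes:
    return field_key(tag, LENGTH_DELIMITED) + encode_varint(len(payload)) + payload


def encode_int_field(tag: int, value: int) -> bytes:
    return field_key(tag, VARINT) + encode_varint(value)


def encode_person(name: str, age: int, relationship: int) -> bytes:
    # name = 1, age = 2, relationship = 4 (tag 3 is not described, so it is skipped)
    return (
        encode_bytes_field(1, name.encode("utf-8"))
        + encode_int_field(2, age)
        + encode_int_field(4, relationship)
    )


def encode_family(people) -> bytes:
    # repeated Person person = 1: each Person is an embedded message,
    # and tag 1 simply appears once per element of the "list"
    return b"".join(encode_bytes_field(1, encode_person(*p)) for p in people)


print(encode_person("Taro", 40, MOTHER))
```

One thing the sketch makes visible: field names never appear in the bytes, only tag numbers do. Renaming a field keeps old data readable, but changing a tag number does not.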
- The file containing the code above is called "family.proto". To use it from Python, run the command __protoc --python_out=OUTPUT_DIR family.proto__, where OUTPUT_DIR is the directory in which to save the generated file. This generates a Python module named family_pb2.py.
- By importing the generated module in Python, you can use the types defined in the Message Type file (such as the Family type in family.proto). Let's use this to actually enter data in Python.
・ Code ![Screenshot 2020-10-28 22.40.57.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/9b0bb720-d9c4-5b63-b475-8c496b12dd63.png)
・ Result (partial) ![Screenshot 2020-10-28 22.41.42.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/b2c53d35-0695-7324-2d0f-fc31d5fe81b7.png)
- HDF5 is a data format used by Keras, for example when saving a trained model built with Keras.
- HDF5 __can hold a hierarchical structure in a single file__. In other words, even a hierarchy of groups nested like folders (directories), together with the data inside them, is stored as one HDF5 file.
- It is created with the h5py library, together with pandas.
- In the following, an HDF5 file is created using the population of prefecture A as an example.
- Code ![Screenshot 2020-10-28 22.43.22.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/aec2bdea-e1da-2327-6f2b-9684858f1cbb.png)
・ Open an HDF5 file: __h5py.File("filename")__ (pass a mode such as "w" to create the file for writing)
- Create a group (directory): __file.create_group("group name")__
- Write the data out with __flush()__ and close the file with __close()__.
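A minimal sketch of the steps above, assuming h5py 3.x and NumPy are installed. The group name "prefecture_A", the city names, and the population numbers are all made up for illustration, and pandas is left out. Each dataset ends up at a path such as /prefecture_A/city_1 inside the single file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical, made-up numbers: population of two cities in "prefecture A" over three years
populations = {
    "city_1": [1200, 1150, 1100],
    "city_2": [800, 820, 830],
}


def write_population(path):
    f = h5py.File(path, "w")               # open (create) the HDF5 file for writing
    pref = f.create_group("prefecture_A")  # a group works like a directory
    for city, values in populations.items():
        # each dataset is stored inside the group, e.g. /prefecture_A/city_1
        pref.create_dataset(city, data=np.array(values, dtype=np.int64))
    f.flush()  # write buffered data to disk
    f.close()  # close the file


def read_population(path):
    with h5py.File(path, "r") as f:
        group = f["prefecture_A"]
        return {name: group[name][()].tolist() for name in group}


path = os.path.join(tempfile.mkdtemp(), "population.hdf5")  # hypothetical file name
write_population(path)
print(read_population(path))
```

Opening the file with `with h5py.File(...) as f:` (as in the reading function) closes it automatically, which is the usual way to avoid forgetting `close()`.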
TFRecord is a simple record-oriented binary format that lets you process large amounts of data that do not fit in memory. (Quote: TDI, "How to create and read TensorFlow's recommended format 'TFRecord'" [https://www.tdi.co.jp/miso/tensorflow-tfrecord-01#:~:text=TFRecord%E3%81%AF%E3%80%81%E3%80%8C%E3%83%A1%E3%83%A2%E3%83%AA%E3%81%AB%E5%8F%8E%E3%81%BE%E3%82%89,%E3%81%AE%E3%83%95%E3%82%A9%E3%83%BC%E3%83%9E%E3%83%83%E3%83%88%E3%80%8D%E3%81%A8%E3%81%84%E3%81%86%E3%81%93%E3%81%A8%E3%81%A7%E3%81%99%E3%80%82])
-TFRecord is a data format used in TensorFlow, which enables processing of large amounts of data as described above.
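To make "record-oriented" a bit more concrete, here is a standard-library sketch of how a TFRecord file is laid out according to TensorFlow's description of the format: every record is stored as a little-endian uint64 length, a masked CRC-32C of that length, the data bytes, and a masked CRC-32C of the data. Because each record carries its own length, a reader can pull records out one at a time instead of loading the whole file into memory. This is for illustration only; I have not checked the output against TensorFlow itself, and in practice you would use TensorFlow's own writer and reader.

```python
import io
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78
_CRC32C_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _CRC32C_TABLE.append(_crc)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_records(fp, records):
    for data in records:
        header = struct.pack("<Q", len(data))            # uint64 length
        fp.write(header)
        fp.write(struct.pack("<I", masked_crc(header)))  # uint32 CRC of the length
        fp.write(data)                                   # the record itself
        fp.write(struct.pack("<I", masked_crc(data)))    # uint32 CRC of the data


def read_records(fp):
    """Yield one record at a time, so the whole file never has to be in memory."""
    while True:
        header = fp.read(8)
        if not header:
            return  # end of file
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", fp.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupted length field")
        data = fp.read(length)
        (data_crc,) = struct.unpack("<I", fp.read(4))
        if data_crc != masked_crc(data):
            raise ValueError("corrupted record")
        yield data


buf = io.BytesIO()  # an in-memory stand-in for a .tfrecord file
write_records(buf, [b"first record", b"second"])
buf.seek(0)
for record in read_records(buf):
    print(record)
```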
- The flow is: "read an image", "define what to write out", and "write it".
・ Let's actually do this (the file path is fictitious).
- Code ![Screenshot 2020-10-28 22.45.18.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/259f8e5c-c321-d243-1ae8-5ba83afb733a.png)
- For the "definition of the data to write out", several instances are created and nested inside one another: __tf.train.Example()__, __tf.train.Features()__, __tf.train.Feature()__, and __tf.train.BytesList()__. Each has its own role.
- __tf.train.BytesList(value=[data])__ creates an instance holding the data inside __[]__. This data must be of bytes type, so convert it with __tobytes()__.
- __'key': tf.train.Feature(bytes_list=...)__ wraps the BytesList instance in a __Feature instance__, registered under a __key__.
- __tf.train.Features()__ collects multiple Feature instances in a __dictionary__.
- __tf.train.Example()__ creates an Example instance from the Features instance. This is what gets written to the file.
- For the "write" part, __tf.python_io.TFRecordWriter('filename')__ is the TFRecord version of __open('filename', 'w')__ (in TensorFlow 2.x it is __tf.io.TFRecordWriter__).
- Finally, writing the serialized Example with __fp.write(my_Example.SerializePartialToString())__ completes the process.
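A minimal sketch of the read / define / write flow, assuming TensorFlow 2.x and NumPy. A tiny 2x2 RGB array stands in for the image (the original screenshot reads an image file whose path is fictitious), and the keys "image" and "shape" and the file name are hypothetical. It also reads the record back with `tf.data.TFRecordDataset` to check the round trip.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for "reading an image": a tiny 2x2 RGB array of uint8 pixels
image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# Define what to write out: Example > Features > Feature > BytesList / Int64List
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image.tobytes()])  # must be bytes
            ),
            "shape": tf.train.Feature(
                int64_list=tf.train.Int64List(value=list(image.shape))
            ),
        }
    )
)

# Write: tf.io.TFRecordWriter is the TF 2.x name (tf.python_io.TFRecordWriter in TF 1.x)
path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")  # hypothetical file name
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read back to check: each record is one serialized Example
records = [tf.train.Example.FromString(r.numpy()) for r in tf.data.TFRecordDataset(path)]
feature = records[0].features.feature
shape = tuple(feature["shape"].int64_list.value)
restored = np.frombuffer(feature["image"].bytes_list.value[0], dtype=np.uint8).reshape(shape)
print(restored.shape)
```

Storing the shape next to the raw bytes is a design choice of this sketch: `tobytes()` flattens the array, so without the shape the reader could not rebuild the image.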
- Lists come in two kinds: __variable length__, whose length can change, and __fixed length__, which can only hold a fixed amount of data.
- Python lists are normally variable length, but __tf.train.Example()__ from the previous section is treated as fixed length.
- To create variable-length data, use __tf.train.SequenceExample()__.
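A minimal sketch of `tf.train.SequenceExample`, assuming TensorFlow 2.x. The data (two "sentences" given as word IDs of different lengths) and the keys "length" and "word_ids" are hypothetical. Values that are fixed for the whole record go in `context`, and the variable-length sequence goes in `feature_lists`, with one `Feature` per step.

```python
import tensorflow as tf

# Hypothetical variable-length data: two "sentences" given as word IDs
sentences = [[3, 14, 15, 9], [2, 6]]


def to_sequence_example(word_ids):
    return tf.train.SequenceExample(
        # context: values that are fixed for the whole record
        context=tf.train.Features(feature={
            "length": tf.train.Feature(int64_list=tf.train.Int64List(value=[len(word_ids)])),
        }),
        # feature_lists: one Feature per step, so the number of steps may differ per record
        feature_lists=tf.train.FeatureLists(feature_list={
            "word_ids": tf.train.FeatureList(feature=[
                tf.train.Feature(int64_list=tf.train.Int64List(value=[w])) for w in word_ids
            ]),
        }),
    )


serialized = [to_sequence_example(s).SerializeToString() for s in sentences]
# Parse the bytes back and count the steps in each record
restored = [tf.train.SequenceExample.FromString(b) for b in serialized]
step_counts = [len(r.feature_lists.feature_list["word_ids"].feature) for r in restored]
print(step_counts)
```

The serialized bytes can be written with the same `TFRecordWriter` as a regular Example.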
This time is over. Thank you for reading to the end.