In the previous article (https://qiita.com/IchiLab/items/fd99bcd92670607f8f9b), I summarized how to use the Object Detection API provided by TensorFlow. To teach the API that "this object is XX" from images and videos, you annotate the data, convert it to the TFRecord format, and use it as training and validation data, but there still seem to be few articles that look at what this TFRecord format actually contains.
In this article, I cover the topic from several angles, starting with a method that requires no programming and moving on to writing the Python code yourself. Of course, this is just a summary of what I found through trial and error, so please bear in mind that it may not cover everything.
- Procedure for creating a TFRecord file with Microsoft's VoTT
- What kind of file is a TFRecord?
- How to crop the annotated regions out of the images
- Creating a TFRecord for object detection in Python
Download VoTT from the following site: select the .exe file if your OS is Windows, or the .dmg file if it is Mac.

VoTT
First, create a new project with "New Project".
Set up the project.
- Display Name: any name you like.
- Security Token: the default is fine.
- Source Connection: the name of the connection pointing to the location of the images to read.
- Target Connection: the name of the connection pointing to the location where the project is saved. The .vott file, the .json files, and the TFRecords are saved in the directory specified here.

VoTT stores the location of each folder as a connection created with "Add Connection", and the "~ Connection" items above refer to those connections by name. So when using VoTT for the first time, start by creating a connection with "Add Connection" on the right side.
Set the directory.
Finally, select "Save Connection".
After setting "Source Connection" and "Target Connection" respectively, the next step is annotation work.
Annotation is done from the second item from the top in the left menu (below the house icon). (The photo shows my cats, Mimmy and Kitty.)
First, set up the tags. On the right side you will see TAGS with a + icon next to it; select it to create a new tag. This time I used my cats' names, "Mimmy" and "Kitty".
Then select the second, square icon from the left at the top, and drag to enclose the area you want to annotate.
The rectangle may still be gray right after you draw it. If you want a specific tag applied the moment you draw a rectangle, select the tag name on the right side and then click the icon that looks like a padlock to pin that tag; new annotations will then be tagged with it automatically. (On Mac you can get the same effect by clicking the tag name while holding the Command key; presumably the Ctrl key on Windows, but I have not confirmed this.)
You can also constrain the annotation to a square by holding the Shift key while drawing. It is worth playing around with these controls to get used to them.
To generate a TFRecord, you must configure the export settings in advance. Select the fourth icon from the top (the arrow) in the menu on the left.
After completing the settings, select "Save Export Settings" to save them.
After that, return to the annotation screen, save the project with the floppy-disk icon at the top right, and export in the format configured above (TFRecord this time) with the arrow icon next to it. Incidentally, the json files are created automatically on save even if you configure nothing.
The above is the procedure for creating a TFRecord file using VoTT.
Now for the main subject of this article.
What is TFRecord in the first place?
Here's an excerpt from the official TensorFlow tutorial:
The TFRecord format is a simple format for storing a sequence of binary records. Protocol buffers are a cross-platform, cross-language library for efficient serialization of structured data.
Reference: TFRecord and tf.Example usage
Reading this alone does not paint much of a picture. So let's look at the contents of the TFRecord we exported from VoTT earlier.
The contents of a TFRecord can be inspected with the following code.
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import IPython.display as display

# Specify the path of the TFRecord
filenames = 'VoTT/Cat/Cat-TFRecords-export/Mimmy_and_Kitty.tfrecord'
raw_dataset = tf.data.TFRecordDataset(filenames)

# Export the parsed content to another file
# (.txt also works; .json gets syntax highlighting in many editors,
#  which makes it easier to read, so I recommend it over .txt)
tfr_data = 'tfr.json'

for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)

    # Write to a file. Not required, since you can also read the output on the console.
    with open(tfr_data, 'w') as f:
        print(example, file=f)
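By the way, take(1) reads only the first record. VoTT exports one image per .tfrecord file, but TFRecord files produced by other tools often bundle many records; a small variation of the script, sketched below, walks through all of them:

# Iterate over every record in the file, not just the first one.
raw_dataset = tf.data.TFRecordDataset('VoTT/Cat/Cat-TFRecords-export/Mimmy_and_Kitty.tfrecord')
for i, raw_record in enumerate(raw_dataset):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    # Print only the file name feature to keep the output readable
    print(i, example.features.feature['image/filename'].bytes_list.value)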
Let's take a look at the tfr.json file exported by the first script.
features {
  feature {
    key: "image/encoded"
    value {
      bytes_list {
        value: "\377\330\377... (large amount of data, omitted) ..."
      }
    }
  }
  feature {
    key: "image/filename"
    value {
      bytes_list {
        value: "Mimmy_and_Kitty.jpg"
      }
    }
  }
  feature {
    key: "image/format"
    value {
      bytes_list {
        value: "jpg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 1440
      }
    }
  }
  feature {
    key: "image/key/sha256"
    value {
      bytes_list {
        value: "TqXFCKZWbnYkBUP4/rBv1Fd3e+OVScQBZDav2mXSMw4="
      }
    }
  }
  feature {
    key: "image/object/bbox/xmax"
    value {
      float_list {
        value: 0.48301976919174194
        value: 0.7260425686836243
      }
    }
  }
  feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 0.3009025752544403
        value: 0.5285395383834839
      }
    }
  }
  feature {
    key: "image/object/bbox/ymax"
    value {
      float_list {
        value: 0.6981713175773621
        value: 0.8886410593986511
      }
    }
  }
  feature {
    key: "image/object/bbox/ymin"
    value {
      float_list {
        value: 0.3555919826030731
        value: 0.5664308667182922
      }
    }
  }
  feature {
    key: "image/object/class/label"
    value {
      int64_list {
        value: 0
        value: 1
      }
    }
  }
  feature {
    key: "image/object/class/text"
    value {
      bytes_list {
        value: "Mimmy"
        value: "Kitty"
      }
    }
  }
  feature {
    key: "image/width"
    value {
      int64_list {
        value: 2560
      }
    }
  }
}
Other keys, such as difficult, truncated, view, and source_id, are also included; here I have kept only the entries I consider essential.
If you check the contents, you can see that it has the following structure.
- image/width, image/height: the image size.
- xmin, xmax, ymin, ymax: the annotated coordinate information, with one value per annotation.
- class/text, class/label: the tag information. You can think of text as the tag name and label as the number assigned to that tag name; in this case "Mimmy" is 0 and "Kitty" is 1.

At this point you should have a fairly good picture of how a TFRecord is structured.
Next is a method for programmatically cropping out the regions annotated with VoTT. Being able to do this helps when you later write your own program to create TFRecord files, where you may want to **composite the object you want to detect onto a background**, as described below. It is also useful for machine learning on image classification.
Also, here is something I only noticed after trying it: **the orientation of the image as shown in VoTT and the orientation of the image when actually cropping can differ**. In other words, the image displayed on screen while annotating in VoTT was, in some cases, rotated 180 degrees from the original image data used when cropping with the json information. As a result, a few of my cropped images showed a completely unintended area. I am also not sure whether such images end up annotated in the correct orientation inside the TFRecord, so it is probably safest to check once.
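I have not pinned down the cause, but a likely suspect is the EXIF orientation tag: image viewers (and VoTT) honor it, while some decoding paths return the raw, unrotated pixels. As a precaution, here is a sketch that bakes the orientation into the pixels with Pillow before cropping; the path is the example image from this article, and Pillow is assumed to be installed:

from PIL import Image, ImageOps

# Apply the EXIF orientation to the pixel data and overwrite the file,
# so that coordinates cropped later match what VoTT displayed.
path = 'VoTT/Cat/IMAGES/Mimmy_and_Kitty.jpg'
img = ImageOps.exif_transpose(Image.open(path))
img.save(path)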
The introduction has grown long, so let's get right to checking the json used for image cropping. As mentioned earlier, json files are also exported automatically when you finish the annotation work in VoTT and run the export.
The cat data given in the example was written out as follows.
{
  "asset": {
    "format": "jpg",
    "id": "1da8e6914e4ec2e2c2e82694f19d03d5",
    "name": "Mimmy_and_Kitty.jpg",
    "path": "【Folder name】/VoTT/Cat/IMAGES/Mimmy_and_Kitty.jpg",
    "size": {
      "width": 2560,
      "height": 1440
    },
    "state": 2,
    "type": 1
  },
  "regions": [
    {
      "id": "kFskTbQ6Z",
      "type": "RECTANGLE",
      "tags": [
        "Mimmy"
      ],
      "boundingBox": {
        "height": 493.3142744479496,
        "width": 466.2200532386868,
        "left": 770.3105590062112,
        "top": 512.0524447949527
      },
      "points": [
        {
          "x": 770.3105590062112,
          "y": 512.0524447949527
        },
        {
          "x": 1236.5306122448978,
          "y": 512.0524447949527
        },
        {
          "x": 1236.5306122448978,
          "y": 1005.3667192429023
        },
        {
          "x": 770.3105590062112,
          "y": 1005.3667192429023
        }
      ]
    }
  ],
  "version": "2.1.0"
}
(The region for the other tag, Kitty, is omitted because it would make this long.)
As you can see, it contains the file name and image size as well as the annotated coordinate information. It is entirely possible to crop the image using only this information.
Conveniently, the annotated coordinates are included in two forms: boundingBox and points. There are probably several ways to use them, but this time I read the boundingBox values and cropped with those.
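Incidentally, the two forms carry the same information: the boundingBox can be recovered from points by taking the minimum and maximum of the x and y values. A quick sketch, where region is one element of the regions array:

# Derive the boundingBox values from the points array of one region
xs = [p['x'] for p in region['points']]
ys = [p['y'] for p in region['points']]
bbox = {
    'left': min(xs),
    'top': min(ys),
    'width': max(xs) - min(xs),
    'height': max(ys) - min(ys),
}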
Below is the source code.
import json
import os
import fnmatch
import cv2 as cv

JSON_DIR = 'VoTT/Cat/'
IMG_DIR = 'VoTT/Cat/'
CUT_IMAGE = 'cut_images/'
CUT_IMAGE_NAME = 'cat'
IMAGE_FORMAT = '.jpg'


class Check():
    def filepath_checker(self, dir):
        if not os.path.exists(dir):
            print('No such directory > ' + dir)
            exit()

    def directory_init(self, dir):
        if not os.path.exists(dir):
            os.makedirs(dir, exist_ok=True)


def main():
    check = Check()

    # Check that the directory containing the json files exists
    check.filepath_checker(JSON_DIR)

    # Prepare the destination for the cropped images
    check.directory_init(CUT_IMAGE)

    # Parse the json and crop using the image and the annotation coordinates
    count = 0
    for jsonName in fnmatch.filter(os.listdir(JSON_DIR), '*.json'):
        # Open the json
        with open(JSON_DIR + jsonName) as f:
            result = json.load(f)

        # Get the image file name
        imgName = result['asset']['name']
        print('jsonName = {}, imgName = {}'.format(jsonName, imgName))
        img = cv.imread(IMG_DIR + imgName)
        if img is None:
            print('cv.imread Error')
            exit()

        # Loop once per annotated region
        for region in result['regions']:
            height = int(region['boundingBox']['height'])
            width = int(region['boundingBox']['width'])
            left = int(region['boundingBox']['left'])
            top = int(region['boundingBox']['top'])

            # Skip regions left behind by an accidental single click during annotation
            if height == 0 or width == 0:
                print('<height or width is 0> imgName = ', imgName)
                continue

            cutImage = img[top: top + height, left: left + width]

            # Uncomment to resize before writing out
            # cutImage = cv.resize(cutImage, (300, 300))

            # Write files with serial numbers such as "cut_images/cat0001.jpg"
            cv.imwrite(CUT_IMAGE + CUT_IMAGE_NAME + "{0:04d}".format(count + 1) + IMAGE_FORMAT, cutImage)
            print("{0:04d}".format(count + 1))
            count += 1


if __name__ == "__main__":
    main()
The source code contains the conditional branch if height == 0 or width == 0. A spot clicked by mistake during annotation in VoTT remains in the data as a region with no area, which caused an error because there was nothing to crop, so I added this check to guard against that human error. In my case I had to annotate many regions on a single image, which made such slips ever harder to notice, all the more so with a large amount of image data.
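If you would rather catch those stray clicks up front, a small scan over the exported json files, sketched below under the same directory layout as the cropping script, lists every region whose bounding box has no area:

import json
import os
import fnmatch

JSON_DIR = 'VoTT/Cat/'
for jsonName in fnmatch.filter(os.listdir(JSON_DIR), '*.json'):
    with open(os.path.join(JSON_DIR, jsonName)) as f:
        result = json.load(f)
    for region in result['regions']:
        box = region['boundingBox']
        # A degenerate box usually means an accidental single click
        if int(box['height']) == 0 or int(box['width']) == 0:
            print('degenerate region {} in {}'.format(region['id'], jsonName))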
It has taken a while to get here, but let's now write a program that creates a TFRecord. Having a rough picture of how the contents are organized, it is finally time to write the source code and generate one. What the code posted here does is: composite an object image onto a background image, compute the bounding-box coordinates of the composited object, and write the result out as a TFRecord file.
First, the background image, which I borrowed from a free stock photo site. And here is the object image to be composited onto it.
Below is the source code.
import tensorflow as tf
import cv2 as cv
import utils.dataset_util as dataset_util


def img_composition(bg, obj, left, top):
    """
    Composite an object image onto a background image
    ----------
    bg : numpy.ndarray ~ background image
    obj : numpy.ndarray ~ object image
    left : int ~ x coordinate to composite at (left edge)
    top : int ~ y coordinate to composite at (top edge)
    """
    bg_img = bg.copy()
    obj_img = obj.copy()
    bg_h, bg_w = bg_img.shape[:2]
    obj_h, obj_w = obj_img.shape[:2]

    # Blend the object into the region of interest using its alpha channel as a mask
    roi = bg_img[top:top + obj_h, left:left + obj_w]
    mask = obj_img[:, :, 3]
    ret, mask_inv = cv.threshold(cv.bitwise_not(mask), 200, 255, cv.THRESH_BINARY)
    img1_bg = cv.bitwise_and(roi, roi, mask=mask_inv)
    img2_obj = cv.bitwise_and(obj_img, obj_img, mask=mask)
    dst = cv.add(img1_bg, img2_obj)
    bg_img[top: obj_h + top, left: obj_w + left] = dst

    return bg_img


def set_feature(image_string, label, label_txt, xmins, xmaxs, ymins, ymaxs):
    """
    Set up the information to be written to the TFRecord.
    To use this function you need to bring in the "utils" library
    from the "object_detection" directory of the TensorFlow Object Detection API.
    ----------
    image_string : bytes ~ the pre-composited image data
    label : list ~ annotated tag numbers
    label_txt : list ~ annotated tag names
    xmins, xmaxs, ymins, ymaxs : list ~ annotated coordinates, as values from 0.0 to 1.0
    """
    image_shape = tf.io.decode_jpeg(image_string).shape

    feature = {
        'image/encoded': dataset_util.bytes_feature(image_string),
        'image/format': dataset_util.bytes_feature('jpg'.encode('utf8')),
        'image/height': dataset_util.int64_feature(image_shape[0]),
        'image/width': dataset_util.int64_feature(image_shape[1]),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        # If there is only one annotation per image, the commented-out
        # functions below also work (match the types).
        # 'image/object/class/label': dataset_util.int64_feature(label),
        # 'image/object/class/text': dataset_util.bytes_feature(LABEL.encode('utf8')),
        # To attach two or more tags to one image, use the functions whose names
        # contain "_list_". They work fine with a single tag too, so I recommend them.
        'image/object/class/label': dataset_util.int64_list_feature(label),
        'image/object/class/text': dataset_util.bytes_list_feature(label_txt),
    }

    return tf.train.Example(features=tf.train.Features(feature=feature))


def main():
    # File paths
    bg_image_path = './comp/bg.jpg'
    obj_img_path = './comp/Mimmy_image.png'
    comp_img_path = './comp/img_comp.jpg'
    tfr_filename = './mimmy.tfrecord'

    # For the TFRecord
    tag = {'Mimmy': 0, 'Kitty': 1, 'Mimelo': 2}
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    class_label_list = []
    class_text_list = []
    datas = {}

    # Label setting
    class_label = tag['Mimmy']

    # Load the background
    bg_img = cv.imread(bg_image_path, -1)
    bg_img = cv.cvtColor(bg_img, cv.COLOR_RGB2RGBA)
    bg_h, bg_w = bg_img.shape[:2]

    # Load the object and scale it to 250 px wide
    obj_img = cv.imread(obj_img_path, -1)
    obj_img = cv.cvtColor(obj_img, cv.COLOR_RGB2RGBA)
    scale = 250 / obj_img.shape[1]
    obj_img = cv.resize(obj_img, dsize=None, fx=scale, fy=scale)
    obj_h, obj_w = obj_img.shape[:2]

    # Composite the object onto the background
    x = int(bg_w * 0.45) - int(obj_w / 2)
    y = int(bg_h * 0.89) - int(obj_h / 2)
    comp_img = img_composition(bg_img, obj_img, x, y)

    # Write out the composited image
    cv.imwrite(comp_img_path, comp_img)

    # Append the normalized coordinate information for the TFRecord
    xmins.append(x / bg_w)
    xmaxs.append((x + obj_w) / bg_w)
    ymins.append(y / bg_h)
    ymaxs.append((y + obj_h) / bg_h)

    # Append the label information for the TFRecord
    class_label_list.append(class_label)
    class_text_list.append('Mimmy'.encode('utf8'))
    datas[comp_img_path] = class_label

    # Create the TFRecord
    with tf.io.TFRecordWriter(tfr_filename) as writer:
        for data in datas.keys():
            image_string = open(data, 'rb').read()
            tf_example = set_feature(image_string, class_label_list, class_text_list,
                                     xmins, xmaxs, ymins, ymaxs)
            writer.write(tf_example.SerializeToString())


if __name__ == "__main__":
    main()
Here is the image created after executing the program.
As noted in the comments in the source code, be aware that this requires a library bundled with the TensorFlow Object Detection API.
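For reference, those helpers are thin wrappers around tf.train.Feature. Below is a sketch of my own stand-ins for the five functions used in set_feature(), in case you prefer not to copy the library in; as far as I know they behave the same as the API's dataset_util versions:

import tensorflow as tf

# Minimal stand-ins for the dataset_util helpers used in set_feature()
def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def float_list_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_list_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))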
This time, for the sake of explanation, I introduced a single image for compositing and source code that generates a single TFRecord file. In practice, however, you will probably need to generate a large number of TFRecord files from far more images.
In my case, I mass-produced data by randomly choosing the images to composite from a specified folder and adding a little randomness to the composition coordinates, as sketched below.
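As a rough illustration of that idea only, here is a sketch of such a loop. It assumes the img_composition and set_feature functions from the listing above, hypothetical bgs/ and objs/ folders (backgrounds as .jpg, objects as .png with an alpha channel and smaller than the backgrounds), and a single fixed tag; adjust everything to your own data:

import glob
import random
import cv2 as cv
import tensorflow as tf

# Hypothetical folders of background photos and RGBA object cut-outs
bg_paths = glob.glob('./bgs/*.jpg')
obj_paths = glob.glob('./objs/*.png')

with tf.io.TFRecordWriter('./random.tfrecord') as writer:
    for i in range(100):  # number of synthetic samples to generate
        # Pick a random background and a random object
        bg_img = cv.cvtColor(cv.imread(random.choice(bg_paths)), cv.COLOR_BGR2BGRA)
        obj_img = cv.imread(random.choice(obj_paths), cv.IMREAD_UNCHANGED)  # keep alpha
        bg_h, bg_w = bg_img.shape[:2]
        obj_h, obj_w = obj_img.shape[:2]

        # Random composition position, kept fully inside the background
        x = random.randint(0, bg_w - obj_w)
        y = random.randint(0, bg_h - obj_h)
        comp_img = img_composition(bg_img, obj_img, x, y)

        # Drop the alpha channel before writing out as jpg
        comp_path = './comp/img_{0:04d}.jpg'.format(i)
        cv.imwrite(comp_path, cv.cvtColor(comp_img, cv.COLOR_BGRA2BGR))

        image_string = open(comp_path, 'rb').read()
        tf_example = set_feature(image_string, [0], ['Mimmy'.encode('utf8')],
                                 [x / bg_w], [(x + obj_w) / bg_w],
                                 [y / bg_h], [(y + obj_h) / bg_h])
        writer.write(tf_example.SerializeToString())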
If this article leads you to discover tips of your own for mass-producing training data, I would love to hear about them.
Finally, just in case, let's check the contents of the TFRecord file we just created, using the method introduced in the first half.
features {
  feature {
    key: "image/encoded"
    value {
      bytes_list {
        value: "\377\330\377... (large amount of data, omitted) ..."
      }
    }
  }
  feature {
    key: "image/format"
    value {
      bytes_list {
        value: "jpg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 1397
      }
    }
  }
  feature {
    key: "image/object/bbox/xmax"
    value {
      float_list {
        value: 0.5151041746139526
      }
    }
  }
  feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 0.38489583134651184
      }
    }
  }
  feature {
    key: "image/object/bbox/ymax"
    value {
      float_list {
        value: 0.9878310561180115
      }
    }
  }
  feature {
    key: "image/object/bbox/ymin"
    value {
      float_list {
        value: 0.7916964888572693
      }
    }
  }
  feature {
    key: "image/object/class/label"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "image/object/class/text"
    value {
      bytes_list {
        value: "Mimmy"
      }
    }
  }
  feature {
    key: "image/width"
    value {
      int64_list {
        value: 1920
      }
    }
  }
}
As mentioned above, this check only needs to be done once, so look over the contents properly; if the values are as intended, you are good to go!
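If you would rather automate that one-time check, a few assertions along these lines (reusing the parsing approach from the first half; the expected values are the ones from the composition step) will flag anything unintended:

import tensorflow as tf

raw_dataset = tf.data.TFRecordDataset('./mimmy.tfrecord')
for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    feature = example.features.feature
    # Compare against the values we expect from the composition step
    assert feature['image/object/class/text'].bytes_list.value == [b'Mimmy']
    assert feature['image/object/class/label'].int64_list.value == [0]
    assert 0.0 <= feature['image/object/bbox/xmin'].float_list.value[0] <= 1.0
print('TFRecord contents look as intended')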
It has turned into a long article, but I have summarized, as far as my understanding goes, the TFRecord format data used for object detection with TensorFlow. I hope this article helps broaden the ways you create training data. Thank you for reading to the end.