In this tutorial, we will explain the specifications and behavior of image recognition by Pepper through samples.
**For image recognition there is no way to test on a virtual robot; an actual Pepper is required.** Please try it out with a real Pepper at Aldebaran Atelier Akihabara. (Reservation URL: http://pepper.doorkeeper.jp/events)
The specifications related to Pepper's image processing are as follows.
Pepper uses these cameras to recognize people and objects.
You can check the information of the image taken by Pepper's camera in Choregraphe.
Use the video monitor panel for operations related to Pepper's images. The Video Monitor panel is usually located in the same area as the Pose Library and can be shown by selecting the Video Monitor tab. If you cannot find it, select [Video Monitor] from the [View] menu.
The video monitor panel provides the ability to view images from Pepper's camera, as well as manage the visual recognition database described below.
You can also use the Monitor application that is installed with Choregraphe. Start the Monitor application as follows.
Launch the **Monitor** application that is installed with Choregraphe.
Click **Camera** in the Monitor application's launch menu.
A dialog will open asking you to connect to Pepper, so select the Pepper you are using.
The Monitor window will open. Click the **Play button**.
You can check the image captured by Pepper's camera. You can stop playback with the **Pause button**.
In addition to images, you can check information related to image recognition. By checking **face detection [A]**, you can see the status of **face recognition in Pepper [B]**.
If you want to check the output of the 3D camera, select **[3d sensor monitor]** from the **[Load Plugin]** menu.
You can check the depth map by clicking the **Play button**, just as with the 2D camera. You can stop playback with the **Pause button**.
With the Monitor application, you can examine in detail the images that Pepper recognizes.
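If you prefer to grab camera images from a script rather than the Monitor application, the same cameras are exposed through the ALVideoDevice API of the NAOqi Python SDK. The following is only a minimal sketch; the IP address, the subscriber name "python_cam", and the parameter values are assumptions you should adjust to your environment.

```python
# -*- coding: utf-8 -*-
# Minimal sketch: grab one frame from Pepper's top 2D camera via ALVideoDevice.
# PEPPER_IP and the subscriber name "python_cam" are placeholders.
from naoqi import ALProxy

PEPPER_IP = "192.168.1.10"   # replace with your Pepper's IP address
PORT = 9559

video = ALProxy("ALVideoDevice", PEPPER_IP, PORT)

# camera 0 = top 2D camera, resolution 2 = 640x480 (kVGA),
# color space 11 = RGB (kRGBColorSpace), 5 frames per second
handle = video.subscribeCamera("python_cam", 0, 2, 11, 5)
try:
    image = video.getImageRemote(handle)
    width, height, raw = image[0], image[1], image[6]
    print("got a %dx%d frame, %d bytes" % (width, height, len(raw)))
finally:
    video.unsubscribe(handle)
```

The depth (3D) camera can be read in the same way by subscribing with the depth camera index and a depth color space; the exact constants depend on your NAOqi version.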
By using the Face Detection box provided as a standard box, you can get the number of faces Pepper currently recognizes. Here, we will combine it with the Say Text box we have used several times to make **Pepper say the number of faces it recognizes**.
By connecting numberOfFaces (orange, type: number) of the Face Detection box to onStart (blue, type: string) of the Say Text box, Pepper will speak the numberOfFaces value output by the Face Detection box.
The application is now complete. When a face is recognized, the Face Detection box outputs numberOfFaces, and Pepper speaks in response to this output.
Connect to Pepper and try playing. When you show your face to Pepper, it says the number of faces in sight, such as "Ichi" (one) or "Ni" (two).
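For reference, roughly what the Face Detection box does can also be written directly against the NAOqi APIs: subscribe to ALFaceDetection and read the FaceDetected key from ALMemory. A minimal sketch, assuming the NAOqi Python SDK and a reachable Pepper; the IP address and subscriber name are placeholders.

```python
# -*- coding: utf-8 -*-
# Sketch: count the faces Pepper currently detects and have it say the number.
from naoqi import ALProxy
import time

PEPPER_IP = "192.168.1.10"   # replace with your Pepper's IP address
PORT = 9559

faces = ALProxy("ALFaceDetection", PEPPER_IP, PORT)
memory = ALProxy("ALMemory", PEPPER_IP, PORT)
tts = ALProxy("ALTextToSpeech", PEPPER_IP, PORT)

faces.subscribe("python_faces")          # start the face detection extractor
try:
    time.sleep(1.0)                      # give the extractor time to produce data
    try:
        data = memory.getData("FaceDetected")
    except RuntimeError:                 # key may not exist yet
        data = []
    # FaceDetected is empty when no face is seen; otherwise data[1] holds one
    # entry per face plus one trailing recognition-info entry
    count = max(len(data[1]) - 1, 0) if data else 0
    tts.say(str(count))
finally:
    faces.unsubscribe("python_faces")
```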
In Robot View, you can also see the position of the face recognized by Pepper.
For example, as shown above, when Pepper recognizes a face, a face mark will appear in the robot view. This shows the position of the face that Pepper is aware of.
This sample only says numbers such as "ichi" and "ni". That is enough to see how the Face Detection box works, but as an application it is hard to tell what Pepper is doing.
Here, as an example, let's change it so that when the number of detected faces is 1, Pepper says **"There is one person in front of me"**.
The Say Text box is a Python box that uses the speech API, the **ALTextToSpeech API**. If you manipulate the string that the Say Text box passes to ALTextToSpeech, you can change what Pepper says.
This time, we will manipulate the string inside the Say Text box. Double-clicking the Say Text box opens its Python code; look for the `onInput_onStart(self, p)` function in this code. You will find lines like the following:
    sentence = "\RSPD=" + str( self.getParameter("Speed (%)") ) + "\ "
    sentence += "\VCT=" + str( self.getParameter("Voice shaping (%)") ) + "\ "
    sentence += str(p)
    sentence += "\RST\ "
    id = self.tts.post.say(str(sentence))
    self.ids.append(id)
    self.tts.wait(id, 0)
`p` contains the value passed to the Say Text box, and the line `sentence += str(p)` appends it to the string given to the ALTextToSpeech API.
Therefore, if you change this line to something like `sentence += "There is " + str(p) + " person in front of me"`, Pepper will say "There is 1 person in front of me" instead of just "one".
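For context, after this change the relevant part of the Say Text box's `onInput_onStart` would look roughly like the sketch below. The English sentence is only an example (plural handling is ignored for simplicity); the rest is the box code shown above.

```python
def onInput_onStart(self, p):
    sentence = "\RSPD=" + str( self.getParameter("Speed (%)") ) + "\ "
    sentence += "\VCT=" + str( self.getParameter("Voice shaping (%)") ) + "\ "
    # build a full sentence around the face count instead of the bare number
    sentence += "There is " + str(p) + " person in front of me"
    sentence += "\RST\ "
    id = self.tts.post.say(str(sentence))
    self.ids.append(id)
    self.tts.wait(id, 0)
```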
Just as with sound, Pepper can track the direction of a face. In the sound example we only moved the angle of the neck, but here we will use the Face Tracker box to make Pepper **move in the direction of the face**.
You can track a face just by starting the Face Tracker box.
The application is now complete. The big feature of the Face Tracker box is that it "identifies a face and moves in that direction", so the flow can be as simple as this.
Connect to Pepper and try playing. When a person is nearby, Pepper tracks the recognized face by turning its neck; as the person gradually moves away while still facing Pepper, Pepper moves toward the face. If you are connected to Pepper by cable, be careful that it does not move in an unexpected direction.
Until you get used to it, being chased while Pepper stares at you is a little scary, but eventually you may come to find that gaze cute...!
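For reference, the behavior of the Face Tracker box can be approximated directly with the ALTracker API. A minimal sketch, assuming the NAOqi Python SDK; the IP address and the 30-second duration are placeholders.

```python
# -*- coding: utf-8 -*-
# Sketch: track a face with ALTracker, letting Pepper move toward it.
from naoqi import ALProxy
import time

PEPPER_IP = "192.168.1.10"   # replace with your Pepper's IP address
PORT = 9559

tracker = ALProxy("ALTracker", PEPPER_IP, PORT)
motion = ALProxy("ALMotion", PEPPER_IP, PORT)

motion.wakeUp()                        # stiffness on so Pepper can actually move
tracker.registerTarget("Face", 0.15)   # target type and expected face width (m)
tracker.setMode("Move")                # "Head" would only turn the head
tracker.track("Face")                  # start tracking

try:
    time.sleep(30)                     # track for about 30 seconds
finally:
    tracker.stopTracker()
    tracker.unregisterAllTargets()
```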
In the previous examples, we simply counted and chased "faces". Here, let's look at face learning, which lets Pepper remember whose face it is seeing.
You can make Pepper learn a face by using the Learn Face box. Here, we will make Pepper **memorize the face it sees 5 seconds after playback starts under the name "Taro"**.
Connect the boxes
Set the character string
Now Learn Face will learn the face that Pepper sees as "Taro".
After connecting to Pepper and playing, make sure your face is within the range of Pepper's camera. Five seconds after playback starts, Pepper's eyes will turn **green** if the face was learned successfully, and **red** if learning failed.
You can delete the learned face data by executing the **Unlearn All Faces box**.
#### [Reference] Contents of the Learn Face box
You can see how the "wait for 5 seconds" and "eyes turn green" behavior of the Learn Face box is realized by double-clicking it. The Learn Face box is a flow diagram box, represented as a collection of simpler boxes such as Wait. Looking inside a box in this way is a useful reference when thinking about how to use it.
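Face learning is also available directly through the ALFaceDetection API, so the same experiment can be sketched in a few lines of Python. The IP address and the 5-second wait below simply mirror the box settings above and are assumptions.

```python
# -*- coding: utf-8 -*-
# Sketch: learn the face currently in front of Pepper under the name "Taro".
from naoqi import ALProxy
import time

PEPPER_IP = "192.168.1.10"   # replace with your Pepper's IP address
PORT = 9559

faces = ALProxy("ALFaceDetection", PEPPER_IP, PORT)

time.sleep(5)                          # give yourself 5 seconds to face the camera
if faces.learnFace("Taro"):            # returns True when learning succeeded
    print("Learned this face as Taro")
else:
    print("Learning failed; check lighting and framing and try again")

# faces.clearDatabase() would erase all learned faces, like Unlearn All Faces
```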
After learning a face, let's use the learned data to **determine whose face Pepper is currently recognizing and have it speak that person's name**.
Connect to Pepper and play. If Pepper says the learned name, such as "Taro", when you show it your face, it has worked. Try teaching it multiple faces with Learn Face to see whether Pepper can tell them apart properly.
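Under the hood, the recognized name is delivered through the same FaceDetected key in ALMemory. The following is a minimal sketch of reading it and speaking the label, assuming the NAOqi Python SDK; the index into the event value follows the FaceDetected layout documented by Aldebaran and may need adjusting for your NAOqi version.

```python
# -*- coding: utf-8 -*-
# Sketch: read the recognized face label from ALMemory and have Pepper say it.
from naoqi import ALProxy
import time

PEPPER_IP = "192.168.1.10"   # replace with your Pepper's IP address
PORT = 9559

faces = ALProxy("ALFaceDetection", PEPPER_IP, PORT)
memory = ALProxy("ALMemory", PEPPER_IP, PORT)
tts = ALProxy("ALTextToSpeech", PEPPER_IP, PORT)

faces.subscribe("python_face_reco")
try:
    time.sleep(1.0)
    try:
        data = memory.getData("FaceDetected")
    except RuntimeError:               # key may not exist yet
        data = []
    if not data:
        tts.say("I do not see a face")
    else:
        try:
            # FaceInfo = [ShapeInfo, ExtraInfo]; ExtraInfo[2] is the learned label
            label = data[1][0][1][2]
        except IndexError:
            label = ""
        tts.say(label if label else "I see a face, but I do not know who it is")
finally:
    faces.unsubscribe("python_face_reco")
```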
Choregraphe has a function to operate the visual recognition database, which can be used to make Pepper learn something other than the human face.
To learn images, use Choregraphe's video monitor panel. Here, let's have Pepper learn the **NAO** robot in the atelier.
Connect to Pepper, and with the object shown on the video monitor, click the **Learn button**.
Left-click on the **boundary of the object**.
Create vertices by left-clicking, as if drawing straight lines.
Place the vertices so that they surround the object, and finally left-click on the starting point.
The area of the object will be identified and a dialog will open asking you to enter information. Enter the appropriate information.
Enter NAO here.
Pepper has now associated the string "NAO" with NAO's image features.
As with face recognition, we will make Pepper **say what it is looking at, based on the contents of the trained visual recognition database**.
The Vision Reco. box matches what Pepper is currently looking at against the visual recognition database to get the name of the object.
Connect the boxes (1): First, connect the onPictureLabel output (blue, string) of the Vision Reco. box to the Say Text box, just as in face recognition.
Connect the boxes (2): In this sample, the **onPictureLabel output** and **onStop input** of the Vision Reco. box are connected so that the Vision Reco. box stops while Pepper is talking after a recognition (the reason is explained below). To resume the Vision Reco. box after talking, connect the **onStopped output** of the Say Text box to the **onStart input** of the Vision Reco. box.
#### [Tips] Explicitly stopping the Vision Reco. box
While the Vision Reco. box recognizes an image, it outputs the result periodically even if the recognition situation does not change. Also, as we saw earlier in the request the Say Text box sends to the ALTextToSpeech API, a newly input string is not discarded even if the Say Text box is already running; it is queued and processed next.
Therefore, with only the connection made in (1), not only will Pepper keep saying "NAO", "NAO", "NAO" while NAO is shown, it will also keep doing so for a while even after NAO leaves Pepper's field of view. To prevent this, the Vision Reco. box is stopped temporarily after it outputs a recognition result, and restarted once the Say Text box has finished.
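The same stop-while-talking idea appears naturally if you sketch this against the NAOqi APIs directly: `ALTextToSpeech.say()` blocks until Pepper finishes speaking, and the stale result can be discarded after each utterance. A minimal sketch, assuming the NAOqi Python SDK; the IP address, subscriber name, polling loop, and the way the stale result is cleared are all placeholders.

```python
# -*- coding: utf-8 -*-
# Sketch: poll the visual recognition result and have Pepper say the label.
# say() blocks while Pepper talks, and the stale result is cleared afterwards,
# so the repeated "NAO, NAO, NAO" problem described above is avoided.
from naoqi import ALProxy
import time

PEPPER_IP = "192.168.1.10"   # replace with your Pepper's IP address
PORT = 9559

vision = ALProxy("ALVisionRecognition", PEPPER_IP, PORT)
memory = ALProxy("ALMemory", PEPPER_IP, PORT)
tts = ALProxy("ALTextToSpeech", PEPPER_IP, PORT)

vision.subscribe("python_vision_reco")
try:
    for _ in range(30):                            # poll for about 30 seconds
        try:
            data = memory.getData("PictureDetected")
        except RuntimeError:                       # key may not exist yet
            data = []
        if data:
            # each picture info starts with its label hierarchy, e.g. ["NAO"]
            label = data[1][0][0][0]
            tts.say(label)                         # blocks while Pepper talks
            memory.insertData("PictureDetected", [])   # drop the stale result
        time.sleep(1.0)
finally:
    vision.unsubscribe("python_vision_reco")
```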
Connect to Pepper and play the created application. If Pepper says "NAO" when you show it NAO, it has worked.
As you can see, Pepper has various image recognition functions. By controlling Pepper with information obtained through its eyes, you can greatly expand what it can do. Please give it a try.