I tried to verify the speaker identification by the Speaker Recognition API of Azure Cognitive Services with Python. # 2

Preface

In this article I describe how I actually tested speaker identification using the "Speaker Recognition API". (Please let me know if anything looks wrong!)

Process flow

The following three steps are required to identify the speaker.

  1. Create a user profile
  2. Register the voice in the user's profile
  3. Identify who is speaking based on the registered voices

So this time I wrote three scripts, one per step, to keep things easy to follow.

Step 1 Create a user profile

First, create a profile for the user you want to identify. The API function to use is "Create Profile" under "Identification Profile". It creates a profile for the user and returns the profile ID. (No name is stored with the profile, so you have to keep your own list of names and IDs.)

The verification script takes the user name as an argument and appends the user name and profile ID as a pair to the file "Profile_List.csv". (The file must already exist, even if empty, because the script reads it first to check for duplicates.)
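
Since the API does not store names, the name/ID list is the only link between a person and a profile. As a minimal sketch, assuming the same two-column layout of "Profile_List.csv", the list could be handled through two small helpers. The function names `load_profiles` and `add_profile` are made up for this example and are not part of the scripts below.

```python
import csv
import os

PROFILE_LIST = "Profile_List.csv"  # same file name the scripts in this article use


def load_profiles(path=PROFILE_LIST):
    """Return {user name: profile ID}; an empty dict if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as fp:
        # Skip blank or incomplete rows instead of failing on them.
        return {row[0]: row[1] for row in csv.reader(fp) if len(row) >= 2}


def add_profile(name, profile_id, path=PROFILE_LIST):
    """Append one name/ID pair, refusing a user name that is already listed."""
    if name in load_profiles(path):
        raise ValueError(f"{name} is already registered")
    with open(path, "a", newline="") as fp:
        csv.writer(fp, lineterminator="\n").writerow([name, profile_id])
```

Unlike the script below, this version treats a missing file as an empty list, so the first registration does not need a pre-created file.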

CreateProfile.py

########### module #############

import sys       # Library for reading command-line arguments
import requests  # Library for HTTP communication
import json      # Library for handling data in JSON format
import base64
import csv

########### Args & variable #########################
args = sys.argv
Profile_Name = args[1]
Profile_List = 'Profile_List.csv'

########### Create Profile #########################
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))

for i in lst:
    if Profile_Name in i:
        print('The specified user is already registered.')
        sys.exit()

ApiPath = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identificationProfiles'

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '',
}

body = {
    'locale':'en-us',
}

r = requests.post(
    ApiPath,            # URL
    headers = headers,  # headers
    json = body         # body
)

try:
    ProfileId = r.json()['identificationProfileId']
except Exception:
    print('Error:{}'.format(r.status_code))
    print(r.json()['error'])
    sys.exit()

print(ProfileId)

# Use a with block so the file is closed (and flushed) reliably
with open(Profile_List, 'a', newline='') as f:
    writer = csv.writer(f, lineterminator='\n')
    writer.writerow([Profile_Name, ProfileId])
####################################
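
The scripts in this article leave 'Ocp-Apim-Subscription-Key' empty, so your own key has to be filled in before they will work. One possible way to avoid writing the key into the source is to read it from an environment variable. This is only a sketch: the helper `build_headers` and the variable name `SPEAKER_RECOGNITION_KEY` are assumptions made up for this example.

```python
import os


def build_headers(content_type=None, env_var="SPEAKER_RECOGNITION_KEY"):
    """Build request headers, reading the subscription key from the environment.

    The variable name SPEAKER_RECOGNITION_KEY is made up for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set the {env_var} environment variable first")
    headers = {"Ocp-Apim-Subscription-Key": key}
    if content_type is not None:
        # JSON for "Create Profile", octet-stream for audio uploads
        headers["Content-Type"] = content_type
    return headers
```

With this, `build_headers('application/json')` would replace the `headers` dict above, and `build_headers('application/octet-stream')` the ones used for audio uploads.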

Step 2 Register the voice in the user's profile

Next, register a voice sample for the user created above. (Unlike speaker verification, there is no fixed passphrase, so the speaker can say anything.)

The following functions are used here.

  1. "Create Enrollment" of "Identification Profile" (voice registration)
  2. "Get Operation Status" of "Speaker Recognition" (confirmation of registration status)
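
These two functions work as a pair: "Create Enrollment" only starts the job and returns a URL, and "Get Operation Status" is called on that URL repeatedly until the status becomes succeeded or failed. A minimal sketch of that polling pattern, with a timeout added, could look like this. Here `poll_operation` and `fetch_status` are hypothetical names: `fetch_status` stands in for the HTTP GET so the loop can be tried without calling the API, and the canned responses are placeholders, not real API output.

```python
import time


def poll_operation(fetch_status, interval=3.0, timeout=60.0, sleep=time.sleep):
    """Call fetch_status() until the operation succeeds, fails, or times out.

    fetch_status is a hypothetical callable returning the parsed JSON body of
    "Get Operation Status" as a dict, for example
    lambda: requests.get(operation_url, headers=headers).json()
    """
    waited = 0.0
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "succeeded":
            return body.get("processingResult")
        if status == "failed":
            raise RuntimeError(body.get("message", "operation failed"))
        # Any other status is treated as "still in progress".
        if waited >= timeout:
            raise TimeoutError(f"still '{status}' after {waited:.0f} seconds")
        sleep(interval)
        waited += interval


# Dry run with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "succeeded", "processingResult": {"enrollmentStatus": "training"}},
])
result = poll_operation(lambda: next(responses), sleep=lambda seconds: None)
print(result)
```

The timeout is the main difference from the loop in the script below, which keeps polling for as long as the operation is neither succeeded nor failed.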

I got stuck on this myself: the restrictions on the audio files you can use are quite strict.

Property        Required value
Container       WAV
Encoding        PCM
Sample rate     16 kHz
Sample format   16 bit
Channels        Mono

I had no audio on hand that met these conditions, but I managed to record some with the free software "Audacity". (This is very convenient.)
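
Before uploading, a recording can be checked against the requirements listed above with Python's standard `wave` module. This is a rough sketch, and the function name `check_wav` is made up for this example. It only inspects the WAV header, so it says nothing about whether the API will accept the content of the audio.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the header matches the table."""
    try:
        # wave.open() itself rejects files that are not PCM-encoded WAV
        wav = wave.open(path, "rb")
    except (wave.Error, EOFError) as exc:
        return [f"not a PCM WAV file: {exc}"]

    problems = []
    with wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, not 16000 Hz")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bit
            problems.append(f"sample format is {wav.getsampwidth() * 8} bit, not 16 bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, not mono")
    return problems
```

For example, `check_wav('alice.wav')` would return an empty list for a 16 kHz, 16 bit, mono recording exported from Audacity (the file name is a placeholder).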

The script takes the user name as its argument. (It assumes the audio file is named after the user, i.e. "<user name>.wav", which is good enough for a test like this.)

CreateEnrollment.py

########### module #############

import sys       # Library for reading command-line arguments
import requests  # Library for HTTP communication
import json      # Library for handling data in JSON format
import base64
import csv
import time

########### Args & variable #########################
args = sys.argv
Profile_Name = args[1]
Profile_List = 'Profile_List.csv'
WavFile = f'{Profile_Name}.wav'

with open(Profile_List) as fp:
    lst = list(csv.reader(fp))

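# Look up the profile ID for the given user name
ProfileId = None
for i in lst:
    if Profile_Name in i:
        ProfileId = i[1]
        break

# Stop here instead of silently using the last row when the name is not found
if ProfileId is None:
    print('The specified user is not registered.')
    sys.exit()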

########### Create Enrollment #########################
ApiPath = f'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identificationProfiles/{ProfileId}/enroll?shortAudio=true'

headers = {
    # Request headers
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': '',
}

with open(WavFile, 'rb') as f:
    body = f.read()

r = requests.post(
    ApiPath,            # URL
    headers = headers,  # headers
    data = body         # body
)

try:
    response = r
    print('response:', response.status_code)
    if response.status_code == 202:
        print(response.headers['Operation-Location'])
        operation_url = response.headers['Operation-Location']
    else:
        print(response.json()['error'])
        sys.exit()
except Exception:
    print(r.json()['error'])
    sys.exit()
####################################
########### Get Operation Status #########################
url = operation_url

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '',
}

status = ''
while status != 'succeeded':
    
    r = requests.get(
        url,                # URL
        headers = headers,  # headers
    )

    try:
        response = r
        print('response:', response.status_code)
        if response.status_code == 200:
            status = response.json()['status']
            print(f'current status: {status}')
            if status == 'failed':
                message = response.json()['message']
                print(f'error:{message}')
                sys.exit()
            elif status != 'succeeded':
                time.sleep(3)
        else:
            print(r.json()['error'])
            sys.exit()
    except Exception:
        print(r.json()['error'])
        sys.exit()

enrollmentStatus = response.json()['processingResult']['enrollmentStatus']
remainingEnrollmentSpeechTime = response.json()['processingResult']['remainingEnrollmentSpeechTime']
speechTime = response.json()['processingResult']['speechTime']

if enrollmentStatus == 'enrolling':
    status = 'Profile is currently being registered and is not ready for identification.'
elif enrollmentStatus == 'training':
    status = 'Profile is currently being trained and is not ready for identification.'
else:
    status = 'Profile is registered and ready for identification.'

print(f'\nstatus: {enrollmentStatus}')
print(f'current status: {status}')
print(f'total valid audio time (seconds): {speechTime}')
print(f'remaining audio time (seconds) required for successful registration: {remainingEnrollmentSpeechTime}')

Step 3 Identify who is speaking based on the registered voices

It's finally the main process. The following functions are used here.

  1. "Identification" of "Speaker Recognition"
  2. "Get Operation Status" of "Speaker Recognition"

In this verification, the script takes the audio file you want to identify as its argument. By the way, speaker identification currently seems to accept up to 10 users (profiles) per request. The flow is: POST the audio and the (multiple) profile IDs you want to check to "Identification", then call "Get Operation Status" on the URL returned in `Operation-Location` to check the identification status and get the result. (In my tests, identification took up to 9 seconds to complete.) Also, since the result contains a profile ID, you have to map it back to the user name yourself. A confidence level for the identification is also returned; it seems to have three levels: low, medium, and high.
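
If the list holds more than 10 profiles, the IDs would have to be split across several requests, and each returned ID still has to be mapped back to a name. A small sketch of both steps, taking the limit of 10 from the observation above rather than from a confirmed specification; the helper names `batch_profile_ids` and `name_for` and the IDs used here are placeholders for illustration.

```python
def batch_profile_ids(profile_ids, batch_size=10):
    """Yield comma-separated ID strings, each holding at most batch_size IDs."""
    for start in range(0, len(profile_ids), batch_size):
        yield ",".join(profile_ids[start:start + batch_size])


def name_for(profile_id, profiles):
    """Look up the user name for a profile ID; None if it is not in the list.

    profiles maps user name -> profile ID, like the rows of Profile_List.csv.
    """
    for name, known_id in profiles.items():
        if known_id == profile_id:
            return name
    return None


# Placeholder IDs for illustration; real IDs come from "Create Profile".
profiles = {f"user{n}": f"id-{n}" for n in range(12)}
for ids in batch_profile_ids(list(profiles.values())):
    # Each string could be passed as 'identificationProfileIds' in one request
    print(ids.count(",") + 1, "profiles in this request")
print(name_for("id-3", profiles))
```

Returning None for an unknown ID makes the "no matching name" case explicit, instead of falling through to whatever row happened to be read last.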

Identification.py

########### module #############

import sys       # Library for reading command-line arguments
import requests  # Library for HTTP communication
import json      # Library for handling data in JSON format
import base64
import csv
import time

########### Args & variable #########################
args = sys.argv
WavFile = args[1]
Profile_List = 'Profile_List.csv'

with open(Profile_List) as fp:
    lst = list(csv.reader(fp))

########### Identification #########################
# Join all registered profile IDs into one comma-separated string
ProfileIds = ','.join(row[1] for row in lst)

url = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identify'

params = {
    'identificationProfileIds': ProfileIds,
    'shortAudio': True,
}

headers = {
    # Request headers
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': '',
}

with open(WavFile, 'rb') as f:
    body = f.read()

r = requests.post(
    url,                # URL
    params = params,
    headers = headers,  # headers
    data = body         # body
)

try:
    response = r
    print('response:', response.status_code)
    if response.status_code == 202:
        print(response.headers['Operation-Location'])
        operation_url = response.headers['Operation-Location']
    else:
        print(response.json()['error'])
        sys.exit()
except Exception:
    print(r.json()['error'])
    sys.exit()

####################################
########### Get Operation Status #########################
url = operation_url
#url = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/operations/ea1edc22-32f4-4fb9-81d6-d597a0072c76'

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '',
}

status = ''
while status != 'succeeded':
    
    r = requests.get(
        url,                # URL
        headers = headers,  # headers
    )

    try:
        response = r
        print('response:', response.status_code)
        if response.status_code == 200:
            status = response.json()['status']
            print(f'current status: {status}')
            if status == 'failed':
                message = response.json()['message']
                print(f'error:{message}')
                sys.exit()
            elif status != 'succeeded':
                time.sleep(3)
        else:
            print(r.json()['error'])
            sys.exit()
    except Exception:
        print(r.json()['error'])
        sys.exit()

identifiedProfileId = response.json()['processingResult']['identifiedProfileId']
confidence = response.json()['processingResult']['confidence']

# Map the returned profile ID back to a user name
Profile_Name = 'unknown'  # fallback when the returned ID is not in the list
for i in lst:
    if identifiedProfileId in i:
        Profile_Name = i[0]
        break

print(f'\nspeaker: {Profile_Name}')
print(f'reliability: {confidence}')
####################################

Conclusion

That wraps up my test of the "Speaker Recognition API". The API is said not to support Japanese, but I personally felt the speaker identification was quite accurate. It looks like you could do all sorts of things with it if you use it well!

Previous article

I tried to verify the speaker identification by the Speaker Recognition API of Azure Cognitive Services in Python. # 1
