I tried using Azure Speech to Text.

How to use the Microsoft Azure speech recognition API

This section describes how to use the speech recognition API on macOS Catalina (version 10.15.4). I used Speech to Text from Cognitive Services to recognize Japanese. The commands were run in zsh, so they may not work as-is in bash.

Resource creation

First, create an Azure account as a prerequisite. Creating an account is free. I recommend it because it comes with a $200 credit and lets you use various APIs for free for one year.

- Once you have created an account, click **Create Resource** on the portal site.
- Searching for **Speech** in the search bar brings up the API option **Voice** or **Speech**.
- Click the Speech option, then click **Create**.
- The creation form appears with the following items:
  - Name: the name of the resource (anything is fine)
  - Subscription: Free Trial (shown by default)
  - Location: Japan East (if you want a region in Japan)
  - Pricing tier: F0
  - Resource group: click **New** and give the group a name. Anything is fine.
- Once the resource has been created, it should appear on the dashboard, so click it.
- The overview has an item called **Key Management**, so click it. The resource name, endpoint, and **two subscription keys** are listed there. Make a note of one subscription key, because you will use it later.

**Make sure your subscription key is never seen by others.** This completes the creation of the speech recognition resource.
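One way to keep the key out of the source file is to read it from environment variables. The following is a minimal sketch of that idea; the variable names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_settings` are assumptions made for illustration, not names the SDK requires.

```python
import os


def load_speech_settings(env=None):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY")
    # Fall back to the Japan East region when no region is given.
    region = env.get("SPEECH_REGION", "japaneast")
    if not key:
        raise RuntimeError("Set SPEECH_KEY before running the script")
    return key, region
```

With this in place you would run `export SPEECH_KEY=...` in the terminal and call `load_speech_settings()` in the script instead of pasting the key into the code.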


Next, set up your PC. First, install the Speech SDK.


python3 -m pip install --upgrade pip
python3 -m pip install azure-cognitiveservices-speech
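To confirm that the install landed in the interpreter you intend to use, a quick check like the following may help. It is only a sketch: `is_importable` is a hypothetical helper written for this post, built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (for example "azure") is missing.
        return False


if __name__ == "__main__":
    name = "azure.cognitiveservices.speech"
    print(name, "is installed" if is_importable(name) else "is NOT installed")
```

Run it with the same `python3` you used for the install. If it reports that the module is not installed, pip probably installed the package for a different Python.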

Next, Microsoft provides sample code for speech recognition on GitHub, so create a quickstart.py file locally and paste the code into it. The repository contains **quickstart.py**, a Jupyter version (Quickstart.ipynb), and README.md; copy the contents of **quickstart.py**. (The code is here.) The code looks like the following. After copying it, there is one place to change and one line to add.


# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.

# <code>
import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").

# Change the following line:
#   Subscription key: one of the two keys shown in the resource overview you checked earlier
#   Location: 'japaneast' for Japan East, 'japanwest' for Japan West
speech_key, service_region = "Subscription key", "place"

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Addition: setting for recognizing Japanese.
# Without this, only English is recognized by default.
speech_config.speech_recognition_language = "ja-JP"

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")

# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result. 
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query. 
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
# </code>
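The two edits described in the listing above (the key and region, and the Japanese language setting) could also be wrapped in one small helper. This is a sketch under assumptions: `JAPAN_REGIONS`, `region_for`, and `make_japanese_recognizer` are names invented here, while `SpeechConfig`, `speech_recognition_language`, and `SpeechRecognizer` come from the SDK.

```python
# Region identifiers for the two Japanese locations mentioned above.
JAPAN_REGIONS = {"east": "japaneast", "west": "japanwest"}


def region_for(location):
    """Map "east" or "west" to the matching Azure region identifier."""
    try:
        return JAPAN_REGIONS[location]
    except KeyError:
        raise ValueError(f"unknown location: {location!r}") from None


def make_japanese_recognizer(speech_key, location="east"):
    """Build a recognizer that listens for Japanese on the default microphone."""
    # Imported here so region_for() works even without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key, region=region_for(location)
    )
    # Without this line, recognition defaults to English.
    speech_config.speech_recognition_language = "ja-JP"
    return speechsdk.SpeechRecognizer(speech_config=speech_config)
```

Calling `make_japanese_recognizer(key).recognize_once()` would then behave like the quickstart, with the language already set.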

Now that you're ready, run the following from your terminal:


python3 quickstart.py

In my case, audio was not recognized when I ran the script from VS Code, so if that happens to you, run it from the terminal. If you know how to configure VS Code so that it works, please let me know. When you run the script,

Say something...

is displayed, so say something. The recognition result should be printed. With this setting only a single utterance is recognized, but the code can be changed to recognize speech continuously.
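As the comment in the sample notes, `start_continuous_recognition()` is the call to use for more than one utterance. The following is a rough sketch of how that might look, not the official sample: `make_collector` and `recognize_for` are hypothetical helpers, and the fixed listening window is an assumption made to keep the example short.

```python
import time


def make_collector():
    """Return a list and a callback that appends each recognized text to it."""
    texts = []

    def on_recognized(evt):
        # evt.result.text holds the final text for one utterance.
        if evt.result.text:
            texts.append(evt.result.text)

    return texts, on_recognized


def recognize_for(speech_key, service_region, seconds=30):
    """Listen on the default microphone for `seconds` and return all utterances."""
    import azure.cognitiveservices.speech as speechsdk  # needs the Speech SDK

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language = "ja-JP"
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    texts, on_recognized = make_collector()
    # The SDK calls on_recognized once per finished utterance.
    recognizer.recognized.connect(on_recognized)

    recognizer.start_continuous_recognition()
    time.sleep(seconds)  # fixed listening window, an assumption of this sketch
    recognizer.stop_continuous_recognition()
    return texts
```

Collecting results through a callback keeps the main thread free; a longer-running program would more likely stop on an event than after a fixed sleep.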

That's it.
