I implemented Google's Speech-to-Text in Django

On AWS you can run a transcription from the web console, but on GCP Speech-to-Text can only be used through the API. I still wanted to use it because Google's speech recognition accuracy is quite high, so I studied Django and built a simple web front end for it. The flow is to upload the audio file to Google Cloud Storage and transcribe it from there. Cloud Storage is used because transcribing a local file comes with restrictions, such as the file size having to be under 10 MB.
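As a rough illustration of the flow described above, the sketch below decides whether a file would have to go through Cloud Storage and builds the gs:// URI that Speech-to-Text would then read. The helper names, the bucket name, and the exact byte value used for the 10 MB limit are assumptions made for this example; they are not part of the code shown later in this article.

```python
# Assumption: "10 MB" is taken as 10 * 1024 * 1024 bytes for this sketch.
LOCAL_LIMIT_BYTES = 10 * 1024 * 1024


def needs_cloud_storage(size_bytes, limit=LOCAL_LIMIT_BYTES):
    """Return True when a file of this size is too large to be sent as local content."""
    return size_bytes >= limit


def gcs_uri(bucket_name, object_name):
    """Build the gs:// URI for an object stored in a bucket."""
    return "gs://{}/{}".format(bucket_name, object_name)


# "my-bucket" and "meeting.wav" are hypothetical names used only for illustration.
print(needs_cloud_storage(25 * 1024 * 1024))
print(gcs_uri("my-bucket", "meeting.wav"))
```

In this article every file goes through Cloud Storage regardless of size, which keeps the view simple.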

Finished screen

[Screenshot: the finished screen]

Development environment

MacBook, Python 3.7.7, Django 3.1.5, google-cloud-storage 1.35.0, google-cloud-speech 2.0.1, pydub 0.24.1

Get the JSON key for Google authentication

When creating the service account, grant it administrator privileges for "Google Storage", then download its key as a JSON file.

[Screenshot: granting the role while creating the service account]
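Before wiring the key into Django, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library; the function name and the set of fields it checks are choices made for this example, not an official validation routine.

```python
import json
import os

# Fields that a service account key file is expected to contain.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path):
    """Return a list of problems found in the key file (empty if none)."""
    if not os.path.isfile(path):
        return ["file not found: {}".format(path)]
    try:
        with open(path, encoding="utf-8") as f:
            info = json.load(f)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return ["file is not valid JSON"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]
    problems = ["missing key: {}".format(k) for k in REQUIRED_KEYS if k not in info]
    if info.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

An empty list means the file looks usable as the value of GOOGLE_APPLICATION_CREDENTIALS.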

Enable the Speech-to-Text API

Enable the API from the API library in the GCP console.

[Screenshot: enabling the API in the GCP library]

Environment setup

#Django
pip3 install django==3.1.5

#google-cloud-storage
pip3 install google-cloud-storage==1.35.0

#google-cloud-speech
pip3 install google-cloud-speech==2.0.1

#pydub
pip3 install pydub==0.24.1
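To confirm that the pinned versions were actually installed, something like the sketch below could be run. It assumes Python 3.8 or later, because importlib.metadata is not available in the Python 3.7.7 used in this article; the function name and the report format are made up for this example.

```python
from importlib import metadata  # assumption: Python 3.8 or later

# Versions pinned in the pip3 commands above.
EXPECTED = {
    "Django": "3.1.5",
    "google-cloud-storage": "1.35.0",
    "google-cloud-speech": "2.0.1",
    "pydub": "0.24.1",
}


def version_report(expected=EXPECTED):
    """Map each package name to a tuple (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in version_report().items():
    print("{}: expected {}, installed {}".format(name, wanted, installed))
```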

Django settings

Project creation

A project folder will be created.

#Project name(project)
django-admin startproject project

Creating an application

Go to the project folder and create an application.

#Application creation
python3 manage.py startapp mozi

Django (WEB server) basic settings

Edit the files in the inner project folder (project/project).

settings.py


#Accept requests for any host name (convenient for testing; restrict this in production)
ALLOWED_HOSTS = ['*']

#Register the application
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'mozi',  #Add the application (Django now also searches mozi for templates)
]
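ALLOWED_HOSTS = ['*'] is fine for a quick experiment. If the server is reachable by others, one possible alternative is to read the host list from an environment variable, as in the sketch below. The variable name DJANGO_ALLOWED_HOSTS and the helper are hypothetical and are not something Django defines.

```python
import os


def allowed_hosts_from_env(value=None, default=("localhost", "127.0.0.1")):
    """Parse a comma-separated host list; fall back to local-only hosts."""
    if value is None:
        # DJANGO_ALLOWED_HOSTS is a hypothetical variable name for this sketch.
        value = os.environ.get("DJANGO_ALLOWED_HOSTS", "")
    hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts or list(default)


# In settings.py this could take the place of ALLOWED_HOSTS = ['*'].
ALLOWED_HOSTS = allowed_hosts_from_env()
```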

urls.py


from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('mozi/', include('mozi.urls')),  #Hand URLs under mozi/ over to urls.py in the mozi app
]

Application (mozi) basic settings

Edit the files in the mozi folder inside the project folder. First, create a new urls.py so that URL routing can be configured on the application side.

urls.py


from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
]

Create main function

Add the upload destination to settings.py in the project folder. BASE_DIR is the folder that contains manage.py, so create an upload folder there.

settings.py


#FILE_UPLOAD
import os
MEDIA_ROOT = os.path.join(BASE_DIR, 'upload')
MEDIA_URL = '/upload/'

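Add the following to the end of urls.py in the project folder so that uploaded files are served while DEBUG is enabled.

urls.py


from django.conf import settings
from django.conf.urls.static import static

if settings.DEBUG:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)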

Add the model that backs the upload form to models.py under the mozi folder. The 'media' given to upload_to is a subfolder created under the upload folder, and the uploaded files are saved in it.

models.py


from django.db import models

class Upload(models.Model):
    document = models.FileField(upload_to='media')
    uploaded_at = models.DateTimeField(auto_now_add=True)
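Putting MEDIA_ROOT and upload_to together, an uploaded file ends up under BASE_DIR/upload/media/. The sketch below only illustrates that path arithmetic; the helper and the example BASE_DIR are hypothetical. Django may also change the stored file name (for example when a file with the same name already exists), so real code is better off reading the path from the saved model instance.

```python
from pathlib import Path


def saved_upload_path(base_dir, filename, media_dir="upload", upload_to="media"):
    """Return BASE_DIR/upload/media/filename for the settings shown above."""
    return Path(base_dir) / media_dir / upload_to / filename


# "/srv/project" is a hypothetical BASE_DIR used only for illustration.
print(saved_upload_path("/srv/project", "sample.wav"))
```

This is the folder that the source variable in views.py has to point to.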

Create a new forms.py under the mozi folder so that files can be uploaded from a form.

forms.py


from django import forms
from .models import Upload
 
class UploadForm(forms.ModelForm):
    class Meta:
        model = Upload
        fields = ('document',)

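Create the HTML template for the web screen. Because views.py renders 'mozi/index.html', save it as templates/mozi/index.html under the mozi folder.

index.html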


<!DOCTYPE html>
<html lang="ja-JP">
<head>
    <meta charset="UTF-8">
    <title>Transcription</title>
</head>
<body>
 
    <h1>Google Speech To Text</h1>
 
    <form method="post" enctype="multipart/form-data">
        {% csrf_token %}
        {{ form.as_p }}
        <button type="submit">start</button>
    </form>

   <h2>Transcription result</h2>
   <p>{{ transcribe_result }}</p>
 
</body>
</html>

Set views.py, which is the core of screen display.

views.py


import os
import subprocess

from django.shortcuts import render
from .forms import UploadForm

def index(request):

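    #Folder where Django saves the uploads, ending with "/"
    #(MEDIA_ROOT/media/ with the settings above)
    source = "Path where the file is uploaded"

    #Base gs:// URL of the bucket, ending with "/"
    GCS_BASE = "gs://Bucket name/"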

    #Save results
    speech_result = ""

    if request.method == 'POST':
        #Google Storage environment preparation
        from google.cloud import storage
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"]='json PATH'
        client = storage.Client()
        bucket = client.get_bucket('Google Storage bucket name')
       
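        #Save upload file
        form = UploadForm(request.POST, request.FILES)
        if not form.is_valid():
            #Show the form again with its validation errors
            return render(request, 'mozi/index.html', {
                'form': form,
                'transcribe_result': speech_result
            })
        upload = form.save()

        #Get the name the file was actually saved under (Django may
        #sanitize or rename it) and split it into name and extension
        #(ext is the extension, e.g. ".wav")
        transcribe_file = os.path.basename(upload.document.name)
        name, ext = os.path.splitext(transcribe_file)

        if ext.lower() == ".wav":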
            #Upload to Google Storage
            blob = bucket.blob( transcribe_file )
            blob.upload_from_filename(filename= source + transcribe_file )

            #Get the playback time in seconds (used as the recognition timeout)
            from pydub import AudioSegment
            sound = AudioSegment.from_file( source + transcribe_file )
            length = sound.duration_seconds
            length += 1

            #Delete the local working file (no shell, so odd file names are safe)
            os.remove(source + transcribe_file)

            #Transcription
            from google.cloud import speech

            client = speech.SpeechClient()

            gcs_uri = GCS_BASE + transcribe_file

            audio = speech.RecognitionAudio(uri=gcs_uri)
            config = speech.RecognitionConfig(
                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                #sample_rate_hertz=16000,
                language_code="ja-JP",
                enable_automatic_punctuation=True,
            )

            operation = client.long_running_recognize(config=config, audio=audio)

            response = operation.result(timeout=round(length))

            for result in response.results:
                speech_result += result.alternatives[0].transcript

            #Google Storage file deletion
            blob.delete()

        else:
            #Convert the file to 16 kHz mono WAV with ffmpeg
            f_input = source + transcribe_file
            f_output = source + name + ".wav"
            upload_file_name = name + ".wav"
            #Pass the arguments as a list so that no shell is involved
            cmd = ['ffmpeg', '-i', f_input, '-ar', '16000', '-ac', '1', f_output]
            subprocess.run(cmd, check=True)

            #Upload to Google Storage
            blob = bucket.blob( upload_file_name )
            blob.upload_from_filename(filename= f_output )

            #Get the playback time in seconds (used as the recognition timeout)
            from pydub import AudioSegment
            sound = AudioSegment.from_file( source + transcribe_file )
            length = sound.duration_seconds
            length += 1

            #Delete the local working files (no shell, so odd file names are safe)
            os.remove(f_input)
            os.remove(f_output)
            
            #Transcription
            from google.cloud import speech

            client = speech.SpeechClient()

            gcs_uri = GCS_BASE + upload_file_name

            audio = speech.RecognitionAudio(uri=gcs_uri)
            config = speech.RecognitionConfig(
                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                #sample_rate_hertz=16000,
                language_code="ja-JP",
            )

            operation = client.long_running_recognize(config=config, audio=audio)

            response = operation.result(timeout=round(length))

            for result in response.results:
                speech_result += result.alternatives[0].transcript

            #Google Storage file deletion
            blob.delete()
    else:
        form = UploadForm()
    return render(request, 'mozi/index.html', {
        'form': form,
        'transcribe_result':speech_result
    })
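The two branches of the view repeat most of their steps, so the shared pieces could be pulled out into small helpers. The sketch below shows one way to do that for the parts that need neither Django nor the Google client libraries. All helper names are hypothetical, and the fake result objects only imitate the shape of response.results so that the code can be tried offline.

```python
import os
from types import SimpleNamespace


def wav_name(filename):
    """Return the file name with its extension replaced by .wav."""
    name, _ext = os.path.splitext(filename)
    return name + ".wav"


def ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build the conversion command as an argument list (no shell needed)."""
    return ["ffmpeg", "-i", src, "-ar", str(sample_rate), "-ac", str(channels), dst]


def recognition_timeout(duration_seconds, margin=1):
    """Timeout in whole seconds: the playback time plus a margin, as in the view."""
    return round(duration_seconds + margin)


def join_transcripts(results):
    """Concatenate the top alternative of every result that has one."""
    return "".join(r.alternatives[0].transcript for r in results if r.alternatives)


# Stand-in objects shaped like response.results, for trying the helper offline.
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="Hello, ")]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="world.")]),
]
print(join_transcripts(fake_results))
print(ffmpeg_command("in.mp3", wav_name("in.mp3")))
```

With helpers like these, the view would only need to decide whether a conversion is required and could then run a single upload, transcribe, and clean-up path for both cases.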

Finally, create and apply the migrations for the application. Run these from the folder that contains manage.py.

python3 manage.py makemigrations mozi
python3 manage.py migrate

Now that everything is ready, start the web server.

python3 manage.py runserver server IP:8000

It was easy to build because everything, from setting up the web server to the internal processing, could be written in Python. I am leaving this here as a record of what I tried.

Reference sites

https://noumenon-th.net/programming/2019/10/28/django-forms/
https://qiita.com/peijipe/items/009fc487505dfdb03a8d
https://cloud.google.com/speech-to-text/docs/async-recognize?hl=ja
