I implemented Google's Speech-to-Text in Django

In AWS, transcription can be done from the web console, but in GCP Speech-to-Text can only be used through the API. I still wanted to use it because Google's speech recognition accuracy is quite high, so I studied Django and quickly built a small web front end for it. The flow is to upload the audio to Google Storage and transcribe it from there. Google Storage is used because sending a local file comes with restrictions, such as the file size having to be less than 10MB.
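The size restriction above can be sketched as a small check. This is only an illustration: the 10MB figure is the one quoted in this article (check the current Speech-to-Text limits before relying on it), and the names `LOCAL_LIMIT_BYTES`, `needs_cloud_storage` and `choose_route` are made up for this example.

```python
import os

# Assumption: 10 MB is the local-file limit quoted in this article.
LOCAL_LIMIT_BYTES = 10 * 1024 * 1024


def needs_cloud_storage(size_bytes, limit_bytes=LOCAL_LIMIT_BYTES):
    """Return True when the audio is too large to be sent as a local file."""
    # "Less than 10MB" is allowed locally, so the limit itself already needs GCS.
    return size_bytes >= limit_bytes


def choose_route(path):
    """Pick 'local' or 'gcs' for a file on disk, based only on its size."""
    return "gcs" if needs_cloud_storage(os.path.getsize(path)) else "local"
```

In this article everything goes through Google Storage regardless of size, so a check like this would only matter if small files were to be sent directly.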

Finished screen

[Image: Screenshot 2021-01-12 22.00.25.png]

Development environment

MacBook, Python (3.7.7), Django (3.1.5), google-cloud-storage (1.35.0), google-cloud-speech (2.0.1), pydub (0.24.1)

Get the JSON key for Google authentication

When creating the service account, grant it administrator privileges for "Google Storage".

[Image: Screenshot 2021-01-12 20.51.33.png]
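Before wiring the JSON key into Django, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library. The function names are made up for this example, and the checked fields are ones that a service account key file contains, not an exhaustive list.

```python
import json
import os

# A few of the fields present in a service account key file.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def find_key_problems(data):
    """Return a list of problems found in the parsed key (empty if none)."""
    problems = ["missing key: " + key for key in REQUIRED_KEYS if key not in data]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


def check_key_file(path):
    """Load the JSON key at `path` and report problems as a list of strings."""
    if not os.path.isfile(path):
        return ["file not found: " + path]
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        return ["not valid JSON: " + str(exc)]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    return find_key_problems(data)
```

Calling `check_key_file` with the same path that is later assigned to `GOOGLE_APPLICATION_CREDENTIALS` should return an empty list for a usable key.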

Enabling the Speech-to-Text API

Enable the API from the API library in GCP.

[Image: Screenshot 2021-01-12 20.56.21.png]

Environment setup

#Django
pip3 install django==3.1.5

#google-cloud-storage
pip3 install google-cloud-storage==1.35.0

#google-cloud-speech
pip3 install google-cloud-speech==2.0.1

#pydub
pip3 install pydub==0.24.1
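To confirm that the pinned versions above are the ones actually installed, something like the following could be run. It is a rough sketch: `importlib.metadata` is in the standard library only from Python 3.8 (on the 3.7.7 used here it would need the `importlib-metadata` backport), and the helper names are made up for this example.

```python
from importlib import metadata  # standard library from Python 3.8

# The versions installed in this article.
PINS = [
    "django==3.1.5",
    "google-cloud-storage==1.35.0",
    "google-cloud-speech==2.0.1",
    "pydub==0.24.1",
]


def parse_pin(pin):
    """Split 'name==version' into (name, version)."""
    name, _, version = pin.partition("==")
    return name.strip(), version.strip()


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map package name -> (wanted, found) for every pin that does not match."""
    mismatches = {}
    for pin in pins:
        name, wanted = parse_pin(pin)
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

`find_mismatches(PINS)` returns an empty dict when the environment matches the versions listed in this article.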

Django settings

Project creation

A project folder will be created.

#Project name(project)
django-admin startproject project

Creating an application

Go to the project folder and create an application.

This time, I created an application called "mozi".

#Application creation
python3 manage.py startapp mozi

Django (WEB server) basic settings

Edit the files in the project folder that sits inside the project folder (project/project).

`settings.py`


#Allow access from any host
ALLOWED_HOSTS = ['*']

#To use html files
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'mozi',  #Add the application (templates are now also searched for in mozi)
]

`urls.py`


from django.contrib import admin
from django.urls import path,include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('mozi/', include('mozi.urls')), #So that routes can be set in urls.py of the mozi app
]

Application (mozi) basic settings

Edit the files in the mozi folder inside the project folder. Create a new urls.py there so that URL routing can be configured on the application side.

`urls.py`


from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
]

Create main function

Add the upload destination to settings.py in the project folder. BASE_DIR is where manage.py is located, so create an upload folder there.

`settings.py`


#FILE_UPLOAD
import os
MEDIA_ROOT = os.path.join(BASE_DIR, 'upload')
MEDIA_URL = '/upload/'

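Add the following boilerplate to urls.py in the project folder, below the existing urlpatterns, so that uploaded files are served while DEBUG is enabled.

`urls.py`


#These two imports can go at the top of the file
from django.conf import settings
from django.conf.urls.static import static

if settings.DEBUG:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)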

Add the fields required for the form to models.py under the mozi folder. The 'media' passed to upload_to is a folder that gets created under the upload folder, and the uploaded files are saved in it.

`models.py`


from django.db import models

class Upload(models.Model):
    document = models.FileField(upload_to='media')
    uploaded_at = models.DateTimeField(auto_now_add=True)
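Putting MEDIA_ROOT and upload_to together, the place where an uploaded file is expected to land can be worked out as below. This is a sketch with a made-up helper name, and the `/srv/project` path in the usage note is hypothetical. Django may also adjust the stored file name (for example when a file with the same name already exists), so this gives the expected location, not a guarantee.

```python
from pathlib import Path


def expected_upload_path(base_dir, filename, media_dir="upload", upload_to="media"):
    """Build BASE_DIR/upload/media/<filename>, mirroring the settings above.

    media_dir corresponds to MEDIA_ROOT = BASE_DIR/'upload' and
    upload_to to FileField(upload_to='media').
    """
    return Path(base_dir) / media_dir / upload_to / filename
```

For example, with manage.py in `/srv/project`, `expected_upload_path("/srv/project", "voice.wav")` points at `/srv/project/upload/media/voice.wav`. The folder part of this path, with a trailing slash, is what the "Path where the file is uploaded" placeholder in views.py stands for.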

Create a new forms.py under the mozi folder so that files can be uploaded from a form.

`forms.py`


from django import forms
from .models import Upload
 
class UploadForm(forms.ModelForm):
    class Meta:
        model = Upload
        fields = ('document',)

Create the HTML for the web screen.

Create a new template at mozi/templates/mozi/index.html.
Anything enclosed in {{ }} is treated as a variable.

`index.html`


<!DOCTYPE html>
<html lang="ja-JP">
<head>
    <meta charset="UTF-8">
    <title>Transcription</title>
</head>
<body>
 
    <h1>Google Speech To Text</h1>
 
    <form method="post" enctype="multipart/form-data">
        {% csrf_token %}
        {{ form.as_p }}
        <button type="submit">start</button>
    </form>

   <h2>Transcription result</h2>
   <p>{{ transcribe_result }}</p>
 
</body>
</html>

Edit views.py, which is the core of the screen display.

`views.py`


from django.http import HttpResponse
from django.shortcuts import render,redirect
from .forms import UploadForm
from .models import Upload

def index(request):
    import os
    import subprocess

    #Save PATH
    source = "Path where the file is uploaded" 
  
    #GCS_URL
    GCS_BASE = "gs://Bucket name/"    

    #Save results
    speech_result = ""

    if request.method == 'POST':
        #Google Storage environment preparation
        from google.cloud import storage
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"]='json PATH'
        client = storage.Client()
        bucket = client.get_bucket('Google Storage bucket name')
       
        #Save upload file
        form = UploadForm(request.POST,request.FILES)
        form.save()

        #Get the uploaded file name
        #Split it into the base name and the extension (ext, e.g. ".wav")
        transcribe_file = request.FILES['document'].name
        name, ext = os.path.splitext(transcribe_file)

        if ext==".wav": 
            #Upload to Google Storage
            blob = bucket.blob( transcribe_file )
            blob.upload_from_filename(filename= source + transcribe_file )

            #Get play time
            from pydub import AudioSegment
            sound = AudioSegment.from_file( source + transcribe_file )
            length = sound.duration_seconds
            length += 1


            #Delete working files
            os.remove(source + transcribe_file)

            #Transcription
            from google.cloud import speech

            client = speech.SpeechClient()

            gcs_uri = GCS_BASE + transcribe_file

            audio = speech.RecognitionAudio(uri=gcs_uri)
            config = speech.RecognitionConfig(
                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                #sample_rate_hertz=16000,
                language_code="ja-JP",
                enable_automatic_punctuation=True,
            )

            operation = client.long_running_recognize(config=config, audio=audio)

            response = operation.result(timeout=round(length))

            for result in response.results:
                speech_result += result.alternatives[0].transcript

            #Google Storage file deletion
            blob.delete()

        else:
            #File conversion process
            f_input = source + transcribe_file
            f_output = source + name + ".wav"
            upload_file_name = name + ".wav"
            #Pass the arguments as a list so file names with spaces are handled safely
            cmd = ['ffmpeg', '-i', f_input, '-ar', '16000', '-ac', '1', f_output]
            subprocess.call(cmd)

            #Upload to Google Storage
            blob = bucket.blob( upload_file_name )
            blob.upload_from_filename(filename= f_output )

            #Get play time
            from pydub import AudioSegment
            sound = AudioSegment.from_file( source + transcribe_file )
            length = sound.duration_seconds
            length += 1


            #Delete working files
            os.remove(f_input)
            os.remove(f_output)
            
            #Transcription
            from google.cloud import speech

            client = speech.SpeechClient()

            gcs_uri = GCS_BASE + upload_file_name

            audio = speech.RecognitionAudio(uri=gcs_uri)
            config = speech.RecognitionConfig(
                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                #sample_rate_hertz=16000,
                language_code="ja-JP",
            )

            operation = client.long_running_recognize(config=config, audio=audio)

            response = operation.result(timeout=round(length))

            for result in response.results:
                speech_result += result.alternatives[0].transcript

            #Google Storage file deletion
            blob.delete()
    else:
        form = UploadForm()
    return render(request, 'mozi/index.html', {
        'form': form,
        'transcribe_result':speech_result
    })
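The two branches of the view repeat almost the same steps, so a few small helpers could express them once. The sketch below is only one possible arrangement and all function names are made up for this example. It also swaps pydub for the standard library `wave` module to measure the length, which works only for uncompressed PCM WAV files, so it would have to be applied to the converted .wav file before it is deleted.

```python
import os
import wave


def target_wav_name(filename):
    """Name of the WAV object to upload for a given uploaded file name."""
    name, ext = os.path.splitext(filename)
    # Compare case-insensitively so that "VOICE.WAV" is not converted again.
    return filename if ext.lower() == ".wav" else name + ".wav"


def ffmpeg_command(src, dst):
    """Argument list for the conversion used in the view (16 kHz, mono)."""
    # A list avoids shell quoting problems with spaces in file names.
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1", dst]


def wav_duration_seconds(path):
    """Play time of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def recognition_timeout(duration_seconds):
    """Timeout for operation.result(): play time plus one second, rounded."""
    return round(duration_seconds + 1)
```

In the view, the conversion would then be `subprocess.run(ffmpeg_command(f_input, f_output), check=True)`, and the timeout passed to `operation.result` would be `recognition_timeout(wav_duration_seconds(f_output))`.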

Finally, create and apply the migrations for the application. Run these from the folder that contains manage.py.

python3 manage.py makemigrations mozi
python3 manage.py migrate

Now that everything is ready, start the web server (replace "server IP" with the address of your server).

python3 manage.py runserver server IP:8000

It was easy to build because everything, from setting up the web server to the internal processing, could be written in Python. I am leaving this here as a record and memo of what I tried.

Reference sites

https://noumenon-th.net/programming/2019/10/28/django-forms/
https://qiita.com/peijipe/items/009fc487505dfdb03a8d
https://cloud.google.com/speech-to-text/docs/async-recognize?hl=ja