On AWS, transcription (Amazon Transcribe) can be run from the web console, but on GCP, Speech-to-Text can only be used through the API. So I studied Django a little and built a simple web front end for it. I chose Google because its speech recognition accuracy is quite high. The flow is: upload the audio file to Google Cloud Storage, then transcribe it from there. Cloud Storage is used because audio sent directly from a local file is subject to limits such as a maximum size of 10 MB.
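The size-based reason for going through Cloud Storage can be sketched as a small decision helper. This is a hypothetical illustration, not part of the app: it assumes the 10 MB limit mentioned above means 10 * 1024 * 1024 bytes, and `bucket_name` is a placeholder. The app itself always takes the Cloud Storage route.

```python
import os

# Assumption: the ~10 MB limit for locally sent audio, taken as 10 MiB here.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def choose_audio_source(size_bytes, local_path, bucket_name):
    """Hypothetical helper: decide how the audio would be handed to the API.

    Returns ("content", local_path) when the file is small enough to send
    directly, otherwise ("uri", "gs://<bucket>/<file name>") for a copy
    uploaded to Cloud Storage.
    """
    if size_bytes < INLINE_LIMIT_BYTES:
        # Small file: the bytes could be sent in the request itself
        return ("content", local_path)
    # Large file: refer to the object in Cloud Storage instead
    blob_name = os.path.basename(local_path)
    return ("uri", "gs://{}/{}".format(bucket_name, blob_name))
```

For example, a 50 MB recording yields a `gs://` URI, which is what the view below passes to `speech.RecognitionAudio(uri=...)`.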
Environment: MacBook, Python 3.7.7, Django 3.1.5, google-cloud-storage 1.35.0, google-cloud-speech 2.0.1, pydub 0.24.1
When creating the service account, grant it administrator privileges (Storage Admin) for Cloud Storage, and download its JSON key.
Enable the Cloud Speech-to-Text API from the GCP API Library.
#Django
pip3 install django==3.1.5
#google-cloud-storage
pip3 install google-cloud-storage==1.35.0
#google-cloud-speech
pip3 install google-cloud-speech==2.0.1
#pydub
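pip3 install pydub==0.24.1
#ffmpeg is also required (pydub uses it, and views.py calls it for conversion), e.g. brew install ffmpeg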
Create the project (this generates a project folder).
#Project name: project
django-admin startproject project
Move into the project folder and create an application.
#Create the application
python3 manage.py startapp mozi
Next, edit the files in the inner project folder (project/project).
settings.py
#Accept requests for any host name (convenient for testing; restrict this in production)
ALLOWED_HOSTS = ['*']
#Register the application (Django then also looks for templates in mozi/templates)
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'mozi',  #Added application
]
urls.py
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('mozi/', include('mozi.urls')),  #Delegate URLs under /mozi/ to mozi/urls.py
]
Next, configure the files in the mozi folder. Create a new urls.py there so that the application can define its own page routing.
urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
]
Add the upload destination to settings.py in the project folder. BASE_DIR is the folder that contains manage.py, so the upload folder is created there.
settings.py
#FILE_UPLOAD
import os
MEDIA_ROOT = os.path.join(BASE_DIR, 'upload')
MEDIA_URL = '/upload/'
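Add the following to urls.py in the project folder so that the development server also serves the uploaded files.
urls.py
#Add these imports at the top of the file
from django.conf import settings
from django.conf.urls.static import static
#Append after urlpatterns (serve uploaded files only while DEBUG is on)
if settings.DEBUG:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)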
Add the fields needed by the form to models.py in the mozi folder. Because of upload_to='media', a media folder is created under the upload folder and uploaded files are saved in it.
models.py
from django.db import models


class Upload(models.Model):
    document = models.FileField(upload_to='media')
    uploaded_at = models.DateTimeField(auto_now_add=True)
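The `source` placeholder in views.py has to point at the folder these settings produce. A minimal sketch of that path arithmetic, assuming MEDIA_ROOT = BASE_DIR/upload and upload_to='media' as configured above; `expected_upload_path` is a hypothetical helper, and it deliberately ignores Django's own file-name normalization and the suffix Django adds on name collisions.

```python
from pathlib import Path


def expected_upload_path(base_dir, filename):
    """Hypothetical helper: where FileField(upload_to='media') stores a file.

    Assumes MEDIA_ROOT = BASE_DIR / "upload" (settings.py above).
    Does not reproduce Django's name cleanup or collision suffixes.
    """
    media_root = Path(base_dir) / "upload"  # MEDIA_ROOT
    return media_root / "media" / filename  # MEDIA_ROOT / upload_to / file name
```

So with the project at /Users/me/project, `source` would be "/Users/me/project/upload/media/" (with the trailing slash, since the view concatenates strings).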
Create a new forms.py in the mozi folder so that files can be uploaded through a form.
forms.py
from django import forms
from .models import Upload


class UploadForm(forms.ModelForm):
    class Meta:
        model = Upload
        fields = ('document',)
Create the HTML template for the web page. Because the view renders 'mozi/index.html', save it as mozi/templates/mozi/index.html.
index.html
<!DOCTYPE html>
<html lang="ja-JP">
<head>
<meta charset="UTF-8">
<title>Transcription</title>
</head>
<body>
<h1>Google Speech To Text</h1>
<form method="post" enctype="multipart/form-data">
{% csrf_token %}
{{ form.as_p }}
<button type="submit">start</button>
</form>
<h2>Transcription result</h2>
<p>{{ transcribe_result }}</p>
</body>
</html>
Write views.py, which handles the page display and the whole transcription process.
views.py
import os
import subprocess

from django.shortcuts import render
from google.cloud import speech, storage
from pydub import AudioSegment

from .forms import UploadForm


def index(request):
    #Folder where uploaded files are saved (MEDIA_ROOT/media/, ending with "/")
    source = "Path where the file is uploaded"
    #GCS_URL
    GCS_BASE = "gs://Bucket name/"
    #Transcription result
    speech_result = ""
    if request.method == 'POST':
        #Google Storage environment preparation
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'json PATH'
        storage_client = storage.Client()
        bucket = storage_client.get_bucket('Google Storage bucket name')
        #Save the uploaded file (re-display the form with errors if it is invalid)
        form = UploadForm(request.POST, request.FILES)
        if not form.is_valid():
            return render(request, 'mozi/index.html', {
                'form': form,
                'transcribe_result': speech_result
            })
        upload = form.save()
        #Get the name the file was actually saved under
        #(Django adds a suffix if a file with the same name already exists)
        #Separate file name and extension (ext -> extension, e.g. ".wav")
        transcribe_file = os.path.basename(upload.document.name)
        name, ext = os.path.splitext(transcribe_file)
        if ext.lower() == ".wav":
            #Upload to Google Storage
            blob = bucket.blob(transcribe_file)
            blob.upload_from_filename(filename=source + transcribe_file)
            #Get play time (+1 second; used as the timeout below)
            sound = AudioSegment.from_file(source + transcribe_file)
            length = sound.duration_seconds
            length += 1
            #Delete working files
            os.remove(source + transcribe_file)
            #Transcription
            speech_client = speech.SpeechClient()
            gcs_uri = GCS_BASE + transcribe_file
            audio = speech.RecognitionAudio(uri=gcs_uri)
            config = speech.RecognitionConfig(
                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                #sample_rate_hertz=16000,
                language_code="ja-JP",
                enable_automatic_punctuation=True,
            )
            operation = speech_client.long_running_recognize(config=config, audio=audio)
            response = operation.result(timeout=round(length))
            for result in response.results:
                speech_result += result.alternatives[0].transcript
            #Google Storage file deletion
            blob.delete()
        else:
            #File conversion process (WAV, 16 kHz, mono)
            f_input = source + transcribe_file
            upload_file_name = name + ".wav"
            f_output = source + upload_file_name
            #Pass the command as a list so no shell is involved (safe with spaces in names)
            subprocess.call(['ffmpeg', '-i', f_input, '-ar', '16000', '-ac', '1', f_output])
            #Upload to Google Storage
            blob = bucket.blob(upload_file_name)
            blob.upload_from_filename(filename=f_output)
            #Get play time (+1 second; used as the timeout below)
            sound = AudioSegment.from_file(f_input)
            length = sound.duration_seconds
            length += 1
            #Delete working files
            os.remove(f_input)
            os.remove(f_output)
            #Transcription
            speech_client = speech.SpeechClient()
            gcs_uri = GCS_BASE + upload_file_name
            audio = speech.RecognitionAudio(uri=gcs_uri)
            config = speech.RecognitionConfig(
                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                #sample_rate_hertz=16000,
                language_code="ja-JP",
            )
            operation = speech_client.long_running_recognize(config=config, audio=audio)
            response = operation.result(timeout=round(length))
            for result in response.results:
                speech_result += result.alternatives[0].transcript
            #Google Storage file deletion
            blob.delete()
    else:
        form = UploadForm()
    return render(request, 'mozi/index.html', {
        'form': form,
        'transcribe_result': speech_result
    })
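Two file-handling steps of the view can be isolated and tried without any cloud access. The sketch below is a hypothetical illustration, not the author's code: `ffmpeg_args` builds the same 16 kHz mono conversion command as an argument list, and `wav_duration_seconds` swaps pydub for the standard-library `wave` module, which only works for plain PCM WAV files (so it could replace pydub in the .wav branch only). `recognition_timeout` mirrors the view's "duration + 1 second, rounded" rule; `write_silent_wav` exists only to produce a test file.

```python
import os
import tempfile
import wave


def ffmpeg_args(f_input, f_output):
    """Build the conversion command (16 kHz, mono) as a list for subprocess.call."""
    return ["ffmpeg", "-i", f_input, "-ar", "16000", "-ac", "1", f_output]


def wav_duration_seconds(path):
    """Duration of a plain PCM WAV file, using only the standard library."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def recognition_timeout(path):
    """Same rule as the view: duration + 1 second, rounded to whole seconds."""
    return round(wav_duration_seconds(path) + 1)


def write_silent_wav(path, seconds, rate=16000):
    """Helper for trying this out: write 16-bit mono silence."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    fd, tmp = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    write_silent_wav(tmp, 2)
    print(wav_duration_seconds(tmp), recognition_timeout(tmp))
    os.remove(tmp)
```

Passing a list rather than a single string with `shell=True` means a file name like "my voice.m4a" reaches ffmpeg as one argument.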
Finally, create and apply the database migrations.
python3 manage.py makemigrations mozi
python3 manage.py migrate
Now that everything is ready, start the web server.
python3 manage.py runserver server IP:8000
Because everything from the web server to the internal processing could be written in Python, it was easy to build. This post is a memo of what I tried.