[C] [Python] Text-to-speech with AquesTalk on Linux

On Windows, text-to-speech is easy with tools such as softalk, but on Linux those tools only run if you struggle with wine or something similar. The remaining option is to call AquesTalk, the engine that softalk itself uses, directly.

If all you need is speech synthesis, it is easier to call Google Translate's text-to-speech API through gTTS. I also tried to install open-jtalk, but it conflicted with packages in my environment; resolving that looked troublesome, so I have not tried it.

What we will do

Use the AquesTalk2 library from C and Python code to synthesize speech.

Execution environment

Arch Linux
Python 3.6.5
AquesTalk2 Linux 2.3.0

Install AquesTalk

Download

Download the evaluation version from the download page of Aquest Co., Ltd. This article uses AquesTalk2; it has not been tested with AquesTalk or AquesTalk10.

Note that in the evaluation version, every sound in the na row and ma row (na, ni, nu, ne, no, ma, mi, mu, me, mo) is pronounced "nu".
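To get a feel for what this restriction does to a phrase, here is a rough sketch that mimics it on kana text. It is not part of AquesTalk: `preview_eval_reading` is a hypothetical helper, the character table simply follows the description above, and the sample phrase is an assumed kana rendering of the greeting used later in this article.

```python
# Hypothetical helper, not an AquesTalk API: imitates the evaluation-version
# restriction by replacing every na-row / ma-row kana with "nu".
_NU_TABLE = str.maketrans(
    "なにぬねのまみむめも" "ナニヌネノマミムメモ",
    "ぬ" * 10 + "ヌ" * 10,
)


def preview_eval_reading(kana):
    """Return the kana string roughly as the evaluation version would read it."""
    return kana.translate(_NU_TABLE)


if __name__ == "__main__":
    # Assumed sample phrase; the final "ne" becomes "nu".
    print(preview_eval_reading("ゆっくりしていってね"))
```

Kana outside those two rows pass through unchanged, so the sketch only shows which characters in a phrase would be affected.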

Installation

Install the library by following the manual included in the downloaded archive. Adjust the library version and the choice of lib or lib64 to match your environment.

$ cd aqtk2-lnx-eva/lib64
$ sudo cp libAquesTalk2Eva.so.2.3 /usr/lib
$ sudo ln -sf /usr/lib/libAquesTalk2Eva.so.2.3 /usr/lib/libAquesTalk2Eva.so.2
$ sudo ln -sf /usr/lib/libAquesTalk2Eva.so.2 /usr/lib/libAquesTalk2Eva.so
$ sudo /sbin/ldconfig -n /usr/lib
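Before moving on, it can help to confirm that the dynamic loader actually finds the library. The following is a minimal sketch using only the standard `ctypes` module; `can_load` is a name made up for this example, and the soname is the one created by the symlinks above.

```python
import ctypes


def can_load(soname="libAquesTalk2Eva.so"):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the file is missing, unreadable, or built for another architecture.
        return False
    return True


if __name__ == "__main__":
    if can_load():
        print("libAquesTalk2Eva.so loaded")
    else:
        print("could not load libAquesTalk2Eva.so; check the copy and symlink steps")
```

If this reports a failure, the usual suspects are a missing symlink or a lib/lib64 mismatch with the Python interpreter.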

Run the AquesTalk sample

Compile and run `aqtk2-lnx-eva/samples/SampleTalk.c`, following the included manual.

Modify the sample to match the encoding

A separate speech synthesis function is provided for each encoding, so modify SampleTalk.c to match the encoding of your terminal or input. UTF-8 is used here. If you call a function that does not match the encoding, synthesis will not work properly.

//	unsigned char *wav = AquesTalk2_Synthe_Euc(str, 100, &size, NULL);
	unsigned char *wav = AquesTalk2_Synthe_Utf8(str, 100, &size, NULL);
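The reason the function has to match is that the same text turns into different bytes in each encoding. The sketch below illustrates this with Python's built-in `euc_jp` and `utf-8` codecs; the kana string is an illustrative sample, not something taken from SampleTalk.c.

```python
# The same kana text becomes different byte sequences in each encoding,
# so the bytes only make sense to the function that expects that encoding.
text = "ゆっくり"  # illustrative sample, not from SampleTalk.c

as_euc = text.encode("euc_jp")   # byte layout AquesTalk2_Synthe_Euc expects
as_utf8 = text.encode("utf-8")   # byte layout AquesTalk2_Synthe_Utf8 expects

# Hiragana take 2 bytes each in EUC-JP and 3 bytes each in UTF-8.
print(len(as_euc), as_euc.hex())
print(len(as_utf8), as_utf8.hex())
```

Passing the UTF-8 bytes to the EUC function (or the reverse) hands the library a byte sequence it cannot interpret as the intended kana.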

Compile

Compile from the aqtk2-lnx-eva directory, specifying the library and the header path.

$ g++ -o SampleTalk samples/SampleTalk.c -lAquesTalk2Eva -Ilib64

Run

The sample reads text from standard input and writes WAV data to standard output. With the evaluation version, the phrase is read out with its final syllable replaced by "nu".

$ echo "Take your time" | ./SampleTalk > sample.wav
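To check that sample.wav really contains audio without playing it, the header can be inspected with Python's standard `wave` module. This is a small sketch; `describe_wav` is a hypothetical helper name, and it assumes the output is an ordinary PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return w.getnchannels(), rate, frames / rate


if __name__ == "__main__":
    # Assumes the command above has already produced sample.wav.
    print(describe_wav("sample.wav"))
```

If synthesis failed and the file is empty or not a WAV file, `wave.open` raises an exception, which is itself a useful signal.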

Run from python

Load the library with ctypes and call it. You can change the voice by passing one of the phont files bundled with the evaluation version. If none is specified, the built-in default voice is used, which is fairly easy to understand.

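from ctypes import ARRAY, POINTER, byref, c_int, c_ubyte, cast, cdll


def synthe_utf8(text, speed=100, file_phont=None):
    # Load the optional phont (voice data); None selects the default voice.
    if file_phont is not None:
        with open(file_phont, 'rb') as f:
            phont = f.read()
    else:
        phont = None

    aqtk = cdll.LoadLibrary("libAquesTalk2Eva.so")
    # The function returns a pointer to WAV data whose length is not known until the call returns.
    aqtk.AquesTalk2_Synthe_Utf8.restype = POINTER(ARRAY(c_ubyte, 0))
    size = c_int(0)
    wav_p = aqtk.AquesTalk2_Synthe_Utf8(text.encode('utf-8'), speed, byref(size), phont)
    if not bool(wav_p):
        # A NULL pointer means failure; size then holds the error code.
        print("ERR:", size.value)
        return None
    # Re-cast to an array of the actual size and copy the bytes before freeing the buffer.
    wav_p = cast(wav_p, POINTER(ARRAY(c_ubyte, size.value)))
    wav = bytearray(wav_p.contents)
    aqtk.AquesTalk2_FreeWave(wav_p)
    return wav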


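if __name__ == '__main__':
    # Default voice. Only write the file if synthesis succeeded.
    wav = synthe_utf8("Take your time", speed=100)
    if wav is not None:
        with open('./default.wav', 'wb') as f:
            f.write(wav)
    # Voice from a phont file bundled with the evaluation version.
    wav = synthe_utf8("Take your time", speed=100, file_phont='aqtk2-lnx-eva/phont/aq_yukkuri.phont')
    if wav is not None:
        with open('./yukkuri.wav', 'wb') as f:
            f.write(wav)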
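As a variation on the code above, the C signatures can be declared explicitly so that ctypes checks the arguments, and the result can be copied with `ctypes.string_at`. This is only a sketch: `synthesize` is a hypothetical name, and the signatures are assumptions inferred from how SampleTalk.c calls `AquesTalk2_Synthe_Utf8(str, 100, &size, NULL)`, not copied from the header.

```python
import ctypes


def synthesize(lib, text, speed=100, phont=None):
    """Return WAV bytes for `text`, or raise RuntimeError with the error code."""
    synthe = lib.AquesTalk2_Synthe_Utf8
    # Assumed C signature: unsigned char *f(const char *, int, int *, void *)
    synthe.argtypes = [ctypes.c_char_p, ctypes.c_int,
                       ctypes.POINTER(ctypes.c_int), ctypes.c_void_p]
    synthe.restype = ctypes.POINTER(ctypes.c_ubyte)
    free_wave = lib.AquesTalk2_FreeWave
    # Assumed C signature: void f(unsigned char *)
    free_wave.argtypes = [ctypes.POINTER(ctypes.c_ubyte)]
    free_wave.restype = None

    size = ctypes.c_int(0)
    ptr = synthe(text.encode("utf-8"), speed, ctypes.byref(size), phont)
    if not ptr:
        # NULL result: size carries the error code, as in the ERR branch above.
        raise RuntimeError("AquesTalk2 error code %d" % size.value)
    try:
        # Copy exactly size.value bytes out of the library-owned buffer.
        return ctypes.string_at(ptr, size.value)
    finally:
        # Always release the buffer, even if the copy fails.
        free_wave(ptr)
```

It would be called as `synthesize(ctypes.CDLL("libAquesTalk2Eva.so"), text)`, with `phont` set to the bytes read from a phont file when a different voice is wanted. Raising an exception instead of returning None keeps callers from writing an empty file by accident.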

About "nu"

As noted above, the evaluation version reads the na row and ma row as "nu"; to have them read correctly you need to obtain a license. For personal use, a development license costs less than 2,000 yen (as of June 9, 2018). For details such as distribution rules, see the personal license page of Aquest Co., Ltd.