I wanted to use an open-source speech recognition library (ESPnet) on my Mac, but even after a lot of research I couldn't tell whether it would work. According to the official ESPnet documentation, the required environment is:
Supported Linux distributions and other requirements
We support the following Linux distributions with CI. If you want to build your own Linux by yourself, please also check our CI configurations to prepare the appropriate environments.
ubuntu18
ubuntu16
centos7
debian9
(The documentation suggests that only Linux is supported, but I was able to get it running on macOS.) Below is a summary of my own method and the problems I ran into.
Environment: **macOS Catalina 10.15.4** (the latest version as of April 2020), Anaconda3 (Python 3.7.3)
Install Homebrew (https://brew.sh/index_ja):

```
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
```
According to the documentation, gcc 4.9+ is required for PyTorch 1.0.0+. However, the ESPnet Makefile contains:

```makefile
warp-transducer.done: espnet.done
	rm -rf warp-transducer
	git clone https://github.com/HawkAaron/warp-transducer.git
	# Note: Requires gcc>=5.0 to build extensions with pytorch>=1.0
```

So, judging from the ESPnet Makefile, **gcc >= 5.0** is actually required.
```
$ brew install gcc
$ ln -s /usr/local/bin/gcc-9 /usr/local/bin/gcc
$ ln -s /usr/local/bin/g++-9 /usr/local/bin/g++
$ gcc --version
gcc-9 (Homebrew GCC 9.3.0_1) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
```
Install CMake by downloading the macOS version from https://cmake.org/download/.
```
$ conda install -c conda-forge sox
$ conda install -c conda-forge ffmpeg
$ brew install flac
$ brew install automake
```
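Download autoconf from http://ftp.gnu.org/gnu/autoconf/ and build it:

```
$ wget http://ftp.gnu.org/gnu/autoconf/autoconf-latest.tar.xz
$ tar xvf autoconf-latest.tar.xz    # tar detects xz compression by itself; the z flag means gzip
$ cd autoconf-*/                    # the archive extracts to a versioned directory, not autoconf-latest
$ ./configure --prefix=/usr         # without write permission to /usr, omit --prefix=/usr (default prefix is /usr/local)
$ make
$ make install
```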
```
$ git clone https://github.com/kaldi-asr/kaldi
$ cd kaldi/tools
$ ./extras/install_openblas.sh    # install OpenBLAS
$ ./extras/check_dependencies.sh
```

After installing all the dependencies that `check_dependencies.sh` reports as missing, build the tools:

```
$ make -j <Number of threads>    # this step is very slow, so use all your cores
```
Then compile Kaldi itself without CUDA (the `--mkl-root` path below assumes Intel MKL is installed there):

```
$ cd ../src
$ ./configure --mkl-root=/opt/intel/compilers_and_libraries_2020/mac/mkl --use-cuda=no
$ make -j clean depend; make -j <NUM-CPU>
```
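The `<Number of threads>` and `<NUM-CPU>` placeholders above are just the number of cores to give make. A small helper can fill them in; this is a sketch under my own assumptions: the function name `ncpu` is hypothetical (not part of Kaldi), it uses macOS's `sysctl -n hw.ncpu`, and it falls back to `getconf _NPROCESSORS_ONLN` and then to 1.

```shell
# Hypothetical helper: print the number of logical CPUs for make's -j value.
# macOS: `sysctl -n hw.ncpu`; otherwise fall back to `getconf _NPROCESSORS_ONLN`,
# and finally to 1 if neither returns a plain number.
ncpu() {
  _n=$(sysctl -n hw.ncpu 2>/dev/null) || _n=""
  case "$_n" in
    ''|*[!0-9]*) _n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || _n="" ;;
  esac
  case "$_n" in
    ''|*[!0-9]*) _n=1 ;;
  esac
  echo "$_n"
}

# Usage, in kaldi/tools or kaldi/src:
#   make -j "$(ncpu)"
```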
```
$ git clone https://github.com/espnet/espnet
$ cd espnet/tools
```
If the clang that comes with Xcode is installed, clang can end up being called even when you type `gcc`. Compiling with clang fails with `clang: error: unsupported option '-fopenmp'`, so set the compiler variables to gcc before running make.
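Before building, it can help to confirm that `gcc` and `g++` really are GNU compilers. A minimal sketch, assuming (as in the `gcc --version` output earlier) that the GNU banner contains "Free Software Foundation" while Apple clang's banner does not; the function name `is_gnu_cc` is hypothetical.

```shell
# Hypothetical check: succeed (exit 0) if the given compiler command looks like
# GNU gcc/g++, judged only by its --version banner. Apple clang's banner has no
# "Free Software Foundation" line; a missing command also counts as "not GNU".
is_gnu_cc() {
  "$1" --version 2>/dev/null | grep -q "Free Software Foundation"
}

# Usage: warn if `gcc` or `g++` still resolves to clang.
#   for cc in gcc g++; do
#     is_gnu_cc "$cc" || echo "warning: $cc does not look like GNU gcc (maybe Apple clang)" >&2
#   done
```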
```
$ export CC=gcc
$ export CXX=g++
$ make KALDI=<PATH_TO_KALDI> <OPTIONS>
```
OPTIONS:
- `CUPY_VERSION=''`: leave the CuPy version empty for a CPU-only install.
- `PYTHON=/opt/anaconda3/bin/python3.7`: specify the Python interpreter explicitly if the miniconda setup misbehaves.
- `TH_VERSION=1.2.0`: there is no warp-ctc package for macOS, so PyTorch 1.2.0 or later, which has CTC loss built in, is required.
The final `check_install` step then reports:

```
INFO: library availableness check start.
INFO: # libraries to be checked = 7
INFO: --> espnet is installed.
INFO: --> kaldiio is installed.
INFO: --> matplotlib is installed.
INFO: --> torch is installed.
INFO: --> chainer is installed.
INFO: --> chainer_ctc is installed.
WARNING: --> warprnnt_pytorch is not installed.
INFO: library availableness check done.
INFO: 6 / 7 libraries are correctly installed.
INFO: please try to setup again and then re-run this script.
make: *** [check_install] Error 1
```
warprnnt_pytorch does not seem to be needed on a CPU-only platform, so the installation is complete at this point.
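To re-check the Python side by hand without re-running make, one rough approach is to try importing each module with the interpreter you gave as `PYTHON`. This is my own sketch, not ESPnet's `check_install` script; the function name `check_py_modules` is hypothetical and the module names are taken from the log above.

```shell
# Hypothetical helper: try to import each named Python module and report it.
# Uses $PYTHON if set, otherwise python3. Returns non-zero if any import fails.
check_py_modules() {
  py="${PYTHON:-python3}"
  missing=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "OK: $mod"
    else
      echo "MISSING: $mod"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (warprnnt_pytorch is expected to be MISSING on a CPU-only Mac):
#   PYTHON=/opt/anaconda3/bin/python3.7 check_py_modules \
#     espnet kaldiio matplotlib torch chainer chainer_ctc warprnnt_pytorch
```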
To try it out, run recognition on a wav file with the pretrained CSJ Transformer model:

```
$ cd espnet/egs/csj/asr1
$ bash ../../../utils/recog_wav.sh --models csj.transformer.v1 <wav file>
```
Below is the Japanese ASR result from the csj.transformer.v1 model:
```
stage 0: Data preparation
stage 1: Feature Generation
steps/make_fbank_pitch.sh --cmd run.pl --nj 1 --write_utt2num_frames true decode/osa/data decode/osa/log decode/osa/fbank
steps/make_fbank_pitch.sh: moving decode/osa/data/feats.scp to decode/osa/data/.backup
utils/validate_data_dir.sh: Successfully validated data-directory decode/osa/data
steps/make_fbank_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_fbank_pitch.sh: Succeeded creating filterbank and pitch features for data
/Users/soshiyuu/signal/espnet/egs/csj/asr1/../../../utils/dump.sh --cmd run.pl --nj 1 --do_delta false decode/osa/data/feats.scp decode/download/csj.transformer.v1/data/train_nodup_sp/cmvn.ark decode/osa/log decode/osa/dump
stage 2: Json Data Preparation
/Users/soshiyuu/signal/espnet/egs/csj/asr1/../../../utils/data2json.sh --feat decode/osa/dump/feats.scp decode/osa/data decode/osa/dict
/Users/soshiyuu/signal/espnet/egs/csj/asr1/../../../utils/feat_to_shape.sh --cmd run.pl --nj 1 --filetype --preprocess-conf --verbose 0 decode/osa/dump/feats.scp decode/osa/data/tmp-EPng1/input_1/shape.scp
sym2int.pl: replacing X with 1
** Replaced 1 instances of OOVs with 1
stage 3: Decoding
```
Recognized text: Well, how about Monday morning or so? I wonder if it's free.