I used the AudioRecord class to do audio processing on Android. It is a somewhat niche topic, and although I referred to several Japanese pages, the specs still left a lot of ambiguities ...
It is hard to follow without basic knowledge of audio processing in the first place, and there are several APIs, such as `positionNotificationPeriod` and `notificationMarkerPosition`, whose differences are not obvious.
So I am leaving a memo (code with comments) of what the official documentation says and of the behavior I verified in local tests.
AudioRecordSample.kt
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.util.Log
import kotlin.math.max
/**
 * Sample code for the AudioRecord class
 */
class AudioRecordSample {
    // Sampling rate (Hz)
    // 44100 Hz is the only rate guaranteed to be supported on all devices
    private val samplingRate = 44100
    // Frame rate (fps)
    // How many times per second you want to process the audio data
    // Choose a value that suits your application
    private val frameRate = 10
    // Number of audio samples (Short values) in one frame
    private val oneFrameDataCount = samplingRate / frameRate
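    // e.g. 44100 / 10 = 4410 Short values per frame with the values above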
    // Size of one frame of audio data in bytes
    // A Byte is 8 bits and a Short is 16 bits, so double the sample count
    private val oneFrameSizeInByte = oneFrameDataCount * 2
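    // e.g. 4410 * 2 = 8820 bytes per frame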
    // Audio data buffer size (bytes)
    // Requirement 1: must be larger than oneFrameSizeInByte
    // Requirement 2: must be larger than the minimum buffer size required by the device
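    // With the values in this sample: max(8820 * 10, device minimum) = 88200 bytes, unless the device minimum is even larger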
    private val audioBufferSizeInByte =
            max(oneFrameSizeInByte * 10, // Gave it a generous buffer of 10 frames
                    android.media.AudioRecord.getMinBufferSize(samplingRate,
                            AudioFormat.CHANNEL_IN_MONO,
                            AudioFormat.ENCODING_PCM_16BIT))
    fun startRecording() {
        //Create an instance
        val audioRecord = AudioRecord(
                MediaRecorder.AudioSource.MIC, //Audio source
                samplingRate, //Sampling rate
                AudioFormat.CHANNEL_IN_MONO, // Channel setting. MONO and STEREO are guaranteed to be supported on all devices
                AudioFormat.ENCODING_PCM_16BIT, // PCM 16-bit is guaranteed to be supported on all devices
                audioBufferSizeInByte) // Buffer size
        // How many samples to process at a time (= the number of samples in one frame)
        audioRecord.positionNotificationPeriod = oneFrameDataCount
        // When the recording position reaches the number specified here, onMarkerReached below is called
        // It does not seem necessary for ordinary streaming processing
        audioRecord.notificationMarkerPosition = 40000 // Don't set this if you don't use it
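        // For reference: 40000 samples at 44100 Hz is roughly 0.9 seconds of audio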
        //Array to store audio data
        val audioDataArray = ShortArray(oneFrameDataCount)
        //Specify callback
        audioRecord.setRecordPositionUpdateListener(object : AudioRecord.OnRecordPositionUpdateListener {
            //Processing for each frame
            override fun onPeriodicNotification(recorder: AudioRecord) {
                recorder.read(audioDataArray, 0, oneFrameDataCount) // Read one frame of audio data
                Log.v("AudioRecord", "onPeriodicNotification size=${audioDataArray.size}")
                //Process as you like
            }
            // Marker processing
            // Called when the recording position reaches notificationMarkerPosition
            override fun onMarkerReached(recorder: AudioRecord) {
                recorder.read(audioDataArray, 0, oneFrameDataCount) // Read one frame of audio data
                Log.v("AudioRecord", "onMarkerReached size=${audioDataArray.size}")
                //Process as you like
            }
        })
        audioRecord.startRecording()
    }
}
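Two things the sample above does not show are a concrete "process as you like" step and how to stop recording. Below is a minimal sketch of both, assuming the AudioRecord instance is kept in a property so it can be stopped later. The class name RecordingLevelMeter, the stop() function, and the peak-level calculation are my own additions for illustration, not part of the original sample; also remember that recording requires the RECORD_AUDIO permission to be granted.
RecordingLevelMeter.kt
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.util.Log
import kotlin.math.abs
import kotlin.math.max
// Hypothetical variant of the sample above: keeps the AudioRecord in a property so it can be
// stopped and released, and logs the peak amplitude of each frame as example processing
class RecordingLevelMeter {
    private val samplingRate = 44100
    private val oneFrameDataCount = samplingRate / 10 // 10 fps, as in the sample above
    private var audioRecord: AudioRecord? = null
    fun start() {
        val bufferSizeInByte = max(oneFrameDataCount * 2 * 10,
                AudioRecord.getMinBufferSize(samplingRate,
                        AudioFormat.CHANNEL_IN_MONO,
                        AudioFormat.ENCODING_PCM_16BIT))
        val recorder = AudioRecord(
                MediaRecorder.AudioSource.MIC,
                samplingRate,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT,
                bufferSizeInByte)
        recorder.positionNotificationPeriod = oneFrameDataCount
        val frame = ShortArray(oneFrameDataCount)
        recorder.setRecordPositionUpdateListener(object : AudioRecord.OnRecordPositionUpdateListener {
            override fun onPeriodicNotification(r: AudioRecord) {
                r.read(frame, 0, oneFrameDataCount) // Read one frame of audio data
                // Example processing: peak amplitude of this frame (0..32768)
                val peak = frame.maxOf { abs(it.toInt()) }
                Log.v("AudioRecord", "peak=$peak")
            }
            override fun onMarkerReached(r: AudioRecord) {
                // Not used: no notificationMarkerPosition is set in this sketch
            }
        })
        recorder.startRecording()
        audioRecord = recorder
    }
    fun stop() {
        audioRecord?.let {
            if (it.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
                it.stop() // Stop capturing audio
            }
            it.release() // Release the native recording resources
        }
        audioRecord = null
    }
}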
This only covers the basics. I will write again if I gain more advanced insights, such as on performance.