This is a write-up of an app I made in my third year as an undergraduate.
You probably often listen to music through earphones while riding the train. How loud do you play it? Some people like it loud. I prefer a volume at which I can still hear a little of my surroundings. It is frustrating when the music gets drowned out, but it is just as annoying when it is loud all the time. At some point I noticed that I was constantly pressing the volume buttons on the train. That was tedious, so I decided to build an app that automatically adjusts the playback volume of music and other audio to be slightly louder than the ambient noise.
I built it as a smartphone (Android) app. It measures the ambient volume with the phone's microphone, determines the playback volume with the playback volume model (described later), and then changes the volume. It uses AudioManager and AudioRecord.
First, there is the question of how many milliseconds of audio to measure to get the ambient volume. If the window is too short, the ambient volume cannot be measured reliably. Next, there is the question of how often to run the cycle (measure ambient volume -> change playback volume). If it runs too often, the volume changes too quickly, which is annoying.
After some experimentation, I settled on the following. Every 100 milliseconds the app reads a buffer of samples from the microphone and takes the maximum value as the ambient volume for that interval. It does this 10 times and treats the average of those 10 maxima as the final ambient volume. As a result, the cycle (measure ambient volume -> change playback volume) runs once every 1000 milliseconds.
The process may be easier to follow in code. The class that runs this cycle is shown below.
package com.gmail.axis38akasira.autovolumer;

import android.os.Handler;
import android.support.annotation.NonNull;
import android.widget.TextView;

import com.gmail.axis38akasira.autovolumer.notifications.NotificationWrapper;

class VolumeManager implements Runnable {
    private AudioResources aRes;
    private Handler handler;
    private TextView textView_envVol, textView_playingVol;
    private NotificationWrapper notificationWrapper;
    private int micSenseCnt = 0, micSenseSum = 0;

    VolumeManager(@NonNull AudioResources aRes, @NonNull Handler handler,
                  @NonNull TextView textView_envVol, @NonNull TextView textView_playingVol,
                  @NonNull NotificationWrapper notificationWrapper) {
        this.aRes = aRes;
        this.handler = handler;
        this.textView_envVol = textView_envVol;
        this.textView_playingVol = textView_playingVol;
        this.notificationWrapper = notificationWrapper;
    }

    @Override
    public void run() {
        if (aRes.getMicAccessAllowed()) {
            final short[] buffer = aRes.readByAudioRecorder();

            // Take the maximum sample value in this buffer
            int max_val = Integer.MIN_VALUE;
            for (short x : buffer) {
                max_val = Math.max(max_val, x);
            }

            // Measure 10 times and use the average as the result for that time interval
            micSenseSum += max_val;
            if (micSenseCnt != 9) micSenseCnt++;
            else {
                final double inputLevel = micSenseSum / 10.0;
                micSenseSum = 0;
                micSenseCnt = 0;
                textView_envVol.setText(String.valueOf(inputLevel));

                final int outLevel = aRes.applyPlayingVolume(inputLevel, textView_playingVol);
                if (outLevel != -1) {
                    notificationWrapper.post(
                            MainActivity.class, "Automatic volume adjustment is enabled",
                            notificationWrapper.getActivity().getString(R.string.vol_playing)
                                    + String.valueOf(outLevel)
                    );
                }
            }
        }
        // Schedule the next measurement 100 ms from now
        handler.postDelayed(this, 100);
    }
}
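The aggregation step inside run() can also be looked at separately from the Android plumbing. The following is a minimal Python sketch of that step only, not the app's code: the class name `AmbientMeter` and the method `feed` are hypothetical, and the sample buffers are made up. Like the Java code, it takes the maximum signed sample of each buffer (not the absolute value) and averages 10 such maxima.

```python
class AmbientMeter:
    """Averages per-buffer peak values over a fixed number of reads (hypothetical helper)."""

    def __init__(self, window=10):
        self.window = window  # number of 100 ms reads per result (10 in the article)
        self._peaks = []

    def feed(self, buffer):
        """Add one buffer of samples.

        Returns the averaged ambient level once `window` buffers have been
        collected, otherwise None.
        """
        self._peaks.append(max(buffer))  # peak of this read
        if len(self._peaks) < self.window:
            return None
        level = sum(self._peaks) / self.window
        self._peaks.clear()  # start the next 1000 ms cycle
        return level


# Made-up buffers whose peaks are 100, 200, ..., 1000
meter = AmbientMeter()
for i in range(1, 11):
    level = meter.feed([i * 100, -5, 3])
print(level)
```

Keeping the state in a small object like this makes the 10-read cycle easy to test without a microphone.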
First, I considered writing a function that determines the playback volume from the ambient volume level.
If the number of playback volume steps were the same on every device (and guaranteed not to change with OS updates), this could be done easily with a few if statements.
Input: ambient volume
Returns: playback volume
func determine_playback_volume(ambient_volume):
    if ambient_volume < 750:
        return 0
    if ambient_volume < 3750:
        return 1
    if ambient_volume < 9750:
        return 2
    (continues for each playback volume step)
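As a runnable illustration of the threshold idea, here is a minimal Python sketch. It only uses the three example thresholds from the pseudocode; the names `THRESHOLDS` and `playback_volume_step` are hypothetical, and a real table would need one entry per volume step of the device.

```python
from bisect import bisect_right

# Upper bounds from the pseudocode: below 750 -> step 0, below 3750 -> step 1, below 9750 -> step 2.
# Only these three example values are known; the remaining steps are left out here.
THRESHOLDS = [750, 3750, 9750]


def playback_volume_step(ambient_volume):
    """Return the index of the first threshold that ambient_volume is below."""
    # bisect_right counts the thresholds that are <= ambient_volume,
    # which equals the step chosen by the chain of if statements.
    return bisect_right(THRESHOLDS, ambient_volume)


for level in (0, 749, 750, 5000, 9750):
    print(level, playback_volume_step(level))
```

The table lookup makes the weakness visible: the list has to be rewritten for every device with a different number of volume steps.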
However, that assumption is hard to rely on (and the approach was too simple to be interesting). So I decided to build a mathematical model that derives the playback volume from the ambient volume level as a continuous function with a real-valued output. Specifically, the model takes the ambient volume level and returns the playback volume as a fraction of the maximum volume.
First, using a real device (Samsung Galaxy Feel SC-04J), I measured the ideal playback volume (as a ratio to the maximum volume) for each ambient volume. (I only examined the boundaries where the ideal volume changes, because it is intuitively clear that the function should be monotonically non-decreasing.)
From the measurements, I built a dataset that pairs ambient volume with playback volume. Here is a scatter plot of it.
The input level from the microphone is a signed 16-bit integer, so the non-negative range is [0, 32767]. The horizontal axis shows the raw level multiplied by 10^-5 (that is, divided by 100000) so that it is easier to process later.
Based on this data, I want to fit a model and obtain a smooth function rather than a stepwise one.
I do not want the curve to follow the staircase shape of the data, so I avoid highly expressive models such as neural networks and use the simplest model possible. I prepared the following candidate functions.
\begin{align}
y &= a + bx \\
y &= a + bx + cx^2 \\
y &= a + b \sqrt{x} \\
y &= a + b \log{x}
\end{align}
Let's fit them in a Jupyter Notebook and look at the R^2 score and the fitted parameters.
import numpy as np
from sklearn.linear_model import LinearRegression

# x and y are assumed to be the measured data loaded elsewhere:
# x = ambient level / 100000, y = playback volume / maximum volume

# Fit a + bx
lr = LinearRegression().fit(np.array(x).reshape(-1, 1), y)
print(lr.score(np.array(x).reshape(-1, 1),y), lr.intercept_, lr.coef_)
# Fit a + bx + cx^2 (feature columns are [x^2, x], so coef_ is [c, b])
lr2 = LinearRegression().fit(np.dstack((np.power(np.array(x),2),np.array(x)))[0], y)
print(lr2.score(np.dstack((np.power(np.array(x),2),np.array(x)))[0],y), lr2.intercept_, lr2.coef_)
# Fit a + b sqrt(x)
lr3 = LinearRegression().fit(np.array(np.sqrt(x)).reshape(-1, 1), y)
print(lr3.score(np.array(np.sqrt(x)).reshape(-1, 1),y), lr3.intercept_, lr3.coef_)
# Fit a + b log(x) (the first sample is skipped, presumably because x[0] is 0 and log(0) is undefined)
log_r = LinearRegression().fit(np.array(np.log(x[1:])).reshape(-1, 1), y[1:])
print(log_r.score(np.array(np.log(x[1:])).reshape(-1, 1),y[1:]), log_r.intercept_, log_r.coef_)
R^2 score, intercept, coefficients
0.9566515430381373 0.05703034713007743 [0.85320093]
0.9858850873387448 0.035720497728703726 [-1.91782117 1.43981898]
0.9981469854250034 -0.013011305980174026 [0.56593706]
0.9695780780725732 0.39569447022473625 [0.09291432]
Let's visualize it as a graph.
import matplotlib.pyplot as plt

# Plot the measured data and the fitted curves
RNG = np.linspace(0.001, 0.32, 100)  # start just above 0 so that log(x) is defined
DIV_NUM = 15  # Maximum playback volume; varies from device to device
plt.figure(figsize=(18,9))
plt.xlabel("env_noise")
plt.ylabel("play_level")
plt.scatter(df["noise"]/100000, df["Playback volume"]/DIV_NUM, label="data")
plt.plot(RNG, lr.intercept_ + lr.coef_[0] * RNG, label="a+bx", color="green")
plt.plot(RNG, lr2.intercept_ + lr2.coef_[1] * RNG + lr2.coef_[0] * RNG * RNG, label="a+bx+cx^2", color="red")
plt.plot(RNG, lr3.intercept_ + lr3.coef_[0] * np.sqrt(RNG), label="a+ b sqrt(x)", color="purple")
plt.plot(RNG, log_r.intercept_ + log_r.coef_[0] * np.log(RNG), label="a+ b log(x)", color="cyan")
plt.legend(loc='upper left', prop={'size':20})
**I can't tell which one is best!**
When actually setting the playback volume, the app has to compute (maximum volume * volume ratio) and convert the result to an integer. So I plotted the resulting integer playback volume over the scatter plot and checked how it differs from the measured data.
# Compute the device's actual playback volume step from the noise level with each fitted function.
# The function value is scaled by the maximum volume and rounded to an integer.
RNG = np.linspace(0.001, 0.32, 500)
DIV_NUM = 15  # Maximum playback volume; varies from device to device
plt.figure(figsize=(18,12))
plt.scatter(df["noise"]/100000, df["Playback volume"], label="data")
plt.plot(RNG, np.round(DIV_NUM * (lr.intercept_ + lr.coef_[0] * RNG)), label="a+bx round", color="green")
plt.plot(RNG, np.round(DIV_NUM * (lr2.intercept_ + lr2.coef_[1] * RNG + lr2.coef_[0] * RNG * RNG)), label="a+bx+cx^2 round", color="red")
plt.plot(RNG, np.round(DIV_NUM * (lr3.intercept_ + lr3.coef_[0] * np.sqrt(RNG))), label="a+ b sqrt(x) round", color="purple")
plt.plot(RNG, np.round(DIV_NUM * (log_r.intercept_ + log_r.coef_[0] * np.log(RNG))), label="a+ b log(x) round", color="cyan")
plt.legend(loc='upper left', prop={'size':20})
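One way to make the comparison between the rounded curves and the measured data more quantitative is to count how often each model lands on exactly the measured volume step. The following is only a sketch under assumptions: `step_agreement` is a hypothetical helper, and both the data points and the model coefficients below are made up to exercise the function. They are not the measurements or fits from this article.

```python
import numpy as np


def step_agreement(predict_ratio, noise, measured_steps, max_step=15):
    """Return (fraction of exact matches, mean absolute error in volume steps)."""
    noise = np.asarray(noise, dtype=float)
    measured = np.asarray(measured_steps)
    # Same conversion as in the plot: scale by the maximum volume, then round
    predicted = np.clip(np.round(max_step * predict_ratio(noise)), 0, max_step)
    exact = float(np.mean(predicted == measured))
    mae = float(np.mean(np.abs(predicted - measured)))
    return exact, mae


# Made-up data and made-up coefficients, for illustration only
noise = np.array([0.01, 0.04, 0.09, 0.16])
steps = np.array([1, 2, 3, 4])


def sqrt_model(x):
    return 0.6 * np.sqrt(x)


def linear_model(x):
    return 0.05 + 0.85 * x


print("sqrt  :", step_agreement(sqrt_model, noise, steps))
print("linear:", step_agreement(linear_model, noise, steps))
```

With the real data, the same function could be called once per fitted model, for example with `lambda x: lr3.intercept_ + lr3.coef_[0] * np.sqrt(x)`.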
I chose $a + b \sqrt{x}$ because it appears to give the correct playback volume (ideally I would make a more quantitative judgment). From the fit results, the parameters are
\begin{align}
a &= -0.013011305980174026 \\
b &= 0.56593706
\end{align}
Finally, I added a function to the app that uses this model. (The variable names are slightly different, and the coefficient values in the app code differ slightly from those above, but the correspondence should be clear.)
package com.gmail.axis38akasira.autovolumer;

class RegressionModel {
    // beta[0] is the intercept a, beta[1] is the coefficient b of sqrt(x)
    private final static double[] beta = {-0.012529002932674477, 0.56377634};

    static Double infer(final double x) {
        return beta[0] + beta[1] * Math.sqrt(x);
    }
}
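To show how the model output could become an actual volume step, here is a minimal Python sketch of the whole conversion. It is an assumption about the flow, not the app's code: the app's applyPlayingVolume is not shown in this article, the name `volume_step` is hypothetical, and dividing the raw level by 100000 is assumed from the scaling used in the plots. The coefficients are copied from RegressionModel. On Android, the maximum step would presumably come from AudioManager.getStreamMaxVolume.

```python
import math

# Coefficients copied from RegressionModel.beta
BETA = (-0.012529002932674477, 0.56377634)
SCALE = 100000  # assumed: raw level is divided by 10^5, as on the plots' horizontal axis


def volume_step(raw_level, max_step):
    """Map a raw ambient level (0..32767) to an integer volume step in 0..max_step."""
    x = max(raw_level, 0.0) / SCALE  # guard against negative input before sqrt
    ratio = BETA[0] + BETA[1] * math.sqrt(x)  # playback volume as a fraction of the maximum
    step = round(max_step * ratio)
    return min(max(step, 0), max_step)  # clamp to the valid range


for raw in (0, 20000, 32767):
    print(raw, volume_step(raw, 15))
```

Because the model returns a ratio, only `max_step` changes between devices, which is the point of using a continuous function instead of a fixed threshold table.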
UI
I put together a simple activity.
I thought it would help to see at a glance whether automatic adjustment is enabled, so the status is shown in a notification.
The repository is here: https://github.com/ryhoh/SmartVolumer
Personally, I find it a convenient app. One point I am unsure about is whether this was a good way to choose the model. The app relies on the smartphone's microphone, so of course it works less well when the phone is in a bag. Ideally, the microphone would sit near the earphones. I use the app with wireless earphones, and I wonder what would happen with wired earphones that have a built-in microphone.