What do you do while a machine learning model is training? Do you stare at the log output in the console, thinking "Oh, the loss is lower than I expected!" or "Oh, the accuracy is getting worse, hang in there!!"? Watching it is surprisingly fun.
But no matter how hard you cheer, the model does not try any harder, and there is no dramatic turnaround, so watching is really a waste of time. While thinking about that, it hit me:
**"Why not train your own muscles in the meantime?"**
So I came up with an environment for doing muscle training while machine learning runs.
This is Hattori from NTT DoCoMo. This is the day-23 article of the NTT DoCoMo Service Innovation Department Advent Calendar 2019.
- People who like muscle training
- People in machine learning who don't get enough exercise
- People who often use LightGBM, such as Kagglers
To improve a model's accuracy, you need to repeat cycles of hypothesis and verification many times.
However, when the accuracy refuses to improve, motivation can drop.
**In such cases, muscle training is effective.**
When you do muscle training, secretion of the hormone testosterone increases. Testosterone is a hormone that drives motivation in the brain, so even when the accuracy stalls and you feel down, it gets you thinking "OK! One more try!", and that can lead to better accuracy in the end.
Also, muscle training seems to have become popular lately: an anime about high school girls working out was broadcast, and an app for working out on the Yamanote Line came out. Demand for muscle training appears to be high.
Of course, I cannot actually force anyone to do muscle training, so the focus here is on mechanisms that encourage it.
The model targeted by this implementation is LightGBM.
LightGBM is a gradient boosting library developed by Microsoft. It is often used on Kaggle [^1], and its high accuracy and fast training speed are attractive. XGBoost is also famous, but my impression is that LightGBM is the more common choice these days.
The following articles explain LightGBM well:

- LightGBM official documentation (English)
- "7 Reasons to Try LightGBM First"
- "A thorough introduction to LightGBM: how to use it, how it works, and how it differs from XGBoost"
This implementation mainly uses LightGBM's callback functions. You pass a callback function you define when you start training, and it is executed during training.
In most machine learning libraries (NN frameworks aside), training runs entirely inside the library, so it is usually hard for the user to customize what happens mid-training; callback functions make various customizations possible.
For example:

- Sending training progress to a logger (and on to Slack / LINE)
- Implementing your own early stopping
- Dynamically changing hyperparameters

There are many uses like these, and they are handy for heavy users.
A callback function is defined to take a single namedtuple argument (`env` in the sample code below). It holds variables describing the training state, so write your processing around it. Things you can access include:

- the hyperparameters
- the current iteration number
- the train/valid scores at the current iteration
- the model being trained (a Booster object)
Pass the callback function you defined as a list via the `callbacks` argument when you run training with `lgb.train()` or `lgb.fit()`.
Below is a simple code sample.
```python
import lightgbm as lgb
....

def callback_func(env):
    """
    A callback function you define.
    The training state can be read from the argument env (a namedtuple).
    """
    if (env.iteration + 1) % 100 == 0:
        print(f"{env.iteration + 1} iterations done")

lgb.train(
    lgb_param,
    .....
    callbacks=[callback_func]  # pass the defined function via callbacks, as a list
)
```
The links below explain the internals and other uses in more detail:

- LightGBM/callback.py at master · Microsoft/LightGBM · GitHub
- "How to use LightGBM callback functions, in excessive detail"
- "Outputting the training history via a logger using a LightGBM callback"
- At the start of training: announce the muscle training menu
  - Example: "Start: 30 sit-ups"
- At regular intervals: play a metronome beep
  - Example: a beep every 5 seconds
- At the end of training / when the maximum count is reached: closing announcement
  - Example: "Good work." "You did your best."
Broadly, muscle training menus come in two types:

- fixed time (e.g. a 30-second plank)
- fixed count (e.g. 30 sit-ups)

The timing of the metronome beeps changes depending on the type.
For the fixed-time type, a beep is played at regular intervals until the target time is reached. However, rather than specifying it in ordinary clock time, I decided to specify both the target and the beep interval in LightGBM iterations, because I wanted to push through on the same time axis as the model.
I feel that a **"300-iteration plank"** bonds you with the model better than a "30-second plank" does.
For the fixed-count type, pace matters, so matching reps directly to iterations is difficult; the pace is specified in seconds instead. During the first few iterations, the number of iterations corresponding to the specified number of seconds is measured, and after that the metronome beep is played at that interval.
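As a rough sketch of that calibration (all numbers below are made up for illustration, not values from the article): measure how long 10 iterations take, then convert the seconds-per-rep pace into an iteration interval.

```python
# Sketch of the pace calibration described above; numbers are illustrative.
seconds_1time = 3.0  # target pace: one rep (e.g. one sit-up) every 3 seconds
time_10iter = 0.6    # measured: 10 LightGBM iterations took 0.6 seconds

# iterations per rep = rep length / time per single iteration
n_iter_1time = int(seconds_1time / time_10iter * 10)
print(n_iter_1time)  # -> 50: beep once every 50 iterations
```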
Doing only the same exercise every time is no good, so the menu can either be specified explicitly or chosen at random. I recommend the random setting; not knowing which exercise comes next is part of the fun.
If you learn mid-workout that "oh, the accuracy improved!", it boosts your motivation for the workout itself. It also keeps you from losing focus on the workout because you are worrying about the accuracy of the ongoing run.
To implement this, the training log is written to a log file at every iteration, and the log from the previous run is read in and compared.
Having the model's accuracy affect the workout also gives it a game-like feel. This too is handled by comparing against the previous run's log.
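A minimal sketch of that comparison, assuming the one-line-per-iteration `iter,score` log format used here (the scores below are made up):

```python
# Compare the current score against the previous run's score at the same
# iteration, using the "iter,score" log format. All scores are illustrative.
def parse_log(text):
    """Parse 'iter,score' lines into a {iteration: score} dict."""
    return {int(i): float(s)
            for i, s in (line.split(",") for line in text.strip().splitlines())}

prev_log = parse_log("100,0.81\n200,0.84\n300,0.85\n")  # hypothetical prev.log

n_iter, curr_score, is_upper_metric = 200, 0.86, True  # e.g. AUC: higher is better
is_better = False
if n_iter in prev_log:
    prev_score = prev_log[n_iter]
    is_better = curr_score > prev_score if is_upper_metric else curr_score < prev_score
print(is_better)  # -> True: 0.86 beats the previous run's 0.84 at iteration 200
```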
Stopping training midway roughly means the accuracy was not improving, so in that case a penalty workout is imposed. This is handled by catching the KeyboardInterrupt exception.
I will not let you get away with just hitting Ctrl+C.
```shell
# Install VLC
brew cask install vlc
# Install python-vlc
pip install python-vlc
# Install LightGBM
pip install lightgbm
```
You also need to prepare the sound sources.
The voice lines were created with macOS's standard text-to-speech feature [^2], and the other sound effects are available from a free sound-effect site [^3].
You may find yourself more motivated if you prepare a voice you like.
This sets the muscle training menu, its type, the number of repetitions, the sound-source paths, and so on. It is not essential reading, so I will keep it folded.
```python
train_config = {
    "planc": {
        "train_menu": "planc",
        "train_type": "duration",
        "total_iteration": 500,
    },
    "abs": {
        "train_menu": "abs",
        "train_type": "iter",
        "total_times": 50,
        "seconds_1time": 3
    },
    "pushup": {
        "train_menu": "pushup",
        "train_type": "iter",
        "total_times": 50,
        "seconds_1time": 2
    },
    "squat": {
        "train_menu": "squat",
        "train_type": "iter",
        "total_times": 50,
        "seconds_1time": 2
    },
}
```
```python
def make_sound_dict(config):
    sound_dict = {
        "iteration_10": [
            'sound/iter_sound_1.mp3'
        ],
        "iteration_100": [
            'sound/iter_sound_2.mp3'
        ],
        "iteration_100_better": [
            'sound/iter_sound_3.mp3'
        ],
        "train_finish": [
            'sound/finish.mp3'
        ]
    }
    if config["train_type"] == "duration":
        sound_dict["train_start"] = [
            f"sound/{config['total_iteration']}iter.mp3",  # "N iterations"
            f"sound/{config['train_menu']}_train.mp3",     # exercise name
            "sound/start.mp3"                              # "start"
        ]
    elif config["train_type"] == "iter":
        sound_dict["train_start"] = [
            f"sound/{config['train_menu']}_train.mp3",  # exercise name (e.g. push-ups, sit-ups, ...)
            f"sound/{config['total_times']}times.mp3",  # "N times"
            "sound/start.mp3"                           # "start"
        ]
    return sound_dict
```
It is a little long, but this is the main part, so I will paste it as-is.
```python
import os
import random
import time

import pandas as pd
import vlc


class MuscleSound():
    """
    Callback for muscle training alongside LightGBM
    """
    def __init__(self, train_config, train_menu="planc"):
        if train_menu == "random":
            # For "random", pick an exercise from the menu at random
            train_menu = random.choice(list(train_config.keys()))
        assert train_menu in train_config
        self.train_menu = train_menu
        self.config = train_config[train_menu]
        self.sound_dict = make_sound_dict(self.config)
        self.log_dir = "./muscle"
        self.start_time = None
        self.n_iter_1time = None
        # setup
        os.makedirs(self.log_dir, exist_ok=True)
        self._setup_prev_log()
        self._load_prev_log()

    def media_play(self, media_list):
        """
        Play the audio files in media_list in order
        """
        p = vlc.MediaListPlayer()
        vlc_media_list = vlc.MediaList(media_list)
        p.set_media_list(vlc_media_list)
        p.play()

    def _setup_prev_log(self):
        """
        Rename the previous run's log from curr.log to prev.log
        """
        log_filepath = os.path.join(self.log_dir, "curr.log")
        if os.path.exists(log_filepath):
            os.rename(
                log_filepath,
                os.path.join(self.log_dir, "prev.log")
            )

    def _load_prev_log(self, log_filepath="muscle/prev.log"):
        """
        Load the log from the previous run
        """
        if os.path.exists(log_filepath):
            self.prev_log = pd.read_csv(
                log_filepath, names=["iter", "score"]
            ).set_index("iter")["score"]
        else:
            self.prev_log = None

    def _check_score(self, env):
        """
        Compare scores and append to the log
        """
        n_iter = env.iteration + 1
        is_better_score = False
        # Extract the validation score
        # (use the score of the last dataset in valid_sets)
        curr_score = env.evaluation_result_list[-1][2]
        # Whether a higher value means a better score
        is_upper_metric = env.evaluation_result_list[-1][3]
        # Compare if the previous log has a score for the same iteration
        if self.prev_log is not None and n_iter in self.prev_log.index:
            prev_score = self.prev_log.loc[n_iter]
            is_better_score = curr_score > prev_score \
                if is_upper_metric else curr_score < prev_score
        # Append to the log
        with open(os.path.join(self.log_dir, "curr.log"), "a") as f:
            f.write(f"{n_iter},{curr_score}\n")
        return is_better_score

    def play_train_start(self, train_menu):
        """
        Sound playback at the start of training
        """
        self.media_play(self.sound_dict["train_start"])
        # Sleep briefly so training (the workout) does not start
        # before the announcement finishes
        time.sleep(5)

    def duration_sound(self, env):
        """
        For fixed-time workouts:
        play a beep every fixed number of iterations
        """
        if (env.iteration + 1) > self.config["total_iteration"]:
            # Past the workout's maximum iteration count: do nothing
            return
        elif env.iteration + 1 == self.config["total_iteration"]:
            # Target iteration count reached: announce the end
            self.media_play(self.sound_dict["train_finish"])
        elif (env.iteration + 1) % 100 == 0:
            # An accented beep every 100 iterations
            is_better_score = self._check_score(env)
            if is_better_score:
                self.media_play(self.sound_dict["iteration_100_better"])
            else:
                self.media_play(self.sound_dict["iteration_100"])
        elif (env.iteration + 1) % 10 == 0:
            # A regular beep every 10 iterations
            self.media_play(self.sound_dict["iteration_10"])

    def iter_sound(self, env):
        """
        For fixed-count workouts:
        play a beep every fixed number of seconds
        (converted into an iteration interval)
        """
        if self.n_iter_1time is None:
            return
        if (env.iteration + 1) > self.config["total_times"] * self.n_iter_1time:
            # Past the workout's maximum rep count: do nothing
            return
        if (env.iteration + 1) == self.config["total_times"] * self.n_iter_1time:
            # Target rep count reached: announce the end
            self.media_play(self.sound_dict["train_finish"])
        elif (env.iteration + 1) % self.n_iter_1time != 0:
            # Not on a rep boundary: do nothing
            return
        elif ((env.iteration + 1) // self.n_iter_1time) % 10 == 0:
            # An accented beep every 10 reps
            self.media_play(self.sound_dict["iteration_100"])
        else:
            # A regular beep on every rep
            self.media_play(self.sound_dict["iteration_10"])

    def __call__(self, env):
        if env.iteration == 0:
            # At the start of training
            self.media_play(self.sound_dict["train_start"])
        if self.config["train_type"] == "iter":
            # For fixed-count workouts, measure how many iterations one rep takes
            if env.iteration == 1:
                self.start_time = time.time()
            elif env.iteration == 11:
                time_10iter = time.time() - self.start_time
                self.n_iter_1time = int(self.config["seconds_1time"] / time_10iter * 10)
                print("Iterations per rep:", self.n_iter_1time)
        if not env.evaluation_result_list:
            return
        # Play the metronome according to the workout type
        if self.config["train_type"] == "iter":
            self.iter_sound(env)
        elif self.config["train_type"] == "duration":
            self.duration_sound(env)
```
Next is the processing that imposes a penalty if you stop training midway.
This is not a callback; instead, the KeyboardInterrupt exception is caught and handled.
It is also provided as a decorator so that it is easy to attach when training.
```python
def penalty_muscle(func):
    def play_media_list(media_list):
        """
        Play the audio files in media_list in order
        """
        p = vlc.MediaListPlayer()
        vlc_media_list = vlc.MediaList(media_list)
        p.set_media_list(vlc_media_list)
        p.play()

    def wrapper_func(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except KeyboardInterrupt:
            interrupt_list = [
                'sound/keyboard_interrupt.mp3',
                'sound/1000iter.mp3',
                'sound/planc_train.mp3',
                'sound/add.mp3'
            ]
            print("You pressed Ctrl+C, so extra muscle training has been added!!!")
            play_media_list(interrupt_list)
            time.sleep(5)
            for i in range(100):
                if i % 10 == 0 and i > 0:
                    play_media_list(['sound/iter_sound_2.mp3'])
                else:
                    play_media_list(['sound/iter_sound_1.mp3'])
                time.sleep(1)
            raise  # re-raise the KeyboardInterrupt after the penalty
    return wrapper_func
```
To turn ordinary LightGBM training into this, all you need is to:

- add the `penalty_muscle` decorator to the training function
- create a `MuscleSound` instance and pass it via `callbacks`

The decorator merely catches KeyboardInterrupt, so any function will do as long as LightGBM training happens inside it. With this, you can easily make the muscle training version of any LightGBM run.
```python
from sklearn.model_selection import KFold


@penalty_muscle  # attach the decorator to the training function
def train_muscle_lgb(train_df, target_col, use_cols):
    folds = KFold(n_splits=2, shuffle=True, random_state=2019)
    for i, (trn_, val_) in enumerate(folds.split(train_df, train_df[target_col])):
        print(f"============fold{i}============")
        trn_data = lgb.Dataset(
            train_df.loc[trn_, use_cols],
            label=train_df.loc[trn_, target_col]
        )
        val_data = lgb.Dataset(
            train_df.loc[val_, use_cols],
            label=train_df.loc[val_, target_col]
        )
        lgb_param = {
            "objective": "binary",
            "metric": "auc",
            "learning_rate": 0.01,
            "verbosity": -1,
        }
        # Instantiate the MuscleSound class (the actual callback)
        callback_func = MuscleSound(train_config, train_menu="random")
        model = lgb.train(
            lgb_param,
            trn_data,
            num_boost_round=10000,
            valid_sets=[trn_data, val_data],
            verbose_eval=100,
            early_stopping_rounds=500,
            callbacks=[callback_func]  # pass the callback
        )
```
For various reasons, this is an animated GIF. In reality the voice and sounds shown in the subtitles are played, so please play them back in your head.
Having actually worked out with this myself, some issues remain.
No matter how the sounds play, after enough sessions you start wanting to skip the workout. Human nature is a strange thing.
- You get bored if you do it too often, so cap the number of workouts per day.
- Detect with an IoT device whether you are actually exercising, and post the results to Twitter etc.

Countermeasures like these seem necessary. The latter sounds like a high hurdle, but if it went that far, maybe everyone would use it?
When there was little training data, there was no time to work out. I want hundreds of thousands of rows. And LightGBM really is fast.
Countermeasures are hard; all you can do is lower the learning rate or engineer more features. (Hmm, tuning LightGBM for the sake of muscle training might be getting things backwards.)
I do not especially love muscle training myself, but for those who do, more muscle training may not feel like a penalty at all..? Some people might even deliberately keep the accuracy from improving, or press Ctrl+C on purpose..?
Muscle training is deep, whether you love it or do it as a penalty.
Let's spend the model's training time meaningfully, doing muscle training!!
[^1]: The world's most famous data-analysis competition platform (https://www.kaggle.com/)
[^2]: Synthetic-voice narration created with macOS's standard software
[^3]: Sound Effect Lab (a free sound-source site)