What do you do while a machine learning model is training? Do you stare at the log output in the console, thinking "Oh, the loss is lower than I expected!" or "Oh, the accuracy is getting worse, hang in there!!"? Watching it is surprisingly fun.
But no matter how hard you cheer, the model does not try any harder, and there is no dramatic turnaround, so watching is really a waste of time. While thinking about that, it hit me:
**"Why not train your own muscles in the meantime?"**
So I came up with an environment for doing muscle training while machine learning runs.
This is Hattori from NTT DoCoMo. This is the day-23 article of the NTT DoCoMo Service Innovation Department Advent Calendar 2019.
- People who like muscle training
- People in machine learning who don't get enough exercise
- People who often use LightGBM, such as Kagglers
To improve a model's accuracy, you need to repeat cycles of hypothesis and verification many times.
However, when the accuracy refuses to improve, motivation can drop.
**In such cases, muscle training is effective.**
When you do muscle training, secretion of the hormone testosterone increases. Testosterone is a hormone that drives motivation in the brain, so even when the accuracy stalls and you feel down, it gets you thinking "OK! One more try!", and that can lead to better accuracy in the end.
Also, muscle training seems to have become popular lately: an anime about high school girls working out was broadcast, and an app for working out on the Yamanote Line came out. Demand for muscle training appears to be high.
Of course, I cannot actually force anyone to do muscle training, so the focus here is on mechanisms that encourage it.
The model targeted by this implementation is LightGBM.
LightGBM is a gradient boosting library developed by Microsoft. It is often used on Kaggle [^1], and its high accuracy and fast training speed are attractive. XGBoost is also famous, but my impression is that LightGBM is the more common choice these days.
The following articles explain LightGBM well:

- LightGBM official documentation (English)
- "7 Reasons to Try LightGBM First"
- "A thorough introduction to LightGBM: how to use it, how it works, and how it differs from XGBoost"
This implementation mainly uses LightGBM's callback functions. You pass a callback function you define when you start training, and it is executed during training.
In most machine learning libraries (NN frameworks aside), training runs entirely inside the library, so it is usually hard for the user to customize what happens mid-training; callback functions make various customizations possible.
For example:

- Sending training progress to a logger (and on to Slack / LINE)
- Implementing your own early stopping
- Dynamically changing hyperparameters

There are many uses like these, and they are handy for heavy users.
A callback function is defined to take a single namedtuple argument (`env` in the sample code below). It holds variables describing the training state, so write your processing around it. Things you can access include:

- the hyperparameters
- the current iteration number
- the train/valid scores at the current iteration
- the model being trained (a Booster object)
Pass the callback function you defined as a list via the `callbacks` argument when you run training with `lgb.train()` or `lgb.fit()`.
Below is a simple code sample.
```python
import lightgbm as lgb
....

def callback_func(env):
    """
    A callback function you define.
    The training state can be read from the argument env (a namedtuple).
    """
    if (env.iteration + 1) % 100 == 0:
        print(f"{env.iteration + 1} iterations done")

lgb.train(
    lgb_param,
    .....
    callbacks=[callback_func]  # pass the defined function via callbacks, as a list
)
```
The links below explain the internals and other uses in more detail:

- LightGBM/callback.py at master · Microsoft/LightGBM · GitHub
- "How to use LightGBM callback functions, in excessive detail"
- "Outputting the training history via a logger using a LightGBM callback"
- At the start of training: announce the muscle training menu
  - Example: "Start: 30 sit-ups"
- At regular intervals: play a metronome beep
  - Example: a beep every 5 seconds
- At the end of training / when the maximum count is reached: closing announcement
  - Example: "Good work." "You did your best."
Broadly, muscle training menus come in two types:

- fixed time (e.g. a 30-second plank)
- fixed count (e.g. 30 sit-ups)

The timing of the metronome beeps changes depending on the type.
For the fixed-time type, a beep is played at regular intervals until the target time is reached. However, rather than specifying it in ordinary clock time, I decided to specify both the target and the beep interval in LightGBM iterations, because I wanted to push through on the same time axis as the model.
I feel that a **"300-iteration plank"** bonds you with the model better than a "30-second plank" does.
For the fixed-count type, pace matters, so matching reps directly to iterations is difficult; the pace is specified in seconds instead. During the first few iterations, the number of iterations corresponding to the specified number of seconds is measured, and after that the metronome beep is played at that interval.
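As a rough sketch of that calibration (all numbers below are made up for illustration, not values from the article): measure how long 10 iterations take, then convert the seconds-per-rep pace into an iteration interval.

```python
# Sketch of the pace calibration described above; numbers are illustrative.
seconds_1time = 3.0  # target pace: one rep (e.g. one sit-up) every 3 seconds
time_10iter = 0.6    # measured: 10 LightGBM iterations took 0.6 seconds

# iterations per rep = rep length / time per single iteration
n_iter_1time = int(seconds_1time / time_10iter * 10)
print(n_iter_1time)  # -> 50: beep once every 50 iterations
```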
Doing only the same exercise every time is no good, so the menu can either be specified explicitly or chosen at random. I recommend the random setting; not knowing which exercise comes next is part of the fun.
If you learn mid-workout that "oh, the accuracy improved!", it boosts your motivation for the workout itself. It also keeps you from losing focus on the workout because you are worrying about the accuracy of the ongoing run.
To implement this, the training log is written to a log file at every iteration, and the log from the previous run is read in and compared.
Having the model's accuracy affect the workout also gives it a game-like feel. This too is handled by comparing against the previous run's log.
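A minimal sketch of that comparison, assuming the one-line-per-iteration `iter,score` log format used here (the scores below are made up):

```python
# Compare the current score against the previous run's score at the same
# iteration, using the "iter,score" log format. All scores are illustrative.
def parse_log(text):
    """Parse 'iter,score' lines into a {iteration: score} dict."""
    return {int(i): float(s)
            for i, s in (line.split(",") for line in text.strip().splitlines())}

prev_log = parse_log("100,0.81\n200,0.84\n300,0.85\n")  # hypothetical prev.log

n_iter, curr_score, is_upper_metric = 200, 0.86, True  # e.g. AUC: higher is better
is_better = False
if n_iter in prev_log:
    prev_score = prev_log[n_iter]
    is_better = curr_score > prev_score if is_upper_metric else curr_score < prev_score
print(is_better)  # -> True: 0.86 beats the previous run's 0.84 at iteration 200
```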
Stopping training midway roughly means the accuracy was not improving, so in that case a penalty workout is imposed. This is handled by catching the KeyboardInterrupt exception.
I will not let you get away with just hitting Ctrl+C.
```shell
# Install VLC
brew cask install vlc
# Install python-vlc
pip install python-vlc
# Install LightGBM
pip install lightgbm
```
You also need to prepare the sound sources.
The voice lines were created with macOS's standard text-to-speech feature [^2], and the other sound effects are available from a free sound-effect site [^3].
You may find yourself more motivated if you prepare a voice you like.
This sets the muscle training menu, its type, the number of repetitions, the sound-source paths, and so on. It is not essential reading, so I will keep it folded.
```python
train_config = {
    "planc": {
        "train_menu": "planc",
        "train_type": "duration",
        "total_iteration": 500,
    },
    "abs": {
        "train_menu": "abs",
        "train_type": "iter",
        "total_times": 50,
        "seconds_1time": 3
    },
    "pushup": {
        "train_menu": "pushup",
        "train_type": "iter",
        "total_times": 50,
        "seconds_1time": 2
    },
    "squat": {
        "train_menu": "squat",
        "train_type": "iter",
        "total_times": 50,
        "seconds_1time": 2
    },
}
```
```python
def make_sound_dict(config):
    sound_dict = {
        "iteration_10": [
            'sound/iter_sound_1.mp3'
        ],
        "iteration_100": [
            'sound/iter_sound_2.mp3'
        ],
        "iteration_100_better": [
            'sound/iter_sound_3.mp3'
        ],
        "train_finish": [
            'sound/finish.mp3'
        ]
    }
    if config["train_type"] == "duration":
        sound_dict["train_start"] = [
            f"sound/{config['total_iteration']}iter.mp3",  # "N iterations"
            f"sound/{config['train_menu']}_train.mp3",     # exercise name
            "sound/start.mp3"                              # "start"
        ]
    elif config["train_type"] == "iter":
        sound_dict["train_start"] = [
            f"sound/{config['train_menu']}_train.mp3",  # exercise name (e.g. push-ups, sit-ups, ...)
            f"sound/{config['total_times']}times.mp3",  # "N times"
            "sound/start.mp3"                           # "start"
        ]
    return sound_dict
```
It is a little long, but this is the main part, so I will paste it as-is.
```python
import os
import random
import time

import pandas as pd
import vlc


class MuscleSound():
    """
    Callback for muscle training alongside LightGBM
    """
    def __init__(self, train_config, train_menu="planc"):
        if train_menu == "random":
            # For "random", pick an exercise from the menu at random
            train_menu = random.choice(list(train_config.keys()))
        assert train_menu in train_config
        self.train_menu = train_menu
        self.config = train_config[train_menu]
        self.sound_dict = make_sound_dict(self.config)
        self.log_dir = "./muscle"
        self.start_time = None
        self.n_iter_1time = None
        # setup
        os.makedirs(self.log_dir, exist_ok=True)
        self._setup_prev_log()
        self._load_prev_log()

    def media_play(self, media_list):
        """
        Play the audio files in media_list in order
        """
        p = vlc.MediaListPlayer()
        vlc_media_list = vlc.MediaList(media_list)
        p.set_media_list(vlc_media_list)
        p.play()

    def _setup_prev_log(self):
        """
        Rename the previous run's log from curr.log to prev.log
        """
        log_filepath = os.path.join(self.log_dir, "curr.log")
        if os.path.exists(log_filepath):
            os.rename(
                log_filepath,
                os.path.join(self.log_dir, "prev.log")
            )

    def _load_prev_log(self, log_filepath="muscle/prev.log"):
        """
        Load the log from the previous run
        """
        if os.path.exists(log_filepath):
            self.prev_log = pd.read_csv(
                log_filepath, names=["iter", "score"]
            ).set_index("iter")["score"]
        else:
            self.prev_log = None

    def _check_score(self, env):
        """
        Compare scores and append to the log
        """
        n_iter = env.iteration + 1
        is_better_score = False
        # Extract the validation score
        # (use the score of the last dataset in valid_sets)
        curr_score = env.evaluation_result_list[-1][2]
        # Whether a higher value means a better score
        is_upper_metric = env.evaluation_result_list[-1][3]
        # Compare if the previous log has a score for the same iteration
        if self.prev_log is not None and n_iter in self.prev_log.index:
            prev_score = self.prev_log.loc[n_iter]
            is_better_score = curr_score > prev_score \
                if is_upper_metric else curr_score < prev_score
        # Append to the log
        with open(os.path.join(self.log_dir, "curr.log"), "a") as f:
            f.write(f"{n_iter},{curr_score}\n")
        return is_better_score

    def play_train_start(self, train_menu):
        """
        Sound playback at the start of training
        """
        self.media_play(self.sound_dict["train_start"])
        # Sleep briefly so training (the workout) does not start
        # before the announcement finishes
        time.sleep(5)

    def duration_sound(self, env):
        """
        For fixed-time workouts:
        play a beep every fixed number of iterations
        """
        if (env.iteration + 1) > self.config["total_iteration"]:
            # Past the workout's maximum iteration count: do nothing
            return
        elif env.iteration + 1 == self.config["total_iteration"]:
            # Target iteration count reached: announce the end
            self.media_play(self.sound_dict["train_finish"])
        elif (env.iteration + 1) % 100 == 0:
            # An accented beep every 100 iterations
            is_better_score = self._check_score(env)
            if is_better_score:
                self.media_play(self.sound_dict["iteration_100_better"])
            else:
                self.media_play(self.sound_dict["iteration_100"])
        elif (env.iteration + 1) % 10 == 0:
            # A regular beep every 10 iterations
            self.media_play(self.sound_dict["iteration_10"])

    def iter_sound(self, env):
        """
        For fixed-count workouts:
        play a beep every fixed number of seconds
        (converted into an iteration interval)
        """
        if self.n_iter_1time is None:
            return
        if (env.iteration + 1) > self.config["total_times"] * self.n_iter_1time:
            # Past the workout's maximum rep count: do nothing
            return
        if (env.iteration + 1) == self.config["total_times"] * self.n_iter_1time:
            # Target rep count reached: announce the end
            self.media_play(self.sound_dict["train_finish"])
        elif (env.iteration + 1) % self.n_iter_1time != 0:
            # Not on a rep boundary: do nothing
            return
        elif ((env.iteration + 1) // self.n_iter_1time) % 10 == 0:
            # An accented beep every 10 reps
            self.media_play(self.sound_dict["iteration_100"])
        else:
            # A regular beep on every rep
            self.media_play(self.sound_dict["iteration_10"])

    def __call__(self, env):
        if env.iteration == 0:
            # At the start of training
            self.media_play(self.sound_dict["train_start"])
        if self.config["train_type"] == "iter":
            # For fixed-count workouts, measure how many iterations one rep takes
            if env.iteration == 1:
                self.start_time = time.time()
            elif env.iteration == 11:
                time_10iter = time.time() - self.start_time
                self.n_iter_1time = int(self.config["seconds_1time"] / time_10iter * 10)
                print("Iterations per rep:", self.n_iter_1time)
        if not env.evaluation_result_list:
            return
        # Play the metronome according to the workout type
        if self.config["train_type"] == "iter":
            self.iter_sound(env)
        elif self.config["train_type"] == "duration":
            self.duration_sound(env)
```
Next is the processing that imposes a penalty if you stop training midway.
This is not a callback; instead, the KeyboardInterrupt exception is caught and handled.
It is also provided as a decorator so that it is easy to attach when training.
```python
def penalty_muscle(func):
    def play_media_list(media_list):
        """
        Play the audio files in media_list in order
        """
        p = vlc.MediaListPlayer()
        vlc_media_list = vlc.MediaList(media_list)
        p.set_media_list(vlc_media_list)
        p.play()

    def wrapper_func(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except KeyboardInterrupt:
            interrupt_list = [
                'sound/keyboard_interrupt.mp3',
                'sound/1000iter.mp3',
                'sound/planc_train.mp3',
                'sound/add.mp3'
            ]
            print("You pressed Ctrl+C, so extra muscle training has been added!!!")
            play_media_list(interrupt_list)
            time.sleep(5)
            for i in range(100):
                if i % 10 == 0 and i > 0:
                    play_media_list(['sound/iter_sound_2.mp3'])
                else:
                    play_media_list(['sound/iter_sound_1.mp3'])
                time.sleep(1)
            raise  # re-raise the KeyboardInterrupt after the penalty
    return wrapper_func
```
To turn ordinary LightGBM training into this, all you need is to:

- add the `penalty_muscle` decorator to the training function
- create a `MuscleSound` instance and pass it via `callbacks`

The decorator merely catches KeyboardInterrupt, so any function will do as long as LightGBM training happens inside it. With this, you can easily make the muscle training version of any LightGBM run.
```python
from sklearn.model_selection import KFold


@penalty_muscle  # attach the decorator to the training function
def train_muscle_lgb(train_df, target_col, use_cols):
    folds = KFold(n_splits=2, shuffle=True, random_state=2019)
    for i, (trn_, val_) in enumerate(folds.split(train_df, train_df[target_col])):
        print(f"============fold{i}============")
        trn_data = lgb.Dataset(
            train_df.loc[trn_, use_cols],
            label=train_df.loc[trn_, target_col]
        )
        val_data = lgb.Dataset(
            train_df.loc[val_, use_cols],
            label=train_df.loc[val_, target_col]
        )
        lgb_param = {
            "objective": "binary",
            "metric": "auc",
            "learning_rate": 0.01,
            "verbosity": -1,
        }
        # Instantiate the MuscleSound class (the actual callback)
        callback_func = MuscleSound(train_config, train_menu="random")
        model = lgb.train(
            lgb_param,
            trn_data,
            num_boost_round=10000,
            valid_sets=[trn_data, val_data],
            verbose_eval=100,
            early_stopping_rounds=500,
            callbacks=[callback_func]  # pass the callback
        )
```
For various reasons, this is an animated GIF. In reality the voice and sounds shown in the subtitles are played, so please play them back in your head.
Having actually worked out with this myself, some issues remain.
No matter how the sounds play, after enough sessions you start wanting to skip the workout. Human nature is a strange thing.
- You get bored if you do it too often, so cap the number of workouts per day.
- Detect with an IoT device whether you are actually exercising, and post the results to Twitter etc.

Countermeasures like these seem necessary. The latter sounds like a high hurdle, but if it went that far, maybe everyone would use it?
When there was little training data, there was no time to work out. I want hundreds of thousands of rows. And LightGBM really is fast.
Countermeasures are hard; all you can do is lower the learning rate or engineer more features. (Hmm, tuning LightGBM for the sake of muscle training might be getting things backwards.)
I do not especially love muscle training myself, but for those who do, more muscle training may not feel like a penalty at all..? Some people might even deliberately keep the accuracy from improving, or press Ctrl+C on purpose..?
Muscle training is deep, whether you love it or do it as a penalty.
Let's spend the model's training time meaningfully, doing muscle training!!
[^1]: The world's most famous data-analysis competition platform (https://www.kaggle.com/)
[^2]: Synthetic-voice narration created with macOS's standard software
[^3]: Sound Effect Lab (a free sound-source site)