Try Q-learning in a Dragon Quest-style battle [Introduction to Reinforcement Learning]

What we'll do

Let's build a super-simple Dragon Quest-style turn-based battle and apply Q-learning to it.

(Battle screenshot)

The goal is to use Q-learning to make a hero who can save the world only a few percent of the time much smarter.

This article covers the implementation of the game and of Q-learning, but not the theory of Q-learning itself. If you want to understand the theory in detail, I recommend reading good articles such as the following.

Reinforcement learning that cannot be heard now (1): State value function and Bellman equation

Who this article is for

--Those who want to build and experiment with their own game rather than an existing simulation environment such as OpenAI Gym
--Those who roughly understand the theory of Q-learning but don't know how to implement it

Make a game

The rules are simply designed as follows:

--Hero vs. Demon King, 1 vs. 1
--The Demon King's only action is "attack"
--The hero can choose between two actions: "attack" and "recovery"
--Turn order is decided by multiplying each character's agility by a random factor and sorting

Implementation of character class

Now let's implement the game itself. First is the character class.

dq_battle.py


class Character(object):

    """Character class"""

    ACTIONS = {0: "attack", 1: "recovery"}

    def __init__(self, hp, max_hp, attack, defence, agillity, intelligence, name):
        self.hp = hp  #Current HP
        self.max_hp = max_hp  #Maximum HP
        self.attack = attack  #Offensive power
        self.defence = defence  #Defense power
        self.agillity = agillity  #Agility
        self.intelligence = intelligence  # Intelligence (determines the heal amount)
        self.name = name  #Character name

    #Returns a status string
    def get_status_s(self):
        return "[{}] HP:{}/{} ATK:{} DEF:{} AGI:{} INT:{}".format(
            self.name, self.hp, self.max_hp, self.attack, self.defence, self.agillity, self.intelligence)

    def action(self, target, action):

        #attack
        if action == 0:

            # Damage = attacker's ATK - target's DEF
            damage = self.attack - target.defence
            draw_damage = damage  #For logs

            #If the opponent's remaining HP is less than the amount of damage, the damage is just the remaining HP
            if target.hp < damage:
                damage = target.hp

            #Cause damage
            target.hp -= damage

            #Returns the battle log
            return "{}Is{}To{}Damaged".format(
                self.name, target.name, draw_damage)

        #recovery
        elif action == 1:

            #Use the amount of recovery as the INT value
            heal_points = self.intelligence
            draw_heal_points = heal_points  #For logs

            # Don't heal past max HP: cap the heal amount at (max HP - current HP)
            if self.hp + heal_points > self.max_hp:
                heal_points = self.max_hp - self.hp

            #recovery
            self.hp += heal_points

            #Returns the battle log
            return "{}HP{}Recovered".format(
                self.name, draw_heal_points)

Since this battle design is simple, the player and the enemy share a single class instead of being split into separate ones.

Each character (the hero and the Demon King) has the following stats:

--HP (hit points)
--ATTACK (attack power)
--DEFENCE (defense power)
--AGILITY (speed, determines turn order)
--INTELLIGENCE (intelligence, determines the heal amount)

Damage for the "attack" command is calculated with the simple formula

damage = (attacker's ATTACK) - (target's DEFENCE)

The amount restored by the "recovery" command equals the character's INTELLIGENCE.
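To make the numbers concrete, here is a minimal sketch using the stats that the Game class below assigns (hero: ATK 4 / DEF 1 / INT 7, Demon King: ATK 5 / DEF 2). It assumes the Character class above is available (for example, run inside dq_battle.py):

hero = Character(20, 20, 4, 1, 5, 7, "Brave")
maou = Character(50, 50, 5, 2, 6, 3, "Devil")

print(hero.action(maou, 0))  # hero attacks: 4 - 2 = 2 damage
print(maou.action(hero, 0))  # Demon King attacks: 5 - 1 = 4 damage

hero.hp = 10                 # suppose the hero has taken some damage
print(hero.action(maou, 1))  # recovery: heal by INT = 7, bringing HP to 17 (capped at max HP)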

Overall picture of battle design (state transition)

Next, we will implement the battle body. First, you need to understand the big picture (state transition) of the battle.

dq_battle.py


import random
from collections import deque
from enum import Enum, auto


class GameState(Enum):

    """Game state management class"""
    TURN_START = auto()      #Start turn
    COMMAND_SELECT = auto()  #Command selection
    TURN_NOW = auto()        #During the turn (each character action)
    TURN_END = auto()        #End of turn
    GAME_END = auto()        #Game over

As shown above, a battle has five states: **"turn start", "command selection", "turn in progress", "turn end", and "game end"**.

The state transition diagram is as shown below.

(Battle state transition diagram)

In short, the core of the battle design is to loop from the "turn start" state through to the "turn end" state over and over until the "game end" state is reached (that is, until the HP of either the hero or the Demon King drops to 0).

Implementation of the battle body

Now, let's implement the battle itself. Let's look at the entire code first.

dq_battle.py


class Game():

    """Game body"""

    HERO_MAX_HP = 20
    MAOU_MAX_HP = 50

    def __init__(self):

        #Generate a character
        self.hero = Character(
            Game.HERO_MAX_HP, Game.HERO_MAX_HP, 4, 1, 5, 7, "Brave")

        self.maou = Character(
            Game.MAOU_MAX_HP, Game.MAOU_MAX_HP, 5, 2, 6, 3, "Devil")

        #Add to character list
        self.characters = []
        self.characters.append(self.hero)
        self.characters.append(self.maou)

        #Define variables for state transitions
        self.game_state = GameState.TURN_START

        #Number of turns
        self.turn = 1

        #A string to save the battle log
        self.log = ""

    #Advance the game every turn
    def step(self, action):

        #Main loop
        while (True):
            if self.game_state == GameState.TURN_START:
                self.__turn_start()
            elif self.game_state == GameState.COMMAND_SELECT:
                self.__command_select(action)  #Pass the action
            elif self.game_state == GameState.TURN_NOW:
                self.__turn_now()
            elif self.game_state == GameState.TURN_END:
                self.__turn_end()
                break  #Exit the loop at the end of the turn
            elif self.game_state == GameState.GAME_END:
                self.__game_end()
                break

        #Whether the game is over
        done = False
        if self.game_state == GameState.GAME_END:
            done = True

        #Returns "state s, reward r, game end"
        return (self.hero.hp, self.maou.hp), self.reward, done

    #Initialize the game to the state of the first turn
    def reset(self):
        self.__init__()
        return (self.hero.hp, self.maou.hp)

    #Draw battle log
    def draw(self):
        print(self.log, end="")

    def __turn_start(self):

        #State transition
        self.game_state = GameState.COMMAND_SELECT

        #Initialize log
        self.log = ""

        #drawing
        s = " ***turn" + str(self.turn) + " ***"
        self.__save_log("\033[36m{}\033[0m".format(s))
        self.__save_log(self.hero.get_status_s())
        self.__save_log(self.maou.get_status_s())

    def __command_select(self, action):

        #Action selection
        self.action = action

        # Sort characters by agility multiplied by a random factor (0.5 to 1.5) and store them in a queue
        self.character_que = deque(sorted(self.characters,
                                          key=lambda c: c.agillity*random.uniform(0.5, 1.5)))

        #State transition
        self.game_state = GameState.TURN_NOW

        #Log save
        self.__save_log("Command selection-> " + Character.ACTIONS[self.action])

    def __turn_now(self):

        #Sequential action from the character queue
        if len(self.character_que) > 0:
            now_character = self.character_que.popleft()
            if now_character is self.hero:
                s = now_character.action(self.maou, self.action)
            elif now_character is self.maou:
                s = now_character.action(self.hero, action=0)  #Demon King always attacks

            #Save log
            self.__save_log(s)

        #Game end if HP is 0 or less
        for c in self.characters:
            if c.hp <= 0:
                self.game_state = GameState.GAME_END
                return

        #Turn end when everyone finishes action
        if len(self.character_que) == 0:
            self.game_state = GameState.TURN_END
            return

    def __turn_end(self):

        #Set reward
        self.reward = 0

        #Initialize character queue
        self.character_que = deque()

        #Turn progress
        self.turn += 1

        #State transition
        self.game_state = GameState.TURN_START

    def __game_end(self):

        if self.hero.hp <= 0:
            self.__save_log("\033[31m{}\033[0m".format("The hero is dead"))
            self.reward = -1  #Set reward
        elif self.maou.hp <= 0:
            self.__save_log("\033[32m{}\033[0m".format("Defeated the Demon King"))
            self.reward = 1  #Set reward

        self.__save_log("-----Game end-----")

    def __save_log(self, s):
        self.log += s + "\n"

The code is a bit long, but only two parts really matter for Q-learning.

The first is the step () method. This is the main part of the battle.

dq_battle.py


    #Advance the game every turn
    def step(self, action):

        #Main loop
        while (True):
            if self.game_state == GameState.TURN_START:
                self.__turn_start()
            elif self.game_state == GameState.COMMAND_SELECT:
                self.__command_select(action)  #Pass the action
            elif self.game_state == GameState.TURN_NOW:
                self.__turn_now()
            elif self.game_state == GameState.TURN_END:
                self.__turn_end()
                break  #Exit the loop at the end of the turn
            elif self.game_state == GameState.GAME_END:
                self.__game_end()
                break

        #Whether the game is over
        done = False
        if self.game_state == GameState.GAME_END:
            done = True

        #Returns "state s, reward r, game end"
        return (self.hero.hp, self.maou.hp), self.reward, done

Basically, the process flow is the same as the state transition diagram described above.

However, Q-learning needs to evaluate the current state **every turn**, so the main loop must be exited not only in the "game end" state but also in the "turn end" state.

In the "end of turn" state, the variables that must be evaluated for Q-learning are:

There are three.

Whether it is the end of the game is simply determined by whether the HP of the hero or the HP of the Demon King has become 0.

The state s needs a little thought. The characters have several stats, such as attack power and defense power, but only two of them need to be evaluated in Q-learning: the hero's HP and the Demon King's HP.

In this battle design, values such as attack power and defense power never change, so there is no need to include anything other than HP in the state. Conversely, if stats could change through buffs, debuffs, and so on, that information would have to be included in the state as well.
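As a minimal sketch (assuming dq_battle.py is importable), one call to step() advances the battle by a single turn and yields exactly these three values:

import dq_battle

game = dq_battle.Game()
(hero_hp, maou_hp), reward, done = game.step(0)  # 0 = "attack"
game.draw()  # print the battle log for this turn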

The reward r is evaluated in each of the "end of turn" and "end of game" states.

dq_battle.py



    def __turn_end(self):

        #Set reward
        self.reward = 0
    
    #(abridgement)

    def __game_end(self):

        if self.hero.hp <= 0:
            self.__save_log("\033[31m{}\033[0m".format("The hero is dead"))
            self.reward = -1  #Set reward
        elif self.maou.hp <= 0:
            self.__save_log("\033[32m{}\033[0m".format("Defeated the Demon King"))
            self.reward = 1  #Set reward

The reward for simply letting a turn pass is 0. If you want to emphasize the goal of defeating the Demon King as quickly as possible, you could make the per-turn reward slightly negative (though choosing an appropriate value is not easy).

At the end of the game, if the hero is defeated, the reward will be "-1", and if the demon king is defeated, the reward will be "+1".
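If you do want to push the agent toward faster wins, a small negative per-turn reward is one option, as mentioned above. A minimal sketch of how __turn_end() could be changed; the value -0.01 is an arbitrary assumption and would need tuning:

    def __turn_end(self):
        self.reward = -0.01           # small per-turn penalty instead of 0 (arbitrary value)
        self.character_que = deque()  # reset the character queue
        self.turn += 1                # advance the turn counter
        self.game_state = GameState.TURN_START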

The second important part is the reset () method.

dq_battle.py


    #Initialize the game to the state of the first turn
    def reset(self):
        self.__init__()
        return (self.hero.hp, self.maou.hp)

It simply re-initializes the game. It also has to return the initial state, which is needed to start each Q-learning episode.

Together with the step() method above, learning proceeds by repeating the cycle:

**initialize the game (reset) → advance turn by turn until the battle ends (step) → initialize the game (reset) → advance until the battle ends (step) → ...**
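A minimal sketch of that outer loop (this is exactly what the learn() method shown later does, stripped down to the reset/step cycle):

game = dq_battle.Game()

for episode in range(1000):           # repeat battles
    state = game.reset()              # back to turn 1
    done = False
    while not done:                   # advance one turn at a time until the battle ends
        action = 0                    # placeholder: a real agent chooses attack (0) or recovery (1)
        state, reward, done = game.step(action)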

The above is the basic part of the game for Q-learning.

Implement Q-learning

About agent class

Q-learning is implemented inside an agent class. The agent plays the role of the player who actually plays the game.

Because the agent is the player, it can choose actions (attack or recovery) and observe the state (the hero's and the Demon King's HP), but it cannot see the game's internal information (such as the random numbers that decide turn order).

Learning proceeds only from the actions taken and the states and rewards obtained as a result. This is the basic premise of reinforcement learning in general, Q-learning included.

First, I will post the entire agent class.

q-learning.py


from collections import defaultdict

import numpy as np

import dq_battle

DIV_N = 10

class Agent:
    """Agent class"""

    def __init__(self, epsilon=0.2):
        self.epsilon = epsilon
        self.Q = []

    # Policy defined by the ε-greedy method
    def policy(self, s, actions):

        if np.random.random() < self.epsilon:

            #Random behavior with epsilon probability
            return np.random.randint(len(actions))

        else:

            #(If Q contains the state s and the Q value in that state is not 0)
            if s in self.Q and sum(self.Q[s]) != 0:

                #Act so that the Q value is maximized
                return np.argmax(self.Q[s])
            else:
                return np.random.randint(len(actions))

    #Convert state to number
    def digitize_state(self, s):

        hero_hp, maou_hp = s

        # Discretize the hero's HP and the Demon King's HP into DIV_N bins each
        s_digitize = [np.digitize(hero_hp, np.linspace(0, dq_battle.Game.HERO_MAX_HP, DIV_N + 1)[1:-1]),
                      np.digitize(maou_hp, np.linspace(0, dq_battle.Game.MAOU_MAX_HP, DIV_N + 1)[1:-1])]

        # Return a single state index in the range 0 .. DIV_N*DIV_N - 1
        return s_digitize[0] + s_digitize[1]*DIV_N

    # Q-learning
    def learn(self, env, actions, episode_count=1000, gamma=0.9, learning_rate=0.1):

        self.Q = defaultdict(lambda: [0] * len(actions))

        # Battle for episode_count episodes
        for e in range(episode_count):

            #Reset the game environment
            tmp_s = env.reset()

            #Convert current state to number
            s = self.digitize_state(tmp_s)

            done = False

            #Repeat the action until the end of the game
            while not done:

                # Choose an action according to the ε-greedy policy
                a = self.policy(s, actions)

                #Advance the game for one turn and return the "state, reward, game end" at that time
                tmp_s, reward, done = env.step(a)

                #Convert state to number
                n_state = self.digitize_state(tmp_s)

                # Gain from action a = immediate reward + discount rate * max Q value of the next state
                gain = reward + gamma * max(self.Q[n_state])

                #Q value currently being estimated (before learning)
                estimated = self.Q[s][a]

                #Update the Q value based on the current estimated value and the actual value when performing action a
                self.Q[s][a] += learning_rate * (gain - estimated)

                #Change the current state to the next state
                s = n_state

Convert state to number

What's a little confusing in the agent class is the method that converts the state to a number.

q-learning.py


    #Convert state to number
    def digitize_state(self, s):

        hero_hp, maou_hp = s

        # Discretize the hero's HP and the Demon King's HP into DIV_N bins each
        s_digitize = [np.digitize(hero_hp, np.linspace(0, dq_battle.Game.HERO_MAX_HP, DIV_N + 1)[1:-1]),
                      np.digitize(maou_hp, np.linspace(0, dq_battle.Game.MAOU_MAX_HP, DIV_N + 1)[1:-1])]

        # Return a single state index in the range 0 .. DIV_N*DIV_N - 1
        return s_digitize[0] + s_digitize[1]*DIV_N

As mentioned briefly above, there are two state variables to evaluate in Q-learning: the hero's HP and the Demon King's HP. In this implementation, though, each state is indexed by a single number, so the pair has to be mapped to one value. A naive mapping would look like this:

--State 1: (hero HP, Demon King HP) = (0, 0)
--State 2: (hero HP, Demon King HP) = (0, 1)
--State 3: (hero HP, Demon King HP) = (0, 2)

You could enumerate the states like this, but then their number grows as (hero's max HP) x (Demon King's max HP). In a certain well-known RPG that is not Dragon Quest, HP runs to four digits, so the number of states would exceed a million, which is hopeless (laughs). Instead, let's bucket the state by the ratio of current HP to maximum HP.

np.digitize(hero_hp, np.linspace(0, dq_battle.Game.HERO_MAX_HP, DIV_N + 1)[1:-1])

A quick explanation of this line: np.linspace() splits the range from 0 to the maximum HP into DIV_N equal bins, and np.digitize() returns the index of the bin the current HP falls into.

Since DIV_N = 10 here,

--HP below 10% → 0
--HP from 10% to just under 20% → 1
--HP from 20% to just under 30% → 2

and so on. Furthermore, by computing

"hero state (0-9) + Demon King state (0-9) * 10"

the total number of states is kept to 100, from 0 to 99.

For example, if the state is "15", the tens digit (1) tells you at a glance that the Demon King's HP is in the 10-20% range, and the ones digit (5) that the hero's HP is in the 50-60% range.
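A minimal sketch to check the conversion with concrete numbers, using HERO_MAX_HP = 20, MAOU_MAX_HP = 50 and DIV_N = 10 as defined above:

import numpy as np

DIV_N = 10
edges_hero = np.linspace(0, 20, DIV_N + 1)[1:-1]  # [ 2.  4.  6. ... 18.]
edges_maou = np.linspace(0, 50, DIV_N + 1)[1:-1]  # [ 5. 10. 15. ... 45.]

hero_digit = np.digitize(13, edges_hero)  # 13/20 = 65% -> bin 6
maou_digit = np.digitize(27, edges_maou)  # 27/50 = 54% -> bin 5
print(hero_digit + maou_digit * DIV_N)    # state 56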

Defining the policy

The policy is ε-greedy.

q-learning.py


    # Policy defined by the ε-greedy method
    def policy(self, s, actions):

        if np.random.random() < self.epsilon:

            #Random behavior with epsilon probability
            return np.random.randint(len(actions))

        else:

            #(If Q contains the state s and the Q value in that state is not 0)
            if s in self.Q and sum(self.Q[s]) != 0:

                #Act so that the Q value is maximized
                return np.argmax(self.Q[s])
            else:
                return np.random.randint(len(actions))

To explain briefly for beginners: the policy basically picks the action with the highest action value, but with probability ε it picks a random action instead.

Giving the behavior some randomness makes the agent explore a variety of actions, so it can learn properly without depending on the initial Q values.
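For reference, this article uses the same policy with three different ε values, set directly on the agent (as in the main block further below):

agent = Agent()

agent.epsilon = 1.0   # fully random play (baseline measurement)
agent.epsilon = 0.2   # exploration during Q-learning
agent.epsilon = 0.0   # pure exploitation for the test battles after learning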

Implementation of Q-learning

At this point we have all the variables and methods needed for Q-learning.

The Q-learning algorithm is as follows.

  1. Initialize $Q(s, a)$.
  2. Repeat for any number of battles (episodes):
  3.   Initialize the game environment.
  4.   Advance turn by turn until the end of the game:
  5.     Select action $a$ according to policy $\pi$.
  6.     Perform action $a$ and observe reward $r$ and next state $s'$.
  7.     Update $Q(s, a)$ as follows:
         $Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$
  8.     $s \leftarrow s'$

As stated at the beginning of the article, I won't explain the theory of Q-learning here; let's simply implement the algorithm above as written.

q-learning.py


    # Q-learning
    def learn(self, env, actions, episode_count=1000, gamma=0.9, learning_rate=0.1):

        self.Q = defaultdict(lambda: [0] * len(actions))

        # Battle for episode_count episodes
        for e in range(episode_count):

            #Reset the game environment
            tmp_s = env.reset()

            #Convert current state to number
            s = self.digitize_state(tmp_s)

            done = False

            #Repeat the action until the end of the game
            while not done:

                # Choose an action according to the ε-greedy policy
                a = self.policy(s, actions)

                #Advance the game for one turn and return the "state, reward, game end" at that time
                tmp_s, reward, done = env.step(a)

                #Convert state to number
                n_state = self.digitize_state(tmp_s)

                # Gain from action a = immediate reward + discount rate * max Q value of the next state
                gain = reward + gamma * max(self.Q[n_state])

                #Q value currently being estimated (before learning)
                estimated = self.Q[s][a]

                #Update the Q value based on the current estimated value and the actual value when performing action a
                self.Q[s][a] += learning_rate * (gain - estimated)

                #Change the current state to the next state
                s = n_state
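To see one update concretely, suppose the hero's attack just finished off the Demon King: the reward is 1, and early in training the next state's Q values are still 0, as is the current estimate Q[s][a]. A minimal sketch of the arithmetic with the defaults gamma = 0.9 and learning_rate = 0.1:

gamma, learning_rate = 0.9, 0.1

reward = 1          # the Demon King was defeated this turn
max_next_q = 0.0    # Q values of the next state are still at their initial value
estimated = 0.0     # current estimate Q[s][a] before the update

gain = reward + gamma * max_next_q              # 1.0
new_q = estimated + learning_rate * (gain - estimated)
print(new_q)                                    # 0.1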

This completes the implementation of the game and Q-learning.

Run & learn games

Try to act randomly

Before doing any Q-learning, let's see what happens when the hero acts completely at random.

Add the following code.

q-learning.py



class Agent:

    #(abridgement)

    #Test battle
    def test_run(self, env, actions, draw=True, episode_count=1000):

        turn_num = 0  # Total turns across won battles (used for the average)
        win_num = 0  #Number of wins

        # Battle for episode_count episodes
        for e in range(episode_count):

            tmp_s = env.reset()
            s = self.digitize_state(tmp_s)

            done = False

            while not done:
                a = self.policy(s, actions)
                n_state, _, done = env.step(a)
                s = self.digitize_state(n_state)
                if draw:
                    env.draw()  #Draw battle log

            if env.maou.hp <= 0:
                win_num += 1
                turn_num += env.turn

        # Print the average win rate and the average number of turns needed to win
        if not win_num == 0:
            print("Average win rate: {:.2f}%".format(win_num*100/episode_count))
            print("Average turns to win: {:.2f}".format(turn_num / win_num))
        else:
            print("Average win rate: 0%")


if __name__ == "__main__":

    game = dq_battle.Game()
    agent = Agent()

    actions = dq_battle.Character.ACTIONS

    """Completely random battle"""
    agent.epsilon = 1.0
    agent.test_run(game, actions, episode_count=1000)

Setting ε = 1.0 makes every action completely random. The average win rate and the average number of turns to win are computed over 1000 battles.

Below are the execution results.

$ python q-learning.py 
Average win rate: 0.90%
Average turns to win: 64.89

The win rate is very low ...

As the turn count shows, battles tend to drag on. The longer a battle lasts, the more chances the hero has to die, so it is no surprise that winning becomes difficult.

Battle after Q-learning

Add the following code.

q-learning.py



if __name__ == "__main__":

    #(abridgement)

    """Q learn"""
    agent.epsilon = 0.2
    agent.learn(game, actions, episode_count=1000)

    """Test battle"""
    agent.epsilon = 0
    agent.test_run(game, actions, episode_count=1000)

Let's set ε = 0.2 and run Q-learning.

After that, 1000 test battles are run. With ε = 0 (0% random actions), the agent acts purely according to the learned action values.

The results below show how performance changes with the number of training battles.

**Execution result (training battles: 50, test battles: 1000)**

$ python q-learning.py 
Average win rate: 42.60%
Average turns to win: 56.19

**Execution result (training battles: 500, test battles: 1000)**

$ python q-learning.py 
Average win rate: 100.00%
Average turns to win: 55.00

**Execution result (training battles: 5000, test battles: 1000)**

$ python q-learning.py 
Average win rate: 100.00%
Average turns to win: 54.00

The winning percentage is 100%!

What is the Q value after learning?

Let's dig a little deeper and look at the Q values that were learned.

Below are the Q values for some of the states after training with 1000 battles (each row is [Q(attack), Q(recovery)]).

(Figure: Q values versus the hero's remaining HP)

State 50:[-0.19, -0.1]
State 51:[-0.6623164987957537, -0.34788781183605283]
State 52:[-0.2711479211007827, 0.04936802595531123]
State 53:[-0.36097806076138395, 0.11066249745943924]
State 54:[-0.04065992616558749, 0.12416469852733954]
State 55:[0.17619052640036173, 0.09475948937059306]
State 56:[0.10659739434775867, 0.05112985778828942]
State 57:[0.1583472103200607, 0.016092008419030468]
State 58:[0.04964633744625512, 0.0020759614034820224]
State 59:[0.008345513895442138, 0.0]

To read the state number, the tens digit is the Demon King's remaining HP decile and the ones digit is the hero's. So the listing above shows how the action values change with the hero's remaining HP while the Demon King's HP stays at around 50%.

You can see that when the hero's remaining HP (the ones digit) is low, the "recovery" command gets the higher value, and when it is high, the "attack" command does.
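If you want to inspect the learned table yourself, here is a minimal sketch (assuming it runs after agent.learn(...) in q-learning.py) that prints each visited state with its Q values and the greedy action:

import numpy as np

for s in sorted(agent.Q):
    q_attack, q_recovery = agent.Q[s]
    best = dq_battle.Character.ACTIONS[int(np.argmax(agent.Q[s]))]
    print("State {:02d}: [{:.3f}, {:.3f}] -> {}".format(s, q_attack, q_recovery, best))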

Let's also look at the Q values when the hero's remaining HP is fixed.

(Figure: Q values versus the Demon King's remaining HP)

State 07:[2.023809062133135, 0.009000000000000001]
State 17:[1.8092946131557912, 0.8310497919226313]
State 27:[0.8223927076749513, 0.5279685031058523]
State 37:[0.5565475393122992, 0.29257906153106145]
State 47:[0.25272081107828437, 0.26657637207739293]
State 57:[0.14094053800308323, 0.1533527340827757]
State 67:[0.0709128688771915, 0.07570873469406877]
State 77:[0.039059851207044236, 0.04408123679644829]
State 87:[0.023028972190011696, 0.02386492692407677]
State 97:[0.016992303227705185, 0.0075795064515745995]

This listing shows how the action values change with the Demon King's remaining HP while the hero's HP stays at around 70%. The lower the Demon King's HP, the more strongly the "attack" command is favored.

Finally

Since this article focuses on implementation, I'll leave further analysis aside. If you have the time, it would be interesting to experiment with different hyperparameters or with more complicated battle rules.

Also, I'm still a beginner at reinforcement learning, so please feel free to point out any mistakes; I'd be glad to have my understanding strengthened.

The source is on github. https://github.com/nanoseeing/DQ_Q-learning

