[Competition Pro] Let Python play a stone-picking game with no winning method

## What is a stone-picking game? A stone-picking game is a game in which some stones are placed on the board and the player decides whether to win or lose after picking up the stones. There are various types of stone-picking games, but in this article,

--Two players ――The two take stones one by one alternately -Each stone has a score written on it --Compete for the total score after removing the stones

I will think about something like that. The difficulty with the stone-picking game is that both players have their own best strategy. If you just take the stones by yourself, you can always make the board you want (the best for you), but if you play the game with two people, the opponent will also play the best, so on the contrary it is the worst board for you. It will be. Usually, these games have the best strategy and winning method to be taken on each board. However, on a very complicated board, there are times when you do not know the winning method, and there are times when there is no winning method in the first place. In such a case, the purpose of this article is to derive the victory or defeat with the power of the computer. Here, we will consider how to calculate the win / loss and score in some game patterns.

Game 1: Game taken from both ends (1)

The stones are lined up in a row, and each stone has a score written on it. The two players take turns taking stones, but only ** of the stones at both ends of the row **. When the stones are gone, it's over. Which will win when the two players always take the best strategy? Here, the scores of the stones lined up in a row are represented by an array. In the example of the image above, the array is

stones = [4,5,3,1,2]

It looks like. Of the two players, the first attack is A and the second attack is B. First, A has two options: take 4 stones or take 2 stones. The score will be higher if you take 4 stones, but then your opponent will take 5 stones. On the contrary, if you take 2 stones, the score will be small, but the next opponent will be able to get 1 stone. Also, as an extreme example, in the case of a stone row such as $ stones = [1,100,2,3] $, whether or not you can get 100 stones will determine the outcome.

In order to solve this problem, should we search all the cases where the left and right stones are taken? You may be able to find the maximum profit you can get, but the game is not established because the opponent does not always have the optimal strategy. With that in mind, it's hard to understand what the optimal strategy is in the first place.

Solution 1

Now, let's assume that there is some optimal strategy and put it as $ strategy $. Suppose that applying $ strategy $ to an array will return the score that the first player will get.

max\_score = strategy(stones[:])

What's going on inside $ strategy $? It seems to be an operation that compares the score when taking the stone on the right and the score when taking the stone on the left, and selects the one with the higher score.

max\_score = \max \left\{ 
\begin{array}{l}
stones[0] + strategy\_inv(stones[1:]) \\
stones[-1] + strategy\_inv(stones[:-1])
\end{array} \right.

Here, $ strategy \ _inv $ is the score that A is obtained when player B plays optimally against the remaining stones. Since B maximizes his score and minimizes A's score, A has the worst strategy. Specifically, $ strategy \ _inv $ is like selecting a stone with a lower A score when the left and right stones are taken from the remaining stone rows.

max\_score = \max \left\{ 
\begin{array}{l}
stones[0] + \min \left\{ 
\begin{array}{l}
stones[1] + strategy(stones[2:]) \\
stones[-1] + strategy(stones[1:-1])
\end{array} \right. \\
stones[-1] + \min \left\{ 
\begin{array}{l}
stones[0] + strategy(stones[1:-1]) \\
stones[-2] + strategy(stones[:-2])
\end{array} \right.
\end{array} \right.

$ \ Min $ and $ \ max $ came out and it became complicated, but $ strategy $ appeared again in the formula. Since it is A that takes the stone before B, this $ strategy $ is like maximizing A's score again. The important thing is that this is in the form of a recurrence formula. In other words, if you define this $ strategy $ function and perform calculations like nesting, you can perform calculations without knowing the specific optimal strategy.

def stone_game(stones: List[int]) -> int:
  n = len(stones)

  def strategy(i, j):
    if i > j:
      return 0
            
    if i == j:
      return stones[i]
            
    left = min(stones[i+1] + strategy(i+2, j), stones[j] + strategy(i+1, j-1))  #When the player picks up the stone on the left
    right = min(stones[i] + strategy(i+1, j-1), stones[j-1] + strategy(i, j-2))  #When the player picks up the right stone
            
    return max(stones[i] + left, stones[j] + right)

  return strategy(0, n - 1)  #Total score obtained by A

In the code, instead of inputting a subarray, the indexes $ i and j $ of the start point and end point of the array are used as arguments. The recurrence formula requires an initial value, but if the number of remaining stones is 0 or 1, the optimal strategy is clear, so it is defined separately. Finally, you can tell the victory or defeat by comparing the scores of A and B.

Solution 2

Solution 1 allowed us to solve the problem without having to think about the optimal strategy each time, but the drawback is that the formula is a little complicated. Players A and B are the same in that they have the best strategy, so can't we somehow unify it?

Let's say that the return value of $ strategy $ is not the total score of A, but the maximum score difference between A and B.

score\_dif = strategy(stones[:])

Then, the formula of Solution 1 can be changed as follows.

score\_dif = \max \left\{ 
\begin{array}{l}
stones[0] - strategy(stones[1:]) \\
stones[-1] - strategy(stones[:-1])
\end{array} \right.

Since it is a score difference, the part that was added will be subtracted. And as a result, $ strategy $ appears here instead of $ strategy \ _ inv $. This makes the recurrence formula much easier.

def stone_game(stones: List[int]) -> int:
  n = len(stones)

  def strategy(i, j):
    if i > j:
      return 0
            
    return max(stones[i] - strategy(i+1, j), stones[j] - strategy(i, j-1))

  return strategy(0, n - 1)  #Score difference between A and B

If the input array, that is, $ i and j $, is determined for the $ strategy $ function, the return value will be fixed to one. It is useless to calculate this every time because it is likely that the same subarray will be reached from different routes during the calculation process. Therefore, memoization recursion is performed here. Memoization recursion is like recording the inputs and outputs of a function you've already seen, avoiding having to calculate the same function over and over again, which can save you a lot of time.

Memoization recursion can be done by holding the value in the dictionary $ dict $, but in python it is possible to make it a memoized function by prefixing the function with $ @ functools.lru \ _ cache () $. I can do it.

def stone_game(stones: List[int]) -> int:
  n = len(stones)

  @functools.lru_cache(None)  #Memoization recursion
  def strategy(i, j):
    if i > j:
      return 0
            
    return max(stones[i] - strategy(i+1, j), stones[j] - strategy(i, j-1))

  return strategy(0, n - 1)

As a result, $ strategy $ is calculated only once for each combination of $ i and j $. Since the number of combinations is $ O (n ^ 2) $ and the amount of calculation in $ strategy $ is $ O (1) $, the total amount of time calculation is $ O (n ^ 2) $. The amount of spatial complexity is also $ O (n ^ 2) $ by memoization.

Seeking victory or defeat

When finding the victory or defeat in Solution 2, see whether the final $ score \ _ dif $ is positive or negative. Since it is a score difference, if $ score \ _ dif $ is larger than 0, A wins first, and if it is less than 0, B wins. If it is 0, it is a draw.

def stone_game(stones: List[int]) -> str:
  def strategy(i, j):
    #The contents are the same

  score = strategy(0, n - 1)
  
  if score > 0:
    return "A win"
  elif score < 0:
    return "B win"
  else:
    return "draw"

Find the total score

On the other hand, when calculating the total score of A, use $ score \ _ dif $.

score\_A = \frac{\sum_{i=1}^n stones[i] + score\_dif}{2}

Can be calculated as.

def stone_game(stones: List[int]) -> int:
  def strategy(i, j):
    #The contents are the same

  return (sum(stones) + strategy(0, n - 1)) // 2

(Bonus) If you win

By the way, although it has nothing to do with coding, there is a simple strategy to win first if ** the number of stones is even **.

Paint an even number of stones alternately in two colors as shown below. If you add the red and blue stones, you can see that the red one has a larger sum. Here, when A on the play takes the red stone (4) on the right side, B can only take the blue stone (5 or 2) next. Whichever stone B takes, A can take red stones and B can only take blue stones. If A takes all the red stones in this way, he can be sure to win (at least not lose).

Game 2: Game taken from both ends (2)

You can change Game 1 a little and think of a game that ** "When you take a stone, you get the sum of the points of the remaining stones instead of the stone you took" **. What happens in this case?

Solution 1

The score you get will only change from "the score of the stones you got" to "the sum of the scores of the remaining stones", so you only have to change that part from the Game 1 function. For example, the score when you take the stone on the left is

score = stones[i] - strategy(i+1, j)

What was

score = sum(stones[i+1:j]) - strategy(i+1, j)

It becomes. It's easy.

def stone_game(stones: List[int]) -> int:
  n = len(stones)
        
  @functools.lru_cache(None)
  def strategy(i, j):
    if i >= j:
      return 0
            
    left = sum(stones[i+1:j]) - strategy(i+1, j)
    right = sum(stones[i:j-1]) - strategy(i, j-1)
    return max(left, right)
        
  return strategy(0, n-1)

Solution 2

However, Solution 1 has a fatal drawback. The part $ sum (stones [i: j]) $ for which the sum of the subarrays is calculated takes $ O (n) $, so the total amount of calculation becomes $ O (n ^ 2) $. is. To prevent this, calculate the cumulative sum. Cumulative sum can be calculated by $ O (1) $ by holding the sum of the values ​​from the beginning as an array in advance. In python, the $ itertools.accumulate () $ function will automatically find the cumulative sum. Let's call this $ prefix \ _sum $, and the improved code looks like this:

def stone_game(stones: List[int]) -> int:
  n = len(stones)
  prefix_sum = [0] + list(itertools.accumulate(stones))  #Cumulative sum
        
  @functools.lru_cache(None)
  def strategy(i, j):
    if i >= j:
      return 0
            
    left = (prefix_sum[j+1] - prefix_sum[i+1]) - strategy(i+1, j)  #If you take the stone on the left
    right = (prefix_sum[j] - prefix_sum[i]) - strategy(i, j-1)  #If you take the right stone
    return max(left, right)
        
  return strategy(0, n-1)

The amount of calculation is now $ O (n) $.

Solution 3

There is no better solution than memoization recursion, but I'll show you another way to write it. Instead of calling the function nestedly by memoization recursion, consider storing the obtained result in an array and reading it as appropriate. In essence, it is the same as memoization recursion, but this is a method closer to the image of so-called dynamic programming.

Here, we prepare a two-dimensional array $ dp $ with vertical and horizontal lengths of $ n $. $ dp [i] [j] $ will contain the output of $ strategy (i, j) $ in Solution 2. In the code, $ strategy (i, j) $ of solution 2 is changed to $ dp [i] [j] $. However, you need to be careful about the order in which $ dp $ is calculated. When calculating $ dp [i] [j] $, $ dp [a] [b] $ for the indexes $ a, b ~ (i <= a <= b <= j) $ inside it. Must be sought. Therefore, with $ dif $ as the difference between $ i $ and $ j $, $ dp [i] [j] $ is calculated in ascending order of $ dif $ (this is the so-called "interval DP" method. is).

def stone_game(stones: List[int]) -> int:
  n = len(stones)
  prefix_sum = [0] + list(itertools.accumulate(stones))
  dp = [[0]*n for _ in range(n)]  #Array to store values
        
  for dif in range(1, n):  #Width of the desired array, increasing in order
    for i in range(n-dif):  #Starting point of the array
      j = i+dif  #End point of array
      left = (prefix_sum[j+1] - prefix_sum[i+1]) - dp[i+1][j]
      right = (prefix_sum[j] - prefix_sum[i]) - dp[i][j-1]
      dp[i][j] = max(left, right)
                
  return dp[0][n-1]

The theoretical amount of calculation for solutions 2 and 3 is the same, but in reality, solution 3 is usually faster. If solution 2 is too late, rewriting it in 3 will work.

Game 1 solution using interval DP

By the way, Game 1 can also be solved like Solution 3 by using the interval DP.

def stone_game(stones: List[int]) -> int:
  n = len(stones)
  dp = [[0]*n for _ in range(n)]

  for dif in range(1, n):
    for i in range(n-dif):
      j = i + dif
      dp[i][j] = max(stones[i] - dp[i+1][j], stones[j] - dp[i][j-1])

  return dp[0][n-1]

Game 3: Games taken in order from the end (1)

Next, consider setting to take stones in order ** from only one end, instead of taking stones from both ends. However, since there is no gameplay if you take stones one by one, ** you can select the number of stones you can take at one time from 1, 2 or 3 **. Again, the one with the highest total score wins when the stones are completely removed.

Solution 1

From the explanation of Game 1 and Game 2. It was found that the problem can be solved by ** constructing a $ strategy $ function that recursively expresses the optimal strategy by inputting a partial array and outputting the score difference. The strategy is also applied to this game.

In this case, the $ strategy $ function is like calculating the maximum score difference obtained when the number of stones to be taken is 1 to 3, and returning the maximum value of each.

score\_dif = \max \left\{ 
\begin{array}{l}
stones[0] - strategy(stones[1:]) \\
stones[0] + stones[1] - strategy(stones[2:]) \\
stones[0] + stones[1] + stones[2] - strategy(stones[3:])
\end{array} \right.

Just write this in memoization recursion as in Game 1 while paying attention to the initial values ​​and boundary conditions.

def stone_game(stones: List[int]) -> int:
  n = len(stones)
        
  @functools.lru_cache(None)
  def strategy(i):
    if i >= n-1:
      return sum(stones[i:])
            
    return max(sum(stones[i:i + k]) - strategy(i + k) for k in (1, 2, 3))  #Summed up using sum and for statements
  
  return strategy(0)

Since the function argument is only one index this time, the time complexity and spatial complexity can be $ O (n) $.

Solution 2

As we saw in Game 2, we will also look at the solution method by storing the calculation result in an array. Here, a one-dimensional array $ dp $ with a length of $ n $ is prepared, and $ dp [i] $ contains the output of $ strategy (i) $. This time it is simpler than Game 2 because it has only one index, but you still need to be careful about the order of calculation. When calculating $ dp [i] $, $ dp [j] $ with a larger index $ i <j $ must be found. Therefore, $ dp [i] $ is calculated from the one with the larger index.

def stone_game(stones: List[int]) -> int:
  n = len(stones)
  dp = [0] * (n+3)  #Prepare a long array to prevent index errors
  for i in reversed(range(n)):  # i = n-1 ~Loop in order of 0
    dp[i] = max(sum(stones[i:i + k]) - dp[i + k] for k in (1, 2, 3))

  return dp[0]

The theoretical amount of calculation is the same here as well, but I think that the calculation time is actually faster with Solution 2.

Game 4: Games taken in order from the end (2)

Let's change the game rule a little and say "** Player can take no more than twice the maximum number of stones taken **". In other words, when a maximum of $ M $ stones have been taken so far, the next player can take $ X $ stones that satisfy $ 1 \ leq X \ leq 2M $, which is within this range. You can choose your favorite $ X $. For the next player, $ M = max (X, M) $. At first, we will start from $ M = 1 $.

If you understand the contents so far, you can imagine the solution even if you look at such a complicated problem. The state of $ strategy $ is determined by the starting index and the maximum number of stones that can be taken $ 2M $. The code looks like this:

def stone_game(stones: List[int]) -> int:
  n = len(stones)
            
  @functools.lru_cache(None)
  def strategy(i, M):
    if i + 2 * M >= n:
      return sum(stones[i:])
            
    return max(sum(stones[i:i + X]) - strategy(i + X, max(M, X)) for X in range(1, 2 * M + 1))
        
  return strategy(0, 1)  #X at first= 1, M = 1

The input and update expressions of the function have changed, but what they are doing is almost the same as Game 3.

Game 5: Game when the value of the stone is different

Finally, consider a game with different rules. There are two major rule changes:

--You can take any of the stones in the field. -The value of the stone is different between the two players

For example, when there are 5 stones in the field, the values ​​of the stones seen from A are 5,2,3,1,4 in order, and the values ​​of the stones seen from B are 1,5,3,4,2 in order. So, even with the same stone, the scores obtained by both are different. It is assumed that both players can see the score distribution of the opponent. For example, A gets the highest score if he gets the first stone (5 points), but then B should get the second stone and get 5 points, so the second one first. You may want to save the stones (2 points).

Solution 1

However, we already know that the problem can be solved by ** constructing a $ strategy $ function that recursively expresses the optimal strategy by outputting the score difference **. This time, we can take stones from any position in the row, so we will introduce a $ state $ array that represents the existing stones and the location of the stones taken. If the $ i $ th stone has already been taken, $ state [i] = False $, and if it remains, $ True $. Enter the player name and $ state $, find the score when each of the remaining stones is taken, and write a function that returns the maximum score.

def stone_game(values_A: List[int], values_B: List[int]) -> int:
  n = len(values_A)
  state = [True]*n
            
  @lru_cache(None)
  def strategy(player, state):
    if max(state) == 0:  #All False, that is, return 0 if all stones have been taken
      return 0
            
    max_value = float("-inf")
    state = list(state)
            
    for i in range(n):
      if state[i]:  #If stone i remains
        state[i] = False  #I decided to take stone i
                    
        if player == "A":  #Cases for players A and B
          max_value = max(max_value, values_A[i] - strategy("B", tuple(state)))
        else:
          max_value = max(max_value, values_B[i] - strategy("A", tuple(state)))
        
        state[i] = True  #Return stone i to the untaken state
                        
      return max_value
            
    return strategy("A", tuple(state))  #First attack is A, state is all True

In python, arrays are not hashable and cannot be memoized, so when inputting $ state $ into a function, it is converted to tuple. Now you can find and calculate the score you will get for each $ state $.

But is this really good? Until now, stones were taken only from both ends of the array, but now it is possible to take stones from anywhere, and the variation of $ state $ is enormous. Specifically, since each $ n $ point can take $ True $ or $ False $, the order of the number of combinations will be $ O (2 ^ n) $. With this, you can easily calculate up to $ n = 30 $ at most.

Solution 2

In fact, this game is much easier to calculate. First, let's simplify the problem and consider the case where the stones have the same value. In this case, it is best to start with the stone with the highest score without recursion. Intuitively, this is due to the following two reasons.

――You can increase your score by taking a stone with a large score. --If you take a stone with a high score, your opponent cannot get that stone and you can reduce your opponent's score.

This is true even if the values ​​of both stones are different. The score difference between the opponent and the opponent when you take one stone can be thought of as (the score you got) + (the score you can't get because the stone is gone). That is, the sum of the stone score seen by oneself and the score seen by the opponent. In order to increase the score difference, it is best to add the values ​​for both stones for each stone and take the stones with the highest value in order.

def stone_game(values_A: List[int], values_B: List[int]) -> int:
  sorted_sum_value = sorted(zip(values_A, values_B), key=sum, reverse=True)  #The sum of the values ​​of each stone is arranged in descending order.
  sum_A = sum(a for a, b in sorted_sum_value[::2])  #Take every other stone from the top
  sum_B = sum(b for a, b in sorted_sum_value[1::2])  #Take every other stone from the second stone
        
  return sum_A - sum_B

In this case, the amount of calculation is $ O (n) $. There is a better way to deal with this problem than applying the joseki method, which gives us a sense of the depth of the stone-picking game.

Summary

So far, there are several stones with different scores, and when two people compete for the stones, we have considered the question of which one can get more points and how many points can be scored. The point is

--Represent each state in a simple form (index, etc.) —— Input each state, output the score (difference) obtained, and recursively write a function that takes the optimum action. --Reduce the amount of calculation by memoization and DP

was. If someone else applies for a stone-picking game in the future, be sure to check if you can win.

** Reference (from Letcode) ** Stone Game Stone Game VII Stone Game III Stone Game II Stone Game VI

Recommended Posts

[Competition Pro] Let Python play a stone-picking game with no winning method
Let's make a shiritori game with Python
I made a roguelike game with Python
[Python] Make a game with Pyxel-Use an editor-
I want to make a game with Python
[Python] Make a simple maze game with Pyxel
I made a bin picking game with Python
I made a Christmas tree lighting game with Python
[Let's play with Python] Make a household account book
Let's make a simple game with Python 3 and iPhone
[Piyopiyokai # 1] Let's play with Lambda: Creating a Python script
[Python] Road to a snake charmer (5) Play with Matplotlib
Kernel Method with Python
I made a simple typing game with tkinter in Python
A collection of competitive pro techniques to solve with Python
I made a puzzle game (like) with Tkinter in Python
[Python] Make a simple maze game with Pyxel-Make enemies appear-
[Python] The first step to making a game with Pyxel
Finding pi with a three-line function [Python / Monte Carlo method]
I tried a stochastic simulation of a bingo game with Python
A story about a python beginner stuck with No module named'http.server'
Python3 standard input (competition pro)
[Python] Play with Discord's Webhook.
Make a fortune with Python
Save the result of the life game as a gif with python
Solve the Python knapsack problem with a branch and bound method
I made a poker game server chat-holdem using websocket with python
[ROS2] How to play a bag file with python format launch