This is an introduction to the Puyo Puyo AI that I'm currently developing, known as Ama AI. I've been developing this AI for more than a year, and I want to write down the knowledge I've gained while making it. Maybe I will even write a Puyo Puyo AI tutorial in the future. There is a lot I want to talk about, but far too much to cover at once, so I will keep this post short.
1. What Is Ama?
It is a Puyo Puyo AI that I've been making since mid-2022. Originally, it was just a side project to take a break from my main project at the time, a Tetris AI called Lemon Tea. I was getting a little tired of Tetris, so I decided to have a look at Puyo Puyo for a change of pace. Ever since, I've been obsessed with Puyo Puyo.
Currently it runs on the Steam version of Puyo Puyo Champions, and it has gotten quite strong lately. The AI is private to prevent abuse, though maybe in the future I will release it publicly so that everyone can play against it for practice (with cheat prevention, of course).
This is a video demonstrating how Ama works. Here, Ama (2P) is fighting against くまちょむ aka Kumachom (1P).
2. Algorithm.
The AI is given a Puyo field and 2 pairs of Puyo to search. To save time, the AI searches for the next placement while the current pair is being maneuvered, which is why it is only given 2 pairs rather than all 3 visible pairs.
2.1. The Search Function.
The search function is just a simple depth-first search (DFS). We search all positions up to a depth of 2, evaluate each of them with an evaluation function, and then choose the move that leads to the position with the highest evaluation score.
For example, in the above graph, we will choose move #2 because it leads to position #4, which has the highest evaluation score of 1500.
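To make the idea concrete, here is a minimal sketch of such a depth-2 search. A "position" is reduced to a single int and a "move" to a delta added to it; `Position`, `Move`, `evaluate`, `apply`, and `best_move` are illustrative stand-ins, not Ama's real types or functions.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Toy stand-ins: in the real AI, Position would be a Puyo field and
// Move a pair placement.
using Position = int;
using Move = int;

// Stand-in evaluation function: just returns the position's value.
int evaluate(Position p) { return p; }

// Stand-in move application.
Position apply(Position p, Move m) { return p + m; }

// Depth-2 DFS: for every first-pair move, try every second-pair move,
// and return the index of the first move whose best depth-2 outcome
// has the highest evaluation score.
int best_move(Position root,
              const std::vector<Move>& pair1,
              const std::vector<Move>& pair2) {
    int best_index = -1;
    int best_score = std::numeric_limits<int>::min();
    for (std::size_t i = 0; i < pair1.size(); ++i) {
        Position p1 = apply(root, pair1[i]);
        for (Move m2 : pair2) {                      // depth 2
            int score = evaluate(apply(p1, m2));
            if (score > best_score) {
                best_score = score;
                best_index = static_cast<int>(i);
            }
        }
    }
    return best_index;
}
```

The same shape generalizes to any depth, but as discussed below, depth 2 keeps the search fast enough to run many times per frame.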
2.2. Evaluation Function.
Since the above algorithm is quite greedy, the quality of the evaluation function is crucial.
There are many heuristics that matter for a Puyo AI, such as field shape, Puyo connections, and so on. But in my experience, the ability to judge the potential chain of a position is the most important part of the evaluation function.
We won't go into detail about Ama's evaluation algorithm here because doing so would take a long time. Instead, I will discuss it in more detail in a future post.
Here is the list of Ama's current evaluation parameters:
- Chain.
- Ignition y.
- Number of key puyo needed to ignite.
- Chain extendibility.
- 2 dub.
- 3 dub.
- Field shape.
- U shape.
- Chain form (GTR, Meri, etc.)
- Link 2.
- Link 3.
- Number of garbage puyo.
- Time wasted.
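As a rough illustration, heuristics like these can be combined into a single score as a weighted sum. The sketch below uses a small made-up subset of the parameters above with made-up weights; Ama's real evaluation is more involved, and its weights are found by the automatic tuning described in section 3.

```cpp
// Hypothetical features extracted from a position. The names mirror a
// few entries from the parameter list above; the values and weights
// are invented for illustration only.
struct Features {
    int chain = 0;      // length of the best detected chain
    int link2 = 0;      // number of 2-puyo connections
    int link3 = 0;      // number of 3-puyo connections
    int garbage = 0;    // garbage puyo on the field
};

struct Weights {
    int chain = 1000;
    int link2 = 40;
    int link3 = 80;
    int garbage = -120; // garbage is penalized
};

// Weighted linear combination of the features.
int evaluate(const Features& f, const Weights& w) {
    return f.chain   * w.chain
         + f.link2   * w.link2
         + f.link3   * w.link3
         + f.garbage * w.garbage;
}
```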
2.3. When To Ignite?
To decide when to ignite a chain, we simply check whether there is a chain whose score exceeds a lower bound. If there is, we choose the move that leads to the chain with the highest score; otherwise, we continue building the chain. For Ama, the lower bound is currently 78000.
For instance, move #1 in the graph above produces a 14-chain with a score of 85200, which is greater than 78000, so we go with that move.
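A minimal sketch of that rule, assuming the chain score of each candidate move has already been computed by a chain simulation (not shown); `pick_ignition` is a hypothetical name, not Ama's actual function:

```cpp
#include <cstddef>
#include <vector>

// Ama's current ignition lower bound.
constexpr int kLowerBound = 78000;

// Given the chain score of each candidate move, return the index of
// the highest-scoring move whose chain clears the lower bound, or -1
// to signal "keep building the chain".
int pick_ignition(const std::vector<int>& chain_scores) {
    int best = -1;
    for (std::size_t i = 0; i < chain_scores.size(); ++i) {
        if (chain_scores[i] >= kLowerBound &&
            (best < 0 || chain_scores[i] > chain_scores[best])) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```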
2.4. Why DFS?
Beam Search appears to be the preferred algorithm for Puyo AIs now. There is a great write-up on this method (Puyo AI new search method by Takapt).
One of the big advantages of Beam Search is that it can build large chains without relying heavily on the evaluation function. Additionally, Beam Search can search deeper than a depth of 2 if the entire Puyo sequence is known or if the Monte Carlo method is used.
However, Beam Search's biggest disadvantage is that it's very slow (100ms to 300ms on average).
In the game of Puyo Puyo, the skill of precisely reading an opponent and timing one's attack or defense is more important than building a large chain. For instance, we should build high to the left if the opponent is in All Clear mode, or we should build quickly and react appropriately if the opponent is launching a quick attack.
That is why we need an algorithm that is both fast and flexible, and DFS fits perfectly.
DFS is very fast (about 8ms), so within a 300ms time frame I can search as many times as I want. I can search for the move that builds the highest chain, the move that works best in an All Clear battle, and the move that leads to a suitable counterattack against the opponent, all in just about 60ms.
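The idea of fitting several specialized searches into one time budget could be sketched like this. The searches are left as opaque callables here, and none of this is Ama's actual scheduling code; it only illustrates the structure.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Run a list of specialized searches (e.g. "biggest chain", "All Clear
// battle", "counterattack") until the time budget runs out, collecting
// each search's chosen move index.
std::vector<int> run_searches(
        const std::vector<std::function<int()>>& searches,
        std::chrono::milliseconds budget) {
    std::vector<int> results;
    auto deadline = std::chrono::steady_clock::now() + budget;
    for (const auto& search : searches) {
        if (std::chrono::steady_clock::now() >= deadline) break;
        results.push_back(search());
    }
    return results;
}
```

Because each DFS takes only a few milliseconds, a 300ms budget comfortably fits several such searches.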
In my opinion, the advantage of DFS over Beam Search is that it's better suited to actual Puyo Puyo battles.
Perhaps when processing power increases in the future, the benefit of DFS will become less significant, but for the time being, I sincerely believe that DFS is the best option.
3. Automatic Tuning.
Currently, Ama uses a tuning method that is quite similar to SPSA and Stockfish's tuning method. I've tried a genetic algorithm, but it didn't seem to work very well.
3.1. Explanation.
The idea behind the tuning method is quite simple. Let's say we have a set of parameters called w0 that we want to tune.
- Step 1: We randomly choose some parameters from w0 to tune. (You are free to select as many as you like; in my experience, choosing between 1 and 4 usually yields better results more quickly.) For example, here we choose 3 parameters x, y, z from w0:
x = -100
y = 50
z = 150
- Step 2: For each parameter, we generate a random delta value. Be careful here: the deltas should not be overly large or small. Let's say we have:
x_delta = +20
y_delta = -15
z_delta = +32
- Step 3: We add these delta values to, and subtract them from, the original parameters to get 2 new sets of parameters w+ and w-:
w0 (x = -100, y = 50, z = 150)
w+ (x = -80, y = 35, z = 182)
w- (x = -120, y = 65, z = 118)
- Step 4: We then match AI(w0) vs. AI(w+) vs. AI(w-) a total of N times to determine which set of parameters is superior. To get more reliable results, N should be much bigger than 1; I currently use N = 100.
- Step 5: If w0 is still the best, we try again. Otherwise, if w+ or w- is better, we adjust w0 slightly toward it. For example, if w+ is better, then w0 += delta * k for each chosen parameter, where k is a value between 0 and 1.
- Step 6: Repeat from step 1 until we have a satisfactory set of parameters.
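Steps 3 and 5 above can be sketched as a single update function. Everything here is illustrative: the winner is assumed to have been determined already by the N matches of step 4, and `tune_step` is a hypothetical name, not Ama's actual tuner.

```cpp
#include <cstddef>
#include <vector>

// Which parameter set won the N matches of step 4.
enum class Winner { W0, WPlus, WMinus };

// One iteration of the tuning loop: nudge the chosen parameters of w0
// toward the winning perturbed set by a fraction k of each delta.
void tune_step(std::vector<double>& w0,
               const std::vector<int>& chosen,   // indices picked in step 1
               const std::vector<double>& delta, // one delta per chosen index
               Winner winner,                    // result of the matches
               double k) {                       // step size, 0 < k < 1
    if (winner == Winner::W0) return;            // step 5: keep w0 as-is
    double sign = (winner == Winner::WPlus) ? 1.0 : -1.0;
    for (std::size_t i = 0; i < chosen.size(); ++i) {
        w0[chosen[i]] += sign * delta[i] * k;    // move w0 toward the winner
    }
}
```

With the example numbers above (x = -100, y = 50, z = 150, deltas +20/-15/+32) and k = 0.5, a win for w+ would move w0 to (-90, 42.5, 166).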
3.2. Pros and Cons.
This strategy is extremely versatile. It can be used to improve the AI's capacity to create large chains or to fine-tune its battle heuristics.
However, even though this method increases strength very quickly at the beginning, after a while some variables reach their optimal values and start to wander randomly, which causes the strength to decrease. So we should be careful about when to stop.
4. Implementation.
The source code for Ama AI is here.
The code is written in C++. I apologize for my poorly written code, as I'm not a good programmer.
5. Conclusion.
This is an introductory post for the Puyo Puyo AI. I will write more detailed posts in the future.
Puyo Puyo AI is fun. If you have any questions, please reach out to me on Twitter.