Fast, reactive AI with a short training time

  1. Introduction
  2. Algorithm
  3. Implementation
    1. Input Saving
    2. Game Snapshots
    3. State-Input Mapping
      1. Pass One: Group
      2. Pass Two: Weight
    4. Playback
  4. Post-Implementation Tweaks
    1. Attack Bias
    2. Misdirection
  5. Playable Demo
  6. Notes and Future Steps
    1. Memory
    2. Absolute Distance

Introduction

Ghost AI is a fast alternative to classical state-machine AI or complex neural net approaches. When I was working on the Andromeda boss enemy, I wanted something less predictable than a state machine, but I didn’t have the time or resources to train a neural net for dozens of hours.

I was researching fighting game AI approaches and found this paper. It talks about the theory behind ghost-based AI and how it was implemented in an emulated build of Street Fighter 3.

I added it to Vapor Trails. This is a writeup of the algorithm’s core and doesn’t cover movement or attacking outside of it.

Algorithm

  1. Save the inputs I made at a moment in the game
  2. Store them with a snapshot of the game state at that moment
  3. Have the AI look at the inputs I made when the game’s in a certain state and choose one of them

Implementation

Input Saving

The input recording code here is the foundation of ghost AI. It’s a platform to record inputs and replay them in any order.

  1. When an input event happens, store its ID if it’s a button, or store its ID and value if it’s an axis.
  2. Feed the inputs back into the player from code.
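A minimal sketch of the recording half of those two steps, assuming a plain struct per input event. It does not use Rewired's API, and the names `RecordedInput` and `InputRecorder` are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical record of one input event: buttons carry only an ID,
// axes carry an ID plus the axis value.
public readonly struct RecordedInput {
    public readonly int ActionId;
    public readonly bool IsAxis;
    public readonly float AxisValue; // stays 0 for buttons

    RecordedInput(int actionId, bool isAxis, float axisValue) {
        ActionId = actionId;
        IsAxis = isAxis;
        AxisValue = axisValue;
    }

    public static RecordedInput Button(int actionId) => new RecordedInput(actionId, false, 0f);
    public static RecordedInput Axis(int actionId, float value) => new RecordedInput(actionId, true, value);
}

// Collects the inputs seen during each recording tick so they can be replayed later.
public class InputRecorder {
    readonly List<List<RecordedInput>> frames = new List<List<RecordedInput>>();
    List<RecordedInput> current = new List<RecordedInput>();

    public void OnButtonPressed(int actionId) => current.Add(RecordedInput.Button(actionId));
    public void OnAxisChanged(int actionId, float value) => current.Add(RecordedInput.Axis(actionId, value));

    // Call once per recording tick to close the current frame and start a new one.
    public void EndFrame() {
        frames.Add(current);
        current = new List<RecordedInput>();
    }

    public IReadOnlyList<List<RecordedInput>> Frames => frames;
}
```

Playback would then walk `Frames` (or pick from them in any order) and push each entry back into whatever drives the player's input.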

I invested some of the money I made from the initial VT donations in Rewired, an input plugin for Unity.

One thing to note here: Horizontal input in a saved replay is normalized to positive (towards the opponent) or negative (away). When the AI player is reading it, it will translate that into actual left-right values.
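A small sketch of that normalization, assuming a raw axis where positive means right and that both characters expose an X position. The class and method names are hypothetical.

```csharp
public static class HorizontalInput {
    // +1 when the opponent is to the right of (or level with) self, -1 when to the left.
    static float TowardsSign(float selfX, float opponentX) => opponentX >= selfX ? 1f : -1f;

    // Recording: convert a raw left/right axis value (positive = right, an assumption)
    // into "towards the opponent" (positive) or "away" (negative).
    public static float Normalize(float rawAxis, float selfX, float opponentX) {
        return rawAxis * TowardsSign(selfX, opponentX);
    }

    // Playback: convert the normalized value back into a left/right value,
    // based on where the AI's own opponent currently stands.
    public static float Denormalize(float normalizedAxis, float selfX, float opponentX) {
        return normalizedAxis * TowardsSign(selfX, opponentX);
    }
}
```

For example, a recorded "towards" input of 1 played back by an AI standing to the right of its opponent comes out as -1, which is a move to the left.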

Game Snapshots

Each game state snapshot is encoded as a 32-bit int to speed up lookup times when the AI is running.

For the non-binary people: An int32 is an integer represented by 32 bits (values that can be 0 or 1). Unsigned, its max value is 2^32 - 1. The number 2 as an int32 is 0000 0000 0000 0000 0000 0000 0000 0010.

For the binary people: Yes, these are unsigned ints in this example. You’re very smart.

The distribution of information to bits is represented like this. I consider it a mostly-complete picture of the game’s state with which to make meaningful decisions.

0:      whether I am attacking
1:      whether opponent is attacking
2-3:    X range of opponent: close | medium | far
4-5:    Y range of opponent: close | medium | far
6-7:    projected X range in ¼ of a second
8-9:    projected Y range in ¼ of a second
10:     grounded
11:     stunned
12:     just got hit
13:     opponent stunned
14:     has an airjump
15:     dash off cooldown
16:     touching a wall
17-24:  attack name hash
25:     whether attack just landed
26-31:  currently unused space

The single-bit values are just true/false states. The two-bit values like the ranges can store 4 possible values, but I only use 3. Here’s the range conversion logic:

int DistanceToRange(float distance) {
    distance = Mathf.Abs(distance);
    if (distance < 1f) return 0b00;
    else if (distance < 4f) return 0b01;
    else return 0b10;
}

Together, these operations condense the game snapshot into a single number.
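As an illustration, the packing could look something like the sketch below. It assumes bit 0 is the least significant bit and uses a hypothetical `GameSnapshot` struct with invented field names. Only the bit positions come from the layout above.

```csharp
// Hypothetical container for the values in the bit layout above.
public struct GameSnapshot {
    public bool Attacking, OpponentAttacking;
    public int XRange, YRange, ProjectedXRange, ProjectedYRange; // 0 = close, 1 = medium, 2 = far
    public bool Grounded, Stunned, JustGotHit, OpponentStunned;
    public bool HasAirJump, DashOffCooldown, TouchingWall;
    public byte AttackNameHash;
    public bool AttackJustLanded;

    // A single true/false value placed at the given bit position.
    static uint Flag(bool value, int bit) => value ? 1u << bit : 0u;

    // A multi-bit value, masked to `width` bits and shifted to start at `bit`.
    static uint Field(int value, int width, int bit) => ((uint)value & ((1u << width) - 1u)) << bit;

    // Packs every field into one 32-bit number. Bits 26-31 stay zero.
    public uint Encode() {
        return Flag(Attacking, 0)
             | Flag(OpponentAttacking, 1)
             | Field(XRange, 2, 2)
             | Field(YRange, 2, 4)
             | Field(ProjectedXRange, 2, 6)
             | Field(ProjectedYRange, 2, 8)
             | Flag(Grounded, 10)
             | Flag(Stunned, 11)
             | Flag(JustGotHit, 12)
             | Flag(OpponentStunned, 13)
             | Flag(HasAirJump, 14)
             | Flag(DashOffCooldown, 15)
             | Flag(TouchingWall, 16)
             | Field(AttackNameHash, 8, 17)
             | Flag(AttackJustLanded, 25);
    }
}
```

Because identical situations produce identical numbers, the encoded value can be used directly as a dictionary key.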

Notes:

  • Range isn’t a real number. Just close (threat), medium (threat after burst movement), and far (no threat).
  • Distance is absolute.
  • The attack name hash is a byte-size slice of the hashed attack animation (the name converted to an int32), if the player’s in an attack. Some attack animations have different timings for when the hitboxes come out, and my input behaviors reflect that.
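To illustrate the byte-size slice, here is a sketch that hashes an animation name to 32 bits and keeps only the lowest 8. The hash used here is FNV-1a, chosen as a stand-in so the example runs on its own. The actual project may hash names differently, and `AttackHash` is a hypothetical name.

```csharp
using System.Text;

public static class AttackHash {
    // Stand-in 32-bit string hash (FNV-1a). An assumption, not necessarily the game's hash.
    public static uint Hash32(string name) {
        unchecked {
            uint hash = 2166136261;          // FNV offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(name)) {
                hash ^= b;
                hash *= 16777619;            // FNV prime, wraps around on overflow
            }
            return hash;
        }
    }

    // Keeps only the lowest 8 bits so the result fits the byte-size slot in the snapshot.
    // Returns 0 when the player is not in an attack (null or empty name).
    public static byte Slice(string attackName) {
        if (string.IsNullOrEmpty(attackName)) return 0;
        return (byte)(Hash32(attackName) & 0xFF);
    }
}
```

With only 256 possible values, two different attack names can land on the same byte. For a small move list that is usually tolerable, but it is worth checking for collisions.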

State-Input Mapping

To construct a proper ghostfile, I run two passes on the data.

Pass One: Group

During the course of a fight, the game state will produce the same number-encoded snapshot several times. This is good and intended.

I keep a list of the different groups of inputs I made during each identical snapshot.
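A rough sketch of the grouping pass, assuming each recorded sample is a pair of an encoded snapshot and the inputs held at that moment. Inputs are represented as a plain string such as "Attack+Towards" purely for illustration, and `GhostGrouping` is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class GhostGrouping {
    // Buckets every recorded input set under the snapshot it was recorded in.
    // Duplicates are kept on purpose, since a later pass needs to count them.
    public static Dictionary<int, List<string>> GroupBySnapshot(
            IEnumerable<(int Snapshot, string Inputs)> samples) {
        var groups = new Dictionary<int, List<string>>();
        foreach (var sample in samples) {
            if (!groups.TryGetValue(sample.Snapshot, out var inputs)) {
                inputs = new List<string>();
                groups[sample.Snapshot] = inputs;
            }
            inputs.Add(sample.Inputs);
        }
        return groups;
    }
}
```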

Pass Two: Weight

Some inputs I make more than others. A more intelligent AI will reflect me, making certain choices more often.

For every unique snapshot, I look through each group of inputs and combine the identical ones. They’re assigned a higher number if they occur more often, and then sorted from least to most frequent.

I end up with an ordered list of weighted inputs for each game snapshot.

A ghostfile is represented like this:

struct Ghostfile {
	Dictionary<int, List<WeightedFrameInput>> ghost;
}

where each int is a game state snapshot.
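The weighting pass could be sketched like this, assuming the grouped data is a dictionary from snapshot to the raw list of inputs seen in it. `WeightedInput` is a simplified, hypothetical stand-in for `WeightedFrameInput`, with inputs shown as strings.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for WeightedFrameInput: an input set and how often it occurred.
public struct WeightedInput {
    public string Inputs;
    public int Weight;
}

public static class GhostWeighting {
    // For every snapshot, merges identical inputs, counts them,
    // and sorts the result from least to most frequent.
    public static Dictionary<int, List<WeightedInput>> Weigh(Dictionary<int, List<string>> groups) {
        var ghost = new Dictionary<int, List<WeightedInput>>();
        foreach (var pair in groups) {
            ghost[pair.Key] = pair.Value
                .GroupBy(inputs => inputs)
                .Select(g => new WeightedInput { Inputs = g.Key, Weight = g.Count() })
                .OrderBy(w => w.Weight)
                .ToList();
        }
        return ghost;
    }
}
```

A snapshot recorded with the inputs A, B, A, A, B, C would end up as C (1), B (2), A (3).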

Playback

To mimic a human reaction time of ~250ms, the AI only looks at the game state and virtually presses buttons 4 times a second, compared to the ghost recorder’s 12 samples a second.

If the AI finds a known corresponding entry in the ghostfile for the current game state, it checks its list of possible inputs and makes a weighted random choice from them.

If it doesn’t find one, it makes a weighted random choice from a random game state.
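The lookup and fallback can be sketched as follows, assuming a non-empty ghostfile with positive weights. `WeightedChoice` and `GhostPlayback` are hypothetical names, and inputs are shown as strings for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct WeightedChoice {
    public string Inputs;
    public int Weight;
}

public class GhostPlayback {
    readonly Dictionary<int, List<WeightedChoice>> ghost;
    readonly List<int> knownStates;
    readonly Random rng;

    public GhostPlayback(Dictionary<int, List<WeightedChoice>> ghost, Random rng) {
        this.ghost = ghost;
        this.knownStates = ghost.Keys.ToList();
        this.rng = rng;
    }

    // Picks inputs for the current snapshot. `known` reports whether the
    // snapshot was found in the ghostfile or a random state was used instead.
    public string ChooseInputs(int snapshot, out bool known) {
        known = ghost.TryGetValue(snapshot, out var choices);
        if (!known) {
            int randomState = knownStates[rng.Next(knownStates.Count)];
            choices = ghost[randomState];
        }
        return WeightedPick(choices);
    }

    // Weighted random choice: an entry with weight 3 is picked three times
    // as often as an entry with weight 1.
    string WeightedPick(List<WeightedChoice> choices) {
        int total = choices.Sum(c => c.Weight);
        int roll = rng.Next(total); // 0 <= roll < total
        foreach (var choice in choices) {
            if (roll < choice.Weight) return choice.Inputs;
            roll -= choice.Weight;
        }
        return choices[choices.Count - 1].Inputs; // not reached when weights are positive
    }
}
```

Calling `ChooseInputs` about once every quarter second matches the 4-times-a-second decision rate, and counting how often `known` comes back true gives a running known:unknown ratio.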

I try to keep a known:unknown state ratio of at least 2:1. This corresponds to a ~1 minute long ghostfile.

With all these steps complete, the AI ghost is able to move and attack more like a player.

Post-Implementation Tweaks

The ghostfiles and input recordings aren’t raw. In addition to the horizontal input normalization, I made several other adjustments.

Attack Bias

The state snapshots are saved 12 times a second. However, the moment I press the attack button down is only saved for one of those snapshots. Therefore, the attack input is diluted over the states and the ghost ends up being reluctant to attack.

I could track whether the button was down at all instead of just saving its ID on the frame it was pressed, but I sometimes press buttons quickly. Saving snapshots less frequently would result in less dilution, but also less data for the ghost to read, so I’d have to train it for longer.

Solution: When recording inputs, I check if the button pressed is an attack. If so, I artificially stretch it, applying the attack input to the next 6 game states. This replicates the effect of the ghost “wanting” to be in the attack for a while, like I am with an attack hitbox out.
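One way to sketch the stretch, assuming each recorded sample is a set of action names and that "the next 6" means the six samples after the press. `AttackBias` and the action name are hypothetical.

```csharp
using System.Collections.Generic;

public static class AttackBias {
    // frames[i] holds the action names recorded for sample i.
    // Returns a copy in which every attack press is also applied
    // to the following `stretch` samples.
    public static List<HashSet<string>> StretchAttacks(
            List<HashSet<string>> frames, string attackAction, int stretch = 6) {
        var result = new List<HashSet<string>>();
        foreach (var frame in frames) result.Add(new HashSet<string>(frame));

        for (int i = 0; i < frames.Count; i++) {
            // Read from the original frames so stretched copies don't chain forever.
            if (!frames[i].Contains(attackAction)) continue;
            for (int j = i + 1; j <= i + stretch && j < frames.Count; j++)
                result[j].Add(attackAction);
        }
        return result;
    }
}
```

At 12 samples a second, a stretch of 6 keeps the attack input present for about half a second of recorded states.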

Misdirection

The ghost would sometimes attack in the direction opposite the player. This was due to the earlier space-saving decision I made to make the distance absolute. The AI makes no distinction between the projected X distance of the opponent being in front of or behind itself, only near/medium/far.

Solution 1: I added an extra bit of information to say whether the projected distance is in front or behind. This unfortunately made the known:unknown state ratio drop sharply to 1:3, so the AI behaved erratically even after longer training sessions.

Solution 2: When the AI is executing an input and one of the buttons is an attack input, override any horizontal movement to be towards the player.
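Solution 2 amounts to a very small filter on the chosen input. A sketch, assuming the normalized horizontal axis where positive means towards the opponent (`Misdirection` is a hypothetical name):

```csharp
using System;

public static class Misdirection {
    // horizontal: normalized axis, positive = towards the opponent, negative = away.
    // When the chosen input includes an attack, any horizontal movement is
    // flipped to point towards the opponent. A zero axis is left untouched.
    public static float FixHorizontal(float horizontal, bool includesAttack) {
        if (!includesAttack) return horizontal;
        return Math.Abs(horizontal);
    }
}
```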

Playable Demo

Notes and Future Steps

Memory

A major weakness of this AI model is that although it performs pretty well in a scrap, it doesn’t have memory of the last few game states like a recurrent neural net would. Instead, I rely on prediction (projected distance) and single-bit memory (attack landed). It doesn’t react well to mixups and variation. Any perceived baits and whiff punishes are just snap reactions to the current frame.

I could spend some of the remaining bit room on other “X just happened” things, like just jumped or just touched ground bits that will stay true for a few fractions of a second after they happen.
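Such a bit could be backed by a small timer. The sketch below assumes the caller passes in the current game time in seconds, and `RecentEventFlag` is a hypothetical helper.

```csharp
// A flag that reads true for a short window after an event is triggered.
public class RecentEventFlag {
    readonly float holdSeconds;
    float lastEventTime = float.NegativeInfinity; // "never happened"

    public RecentEventFlag(float holdSeconds) {
        this.holdSeconds = holdSeconds;
    }

    // Call when the event happens, e.g. on the frame of a jump or landing.
    public void Trigger(float now) {
        lastEventTime = now;
    }

    // True while `now` is within the hold window after the last trigger.
    public bool IsSet(float now) => now - lastEventTime <= holdSeconds;

    // The flag as a single bit at the given position, ready to OR into a snapshot.
    public uint AsBit(float now, int bit) => IsSet(now) ? 1u << bit : 0u;
}
```

A "just jumped" flag with a 0.25 second window could then occupy bit 26 of the snapshot via `AsBit(now, 26)`.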

Absolute Distance

Early in development when I added the extra sign bit for the projected distance, the distance was cut into four sections instead of three. I was of the mindset that I allocated four bits to think about distance and I should use all of them.

Unfortunately, since that made the slices of space more narrowly delineated, the number of potential game states exploded and it meant I needed to train the AI on my inputs for much longer to get the known:unknown ratio back to what I thought was acceptable.

Now that I’ve cut the possible distance values down to 3 instead of 4, I could revisit this and make the missing fourth value in the projected direction just be “behind”. This would also let me revert the fix to misdirection I mentioned earlier.
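That idea could look something like this, reusing the 1 and 4 unit thresholds from `DistanceToRange` and assuming the projected distance is signed so that negative means behind the AI. `ProjectedRange` is a hypothetical name.

```csharp
public static class ProjectedRange {
    public const int Close = 0b00;
    public const int Medium = 0b01;
    public const int Far = 0b10;
    public const int Behind = 0b11; // the previously unused fourth value

    // signedDistance: positive when the opponent is projected to be in front
    // of the AI, negative when projected to be behind it.
    public static int Encode(float signedDistance) {
        if (signedDistance < 0f) return Behind;
        if (signedDistance < 1f) return Close;
        if (signedDistance < 4f) return Medium;
        return Far;
    }
}
```

This keeps the field at two bits, so the number of possible snapshots grows far less than it did with a separate sign bit.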