Humans play StarCraft under physical constraints that limit their reaction time and the rate of their actions. The game was designed with those limitations in mind, and removing those constraints changes the nature of the game. We therefore chose to impose constraints upon AlphaStar: it suffers from delays due to network latency and computation time; and its actions per minute (APM) are limited, with peak statistics substantially lower than those of humans. AlphaStar’s play with this interface and these constraints was approved by a professional player (see ‘Professional player statement’ in Methods).
Camera view. Humans play StarCraft through a screen that displays only part of the map, together with a high-level view of the entire map (in part to avoid information overload). The agent interacts with the game through a similar camera-like interface, which naturally imposes an economy of attention: the agent must choose which area of the map it fully observes and interacts with. The agent can move the camera as an action.
Opponent units outside the camera have certain information hidden, and for certain actions (for example, building structures) the agent can target only within the camera. AlphaStar can target locations outside the camera more accurately than humans, although less accurately within it, because target locations (selected on a 256 × 256 grid) are treated identically inside and outside the camera. Agents can also select sets of units anywhere on the map, which humans can do only less flexibly using control groups. In practice, the agent does not seem to exploit these extra capabilities (see ‘Professional player statement’ in Methods), perhaps because of the human prior. An ablation (Fig. 3h) shows that using this camera view reduces performance.
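The consequence of a single global target grid can be illustrated with a small sketch (our own illustration, not AlphaStar's implementation; the 256 × 256 grid comes from the text, while the map dimensions and function names are assumptions). Every map location quantizes to the same cell resolution, whether or not it lies under the camera:

```python
def to_grid(x, y, map_w, map_h, grid=256):
    """Quantize a world coordinate onto a global 256x256 target grid.

    One grid covers the whole map, so the spatial resolution per cell
    is identical inside and outside the current camera view.
    """
    gx = min(int(x / map_w * grid), grid - 1)
    gy = min(int(y / map_h * grid), grid - 1)
    return gx, gy

def cell_size(map_w, map_h, grid=256):
    """World-units-per-cell: coarser than a human's pixel-level click
    within the camera, but uniformly available across the map."""
    return map_w / grid, map_h / grid
```

On a hypothetical 128 × 128-unit map, each cell spans 0.5 units, so the agent's off-camera targeting is far finer than a human's minimap click, while its on-camera targeting is coarser than a human's screen click.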
APM limits. Humans are physically limited in the number of actions per minute (APM) they can execute. Our agent includes a monitoring layer that enforces these APM limits, introducing an economy of actions that requires them to be prioritized. Agents may execute at most 22 non-duplicate actions per 5-s window. Converting between agent actions and the APM measured by the game is non-trivial, and agent actions are hard to compare with human actions (a computer can precisely execute a different action at every step). See Fig. 2c and Extended Data Fig. 1 for details of APM.
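A monitoring layer of this kind could be sketched as a sliding-window rate limiter (a simplified illustration, not AlphaStar's actual implementation; the 22-action cap and 5-s window follow the numbers above, and the treatment of duplicates as exempt from the budget is our reading of "non-duplicate"):

```python
from collections import deque

class APMLimiter:
    """Allow at most `max_actions` non-duplicate actions per
    `window_s`-second sliding window (22 per 5 s in the text above)."""

    def __init__(self, max_actions=22, window_s=5.0):
        self.max_actions = max_actions
        self.window_s = window_s
        self.history = deque()   # timestamps of counted actions
        self.last_action = None

    def try_act(self, action, now):
        """Return True if `action` may execute at time `now` (seconds)."""
        # Evict timestamps that have left the sliding window.
        while self.history and now - self.history[0] >= self.window_s:
            self.history.popleft()
        # Repeats of the previous action do not count against the budget.
        if action == self.last_action:
            return True
        if len(self.history) >= self.max_actions:
            return False  # budget exhausted: the action must wait
        self.history.append(now)
        self.last_action = action
        return True
```

A limiter like this forces prioritization: once the window's budget is spent, a new action is deferred until an old one ages out of the window.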
Delays. Humans are limited in how quickly they can react to new information; AlphaStar has two sources of delay. First, in real-time evaluation (not training), AlphaStar incurs a delay of about 110 ms between when a frame is observed and when an action is executed, owing to latency, observation processing, and inference. Second, because agents decide ahead of time when to observe next (on average 370 ms later, but possibly after multiple seconds), they may react late to unexpected situations. The distribution of these delays is shown in Extended Data Fig. 2.
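The interaction of the two delay sources can be sketched as follows (our own illustrative model: the 110 ms and 370 ms figures come from the text, but the exponential shape of the observation gap and all names are assumptions). An event occurring just after an observation cannot be reacted to until the next observation arrives, plus the fixed processing delay:

```python
import random

REACTION_MS = 110.0  # latency + observation processing + inference

def observation_gap_ms(rng, mean_ms=370.0):
    """Agent-chosen time until the next observation; 370 ms on average
    here, but occasionally much longer (assumed exponential)."""
    return rng.expovariate(1.0 / mean_ms)

def reaction_to_surprise_ms(rng):
    """Worst-case-style delay for an unexpected event: the agent must
    first wait for its next observation, then incur the fixed
    observe-to-act delay before its response executes."""
    return observation_gap_ms(rng) + REACTION_MS
```

Under this model the effective reaction time to a surprise is never below 110 ms and is typically a few hundred milliseconds, in line with the distribution referenced above.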
The following quote from StarCraft II professional player Dario ‘TLO’ Wünsch (a member of the team and an author of this paper) describes our interface and limitations.
“The limitations that have been put in place for AlphaStar now mean that it feels very different from the initial show match in January. While AlphaStar has excellent and precise control, it doesn’t feel superhuman, certainly not on a level that a human couldn’t theoretically achieve. It is better in some aspects than humans and also worse in others, but of course there are going to be unavoidable differences between AlphaStar and human players.
I’ve had the pleasure of providing consultation to the AlphaStar team to help ensure that DeepMind’s system does not have any unfair advantages over human players. Overall, it feels very fair, like it is playing a ‘real’ game of StarCraft and doesn’t completely throw the balance off by having unrealistic capabilities. Now that it has limited camera view, when I multi-task it doesn’t always catch everything at the same time, so that aspect also feels very fair and more human-like.”