Professional StarCraft players such as TLO and MaNa are able to issue hundreds of actions per minute (APM) on average. This is far fewer than most existing bots, which control each unit independently and consistently sustain thousands or even tens of thousands of APM.
In its games against TLO and MaNa, AlphaStar had an average APM of around 280, significantly lower than that of the professional players, although its actions may be more precise. This lower APM is, in part, because AlphaStar starts its training from replays and thus mimics the way humans play the game. Additionally, AlphaStar reacts with a delay between observation and action of 350ms on average.
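To make these two statistics concrete, the sketch below computes average APM and mean observation-to-action delay from timestamped events. The event format is hypothetical, purely for illustration; it is not AlphaStar's actual logging.

```python
from statistics import mean

def average_apm(action_times_s):
    """Average actions per minute, given action timestamps in seconds."""
    if len(action_times_s) < 2:
        return 0.0
    duration_min = (action_times_s[-1] - action_times_s[0]) / 60.0
    return len(action_times_s) / duration_min

def mean_reaction_delay_ms(obs_action_pairs):
    """Mean delay between an observation and the action it triggered.

    `obs_action_pairs` is a list of (observation_time_s, action_time_s) tuples.
    """
    return mean((a - o) * 1000.0 for o, a in obs_action_pairs)

# Example reproducing the figures quoted above: ~280 APM and ~350 ms delay.
actions = [i * 60.0 / 280.0 for i in range(2800)]  # 10 minutes at 280 APM
print(round(average_apm(actions)))                             # -> 280
print(round(mean_reaction_delay_ms([(0.0, 0.35), (1.0, 1.35)])))  # -> 350
```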
Figure: The distribution of AlphaStar's APM in its matches against MaNa and TLO, and the total delay between observations and actions. Clarification (29/01/19): TLO's APM appears higher than both AlphaStar's and MaNa's because of his use of rapid-fire hot-keys and of the "remove and add to control group" key bindings. Also note that AlphaStar's effective APM bursts are sometimes higher than both players'.
During the matches against TLO and MaNa, AlphaStar interacted with the StarCraft game engine directly via its raw interface, meaning that it could observe the attributes of its own and its opponent's visible units anywhere on the map, without having to move the camera, effectively playing with a zoomed-out view of the game. In contrast, human players must explicitly manage an "economy of attention" to decide where to focus the camera. However, analysis of AlphaStar's games suggests that it manages an implicit focus of attention: on average, agents "switched context" about 30 times per minute, a rate similar to MaNa's or TLO's.
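The post does not define exactly how a "context switch" is measured. One plausible proxy, sketched below under that assumption, is to count consecutive actions whose target locations jump more than some threshold distance apart; the threshold and coordinate units here are illustrative.

```python
def context_switches_per_minute(action_targets, game_minutes, threshold=20.0):
    """Count attention 'context switches' per minute.

    `action_targets` is a chronological list of (x, y) world coordinates of
    the agent's actions. A switch is counted whenever consecutive actions
    target points farther apart than `threshold` world units. This is a
    hypothetical proxy, not the metric used in the analysis above.
    """
    switches = sum(
        1
        for (x0, y0), (x1, y1) in zip(action_targets, action_targets[1:])
        if ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 > threshold
    )
    return switches / game_minutes
```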
Additionally, and subsequent to the matches, we developed a second version of AlphaStar. Like human players, this version of AlphaStar chooses when and where to move the camera, its perception is restricted to on-screen information, and action locations are restricted to its viewable region.
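Readers who want to experiment with the distinction between the two interfaces can approximate it with DeepMind's open-source pysc2 environment. The configuration below is an assumption-laden sketch, not AlphaStar's actual setup: the flag names (use_raw_units, use_raw_actions, use_feature_units) come from recent pysc2 releases and may differ across versions.

```python
from pysc2.env import sc2_env
from pysc2.lib import features

# "Raw"-style interface: unit attributes for all visible units, map-wide,
# with no camera to manage (roughly how AlphaStar played the show matches).
raw_format = features.AgentInterfaceFormat(
    feature_dimensions=features.Dimensions(screen=84, minimap=64),
    use_raw_units=True,
    use_raw_actions=True,
)

# "Camera"-style interface: unit data and actions are limited to what is
# currently on screen, so the agent must also decide where to look.
camera_format = features.AgentInterfaceFormat(
    feature_dimensions=features.Dimensions(screen=84, minimap=64),
    use_feature_units=True,
)

env = sc2_env.SC2Env(
    map_name="Simple64",
    players=[sc2_env.Agent(sc2_env.Race.protoss),
             sc2_env.Bot(sc2_env.Race.protoss, sc2_env.Difficulty.easy)],
    agent_interface_format=camera_format,  # or raw_format
    step_mul=8,
)
```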
Figure: Performance of AlphaStar using the raw interface and the camera interface, showing the newly trained camera agent rapidly catching up with, and almost equalling, the performance of the agent using the raw interface.
We trained two new agents, one using the raw interface and one that must learn to control the camera, against the AlphaStar league. Each agent was initially trained by supervised learning from human data, followed by the reinforcement learning procedure outlined above. The version of AlphaStar using the camera interface was almost as strong as the agent using the raw interface, exceeding 7000 MMR on our internal leaderboard. In an exhibition match, MaNa defeated a prototype version of AlphaStar using the camera interface that had been trained for just 7 days. We hope to evaluate a fully trained instance of the camera interface in the near future.
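Both new agents follow the same two-phase recipe: supervised imitation of human replays, then reinforcement learning against the league. The skeleton below is purely illustrative; every name (the agent's methods, play_match, the league list) is a placeholder for machinery the post does not specify.

```python
import random

def train_two_phase(agent, replay_batches, league, rl_games, play_match):
    """Illustrative two-phase training loop; all names are placeholders."""
    # Phase 1: supervised learning, imitating human actions from replays.
    for obs, human_action in replay_batches:
        agent.update(agent.supervised_loss(obs, human_action))

    # Phase 2: reinforcement learning against a league of frozen agents.
    league.append(agent.frozen_copy())  # seed the league with the SL agent
    for game in range(rl_games):
        opponent = random.choice(league)          # simplistic matchmaking
        trajectory = play_match(agent, opponent)  # play one full game
        agent.update(agent.rl_loss(trajectory))
        if game % 1000 == 999:                    # periodically freeze a snapshot
            league.append(agent.frozen_copy())    # past versions stay as opponents
```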
These results suggest that AlphaStar's success against MaNa and TLO was in fact due to superior macro- and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface.