French Defence Chess Opening: Leela Chess 445 vs Equinox (CCRL Rating: 3181)

AlphaZero is a computer program developed by DeepMind, the Alphabet-owned AI research company. It uses an approach similar to AlphaGo Zero’s to master not just Go but also chess and shogi. On December 5, 2017, the DeepMind team released a preprint introducing AlphaZero, which within 24 hours of training achieved a superhuman level of play in all three games, defeating the world-champion programs Stockfish and elmo and the 3-day version of AlphaGo Zero. In each case it ran on custom tensor processing units (TPUs), hardware that the Google programs were optimized for.[1] AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After just four hours of training, DeepMind estimated that AlphaZero was playing at a higher Elo rating than Stockfish; after nine hours of training, the algorithm decisively defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws).[1][2][3] The trained algorithm played on a single machine with four TPUs.

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:[1]

AZ has hard-coded rules for setting search hyperparameters.
The neural network is now updated continually.
Go (unlike chess) is symmetric under certain reflections and rotations; AlphaGo Zero was programmed to take advantage of these symmetries. AlphaZero is not.
Chess, unlike Go, can end in a draw; AlphaZero therefore takes the possibility of a drawn game into account.
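
The symmetry point can be made concrete with a small sketch. The code below is not DeepMind's implementation; it only illustrates, with hypothetical helper names (`rotate90`, `reflect`, `dihedral_symmetries`) and a made-up 3x3 position, how a square Go board has eight equivalent images under rotation and reflection.

```python
def rotate90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror a board left to right."""
    return [list(row[::-1]) for row in board]


def dihedral_symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    images = []
    current = [list(row) for row in board]
    for _ in range(4):
        images.append(current)           # the current rotation
        images.append(reflect(current))  # and its mirror image
        current = rotate90(current)
    return images


# Hypothetical 3x3 corner of a Go position: "B" black, "W" white, "." empty.
position = [
    ["B", ".", "."],
    ["W", "B", "."],
    [".", ".", "."],
]

variants = dihedral_symmetries(position)
distinct = {tuple(map(tuple, v)) for v in variants}
print(len(variants), "images,", len(distinct), "distinct")
```

A program can treat all eight images of a Go position as equivalent, for example when generating training data. The same transformations do not preserve a chess position, because pawns move in only one direction and castling depends on the side of the board.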
In its Monte Carlo tree search, AlphaZero examines just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for elmo, roughly 875 times fewer in both games. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations.[1]
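
One way a network can steer a tree search toward promising variations is a PUCT-style selection rule, in which each candidate move is scored by its average value so far plus an exploration bonus weighted by the network's prior probability. The sketch below is an illustration under stated assumptions, not AlphaZero's code: the `Child` class, the constant `c_puct=1.5`, and the prior values are all hypothetical, and the text above does not specify the exact formula used.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float            # network's probability for this move (made-up values)
    visits: int = 0         # how often the search has tried this move
    value_sum: float = 0.0  # sum of evaluations backed up through this move

    def mean_value(self):
        """Average evaluation so far; 0.0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(children, c_puct=1.5):
    """Pick the child maximizing mean value plus a prior-weighted bonus."""
    total_visits = sum(c.visits for c in children)

    def score(c):
        # The bonus shrinks as a move accumulates visits.
        bonus = c_puct * c.prior * math.sqrt(total_visits + 1) / (1 + c.visits)
        return c.mean_value() + bonus

    return max(children, key=score)


# With no visits yet, the move the network prefers is tried first.
fresh = [Child("e4", 0.7), Child("d4", 0.2), Child("a3", 0.1)]
print(select_child(fresh).move)

# After "e4" has been tried ten times with poor results, the search moves on.
later = [Child("e4", 0.7, visits=10, value_sum=-5.0),
         Child("d4", 0.2), Child("a3", 0.1)]
print(select_child(later).move)
```

Because low-prior moves receive only a small bonus, most of the search budget goes to a handful of candidate lines, which is how a search visiting far fewer positions can still be competitive.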

AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero’s performance exceeded the benchmark after around four hours of training for Stockfish, two hours for elmo, and eight hours for AlphaGo Zero.[1]

In AlphaZero’s chess tournament against Stockfish 8 (2016 TCEC world champion), each program was given one minute of thinking time per move. Stockfish was allocated 64 threads and a hash size of 1 GB,[1] a setting that Stockfish’s Tord Romstad later criticized as suboptimal.[4][note 1] AlphaZero was trained on chess for a total of nine hours before the tournament. During the tournament, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal start position, AlphaZero won 25 games as White and 3 as Black, and drew the remaining 72.[6] In a series of twelve 100-game matches (with unspecified time or resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886, and lost 24.[1]
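
To put the match results on a common scale, they can be converted to a score fraction and then to an approximate rating gap using the standard logistic Elo model. This is a rough sketch for illustration only: the function names are hypothetical, and DeepMind's own Elo estimates were not necessarily computed this way.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Rating gap implied by an expected score under the logistic Elo model."""
    return 400 * math.log10(score / (1 - score))


# Results quoted above: 100 games from the start position, and the
# twelve 100-game matches from popular openings.
start_position = score_fraction(wins=28, draws=72, losses=0)
opening_series = score_fraction(wins=290, draws=886, losses=24)

print(f"start position: {start_position:.3f} score, "
      f"{elo_difference(start_position):.0f} Elo")
print(f"opening series: {opening_series:.3f} score, "
      f"{elo_difference(opening_series):.0f} Elo")
```

Under this model the 100-game result corresponds to a gap of roughly 100 Elo points, and the opening series to roughly 78. Both figures apply only to the specific hardware and time settings of those matches.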

