Deep Feng-hsiung Blue Hsu, S. Murray IBM T. System Campbell, J. Watson Abstract and Grand Challenge science is the creation level chess computer. problems of a World Combining VLSI in com- custom circuits, Thompson Introduction however, started Charles Babbage, of playing back in 1840s, contemplated chess using the pos- about Engine. additional his Analytical Claude Shannon, at the dawn of the modern computer age, presented his seminal paper [10] on computer chess in 1949 at a convention in New York. Across the Atlantic, Alan Turing hand simulated his program and played possibly the first computer chess game on record the same time. speech the World meeting In 1957, Herbert [11] predicting Chess Champion of Operational before Research would other be to build ply, experiments that taper [13, 4] with the performance off when the increase search depth At shallower depths, Belle gained points (one full chess Class) for each or close to 100 rating points for each hand, did not actually finally happen and machine vs. off effect, on the when faster chess appeared. 1967 in the annual Society The of America. during was ten times for faster the 1980s. sity, using hardware First, a 64-chip chess board) there continued was Cray unabated Blitz [8] run- VLSI (one chip for each square move generator [3]. The on the first au- thor, Hsu, started a different machine in 1985, also at Carnegie Mellon, using a one-chip move generator based on a refined Belle move generator design, which later evolved into the Deep Thought chess automaton [6, 7, 5]. Deep Thought won the second Fred- slower. kin Intermediate Prize by being the first machine to achieve Grandmaster level performance over 25 consecutive rated games in 1988. Permission to make digital/hard copies of all or part of this material without fee is granted provided that the copies are not made or distributed Deep Thought, for profit or commercial advantage, the ACM copyright/server notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the Association for Computing Machinery, Inc. (ACM). To copy otherwise, to republish,to post on servers redistribute to lists, requires specific permission and/or fee. ICS ’95 Barcelona, Spain G 1995 ACM 0-89791-726-6/95/0007. quest ning on the Cray supercomputers. Then there waa the Hitech chess automaton from Carnegie Mellon Univer- USCF (United State Chess Federation) scale, Ten years later, the top program, Chess 4.5 [12], reached Class A level (strong amateur)} playing on a Cyber This is more than one full class 176 supercomputer. stronger than when the same program was playing on which self-play in both machine vs. machine play. His observed tapering automata Greenblatt brought up a program that started to play at low Class C (about a weak amateur) level on the machine, 8 plies. 200 rating firmed human Simon gave his a computer proceeded doubling of search speed. (To search one extra ply requires a six-fold increase in search speed on average. ) The basic slope measured by Thompson was later con- Despite the early enthusiasm, computers were still playing at rank beginner level in 1966, when Richard its regular Jr. Laboratories showed to seriously exceeded sibility [2] of Bell Thompson’s Belle, around Hoane, the famous Belle special-purpose chess automaton. In 1982, searching about 8 plies (4 moves by White, and 4 moves by Black) in chess middle games, it became the first computer to play at USCF National Master level and thus claimed the first Fredkin Intermediate Prize. keynote Joseph Center program performance, when used in conjunction with brute force searching techniques. Joe Condon and Ken Championship dedicated massively parallel C@ search engines, and various new search algorithms, Deep Blue is designed to be such a computer. This paper gives an overview of the system, and examines the prospects of reaching the goal. 1 A. Research By the late 1970s, it became clear that faster hardware speed was the single most effective way to improve One of the oldest puter Overview or to both playing curve predicted to answer .$3.50 240 and, to a lesser degree, at well above for their what search the speed. why this is so. One conjecture Hitech Belle were self-play It is difficult is that Belle was not other optimized did not perform puters top for two machines greater search well back in Belle’s were much slower, while programs ponents. different speed, Selective searching were. had significant heyday and the Table programs when com- searching Logical com- into their evaluation functions. 550,000 Technology 0.6 pm 3 metal Package 144-pin Clock Project The first 3-5 Millon two authors, Hsu and Campbell, previously worked joined together IBM new chess chip, on the Deep to improve while Campbell Deep Thought strength to close to Super national rating) Deep Thought. Kasparov won handily as predicted by the roughly 350 rating point difference of the two players. In fact, having previously studied all the com- 3 games, Kasparov playing machine. another To counter difference could would 200 points the higher 550 potential need 3 additional than rating the plies, assuming Grandmaster (2600+ inter- by 1995. Chip single-chip Thought months. that strength Chess The point continued Hardware 3.1 very well have been ef- and Hoane 2 from average Grandmaater Thought machine. Shortly after, in October 1989, the first ever exhibition match between a reigning human World Chess Champion and a computer was played in New York city. The players were Gary Kasparov and fectively Pops History in 1989, having puter’s CMOS PGA 30-40 MHz Freq Speed 2 1 Million 450,000 Memory It is also plausible that deep searchers require types of chess knowledge to be incorporated Characteristics about Transistors by late 1980s, all of the selective 1: Chip chess move generator used in Deep and Deep Thought 2 was designed in about 6 The new chip took over 3 years to complete. the rating increase per ply can be maintained. Deep Thought was searching about 700,000 Pops (positions Unlike the move generator chip, the new chip is a fullfledged chess machine on its own, containing a new per second). To reach equity with Kasparov, if the extrapolation holds, requires a machine searching at least move generator, a sophisticated hardware evaluation function, and an cY@search controller. It ccmtains over one million transistors versus the 36,000 or so in the 150 million Pops. A two-phased terim machine, with both approach more and completely knowledgeable are comparable processors in speed (500,000 First, 2, was completed Deep Thought to Deep Pops when running but had more chess knowledge built the search would chess stand- for Deep Thought 2 underwent configuration, several revisions. Deep Thought With other findings Super Grandmaster To implement on a general description 2 searches level when the equivalent purpose in a future at or properly function computer using a to play would optiof one require a assuming coding efficiency of per position searched. A rough of the chip is given here, further be presented paper. basic design characteristics Table details will 1 summarizes the of the chip. Of the three components of the chip, the evaluation is by far the biggest, occupying two thirds of the chip area. The C@ search controller is tlhe smallest, using only about 570 of the chip area, with the balance function Experience with Deep Thought 2 offered two valuable lessons. Long range chess knowledge gaps have to be explicitly filled in the hardware and the deeper the searcher, the more selective the search should be. with above a system can be expected 10,000 MIPs machine, 2,000-3,000 instructions parov. along new chess chip chip 3 to 5 million Pops and plays at close to 2600 international rating, or about 200 points weaker than Kas- These lessons, be much improved, single mized. in, and up to 24 processors can be running simultaneously (Logically, up to 32, but only 24 were built). The software a 14-processor new chess ch~ip searches 3-5 million Pops, or about the same as the entire Deep Thought 2, but since both the evaluation function and 2’s chess Thought’s A single old move generator. an inin 1991 new chess processors, new software. processors alone), was adopted. Deep Thought going to the move generator. The evaluation function contains over half of the logical transistors, and more were applied during the design of a new single chip chess processor in the second phase of the project. than The second phase of the project went into full swing after the first version of Deep Thought 2 software was The core part of the move generator is an 8 by 8 combinatorial array, one cell per square of the chess completed board, group in 1991 and Joseph in late 1990, took revisions Hoane, who joined over the maintenance of the search code. Hsu started 80% of the memory 50 odd RAMs the following and ROMs transistors, of various to work on the 241 over the same basic design used in the Deep Thought move generator [7]. The game are embedded in the wiring and the distributed sizes. rules cjf the chess patterns between the cells. Unlike the old move generator which can only generate pseudo-legal moves on demand, the new move generator core can generate check evasion moves trapped pieces, development and so on. It is perhaps more complicated than any chess evaluation function ever described in computer chess literature. or checking moves aa well. Additional support to detect hung pieces and to measure the forcefulness of a check is also included. Special circuitry outside the core keeps The search control does not really implement the regular @ search algorithm [9]. Rather, it implements a minimum-window C@ search algorithm. This elimi- track of the presence castling) priate of special and generates moves (en passant the special and nates parts: piece placement and slow evaluation. assigns a value This new datapath evaluation, evaluation At the system device with of the level, a 17-bit the beyond the software 3.2 Initial used to look bles. part King up the possible centralization of the endgame outcome bonus evaluation. piece The counts. piece locations from ROM initial system is station. ta- to use more than is also computed as The pawn is smaller. The pawn array computes whether theory, Currently, the initial a 16-processor used to support a given pawn is beyond the reach of opponent’s king and whether it has a clear path to promote. A few more complicated functions related to passed pawns are also chores, similar cards, into an IBM system 16 processors IBM or to Deep each with RS/6000 8 work- is not expected at a time, although, SP2 supercomputer up to 512 chess chips. to do so at the moment. embedded into an 8 by 8 combinatorial array, which is somewhat similar to the move generator core but much as a 32-bit The host processor configured 2. Up to 4 microchannel are bitmap appears System and so on are computed on the chip search depth. chess chips, can be plugged based control machines in as few cycles as pos- can then be freed up to do housekeeping initiate a search on another chip. Thought the XORed state address space. One of the possible the pawns. Simple endgame assertions such as rule draws, insufficient material, bishops of opposite colors endgames, three addresses, when written, initiates a search from the current position on the chip, usually for 4 or 5 plies precomputed by software baaed When a piece captures another piece, the moving piece’s from-square value and the victim piece’s to-square value are subtracted, and the moving piece’s to-square value is added. The endgame evaluation maintains the counts of various chess pieces still on the chess board, the XORed locations of all the pieces of a given piece type, and a bitmaps of all For certain and sible. endgame square a 16-bit needed by the search algorithm into three piece placement for each piece on every chess board, usually on the game position. search controlling different sections of the datapath. Two of the state machines also control the move generator indirectly. The datapath uses multiple cascaded adder/subtracter’s to compute the conditional flags can be partitioned evaluation, The The sequence. time in the move generation function stack. contains move generator allows the search to be more selective than was possible with the old move generator, and the move ordering is also slightly improved. The evaluation the need for a value moves at the appro- There in can be is no plan Each chess chip is about the same speed as the entire Deep Thought Thought 2 is only about 200 points from 2, and Deep Kasaprov. It is conceivable system that a single card, 8-chip, could optimized be right at Kasparov’s level, when properly and against an unprepared Kasparov. The first release of the software is adapted from Deep Thought 2 software, which almost certainly is not optimal with respect to the new hardware. computed by the pawn array. Both the piece placement evaluation function and the endgame evaluation are computed in one single cycle. The slow evaluation is the single most complicated element on the chip, occupying about half of the chip core, and takes 10 cycles The purpose for building this initial system is mainly for debugging and preparing for the chip production to compute. run tion At very shallow has to be computed depths, for about tions searched, but at the realistic percentage drops to around 1570. haa a 3-stage array that pipeline, starting runs for 8 cycles, the slow evaluahalf search depths, The with slow which also should of the posi- native the will be used to build eat Deep Thought English speakers, it should beat system. It [For non- Deep Thought 2 consistently.] evaluation a 8 by 1 systolic one cycle per file. This 3.3 array is followed by 40 odd synchronous RAMs, and then an adder tree that accumulates the results. The slow evaluation sequence can be stopped on the fly by the controller to reduce power consumption. The slow evaluation computes values for chess concepts such as Final System The design of the final system is not yet complete, The system is to be mounted on a rack about 6 feet tall, and has a footprint about the same as an office refrigerator. A PowerPC based host inside the rack controls a 2 l-slot 9U full depth VME box which contains the square control, pins, xrays, king safety, pawn structure, passed pawns, ray control, outposts, pawn major- chess chips and a hardware ity, rook on the 7th~ blockade, timate restraint, the final 2 for lunch. color complex, 242 is that hash table. up to 8 boards The current of chess chips, es- 128 chips final system place. per board for a total of 1,024 chips, up to 4 boards for the hardware hash table and a host interface board would be inside the VME tion is expected box. The total to be around raw speed would Pops. Assuming 5 a 30% the same task would Charles Babbage never did get his Analytical Engine to play chess, settling for the much simpler goal of playing tic-tac-toe. Herbert Simon’s prediction never did need at come true. Kasparov, the human said in 1988, the very scored Since the design of the the final system is incomplete at the moment, the descriptions of the software here are limited to the software of the initial system. It is essentially the same as the Deep Thought 2 software in basic structure. supports and then gives the positions search the remaining The about the order sor some time new ate search ware to process searches. Most software uses a revised brute of part version selectivity parallelization software the selectivity The of singular efficiency is around force search. of calculations, efficiency to around chip hard- the search of software search The chess chip evaluation hardware function, 8070 when not a part ware. Only be for providing of the project, and Gershon Brody assistance. a fairly Anantharaman, Hsu. compli- ing, to Spring Murray Singular brute force Symposium, pages 8-13, March Campbell, extensions: searching. Computer and Adding In 1988 Game Play- 1988. all of the evalua[2] J. H. Condon and K. Thompson. Bells chess hardware. In M. R. B. Clarke, editor, Advances in Computer [3] Carl Ebeling. chitecture for for long range planning. moves with dramatically Chess University, All the Chess. April [4] J. H, Gondon Pergamon Right Moves: A Press, 243 Ar- Mellon 1986. and Ken Thompson. [5] F. Hsu, T. Anantharaman, A. Nowatzyk. Deep thought. the VLSI PhD thesis, Carnegie 13elle. In Pe- Chess Skill in Man ter W. Frey, editor, pages 20 1–2 10. Springer-Verlag, chine, tion, 1983. good results when 9, pages 45–54. 1982. in the opening book. The opening book It is expected that the opening revised. is constantly book will have to be changed and Deep Blue could port AAAI of the search tree. the most frequent is at hand Moulic, C. J. for their sup- selectivity An opening book created from a chess game database with 300,000 games makes up the rest of the softare included between. The authors would like to thank Randy Tan, Zeev Barzilai and Maurizio Arienzo Feng-hsiung The software also accesses the on-line 5-piece endgame databases when a 5-piece endgame position is reached in the software in to do it. [1] Thomas posi- downloads them onto the chips in parallel. Furthermore, in the software part of the search, a complementary software evaluation is also computed. The softis used mainly somewhere References tion is done in hardware. The software precomputes the evaluation tables for each game position and then ware evaluation lies inevitably, computers will overtake World Champion. The authors be- requiring the parallelization supports ovsr a Grand- The doing 509’0. For typical but Deep Thought by a computer victory the time engineering tions, with a brute force search depth of 12 plies the average maximum depth reached is around 36 plies. cated Chess Champion, [1], which algorithm. For chess positions very long sequences can drop the of the extensions than the original World Acknowledgements and initi- adjust behavior of search. com- search takes results can the have been about same year that probably the machine gives the host proces- the search to change searches. has greater simple The parameters is in the Each 4-ply of 1 ms, which truth lieve that say, 8 plies, to the the chess chips to 4 plies. the first Eventually and even the human the initial system hardware. The chess chips are used aa hardware coroutines by the software. For an 12searches the first, scientists predictions maater in a tournament game, that computers will never beat a Grandmaster before the year 2000. Very famous chess players have been known to make too optimistic predictions about computer chess. up to 32 chess chips as does ply search, the software computer chess. Gary Software Very famous to make too optimistic puter The software takes Conclusion known 4 the Match The aggregate parallelization efficiency, the effective speed would be around 1 billion Pops, or greater than one thousand times the speed of Deep Thought. A general purpose machine used to perform least one Teraops. and before power dissipa- 2-3 KW. be 3-5 billion is completed and Ma2nd edi- and M. Campbell, In T. A. Marsland and J. Schaeffer, editors, Comput em, Chess, and pages 55–78. Springer-Verlag, 1990. Cognition, [6] Feng-hsiung Hsu. A two million chip chess move generator. single Digest of Technical Papers, moves/see cmos In 1987 lSSCC page 278, February 1987. [7] Feng-hsiung Alpha-Beta tural Hsu. Large Search: An Study. sity, February PhD thesis, 1990. Scale Parallelization Algorithmic and Carnegie of Architec- Mellon Univer- [8] Robert M. Hyatt. Parallel chess on the tray xInternational Computer Chess Associamp/48. pages 90–99, 1985. tion Journal, [9] Donald analysis E. Knuth and Ronald of alpha-beta pruning. ligence, 6:293–326, W. Moore. An IntelArtificial 1975. [10] C.E. Shannon. Programming a computer for playPhilosophical Magazine, 41:256-275, ing chess. 1950. [11] H. A. Simon and A Newell. Heuristic ing: The next advance in operations e~ations 6(1):1-10, Research, at the 1957 meeting problem solvresearch. Op- 1958. Address of Operations Research given Soci- ety of America. [12] David J. Slate and Lawrence R. Atkin. Chess 4.5the northwestern university chess program. In PeChess Skill in Man and Mater W. I?Yey, editor, chine, pages 82-118. Springer-Verlag, 1977. [13] K. M. Thompson. R. B. Clarke, Chess Computer editor, 9, pages 55-56. chess strength. Advances Pergamon in In Computer Press, 1982. 244
© Copyright 2026 Paperzz