Deep Blue System Overview

Deep
Feng-hsiung
Blue
Hsu,
S.
Murray
IBM
T.
System
Campbell,
J. Watson
Abstract
and
Grand
Challenge
science is the creation
level chess computer.
problems
of a World
Combining
VLSI
in com-
custom
circuits,
Thompson
Introduction
however,
started
Charles
Babbage,
of playing
back in 1840s, contemplated
chess using
the pos-
about
Engine.
additional
his Analytical
Claude Shannon, at the dawn of the modern computer
age, presented his seminal paper [10] on computer chess
in 1949 at a convention
in New York. Across the Atlantic, Alan Turing
hand simulated
his program
and
played possibly the first computer chess game on record
the same time.
speech
the World
meeting
In 1957, Herbert
[11] predicting
Chess Champion
of Operational
before
Research
would
other
be
to build
ply,
experiments
that
taper
[13, 4] with
the performance
off when
the
increase
search
depth
At shallower
depths,
Belle gained
points (one full chess Class) for each
or close to 100 rating
points
for each
hand,
did not actually
finally
happen
and machine vs.
off effect, on the
when faster
chess
appeared.
1967 in the annual
Society
The
of America.
during
was ten times
for faster
the 1980s.
sity,
using
hardware
First,
a 64-chip
chess board)
there
continued
was Cray
unabated
Blitz
[8] run-
VLSI
(one chip for each square
move generator
[3].
The
on the
first
au-
thor, Hsu, started
a different
machine
in 1985, also
at Carnegie Mellon,
using a one-chip move generator
based on a refined Belle move generator
design, which
later evolved into the Deep Thought
chess automaton [6, 7, 5].
Deep Thought
won the second Fred-
slower.
kin Intermediate
Prize by being the first machine to
achieve Grandmaster
level performance
over 25 consecutive rated games in 1988.
Permission
to make digital/hard
copies of all or part of this material
without fee is granted
provided
that the copies are not made or distributed
Deep Thought,
for profit or commercial advantage, the ACM copyright/server
notice, the title of the publication
and its date appear, and notice is given
that copyright is by permission of the Association for Computing Machinery,
Inc. (ACM).
To copy otherwise,
to republish,to
post on servers
redistribute
to lists, requires
specific
permission
and/or
fee.
ICS ’95
Barcelona,
Spain G 1995 ACM
0-89791-726-6/95/0007.
quest
ning on the Cray supercomputers.
Then there waa the
Hitech chess automaton
from Carnegie Mellon Univer-
USCF (United
State Chess Federation)
scale,
Ten
years later, the top program,
Chess 4.5 [12], reached
Class A level (strong
amateur)}
playing
on a Cyber
This is more than one full class
176 supercomputer.
stronger than when the same program was playing on
which
self-play
in both machine vs. machine
play. His observed tapering
automata
Greenblatt
brought
up a program that started to play
at low Class C (about a weak amateur)
level on the
machine,
8 plies.
200 rating
firmed
human
Simon gave his
a computer
proceeded
doubling
of search speed. (To search one extra ply requires a six-fold increase in search speed on average. )
The basic slope measured by Thompson
was later con-
Despite the early enthusiasm,
computers
were still
playing at rank beginner level in 1966, when Richard
its regular
Jr.
Laboratories
showed
to seriously
exceeded
sibility
[2] of Bell
Thompson’s
Belle,
around
Hoane,
the famous Belle special-purpose
chess automaton.
In
1982, searching about 8 plies (4 moves by White, and
4 moves by Black) in chess middle games, it became
the first computer
to play at USCF National
Master
level and thus claimed the first Fredkin Intermediate
Prize.
keynote
Joseph
Center
program performance,
when used in conjunction
with
brute force searching techniques.
Joe Condon and Ken
Championship
dedicated
massively
parallel
C@ search engines, and
various new search algorithms,
Deep Blue is designed
to be such a computer.
This paper gives an overview
of the system, and examines the prospects of reaching
the goal.
1
A.
Research
By the late 1970s, it became clear that faster hardware speed was the single most effective way to improve
One of the oldest
puter
Overview
or to
both
playing
curve
predicted
to answer
.$3.50
240
and, to a lesser degree,
at well
above
for their
what
search
the
speed.
why this is so. One conjecture
Hitech
Belle
were
self-play
It is difficult
is that
Belle
was not
other
optimized
did not perform
puters
top
for
two machines
greater
search
well back in Belle’s
were much slower, while
programs
ponents.
different
speed,
Selective searching
were.
had significant
heyday
and
the
Table
programs
when com-
searching
Logical
com-
into their
evaluation
functions.
550,000
Technology
0.6 pm 3 metal
Package
144-pin
Clock
Project
The first
3-5 Millon
two authors,
Hsu and Campbell,
previously
worked
joined
together
IBM
new chess chip,
on the Deep
to improve
while
Campbell
Deep Thought
strength
to close to Super
national
rating)
Deep Thought.
Kasparov
won handily
as predicted
by the roughly
350 rating point difference of the two
players. In fact, having previously
studied all the com-
3
games, Kasparov
playing
machine.
another
To counter
difference
could
would
200 points
the
higher
550 potential
need 3 additional
than
rating
the
plies, assuming
Grandmaster
(2600+
inter-
by 1995.
Chip
single-chip
Thought
months.
that
strength
Chess
The
point
continued
Hardware
3.1
very well have been ef-
and Hoane
2 from average Grandmaater
Thought
machine.
Shortly after, in October 1989, the
first ever exhibition
match between a reigning human
World Chess Champion
and a computer
was played in
New York city. The players were Gary Kasparov
and
fectively
Pops
History
in 1989, having
puter’s
CMOS
PGA
30-40 MHz
Freq
Speed
2
1 Million
450,000
Memory
It is also plausible that deep searchers require
types of chess knowledge
to be incorporated
Characteristics
about
Transistors
by late 1980s, all of the
selective
1: Chip
chess move
generator
used
in
Deep
and Deep Thought
2 was designed in about 6
The new chip took over 3 years to complete.
the rating increase per ply can be maintained.
Deep
Thought
was searching about 700,000 Pops (positions
Unlike the move generator chip, the new chip is a fullfledged chess machine
on its own, containing
a new
per second). To reach equity with Kasparov,
if the extrapolation
holds, requires a machine searching at least
move generator,
a sophisticated
hardware
evaluation
function,
and an cY@search controller.
It ccmtains over
one million
transistors
versus the 36,000 or so in the
150 million
Pops.
A two-phased
terim
machine,
with
both
approach
more
and completely
knowledgeable
are comparable
processors
in speed (500,000
First,
2, was completed
Deep Thought
to
Deep
Pops when running
but had more chess knowledge
built
the search would
chess
stand-
for Deep Thought
2 underwent
configuration,
several revisions.
Deep Thought
With
other
findings
Super
Grandmaster
To implement
on a general
description
2 searches
level
when
the equivalent
purpose
in a future
at or
properly
function
computer
using a
to play
would
optiof one
require
a
assuming
coding efficiency
of
per position searched. A rough
of the chip is given here, further
be presented
paper.
basic design characteristics
Table
details
will
1 summarizes
the
of the chip.
Of the three components
of the chip, the evaluation
is by far the biggest, occupying
two thirds of
the chip area. The C@ search controller is tlhe smallest,
using
only
about
570 of the chip area, with the balance
function
Experience
with Deep Thought
2 offered two valuable lessons. Long range chess knowledge
gaps have
to be explicitly
filled in the hardware
and the deeper
the searcher, the more selective the search should be.
with
above
a system
can be expected
10,000 MIPs machine,
2,000-3,000 instructions
parov.
along
new chess chip
chip
3 to 5 million
Pops and plays at close to 2600 international
rating, or about 200 points weaker than Kas-
These lessons,
be much improved,
single
mized.
in, and up
to 24 processors can be running simultaneously
(Logically, up to 32, but only 24 were built).
The software
a 14-processor
new chess ch~ip searches
3-5 million Pops, or about the same as the entire Deep
Thought
2, but since both the evaluation
function
and
2’s chess
Thought’s
A single
old move generator.
an inin 1991
new chess processors,
new software.
processors
alone),
was adopted.
Deep Thought
going to the move generator.
The evaluation
function
contains over half of the logical transistors,
and more
were applied
during the design of a new single chip chess processor
in the second phase of the project.
than
The second phase of the project went into full swing
after the first version of Deep Thought
2 software was
The core part of the move generator
is an 8 by 8
combinatorial
array, one cell per square of the chess
completed
board,
group
in 1991 and Joseph
in late 1990, took
revisions
Hoane,
who joined
over the maintenance
of the search code.
Hsu started
80% of the memory
50 odd RAMs
the
following
and ROMs
transistors,
of various
to work on the
241
over
the same basic design used in the Deep
Thought
move generator
[7]. The
game are embedded
in the wiring
and the
distributed
sizes.
rules cjf the chess
patterns
between
the cells.
Unlike the old move generator
which can
only generate pseudo-legal
moves on demand, the new
move generator core can generate check evasion moves
trapped
pieces, development
and so on. It is perhaps
more complicated
than any chess evaluation
function
ever described in computer
chess literature.
or checking moves aa well. Additional
support to detect
hung pieces and to measure the forcefulness of a check
is also included.
Special circuitry
outside the core keeps
The search control does not really implement
the regular @ search algorithm
[9]. Rather,
it implements
a minimum-window
C@ search algorithm.
This elimi-
track
of the presence
castling)
priate
of special
and generates
moves (en passant
the special
and
nates
parts:
piece placement
and slow evaluation.
assigns
a value
This new
datapath
evaluation,
evaluation
At
the
system
device with
of the
level,
a 17-bit
the
beyond
the software
3.2
Initial
used to look
bles.
part
King
up the possible
centralization
of the endgame
outcome
bonus
evaluation.
piece
The
counts.
piece locations
from
ROM
initial
system
is
station.
ta-
to use more than
is also computed
as
The pawn
is
smaller.
The
pawn
array
computes
whether
theory,
Currently,
the initial
a 16-processor
used to support
a
given pawn is beyond the reach of opponent’s
king and
whether it has a clear path to promote.
A few more
complicated
functions
related to passed pawns are also
chores,
similar
cards,
into an IBM
system
16 processors
IBM
or
to
Deep
each with
RS/6000
8
work-
is not expected
at a time,
although,
SP2 supercomputer
up to 512 chess chips.
to do so at the moment.
embedded
into an 8 by 8 combinatorial
array, which
is somewhat
similar
to the move generator
core but
much
as a 32-bit
The host processor
configured
2. Up to 4 microchannel
are
bitmap
appears
System
and so on are computed
on the
chip
search depth.
chess chips, can be plugged
based
control
machines
in as few cycles as pos-
can then be freed up to do housekeeping
initiate
a search on another chip.
Thought
the XORed
state
address space. One of the possible
the pawns.
Simple endgame assertions such as rule
draws, insufficient
material,
bishops of opposite colors
endgames,
three
addresses, when written,
initiates
a search from the
current
position
on the chip, usually for 4 or 5 plies
precomputed
by software baaed
When a piece captures another
piece, the moving
piece’s from-square
value and the
victim piece’s to-square value are subtracted,
and the
moving piece’s to-square value is added. The endgame
evaluation
maintains
the counts of various chess pieces
still on the chess board, the XORed locations
of all
the pieces of a given piece type, and a bitmaps
of all
For certain
and
sible.
endgame
square
a 16-bit
needed by the search algorithm
into three
piece placement
for each piece on every
chess board, usually
on the game position.
search
controlling
different
sections of the datapath.
Two
of the state machines also control
the move generator indirectly.
The datapath
uses multiple
cascaded
adder/subtracter’s
to compute
the conditional
flags
can be partitioned
evaluation,
The
The
sequence.
time in the move generation
function
stack.
contains
move generator
allows the search to be more selective
than was possible with the old move generator, and the
move ordering is also slightly improved.
The evaluation
the need for a value
moves at the appro-
There
in
can be
is no plan
Each chess chip is about
the
same speed as the entire Deep Thought
Thought
2 is only about 200 points from
2, and Deep
Kasaprov.
It
is conceivable
system
that
a single
card,
8-chip,
could
optimized
be right at Kasparov’s
level, when properly
and against an unprepared
Kasparov.
The first release
of the software is adapted from Deep Thought
2 software, which almost certainly
is not optimal
with respect to the new hardware.
computed
by the pawn array.
Both the piece placement evaluation
function
and the endgame evaluation
are computed
in one single cycle. The slow evaluation
is the single most complicated
element on the chip, occupying about half of the chip core, and takes 10 cycles
The purpose for building
this initial system is mainly
for debugging
and preparing
for the chip production
to compute.
run
tion
At very shallow
has to be computed
depths,
for about
tions searched, but at the realistic
percentage drops to around 1570.
haa a 3-stage
array
that
pipeline,
starting
runs for 8 cycles,
the slow evaluahalf
search depths,
The
with
slow
which
also should
of the posi-
native
the
will
be used to build
eat Deep Thought
English
speakers,
it should
beat
system.
It
[For non-
Deep Thought
2 consistently.]
evaluation
a 8 by 1 systolic
one cycle
per file.
This
3.3
array is followed by 40 odd synchronous
RAMs, and
then an adder tree that accumulates
the results.
The
slow evaluation
sequence can be stopped on the fly by
the controller
to reduce power consumption.
The slow
evaluation
computes values for chess concepts such as
Final
System
The design of the final system is not yet complete,
The
system is to be mounted on a rack about 6 feet tall, and
has a footprint
about the same as an office refrigerator.
A PowerPC
based host inside the rack controls
a 2 l-slot 9U full depth VME box which contains the
square control,
pins, xrays, king safety, pawn structure, passed pawns, ray control, outposts, pawn major-
chess chips and a hardware
ity, rook on the 7th~ blockade,
timate
restraint,
the final
2 for lunch.
color complex,
242
is that
hash table.
up to 8 boards
The current
of chess chips,
es-
128 chips
final system
place.
per board for a total of 1,024 chips, up to 4 boards
for the hardware hash table and a host interface board
would be inside the VME
tion
is expected
box. The total
to be around
raw speed would
Pops.
Assuming
5
a 30%
the same task would
Charles Babbage never did get his Analytical
Engine to
play chess, settling for the much simpler goal of playing tic-tac-toe.
Herbert
Simon’s prediction
never did
need at
come true.
Kasparov,
the human
said in 1988, the very
scored
Since the design of the the final system is incomplete
at the moment,
the descriptions
of the software here
are limited
to the software of the initial
system. It is
essentially the same as the Deep Thought
2 software in
basic structure.
supports
and then
gives the positions
search the remaining
The
about
the order
sor some time
new
ate
search
ware
to process
searches.
Most
software
uses a revised
brute
of
part
version
selectivity
parallelization
software
the
selectivity
The
of singular
efficiency
is around
force search.
of calculations,
efficiency
to around
chip
hard-
the
search
of
software
search
The
chess chip
evaluation
hardware
function,
8070 when
not
a
part
ware.
Only
be
for providing
of the project,
and Gershon
Brody
assistance.
a fairly
Anantharaman,
Hsu.
compli-
ing,
to
Spring
Murray
Singular
brute
force
Symposium,
pages 8-13,
March
Campbell,
extensions:
searching.
Computer
and
Adding
In
1988
Game
Play-
1988.
all of the evalua[2] J. H. Condon and K. Thompson.
Bells chess hardware.
In M. R. B. Clarke, editor,
Advances in
Computer
[3] Carl Ebeling.
chitecture for
for long range planning.
moves with
dramatically
Chess
University,
All
the
Chess.
April
[4] J. H, Gondon
Pergamon
Right
Moves:
A
Press,
243
Ar-
Mellon
1986.
and Ken Thompson.
[5] F. Hsu, T. Anantharaman,
A. Nowatzyk.
Deep thought.
the
VLSI
PhD thesis, Carnegie
13elle. In Pe-
Chess Skill
in Man
ter W. Frey, editor,
pages 20 1–2 10. Springer-Verlag,
chine,
tion, 1983.
good results
when
9, pages 45–54.
1982.
in the opening book. The opening book
It is expected that the opening
revised.
is constantly
book will have to be changed
and Deep Blue could
port
AAAI
of the search tree.
the most frequent
is at hand
Moulic, C. J.
for their sup-
selectivity
An opening book created from a chess game database with 300,000 games makes up the rest of the softare included
between.
The authors would like to thank Randy
Tan, Zeev Barzilai and Maurizio
Arienzo
Feng-hsiung
The software also accesses the on-line 5-piece endgame
databases when a 5-piece endgame position is reached
in the software
in
to do it.
[1] Thomas
posi-
downloads
them onto the chips in parallel.
Furthermore, in the software part of the search, a complementary software evaluation
is also computed.
The softis used mainly
somewhere
References
tion is done in hardware.
The software precomputes
the evaluation
tables for each game position
and then
ware evaluation
lies
inevitably,
computers
will overtake
World Champion.
The authors be-
requiring
the parallelization
supports
ovsr a Grand-
The
doing
509’0. For typical
but
Deep Thought
by a computer
victory
the time
engineering
tions, with a brute force search depth of 12 plies the
average maximum
depth reached is around 36 plies.
cated
Chess Champion,
[1], which
algorithm.
For chess positions
very long sequences
can drop
the
of the
extensions
than the original
World
Acknowledgements
and initi-
adjust
behavior
of search.
com-
search takes
results
can
the
have been
about
same year that
probably
the machine
gives the host proces-
the search
to change
searches.
has greater
simple
The
parameters
is in the
Each 4-ply
of 1 ms, which
truth
lieve that
say, 8 plies,
to the the chess chips to
4 plies.
the first
Eventually
and
even the human
the initial
system hardware.
The chess chips are used
aa hardware
coroutines
by the software.
For an 12searches the first,
scientists
predictions
maater in a tournament
game, that computers
will
never beat a Grandmaster
before the year 2000. Very
famous chess players have been known to make too optimistic
predictions
about computer
chess.
up to 32 chess chips as does
ply search, the software
computer
chess.
Gary
Software
Very famous
to make too optimistic
puter
The software
takes
Conclusion
known
4
the Match
The aggregate
parallelization
efficiency, the effective speed would be
around 1 billion
Pops, or greater than one thousand
times the speed of Deep Thought.
A general purpose
machine used to perform
least one Teraops.
and before
power dissipa-
2-3 KW.
be 3-5 billion
is completed
and Ma2nd edi-
and
M. Campbell,
In T. A. Marsland
and J. Schaeffer, editors,
Comput em, Chess,
and
pages 55–78. Springer-Verlag,
1990.
Cognition,
[6] Feng-hsiung
Hsu. A two million
chip chess move generator.
single
Digest
of
Technical
Papers,
moves/see cmos
In 1987 lSSCC
page
278,
February
1987.
[7] Feng-hsiung
Alpha-Beta
tural
Hsu.
Large
Search:
An
Study.
sity, February
PhD thesis,
1990.
Scale
Parallelization
Algorithmic
and
Carnegie
of
Architec-
Mellon
Univer-
[8] Robert
M. Hyatt.
Parallel
chess on the tray xInternational
Computer
Chess Associamp/48.
pages 90–99, 1985.
tion Journal,
[9]
Donald
analysis
E. Knuth
and Ronald
of alpha-beta
pruning.
ligence,
6:293–326,
W. Moore.
An
IntelArtificial
1975.
[10] C.E. Shannon. Programming
a computer for playPhilosophical
Magazine,
41:256-275,
ing chess.
1950.
[11] H. A. Simon and A Newell. Heuristic
ing: The next advance in operations
e~ations
6(1):1-10,
Research,
at the 1957 meeting
problem solvresearch. Op-
1958. Address
of Operations
Research
given
Soci-
ety of America.
[12] David J. Slate and Lawrence R. Atkin.
Chess 4.5the northwestern
university
chess program.
In PeChess Skill
in Man
and Mater W. I?Yey, editor,
chine, pages 82-118. Springer-Verlag,
1977.
[13] K.
M.
Thompson.
R. B. Clarke,
Chess
Computer
editor,
9, pages 55-56.
chess strength.
Advances
Pergamon
in
In
Computer
Press, 1982.
244