One Flip per Clock Cycle
Martin Henz, Edgar Tan, Roland Yap
SAT Problems
Find an assignment of n variables that satisfies
all m clauses (disjunctions of literals of
variables)
Notation:
V: array of boolean values; V[3] is the value
of the third variable in assignment V
EVAL_i(V): evaluation function of clause i;
returns the boolean value that results from
evaluating clause i under assignment V
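The notation above can be made concrete with a minimal sketch. It assumes a DIMACS-style encoding that the slides do not specify: a clause is a list of nonzero integers, where k stands for variable k and -k for its negation. The name eval_clause is hypothetical; V is kept 1-indexed so that V[3] is the third variable.

```python
from typing import List

Clause = List[int]  # assumed encoding: k means variable k, -k its negation


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """EVAL_i(V): True if at least one literal of the clause holds under V.

    V is 1-indexed to match the slides; V[0] is unused padding.
    """
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


# Clause (x1 or not x3) under x1 = False, x2 = True, x3 = False
V = [False, False, True, False]
print(eval_clause([1, -3], V))
```

The clause holds here because the literal "not x3" is true under V.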
GenSAT
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f := CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
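The GenSAT skeleton can be sketched in software as follows. This is an illustration only, assuming DIMACS-style integer literals and a uniformly random INITASSIGN; the names gen_sat, satisfies and random_flip are hypothetical, and random_flip is merely a stand-in for CHOOSEFLIP.

```python
import random
from typing import Callable, List, Optional

Clause = List[int]  # assumed encoding: k means variable k, -k its negation


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    """True if every clause has at least one literal that holds under V."""
    return all(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)


def gen_sat(cnf: List[Clause], n: int, max_tries: int, max_flips: int,
            choose_flip: Callable, rng: random.Random) -> Optional[List[bool]]:
    for _ in range(max_tries):
        # INITASSIGN: random start, 1-indexed (V[0] is unused padding)
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(max_flips):
            if satisfies(cnf, V):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None  # no satisfying assignment found within the budget


def random_flip(cnf, V, rng):
    """Placeholder CHOOSEFLIP: pick any variable uniformly at random."""
    return rng.randrange(1, len(V))


model = gen_sat([[1], [-2]], n=2, max_tries=50, max_flips=50,
                choose_flip=random_flip, rng=random.Random(0))
```

Passing choose_flip as a parameter mirrors the way the slides treat GSAT, WSAT and the other variants as instances of one procedure.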
Instances of GenSAT
• GSAT: CHOOSEFLIP randomly chooses a flip that
produces maximal score
• WSAT: CHOOSEFLIP randomly chooses a
violated clause, and randomly chooses among the
variables of that clause a flip that produces
maximal score
• GWSAT: choose randomly whether to do GSAT
flip or WSAT flip
• GSAT/Tabu: prevent quick flipping back
• HSAT: use history for tie breaking: choose least
recently flipped variable
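The GSAT and WSAT selection rules listed above can be sketched as two CHOOSEFLIP functions. The sketch assumes that "score" means the number of satisfied clauses after the flip and that literals are DIMACS-style integers; the function names are hypothetical.

```python
import random
from typing import List

Clause = List[int]  # assumed encoding: k means variable k, -k its negation


def holds(c: Clause, V: List[bool]) -> bool:
    return any(V[abs(l)] == (l > 0) for l in c)


def score_after_flip(cnf: List[Clause], V: List[bool], i: int) -> int:
    """Number of satisfied clauses if variable i were flipped (V is restored)."""
    V[i] = not V[i]
    s = sum(holds(c, V) for c in cnf)
    V[i] = not V[i]
    return s


def gsat_flip(cnf, V, rng):
    """GSAT: random choice among all variables whose flip gives maximal score."""
    scores = {i: score_after_flip(cnf, V, i) for i in range(1, len(V))}
    best = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == best])


def wsat_flip(cnf, V, rng):
    """WSAT: pick a violated clause at random, then the best flip inside it."""
    violated = [c for c in cnf if not holds(c, V)]
    clause = rng.choice(violated)
    candidates = sorted({abs(l) for l in clause})
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    best = max(scores.values())
    return rng.choice([i for i in candidates if scores[i] == best])
```

wsat_flip assumes at least one clause is violated, which holds inside GenSAT because CHOOSEFLIP is only called when V does not satisfy cnf.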
FPGAs
• ASICs: application-specific integrated circuits
– customer describes logic behavior in a hardware
description language such as VHDL
– vendor designs and produces integrated circuit with
this behavior
• Masked gate arrays
– ASIC with transistors arranged in a grid-like manner
– initially unconnected; mass produced
– add final conductor layers for connecting components
• FPGAs: field programmable gate arrays
Current Line of FPGAs: Example
• Xilinx XCV1000
• 4 MBytes on-board RAM
• max clock rate 300 MHz
• max clock rate using on-board RAM 33 MHz
• 6144 CLBs (configurable logic blocks)
• roughly 1M system gates
• 1 Mbit of distributed RAM
• each CLB is divided into 2 slices
• thus 12,288 slices available
Programming FPGAs
• Massively parallel computer with random access
memory
• Instructions are compiled into hardware; no
runtime stacks; no functions; no recursion…
• In practice, hardware description languages like
VHDL are used to program FPGAs
• Newer development: Handel C
NESL-like Syntax for Parallelism
P                  gates for P                  depth of P
x := y + z         g(P) = O(1)                  d(P) = O(1)
Q; R               g(P) = g(Q) + g(R)           d(P) = d(Q) + d(R)
{e(i) : i ∈ S}     g(P) = Σ_{i ∈ S} g(e(i))     d(P) = max_{i ∈ S} d(e(i))
Example
Let S be an array of statically known size n,
where n is a power of 2.
macro SUM(S,n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
g(SUM(S,n)) = O(n)
d(SUM(S,n)) = O(log n)
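The gate and depth bounds for SUM can be checked with a small software model of the reduction tree. This is a sketch, not hardware: tree_sum is a hypothetical name, and it counts one "gate" per adder and one level of depth per parallel layer.

```python
from typing import List, Tuple


def tree_sum(S: List[int]) -> Tuple[int, int, int]:
    """Pairwise reduction of S; returns (sum, adders used, depth in layers).

    len(S) must be a power of 2, as on the slide.
    """
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    adders = 0
    depth = 0
    while len(S) > 1:
        # One parallel layer: { S[2i] + S[2i+1] : i in [0 .. n/2 - 1] }
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        adders += len(S)
        depth += 1
    return S[0], adders, depth


print(tree_sum(list(range(8))))  # (28, 7, 3)
```

For n = 8 the model uses n - 1 = 7 adders in log2(8) = 3 layers, consistent with g = O(n) and d = O(log n).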
Previous GSAT/FPGA Work
• Hamadi/Merceron: first non-software design of a
local search algorithm; CP 97
• Yung/Seung/Lee/Leong: runtime reconfigurable
version of Hamadi/Merceron work; first
implementation; Conference on Field-Programmable Logic and Applications, 1999
Naïve Parallel GSAT (Ham/Merc)
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end
Here V[¬V[i]/i] denotes V with the value of variable i negated.
g(CHOOSEFLIP(f)) = O(n m)
d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
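One consequence of the tie-break in the loop above can be worked out exactly. The sketch below assumes that RANDOMBIT() returns 1 with probability 1/2 and that k variables share the maximal score; naive_tie_distribution is a hypothetical helper, and the observation is offered as a reading of the pseudocode rather than a statement from the slides.

```python
from fractions import Fraction
from typing import List


def naive_tie_distribution(k: int) -> List[Fraction]:
    """Selection probability of each of k tied maxima, in visiting order.

    The first maximum is always taken (score > max); every later tie
    replaces the current choice with probability 1/2.
    """
    probs = [Fraction(1)]
    for _ in range(k - 1):
        probs = [p / 2 for p in probs] + [Fraction(1, 2)]
    return probs


print(naive_tie_distribution(3))  # [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
```

Under these assumptions the last tied variable is chosen half of the time, so the choice among maxima is not uniform.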
Step 1: Naïve Random GSAT
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := {0 : k ∈ [1…n]};
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max then
      max := score; MaxV := {0 : k ∈ [1…n]}[1/i]
    else if score = max then MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)
g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
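A software sketch of Step 1 follows. It assumes DIMACS-style integer literals and models CHOOSE_ONE as a uniform random pick among the set bits of MaxV, which the slides do not spell out; choose_flip_step1 is a hypothetical name.

```python
import random
from typing import List, Tuple

Clause = List[int]  # assumed encoding: k means variable k, -k its negation


def choose_flip_step1(cnf: List[Clause], V: List[bool],
                      rng: random.Random) -> Tuple[int, List[int]]:
    """Return (f, MaxV): MaxV[i] = 1 marks every variable with maximal score."""
    n = len(V) - 1  # V is 1-indexed, V[0] is unused padding
    best = -1
    max_v = [0] * (n + 1)
    for i in range(1, n + 1):
        V[i] = not V[i]  # evaluate under V with variable i flipped
        score = sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)
        V[i] = not V[i]  # restore V
        if score > best:
            best = score
            max_v = [0] * (n + 1)  # reset the mask, then mark i
            max_v[i] = 1
        elif score == best:
            max_v[i] = 1
    # CHOOSE_ONE, modelled here as a uniform pick among the marked variables
    f = rng.choice([i for i in range(1, n + 1) if max_v[i]])
    return f, max_v


f, mask = choose_flip_step1([[1], [2]], [False] * 4, random.Random(0))
```

In this example variables 1 and 2 tie for the best score, so the mask is [0, 1, 1, 0] and f is one of the two.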
Step 2: Parallel Variable Scoring
macro CHOOSEFLIP(f):
  Scores := { SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]}) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);
d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
g(CHOOSEFLIP(f)) = O(m n²)
Step 3: Relative Scoring
• Selman/Levesque/Mitchell use a technique of
relative scoring in their implementation.
• First thorough analysis of relative scoring in Hoos'
Diplomarbeit (diploma thesis)
• Idea: After every flip, update the score of those
variables that are affected by the flip.
• Since clauses are small, the number of affected
variables is much smaller than the overall number
of variables
Some Notation
• NCl[i] is the number of clauses that contain the
variable i
• MaxClauses = max_i NCl[i];
usually MaxClauses << m
• MaxVariables = max_j (number of variables in clause j)
• EVAL_j^C(i) evaluates the j-th clause from the set of
clauses that contain the variable i
Relative Scoring
macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^C(i)(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
g(CHOOSE_FLIP(f)) = O(MaxVariables · MaxClauses · n)
d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
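Relative scoring can be sketched in software with per-variable occurrence lists, so that each variable only evaluates the clauses that contain it. The encoding (DIMACS-style integer literals) and the names build_occurrences and relative_scores are assumptions made for illustration.

```python
from typing import List

Clause = List[int]  # assumed encoding: k means variable k, -k its negation


def holds(c: Clause, V: List[bool]) -> bool:
    return any(V[abs(l)] == (l > 0) for l in c)


def build_occurrences(cnf: List[Clause], n: int) -> List[List[Clause]]:
    """occ[i] lists the clauses that contain variable i, so NCl[i] = len(occ[i])."""
    occ: List[List[Clause]] = [[] for _ in range(n + 1)]
    for c in cnf:
        for v in {abs(l) for l in c}:
            occ[v].append(c)
    return occ


def relative_scores(occ: List[List[Clause]], V: List[bool]) -> List[int]:
    """Diff[i] = NewS[i] - OldS[i], computed only over clauses containing i."""
    diff = [0] * len(V)
    for i in range(1, len(V)):
        old_s = sum(holds(c, V) for c in occ[i])
        V[i] = not V[i]
        new_s = sum(holds(c, V) for c in occ[i])
        V[i] = not V[i]  # restore V
        diff[i] = new_s - old_s
    return diff


cnf = [[1, 2], [-1, 2], [3]]
V = [False, False, False, True]
print(relative_scores(build_occurrences(cnf, 3), V))  # [0, 0, 1, -1]
```

Clauses that do not contain variable i evaluate the same before and after flipping i, so the variable with maximal Diff is also the variable with maximal full score.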
Step 4: Pipelining
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f := CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
Pipelining Outer Loop
macro CHOOSE_FLIP(f):
  STAGE I    NewS := { SUM({EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  STAGE II   OldS := { SUM({EVAL_j^C(i)(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  STAGE III  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  STAGE IV   f := CHOOSE_MAX(Diff)
[Figure: pipeline schedule for Try 1 to Try 4. Each try cycles repeatedly through stages S I, S II, S III and S IV, and the tries are offset from one another so that they occupy different stages at the same time.]
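The staggered schedule of the figure can be modelled in a few lines. The sketch assumes that try k enters stage S I one clock cycle after try k - 1 and that cycles are counted from 0; both are readings of the figure, and stage_of and flips_in_cycle are hypothetical names.

```python
from typing import Optional

STAGES = ["S I", "S II", "S III", "S IV"]


def stage_of(try_index: int, cycle: int) -> Optional[str]:
    """Stage occupied by a try in a given cycle, or None before it has started."""
    if cycle < try_index:
        return None
    return STAGES[(cycle - try_index) % len(STAGES)]


def flips_in_cycle(num_tries: int, cycle: int) -> int:
    """Number of tries that are in S IV, i.e. choosing a flip, in this cycle."""
    return sum(stage_of(k, cycle) == "S IV" for k in range(num_tries))


# Print the schedule: one row per try, one column per clock cycle
for k in range(4):
    cells = " ".join(f"{stage_of(k, t) or '':5}" for t in range(8))
    print(f"Try {k + 1}: {cells}")
```

With four tries and four stages, exactly one try is in S IV in every cycle once the pipeline has filled, which is the sense of "one flip per clock cycle" in the title.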
Preliminary Experiments
• Conducted on a hill-climbing variant of GSAT
• Comparing the software implementation by
Selman/Kautz with Hamadi/Merceron and Step 4
• Software: running on a Pentium II at 400 MHz
• FPGA: running on a Xilinx XCV1000 at 20 MHz;
programmed using Handel C by Celoxica
Flips per Second
DIMACS         Software    FPGA       FPGA      Speedup
Problems       Sel/Kau     Ham/Mer    Step 4    vs H/M
50-80-1.6      128.5 K     520 K      25 M      48
50-100-2.0     107.4 K     520 K      25 M      48
100-160-1.6    139.6 K     284 K      22 M      77.5
100-200-2.0    110.9 K     284 K      22 M      77.5
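The speedup column can be recomputed from the two FPGA columns as a quick consistency check. The flip rates below are copied from the slide; the dictionary layout and the speedup helper are only illustrative.

```python
# Flips per second from the slide: (Ham/Mer, Step 4)
flips_per_second = {
    "50-80-1.6": (520e3, 25e6),
    "50-100-2.0": (520e3, 25e6),
    "100-160-1.6": (284e3, 22e6),
    "100-200-2.0": (284e3, 22e6),
}


def speedup(ham_mer: float, step4: float) -> float:
    """Ratio of the Step 4 flip rate to the Hamadi/Merceron flip rate."""
    return step4 / ham_mer


speedups = {name: round(speedup(h, s), 1)
            for name, (h, s) in flips_per_second.items()}
print(speedups)
```

The ratios come out at about 48.1 and 77.5, in line with the 48 and 77.5 reported on the slide.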
Flips per Slice Second
DIMACS         Slices     f / sl sec   Slices    f / sl sec   Improvement
Problems       Ham/Mer    Ham/Mer      Step 4    Step 4
50-80-1.6      651        800          1671      14950        18.7
50-100-2.0     704        740          1697      14700        19.9
100-160-1.6    1136       250          3154      6975         27.9
100-200-2.0    1240       230          3186      6900         30
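Flips per slice second is the flip rate divided by the number of slices used, and the improvement is the ratio of the two per-slice figures. The sketch below recomputes both from the numbers on the slides; the slide values appear to be rounded, so small differences are expected. The helper names are hypothetical.

```python
# (problem, flips/s Ham/Mer, slices Ham/Mer, flips/s Step 4, slices Step 4)
rows = [
    ("50-80-1.6", 520e3, 651, 25e6, 1671),
    ("50-100-2.0", 520e3, 704, 25e6, 1697),
    ("100-160-1.6", 284e3, 1136, 22e6, 3154),
    ("100-200-2.0", 284e3, 1240, 22e6, 3186),
]


def per_slice(flips_per_second: float, slices: int) -> float:
    """Flips per slice second: throughput normalised by chip area used."""
    return flips_per_second / slices


def improvement(row) -> float:
    """Step 4 flips per slice second relative to Ham/Mer."""
    _, hm_flips, hm_slices, s4_flips, s4_slices = row
    return per_slice(s4_flips, s4_slices) / per_slice(hm_flips, hm_slices)


for row in rows:
    print(row[0], round(per_slice(row[1], row[2])),
          round(per_slice(row[3], row[4])), round(improvement(row), 1))
```

Normalising by slices shows that Step 4 uses more area than the Hamadi/Merceron design but still delivers many more flips per unit of area.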
Conclusions
• Fastest known one-chip implementation of GSAT
• using parallel relative scoring plus pipelining
• current size and speed make it feasible to use
FPGAs as platforms for parallel algorithms
• FPGAs are one-chip parallel machines with
serious limitations of programmability
• higher-level languages needed
• stack support needed: towards compiling parallel
languages to hardware