Efficient Register Allocation via Coloring Using Clique Separators

RAJIV GUPTA, MARY LOU SOFFA, and DENISE OMBRES
University of Pittsburgh
Although graph coloring is widely recognized as an effective technique for register allocation, memory demands can become quite high for the large interference graphs that are needed in coloring. In this paper we present an algorithm that uses the notion of clique separators to reduce the space overhead of coloring. The algorithm, based on a result by R. Tarjan regarding the colorability of graphs, partitions program code into code segments using clique separators. The interference graphs for the code partitions are constructed one at a time and colored independently. The colorings for the partitions are combined to obtain a register allocation for the entire program. This approach can be used to perform register allocation in a space-efficient manner. For straight-line code (e.g., local register allocation), an optimal allocation can be obtained from optimal allocations for individual code partitions. Experimental results are presented demonstrating memory demand reductions for interference graphs when allocating registers using clique separators.
Categories and Subject Descriptors: C.0 [Computer Systems Organization]: General—hardware/software interfaces; D.3.4 [Programming Languages]: Processors—code generation; compilers; optimization
General Terms: Algorithms, Design, Languages, Performance
Additional Key Words and Phrases: Clique separators, graph coloring, interference graph, node priorities, spans, spill code
1. INTRODUCTION

The problem of global register allocation is commonly formulated as a graph coloring problem in which an assignment of a color is made to each node in an interference graph such that no two nodes directly connected by an edge have the same color [Chaitin et al. 1981; Chaitin 1982; Chow and Hennessy 1984]. The nodes in an interference graph correspond to candidates for registers, and edges connect nodes that must be allocated different
This work was partially supported by National Science Foundation Presidential Young Investigator Award CCR-9157371 and Grant CCR-9109089 to the University of Pittsburgh.
A preliminary version of this paper appeared in the 1989 SIGPLAN Conference on Programming Language Design and Implementation.
Authors' current address: Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1994 ACM 0164-0925/94/0500-0370 $03.50
ACM Transactions on Programming Languages and Systems, Vol. 16, No. 3, May 1994, Pages 370-386.
registers. A coloring of this graph is equivalent to an assignment of registers.

Existing techniques for register allocation can be divided into two categories: local register allocation techniques and global register allocation techniques. Local register allocation deals with the allocation of registers in straight-line code segments. The interference graphs for straight-line code belong to a class of graphs called interval graphs [Golumbic 1980]. Although graph coloring is an NP-complete problem for general graphs, interval graphs can be optimally colored in polynomial time [Garey and Johnson 1979]. Global register allocation deals with the allocation of registers in code containing branches. The interference graphs constructed during global register allocation are no longer interval graphs, and since general graph coloring is an NP-complete problem [Garey and Johnson 1979], polynomial time heuristics are used to obtain suboptimal colorings.
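The claim that interval graphs can be colored optimally in polynomial time can be illustrated with a small sketch. It assumes that each span in a straight-line segment is given as an inclusive pair of statement indices; the function name `color_intervals` and this representation are illustrative choices, not taken from the paper. Spans are visited in order of their start points, and a register is reused as soon as the span holding it has ended, so the number of registers used equals the largest number of simultaneously live spans.

```python
import heapq

def color_intervals(spans):
    """Assign registers to spans of straight-line code.

    spans: dict mapping a span name to (start, end), inclusive statement
    indices (an assumed representation). Returns dict name -> register.
    """
    order = sorted(spans, key=lambda name: spans[name][0])
    free = []      # min-heap of registers released by ended spans
    active = []    # min-heap of (end, register) for spans still live
    next_reg = 0
    assignment = {}
    for name in order:
        start, end = spans[name]
        # Release the registers of spans that ended before this one starts.
        while active and active[0][0] < start:
            _, reg = heapq.heappop(active)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg = next_reg
            next_reg += 1
        assignment[name] = reg
        heapq.heappush(active, (end, reg))
    return assignment

example = {"a": (0, 2), "b": (1, 3), "c": (3, 5), "d": (4, 6)}
registers = color_intervals(example)
```

In this example at most two spans are live at any statement, and the sketch uses exactly two registers.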
Due to the increased amount of analysis performed by modern optimizing compilers, interest in improving the space efficiency of such compilers is growing. As an example, sparse evaluation graphs are being used to limit the amount of data-flow information computed during program code optimization [Choi et al. 1990]. By avoiding the computation of intermediate results, the space requirement for saving the data-flow information is reduced.

In this paper we present an approach for reducing the space requirements during global register allocation. The size of an interference graph constructed for global register allocation can be large, and hence, it is desirable to develop techniques for limiting the memory demands of an interference graph.
One approach that has been advocated performs inexpensive local register allocation before performing global register allocation using coloring [Chow and Hennessy 1984]. However, experience has shown that this strategy is usually not very successful since very few candidate nodes are allocated registers during the local allocation phase, and hence most of the allocation is done during the global allocation phase [Larus and Hilfinger 1986]. Thus, to improve the space efficiency of register allocation we must concentrate on improving the efficiency of global register allocation algorithms.

In this paper we present a technique that improves the space efficiency of graph-coloring-based global register allocation algorithms. The technique is based on a result due to Tarjan [1985] regarding the colorability of a graph, which states that if a decomposition of a graph into subgraphs is done using clique separators, and each subgraph can be colored using at most k colors, then the entire graph can be colored using k colors by combining the colorings of the subgraphs. In register allocation, the subgraphs resulting from a decomposition of the interference graph correspond to code segments in a program. Thus, clique separators partition a program into code segments for which the register allocation can be performed independently.
We show that the partitioning of the program can be carried out by examining the code; hence, the technique does not require the construction of the entire interference graph and then the execution of an algorithm on the graph to find the clique separators. The clique separators are found in the code by selecting traces (paths) through the control flow graph and finding separators along each trace. The interference graphs for the code partitions are constructed one at a time, and a coloring
heuristic (e.g., priority-based coloring) is used to color these subgraphs. Once a subgraph is colored, its storage can be reclaimed and used by another subgraph. The colorings for the subgraphs are combined, resulting in a coloring of the entire interference graph for the program.

Register allocation using clique separators is carried out efficiently because, at a given point during register allocation, only an interference graph for a single partition needs to be constructed. This reduces the space requirements for the interference graph. Furthermore, if the run-time complexity of the coloring heuristic is a polynomial of a degree greater than one in the number of nodes in the graph, the time spent on coloring reduces with the number of partitions. The time savings obtained during coloring partially offset the time spent on the detection of clique separators. The strategy is also suitable for parallel implementations, as the code in each trace can be decomposed in parallel and subgraphs can be colored in parallel. A parallel implementation will not provide any total savings in space, but will improve the run-time complexity of register allocation.
In the next section, we discuss background information, including a summary of Tarjan's results and a revised definition of name spans (or live ranges). An overall description of the technique is then presented in Section 3. In Section 4 we describe the partitioning technique using clique separators. In Section 5 two coloring algorithms that use separators are presented. The performance of a clique-based register allocation scheme is analyzed, and results of experimental studies based on an implementation of the technique are presented in Section 6.
2. BACKGROUND

The technique for partitioning a program into code segments presented in this paper utilizes the notion of clique separators [Gavril 1977]. A clique separator is a completely connected subgraph whose removal disconnects the graph. The idea of decomposing a graph using clique separators was first developed by Tarjan [1985]. Clique separators are used to decompose a graph coloring problem into subgraphs that can be colored independently. A coloring for the entire graph is obtained from the colorings of the subgraphs.
If each subgraph is colored using at most k colors, then the entire graph can be colored using k colors by combining the colorings of the subgraphs. For the graph shown in Figure 1a, the clique CS = {v1, v2, v3} is a separator, as its removal results in disconnected subgraphs S1 = {v4, v5} and S2 = {v6, v7, v8, v9}, given in Figure 1b. The subgraphs that must be colored using k colors, if a k-coloring for the entire graph is to be found, are shown in Figure 1c. These subgraphs are formed by including the members of the clique separator in each of the disconnected subgraphs (S1 and S2) shown in Figure 1b. In Figure 1c colorings of the subgraphs using three colors are given. These colorings are combined to obtain a 3-coloring for the entire graph shown in Figure 1d. The combining process involves renaming of colors in one of the subgraphs so that both subgraphs use the same colors for the members of the clique. In this example the coloring was achieved
by interchanging the use of colors c1 and c2 in the subgraph that includes S1 and CS. The subgraphs resulting from a decomposition may be further decomposable using clique separators. A graph that cannot be decomposed any further is called an atom.

Fig. 1. Clique separators. (Panels (a) through (d): a graph over nodes v1 to v9, the disconnected subgraphs left after removal of the separator, the subgraphs colored with colors c1 to c3, and the combined 3-coloring.)
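The renaming step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: both subgraph colorings are assumed to draw from the same palette 0..k-1, each coloring is a dictionary from node to color, and the function name `combine_colorings` and the small example graph are hypothetical (the example is not the graph of Figure 1). Because the members of a clique receive pairwise distinct colors in any proper coloring, the mapping from one coloring's clique colors to the other's is one-to-one and can be extended to a permutation of the palette.

```python
def combine_colorings(coloring_a, coloring_b, clique, k):
    """Merge two proper colorings that share the nodes of `clique`.

    Colors of coloring_b are permuted so that both colorings agree on the
    clique; the result uses at most k colors (palette 0..k-1 assumed).
    """
    # Clique members have distinct colors, so this mapping is injective.
    rename = {coloring_b[v]: coloring_a[v] for v in clique}
    palette = set(range(k))
    # Extend the partial mapping to a permutation of the whole palette.
    unused_targets = sorted(palette - set(rename.values()))
    for color in sorted(palette - set(rename)):
        rename[color] = unused_targets.pop(0)
    combined = dict(coloring_a)
    for node, color in coloring_b.items():
        combined[node] = rename[color]
    return combined

# Hypothetical example: clique {v1, v2, v3} separates {v6} from {v4, v5}.
edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
         ("v6", "v2"), ("v6", "v3"),
         ("v4", "v1"), ("v4", "v2"), ("v4", "v5"), ("v5", "v1")]
side_a = {"v1": 0, "v2": 1, "v3": 2, "v6": 0}
side_b = {"v1": 1, "v2": 0, "v3": 2, "v4": 2, "v5": 0}
merged = combine_colorings(side_a, side_b, {"v1", "v2", "v3"}, 3)
```

Here the two colorings disagree on v1 and v2, and the permutation that swaps colors 0 and 1 in the second coloring makes them consistent.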
In the above example, S1 is an atom, but S2 is not an atom, as it can be further decomposed by clique {v6, v7}.

The algorithm developed by Tarjan requires construction of the interference graph, following which the separators are identified and the graph is decomposed. This approach is not useful for register allocation because it does not reduce the space complexity of the algorithm. To avoid this problem, clique separators in this paper are identified by examining the program code instead of the interference graph of the entire program.

Another technique to divide a program into partitions for which register allocation can be carried out independently has been developed by Chi and Dietz [1988] using cut points. A cut point in a program is a point at which the optimal allocation of live variables to registers can be determined without actually applying a register allocation algorithm. For example, if there is a single variable live at a point in the program, then one of the registers will hold its value, and the remaining registers must be empty. The use of clique separators to partition the code is more general than cut points.
A
code segment
can be divided
into subparts
using clique separators
even if
optimal
allocation
at any point during
the code segment is not known.
Before we can construct
interference
graphs,
we must identify
the name
spans that are represented
as nodes of the interference
graph.
In earlier
global register
allocation
algorithms,
a name span of a variable
consisted
of a
group of basic blocks in the control
flow graph. However,
in this work name
ACM
Transactions
on Programming
Languages
and Systems,
Vol. 16, No. 3, May 1994.
374
R, Gupta et al
.
Fig. 2.
spans
are composed
of basic
blocks.
of a group
This
interest
may appear
block boundaries.
Definition.
group
spans
of code statements
modification
is essential
at any
in the
A span
of contiguous
Constructing
point
corresponding
that
because
code and
to a variable
code statements
that
not
may
include
clique
necessarily
X is defined
satisfies
the
portions
separators
of
at basic
as an isolated
following
conditions:
(1) There is no use of X outside the span that is reachable
from a definition
of
X inside the span, and (2) there is no use of X in the span that is reachable
from
a definition
of X outside
Consider the flow graph in Figure 2. In this example three spans are created for variable X. The definition of X in statement s1 is used only by statement s2 - 1, thus creating the span L1. The definition of X in statement s2 is used in basic blocks B2, B3, and B4, which causes groups of statements s2 ... s3, s4 ... s5, s6 ... s7, and s10 ... s11 to be included in span L2. Furthermore, since the use of X in statement s11 is also reachable from the definition of X in statement s8, the statements s8 ... s9 are also included in L2. The third span L3 is composed of statements s11 + 1 ... s12. If the code is stored in a linear code array and the code for basic blocks is stored in the order B1, B2, B3, and B4, then the span L2 can be represented as consisting of statement groups s2 ... s7 and s8 ... s11.

3. OVERVIEW OF THE REGISTER ALLOCATION TECHNIQUE
A high-level algorithm for allocating registers using clique separators is given in Figure 3. The approach is independent of the coloring scheme used in the actual register allocation. A control flow graph representation of the program
Given: A control flow graph representation of a program.
Output: Register allocation for the program.

Compute the name spans.
PartitionList ← ∅
while the entire program has not been partitioned do
    Select a trace composed of basic blocks that have not been included in
    previous traces, giving preference to the blocks with higher execution counts.
    Partition the trace by identifying clique separators and add the
    partitions at the end of the PartitionList.
endwhile
for each partition P in PartitionList do
    Construct the interference graph for P.
    Color the graph using some graph coloring heuristic.
    Combine the colorings with already processed partitions, which involves:
        renaming colors for the current partition; and
        possibly generation of copy code at merge and join points.
endfor

Fig. 3. Overview of register allocation using clique separators.
is assumed. Details about the individual steps in the algorithm are explained in subsequent sections.

A program is partitioned by selecting traces (paths) through the control flow graph and finding clique separators in each trace [Fisher 1981]. A trace consists of a sequence of basic blocks along an execution path. The statements belonging to a trace are examined in sequence, and clique separators in the trace are identified.
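The trace selection step can be sketched roughly as below. This is a simplified illustration and not Fisher's full trace scheduling procedure: it only grows a trace forward from a seed block, whereas trace selection in general may also grow backward. The control flow graph representation (`succ`), the execution counts (`count`), and the name `select_traces` are assumptions made for the example.

```python
def select_traces(succ, count):
    """Greedily split a control flow graph into traces.

    succ:  dict block -> list of successor blocks (assumed representation)
    count: dict block -> execution count (e.g. from profiling; assumed)
    Each block ends up in exactly one trace; blocks with higher execution
    counts are preferred as seeds and as extensions.
    """
    unplaced = set(succ)
    traces = []
    while unplaced:
        # Seed the next trace with the hottest block not yet in a trace.
        seed = max(sorted(unplaced), key=lambda b: count[b])
        trace = [seed]
        unplaced.remove(seed)
        current = seed
        while True:
            candidates = [s for s in succ[current] if s in unplaced]
            if not candidates:
                break
            # Follow the most frequently executed unplaced successor.
            current = max(candidates, key=lambda b: count[b])
            trace.append(current)
            unplaced.remove(current)
        traces.append(trace)
    return traces

cfg = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
freq = {"B1": 100, "B2": 90, "B3": 10, "B4": 100}
traces = select_traces(cfg, freq)
```

For this diamond-shaped graph the frequently executed path B1, B2, B4 forms the first trace and the rarely executed block B3 forms a trace of its own.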
After partitioning one trace, another trace is chosen, and the partitioning process is repeated until the entire program is partitioned. Each basic block is included in only one trace. When partitioning a trace, all variables live at the beginning of the trace are assumed to be defined at the start of the trace. The partitions resulting from the partitioning of a trace contain statements from basic blocks that lie along an execution path; these blocks are connected to each other but may come from different parts of the program.

The selection of traces is typically based on the execution time that a program spends in the basic blocks along a trace [Fisher 1981]. The traces constructed earlier in the selection process account for a greater percentage of the program's execution time than those selected later. The order in which the partitions are processed by the register allocation heuristic is the same as the order in which traces were found. This process of ordering results in the generation of better quality code for parts of the program where more execution time is spent. Branch and merge points are handled during register allocation by incorporating live spans that overlap the traces.

Consider a situation where a segment of straight-line code has been partitioned using clique separators. An optimal register allocation for the entire segment can be constructed from the optimal solutions for each of the partitions.
This is a highly desirable characteristic since compilers for the current generation of superscalar processors, in an effort to exploit instruction-level parallelism, create large segments of straight-line code through program transformations such as loop unrolling [Dongarra and Hinds 1979] and in-line expansion [MacLaren 1984].

If the code contains branches, better register allocation for partitions with high probability of execution can be achieved at the expense of code quality resulting from other traces, since their code partitions are processed later. As register allocation proceeds from one partition to the next, an attempt is made to assign the same registers to a name span extending across multiple partitions. This goal is achieved by renaming the colors assigned to a partition. If different registers must be assigned to different portions of a name span, copy code is generated. This copy code is typically introduced at branch or join points. If a portion of a name span is not assigned a register (i.e., it is spilled), load and store instructions are also introduced.

4. PARTITIONING TRACES INTO SEGMENTS

First consider the problem of partitioning straight-line code into code segments for which register allocation can be carried out independently. At any given point in a code segment, there are several spans that overlap. The nodes corresponding to these spans are connected to one another by an edge in the interference graph, forming a clique. The clique that represents the spans overlapping at any program point is a separator. This is because spans that end before the point and spans that start after the point do not overlap; the removal of the nodes forming the clique therefore results in subgraphs that are not connected. If we divide the code into subparts at each of these points, the interference graph is divided into subgraphs, and the nodes of a clique separator are included in both of the subgraphs that it separates.

Since a span can appear in any number of these subgraphs, partitioning at every such point does not necessarily produce smaller subgraphs. In the extreme, each resulting partition will contain a single statement, and the interference graph corresponding to a partition will contain all values live at that statement. As a result, partitioning using all of these clique separators can cause register allocation to be more time expensive.
This problem can be avoided by choosing the clique separators carefully. The maximum number of cliques, chosen as separators, in which a span can occur can be fixed to a small constant (say, c). Thus, the maximum number of subgraphs in which a span can occur is c + 1. If the entire graph containing n vertices is divided into m subgraphs, then each subgraph will on average contain (c + 1)n/m nodes. Assuming m is large, the subgraphs will be significantly smaller than the interference graph for the entire program.
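The average-size estimate can be spelled out as a short derivation. Writing V_i for the node set of the i-th subgraph (a symbol introduced here for illustration), each of the n spans is counted in at most c + 1 subgraphs, so the figure (c + 1)n/m is an upper bound on the average subgraph size. The numbers in the last line are illustrative values chosen for this note, not measurements from the paper.

```latex
\[
\sum_{i=1}^{m} |V_i| \;\le\; (c+1)\,n
\qquad\Longrightarrow\qquad
\frac{1}{m}\sum_{i=1}^{m} |V_i| \;\le\; \frac{(c+1)\,n}{m}.
\]
\[
n = 1200,\quad m = 40,\quad c = 2
\quad\Longrightarrow\quad
\frac{(c+1)\,n}{m} \;=\; \frac{3 \cdot 1200}{40} \;=\; 90 .
\]
```

Under these assumed values, a subgraph held in memory at one time has on the order of 90 nodes rather than 1200.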
One approach for partitioning is to first partition a code segment assuming c has the value one. If a code partition is larger than the maximum acceptable size, we can further partition this code segment by increasing the value of c to two. We can continue to increase the value of c until all resulting partitions are sufficiently small. Our experience has shown that, if c is chosen to be one, very few separators are found. However, if c is chosen to be two, frequent separators are found in practice. Thus, in this work we use an algorithm that
uses the value two for c. However, if the size of a partition reaches a certain maximum (MAXSIZE), the code is partitioned at that point.

The example presented in Figure 4 contains a separator consisting of spans {b, i, f} that satisfies the condition c = 2. This separator divides the code segment into two parts; hence, the interference graph is divided into two subgraphs. The separator {b, i, f} is included in both subgraphs. As mentioned before, the subgraphs are colored independently, and if there are no branches in the code segment their colorings can always be combined through renaming. However, in the presence of branches we cannot always combine the colorings. By examining the code, we determine whether a clique should be chosen as a separator or not.

To this end, as it scans the code the algorithm constructs three sets, namely, PRE, POST, and CLIQUE. The set CLIQUE contains the spans that are live at the current point in the code. The set PRE contains spans that have already ended, but overlap or interfere with at least one member of the current CLIQUE. The set POST contains spans that have not yet begun, but overlap with at least one member of CLIQUE. For the example in Figure 4, at the point shown, CLIQUE = {b, i, f}, PRE = {a, e}, and POST = {g, c}.

The algorithm checks the following: a clique is chosen as a separator iff it can be divided into two disjoint sets, CLIQUE_PRE and CLIQUE_POST, formed by the members of CLIQUE that overlap spans from PRE and the members of CLIQUE that overlap spans from POST, respectively, and the sets PRE and POST are nonempty. Thus, members of CLIQUE_PRE do not overlap spans from POST, and members of CLIQUE_POST do not overlap spans from PRE. For the example in Figure 4, the set CLIQUE_POST is {i, f}, and the set CLIQUE_PRE is {b}. The above condition ensures that no span appears in more than two consecutive separators. Furthermore, requiring PRE and POST to be nonempty ensures that the separator divides the interference graph into nonempty subgraphs. In addition to locating separators, the algorithm also keeps track of the number of spans in the current partition. Our algorithm is summarized in Figure 5.

5. REGISTER ALLOCATION COLORING ALGORITHMS USING CLIQUES
When partitioning a trace, interference graphs are constructed for each partition, one at a time. The graphs are colored, and the results are combined with graphs of adjacent partitions. A new trace is selected, and partitioning of
Fig. 4. Clique separators in program code. (i) Code of basic block B1 (f=1; e=f+5; i=e+f; a=i*2; b=a*7; g=i+f; if (..) go to L) and the code that follows it, with the spans of the variables marked; (ii) the corresponding interference graph over the spans.
that trace is performed. This process culminates in register allocation of the entire program.

The spans that form a clique separator may be allocated different registers in the preceding and succeeding code segments. For straight-line code, Tarjan's result allows the registers used in one of the partitions to be renamed so that the values are held in the same registers in both the preceding and succeeding partitions. In the presence of branches, the partitions may have already been allocated registers that cannot be renamed. In this situation, code to transfer values from one register to another is introduced. If the number of registers available is less than the number of values live in a code segment, then the register allocation algorithm must choose the values to be held in registers and spill the remaining values to memory.

Next we present adaptations of two specific coloring-based algorithms, Chaitin's [1982] algorithm and Chow and Hennessy's [1984] priority-based algorithm, to exploit the notion of clique separators.

5.1 Chaitin's Algorithm

A register allocator based on Chaitin's algorithm is given in Figure 6. An interference graph is constructed for a partition and then colored using Chaitin's coloring heuristic. The coloring heuristic removes each node in the
Partition Trace {
    PRE = POST = CLIQUE = ∅
    START = CURRENT = first instruction in the trace;
    SIZE = 0; PARTITIONLIST = ∅
    repeat
        for each span s_start that starts at the current instruction do
            SIZE = SIZE + 1;
            CLIQUE = CLIQUE ∪ { s_start }
            POST = POST - { s_start }
            for each span s_i such that s_i has not yet started and s_start overlaps s_i do
                POST = POST ∪ { s_i }
            endfor
        endfor
        for each span s_end that ends at the current instruction do
            PRE = PRE ∪ { s_end }
            CLIQUE = CLIQUE - { s_end }
        endfor
        for each span s_i that no longer overlaps a member of CLIQUE
            ∧ is marked as belonging to a partition
            ∧ all of the spans with which it interferes are also marked
        do PRE = PRE - { s_i } endfor
        if (CURRENT = last instruction in the trace) then
            add partition containing instructions from START to CURRENT to the PARTITIONLIST
        else
            if (SIZE = MAXSIZE) then
                add partition containing instructions from START to CURRENT to the PARTITIONLIST
                SIZE = | CLIQUE |; START = CURRENT = next instruction in the trace;
            else
                if Check(CLIQUE) then
                    add partition containing instructions from START to CURRENT to the PARTITIONLIST
                    SIZE = | CLIQUE |; START = CURRENT = next instruction in the trace;
        endif
    until (CURRENT = last instruction in the trace)
}

Check(CLIQUE) {
    CLIQUE_PRE = { s : s ∈ CLIQUE and it overlaps a span from PRE }
    CLIQUE_POST = { s : s ∈ CLIQUE and it overlaps a span from POST }
    if (PRE ≠ ∅) ∧ (POST ≠ ∅) ∧ (CLIQUE_PRE ∩ CLIQUE_POST = ∅)
        then return(true) endif
    return(false)
}

Fig. 5. Finding separators.
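The Check routine of Figure 5 can be written directly as a predicate over sets. The sketch below is a hedged transcription, not the authors' code: spans are represented as inclusive statement intervals and two spans are taken to overlap when their intervals intersect, which is an assumption suitable only for straight-line code. The names `check` and `make_overlaps` and the example spans are hypothetical.

```python
def make_overlaps(spans):
    """Build an overlap test from spans given as name -> (start, end).

    Interval intersection is an assumed notion of overlap for this sketch.
    """
    def overlaps(a, b):
        (s1, e1), (s2, e2) = spans[a], spans[b]
        return s1 <= e2 and s2 <= e1
    return overlaps

def check(clique, pre, post, overlaps):
    """Return True if `clique` passes the separator test of Figure 5."""
    clique_pre = {s for s in clique if any(overlaps(s, p) for p in pre)}
    clique_post = {s for s in clique if any(overlaps(s, q) for q in post)}
    # PRE and POST must be nonempty, and no clique member may touch both.
    return bool(pre) and bool(post) and not (clique_pre & clique_post)

# Hypothetical spans: p has ended, x and y are live, q has not yet started.
spans = {"p": (0, 2), "x": (1, 5), "y": (4, 8), "q": (7, 9)}
accepted = check({"x", "y"}, {"p"}, {"q"}, make_overlaps(spans))

# If x stayed live until statement 7 it would overlap both p and q.
longer = dict(spans, x=(1, 7))
rejected = check({"x", "y"}, {"p"}, {"q"}, make_overlaps(longer))
```

In the first case x overlaps only the ended span p and y overlaps only the future span q, so the clique {x, y} is accepted; in the second case x belongs to both CLIQUE_PRE and CLIQUE_POST and the clique is rejected.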
graph for which the number of edges incident to the node is less than the number of colors. If no such node exists, then spilling is needed. A span is chosen to spill, and its associated node is then deleted from the graph. The attempt to color continues, spilling again if necessary until all nodes have been removed. The node chosen to spill is selected based on the number of edges connected to the node and the nesting levels of the code partition. The
higher the number of edges, the more likely it is that after the removal of the node it will be possible to color other nodes in the graph. The nesting level is considered in an attempt to ensure that spill code is introduced in code partitions that are not nested inside loops and, therefore, are executed less frequently.

Allocate Registers {
    Split the program into execution traces
    repeat
        Select a trace that has not been processed
        repeat
            Find a partition in the current trace
            repeat
                Construct interference graph for the partition
                repeat
                    Delete all nodes with the number of neighbors less than the number of registers;
                    if graph not empty then
                        from the remaining nodes choose one to spill
                        Spill all its uses and definitions along this trace
                    endif
                until graph is empty
            until no new spills occur
            Color the Graph
            Combine colors and generate copy code if needed
        until no more partitions in the trace
    until no more traces
}

Fig. 6. Register allocation algorithm based on Chaitin's approach.
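The removal and spill steps for a single partition can be sketched as follows. This is an illustrative single-pass version under stated assumptions: the paper only says that the spill choice depends on the number of edges and the nesting level, so the lexicographic rule used here (most edges first, then shallowest nesting) is a guess, and the sketch does not rebuild the graph after spilling as Figure 6 does. The names `simplify_and_spill` and `select_colors` are hypothetical.

```python
def simplify_and_spill(graph, k, nesting):
    """Remove nodes of degree < k; spill when no such node exists.

    graph:   dict node -> set of neighbours in the partition's graph
    k:       number of registers (colors)
    nesting: dict node -> loop nesting level (assumed input)
    Returns (stack, spilled): removal order of colorable nodes, and spills.
    """
    g = {v: set(ns) for v, ns in graph.items()}
    stack, spilled = [], []
    while g:
        low = [v for v in sorted(g) if len(g[v]) < k]
        if low:
            victim = low[0]
            stack.append(victim)
        else:
            # Assumed rule: most edges first, then the shallowest nesting.
            victim = max(sorted(g), key=lambda n: (len(g[n]), -nesting[n]))
            spilled.append(victim)
        for u in g[victim]:
            g[u].discard(victim)
        del g[victim]
    return stack, spilled

def select_colors(graph, k, stack):
    """Color the removed nodes in reverse order of removal."""
    coloring = {}
    for v in reversed(stack):
        used = {coloring[u] for u in graph[v] if u in coloring}
        coloring[v] = min(c for c in range(k) if c not in used)
    return coloring

k4 = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"},
      "c": {"a", "b", "d"}, "d": {"a", "b", "c"}}
depth = {"a": 2, "b": 0, "c": 1, "d": 1}
stack, spilled = simplify_and_spill(k4, 3, depth)
coloring = select_colors(k4, 3, stack)
```

With three registers, the complete graph on four nodes forces one spill; the node outside any loop (b) is chosen, and the remaining triangle is colored with three colors.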
After all nodes have been removed, if any spilling was necessary, then the graph is reconstructed using the revised code for the partition. Once the coloring of the graph is successful, register allocation performed in the current partition is propagated to other partitions that contain the same spans but have not been colored, eliminating the need to combine the colorings of individual partitions by renaming the colors. Any spill code for the partition is also generated.

This technique incorporates spills by storing the definition and loading the value before each of the uses along the trace being processed. In Chaitin's algorithm, a name span was either entirely spilled or assigned a register for the entire duration of the live range. However, in our algorithm, if a definition is spilled along a trace, it is not considered spilled for any new trace. This introduces some copy code at the onset of the trace, moving the definition from memory to the register allocated on the new trace. However, this amount of copy code is likely to cost less than the cost of spilling a span in all traces.
5.2 Priority-Based Algorithm

We next describe a priority-based coloring algorithm that uses our partitioning method. In implementing a priority-based coloring algorithm, a method for computing node priorities is used. The priority of a node or span is measured in terms of the savings in execution time (TOTALSAV) that result from its being allocated a register instead of memory [Chow and Hennessy 1984]. A variable referenced inside a loop body is likely to be referenced more often, and hence, the total savings resulting from allocating a register to a span is normalized with respect to the loop-nesting depth. The priorities are maintained to guide the coloring of an interference graph. During coloring, when a portion of a span is spilled, the priority of the span is updated.

An overall algorithm for global register allocation using clique separators and based on priorities is summarized in Figure 7. In this algorithm, the program is partitioned into code segments, and an interference graph for a single partition is constructed and colored. The span priorities are maintained globally and updated as portions of spans are spilled. One by one, the subgraphs are constructed and colored. The unconstrained nodes in the graph, those that have fewer neighbors than the number of registers, are colored last, as they can be colored no matter what colors are allocated to their neighbors.
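The coloring of one partition under this scheme can be sketched as below. It is a minimal illustration, not the algorithm of Figure 7: priorities are taken as given numbers standing in for TOTALSAV, spilling simply records the node instead of updating priorities, and the `preferred` map (colors already given to the same spans in earlier partitions) and the function name `color_partition` are assumptions made for the example.

```python
def color_partition(graph, k, priority, preferred=None):
    """Priority-based coloring of one partition's interference graph.

    graph:     dict node -> set of neighbours within the partition
    k:         number of registers
    priority:  dict node -> savings estimate (stand-in for TOTALSAV)
    preferred: dict node -> color used for the span in earlier partitions
    Returns (coloring, spilled).
    """
    preferred = preferred or {}
    constrained = [v for v in graph if len(graph[v]) >= k]
    unconstrained = [v for v in graph if len(graph[v]) < k]
    coloring, spilled = {}, []

    def try_color(v):
        used = {coloring[u] for u in graph[v] if u in coloring}
        free = [c for c in range(k) if c not in used]
        if not free:
            return False
        want = preferred.get(v)
        # Reuse the color from an earlier partition when it is still free.
        coloring[v] = want if want in free else free[0]
        return True

    # Constrained nodes first, in decreasing priority order.
    for v in sorted(constrained, key=lambda n: (-priority[n], n)):
        if priority[v] <= 0 or not try_color(v):
            spilled.append(v)
    # Unconstrained nodes last: a free color always exists for them.
    for v in sorted(unconstrained):
        if priority[v] > 0:
            try_color(v)
        else:
            spilled.append(v)
    return coloring, spilled

g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
prio = {"a": 10, "b": 5, "c": 1, "d": 3}
coloring, spilled = color_partition(g, 2, prio, preferred={"b": 1})
```

With two registers, the triangle a, b, c cannot be fully colored, so the lowest-priority constrained node c is left without a register, while the unconstrained node d is colored last without difficulty.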
During the coloring of a partition, only priorities of those nodes that belong to the current partition are examined. A constrained node in the current graph is colored before other nodes with lower priorities are considered. The node with highest priority is selected and colored if a register is available. If the portions of the name span represented by this node had already been assigned a color during the processing of other partitions, an attempt is made to assign the same color. If this is not possible, another color is assigned, and following the coloring of the partition, copy code is introduced at appropriate points in the program. If no register is available to color a currently selected node, the portion of the span belonging to the current partition is spilled, and the priority of the span is updated. Overallocation of registers is prevented by allocating registers to only those spans for which TOTALSAV is positive.

In the algorithm presented, the priority of a node, which is the value TOTALSAV, is not normalized by the length of the span. In the algorithm developed by Chow and Hennessy [1984], the priority is normalized by the length of the span, because the global allocation phase is preceded by the local register allocation phase. During global allocation the unallocated variables have occurrence frequencies that do not differ greatly, as the local allocation, based on the occurrence frequencies of variables, smooths out the frequency differences in spans during global allocation. Thus, the adjustment of the priority by the span length is needed, as a longer range occupies the register for a longer period of time. However, in the above algorithm there is no local allocation phase, and hence, the priorities are not normalized. If several live spans have the same priority, the shortest span is colored first.

6. PERFORMANCE EVALUATION
To evaluate
the performance
of the clique
separator
approach
to register
allocation,
we present
space and time complexities
for the coloring
process
and experimental
results
detailing
the space savings
and cost of the clique
technique.
Allocate Registers
{
    for each span do compute priority TOTALSAV endfor
    unconstrained ← { nodes whose degree is less than the number of registers }
    constrained ← { all nodes that do not belong to the unconstrained set }
    Partition the program by identifying the separators and
        determine the order for processing the partitions.
    repeat
        Construct the interference graph for the partition to be processed next.
        repeat
            Choose span lr from constrained with highest priority TOTALSAV.
            if lr has more colored neighbors than number of registers then
                Spill the portion of span lr in the current partition.
                Update the priority TOTALSAV for lr.
            else if color assigned to lr in earlier partitions is not available then
                Assign another color to lr and introduce copy code.
            else
                Assign appropriate color to lr.
            endif
        until all constrained nodes in current partition have been processed
    until all partitions have been processed
    Assign colors to unconstrained nodes.
}

Fig. 7. Priority-based register allocation.

In the analysis below, n is the number of live ranges, and m is the number of program partitions created by the clique separators. We assume that nodes to be colored are chosen using a priority-based scheme.
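The allocation loop of Figure 7 can be made concrete with a short sketch. It is an illustration under several assumptions rather than the authors' implementation: the function name and input layout are hypothetical, the constrained test uses the degree within each partition, unconstrained spans are colored at the end of each partition instead of after all partitions, a span is spilled when no color is free among its neighbors, the priority update simply subtracts the spilled portion's contribution, and ties are broken by input order.

```python
def allocate_registers(partitions, num_registers):
    """Color partitions one at a time, in the spirit of Figure 7.

    partitions: list of (graph, savings) pairs. `graph` maps each span to the
    set of spans it interferes with inside that partition; `savings` maps each
    span to the assumed TOTALSAV contribution of its portion there.
    """
    # Global priority of a span: sum of its per-partition contributions.
    priority = {}
    for _, savings in partitions:
        for span, value in savings.items():
            priority[span] = priority.get(span, 0) + value

    last_color = {}  # span -> color received in the most recent partition
    result = []      # per partition: {span: color, or None if spilled}
    copies = []      # (partition index, span, old color, new color)

    for index, (graph, savings) in enumerate(partitions):
        colors = {}
        constrained = [s for s in graph if len(graph[s]) >= num_registers]
        unconstrained = [s for s in graph if len(graph[s]) < num_registers]
        # Constrained spans first, highest priority first; unconstrained last.
        order = sorted(constrained, key=lambda s: -priority[s]) + unconstrained
        for span in order:
            taken = {colors[n] for n in graph[span] if colors.get(n) is not None}
            free = [c for c in range(num_registers) if c not in taken]
            if not free or priority[span] <= 0:
                colors[span] = None              # spill this portion only
                priority[span] -= savings[span]  # assumed priority update
                continue
            previous = last_color.get(span)
            if previous in free:
                colors[span] = previous          # keep the earlier register
            else:
                colors[span] = free[0]
                if previous is not None:         # color changed: copy code
                    copies.append((index, span, previous, free[0]))
            last_color[span] = colors[span]
        result.append(colors)  # this partition's graph can now be discarded
    return result, copies
```

Only one partition's graph is live at any time in this sketch, which is the property the space analysis relies on.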
Space complexity. The space complexity of the coloring heuristic when applied to an x-node interference graph is O(x^2), as there can be at most x(x - 1)/2 edges in the graph. Since only the interference graph for a single code partition, consisting of O(n/m) nodes, is constructed at any given point in time, the space required by the algorithm is O(n^2/m^2).

Run-time complexity. The run-time complexity of a priority-based coloring heuristic when applied to an interference graph with x nodes is O(x^2), since in each iteration of the loop, one span is chosen, and we may have to perform x iterations. The time complexity of processing a single code partition is O(n^2/m^2), as its interference graph contains O(n/m) spans. Since there are m partitions to process, the run-time complexity of the coloring algorithm is O(n^2/m). This time does not include the time for partitioning.
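The two bounds follow from substituting the partition size into the single-graph bound. The numeric instance below is hypothetical (n = 240 live ranges split evenly into m = 12 partitions) and is included only to show the scale of the difference; real partitions are not evenly sized.

```latex
\begin{align*}
\text{space}_{\text{seq}} &= O\!\left(\left(\frac{n}{m}\right)^{2}\right)
                           = O\!\left(\frac{n^{2}}{m^{2}}\right), \\
\text{time}_{\text{seq}}  &= m \cdot O\!\left(\frac{n^{2}}{m^{2}}\right)
                           = O\!\left(\frac{n^{2}}{m}\right).
\end{align*}
% Hypothetical instance: n = 240, m = 12, so each partition has 20 nodes.
\begin{align*}
\text{whole graph: }   \frac{240 \cdot 239}{2} &= 28680 \text{ possible edges}, \\
\text{one partition: } \frac{20 \cdot 19}{2}   &= 190 \text{ possible edges}.
\end{align*}
```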
In the above analysis, it is assumed that the subgraphs resulting from partitioning are constructed and colored one at a time. An approach for further speeding up the coloring process is to construct the graphs and color them in parallel. However, there will not be any savings in storage, as all graphs would have to be constructed simultaneously. The complexities of various priority-based register allocation approaches are summarized in Table I.

Table I. Complexities of Priority-Based Register Allocation Approaches

                                With partitioning
         Without partitioning   Sequential    Parallel (m processors)
Space    O(n^2)                 O(n^2/m^2)    O(n^2)
Time     O(n^2)                 O(n^2/m)      O(n^2/m^2)

The clique separator technique described in Section 3 was implemented with Chaitin's approach in C on a Sun 3/50 workstation. In order to analyze the space performance of the clique separator approach, the exhaustive technique using Chaitin's coloring algorithm, in which one interference graph for the entire program is constructed, was also implemented. Both techniques use the same heuristics in selecting nodes to color and spill.

Results of experiments to investigate the space performance of the two approaches for a sample set of programs are given in Table II. The programs, with the exception of program towers, contained one procedure. The program towers had two procedures, the results of which are presented separately. The experiments were performed assuming an unlimited number of registers, thus eliminating the need for spill code. Because no spills were generated and the same number of registers was allocated in both schemes, the quality of the code produced was the same in these experiments. The experiments were designed to compare the sizes of the interference graphs constructed by the two techniques.

In Table II, the first column gives the name of the program, and the second column gives the number of nodes in the interference graph constructed for the entire program by the exhaustive method. The column labeled "Clique subgraphs" lists the numbers of nodes found in the graphs constructed using the clique technique. The fourth column, labeled "Average size," gives the size of the average-size graph constructed with cliques. The last column, labeled "Savings," gives the percentage difference between the size of the graph for the entire program and the size of the maximum-size subgraph. As can be seen from the table, the graphs constructed using the clique separator method are considerably smaller than the graph for the entire program. On average, the graphs generated using cliques were more than five times smaller than the graph for the entire program, and the largest subgraph was, on average, more than two times smaller than the entire graph. Thus, a significant storage savings can be found using cliques, since the memory for each graph can be reclaimed once that graph has been colored.

Table II. Comparison of Techniques without Spilling

Program          Nodes in        Clique subgraphs                                   Average  Savings
                 entire program                                                     size     (%)
Matrix Multiply  74              2,4,5,4,3,6,15,34,7,17,7,17                        10       54
Sieve            34              4,5,5,4,8,6,7,9,5,6                                6        74
Bubble Sort      57              1,17,5,10,15,23,8,8                                11       59
Changer          26              3,7,11,2,7                                         6        58
Towers of Hanoi  6               6                                                  6        0
Hanoi proc       11              11                                                 11       0
FFT              233             6,28,5,2,10,14,13,14,12,92,20,10,4,42,20,15,14,12  19       60

Another set of experiments was performed to compare the quality of code generated by the two register allocators in the presence of spilling. Table III presents the results of these experiments. The second column in Table III gives the number of registers available to the register allocators, and the fourth column gives the number of spills generated for each technique. These results indicate that both techniques performed about the same in terms of generating spill code. In all but one case, the same number of spills was generated. In that one case, the clique approach generated one more spill; however, since heuristics are being used in both cases, the exhaustive method could also generate more spills. Importantly, these results indicate that the quality of code produced by the two methods is similar.
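Returning to the space results, the "Average size" and "Savings" columns of Table II can be recomputed from the listed subgraph sizes. The sketch below assumes that savings means the percentage difference between the entire graph and the largest subgraph, as described above; the grouping of the wrapped subgraph lists for Matrix Multiply and FFT is a reading of the table, and the names `TABLE_II` and `summarize` are hypothetical.

```python
# program -> (nodes in entire program, sizes of the clique subgraphs)
TABLE_II = {
    "Matrix Multiply": (74, [2, 4, 5, 4, 3, 6, 15, 34, 7, 17, 7, 17]),
    "Sieve": (34, [4, 5, 5, 4, 8, 6, 7, 9, 5, 6]),
    "Bubble Sort": (57, [1, 17, 5, 10, 15, 23, 8, 8]),
    "Changer": (26, [3, 7, 11, 2, 7]),
    "Towers of Hanoi": (6, [6]),
    "Hanoi proc": (11, [11]),
    "FFT": (233, [6, 28, 5, 2, 10, 14, 13, 14, 12,
                  92, 20, 10, 4, 42, 20, 15, 14, 12]),
}


def summarize(entire, subgraphs):
    """Return (average subgraph size, percent savings vs. largest subgraph)."""
    average = sum(subgraphs) / len(subgraphs)
    savings = 100.0 * (entire - max(subgraphs)) / entire
    return average, savings


for program, (entire, subgraphs) in TABLE_II.items():
    average, savings = summarize(entire, subgraphs)
    print(f"{program:16s} average={average:5.1f} savings={savings:5.1f}%")
```

The recomputed values agree with the table to within one unit of rounding. Because storage grows quadratically with the number of nodes, the storage ratio between the entire graph and the largest subgraph is the square of the node ratio.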
Table III also gives the time that it took both allocators to execute on the longer programs. These numbers were generated to determine the increased execution time of the clique technique. To determine these numbers, the allocators were run five times, and the results of the timings were averaged. Column 5 gives the total time that it took each allocator to execute. Columns 6-9 give a more detailed analysis of the timings for the clique separator approach: The times given in column 6 are for partitioning the program into cliques, in column 7 for building the interference graphs, in column 8 for coloring the graphs, and in column 9 for spilling registers. From the timings, the clique separator took more time to execute, basically due to the partitioning. The percentage of increased execution time of the clique approach to the exhaustive approach ranged from 34 percent to 96 percent. Approximately 60 percent of the execution time for the clique technique was spent in partitioning. The clique technique took less time to build the graphs, less time to color, and less time to spill than did the exhaustive approach. From these results, it is clear that, although the time to allocate using cliques may be more than for the exhaustive approach, it is a practical approach and can reduce memory overhead.
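The timing percentages can be checked against the Total and Partition columns of Table III with a few lines of arithmetic. The values below are as read from the table; the pairing of each Chaitin row with the following Clique row, and the restriction of the partition share to the four FFT rows, are assumptions of this sketch.

```python
# (Chaitin total, clique total) in seconds, one pair per register count.
TOTALS = [
    (2.35, 3.23), (2.48, 3.32),
    (3.33, 4.88), (3.39, 4.85),
    (26.60, 51.72), (24.86, 44.82), (24.96, 45.22), (25.25, 44.10),
]
# (clique total, clique partition time) for the four FFT rows.
FFT_CLIQUE = [(51.72, 32.62), (44.82, 26.91), (45.22, 27.09), (44.10, 25.82)]


def percent_increase(chaitin, clique):
    """Extra execution time of the clique allocator, relative to Chaitin's."""
    return 100.0 * (clique - chaitin) / chaitin


increases = [percent_increase(chaitin, clique) for chaitin, clique in TOTALS]
shares = [100.0 * partition / total for total, partition in FFT_CLIQUE]

print(f"increase ranges from {min(increases):.1f}% to {max(increases):.1f}%")
print("FFT partition share (%):", [round(share, 1) for share in shares])
```

For these values the smallest increase rounds to the 34 percent quoted above, the largest comes out a little below the quoted 96 percent, and the FFT partition shares cluster around 60 percent.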
Table III. Clique Performance in the Presence of Spilling

                                                   Timing (average of 5 samples, in seconds)
Program Name     Number of  Method   Number of   Total   Partition  BuildGraph  Color  Spill
                 Registers           spills
Bubble Sort      7          Chaitin  2           2.35    -          0.98        0.26   0.00
                            Clique   2           3.23    2.28       0.67        0.24   0.00
                 8          Chaitin  1           2.48    -          1.02        0.25   0.00
                            Clique   1           3.32    2.33       0.68        0.28   0.00
Matrix Multiply  9          Chaitin  4           3.33    -          1.34        0.40   0.02
                            Clique   4           4.88    2.89       1.15        0.47   0.02
                 10         Chaitin  3           3.39    -          1.31        0.46   0.00
                            Clique   3           4.85    2.80       1.17        0.48   0.00
FFT              9          Chaitin  8           26.60   -          11.84       2.52   0.10
                            Clique   9           51.72   32.62      9.98        2.44   0.08
                 10         Chaitin  6           24.86   -          11.40       2.07   0.06
                            Clique   6           44.82   26.91      7.55        1.95   0.04
                 11         Chaitin  4           24.96   -          11.46       2.18   0.06
                            Clique   4           45.22   27.09      7.66        1.97   0.02
                 13         Chaitin  1           25.25   -          11.94       2.01   0.02
                            Clique   1           44.10   25.82      7.68        1.91   0.02

7. CONCLUSION

We have presented a technique for allocating registers using coloring that avoids construction of the interference graph for the entire program code. Graphs for smaller portions of the code are constructed and then colored independently. The colors of adjacent graphs are then combined, leading to a coloring of the entire graph. By using this method, the memory demands for coloring are dramatically reduced. Experimental results indicate that the largest subgraph constructed using cliques is about five times smaller than the entire interference graph. We also demonstrated that the time to allocate registers is higher due to the time needed to partition the code. It is expected that the clique approach would lead to better performance when coloring algorithms more expensive than the heuristic presented are used, as the time for constructing and coloring a graph is based on its number of nodes and is expensive for the entire interference graph. Using the clique approach, expensive algorithms, such as the optimal, may be possible.

Although in this paper we have basically considered code partitioning in the basic blocks, another approach used [Gupta et al. 1989] is to consider the program flow graph to determine clique separators across basic block boundaries, taking into account the branching and merging of control flow. In this technique, traces in a flow graph are considered one at a time. At a divergence of control flow, the different paths are considered independently, searching for clique separators. When control flow merges, a separator for one of the paths is found (the one that is more likely to execute), and the other path uses this clique separator.

Although the graphs were considered sequentially, another advantage of the clique-based approach is that a parallel version of register allocation can be performed. Once the traces are determined and the cliques are found, register allocation can be performed for each trace in parallel. The graphs can all be constructed and colored in parallel. Renaming would then have to be done to adjust the colors of the registers. We are currently developing a parallel algorithm for allocating registers using the clique approach in order to investigate its performance. We are also considering the performance of using cliques when more expensive algorithms are used for coloring.
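The parallel variant with renaming can be sketched as follows. This is an illustration only, not the parallel algorithm under development: a plain greedy coloring stands in for the priority-based heuristic, all function names are hypothetical, and two subgraphs are assumed to share exactly one clique separator. The renaming step relies on the fact that the nodes of a clique receive pairwise distinct colors in any proper coloring, so the colors of one subgraph can always be permuted to agree with the other on the separator.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_color(graph):
    """Color one subgraph greedily; graph maps node -> set of neighbors."""
    colors = {}
    for node in sorted(graph):
        taken = {colors[n] for n in graph[node] if n in colors}
        colors[node] = next(c for c in range(len(graph)) if c not in taken)
    return colors


def color_in_parallel(subgraphs):
    """Color every subgraph independently (threads used for illustration)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(greedy_color, subgraphs))


def rename_to_match(reference, other, separator):
    """Permute the colors of `other` so separator nodes agree with `reference`."""
    palette = range(max(max(reference.values()), max(other.values())) + 1)
    # Separator nodes have distinct colors in both colorings, so this
    # partial mapping is one-to-one.
    mapping = {other[node]: reference[node] for node in separator}
    spare = [c for c in palette if c not in mapping.values()]
    for color in palette:
        if color not in mapping:        # complete the permutation
            mapping[color] = spare.pop(0)
    return {node: mapping[color] for node, color in other.items()}


def combine(reference, other, separator):
    """Merge two subgraph colorings into one coloring of their union."""
    return {**reference, **rename_to_match(reference, other, separator)}
```

Because the renaming is a permutation of the palette, the combined coloring uses no more colors than the larger of the two subgraph colorings.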
ACKNOWLEDGMENTS

We would like to thank Jiyang Liu for assisting in performing the experiments. We would also like to thank Lori Pollock and the reviewers for their constructive comments on an earlier version of this paper.
REFERENCES

CHAITIN, G. J. 1982. Register allocation and spilling via graph coloring. In Proceedings of the SIGPLAN 82 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 17, 6 (June), 98-105.
CHAITIN, G. J., AUSLANDER, M. A., CHANDRA, A. K., COCKE, J., HOPKINS, M. E., AND MARKSTEIN, P. W. 1981. Register allocation via coloring. Comput. Lang. 6, 1, 47-57.
CHI, C. H. AND DIETZ, H. G. 1988. Register allocation for GaAs computer systems. In 21st Annual Hawaii International Conference on System Sciences, vol. 1 (Jan.). IEEE Computer Society, Washington, D.C., 266-274.
CHOI, J.-D., CYTRON, R., AND FERRANTE, J. 1990. Automatic construction of sparse data flow evaluation graphs. In Proceedings of the ACM Symposium on Principles of Programming Languages. ACM, New York, 55-66.
CHOW, F. AND HENNESSY, J. 1984. Register allocation by priority-based coloring. In Proceedings of the SIGPLAN 84 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 19, 6 (June), 222-232.
DONGARRA, J. J. AND HINDS, A. R. 1979. Unrolling loops in Fortran. Softw. Pract. Exper. 9, 3 (Mar.), 219-226.
FISHER, J. A. 1981. Trace scheduling: A technique for global microcode compaction. IEEE Trans. Comput. C-30, 7 (July), 478-490.
GAREY, M. R. AND JOHNSON, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, Calif.
GAVRIL, F. 1977. Algorithms on clique separable graphs. Discrete Math. 19, 159-165.
GOLUMBIC, M. C. 1980. Algorithmic Graph Theory and Perfect Graphs. Academic Press, New York.
GUPTA, R., SOFFA, M. L., AND STEELE, T. 1989. Register allocation via clique separators. In Proceedings of SIGPLAN 89 Conference on Programming Language Design and Implementation (June). ACM, New York, 264-275.
LARUS, J. R. AND HILFINGER, P. N. 1986. Register allocation in the SPUR Lisp compiler. In Proceedings of the SIGPLAN 86 Symposium on Compiler Construction. ACM, New York, 255-263.
MACLAREN, M. D. 1984. Inline routines in VAXELN Pascal. In Proceedings of the SIGPLAN Symposium on Compiler Construction. SIGPLAN Not. (ACM) 19, 6 (June), 226-275.
TARJAN, R. E. 1985. Decomposition by clique separators. Discrete Math. 55, 2, 221-231.

Received September 1990; revised May 1993; accepted June 1993

ACM Transactions on Programming Languages and Systems, Vol. 16, No. 3, May 1994.