Automatic Distribution in Pangaea

André Spiegel
Freie Universität Berlin
Institut für Informatik, Takustraße 9, D-14195 Berlin
[email protected]
Abstract. Pangaea is a system that can distribute centralized Java programs,
based on static source code analysis and using arbitrary distribution platforms,
such as RMI or CORBA, as a backend. Pangaea takes the idea of distribution
transparency one logical step further: both the decision for an appropriate distribution strategy for a program, and the realization of that strategy on a particular
distribution platform, are accomplished not only transparently, but also automatically. An important benefit of using static source code analysis for automatic
distribution is that it can detect optimizations which would be impossible for a
purely run-time based approach.
1 Introduction
Pangaea^1 is a system that can distribute centralized Java programs automatically. Based
on static source code analysis, Pangaea first makes an abstract decision about how a given
program should be distributed, such that certain requirements and optimization criteria
are fulfilled. The distribution strategy thus obtained indicates which objects should be
placed onto which node, when and how object migration is to be employed, etc. Pangaea then realizes that strategy by transforming the program’s source code for a given
distribution platform, trying to make good use of the abilities and features of that platform.
Pangaea can therefore be considered a distributing compiler: the source language
is pure Java, without any restrictions or added constructs; the target language is the
distributed Java variant of the chosen distribution platform.
Pangaea is targeted for situations where programs need to be distributed that were
written as centralized applications. For example, Pangaea can be used to split large web
applets into a client and a server part, so that they can be executed on small hand-held
devices. Another application area is parallel computing: Pangaea allows the programmer to formulate concurrent algorithms in a centralized fashion, i.e. using threads but
without any concern for distribution-related issues. These issues are then taken care of
automatically by Pangaea after the program is finished.
Pangaea is currently in its implementation phase. In this paper we give an overview
of the system and describe some of the results we have achieved so far.
^1 Pangaea is the name of the ancient continent in which the entire landmass of the earth was
centralized until about 200 million years ago [10], when it broke up and drifted apart to create
the distributed world we know today.
2 Distribution and Static Analysis
Pangaea does not depend on any particular distribution platform; rather, it is able to make
use of the capabilities and features of arbitrary platforms. To estimate the usefulness of static
distribution analysis, it is therefore wise to assume an ideal model of distribution, which
extrapolates the technology of today into the future, even though distribution platforms
of today, such as RMI or CORBA, only implement part of that model. When a concrete
program is to be distributed on a concrete platform, Pangaea decides individually what
capabilities are offered by the platform, and how they can be used.
2.1 An ideal model of distribution
We say that a program is centralized if all of its run-time objects reside in a single, centralized address space. To distribute the program means, for us, to place these objects
onto a set of loosely-coupled machines. The distribution has no impact on the execution logic of the program: a sequential algorithm will still be sequential after it has been
distributed, while a concurrent algorithm that has been written using threads (which,
in the centralized case, is executed by time-slicing) can run truly in parallel. Interactive, client/server type applications are often purely sequential programs; one does not
distribute them to achieve parallel execution, but to use them in inherently distributed
settings such as the Internet.
The distribution platform enables the objects to communicate with each other across
machine boundaries, using remote method invocations and possibly remote field access.
Without loss of generality we also assume that objects can be created remotely; on
platforms like CORBA [4] and RMI [9], where there is no explicit remote creation
facility, it can easily be simulated.
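On such platforms, a remote creation reduces to an ordinary remote invocation on a per-node factory object. A minimal sketch of this simulation (class and method names are our own, not Pangaea's; the factory is shown as a local object, whereas in a real deployment its create method would be declared in a java.rmi.Remote interface and invoked through a stub):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// One factory object runs on each node. A "remote new" in the client
// becomes an invocation on the factory of the target machine, which
// instantiates the class locally and hands back a (remote) reference.
class NodeFactory {
    private final Map<String, Supplier<Object>> constructors = new HashMap<>();

    // The node registers the classes it is able to instantiate.
    void register(String className, Supplier<Object> ctor) {
        constructors.put(className, ctor);
    }

    // In a real deployment this method would be remotely invokable,
    // and the return value a remote reference (stub) to the new object.
    Object create(String className) {
        Supplier<Object> ctor = constructors.get(className);
        if (ctor == null) {
            throw new IllegalArgumentException("unknown class: " + className);
        }
        return ctor.get();
    }
}
```

A statement like `new Database()` in the centralized program would then be rewritten by the transformation into something like `serverFactory.create("Database")`.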
To distribute non-trivial programs efficiently, mobility mechanisms are also indispensable, i.e. mechanisms for migrating, replicating, or caching objects. We can distinguish two fundamentally different kinds of such mechanisms. Synchronous mechanisms
are mechanisms where the act of changing an object’s location is tied to the control flow
of the program, i.e. it is carried out each time when the execution reaches a certain point
in the code. Examples for this are explicit migration statements in the code (such as in
JavaParty [6]), or more structured techniques such as passing objects by value [7] or parameter passing modes like pass-by-move and pass-by-visit, known from the Emerald
system [3]. An asynchronous mechanism, on the other hand, consists of an entity within
the run-time system which monitors interactions between objects and, based on its observations, changes the locations of objects asynchronously, e.g. to reduce the number
of remote invocations. Few Java-based platforms currently provide such a mechanism,
but one example is the FarGo system [2].
2.2 The benefits of static analysis
From a software engineering perspective, the benefits of handling distribution transparently and automatically are obvious, because the complexity of distributed programming is reduced. The particular benefit of using static analysis to that end is that it
allows optimizations that would be impossible to detect at run-time: even on an ideal
distribution platform where objects are placed transparently and re-located dynamically
as the run-time system sees fit, static analysis would still be beneficial, if not essential.
Examples of such optimizations include:
– identifying immutable objects (or places where objects are used in a read-only fashion), because such objects can freely be replicated in the entire system, passed to
remote callees by value (serialization) and need not be monitored by the run-time
system at all,
– finding the dynamic scope of object references, discovering for example that certain
objects are only used privately, inside other objects or subsystems, and therefore
needn’t be remotely invokable, nor be considered in the run-time system’s placement decisions,
– recognizing opportunities for synchronous object migration (pass-by-move, pass-by-visit), which is preferable to the run-time system’s asynchronous adjustments
because such adjustments can only be made after sub-optimal object placement
has already manifested itself for a certain amount of time.
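The first kind of optimization can be illustrated with a class like Position from the chess example in section 5 (the fields shown here are our own guess at its contents, for illustration only): because every field is final and primitive, the class is immutable, and instances may be freely replicated or passed by value without changing the program's semantics.

```java
import java.io.Serializable;

// An immutable class: all fields are final and primitive, and there
// are no mutators. Static analysis can detect this pattern and allow
// instances to be replicated or passed by value (serialized) in
// remote calls, with no need for run-time monitoring.
final class Position implements Serializable {
    private final int file;  // 0..7, the column on the chess board
    private final int rank;  // 0..7, the row on the chess board

    Position(int file, int rank) {
        this.file = file;
        this.rank = rank;
    }

    int file() { return file; }
    int rank() { return rank; }
}
```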
On the other hand, static analysis can of course only yield an approximation of the
actual run-time behaviour. The goal must therefore be to let static analysis and the run-time
system co-operate: decisions that can be taken statically should be taken statically,
while in other cases, control must be handed over to the run-time system
in an orderly fashion.
3 Related Work
We know of two projects in which static analysis has been used to automate distribution
decisions, one based on the Orca language [1], the other using the JavaParty platform
[5]. Both projects are concerned with parallel high performance computing, while Pangaea also focuses on interactive, client/server type applications.
The Orca project successfully demonstrated that a static analyzer can guide the runtime system to make better placement and replication choices than the RTS could have
made on its own; the performance achieved is very close to that of manually distributed
programs. While Orca is an object-based language that has been kept simple precisely
to facilitate static analysis, Pangaea makes a similar attempt in an unrestricted, objectoriented language (Java).
We consider the JavaParty project a first step in that direction. The authors acknowledge,
however, that their analysis algorithm has not yet been able to produce convincing
results for real programs. We believe that some of the problems responsible
for this are dealt with more easily in our own approach; a more elaborate discussion
can be found in [8].
What further distinguishes our work from both projects is that Pangaea handles
distribution concepts beyond mere placement and replication decisions (see the previous
section). Pangaea also does not depend on any particular distribution platform; rather, it
provides an abstraction mechanism that can use the capabilities of arbitrary platforms,
and has been specifically designed so that it can be adapted to future distribution
technology.
[Figure 1: the centralized program (100% Java, .java files) and the distribution
requirements enter Pangaea's Analyzer; a Backend Adapter (for CORBA, JavaParty, or
Doorastha) emits a backend-specific distributed program (.java), which the chosen
platform turns into an executable program (.class).]

Fig. 1. The Architecture of Pangaea
4 Pangaea
The architecture of Pangaea is shown in Fig. 1. We will first give an overview of the
system, then cover particular areas in greater detail.
Pangaea’s input is the source code of a centralized Java program. The Analyzer
derives an object graph from this program, which is an approximation of the program’s
run-time structure: what objects will exist at run-time, and how they communicate with
each other (for details, see section 4.1). The Analyzer decides about the distribution
of the program by analyzing this object graph. The analysis is parameterized both by
requirements specified by the programmer, and the characteristics of the distribution
platform to use, which the Analyzer learns from the Backend Adapter for that platform.
The programmer specifies boundary conditions for the desired distribution, working
with a visualization of the program’s object graph. In a client/server type database application he might, for example, assign some user interface objects to the client machine,
and objects accessing the database to the server.^2 Obeying these boundary conditions,
the Analyzer completes the distribution, e.g. partitioning the object graph so that there
is as little communication across the network as possible. (For concurrent programs,
where load distribution is required, slightly different criteria apply; a discussion is
beyond the scope of this paper.)

^2 One might argue that this approach is semi-automatic at best, as the programmer needs to
assign certain objects himself. However, any automatic system requires some user input in
order to start working, and this is precisely what the programmer provides in Pangaea.
When distributing the program, the Analyzer also considers the capabilities of the
distribution platform to use. An abstract view of these capabilities is provided by the
corresponding Backend Adapter. It tells the Analyzer, for example, whether the platform is capable of object migration or replication; it can also answer queries such as
whether a certain class of the program could be made remotely invokable with that
platform or not.
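The queries described above might take roughly the following shape (interface and method names are hypothetical; Pangaea's actual adapter API may differ):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// The abstract view of a platform's capabilities that the Analyzer
// consults during its decisions.
interface BackendAdapter {
    boolean supportsObjectMigration();
    boolean supportsRemoteFieldAccess();

    // May this class be made remotely invokable on the platform?
    boolean canBeRemote(Class<?> cls);
}

// Example adapter for an RMI-like platform: no migration, no remote
// field access, so classes with public instance fields are rejected.
class RmiLikeAdapter implements BackendAdapter {
    public boolean supportsObjectMigration()   { return false; }
    public boolean supportsRemoteFieldAccess() { return false; }

    public boolean canBeRemote(Class<?> cls) {
        for (Field f : cls.getFields()) {          // public fields only
            if (!Modifier.isStatic(f.getModifiers())) {
                return false;  // would require remote field access
            }
        }
        return true;
    }
}

// A class that could NOT be made remote on such a platform:
class Counter {
    public int value;  // public instance field
}
```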
After the analysis has been completed, the Analyzer passes annotated versions of
the program’s abstract syntax trees to the Backend Adapter. The annotations indicate,
for example, which classes should be remotely invokable, or serializable, and which
new statements should become remote object creations. The Adapter then regenerates
the program’s source code, creating a distributed program for the chosen platform. This
may include the automatic generation of interface definition files or configuration files
(for details, see section 4.2). The distribution platform is then responsible for transforming
the distributed program into an executable program (including, for example, stub generation
and compilation into byte code), and finally for executing the program under the
control of its run-time system, which may also have been fine-tuned by the Backend
Adapter.
4.1 Object Graph Analysis
The algorithm which derives an object graph from the program’s source code is that
part of Pangaea on which all other analyses most critically depend. Our algorithm is
different from other approaches to static analysis, in that it deals with individual objects,
not only the types of those objects. Although the latter is usually sufficient for common
compiler optimizations, such as static binding of method invocations, it is not enough
for distributing programs. We have described the algorithm in detail elsewhere [8] and
must confine ourselves to a rough overview here.
The result of our algorithm is a graph, the nodes of which represent the run-time
objects of the program. There are three kinds of edges between these nodes: creation
edges, reference edges, and usage edges. (We say that an object a uses an object b
if a invokes methods of b or accesses fields of b.) The graph approximates the actual
run-time structure as follows:
– Some of the nodes in the graph do not stand for a single, concrete run-time object, but rather for an indefinite number of objects of a certain type (hence we call
such nodes indefinite objects). For each type of the program there may be several
concrete or indefinite objects in the graph; an indefinite object therefore does not
simply represent all instances of a given type (by which the analysis would degrade
into a type-based analysis), but a certain subset of those instances.
– Reference edges and usage edges are conservative, i.e. the graph may contain
more edges than the actual run-time structure, but never fewer. The absence of an edge is therefore safe information; its presence is not.
[Figure 2: on the left, a type graph in which type A exports references of type C to
type B; on the right, the corresponding object graph, in which a new reference edge
from b to c is added because a holds references to both.]

Fig. 2. Exporting a reference in the type graph (left) and in the object graph (right)
– The algorithm treats objects – at least in the final graph – as unstructured containers
of references, abstracting from their internal details. We say that an object a owns a
reference to an object b if, at any time during execution, this reference may appear
in the context of a, whether as the value of an instance field, as the temporary value of
an expression, or otherwise.
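A minimal sketch of an object-graph representation with these three edge kinds (our code, for illustration; Pangaea's actual data structures are described in [8]):

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Nodes are concrete or indefinite run-time objects; edges carry one
// of the three kinds named above.
enum EdgeKind { CREATION, REFERENCE, USAGE }

class ObjectNode {
    final String type;
    final boolean indefinite;  // stands for a set of instances of 'type'

    ObjectNode(String type, boolean indefinite) {
        this.type = type;
        this.indefinite = indefinite;
    }
}

class ObjectGraph {
    private final Map<ObjectNode, Map<ObjectNode, EnumSet<EdgeKind>>> edges =
            new HashMap<>();

    void addEdge(ObjectNode from, ObjectNode to, EdgeKind kind) {
        edges.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(to, k -> EnumSet.noneOf(EdgeKind.class))
             .add(kind);
    }

    // Conservative reading: only the *absence* of an edge is safe.
    boolean hasEdge(ObjectNode from, ObjectNode to, EdgeKind kind) {
        Map<ObjectNode, EnumSet<EdgeKind>> out = edges.get(from);
        return out != null && out.containsKey(to) && out.get(to).contains(kind);
    }
}
```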
The object graph is created in five steps:
Step 1. Find the set of types that make up the program, which is the transitive dependency closure of the program’s main class, i.e. it contains all types that are
syntactically referenced in the program.
Step 2. Create a type graph, which describes usage relations and data flow relations at
the type level. A usage edge between two types A and B means that objects of type
A may use objects of type B at run-time; a dataflow edge means that references
of a type C may propagate from objects of a type A to objects of a type B , e.g.
as parameters of a method invocation. The type graph is found by simple syntactic
analysis; relations between two types also hold for any subtypes of these types that
are part of the program.
Step 3. Generate the object population of the program, which is a finite representation
of the a priori unbounded set of objects that may exist at run-time. The key idea
here is to distinguish between initial allocations and non-initial allocations in the
program. An initial allocation is a new statement which is guaranteed to be executed exactly once whenever the type in which it is contained is instantiated (e.g.
because the statement appears in the constructor of the type).
We consider the static methods and fields of the program’s classes as static objects
which are created automatically and hence are always part of the object population. The initial allocations performed by these static objects yield, transitively, the
set of the initially created objects of the program. The existence of these objects
can safely be predicted. For the non-initial allocations, on the other hand, it is not
certain how often, if ever, they will be executed at run-time. For each non-initial
allocation that appears in the type of an object, we therefore add (transitively) indefinite objects to the population. Thus, we obtain the nodes of the object graph,
the creation edges, and, since creation usually implies reference, we also get some
reference edges between the objects.
Step 4. Propagate the reference edges within the object graph, based on the data flow
information from the type graph. See Fig. 2: if the object graph contains an object a
that holds references to two objects b and c, and if the type graph indicates that the
corresponding type A exports references of type C to type B, then a new reference
edge from b to c is added to the object graph. This process is repeated using fixed-point
iteration. The result tells us which objects “know” which other objects,
and might therefore also use them.
Step 5. Add usage edges to the object graph, in all places where an object a knows
an object b, and the type graph indicates that there is a usage relation between the
corresponding types.
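The distinction between initial and non-initial allocations in Step 3 can be seen in a fragment like the following (our example, loosely modelled on the Parser and Scanner classes of the case study in section 5):

```java
import java.util.ArrayList;
import java.util.List;

class Scanner { /* ... */ }

class Parser {
    // Initial allocation: this 'new' is executed exactly once per
    // Parser instance, so the analysis can safely predict one
    // concrete Scanner object per concrete Parser object.
    private final Scanner scanner = new Scanner();

    List<String> parse(String line) {
        // Non-initial allocation: executed once per call to parse(),
        // i.e. an unknown number of times; the analysis represents
        // all lists created here by one *indefinite* object.
        List<String> tokens = new ArrayList<>();
        for (String t : line.split("\\s+")) {
            tokens.add(t);
        }
        return tokens;
    }
}
```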
Overall, the algorithm is of polynomial complexity, and therefore also suitable for
larger programs. We have implemented the algorithm completely and tested it on a
range of non-trivial example programs with up to 10,000 lines of code. The results are
promising: after computing times on the order of several minutes at most, the run-time
structure could easily be seen in the object graph, the detail resolution being sufficient
for distributing the programs (see also the case study in section 5).
4.2 Backend Adaptation
Pangaea’s Backend Adapters are still in their planning and experimentation phase. Their
task is, on the one hand, to provide the Analyzer with an abstract view of the capabilities
of a distribution platform; they hide, on the other hand, the details of how to realize a
certain distribution concept on that platform. In the following, we will restrict ourselves
to the simplest case of a platform that only allows remote method invocations and remote object creations, i.e. we disregard any mobility mechanisms in this discussion.
During analysis, the adapter informs the Analyzer whether certain classes may be
made remotely invokable with that platform. If the platform does not allow remote
access to instance fields, for example, the adapter needs to check whether the class
contains any public instance fields. The Analyzer uses this information when deciding
about distribution boundaries.
After the analysis has been completed, the backend adapter re-generates the program’s source code, guided by annotations from the Analyzer, indicating which classes
need to be remotely invokable, and which new statements should become remote object
creations.
A remote object creation is, conceptually, a new statement with an additional
parameter that specifies the machine to use. In the program, this might be realized as
a preceding call to the run-time system (as in JavaParty), or a call to a factory object
on the remote machine (as in CORBA). The corresponding code transformations are
trivial.
To make a class remotely invokable, however, requires vastly different amounts of
work on various platforms. In JavaParty, it is enough to add a single keyword (remote)
to the class definition, whereas in CORBA, an IDL description needs to be generated
and, depending on the implementation, some changes in the inheritance hierarchy may
be required.
Additionally, the Backend Adapter is responsible for making sure that the remotely
invokable class has the same semantics as the original class. This is not trivial, since
all distribution platforms that we know of introduce minor semantic changes for remote
invocations. For example, array parameters are usually copied in remote invocations,
but passed by reference in local invocations. Technically, though, it is indeed possible
to maintain local semantics, if, unlike with ordinary middleware, source code transformations are allowed. This is easily seen if one realizes that all distribution platforms
mentioned provide both a remote reference mechanism, and a remote value transfer
mechanism. The local call semantics may thus be implemented in any case. For example, to pass an array parameter remotely with standard semantics, the array must be
encapsulated into a remotely invokable object, which is easily achieved by source code
transformation (to avoid large amounts of costly remote invocations, that object might
also be migrated to the receiver). Only if the analysis can prove that the receiver merely
reads the array and does not modify it can pass-by-value be allowed.
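The array encapsulation described above could look roughly like this (class and method names are hypothetical; in the actual transformation the wrapper would be declared remotely invokable on the chosen platform):

```java
// Wrapping an array in an object preserves local (by-reference)
// semantics across a remote call: instead of the middleware copying
// the array, the receiver gets a remote reference to the wrapper,
// and writes through it remain visible to the caller.
class RemoteIntArray {
    private final int[] data;

    RemoteIntArray(int[] data) {
        this.data = data;
    }

    // In the distributed program, each of these would be a remote
    // invocation -- which is why migrating the wrapper to a receiver
    // that accesses it heavily may pay off.
    int get(int i)         { return data[i]; }
    void set(int i, int v) { data[i] = v; }
    int length()           { return data.length; }
}
```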
Fig. 3. A database for chess openings
5 A Case Study
As an example, we consider a graphical database for chess openings written in Java
(Fig. 3). The user may move the pieces on the chess board; the program searches a
database for the name of the corresponding opening and a commentary on the move, and
displays these on the screen. The database is a simple text file (at this time with very
rudimentary opening knowledge, about 25 kBytes); the program itself consists of
approx. 2,500 lines of code in about 40 Java classes.
In order to use the program in the Internet, it is to be distributed so that the graphical
interface runs, as a web applet, on a client computer, while the database stays on the
server (if it were of realistic size, it would not be practical to download it to the
client).

[Figure 4: the simplified object graph, with nodes such as NameView, CommentView,
MovesView, BoardView, TurnView, two Board objects, MoveList, Position, Cache, Parser,
Scanner, and Database; plain edges denote use (a method call), marked edges denote
"frequent" use (a call in a loop); both the naive and the good distribution boundary
are drawn in.]

Fig. 4. Object graph (simplified) with distribution boundaries

The optimization criterion for distributing the program is to achieve the best possible
interactive response time. Since the program is purely sequential, this is equivalent
to making as few calls across the network as possible.
Fig. 4 shows, although greatly simplified, the object graph of this program as Pangaea’s Analyzer actually computes it. Each node stands for an individual run-time object; the edges are usage edges, indicating method calls. The objects representing
board coordinates or chess moves have already been removed from the graph: Pangaea
realizes that these objects are immutable, and may therefore be treated as values. It is
also visible that there are two instances of class Board at run-time: one of them is the
model of the chess board where the user makes his moves, the other one is used internally by the Parser when interpreting the text file. A purely class-based analysis or
distribution would therefore already be inadequate here.
A naive distribution of the program would be to place the user interface objects (the
View objects to the left) onto the client, and the application logic onto the server. In
this program, however, this leads to very bad response time, because the user interface
objects communicate heavily with the left Board object. The BoardView object in
particular issues, for each move of the user, 64 calls to obtain the current state of the chess
squares.
To estimate the communication volume along the edges, a simple heuristic is already sufficient. If we merely mark all edges that represent a method call in a loop as
“potentially expensive”, and avoid drawing a distribution boundary across such edges,
we already obtain the right distribution for this program. It consists of putting the left
Board object, and all objects belonging to its interior, onto the client. As the graph
shows, this also means that only the Database object ever needs to be invoked across
the distribution boundary, and must hence be remotely invokable. For all other objects,
this is unnecessary.

Fig. 5. Distribution via Ethernet (10 Mbps)
Fig. 6. Distribution via Modem (28,800 bps)
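The heuristic of marking loop calls as "potentially expensive" amounts to a simple check over the usage edges. A sketch (our code, not Pangaea's; object names follow Fig. 4):

```java
import java.util.List;
import java.util.Map;

// A usage edge between two objects, marked "potentially expensive"
// if the underlying call site lies inside a loop.
class UsageEdge {
    final String from, to;
    final boolean inLoop;

    UsageEdge(String from, String to, boolean inLoop) {
        this.from = from;
        this.to = to;
        this.inLoop = inLoop;
    }
}

class BoundaryCheck {
    // A placement (object -> node) is acceptable if no potentially
    // expensive edge crosses the distribution boundary.
    static boolean acceptable(List<UsageEdge> edges,
                              Map<String, String> placement) {
        for (UsageEdge e : edges) {
            boolean crosses = !placement.get(e.from).equals(placement.get(e.to));
            if (e.inLoop && crosses) {
                return false;
            }
        }
        return true;
    }
}
```

For the chess program, the naive placement cuts the 64-calls-per-move edge between BoardView and Board and is rejected, while the good placement keeps all loop edges on one node.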
The impact on the program’s response time is dramatic. In Figures 5 and 6, the
response time of the program for certain user interactions is shown for a centralized
version, the naive distribution, and the optimized distribution of the program (we created these distributions manually using JavaParty; only very few program lines had to
be changed). The light part of the bars represents the time until the first visible reaction of the program, the darker part is the time until the input event had been completely
processed. It can be seen that the naive distribution, even in a fast network, is noticeably
sluggish, while over a slow connection it is simply unacceptable. The correctly
distributed program, however, comes very close to the centralized program’s performance, even on the slow network.
6 Summary and Future Work
The Pangaea system shows that it is possible to distribute centralized Java programs
automatically. On the one hand, this means employing static analysis to arrive at an
abstract decision about how a given program should be distributed, so that certain requirements
and optimization criteria are fulfilled. On the other hand, we show that the realization of
this decision on a given distribution platform is a purely mechanical process, which can
also take place automatically. Pangaea can therefore be seen as a distributing compiler:
just like a traditional compiler maps the constructs of a high level language as efficiently
as possible onto an underlying machine architecture, the Pangaea system translates Java
programs for a given middleware layer.
We have so far implemented the algorithm that derives an object graph from the
source code, and the user interface to manipulate such graphs and thus to configure
the program for distributed execution. The code generation stage, currently targeted for
the JavaParty platform, is already working for simple programs, and is rapidly being
improved.
References
1. Henri E. Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim
Rühl, and M. Frans Kaashoek. Performance evaluation of the Orca shared-object system.
ACM Transactions on Computer Systems, 16(1):1–40, February 1998.
2. Ophir Holder, Israel Ben-Shaul, and Hovav Gazit. Dynamic layout of distributed applications
in FarGo. In Proc. ICSE ’99, Los Angeles, May 1999.
3. Eric Jul, Henry Levy, Norman Hutchinson, and Andrew Black. Fine-grained mobility in the
Emerald system. ACM Transactions on Computer Systems, 6(1):109–133, February 1988.
4. OMG. The Common Object Request Broker: Architecture and Specification, Revision 2.0,
July 1995.
5. Michael Philippsen and Bernhard Haumacher. Locality optimization in JavaParty by means
of static type analysis. In Proc. Workshop on Java for High Performance Network Computing
at EuroPar ’98, Southampton, September 1998.
6. Michael Philippsen and Matthias Zenger. JavaParty: Transparent remote objects in Java.
Concurrency: Practice and Experience, 9(11):1225–1242, November 1997.
7. André Spiegel. Objects by value: Evaluating the trade-off. In Proc. PDCN ’98, pages 542–
548, Brisbane, Australia, December 1998. IASTED, ACTA Press.
8. André Spiegel. Object graph analysis. Technical Report B-99-11, Freie Universität Berlin,
July 1999.
9. Sun Microsystems. Java Remote Method Invocation Specification, February 1997.
10. Alfred Wegener. Die Entstehung der Kontinente und Ozeane. Vieweg, Braunschweig, 1915.
6th edition, 1962.