System Description Document

Table of contents
1. Introduction
1.1. General
1.2. Overview
1.3. Reference documents
2. Software functional description
2.1. MPICH
2.2. Ox
2.3. Description of ox2mpich.dll
2.4. The different functions in the dll file
2.4.1. Init
2.4.2. Finalize
2.4.3. Comm_size
2.4.4. Comm_rank
2.4.5. Get_processor_name
2.4.6. Wtime
2.4.7. Reduce
2.4.8. Bcast
2.4.9. Send
2.4.10. Recv
2.4.11. Probe
2.4.12. Iprobe
2.5. Installation program
1. Introduction
1.1. General
The System Description Document (SDD) provides a brief summary of the hardware
components and software programs used in ParaDiOx. It also gives a more detailed
description of how the software programs are linked together. The main purpose of
ParaDiOx is to split a time-consuming calculation in Ox into a number of smaller
calculations and distribute them through the network to a number of computers, which
process the calculations at the same time and then return the answers to the main
computer.
1.2. Overview
ParaDiOx is based on the requirements described in section 1.3, Demands, of the
Preliminary Project Specification. A system built upon ParaDiOx consists of one
“master” computer and several “slave” computers, all connected via a network. All
calculations are controlled from the master computer. The slave computers act only on
instruction: they receive an input from the master, process it (calculate) and then send
an output back to the master.
Consider the following example (figure 1.2-1): A master computer is going to calculate
an equation, using a parallel algorithm. It sends a part of the calculation (the red message)
to the first slave, and when the slave has finished its calculation, the master receives
the answer from it. Similar messages are sent to the other slave computers, which
process their calculations at the same time. The answers are then post-processed by
the master. In general this reduces the calculation time, although in this trivial case the
network communication consumes all the time gained from calculating in parallel.
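The split-and-combine idea in this example can be sketched in plain C. This is only an
illustration under stated assumptions: the names chunk_range, partial_sum and
distributed_sum are hypothetical and not part of ParaDiOx, the workload (a sum of
squares) is invented, and the "slaves" are simulated by a loop on one machine rather
than reached over the network.

```c
#include <stddef.h>

/* Hypothetical helper: compute the half-open range [begin, end) of work items
   given to worker `rank` out of `nworkers`. Any remainder is spread over the
   first workers so that chunk sizes differ by at most one. */
static void chunk_range(size_t n, int nworkers, int rank,
                        size_t *begin, size_t *end)
{
    size_t base  = n / (size_t)nworkers;
    size_t extra = n % (size_t)nworkers;
    size_t r     = (size_t)rank;

    *begin = r * base + (r < extra ? r : extra);
    *end   = *begin + base + (r < extra ? 1 : 0);
}

/* The share of the calculation one slave would perform on its chunk. */
static double partial_sum(const double *x, size_t begin, size_t end)
{
    double s = 0.0;
    for (size_t i = begin; i < end; ++i)
        s += x[i] * x[i];
    return s;
}

/* Master-side view: hand out the chunks, then post-process (add up) the
   answers. Here the slaves are simulated sequentially. */
static double distributed_sum(const double *x, size_t n, int nworkers)
{
    double total = 0.0;
    for (int rank = 0; rank < nworkers; ++rank) {
        size_t begin, end;
        chunk_range(n, nworkers, rank, &begin, &end);
        total += partial_sum(x, begin, end);
    }
    return total;
}
```

In a real run each call to partial_sum would execute on a different computer, so the
gain depends on the chunk work being large compared with the cost of sending the
chunk and its answer over the network.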
Figure 1.2-1. A trivial parallel calculation example.
1.3. Reference documents
The following documents contain additional information about the software used in
ParaDiOx:
• Preliminary Project Specification. This document explains the demands, purpose
  etc. at an early stage of the project. The document can be found at:
  http://www.nada.kth.se/projects/proj02/paradiox/ (2002.04.26)
• Ox Documentation. This document contains detailed information about Ox. The
  document can be found at:
  http://www.nuff.ox.ac.uk/Users/Doornik/doc/ox/index.html (2002.04.26)
• MPICH Documentation. This document contains manuals and documentation
  about MPICH as well as further documentation about the Message Passing
  Interface (MPI). The document can be found at:
  http://www-unix.mcs.anl.gov/mpi/mpich/ (2002.04.26)
2. Software functional description
ParaDiOx is built on two third-party products, MPICH and Ox. MPICH is the foundation,
which has been adapted to meet the requirements of distributed calculations in Ox.
2.1. MPICH
MPI is a library specification for message passing, proposed as a standard by a broadly
based committee of vendors, developers and users. MPI was designed for high
performance on both massively parallel machines and workstation clusters. MPI is
widely available, with both free and vendor-supplied implementations. MPICH is one of
the freely available implementations of MPI.
2.2. Ox
Ox is an object-oriented matrix language with a comprehensive mathematical and
statistical function library. Matrices can be used directly in expressions, for example to
multiply two matrices or to invert a matrix. Use of the object-oriented features is
optional, but makes code easier to reuse. The syntax of Ox is similar to that of the C, C++
and Java languages. This similarity is clearest in syntax items such as loops, functions,
arrays and classes.
2.3. Description of ox2mpich.dll
Here we describe the functions at a lower level. The Ox program calls functions in the dll
file, and the dll file interprets that information and makes a new function call to the
MPICH software. MPICH, in turn, carries out the call according to the MPI standard and
communicates with the other connected MPI clients. In other words, this dll file connects
the Ox environment to the MPICH environment. As said in earlier sections, this dll file
can be developed further to include more functions. For that reason, this document
covers only the functions that exist as of this date.
Note that every function in the dll file must be defined with the signature shown below.
Otherwise, Ox programs will not be able to call it.
void OXCALL Init(OxVALUE *rtn, OxVALUE *pv, int cArg)
{
    /* function body: read the arguments from pv, write the result to rtn */
}
Instead of receiving and returning values in the “ordinary” way, used by languages such
as Java and C, the two arrays rtn and pv are used. The third argument, cArg, holds the
number of elements in the incoming array. The incoming values are found in the pv
array, whereas the rtn array is used for returning values.
As explained in the user guide, the arguments are not the variables themselves but rather
their addresses. The dll file can take advantage of this when it needs to return values:
it simply writes to the address found in the incoming array.
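The two ways of handing back a result can be illustrated with a small stand-alone
sketch. It does not use the Ox headers: DemoValue is a hypothetical stand-in for
OxVALUE, and DemoSum and DemoCount are invented names, so the code shows only the
shape of the convention and not the real Ox API.

```c
#include <stddef.h>

/* Hypothetical stand-in for OxVALUE, used only in this illustration.
   The real type and its accessor macros come from the Ox headers. */
typedef struct {
    int  ival;  /* integer payload */
    int *ref;   /* address of a caller-owned variable, if one was passed */
} DemoValue;

/* Same shape as a function in the dll file: inputs arrive in pv, cArg is the
   number of entries in pv, and the result is written to rtn. */
static void DemoSum(DemoValue *rtn, DemoValue *pv, int cArg)
{
    int sum = 0;
    for (int i = 0; i < cArg; ++i)
        sum += pv[i].ival;
    rtn[0].ival = sum;  /* return through the rtn array */
}

/* Returning through an argument instead: write to the address found in the
   incoming array, so the caller's own variable changes. */
static void DemoCount(DemoValue *rtn, DemoValue *pv, int cArg)
{
    if (cArg > 0 && pv[0].ref != NULL)
        *pv[0].ref = cArg;
    rtn[0].ival = 0;
}
```

Writing through the address is what allows a function to hand back something that
does not fit in the return slot used here, at the cost of modifying the caller's variable.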
2.4. The different functions in the dll file
As in the function reference for the Ox programming section, we list and describe each
function at the dll level one by one. If you need more capabilities, implement the
additional functions, rebuild the dll file and replace the current dll file with the
enhanced version.
The naming we have chosen makes the dll file transparent to the Ox programmer. For
example, the general MPI command “MPI_Init” corresponds to a function at the dll level
named simply “Init”, and this function calls MPI_Init in the MPICH software. When you
call this Init function from the Ox program, you import the external function Init as
“MPI_Init”, so the syntax is the same as it would be without the dll file.
This document describes the general ideas behind the functions. More precise comments
on exact syntax and other details can be found in the source code of the ox2mpich.dll
file.
2.4.1. Init
As specified in the MPI standard, two addresses must be sent as arguments to the
initialization function. The current implementation sends only dummy addresses, which
carry no useful meaning. If you create a more advanced program that requires these
arguments, the dll file must be developed further.
2.4.2. Finalize
As you can see in the source code, this function is very straightforward. It takes no
arguments and simply calls MPI_Finalize.
2.4.3. Comm_size
This is the first example that makes use of the rtn array. The function returns the
number of computers in the domain, since it is mapped to the MPI_Comm_size function
described in the previous chapter.
void OXCALL Comm_size(OxVALUE *rtn, OxVALUE *pv, int cArg)
{
    int numprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    OxInt(rtn, 0) = numprocs;
}
As in the Ox code, the address of the integer is sent to the MPI system, which writes
the value to that address. We then simply return that value in the rtn array after the
call to MPI_Comm_size.
2.4.4. Comm_rank
This function works the same way as the Comm_size function. The only difference is
that it calls another MPI command and returns that value in the return array.
2.4.5. Get_processor_name
This function returns a string of characters instead of an integer. Hence, the return array
is not used; instead we store the processor name at the address of the incoming variable
in the pv array.
OxValSetString(OxArray(pv,0), processor_name);
2.4.6. Wtime
This function returns the double value returned by the MPI_Wtime function. Since it is a
double, it can be returned in the rtn array and the incoming variable is left untouched.
double wtime = MPI_Wtime();
OxDbl(rtn,0) = wtime;
2.4.7. Reduce
This function is used for "reducing" arguments into one, using the specified operation. As
explained in the previous chapter, the addresses of the buffers, the root and the operation
are provided by the calling function.
First we have to find out which operation is specified in the incoming string. A string
comparison in an if-else chain takes care of that and stores the actual operation in a
variable for use in the sending stage.
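A minimal sketch of such an if-else chain is given below. It is not the code from
ox2mpich.dll: the DemoOp codes, the function names parse_op and reduce_values, and
the accepted spellings of the operation names are assumptions made for illustration.
The real function would store an MPI operation handle such as MPI_SUM and pass it on to
MPI_Reduce. The second function shows what a reduction computes once the operation is
known.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operation codes standing in for MPI operation handles. */
typedef enum {
    DEMO_OP_SUM, DEMO_OP_PROD, DEMO_OP_MAX, DEMO_OP_MIN, DEMO_OP_UNKNOWN
} DemoOp;

/* Map the incoming operation string to an operation code.
   The accepted spellings are an assumption made for this sketch. */
static DemoOp parse_op(const char *name)
{
    if (name == NULL)
        return DEMO_OP_UNKNOWN;
    else if (strcmp(name, "MPI_SUM") == 0)
        return DEMO_OP_SUM;
    else if (strcmp(name, "MPI_PROD") == 0)
        return DEMO_OP_PROD;
    else if (strcmp(name, "MPI_MAX") == 0)
        return DEMO_OP_MAX;
    else if (strcmp(name, "MPI_MIN") == 0)
        return DEMO_OP_MIN;
    else
        return DEMO_OP_UNKNOWN;
}

/* Combine n values into one with the chosen operation.
   Returns 1 on success and 0 if the operation or the input is unusable. */
static int reduce_values(DemoOp op, const double *v, size_t n, double *out)
{
    if (op == DEMO_OP_UNKNOWN || v == NULL || out == NULL || n == 0)
        return 0;

    double acc = v[0];
    for (size_t i = 1; i < n; ++i) {
        switch (op) {
        case DEMO_OP_SUM:  acc += v[i]; break;
        case DEMO_OP_PROD: acc *= v[i]; break;
        case DEMO_OP_MAX:  if (v[i] > acc) acc = v[i]; break;
        case DEMO_OP_MIN:  if (v[i] < acc) acc = v[i]; break;
        default:           return 0;
        }
    }
    *out = acc;
    return 1;
}
```

Rejecting unknown strings explicitly, rather than falling back to a default operation,
keeps a misspelled operation name from silently producing a wrong result.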
When the operation is known, we check what kind of data is provided. Depending on the
result of this if-else chain, different strategies are chosen. If memory needs to be
allocated dynamically (string, matrix and array), it is allocated before the call and freed
afterwards. When everything has been set up, the message is sent with the correct data
size, the world definition and the remaining arguments.
2.4.8. Bcast
This function is used to broadcast the incoming message. In order to do that, we must
first determine what kind of message is being broadcast. The two functions
OxLibCheckType and OxValType do this. Then we check the result with if statements. In
the current release, we can manage int, double, ox_matrix, ox_array and ox_string.
Once the type is known, it is simply a matter of sending the information in the correct
way. In the int and double cases, we only need to specify the data, its size, who sends it
and where it is sent.
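The type check can be pictured as a dispatch over the five supported types. The sketch
below is an assumption-laden illustration, not the dll code: DemoType, DemoDesc,
needs_buffer and payload_bytes are hypothetical names, the real function inspects an
OxVALUE through the Ox API, and counting a terminating NUL for strings is a choice made
only for this example. Matrix and array cells are taken to be doubles.

```c
#include <stddef.h>

/* Hypothetical type tags mirroring the five types the current release handles. */
typedef enum {
    DEMO_INT, DEMO_DOUBLE, DEMO_MATRIX, DEMO_ARRAY, DEMO_STRING
} DemoType;

/* Hypothetical description of the value to be sent. */
typedef struct {
    DemoType type;
    size_t   rows, cols;  /* used for DEMO_MATRIX */
    size_t   length;      /* elements for DEMO_ARRAY, characters for DEMO_STRING */
} DemoDesc;

/* Scalars can be sent directly; the other three types need a temporary buffer. */
static int needs_buffer(DemoType t)
{
    return t == DEMO_MATRIX || t == DEMO_ARRAY || t == DEMO_STRING;
}

/* Number of bytes the message payload occupies for each type. */
static size_t payload_bytes(const DemoDesc *d)
{
    if (d->type == DEMO_INT)
        return sizeof(int);
    else if (d->type == DEMO_DOUBLE)
        return sizeof(double);
    else if (d->type == DEMO_MATRIX)
        return d->rows * d->cols * sizeof(double);
    else if (d->type == DEMO_ARRAY)
        return d->length * sizeof(double);
    else if (d->type == DEMO_STRING)
        return d->length + 1;  /* assumption: the terminating NUL is sent too */
    return 0;
}
```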
In the other three cases, sending the data is a more complex procedure. Since the size of
the matrices etc. is not known in advance, one has to loop through the data structure,
allocate memory and only then send the data. We have limited this function to handling
doubles only in the cells of the array/matrix, since that is probably what needs to be
broadcast.
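Looping through a matrix and copying it into freshly allocated memory might look like
the sketch below. The function name pack_matrix and the row-pointer layout of the input
are assumptions for illustration. The real dll reads the matrix from an OxVALUE, and the
packed buffer would then be handed to the MPI call and freed afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Copy an r-by-c matrix, given as an array of row pointers, into one
   contiguous block of doubles that can be sent as a single message.
   Returns NULL if the dimensions are not positive or allocation fails.
   The caller owns the buffer and must free() it after sending. */
static double *pack_matrix(double *const *rows, int r, int c)
{
    if (rows == NULL || r <= 0 || c <= 0)
        return NULL;

    double *buf = malloc((size_t)r * (size_t)c * sizeof *buf);
    if (buf == NULL)
        return NULL;

    for (int i = 0; i < r; ++i)
        memcpy(buf + (size_t)i * (size_t)c, rows[i], (size_t)c * sizeof *buf);

    return buf;
}
```

The receiving side has to learn the dimensions before it can allocate a matching buffer,
which is one reason why these three cases need more steps than the scalar ones.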
2.4.9. Send
This function works in the very same way as the broadcast function. The only two
differences are that a receiver is specified, instead of sending to the entire domain, and
that an information tag must be sent along with the message. The receiver can then use
this tag to find the desired message.
2.4.10. Recv
This function mirrors the send function. The only difference is that the receiving
function specifies the source from which the message must have been sent.
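How a receiver picks out the desired message by source and tag can be shown with a toy
mailbox. This is a simulation only: DemoMsg, DEMO_ANY and find_message are hypothetical
names, and the actual matching is done inside MPICH. MPI itself offers the wildcards
MPI_ANY_SOURCE and MPI_ANY_TAG, which DEMO_ANY imitates. Whether the Ox-level Recv
passes such wildcards through is not stated in this document.

```c
#include <stddef.h>

#define DEMO_ANY (-1)  /* hypothetical wildcard for source or tag */

/* Hypothetical pending message in a toy mailbox. */
typedef struct {
    int    source;  /* rank of the sender */
    int    tag;     /* information tag chosen by the sender */
    double value;   /* payload */
    int    taken;   /* nonzero once the message has been received */
} DemoMsg;

/* Return the index of the first pending message that matches the requested
   source and tag, or -1 if there is none. */
static int find_message(const DemoMsg *box, size_t n, int source, int tag)
{
    for (size_t i = 0; i < n; ++i) {
        if (box[i].taken)
            continue;
        if ((source == DEMO_ANY || box[i].source == source) &&
            (tag == DEMO_ANY || box[i].tag == tag))
            return (int)i;
    }
    return -1;
}
```

With distinct tags, a master can tell apart, for example, a result message from a status
message that arrives from the same slave.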
2.4.11. Probe
This function makes it possible to probe the environment for messages sent by the send
functions. It waits until a message has been detected before continuing. It simply calls
the MPI_Probe command with the arguments received in the pv array from the Ox
program.
2.4.12. Iprobe
The only difference between the Iprobe function and the Probe function above is that
Iprobe does not wait: it merely checks whether there is a message to be received and
then returns an integer to the user. If there is no new message, the return value is zero.
2.5. Installation program
ParaDiOx is distributed as installation programs. These installation programs are created
with GP-install, a free-to-use product. The installation program contains a program
written in Visual Basic (VB) that adds a user to the local machine.