Mercury Computer Systems, Inc.

Data Flow in UML
Dr. Jeffrey E. Smith
Mercury Computer Systems, Inc.
[email protected]
Agenda
• Model-based parallel programming alternatives
  – Focus on framework/UML
• Conceptualization
  – Data Parallel CORBA
  – Data Flow in UML Superstructure
Motivation: From “Portable HP SW for SIP What’s Next”, Lincoln Labs
• Moore’s law addresses computations, not complexity
• In its roadmap for advancing RT Embedded Software Development, Lincoln Labs identified model-based development and automated mapping support as the key long-term technologies
• “Blue Jean” datapoint
[Figure: Development Steps. Develop Model (GUI + Interpreted Code); Make Parallel (Simulator); Auto-Generate Application Code (Cross Compilation)]
Methods to Conceptualize/Apply High Performance Data Flow Applications
Observations
• UML doesn’t include a consistent model of data flow … yet … not really
• Translating UML diagrams to any source might be an avenue of tool support worth exploring
Goals: Component Reuse, Software Productivity, Leverage Existing Investments & Wider Programming Base
[Figure: development flow. Labels: Requirements and Design; Constructor (Programmer 1); Model Behavior; UML; Translate; Graph(ical); CORBA; SCE; Parallel/DSP; V/P Compilers . . .; Prototypers; Executable Prototype; POSIX-Compliant API; POSIX-Compliant kernel; Profile-Guided Optimization; Executable; Deliverable; Source; Optimizer (Programmer 2)]
Dynamic Compilation Can Provide a Solution
Collect runtime execution behavior
• Memory usage
  – instruction and data caches
  – translation look-aside buffers
• Control flow
  – branch probabilities
  – program “traces”
• Call graphs
  – gprof statistics
• Data dependencies
  – data-dependent control flow
• Variable values
  – value locality
  – interprocedural dataflow
• Hardware counters
  – pipeline stalls
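As a rough illustration of what "collecting runtime execution behavior" can mean in practice, the sketch below gathers two of the listed kinds of data, call-graph edges and branch probabilities, by instrumenting ordinary functions. All names (`profiled`, `record_branch`, `clip`, `process`) are hypothetical and chosen only for this example; a real dynamic compiler would gather such data inside the runtime or from hardware counters rather than through decorators.

```python
from collections import Counter
from functools import wraps

call_edges = Counter()     # (caller, callee) -> number of calls observed
branch_counts = Counter()  # (site, outcome) -> number of times observed
_call_stack = ["<root>"]   # names of the functions currently executing


def profiled(fn):
    """Record one call-graph edge every time fn is called."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_edges[(_call_stack[-1], fn.__name__)] += 1
        _call_stack.append(fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            _call_stack.pop()
    return wrapper


def record_branch(site, taken):
    """Count the outcome of a branch and pass the condition through."""
    branch_counts[(site, bool(taken))] += 1
    return taken


def branch_probability(site):
    """Fraction of executions in which the branch at `site` was taken."""
    taken = branch_counts[(site, True)]
    total = taken + branch_counts[(site, False)]
    return taken / total if total else None


@profiled
def clip(x, limit):
    if record_branch("clip:over", x > limit):
        return limit
    return x


@profiled
def process(samples, limit=10):
    return [clip(s, limit) for s in samples]


result = process([3, 12, 7, 25])
```

After the run, `call_edges` holds the observed call graph and `branch_probability("clip:over")` gives the measured probability that an optimizer could feed back into code generation.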
[Figure: tool-chain flow. Labels: High-Level Algorithms; Work with OMG; UML; UML with Data Flow; (Par.)CORBA; Common CASE & Data-Flow Machine Development IDE; 1-7 Transforms; Non-Optimized Low-Level Algorithms; Profile-Guided Optimizations; Feedback; Optimized Low-Level Algorithms]
Next Steps
• Application to IR formation, fusion, template matching
• Collect software productivity metrics on the above and on MITRE benchmarks
• Experiment with optimizing software transformed from UML (through data parallel CORBA or specialized data parallel compiler IDEs) for efficient embedded platforms
• Work with OMG on introducing data flow in a way that supports streaming, high-performance, data-flow distributed computers
• Examine the possibility of embedding dynamic profile optimization into the runtime system
• Work with CASE and IDE vendors to integrate model-based development for efficient streaming, high-performance, data-flow distributed computer targets
Trick is to:
1) Discover common patterns (SCE, PAS, Par. CORBA, …)
2) Feed this forward into standard OMG specs
3) Simplify our own software architectures/APIs
Action Semantics DataFlow
PAS Channel
[Figure: two class diagrams side by side. Action Semantics DataFlow classes: Action, Pin, InputPin, OutputPin, DataFlow; role names +action, +inputPin, +outputPin, +/availableInput, +/availableOutput, +source, +destination, +flow; multiplicities 0..1, 0..n, 1; constraint {ordered}. PAS Channel classes: Channel, ChannelSide, Src, Dst, BufferSet, GlobalDistribution, Distribution, Layout, Partition; attribute labels partitioningInfo, dimensions, piece, groupDims, size, memoryLayout, type, shape, processSet]
CORBA Sequence
Participants: aClient : SingularClient; anOrb : NonParallelOrb; anAdaptor : PortableObjectAdaptor; aServant : SingularServant; anApp : Application
Messages, in order:
1) create_instance( )
2) create_opaque_Reference
3) associate_servant_with_Reference
4) convey_Reference( )
5) connect(Reference)
6) create_params( )
7) check_location_and_interface( )
8) invoke(handle, ops, args)
9) GIOPSend(ops, key, args)
10) dispatch(ops, args)
Note: location, key and interface are part of this reference
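The message sequence on this slide can be walked through with a toy, in-process model. This is only a sketch under loud assumptions: it is not the CORBA API and involves no real ORB or GIOP transport. The class and method names echo the slide's labels, while the `Reference` fields, the `register` helper, the `"Adder"` interface and the `add` operation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Opaque reference carrying location, key and interface (per the slide note)."""
    location: str
    key: int
    interface: str


class SingularServant:
    def dispatch(self, ops, args):
        # Look up the requested operation by name and call it.
        return getattr(self, ops)(*args)

    def add(self, a, b):  # hypothetical application operation
        return a + b


class PortableObjectAdaptor:
    def __init__(self, location):
        self.location = location
        self._servants = {}
        self._next_key = 0

    def create_opaque_reference(self, interface):
        ref = Reference(self.location, self._next_key, interface)
        self._next_key += 1
        return ref

    def associate_servant_with_reference(self, servant, ref):
        self._servants[ref.key] = servant

    def giop_send(self, ops, key, args):
        # Stand-in for the wire protocol: hand the request to the servant.
        return self._servants[key].dispatch(ops, args)


class NonParallelOrb:
    def __init__(self):
        self._adaptors = {}
        self._handles = {}

    def register(self, adaptor):  # hypothetical helper, not on the slide
        self._adaptors[adaptor.location] = adaptor

    def check_location_and_interface(self, ref, interface):
        if ref.location not in self._adaptors or ref.interface != interface:
            raise LookupError("unknown location or mismatched interface")

    def connect(self, ref, interface):
        self.check_location_and_interface(ref, interface)
        handle = len(self._handles)
        self._handles[handle] = ref
        return handle

    def invoke(self, handle, ops, args):
        ref = self._handles[handle]
        return self._adaptors[ref.location].giop_send(ops, ref.key, args)


orb = NonParallelOrb()
adaptor = PortableObjectAdaptor("host-a")
orb.register(adaptor)
servant = SingularServant()                             # create_instance()
ref = adaptor.create_opaque_reference("Adder")          # create_opaque_Reference
adaptor.associate_servant_with_reference(servant, ref)  # associate_servant_with_Reference
handle = orb.connect(ref, "Adder")                      # convey_Reference(), connect(Reference)
ops, args = "add", (2, 3)                               # create_params()
result = orb.invoke(handle, ops, args)                  # invoke, GIOPSend, dispatch
```

In the data-parallel variant described next, the single servant is replaced by several part servants collected under one parallel object reference.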
Data-Parallel CORBA Sequence
Participants: PC : ParallelClient; PO : ParallelOrb; POC : ParallelObjectCreator; POF : PartObjectFactory; PPF : PartFactory; PPA : PartAdaptor; PS : PartServant; PSA : PartServerApp
Messages, in order:
1) create_part_instance( )
2) create_PartObjectReference( )
3) register_part(whole)
4) register_part(whole)
5) get_PartObjectFactory
6) create_ParallelObjectReference(whole), to collect parts of whole
7) convey_ParallelObjectReference
8) connect(ParallelObjectReference)
9) create_params(whole)
10) invoke(handle, ops, args)
11) GIOPSend(ops, args, ObjectKey)
12) dispatch(ops, args)
Notes:
• ParallelBehavior includes PartObjectReference
• ParallelObjectReference includes Location, ObjectKey and IRinterfaceID
• Red font signifies messages that differ from the typical CORBA sequence
Meta-Classes
[Figure: class diagram of meta-classes.
Classes: Object, ParallelObject, SingularObject, LocalObject, PartObject, PartObjectFactory, PartFactory, Application, ServerApp, ClientApp, PartServerApp, PortableObjectAdaptor, PartAdaptor, Servant, SingularServant, PartServant, Orb, NonParallelOrb, ParallelOrb, Client, SingularClient, ParallelClient, ParallelObjectCreator, Interface identifier, PartInterface, Invocation, SingularInvocation, ParallelInvocation, CollectiveInvocation, Corba::Current, ParallelCorba::Current.
Operations shown: create_opaque_Reference(); associate_servant_with_Reference(); GIOPSend(ops, ObjectKey, args); create_instance() : instance; dispatch(ops, args); create_PartObjectReference(); connect(Reference) : handle; check_location_and_interface(); invoke(handle, ops, args); convey_Reference(); create_part_instance() : instance; get_PartObjectFactory(); create_params() : ops, args; create_params(whole) : ops, args; convey_ParallelObjectReference(); connect(ParallelObjectReference) : handle.
Stereotypes: <<executes in the context of>>, <<initiates in the context of>>, <<embodies contents of>>.
Note: Only by parallel client]
Data Structures
[Figure: class diagram of reference data structures.
Classes: Reference, ParallelObjectReference, PartObjectReference, ParallelBehaviorProfile, ParallelRealizationProfile, ParallelProxyProfile, ParallelAgentProfile, ParallelRealization, PartReferenceProfile, StandardIIOPProfile, ParallelBehavior, OperationDescription, DataPartitioning, RequestDistribution, IRInterfaceID, Location (adaptorAddress : integer), ObjectKey.
Multiplicities: 1, 1..n, 0..n. Constraints: {is exactly}, {for every arg}]
Runtime Associations
[Figure: class diagram of runtime associations.
Classes and operations: PortableObjectAdaptor (create_opaque_Reference(), associate_servant_with_Reference(), GIOPSend(ops, ObjectKey, args)); Servant; Application (create_instance() : instance, dispatch(ops, args)); PartObjectFactory (register_part(whole), create_ParallelObjectReference(whole) : POR); PartAdaptor (create_PartObjectReference()); PartServant (create_part_instance() : instance); PartFactory; PartServerApp; Orb (connect(Reference) : handle, check_location_and_interface(), invoke(handle, ops, args)); Client (convey_Reference()); ParallelClient; ParallelOrb; ParallelObjectCreator (create_params(whole) : ops, args, convey_ParallelObjectReference(), connect(ParallelObjectReference) : handle, get_PartObjectFactory()).
Association labels: dispatch; reference create/associate; instance create; get parallel reference; part reference create; part instance create; register part; convey reference; invoke connection; GIOP send; parallel reference create.
Stereotype: <<colocate>>]
Control Flow
• Each step is taken when the previous one finishes …
• … regardless of whether inputs are available, accurate or complete (“pull”)
• Emphasis is on the order in which steps are taken
[Figure (not UML notation): Start; Weather Info; Analyze Weather Info; Chart Course; Cancel Trip]
Object/Data Flow
• Each step is taken when all the required input objects/data are available …
• … and only when all the inputs are available (“push”)
• Emphasis is on objects flowing between steps
[Figure (not UML notation): Design Product; Acquire Capital; Procure Materials; Build Subassembly 1; Build Subassembly 2; Final Assembly]
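The "push" rule above, where a step fires when and only when all of its inputs are available, can be sketched as a tiny scheduler. The step names follow the figure labels, but the edges between them are an assumption made for this example (the figure's arrows are not recoverable here), and `run_dataflow` is a hypothetical helper, not part of any UML tool.

```python
def run_dataflow(steps, initial=None):
    """Fire each step once, as soon as all of its named inputs are available.

    steps maps a step name to (list of input names, function of those inputs).
    Returns the firing order and the table of produced values.
    """
    available = dict(initial or {})
    pending = dict(steps)
    fired = []
    progress = True
    while pending and progress:
        progress = False
        for name in list(pending):
            inputs, fn = pending[name]
            if all(i in available for i in inputs):
                available[name] = fn(*(available[i] for i in inputs))
                fired.append(name)
                del pending[name]
                progress = True
    return fired, available


# Deliberately listed "backwards": firing order is driven by data, not by listing order.
steps = {
    "final_assembly": (["sub1", "sub2"], lambda a, b: f"product({a},{b})"),
    "sub2": (["materials"], lambda m: "sub2"),
    "sub1": (["materials"], lambda m: "sub1"),
    "materials": (["design", "capital"], lambda d, c: "materials"),
    "capital": ([], lambda: "capital"),
    "design": ([], lambda: "design"),
}

fired, values = run_dataflow(steps)

# Without capital, nothing downstream of "materials" may ever fire.
starved = {k: v for k, v in steps.items() if k != "capital"}
fired_starved, _ = run_dataflow(starved)
```

In the starved run only the design step fires, which is the contrast with control flow: a control-flow model would proceed to the next step regardless of whether its inputs had arrived.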
UML 2.0 Superstructure RFP Excerpt
Further, the way that objects and other data flow
between parts of a system is crucial to understanding
its architecture. The UML currently supports object/data
flow only at the lowest level of granularity {not even},
between the steps in an activity graph {as well as other
locations, in a contradictory way}. It is important
for architects to be able to model object and data flows
between entities at a higher level of granularity,
such as classifiers and packages {as well as many other
requirements coming up}.
{signifies my comments}
Why bring back data flow explicitly into UML?
• With parallel computation increasingly used to increase computation speeds, there is interest in linking streaming data flow machines with a matching modeling paradigm
• To bring back a data flow standard: developers have been building unique custom DFDs out of standard UML structures (patterns), and some CASE vendors have added data flow at the model & meta-model level
• To link/integrate existing DFD toolsets with UML toolsets & existing simulators, e.g. Ptolemy [Parks]
• Functional modeling (the only one of OMT's three models left out) fits both OO and non-OO modeling paradigms and can be united with other UML models [SD, DSH]
• Currently addressed piecemeal in UML (shown later), in ways that do not conform to the pre-existing modelers' view (OMT view) of data flow
Why bring back data flow explicitly into UML (cont)?
• The object model defines system components and the dynamic model (state machines) defines system control, but the functional model (data flow) defines what computations occur in a system & the functional dependencies between processes
• Need expressed in software process/workflow, defense, medical, wireless and digital video domains
• Example: When the response to the Action Semantics RFP was presented in OMG Plenary, the diagrams were not done in UML (they were in data flow); the reason given was that it would take too much space in UML
Why bring back data flow explicitly into UML (continued)?
• Different (yet related) underlying semantics than State Machines
  » "is-used-to-produce" relation
  » Can have a parent/child (state/substate) diagram that is consistent from the state machine point of view yet violates the data consistency model [TK]
  [Figure: process P1 with data flows d0:1 to d0:5, decomposed into subprocesses P1,1, P1,2 and P1,3]
  » Unique inheritance (decomposition) requirement
    – Example definition: Let P be a process and D a decomposition of P. D is consistent with P iff the I/O relationships that are 1) specified for P also hold for D and 2) not specified to hold for P do not hold for D [TK]
  » Relation to state machines: A trigger (t) of a control process "is-used-to-produce" a response (r) of the same process iff there is a transition in the STD that is triggered by t and responds with r [TK].
  » It is conceptually simpler for some applications: simply a digraph together with a binary precedence relation.
  » It is impossible to represent continuous flow, especially with feedback, in a State Machine because of the theoretically infinite number of states to represent. With data transforms this is a natural modeling view.
  » STDs are sequential within one machine, DFDs are not
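One possible reading of the [TK] consistency definition can be made mechanical. In the sketch below, an "is-used-to-produce" relation is a set of (input flow, output flow) pairs; the relation derived from a decomposition is taken to be reachability through the subprocesses' own relations, restricted to the parent's external flows. That derivation rule, the function names, and the flow names `a`, `b`, `m`, `x`, `y` are assumptions made for illustration, not the formalization given in [TK].

```python
def derived_io(sub_relations, inputs, outputs):
    """I/O pairs implied by a decomposition: output o is reachable from input i."""
    succ = {}
    for relation in sub_relations.values():
        for src, dst in relation:
            succ.setdefault(src, set()).add(dst)

    derived = set()
    for i in inputs:
        seen, stack = set(), [i]
        while stack:                      # depth-first walk over produced flows
            flow = stack.pop()
            for nxt in succ.get(flow, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        derived |= {(i, o) for o in outputs if o in seen}
    return derived


def is_consistent(inputs, outputs, parent_relation, sub_relations):
    """True iff the decomposition yields exactly the parent's I/O relationships."""
    return derived_io(sub_relations, inputs, outputs) == set(parent_relation)


inputs, outputs = {"a", "b"}, {"x", "y"}
parent = {("a", "x"), ("b", "y")}

# Decomposition that preserves the parent's relationships.
good = {"P1": {("a", "m")}, "P2": {("m", "x"), ("b", "y")}}

# Decomposition that adds a dependency (a is now used to produce y).
bad = {"P1": {("a", "m")}, "P2": {("m", "x"), ("m", "y"), ("b", "y")}}

good_ok = is_consistent(inputs, outputs, parent, good)
bad_ok = is_consistent(inputs, outputs, parent, bad)
```

Using set equality covers both halves of the definition at once: a missing pair violates clause 1, an extra pair violates clause 2.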
Interaction Diagrams (Sequence, Collaboration)
• Different (yet related) underlying semantics than Interaction Diagrams
  » Interaction diagrams are for interaction among objects
  » Cannot represent interaction at a lower level (among methods of different classes)
  » Cannot represent interaction among systems
Why are aspects of data flow not yet supported?
• Ambiguous order of input to processes
• Considered difficult to unite the semantics of data flow models with other OO models (other research has shown this to be false)
• Some DFDs allowed for control flow, and control flow is duplicated in many of the dynamic models
• Could be non-deterministic, since not all processes or data flows are necessarily used to produce the high-level process outputs
• No way to represent sequencing, iteration and conditionals
• Considered to be included already, but inconsistently and in multiple places
Where do UML experts think data flow either exists or would fit?
• UML Profile for Enterprise Distributed Object Computing
• Activity Diagrams
• Collaboration Diagrams
• Action Semantics (data flow is a model element that acts as a temporary data store between in and out pins)
• Data-parallel CORBA
• Using new (data flow) patterns of existing UML structures
• A UML ActivityGraph Profile for EDOC Task Model (ad/99-10-07)
• Object interaction diagram (I assume this option is merely seconding the Collaboration Diagrams suggestion)
Initial Requirements Collection
• Establish criteria for (automatable) checking of (internal/external) completeness (well-formed, well-connected, well-introduced, well-rooted) [TK], consistency [Kung], decomposition (inheritance), boundedness [Parks], determinacy [Parks, KM] and termination [Parks, KM].
• Elementary processes modeled, like Petri nets [Petri], with pre- and post-conditions describing the behavior of processes.
• Provide ability to express data dimensions and other data properties (in Class Diagram) and explicit linkage to these from Data Flow Diagram.
• Map to rest of UML: consistent with State Diagrams (events trigger "is-used-to-produce" relation), Action Semantics, EDOC [EDOC], Collaborations, Activity, Class Diagrams (generalization and process functional dependencies to associations) [Kung], Use Case Diagrams (actors) [Parks], RT UML (ports map to I/O specs) and Deployment Diagrams (see next 2 bullets).
• Provide ability to express parallelization along data dimensions and mapping to hardware resources in Deployment Diagram. Must express data distribution types (sequential or parallel) and subtypes (round robin, random even, random statistical, first available, etc.)
• Allow ability to specify that arrows in DFD are associated with (virtual) channels (see Virtual Interface Specification) in Deployment Diagram.
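Two of the distribution subtypes named above, round robin and first available, can be illustrated with a short sketch. The function names, the integer worker ids, and the per-item `costs` used to decide which worker frees up first are all assumptions for this example; the random subtypes are not sketched.

```python
import heapq
from itertools import cycle


def round_robin(items, n_workers):
    """Deal items to workers 0..n_workers-1 in strict rotation."""
    assignment = {w: [] for w in range(n_workers)}
    for item, worker in zip(items, cycle(range(n_workers))):
        assignment[worker].append(item)
    return assignment


def first_available(items, costs, n_workers):
    """Give each item to whichever worker becomes free earliest.

    costs[i] is the (assumed known) processing time of items[i].
    Ties are broken in favor of the lower worker id.
    """
    free_at = [(0, w) for w in range(n_workers)]  # (time worker is free, id)
    heapq.heapify(free_at)
    assignment = {w: [] for w in range(n_workers)}
    for item, cost in zip(items, costs):
        time_free, worker = heapq.heappop(free_at)
        assignment[worker].append(item)
        heapq.heappush(free_at, (time_free + cost, worker))
    return assignment


rr = round_robin(list(range(5)), 2)
fa = first_available(["a", "b", "c", "d"], [5, 1, 1, 1], 2)
```

With one slow item, round robin would keep alternating workers, while first available routes the three short items to the worker that is not busy with the long one.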
Initial Requirements Collection (cont)
• Non-side-effecting operations, or previously defined actions, are decomposed using functional models, and these are generally used at the aggregate level [SD].
• Aggregate objects are passed as an input parameter and returned as an output parameter, allowing a process to access any object (data stores, object classes, or associations) with the parameter [SD].
• Place all control flow info in a state machine (to solve 4.4 and 4.5) [SD].
• Provide for data store I/O not included in action semantics.
• Must be able to model partial objects (multiple partial partitions of data) described in Data Parallel CORBA Spec.
• Provide method to express process synchronization as something external to processes (as opposed to state machines, where this would be defined in a state) without knowledge of composition context.
Callout: Constraints to unify behavior, class & functional models
Initial Requirements Collection (cont) for Modeling Streaming Data
• Provide a uni-directional data streaming interface with data flow.
• Model structured (number of dimensions, extent in each dimension, packing order & element type) and unstructured global data (number of data sets, size of data) [DRI].
• Model object I/O requirements, e.g. support for structured/non-structured data, dimensions, element types and data partitioning specification (e.g. indivisible or block type and, for each dimension, maximum size, minimum number of required elements, modulo size, block length, left and right overlap specs, etc.) [DRI].
• Model data stream control, e.g. push and pull of data, QoS based on data control (e.g. rate & latency constraints), control data stream, control tagged data, etc. [DRI].
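To make the block-type partitioning parameters concrete, the sketch below splits one dimension into blocks of a given length and widens each block by left and right overlap, clipped to the data extent. It is a minimal illustration under assumed semantics: the function name and the half-open `(start, stop)` index convention are hypothetical, and the precise meaning of these parameters in [DRI] may differ.

```python
def block_partition(extent, block_length, left_overlap=0, right_overlap=0):
    """Split range(extent) into blocks, each extended by overlap on both sides.

    Returns a list of half-open (start, stop) index ranges, one per block.
    Overlap regions are clipped so they never fall outside [0, extent).
    """
    if block_length <= 0:
        raise ValueError("block_length must be positive")
    pieces = []
    for start in range(0, extent, block_length):
        stop = min(start + block_length, extent)
        pieces.append((max(start - left_overlap, 0),
                       min(stop + right_overlap, extent)))
    return pieces


plain = block_partition(10, 4)
overlapped = block_partition(10, 4, left_overlap=1, right_overlap=1)
```

Overlap of this kind is what a neighborhood operation such as a filter needs: each processor receives its own block plus the boundary elements owned by its neighbors.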
[Figure: data flow model elements. Labels: Name, Name 1, Name 2, Name 3 (each with Properties); Input Specifications; Output Specifications; Data Distribution (sub)Type; Global Data. Callouts: Associate I/O specs with port attributes; Need semantics to model data]
Existing Data Flow Semantic Models
• Petri Nets [Petri]
• Kung, et al [TK, Kung]
• Karp and Miller Computation Graphs [KM]
• Kahn Process Networks [Kahn]
• Parks Bounded Execution [Parks]
Completely different connection in action semantics, EDOC, Activity Diagrams, different CASE vendors
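Of the models listed, Kahn process networks are easy to sketch: sequential processes connected by FIFO channels, where a read blocks until data is present. The example below is a minimal sketch using Python threads and `queue.Queue`; the process names, the `DONE` end-of-stream marker and the three-stage topology are assumptions made for this illustration. The queues are unbounded here; keeping channel capacities bounded is the concern addressed in [Parks].

```python
import queue
import threading

DONE = object()  # end-of-stream marker (a convention of this sketch)


def producer(out, values):
    """Write each value to the output channel, then signal end of stream."""
    for v in values:
        out.put(v)
    out.put(DONE)


def scale(inp, out, factor):
    """Read tokens one at a time (blocking), multiply, and forward them."""
    while True:
        v = inp.get()  # blocks until a token is available
        if v is DONE:
            out.put(DONE)
            return
        out.put(v * factor)


def run_network(values, factor=2):
    a, b = queue.Queue(), queue.Queue()  # FIFO channels between processes
    threads = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:  # the caller acts as the final consumer process
        v = b.get()
        if v is DONE:
            break
        results.append(v)
    for t in threads:
        t.join()
    return results


results = run_network([1, 2, 3])
```

Because every process only performs blocking reads on FIFO channels, the output sequence does not depend on how the threads happen to be scheduled.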
References
[BS] D. Bhatt and J. Shackleton, A Design Notation and Toolset for High-Performance Embedded Systems Development, Lectures on Embedded Systems, LNCS 1494, Springer-Verlag, VIII, October 1998.
[DRI] Document for the DARPA Data Reorganization Effort, www.data-re.org, Feb 2000.
[EDOC] Cooperative Research Centre for Enterprise Distributed Systems Technology, UML Profile for Enterprise Distributed Object Computing, ad/99-10-07.
[Kahn] G. Kahn, The Semantics of a Simple Language for Parallel Programming, Info. Proc., pages 471-475, Stockholm, Aug. 1974.
[KM] R. M. Karp and R. E. Miller, Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing, SIAM Journal on Applied Mathematics, Vol. 14, No. 6, November 1966.
[Kung] C. H. Kung, Conceptual Modeling in the Context of Software Development, IEEE Transactions on Software Engineering, 15(10):1176-1187, Oct. 1989.
[Parks] T. M. Parks, Bounded Scheduling of Process Networks, Technical Report UCB/ERL-95-105, PhD Dissertation, EECS Department, University of California, Berkeley, CA, December 1995.
[Petri] C. A. Petri, Kommunikation mit Automaten, PhD dissertation, translation by C. F. Greene, Supplement 1 to Technical Report RADC-TR-65-337, Vol. 1, Rome Labs, Griffiss Air Force Base, NY, 1965.
[TK] Y. Tao and C. Kung, Formal Definition and Verification of Data Flow Diagrams, J. Systems Software, 16:29-36, 1991.
[SD] S. DeLoach, Formal Transformations from Graphically-Based Object-Oriented Representations to Theory-Based Specification, PhD thesis, Air Force Institute of Technology, Wright-Patterson AFB, OH, June 1996, AFIT/DS/ENG/96-05, AD-A310 608.