Install and optimize the DISSCO system to run on the SDSC Comet

XSEDE Oct-Dec, 2016, Quarterly Report
Project Title: DISSCO, a Digital Instrument for Sound Synthesis and Composition
PI Name: Professor Sever Tipei, The University of Illinois at Urbana-Champaign
Allocation End: 6/13/2017
Project Team: Alan Craig, Paul Rodriguez, Robert Sinkovits
Objectives:
Overall
Install and optimize the DISSCO system to run on the SDSC Comet system and parallelize the
code so that, ideally, it runs at least in real time (the computation time does not exceed the
duration of the performance), depending on the complexity of the underlying data / composition.
The objective in the workplan for this quarter was:
1) Determine the feasibility, data design, and implementation strategy for MPI-level
parallelization. Understand the overall code logic and thread-level parallel logic; identify all
coding changes or enhancements, potential data structure conflicts, and communication
requirements for some initial small MPI tests.
Overall Update:
An extension was applied for and granted, extending the allocation to 6/13/2017.
Rodriguez continued evaluating thread-level information flow. Working with the PI, he tested
several approaches to restructuring the code so that the processing is separated into parts
amenable to an MPI implementation. The goal is to minimize changes to the program's data
structures and code while still arriving at an efficient MPI implementation.
Reviewing Code and Determining an Implementation Strategy for MPI Parallelization.
The DISSCO program is organized around several classes (and corresponding source libraries) that
separate the main musical (data) organization and I/O from sound synthesis and from supporting signal
processing routines. The main program (CMOD) sets up a musical “piece,” the parameters for all
sound events, and the hierarchical organization of sound events. CMOD interfaces with the sound
synthesis classes (LASS) by creating a “score” and passing sounds to the score class. In turn, the score
methods render the actual sounds.
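As a rough illustration of this flow, the sketch below mimics a driver handing sounds to a score
object and then asking it to render the composite waveform. The class and method names (Score,
Sound, add, render, renderOne) are simplified placeholders chosen for clarity, not the actual
DISSCO/LASS interfaces.

    // Illustrative sketch only: names approximate the CMOD -> LASS hand-off
    // described above; they are not the actual DISSCO classes or methods.
    #include <vector>

    struct Sound {
        double startTime;   // when the sound enters the piece (seconds)
        double duration;    // length of the sound (seconds)
        // envelopes, partials, and other synthesis parameters would live here
    };

    class Score {
    public:
        void add(const Sound& s) { sounds_.push_back(s); }

        // Render every sound and mix it into one composite waveform.
        std::vector<float> render(int sampleRate) const {
            std::vector<float> mix;   // composite wave, grows as needed
            for (const Sound& s : sounds_) {
                // renderOne() stands in for the per-sound synthesis done in LASS
                std::vector<float> samples = renderOne(s, sampleRate);
                std::size_t offset =
                    static_cast<std::size_t>(s.startTime * sampleRate);
                if (mix.size() < offset + samples.size())
                    mix.resize(offset + samples.size(), 0.0f);
                for (std::size_t i = 0; i < samples.size(); ++i)
                    mix[offset + i] += samples[i];  // amplitudes sum at the right time
            }
            return mix;
        }

    private:
        static std::vector<float> renderOne(const Sound& s, int sampleRate) {
            // Placeholder: real synthesis happens inside the LASS sound classes.
            return std::vector<float>(
                static_cast<std::size_t>(s.duration * sampleRate), 0.0f);
        }
        std::vector<Sound> sounds_;
    };

    int main() {
        Score score;
        score.add({0.0, 1.5});   // a sound starting at t = 0 s, lasting 1.5 s
        score.add({1.0, 2.0});   // an overlapping sound starting at t = 1 s
        std::vector<float> wave = score.render(44100);
        return wave.empty() ? 1 : 0;
    }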
Currently the score class uses pthreads to process the individual sound renditions. In the first phase of
the workplan we benchmarked performance and found simple ways to improve the pthread waiting and
parallelization through better waiting on the shared-memory queues. Another bottleneck we found is
that the process that composes sounds runs in its own thread and also accesses the queue. We were
previously able to improve performance by 30-50% simply by changing how the composition thread
waited for the memory queue to unlock. In this quarter, we also attempted to improve performance by
adding another thread for composing sounds. However, after several attempts, there was always a
segmentation fault due to memory problems with the resulting (non shared-memory) score track data
structures, likely because of incorrect time parameters for combining sounds. In other words, the
program makes assumptions about the current data structures within a score class, and adding a second
composition thread would entail unraveling all those assumptions.
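The kind of change behind the 30-50% gain can be shown with a minimal producer/consumer sketch:
instead of repeatedly locking the mutex to poll the shared queue, the rendering thread blocks on a
condition variable until the composition side signals that work is available. The queue contents and
function names here are illustrative only, not the actual LASS score/track code.

    // Minimal sketch: replace polling on a shared-memory queue with a
    // condition-variable wait. Names are illustrative, not the LASS code.
    #include <pthread.h>
    #include <queue>

    static std::queue<int> workQueue;    // stands in for queued sound jobs
    static pthread_mutex_t qLock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qNotEmpty = PTHREAD_COND_INITIALIZER;
    static bool done = false;

    void enqueueSound(int soundId) {     // producer (composition) side
        pthread_mutex_lock(&qLock);
        workQueue.push(soundId);
        pthread_cond_signal(&qNotEmpty); // wake one waiting renderer
        pthread_mutex_unlock(&qLock);
    }

    void* renderWorker(void*) {          // consumer (rendering) side
        for (;;) {
            pthread_mutex_lock(&qLock);
            while (workQueue.empty() && !done)
                pthread_cond_wait(&qNotEmpty, &qLock); // sleep instead of spinning
            if (workQueue.empty() && done) {
                pthread_mutex_unlock(&qLock);
                break;
            }
            int soundId = workQueue.front();
            workQueue.pop();
            pthread_mutex_unlock(&qLock);
            // ... render soundId and mix it into the shared score buffers ...
            (void)soundId;
        }
        return nullptr;
    }

    int main() {
        pthread_t worker;
        pthread_create(&worker, nullptr, renderWorker, nullptr);
        for (int i = 0; i < 4; ++i)
            enqueueSound(i);             // hand a few sounds to the renderer
        pthread_mutex_lock(&qLock);
        done = true;                     // no more sounds for this run
        pthread_cond_broadcast(&qNotEmpty);
        pthread_mutex_unlock(&qLock);
        pthread_join(worker, nullptr);
        return 0;
    }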
Rather than get stuck on the pthread logic, we started looking at MPI solutions. For an MPI
implementation there are two evident approaches to breaking apart the processing, each with pros and cons:
(1) break the score into parts that can run across different compute nodes for hybrid
MPI/pthread execution
(2) replace the pthread multicore parallelization with MPI-only parallelization
1. Breaking the score into parts: hybrid MPI/pthread strategy.
The score class renders individual sound events and builds up a composite sound wave in which
timing and amplitudes are coordinated. The advantage of breaking up the score is that the
individual sound processes are already distributed among threads, so one could use a hybrid
MPI/pthread parallelization, where MPI provides coarse-grained parallelization and pthreads provide
fine-grained parallelization. The disadvantage is that many composition elements are
implemented at the fine-grained level of sound elements, so the MPI implementation would need to
access those elements and have correct information (such as timing and amplitude).
In some coding experiments, Rodriguez was able to split the score into two parts according to the
number of sounds to be synthesized and show that it is possible to recompose them into a single
output, a kind of serial MPI version. For this to work, he had to expose several methods within the
score, track, and sound classes so that the higher-level ‘events’ class could access the composition
methods (see Figures 1-3 below). The figures below are shown using the Audacity audio editing program.
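A minimal sketch of this hybrid idea is shown below, under the assumption that compositing reduces
to summing zero-padded, full-length sample buffers; renderMyShare() and the buffer size are
hypothetical stand-ins for the rank-local score/track logic, and each rank could still use pthreads
internally.

    // Sketch: each MPI rank renders its share of the sounds into a full-length
    // partial waveform, and the partials are summed onto rank 0.
    // renderMyShare() and the sizes below are hypothetical placeholders.
    #include <mpi.h>
    #include <vector>

    std::vector<float> renderMyShare(int rank, int nranks, int totalSamples) {
        // Placeholder: render only the sounds assigned to this rank into a
        // zero-padded buffer so timing offsets stay correct when summed.
        (void)rank; (void)nranks;
        return std::vector<float>(totalSamples, 0.0f);
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nranks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const int totalSamples = 44100 * 60;   // e.g., one minute at 44.1 kHz
        std::vector<float> partial = renderMyShare(rank, nranks, totalSamples);
        std::vector<float> composite(rank == 0 ? totalSamples : 0);

        // Element-wise sum of all partial waveforms lands on rank 0, which can
        // then write the final sound file just as the serial score code would.
        MPI_Reduce(partial.data(), rank == 0 ? composite.data() : nullptr,
                   totalSamples, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }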
2. Replace pthread execution with MPI parallelization.
The current score methods distribute individual sound elements to separate cores using a shared-memory
scheme and asynchronous processing through mutually exclusive locking/waiting. This
could be re-programmed as an MPI parallelization, which has the advantage that the overall
coarse-grained program flow and logic would remain intact. The disadvantage is that the fine-grained
processing, at the level of sound rendering and composition, would require a large number of
programming changes to work with MPI communication instead of a shared-memory scheme (of
implicit communication). The difficulty is that sound elements are processed through other classes
within the score that handle multi-track data, tracks, and sound wave samples, and one has to be
careful to communicate all the information needed to composite the final sound waves correctly.
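To make the contrast concrete, the sketch below shows one way the shared-memory queue could be
mirrored with message passing: rank 0 plays the role of the composition thread, handing out sound
indices on demand and mixing the returned samples, while worker ranks replace the pthread
renderers. The tags, fixed buffer size, and renderSound() helper are hypothetical placeholders, not
existing DISSCO code.

    // Sketch of an MPI-only replacement for the shared-memory queue: rank 0
    // hands out sound indices, workers send back rendered sample buffers.
    // Tags, buffer sizes, and renderSound() are hypothetical placeholders.
    #include <mpi.h>
    #include <vector>

    enum { TAG_WORK = 1, TAG_STOP = 2, TAG_RESULT = 3 };
    const int SAMPLES_PER_SOUND = 44100;               // assumed chunk size

    std::vector<float> renderSound(int /*soundId*/) {  // stands in for LASS rendering
        return std::vector<float>(SAMPLES_PER_SOUND, 0.0f);
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        const int numSounds = 16;                      // would come from the piece

        if (rank == 0) {                               // "composition" side
            std::vector<float> chunk(SAMPLES_PER_SOUND);
            int nextSound = 0, outstanding = 0;
            for (int w = 1; w < nranks; ++w) {         // prime each worker once
                if (nextSound < numSounds) {
                    MPI_Send(&nextSound, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                    ++nextSound; ++outstanding;
                } else {
                    MPI_Send(&nextSound, 0, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
                }
            }
            while (outstanding > 0) {                  // collect results, refill workers
                MPI_Status st;
                MPI_Recv(chunk.data(), SAMPLES_PER_SOUND, MPI_FLOAT, MPI_ANY_SOURCE,
                         TAG_RESULT, MPI_COMM_WORLD, &st);
                --outstanding;
                // ... mix 'chunk' into the composite waveform at its time offset ...
                if (nextSound < numSounds) {
                    MPI_Send(&nextSound, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    ++nextSound; ++outstanding;
                } else {
                    MPI_Send(&nextSound, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                }
            }
        } else {                                       // worker loop replaces a pthread
            for (;;) {
                int soundId; MPI_Status st;
                MPI_Recv(&soundId, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                std::vector<float> samples = renderSound(soundId);
                MPI_Send(samples.data(), SAMPLES_PER_SOUND, MPI_FLOAT, 0, TAG_RESULT,
                         MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

Even in this simplified form, every sound requires explicit messages for its parameters and its
samples, which illustrates the communication burden described above.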
We are planning to set up coding experiments to explore the feasibility of, and the trade-offs associated with, option 1.
Efforts:
(indicate percentage worked in the last quarter, October-December 2016)
a. Paul Rodriguez – (25%)
b. Sinkovits - (0%)
c. Alan Craig - foster communication, and interface with other XSEDE staff (5%)
Kudos Received:
Upon hearing of the real-time usage achieved so far, the PI commented:
Additional Questions for Consideration:
* Is this project ready to highlight in a symposium talk? – No
* Is there any advanced topic documentation I need to prepare? – No
* If the PI was satisfied, have I asked for a quote regarding the value of the collaboration? – I haven’t
solicited a quote.
* Would others benefit from a tutorial (asynchronous or other) on the techniques I used in this project?
– not sure yet.
* Should I present about this work at a domain conference (paper, poster)? – not sure yet.
* Is the work worthy of a paper, either in collaboration with the PI or individually? – not sure yet.
* Are there new user requirements that have resulted from this work? Would a change in XSEDE's
offerings make this project much easier? If so, feed info to UREP - no.
* Is it a good time for Nancy and Ralph to check in with the PI? – no, not yet.