
Integer-Based Wavetable Synthesis for
Low-Computational Embedded Systems
Ben Wright and Somsak Sukittanon
University of Tennessee at Martin
Department of Engineering
Martin, TN USA
[email protected], [email protected]
Abstract— The evolution of digital music synthesis spans from
tried-and-true frequency modulation to string modeling with
neural networks. This project begins from a wavetable basis.
The audio waveform was modeled as a superposition of formant
sinusoids at various frequencies relative to the fundamental
frequency. Piano sounds were reverse-engineered to derive a
basis for the final formant structure, and used to create a
wavetable. The quality of reproduction hinges on a trade-off
between rich notes and sampling frequency. For low-computational systems, this calls for an approach that avoids
burdensome floating-point calculations. To speed up
calculations while preserving resolution, floating-point math was
avoided entirely; all numbers involved are integral. The method
was built into the Laser Piano. Along its 12-foot length are 24
“keys,” each consisting of a laser aligned with a photoresistive
sensor connected to an 8-bit MCU. The synthesis is processed in the
sensor array by four microcontrollers running a strictly
synchronous wavetable synthesis algorithm. Integrated into
the laser array is a microcontroller that can toggle each laser,
allowing the piano to play itself or limit the playable keys.
A. Background Review
Throughout the years, music synthesis has branched into
many different techniques. FM synthesis is classically
favored for its simplicity but is limited in signal realism. The
Karplus-Strong algorithm is favored for its rich harmonic
content and authenticity, particularly in reproducing the transient responses
of plucked and struck strings. In [1], Karplus-Strong
polynomials are used to define zeros to selectively cancel
harmonics from a traditional Karplus-Strong transfer
function. The time-domain convolution associated with this
method, however, requires more RAM than may be practical
for a low-computational system. In [2], the authors proposed
an impressive system to synthesize music by modeling the
vibration of a string based on Scattering Recurrent Networks
with very accurate results. To implement this system,
however, would still be far beyond the capacity of a typical
small embedded system. Techniques like these have been
combined in [3] by a control structure that dynamically
selects and combines synthesis techniques to benefit from the
advantages of each. This control structure could prove
invaluable to a system synthesizing a wide range of timbres
and pitches but would be unnecessary for a system with
sufficiently limited scope.
For embedded systems, wavetable synthesis offers an
attractive combination of realism and speed. In this paper, a
lightweight implementation of wavetable synthesis is
discussed. Section B will present the mathematics behind
the components of the wavetables. Part II will examine the
music theory used to derive them and the algorithm that ties
it all together. Part III will cover the applied results.
B. Mathematics and Music Theory
Many periodic signals can be represented by a
superposition of sinusoids, more conventionally called
Fourier series. Eigenfunction expansions, a superset of these,
can represent solutions to partial differential equations (PDE)
where Fourier series cannot. In these expansions, the
frequency of each wave (here called a formant) in the
superposition is a function of the properties of the string. This
is said merely to emphasize that the response of a plucked
string may need a model more complete than a Fourier series
to accurately capture its timbre as shown in equation (1).
∂²u/∂t² = c² ∂²u/∂x²,  u(x,0) = f(x),  ∂u/∂t(x,0) = g(x)   (1)
The function u(x,t) represents the displacement of a
vibrating string as a function of position along the string,
x, and time t. f(x) and g(x) represent the
initial displacement and speed, respectively. d'Alembert's
solution [4] to this PDE is given in the form
u(x,t) = Φ(x − ct) + Θ(x + ct) ,
which describes travelling waves with wavespeed c.
Φ and Θ describe travelling waves that move in different
directions at speed c. A solution of this PDE can also be
written as a superposition of eigenfunctions. Initial
conditions f(x) and g(x) are used, by an appeal to
orthogonality, to derive a pair of coefficients for
each eigenfunction. There exists one eigenfunction for each
eigenvalue λ. These eigenvalues need not be associated nor
even countable. Each eigenfunction is called a formant.
These formants are not as easily determined for a
generalized eigenfunction expansion as they might be for a
Fourier series. These formants are assumed to relate to the
fundamental frequency of each note in identical fashion. A
vibrating string, for example, should produce the same timbre
(i.e. quality of sound as described by its formant structure)
even as it is tuned higher or lower. It was assumed that the
relations of formants to the fundamental root would fit into
the model of a diatonic scale as described by Western music
theory. This limited the formants to the few most
significant ones, with the diatonic scale providing a skeleton
into which they fit. If not perfectly accurate, this assumption
proved a fair approximation, and simplified the enforcement
of periodicity. Adding formants at eigenvalue frequencies
extends the fundamental period of the superposition. The
wavetable is not easily truncated because it is important that
no discontinuities exist in it. However, the wavetable must be
small enough to fit in the limited RAM capacity of a small
embedded system.
Once the pattern of formant relationships and gains was
recognized, the waveform was reconstructed in MATLAB (as
shown by the code in Fig. 4a, and plotted with the ADSR
envelope in Fig. 1) using a superposition of waves with a
similar mapping of frequencies and amplitudes. Multiple
MATLAB files were written to simulate on-chip synthesis as
closely as possible. Every note was sampled
from an array of 256 8-bit numbers and amplitude-modulated
according to the ADSR (Attack, Decay, Sustain, Release)
envelope as illustrated in Fig. 2.
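The per-sample operation just described, an 8-bit wavetable value scaled by an 8-bit ADSR gain using only integer math, can be sketched in C. The function names and table layout here are assumptions for illustration, not the authors' exact code:

```c
#include <stdint.h>

/* Scale an 8-bit sample by an 8-bit gain: the 8x8-bit product fits
 * in 16 bits, and >> 8 rescales so a gain of 255 is near unity. */
static uint8_t scale8(uint8_t sample, uint8_t gain)
{
    return (uint8_t)(((uint16_t)sample * gain) >> 8);
}

/* Sample one note: wavetable[] and adsr[] stand in for the 256-entry
 * 8-bit tables described in the text. */
uint8_t sample_note(const uint8_t wavetable[256], const uint8_t adsr[256],
                    uint8_t wave_idx, uint8_t adsr_idx)
{
    return scale8(wavetable[wave_idx], adsr[adsr_idx]);
}
```

Because every operation is an 8- or 16-bit integer multiply and shift, the cost per note per sample is fixed and small.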
A. Waveform Construction
How to produce a desired timbre could be a subject of
considerably deep inquiry. Applying an idealized partial
differential equation to such a pursuit would be difficult
enough, yet still easier than modeling a string with
considerations for its manifold non-uniformities. The
limitations of this implementation make work this precise
unnecessary. For this design, spectral analyses of professionally
recorded piano notes were studied as a first step toward
reverse-engineering the piano sound.
To lessen RAM consumption on-chip, a wavetable only
large enough to capture the note’s fundamental frequency
was used. To maintain periodicity, formants were chosen that
satisfied periodicity within this window, most notably the
note’s perfect fifth and compound major third. This way there
are no discontinuities in the final wave.
Fig. 2. A set of notes plotted in MATLAB to illustrate the effect of the
amplitude modulation on the repetitive waveshape. Each enlarged section
shows the signal within 50-ms-wide cuts.
The amplitude modulation proved to be more important to
the final sound than was initially expected. The use of an
ADSR envelope to modulate the amplitude of the waveform
provided the striking attack characteristic of a piano note.
Intuitively speaking, this envelope could be just as important
to other instruments, particularly to drums and woodwinds.
B. Firmware Implementation
A program was written in C, using Codevision C compiler
[5], to synthesize a set of simultaneous piano notes on an
embedded system. A timer interrupt function was used to
synchronize the wavetable sampling. The code controlling
amplitude modulation was written inside an interruptible
loop. The result, a superposition of all notes, was output via
a byte register to a DAC, from which an analog signal was
filtered and sent straight to speakers. The flowchart is shown
in Fig. 3b.
Fig. 1. A MATLAB plot of the waveform and ADSR envelopes. The
waveform shapes the timbre of the notes and the ADSR further controls the
amplitude of each note to realistically depict its attack and decay.
A timer built into the microcontroller increments a byte
register every 32 clock cycles. Each time this register
overflows, an interrupt subroutine is called and the counter is
reinitialized to tune the frequency of such calls. Every timer
interrupt, an 8-bit number for each note is sampled from an
index in the wavetable, then amplitude-modulated according
to its position in the ADSR table. The sum of these numbers
is normalized and output to the DAC.
Fig. 3. (a) A system diagram of the piano hardware, (b) Flowchart describing the piano synthesis algorithm. Two counters increment each interrupt to
synchronize ADSR timings for both damped and undamped notes.
These indices are incremented every so often; those indices
controlling position in the wavetable are incremented
according to the frequency of each note. The indices for
higher notes, then, step through the wavetable faster than for
lower notes. The index controlling a note's position in the
ADSR table is incremented slowly enough to happen within
the interruptible loop.
Critical to this program’s success was the tuning of the
notes. The amount by which to increment each index can get
muddled by 8-bit precision, since these increments become
small for lower-frequency notes and higher sampling rates.
Floating-point math, on the other hand,
proved to process very slowly. Instead, a 16-bit unsigned
integer was used to represent each index, but scaled up by
256. This yielded tuning more than precise enough for this
application while demanding much less CPU time than
floating-point math.
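This scaled-integer tuning can be sketched as an 8.8 fixed-point phase accumulator: the high byte of a 16-bit value is the wavetable index, the low byte a fraction. The 256-entry table and roughly 10 kHz sampling rate come from the text; the helper names are assumptions, and in firmware the increments would be precomputed constants rather than computed from a double at run time:

```c
#include <stdint.h>

#define TABLE_LEN   256u
#define SAMPLE_RATE 10000u

/* Per-interrupt increment for a note of frequency f_hz, scaled by 256.
 * The double is only used here, offline; the firmware stores the result. */
static uint16_t phase_increment(double f_hz)
{
    return (uint16_t)(f_hz * TABLE_LEN * 256.0 / SAMPLE_RATE + 0.5);
}

/* Advance one note's phase and return its wavetable index.
 * 16-bit overflow wraps modulo 65536 = 256 entries x 256 scale,
 * so the table index wraps around automatically. */
static uint8_t next_index(uint16_t *phase, uint16_t inc)
{
    *phase += inc;
    return (uint8_t)(*phase >> 8);  /* high byte is the table index */
}
```

For A4 = 440 Hz this gives an increment of 2884, i.e. about 11.27 table entries per sample, with the fractional part carried forward rather than lost, which is where the tuning precision comes from.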
The indices for the ADSR table, in contrast, were fairly
simple to process. These indices, also 8-bit, were
incremented by one every so often according to the length of
the note, rather than by a certain amount every timer
interrupt. The increment timings were scaled down from the
interrupt frequency by incrementing a counter in the
interrupt function that the interruptible amplitude-modulating
code would check. Also within this loop, checks are made
concerning user input and which notes should start and stop.
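The tick-scaling scheme just described can be sketched as follows; the counter name and divider value are assumptions for illustration, not the authors' firmware:

```c
#include <stdint.h>

#define ADSR_DIV 64u  /* interrupts per ADSR step; illustrative value */

static volatile uint8_t tick_count;  /* written only by the ISR */

/* Called from the timer interrupt: constant time, no branches. */
static void on_timer_interrupt(void) { tick_count++; }

/* Called from the interruptible main loop; returns 1 when the
 * ADSR index should advance. Branching is fine here because this
 * code runs outside the interrupt. */
static uint8_t adsr_step_due(void)
{
    if (tick_count >= ADSR_DIV) {
        tick_count -= ADSR_DIV;  /* keep any remainder */
        return 1;
    }
    return 0;
}
```

Keeping the interrupt to a bare counter increment preserves its fixed cycle cost, while the slower ADSR bookkeeping absorbs any jitter in the main loop.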
Achieving the best performance means striving for the highest
sampling frequency while leaving enough CPU time to
handle amplitude modulation and sample user inputs.
Traversing the ADSR table is not as important to the user’s
ear as synchronously traversing the wavetable. Output from
the wavetable must be synchronous. This is why the
amplitude modulation is interruptible but wavetable
sampling is not. In addition, every interrupt must consume
the exact same amount of CPU time to remain synchronous
and consistent. For this reason, the use of control statements
was avoided within the timer interrupt code. The use of
Boolean numbers in formulae allowed the complete
avoidance of if conditions. In this way, a logical 0 or 1 can
be used as a coefficient just like a numeric 0 or 1. For
example, a number to be incremented if a condition is true
may always be incremented without an if condition, even
when that means the number is incremented by zero.
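The branchless pattern reads, in C, as follows; in C a comparison or logical expression evaluates to 0 or 1, which can serve directly as a coefficient so every path through the interrupt costs the same number of cycles (a sketch; the names are illustrative):

```c
#include <stdint.h>

/* Conditionally advance an index with no branch: equivalent to
 * "if (active) index += step;" but constant-time, since the
 * increment is simply multiplied by the 0-or-1 condition. */
static uint8_t step_if(uint8_t index, uint8_t active, uint8_t step)
{
    return (uint8_t)(index + active * step);
}
```

A caller can feed any condition in directly, e.g. `idx = step_if(idx, (uint8_t)(count < limit), 1);`, and the index is simply "incremented by zero" when the condition is false.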
The final product was a laser piano synthesizer. Along its
12-foot length, an array of lasers (each 5 mW 650 nm)
shoots along the floor into an array of photoresistive sensors,
one for each of its 24 fully independent notes, as Fig. 3a
shows. Full 24-note polyphony from C4 to B5 is processed
by four microprocessors (AVR ATMega644), each
governing its own range of 6 notes. Overclocked with 27
MHz piezoelectric crystals, each chip does its job
consistently at a sampling rate just above 10 kHz. The
synthesis algorithm uses two separate ADSR timings. A note
is stepped through the ADSR more slowly as long as the user
input corresponding to that note stays present. This way a
laser blockage that remains causes a note to sustain longer
like holding a key down on a real piano. Each sampling
period, the amplitude of the superposition of synthesized
notes is output to an 8-bit DAC (DAC-08CN).
Integrated into the laser array is a microcontroller
(ATMega32) that can toggle each laser. Programmed with
the Harry Potter theme, the Imperial March, and Für Elise,
the piano can effectively play itself. This microcontroller is
also programmed with a set of scales to make playing the
piano easier. Because this microcontroller could not source
enough current to all 24 lasers, they are instead driven by
Darlington pairs that the MCU controls.
The sensors were constructed using photoresistors
aligned with plastic tubes and wired in series with a static
resistor. Each photoresistor has high resistance when
darkened and lesser resistance when light is shone upon it.
The sensor circuit uses voltage division to sense the
difference between a laser shining and a blockage. The series
resistor was selected to maximize the difference between the
maximum and minimum voltages across the photoresistor. It
can be shown that the optimal series resistance is the
geometric mean of the highest and lowest resistances across
the photoresistor. The low and high voltages were reliably
distinguishable as TTL logic so the sensors could be wired
directly to tri-state pins on the chip.
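The geometric-mean claim above follows from maximizing the divider's voltage swing. With V_cc the supply, R_s the series resistor, and R_L, R_H the lit and darkened photoresistor values, a brief derivation:

```latex
V(R) = V_{cc}\,\frac{R}{R + R_s},
\qquad
\Delta V(R_s) = V_{cc}\!\left(\frac{R_H}{R_H + R_s} - \frac{R_L}{R_L + R_s}\right)

\frac{d\,\Delta V}{dR_s}
 = V_{cc}\!\left(\frac{R_L}{(R_L + R_s)^2} - \frac{R_H}{(R_H + R_s)^2}\right) = 0
\;\Longrightarrow\;
R_L (R_H + R_s)^2 = R_H (R_L + R_s)^2
\;\Longrightarrow\;
R_s = \sqrt{R_L R_H}
```

Expanding the cross-multiplied equation cancels the mixed terms and leaves R_s² = R_L R_H, the geometric mean.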
The DAC selected outputs a current signal, so all the
DAC output signals are added simply by wiring them to a
common node. This current signal from the DACs was
filtered by a parallel RC filter with cutoff frequency 1500 Hz
meant to smooth the discontinuities in the DAC output
signal. Finally, a series capacitor was added to normalize the
voltage output to the speaker and eliminate popping.
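As a sanity check on the reconstruction filter: for a first-order RC low-pass, f_c = 1/(2πRC). The 1500 Hz cutoff comes from the text; the R and C values below are one illustrative pair that realizes it, not necessarily the authors' actual parts:

```c
#include <math.h>

/* First-order RC low-pass cutoff frequency in Hz. */
static double rc_cutoff_hz(double r_ohms, double c_farads)
{
    return 1.0 / (2.0 * 3.14159265358979323846 * r_ohms * c_farads);
}
```

For example, R = 10 kΩ with C ≈ 10.6 nF gives f_c ≈ 1.5 kHz, well below the ~5 kHz image frequencies of a 10 kHz DAC update rate but above the C4-B5 fundamentals being synthesized.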
Even quite complicated timbres can be reproduced with
wavetable synthesis. On systems with more computational
power, more complicated string modeling techniques can
better be appreciated and produce higher quality sound.
There is much left to do in modeling the responses of musical
instruments for the purpose of synthesizing them. There may
be formants at frequencies lower than the fundamental
frequency of the note or outside diatonic scales. When
formants are not as limited, it can quickly become important
to sample from a larger waveform envelope to ensure
periodicity of the output signal. It is important to implement
these with an algorithm that keeps the wavetable output
synchronous while holding ADSR timings steady.
Using integer calculations wherever possible may sacrifice
some precision, but yields considerable savings in CPU time.
Fig. 4b shows the envelopes initialized as 8-bit integral types
(unsigned char in C). These methods proved effective in an
implementation on an 8-bit embedded platform. Similar
techniques may be effective on larger systems. Another step
to make such an algorithm more effective could be to add
transient effects, e.g. for synthesizing the music of an
acoustic guitar, the “scratch” when a string is plucked. At
some point, however, it may become more efficient to
pursue a different algorithm.
Acknowledgment
The authors would like to thank Robert Reeves for his
discussion and help in this work.
References
[1] I. A. Cummings, R. Venugopal, J. Ahmed, and D. S. Bernstein,
“Generalizations of the Karplus-Strong Transfer Function for
Digital Music Sound Synthesis,” in IEEE Proceedings of the
American Control Conferences, 1999, pp. 2210-2214.
[2] A. W. Y. Su and L. San-Fu, “Synthesis of Plucked-String Tones
by Physical Modeling With Recurrent Neural Networks,” in IEEE
Workshop on Multimedia Signal Processing, 1997, pp. 71-76.
[3] S. D. Trautmann and N. M. Cheung, "Wavetable Synthesis for
Multimedia and Beyond,” in IEEE Workshop on Multimedia Signal
Processing, 1997, pp. 89-94.
[4] D. L. Powers, “The Wave Equation,” in Boundary Value
Problems and Partial Differential Equations, 6th ed., Academic
Press, 2009, pp. 229-231.
Fig. 4. (a) Code snippet depicting waveform construction in MATLAB, (b) envelope initialization in C generated by MATLAB for the synthesizer firmware.