Piano Transcription using
Wavelet Decomposition and Neural Networks
Esben Madsen, Johnni Thomsen Pedersen and Louise Baldus Vestergaard
Group 742, Supervisor: Søren Krarup Olesen
Abstract
This paper examines the possibility of transcribing the notes played on a piano using simple feature extraction and neural networks. Earlier transcription systems using neural networks have used considerably more complex algorithms to handle polyphonic music. We have implemented the Daubechies D4 wavelet decomposition in ANSI C for feature extraction and suggested a feed-forward neural network structure for note detection. 88 networks were constructed, one for each note on a piano. The 88 networks were trained with a reduced set of training data: each note of one specific piano. The results were inconclusive; the overall performance is not satisfactory, perhaps because of too little training data or a non-optimal wavelet decomposition. From the results we find it acceptable to conclude that it is possible to use wavelet decomposition as a feature extraction tool for a neural network, but the amount of training data must be greater and other wavelet decompositions should be examined.
Keywords: Piano transcription, wavelet decomposition, neural networks.

1 Introduction
Various solutions have been proposed for creating an automatic transcription system for music composers and musicians in general [1]. As there are many different genres of music and instruments, most of the previous transcription systems have focused on either classification of instruments [2], detection of notes [3], detection of the rhythm played [4], or a combination of these [5]. As a full transcription system is out of scope for this research, the aim has been limited to transcription of one specific piano, with focus on feature extraction.

1.1 Note Characteristics
A piano note consists of a fundamental frequency, F0, and overtones. The overtone partial frequencies are slightly inharmonic [1, p. 203]. The piano is a pitched instrument [1, p. 167], and feature extraction of the piano notes should be solved by some kind of pitch detection. Since piano sounds consist of several frequency components, pitch detection should either aim to find the fundamental frequency played, or to extract the significant features of the note. The notes of a piano span a little more than seven octaves, hence the frequency contents of the separate tones and overtones might overlap when notes are played simultaneously. For a transcription to be useful for musicians, it should be represented on a note sheet, hence the time at which each note is played is important too.

1.2 Time vs Frequency Analysis
Music can be interpreted in the time and/or frequency domain. In the time domain, music is recorded and played. Detection of the notes present could be achieved with cross correlation methods. In the frequency domain, music can be represented and understood by e.g. the Fourier Transform (FT) [1, p. 21]. The FT reveals the frequency contents of a signal, but it is only well defined for infinite length, stationary, continuous sine waves. This is not fully useful for music transcription, since the signal processed is neither infinite nor stationary [1]. Instead a time-frequency representation should be chosen.

2 System Description
The constructed transcription system can basically be characterized as a pattern recognition system, and thus a neural network as used in this system is one of several options.
The transcription system converts a piece of sampled piano music to a representation of notes on a time scale. The system was divided into the blocks shown in figure 1.
FIGURE 1: Block diagram of the system
The two most important blocks were the feature extraction and the note recognition, and they received the most attention in this paper.
The feature extraction block had the purpose of compressing the massive amount of information in the sound data into as few meaningful features as possible, which could then be analyzed further. This block was implemented as a wavelet decomposition, described in section 4.
The note recognition block was used for analyzing the features extracted from the signal. This was implemented with neural networks, and the implementation is described in section 5.
The decision/discrimination block was meant to give a reliable binary result based on the note recognition. This was done simply with a threshold on the output of the neural networks.
Presentation of the output could be done in various ways. For a fully working system, presentation on a note sheet and/or as a MIDI file would be desired, but for this research, the presentation of data was done simply by viewing the output from the decision/discrimination block.

3 Method overview
In the following the term “pitch” is used. The definition of pitch in this paper is the same as used in the MIDI protocol. Most musicians would probably refer to the 88 keys on a piano as A0 to C8, with A4 having a fundamental frequency of 440 Hz. Thus A0 would have a fundamental frequency of 27.5 Hz. Since music is perceived dyadically, the fundamental frequencies of the notes are unevenly spaced. It is often practical to have an expression for these with even spacing. The MIDI definition is:
\[ \mathrm{Pitch} = 69 + 12 \cdot \log_2\left(\frac{f}{440}\right) \tag{1} \]
A4 now corresponds to pitch 69. The lowest note on the piano is pitch 21, the highest is pitch 108.
As preprocessing, the music is split into blocks of 4096 samples, corresponding to a little under 1/10 of a second, since the sampling frequency is 44.1 kHz. The processing of a sound block is illustrated in figure 2, following these steps:
1. A block of 4096 samples of the wave file is picked out.
2. The block is processed using the implemented wavelet transform, which is described in section 4.
3. 112 predefined coefficients of the wavelet decomposition are extracted.
4. The coefficients are fed to each of the 88 neural networks, each network charged with the detection of a specific note.
5. Each neural network then presents a value on its output. The value indicates how likely it is that the note, which this specific network was trained to recognize, was present in the block.
FIGURE 2: Data flow through the system
The array of neural networks will enable detection of polyphonic piano music.

4 Feature Extraction
For the feature extraction, it was decided to test whether a wavelet decomposition would suffice, compared to the more complex methods previously used, for instance the networks of adaptive oscillators used by Marolt in [5].
The wavelet transforms have some interesting features compared to other methods of extracting frequency, e.g. the Fourier transform.
• In its commonly used form, the scale of the transform is dyadic, that is, each frequency band contains one octave.
• The algorithms are very efficient: the complexity of a Discrete Wavelet Transform (DWT) is O(N) for some algorithms. For other algorithms the complexity can in the worst case be up to O(N · log(N)), like the Fast Fourier Transform (FFT) [6, p. 40].
• A wavelet expansion (inverse transform) can give a better description and separation of local changes than a Fourier transform [6, p. 7], which by definition can only represent a signal as a combination of sines.
• A wavelet can be designed for a specific type of signal, and is thus able to represent discontinuities or sharp corners with only a few coefficients, where the Fourier Transform would require many coefficients.
The mother and father¹ wavelet pair chosen for implementation was the Daubechies D4, which is used in a wide range of applications and often used for examples in literature on wavelets, due to it being both very simple and very effective.
The DWT was implemented using the lifting scheme² [7], which reduces the computational cost by roughly 50% compared to the standard filter bank implementation [7, p. 264].
To balance frequency and time resolutions, an input of 4096 samples (with sample rate 44.1 kHz) is selected, giving an output of the same size, and the DWT is performed recursively, giving 12 subbands of length 2^0 to 2^11, each band containing one octave (not necessarily equal to the piano octaves).
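The recursive decomposition described above can be illustrated with a minimal sketch. Note that this is not the lifting implementation used in the paper: it is the plain filter bank form of the D4 transform, written in Python for brevity, and the periodic wrap-around at the block edges as well as the names `d4_step` and `d4_decompose` are assumptions made for illustration.

```python
import math

# Daubechies D4 low-pass (scaling) filter coefficients.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# Matching high-pass (wavelet) filter: g[k] = (-1)^k * h[3 - k].
G = [H[3], -H[2], H[1], -H[0]]


def d4_step(x):
    """One analysis step: split x (even length) into approximation and detail halves."""
    n = len(x)
    half = n // 2
    approx = [0.0] * half
    detail = [0.0] * half
    for i in range(half):
        for k in range(4):
            sample = x[(2 * i + k) % n]  # periodic wrap-around (an assumption)
            approx[i] += H[k] * sample
            detail[i] += G[k] * sample
    return approx, detail


def d4_decompose(block):
    """Recursively decompose a block whose length is a power of two.

    Returns (scaling, bands), where bands[0] is the coarsest detail band
    (1 coefficient) and bands[-1] the finest (len(block) / 2 coefficients).
    """
    bands = []
    approx = list(block)
    while len(approx) > 1:
        approx, detail = d4_step(approx)
        bands.append(detail)
    bands.reverse()
    return approx[0], bands
```

For a block of 4096 samples this yields one scaling coefficient and 12 detail bands of 1, 2, 4, ..., 2048 coefficients, so the output has the same size as the input. Because the transform is orthogonal, the energy of the block is preserved in the coefficients.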
¹ The mother wavelet is the wavelet function, and the father wavelet is the scaling function.
² The lifting scheme is an efficient way of implementing a wavelet decomposition.
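The relation between the dyadic subbands and the MIDI pitch definition in equation (1) can be worked out with a short sketch. The sample rate and block size are taken from the text; the function names are hypothetical, and the band edges are the nominal octave limits, ignoring the leakage between bands that a real wavelet filter exhibits.

```python
import math

SAMPLE_RATE = 44100.0  # Hz, as stated in the text
BLOCK_SIZE = 4096      # samples per block


def midi_pitch(freq_hz):
    """Equation (1): MIDI pitch number (possibly fractional) of a frequency."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def octave_bands(levels=12):
    """Nominal detail bands of a dyadic decomposition, finest band first.

    Each entry is (number_of_coefficients, low_edge_hz, high_edge_hz).
    """
    bands = []
    for level in range(1, levels + 1):
        count = BLOCK_SIZE >> level
        high = SAMPLE_RATE / 2 ** level
        bands.append((count, high / 2.0, high))
    return bands


def pitches_in_band(low_hz, high_hz, lowest=21, highest=108):
    """Piano pitches whose fundamental lies in [low_hz, high_hz), or None."""
    first = max(lowest, math.ceil(midi_pitch(low_hz)))
    last = min(highest, math.ceil(midi_pitch(high_hz)) - 1)
    return (first, last) if first <= last else None
```

As an example, the band with 128 coefficients spans roughly 689 Hz to 1378 Hz, and `pitches_in_band` maps it to pitches 77 to 88, while the two finest bands contain no piano fundamentals at all.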
Frequency range (Hz)   Pitch (MIDI)   Number of samples   Used samples
-                      -              1                   -
-                      -              1                   -
-                      -              2                   -
21-43                  21-28          4                   -
43-86                  29-40          8                   -
86-172                 41-52          16                  16
172-345                53-64          32                  32
345-689                65-76          64                  32
689-1378               77-88          128                 32
1.38-2.76 k            89-100         256                 -
2.76-5.51 k            101-108        512                 -
5.5-11.0 k             -              1024                -
11-22 k                -              2048                -
Total                                 4096                112
TABLE 1: Contents of the wavelet decomposition and the used samples
Table 1 illustrates how parts of the wavelet decomposition are selected for further analysis. As the upper 2 bands (5.5-11 and 11-22 kHz) do not contain much relevant information when onset detection is disregarded [1, p. 108] (the highest fundamental frequency is just above 4 kHz), these are discarded, which already reduces the data to 1/4 of its original size. Furthermore, by looking at the transforms of recorded samples, the scope is limited to only the four octave bands from 86 Hz to 1378 Hz, as these contain the most information/amplitude response overall. Finally, only the first 32 samples from each of the two bands with 64 and 128 samples are used, giving a total of 112 samples to use as input to the neural networks.
There are of course other types of wavelets and scaling functions than the one used, including Haar, which is the simplest type, as well as more complex constructions like the Cohen-Daubechies-Feauveau wavelet. However, a review of multiple types is out of scope for this article.

5 Note Recognition
To detect which notes were present in the piano music, some note recognition was needed. As previously mentioned, neural networks were chosen for this task, since they have been used by others with good results [1] [5].
A neural network consists of at least an input and an output layer, and possibly some hidden layers, each containing a number of neurons. The neural network takes as input a feature set, which has to be generated from the input signal, and via weighting vectors an output result is obtained.
Neural networks are trained prior to recognition. This training can be supervised, telling the network what the desired output is, given a specific input. It is also possible to train a network unsupervised. Typically, the training algorithm will then try to classify the input into 2 or more preprogrammed classes.
Neural networks have a huge advantage when it comes to runtime complexity. When trained sufficiently, they are fast to execute and do not take up much memory. Due to their nature, a large amount of training data is needed by supervised learning in order to make the results sufficiently reliable. If the training data does not contain enough “general” information about the class to be detected, the network can respond very well to training data and not well at all to unknown data. This fact, together with the time and computational resources required during training, is the foremost drawback. It is also very difficult to construct a “cookbook” neural network; many iterations are needed.
The major strength of neural networks is their efficiency at run-time. Once properly trained, the execution takes up a very small amount of memory. In the case of the feed-forward network, there is no need to save any calculations, except the results needed for the next neuron.
A neuron consists of weighting coefficients for each input; the weighted inputs are summed and possibly biased, and an activation function handles the summed output [8, p. 11].
The activation function can theoretically be any kind of function, but in actual implementations 3 types are predominant: the threshold function, the piecewise-linear function and the sigmoid function.
According to [8, p. 14], the sigmoid is the most commonly used. As no exact structures of neural networks are mentioned in the method description of either [5] or [9], the sigmoid activation function was chosen.
The network was chosen to have:
• supervised learning, since training and test data are available
• multilayer structure, due to the complexity of the desired system
• feed-forward, fully connected structure
It was chosen to implement a network with the following neuron and layer structure:
• 112 neurons in the input layer, one for each of the extracted sample values
• Three hidden layers with 20, 30 and 30 neurons respectively
• An output layer with one neuron
This structure of the network is outlined at the bottom of figure 2.

5.1 System Development
5.1.1 Data
Extraction of commercially prerecorded piano sequences has been beyond the scope of this article. Instead it has been attempted to use a rather reduced sample space. For the test and training notes respectively, a separate sequence of all 88 piano notes was recorded to a wave file with a resolution of 16 bit and a sample rate of 44.1 kHz. Each note was played, keeping the key depressed somewhere between one half and one second. Both sequences were recorded using the same piano. As the conditioning of both training and test data was identical, they will from here on be referred to simply as “the data”. An examination of the recorded wave files showed a considerable increase in signal energy at the onset of each note. This is due to the percussive nature of the hammer hitting the strings [1, p. 108]. Each note was extracted into its own wave file, containing 30,000 samples, starting 2000 samples before the maximum energy level was reached for each note.
Separate data were constructed for each specific neural network, as displayed in figure 4. This way each network could be trained with its target note present in half of the data. The data contained 1,000 sequences consisting of 4096 samples of mixed piano notes, totalling a little more than 4 million 16 bit samples. It was made sure that roughly half of the sequences contained the note that the specific network was to detect. Each sequence consisted of 0 to 3 simultaneously played piano notes with uniform distribution of both the number of notes played and which notes were played, not considering the target note. To take the hammer stroke into account and get more varied data from the sample set, the position of the 4096 samples to be extracted from each single note was chosen at random. The extract could start anywhere from the first sample to the 25,000th sample of the note. This was done to minimize the risk of overfitting the networks. Each network was trained with 5 epochs (repetitions) of training data. Figure 3 shows an example of data for the network to detect pitch 69.
FIGURE 3: Example of 5 consecutive generated sequences
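The construction of training sequences described above can be sketched as follows. This is only one reading of the description: it assumes that 0 to 3 non-target notes are drawn uniformly and that the target note is added with probability one half. The names `make_sequence` and `make_dataset` are hypothetical, and the recorded notes are assumed to be available as lists of at least 30,000 samples indexed by MIDI pitch.

```python
import random

BLOCK_SIZE = 4096
MAX_START = 25000       # the extract starts within the first 25,000 samples
MAX_OTHER_NOTES = 3     # upper bound on simultaneous non-target notes (assumption)


def make_sequence(notes, target_pitch, rng):
    """Mix one training block for the network that detects target_pitch.

    notes maps MIDI pitch -> list of samples (one recorded note each).
    Returns (block, label), where label is 1 if the target note is present.
    """
    label = 1 if rng.random() < 0.5 else 0
    others = [p for p in notes if p != target_pitch]
    count = rng.randint(0, min(MAX_OTHER_NOTES, len(others)))
    chosen = rng.sample(others, count)
    if label:
        chosen.append(target_pitch)

    block = [0.0] * BLOCK_SIZE
    for pitch in chosen:
        recording = notes[pitch]
        start = rng.randrange(0, MAX_START)  # random position within the note
        for i in range(BLOCK_SIZE):
            block[i] += recording[start + i]
    return block, label


def make_dataset(notes, target_pitch, size=1000, seed=0):
    """Generate `size` labelled blocks for one network, reproducibly."""
    rng = random.Random(seed)
    return [make_sequence(notes, target_pitch, rng) for _ in range(size)]
```

Drawing the start offset independently for every note means that the attack of one note can overlap the decay of another, which is one way to obtain the varied data the text aims for.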
FIGURE 4: A flowchart of note extraction and training of the neural networks

6 Results
MatLab has been used for generation and training of the neural networks.
There is a huge difference in how well a given network performs. Figure 5 shows the Type I and II errors for all networks, where Type I is a false positive (false hit) and Type II is a false negative (miss). These errors are taken from a dataset of 1,000 sequences. Until around pitch 60, Type II errors are by far predominant, meaning that extremely few hits are detected. Around pitch 80 the amounts of Type I and Type II errors roughly even out, but without much consistency from pitch to pitch. All in all, 79.4% of all errors are Type II errors.
FIGURE 5: Type I (solid red line) and Type II (blue stippled line) errors for each network
Disregarding whether a given network response is considered a hit or a miss, figure 6 shows both the least square error and the mean absolute error.
FIGURE 6: Least square error (solid red line) and mean absolute error (stippled blue line)
As the end result we achieved correct detection of the target note in 33.1% of the actual note occurrences and correct absence of the target note in 82.9% of the cases. An overview is displayed in table 2.
            Output 1   Output 0
Input 1     33.1%      66.9%
Input 0     17.7%      82.9%
TABLE 2: Overview of results: Type I and Type II errors as well as correctly detected notes.
It should be noted that pitch 71 performs exceptionally well, so it was decided to investigate further. A new training and test sequence was run, this time with up to 10 different notes present and 10,000 sequences. The note was correctly “detected” in 72.1% of the cases when present and correctly “not detected” in 97.7% when missing.

7 Discussion
Overall the results are ambiguous. The root cause is considered to be the limited training data available. In the case of pitch 71, it is well beyond reasonable doubt that the results are not haphazard. The extremely fine results for pitch 71 could be caused by a large correlation between training and test data. But since it still performs well, even when using up to 10 simultaneous notes, we speculate that the main cause is the choice of feature set. This indicates that the network response relies heavily on which parts of the wavelet decomposition are used. We find it acceptable to conclude that it is possible to use wavelet decomposition as a feature extraction tool for a neural network. Further research should focus on evaluation of the best fitting mother wavelet as well as selection of coefficients from the wavelet decomposition.
The cause of the notable rise in variance for Type I and II errors in figure 5 can be explained by the choice of feature set. As seen in table 1, frequencies above 1.4 kHz are not represented. According to equation (1), 1.4 kHz roughly corresponds to pitch 89. That means that the fundamentals of pitches 89-108 are not represented in the feature set and that pitches 77-88 are represented only by their fundamental frequency, in agreement with table 1.
Our decision algorithm is rather crude; if a network outputs more than 0.5, we consider it a hit. A much more plausible method would be to employ a statistical framework, based both on accumulated a priori knowledge of the frequency with which each note is played and on which intervals seem reasonable; an interval of a minor second occurs with a much lower probability than, for example, an octave. Since the outputs from each network are not binary, these can easily be weighted to accommodate a different set of statistical probabilities depending on the style of music.

8 Conclusion
A simplified framework for polyphonic piano note recognition has been made. The goal has been to determine whether a wavelet decomposition could be used for feature extraction for a neural network, and this has been achieved only to some extent. The overall results do not fully confirm the usability of wavelets for decomposition, but one particular pitch consistently performs convincingly. Our test results have not shown whether the decomposition provides an insufficient feature set for the network, the network suffers from non-optimal design, or the reduced sample set used is to blame. Further studies in the field should examine this issue.
A D4 wavelet decomposition using the lifting scheme has successfully been implemented in ANSI C. It has been concluded that wavelet decomposition is very efficient regarding execution speed compared to the FFT. The actual implementation of D4 is based on Daubechies' own publications and is more efficient than the default decomposition method, which uses filter banks.

9 Acknowledgements
We would like to thank Uwe Hartmann for an introduction to neural networks.

References
[1] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 1st edition, 2006. ISBN 0-387-30667-6.
[2] Perfecto Herrera-Boyer, Geoffroy Peeters and Shlomo Dubnov. Automatic classification of musical instrument sounds. Journal of New Music Research, 2003.
[3] Anssi Klapuri. Automatic transcription of music. Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, August 6-9, 2003.
[4] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 1st edition, 2006. ISBN 0-387-30667-6. Chapter 4: Beat Tracking and Musical Metre Analysis by Stephen Hainsworth.
[5] Matija Marolt. Transcription of polyphonic piano music with neural networks. Proceedings of Workshop on Current Research Directions in Computer Music, Barcelona, Spain, November 15-17, 2001.
[6] C. Sidney Burrus, Ramesh A. Gopinath and Haitao Guo. Introduction to Wavelets and Wavelet Transforms. Prentice Hall, 1st edition, 1998. ISBN 0-13-489600-9.
[7] Ingrid Daubechies and Wim Sweldens. Factoring wavelet transforms into lifting steps. Journal of Fourier Analysis and Applications, 1998. http://www.springerlink.com/content/r0n381423k7v8655/.
[8] Simon Haykin. Adaptive Filter Theory. Information and System Sciences. Prentice Hall, 4th edition, 2002. ISBN 0130901261.
[9] Bello, Monti and Sandler. Techniques for automatic music transcription. In Proceedings of the first International Symposium on Music Information Retrieval (ISMIR-00), Plymouth, Massachusetts, USA, October 2000.

Contents
1 Preface
2 General Guidelines
3 List of Abbreviations

I Analysis
4 External Restraints
5 Initial Specification of Requirements
5.1 Platform
5.2 Instrument
5.3 Harmony
5.4 Detection Speed
5.5 Success Rate
6 Detection of Note Onset
6.1 Purpose
6.2 Methods
7 Methods for Detection of Monophonic and Polyphonic Signals
7.1 Purpose
7.2 Methods
7.2.1 Off-line
7.2.2 On-line
8 Pitch Detection
8.1 Approaching the Problem from a Physical Angle
8.2 Previous Pitch Detection Studies
9 Wavelets and Assessment of Efficiency
9.1 Purpose
9.2 Analysis
9.3 Conclusion
10 Neural Network
10.1 Method Overview
10.2 Suitability
11 Data Preprocessing for a Neural Network Proposed by Others
12 Deciding Method for Further Analysis
12.1 Preliminary Analysis
12.1.1 Statistical Methods
12.1.2 Methods based on Auditory Models
12.1.3 Neural Networks
12.1.4 Wavelets
12.2 Decision
13 General Construction
13.1 Purpose
13.2 Analysis
13.3 Sampled Piano Music
13.4 Feature Extraction
13.5 NN
13.6 Decision/Discrimination
13.7 Presentation
14 General Thoughts
14.1 Optimizing the Networks
14.2 Optimizing the Wavelet Decomposition
14.3 Optimization of the Decision Algorithm

II Design
15 Architectural Considerations for the Neural Network
15.1 Adjustable Network Elements
15.2 Classes of Networks
15.3 Layers
15.4 Types of Neurons
16 Overall Architecture
16.1 Wavelet Decomposition
16.2 Neural Network Structure
16.2.1 Feature Set versus Pitch
16.2.2 Training and Test

III Implementation
17 Software design
17.1 Userguide
17.2 Implementation
18 Real time considerations
19 FANN – Fast Artificial Neural Network library
20 Port Audio

IV C Source Code
21 Makefile
22 main.c
23 fileio.h
24 fileio.c
25 waveread.h
26 waveread.c
27 wavelet.h
28 wavelet.c
29 ann.h
30 ann train.c
31 ann test.c
32 ann run.c

V Matlab Source Code
33 pianocomp.m
34 pianomix.m
35 featureextraction.m
36 NNgen.m
37 resultpresentation.m

1 Preface
Transcription of musical scores has through time been a tedious manual task to be taken on only by highly trained musicians. As computer technology matured in the 1980s to produce the “personal computer”, sporadic research in automated music transcription suddenly became more focused; a relatively cost-effective platform was now available.
Today there is still room for both improvements and development of the methods used, as a universal method of transcription has not been discovered. Not only the instrument(s) to be transcribed, but also the style of music has a profound impact on the efficiency of a given method.
The (grand) piano is by most considered the “reference instrument”, probably due to conventions inherited from classical music. It is assumed that transcription of music played on the piano will have the broadest interest to potential customers. Based on this assumption, this documentation will analyze potentially efficient methods for transcription of piano generated music and describe the implementation of one such method. This actual implementation will be called Musician's Transcription Tool (MTT).

2 General Guidelines
This documentation is to be viewed as a collection of work sheets. The aim will be to arrange these in a plausible manner, but this will not necessarily always be the case. It is suggested to use the table of contents to look up relevant information regarding a subject.
Quotations and references to other works will be put in a footnote on the given page. Also, a complete compilation of the literature used will be included at the end of this document.
This project has been composed by:
Esben Madsen
Johnni Thomsen Pedersen
Louise Baldus Vestergaard

3 List of Abbreviations
ANN      Artificial Neural Network, commonly called Neural Network (NN)
D4       Daubechies 4-tap wavelet
DWT      Discrete Wavelet Transform
Dyadic   Related by a factor 2 (like octaves)
MIDI     Musical Instrument Digital Interface
MTT      Musician's Transcription Tool
NN       Neural Network, sometimes referred to as Artificial Neural Network (ANN)
Part I
Analysis
4 External Restraints
Since this is a university project for the 7th semester, there are certain constraints, primarily set by the study guidelines, concerning objectives and documentation.¹ The purpose of this semester project is design, implementation and analysis of a solution to a practically occurring problem, which naturally requires stochastic signal processing methods and/or transmission of signals.
The project period runs from September 1st to December 19th 2008.
The project is documented in three ways:
• A scientific article
• A poster with presentation at SEMCON 08
• Edited worksheets, which document the details of the project
The study guidelines further state the goals for this project unit¹:
“
The project unit takes its starting point in a practical problem, which reflects the students’ chosen specialization, and where signal processing methods and/or transmission of signals is a natural aspect.
• Through a stepwise refinement process of the given application, a set of specifications are generated. There is no requirement for a real-time implementation (HW and/or SW), thus the specification can related to the behavioural level only. However, a real-time implementation is allowed in the projects, and the specification therefore has to be extended at all relevant points, in case such an implementation is included.
• Algorithms for the complete functionality (or parts hereof) are designed, and are next being applied for 1) a functional simulation, and possibly 2) a real-time implementation.
• In terms of the design phase, an analysis of the algorithmic computational and numerical properties is conducted.
• The implementation is next compared to the specification, and a comparison and evaluation is performed.
”
¹ esn.aau.dk/fileadmin/esn/Studieordning/Cand_SO_ed_aalborg_sep08.pdf, p. 14

5 Initial Specification of Requirements
The following is a description of the specific requirements for the base functionality of the MTT. This specification has not been used for the actual implementation, but serves to document the process.
5.1 Platform
The MTT must be able to run on a PC or laptop with these minimum specifications:
• 1.8 GHz P4 processor or equivalent
• 1 GB RAM
• Sound card capable of recording and playback in 16 bits at a 44.1 kHz sample rate
• Either Windows XP or Linux installed
5.2 Instrument
The MTT is to be able to detect and transcribe notes played on both upright and grand pianos.
5.3 Harmony
The MTT is to be able to detect and transcribe up to 10 simultaneous notes.
5.4 Detection Speed
The MTT is to be able to detect and transcribe notes played with a time resolution of 50 ms.
5.5 Success Rate
The overall success rate is to be at least 80% correct detections over a broad range of musical styles.
6
Detection of Note Onset
6.1
Purpose
It is assumed that to best detect pitch of a note, a proper placement of said
note in time is needed. The begining of a note is called the note “onset”.
This document will propose different methods for detecting that onset.
6.2
Methods
Effecient onset detection methods vary considerably with the instrument in
question. If an instrument have a large transient at onset (e.g. percussion
instrument, piano and guitar), it is suggested to view the music from a power
perspective2 . The suggested algorithm is:
X
|ST F TxW (n, k)|2
(1)
Ej (n) =
k∈kj
Where:
ST F T is the short time Fourrier transform of x(n)
k
is the discrete frequency index
W
is the window used to weigh x(n)
n
is the time, at which the window is centered
To further optimize equation 1, a three point linear regression is proposed2 . It is of interest to find the gradient of Ej (n) in order to detect the
start of the transient. For the specific three point linear regression, following
equations define this gradient:
Ej (n + 1) − Ej (n − 1)
(2)
3
Ej (n) is the energy envelope function
Where:
Dj (n) is the gradient of Ej (n)
Although this method allows power measurement in distinct frequencybands, it seems rather complex to calculate. It would be a good idea to
compare this method to a simpler method. It would be interesting to deDj (n) =
2
Klapuri & Davy 2006, p. 107 - 109
11
termine if the onset information can be obtained using the broadspectered
signal. This could be done as in equation 3:
n+ 2j
E(j, n) =
X
x(k)2
(3)
k=n− 2j
In both cases a decision algorithm is needed to discriminate onset periods
from rest.
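As a minimal sketch of the simpler broadband method, the C fragment below computes the windowed energy of equation 3 and the three point gradient of equation 2. The function names, the handling of samples outside the signal (they are simply left out of the sum) and the fixed threshold in find_onset are assumptions made for illustration; the text does not specify a decision algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Equation 3: energy of x in a window of about j samples centred on n.
 * Samples outside [0, len) are left out of the sum (an assumption). */
double energy_envelope(const double *x, size_t len, size_t n, size_t j)
{
    assert(x != NULL && len > 0 && n < len);
    size_t half = j / 2;
    size_t lo = (n > half) ? n - half : 0;
    size_t hi = (n + half < len) ? n + half : len - 1;
    double e = 0.0;
    for (size_t k = lo; k <= hi; k++)
        e += x[k] * x[k];
    return e;
}

/* Equation 2: three point gradient of the envelope, with the divisor 3
 * as printed in the text. Requires 1 <= n and n + 1 < envelope length. */
double envelope_gradient(const double *env, size_t n)
{
    return (env[n + 1] - env[n - 1]) / 3.0;
}

/* Hypothetical decision rule: the first index whose gradient exceeds a
 * fixed threshold is reported as the onset; -1 means no onset found. */
long find_onset(const double *env, size_t count, double threshold)
{
    for (size_t n = 1; n + 1 < count; n++)
        if (envelope_gradient(env, n) > threshold)
            return (long)n;
    return -1;
}
```

The envelope would typically be evaluated once per analysis frame and stored in an array, after which find_onset scans that array. A fixed threshold is the simplest possible rule and would probably need tuning against recorded piano notes.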
7
Methods for Detection of Monophonic and
Polyphonic Signals
7.1
Purpose
To detect the notes in music, it is necessary to correctly identify and, in the
case of multiple notes, separate the fundamental frequencies that the signal
consists of.
The main source for this worksheet is chapter 7 of Klapuri & Davy3 .
7.2
Methods
The methods for detecting the fundamental frequencies (F0) in a polyphonic
signal can roughly be separated into the statistical approach, which this worksheet will focus on, and an approach based on an auditory model. The
auditory model is based on the way humans perceive and separate
concurrent sounds, and will not be examined further; chapter 8 of
Klapuri3 can be used as a reference on this topic.
The statistical methods can basically be separated into the off-line approaches, which are based on analysis of a constant signal, and on-line approaches, which only use the current sample or frame to estimate the signals.
7.2.1
Off-line
Off-line methods rely on analyzing a signal that does not change in the chosen interval (no new or lost notes). As there must be no transition between
notes in the processed waveform, an onset and offset detection must be made beforehand. The signal is then modeled with a parameter
estimation. Because the signals are “complete” (no transitions), these methods are very accurate, but they also prove rather computationally heavy.
The Bayesian off-line model is mathematical and probabilistic, and it
leads to the simplest model that explains a given waveform. The estimation
of multiple fundamental frequencies (F0’s) is complex and possibly computationally heavy, which is probably the reason this method has not been given
much attention3 (p. 203-204). Often the estimation is a maximum a posteriori (MAP) or minimum mean square error (MMSE) estimation. Apart from
3
Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”,
2006
signal detection, the model may also be used for source separation (detection
of instruments), compression, pitch correction and other useful applications3
(p. 203-204). An indication of the performance is given in the article “Bayesian
analysis of polyphonic western tonal music”4, which reports 100% accuracy
on one F0 and 71% on four fundamental frequencies.
7.2.2
On-line
The on-line methods use only the current sample or frame in a sampled
signal for the analysis, and therefore have no requirement for a separate onset/offset detection.
The Cemgil on-line processing is a MAP estimation, where the frequencies
are divided in a grid and then a “piano roll”5 estimation is performed for each
frequency in the grid3 (p. 220).
On-line methods based on sliding windows include an approach by Dubois
and Davy, where the signal is perceived as a Gaussian random walk for both
frequencies and amplitude (the number of notes can increase, decrease or remain
constant)3 (p. 221-223). Another approach is described by Vincent and
Plumbley: frequencies are divided in a fixed grid like in the Cemgil model, but
the parameter priors are independent of neighbouring frames. The unknown
parameters are then MAP estimated and finally, the parameters of different
frames are linked together and reestimated3 (p. 223-225).
There are also on-line methods based on the Bayesian model, which mostly
consist of modeling the signal spectrogram and following harmonic trajectories3 (p. 225).
Yeh and Röbel have proposed a model that is based on generation of “candidate notes” that are evaluated with a score function. Further examination
of this requires a look at the external sources, as the short text in the book3
(p. 225) is rather confusing.
Dubois and Davy have introduced a method based on spectrogram modeling
with a zero-mean white Gaussian noise. This method is an extension of their
model based on the sliding window.
Thornburg et al. have proposed a method for melody extraction, with which it is
therefore only possible to do monophonic recognition.
4
M. Davy, S. Godsill & J. Idier, “Bayesian analysis of polyphonic western tonal music”,
Journal of the Acoustical Society of America, 2005
5
Derived from “self playing” pianos, the piano roll is a representation of whether each
single note is present on a time scale.
Sterian et al. use, in their model, a Kalman filter to extract sinusoidal
partials and group these according to their sources.
8
Pitch Detection
The main sources for this worksheet are a web page by Professor Marina Bosi
from Stanford University6, and chapter 4 in the book by Anssi Klapuri and
Manuel Davy7.
According to Klapuri & Davy7 there are four key characteristics of music
which are important when working with sound signals: pitch, loudness, duration and timbre. The topic of this worksheet is pitch detection.
Pitch is defined as
a perceptual attribute which follows the ordering of sounds on a
frequency-related scale extending from low to high. More exactly,
pitch is defined as the frequency of a sine wave that is matched to
the target sound by the human listener. Fundamental frequency,
(F0) is the corresponding physical term, and is defined for periodic
or nearly periodic sounds only.
There are various ways to detect pitches in music. One way is to simulate
the human ear, since this is one of the most complex yet precise detectors.
However, the complexity of this model is out of scope for this project, hence
it will not be examined.
8.1
Approaching the Problem from a Physical Angle
Time domain detection can be done by observing the signal to detect periodicity. One could count the number of zero crossings, but though this is an
easy and cheap method, it is also very inaccurate, since small variations of
the signal around the zero line might induce fatal errors. A more complex,
yet also more precise, way of time domain detection is autocorrelation.
Autocorrelation is a tool to find patterns in a signal and determine fundamental frequencies. If the input is periodic, the autocorrelation function
will be as well. If the signal is harmonic, the autocorrelation function will
have peaks at lags that are multiples of the fundamental period. This method is suitable for e.g. speech recognition due to the low frequency range of speech
6
http://ccrma-www.stanford.edu/~pdelac/154/m154paper.htm
7
Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006
signals. This method might be very expensive because it includes a lot of
multiply-add calculations.
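The two time domain methods can be sketched in C as follows. This is an illustration under assumptions, not an implementation from the project: the function names are made up, and the period is simply taken as the lag with the largest autocorrelation inside a search range chosen by the caller. The nested loop makes the cost visible: roughly one multiply-add per sample for every lag tested.

```c
#include <assert.h>
#include <stddef.h>

/* Cheap method: count the sign changes between neighbouring samples. */
size_t zero_crossings(const double *x, size_t len)
{
    size_t count = 0;
    for (size_t n = 1; n < len; n++)
        if ((x[n - 1] < 0.0) != (x[n] < 0.0))
            count++;
    return count;
}

/* Autocorrelation method: return the lag in [min_lag, max_lag] with the
 * largest autocorrelation, or 0 if no lag gives a positive value.
 * The fundamental frequency estimate is then sample_rate / lag. */
size_t autocorr_period(const double *x, size_t len,
                       size_t min_lag, size_t max_lag)
{
    assert(x != NULL && min_lag >= 1 && min_lag <= max_lag);
    size_t best_lag = 0;
    double best = 0.0;
    for (size_t lag = min_lag; lag <= max_lag && lag < len; lag++) {
        double r = 0.0;
        for (size_t n = 0; n + lag < len; n++)
            r += x[n] * x[n + lag];      /* one multiply-add per sample */
        if (r > best) {                  /* strict: keeps the shortest lag */
            best = r;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

Restricting the search range keeps the cost down and avoids the trivial maximum at lag 0. Because the sum runs over fewer samples for longer lags, the shortest period tends to win over its multiples in this unnormalized form.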
Frequency domain detection is another approach. Here the signal is examined in the frequency domain in order to detect the frequency spectrum
of the signal. Here, there are also some different ways to detect pitch.
The signal can be broken down into small segments, each of which is
multiplied by a window and transformed to get a Short Time
Fourier Transform (STFT) of the segment. One of the disadvantages of
this method is that the signal is broken into equally sized segments, which
gives equally spaced frequency bins, whereas the spacing between the notes is nonlinear. This
means that, relative to the note spacing, less frequency resolution is available for the low
notes than for the high ones.
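The mismatch between a linear frequency grid and the nonlinear note spacing can be made concrete with a small C sketch. It assumes equal temperament, keys numbered 1 to 88 with A4 as key 49 at 440 Hz, and function names chosen for illustration. The sample rate of 44100 Hz and the frame length of 4096 samples used in the usage note are the values mentioned elsewhere in this report.

```c
#include <assert.h>

#define SEMITONE_RATIO 1.0594630943592953   /* 2^(1/12) */

/* Equal-tempered frequency of piano key 1..88, with key 49 (A4) = 440 Hz.
 * The key numbering is an assumption made for this sketch. */
double key_frequency(int key)
{
    assert(key >= 1 && key <= 88);
    double f = 440.0;
    for (int k = 49; k < key; k++)
        f *= SEMITONE_RATIO;
    for (int k = 49; k > key; k--)
        f /= SEMITONE_RATIO;
    return f;
}

/* Width in Hz of one frequency bin for a frame of frame_len samples. */
double bin_width(double sample_rate, int frame_len)
{
    return sample_rate / frame_len;
}

/* Distance from a key to the next semitone, measured in bins (key <= 87). */
double bins_to_next_key(int key, double sample_rate, int frame_len)
{
    double step = key_frequency(key + 1) - key_frequency(key);
    return step / bin_width(sample_rate, frame_len);
}
```

With 44100 Hz and 4096 samples per frame, one bin is about 10.8 Hz wide. The lowest two keys are only about 1.6 Hz apart and therefore fall within the same bin, while the two highest keys are more than twenty bins apart.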
8.2
Previous Pitch Detection Studies
Various scientists and acoustic engineers have examined the problem of transcribing pitches in music. Some of the more interesting results are derived
by (sources for the following were found in: Klapuri and Davy, section 8.4)7:
• Martin, who applied Ellis’s model to process signals consisting of more
than two simultaneous sounds.
• Godsmark and Brown, who examined auditory scene analysis
models. They discovered that by applying these models, they were
able to transcribe 4 simultaneous sounds.
• Marolt, who examined ways to transcribe piano music. Since this
is our main topic, his discoveries will be examined further later on.
For now, it will be enough to know that he applied time-delay neural
networks to identify each piano key sound, and by doing this carefully,
he was able to transcribe with good precision.
9
Wavelets and Assessment of Efficiency
9.1
Purpose
The Fourier transform is the decomposition of a given signal into a series of
sines. Each sine in the decomposition will feature both infinite energy and
extremely strong autocorrelation. A consequence of the Fourier transform
is the lack of time/frequency information: greater resolution in
frequency requires more samples, thereby making it impossible to determine
at what instant a given component is added. In the attempt to decide which
notes are played at a given time, the Fourier transform may therefore not be suitable.
Another approach to signal decomposition was suggested around 1910 by
Haar8. He concluded that if a signal was to be decomposed without suffering
from the lack of time/frequency information, the waveform used as the key
element needed three main features: finite energy, a weak autocorrelation
and scalability. A wavelet is one such waveform. The following will be an
analysis of wavelet ability and efficiency/complexity. An illustration of the
time/frequency resolution of the FFT algorithm and wavelets can be seen in
figure 1.
Figure 1: Comparison of time-frequency resolution for wavelets and FFT (axes: time versus frequency). For wavelets, low frequencies are better resolved in frequency and high frequencies are better resolved in time.
8
A wavelet tour of signal processing (1999) p. 7, Stéphane Mallat
9
http://en.wikipedia.org/wiki/Image:Wavelet_-_Morlet.png, july 15 2005, all copyrights declined
Figure 2: A Morlet wavelet9
9.2
Analysis
A wavelet is scalable and can be placed arbitrarily in time. Therefore the
“generic” wavelet is dubbed the mother wavelet ψ. All wavelets in a given
decomposition stem from this wavelet and are called child wavelets. These
are written as
$$ \psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \psi\!\left( \frac{t - b}{a} \right) \qquad (4) $$
Where:
a is the scaling factor, which governs the frequency represented by the wavelet
b is the placement in time
Different mother wavelets have been proposed, and some types of wavelets
are often more suited than others for a given application. A “mother wavelet”
is then the generic wavelet of a given type, e.g. Haar, Daubechies, the
“mexican hat” or Morlet, which is seen in figure 2.
A discrete wavelet transform (DWT) is the decomposition of the (discrete)
signal into various child wavelets. The shorter wavelets will be able to easily
represent very fast signal transitions, while the longer wavelets represent
lower frequencies. A very nice feature, in relation to audio processing, is the
dyadic nature of the decomposition. This means that analysis in octaves can
easily be accommodated.
When seeing the actual implementation, this becomes apparent. The type
of mother wavelet to be used determines the filter coefficients.
10
http://en.wikipedia.org/wiki/Image:Wavelets_-_DWT_Freq.png, july 15 2005
Figure 3: Wavelet decomposition is dyadic10
Figure 4: Continuous downsampling by a factor of 2 (see footnote 11)
As far as efficiency goes, the computational complexity is O(n), i.e. a
linear rise. This means that the Discrete Wavelet Transform (DWT) is
even more efficient than the FFT12, whose complexity is O(n log n).
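The linear complexity follows from the filter bank structure: each level filters the current approximation with a fixed number of taps and halves its length, so the total work is proportional to n + n/2 + n/4 + ..., which is less than 2n. The C sketch below illustrates this with the four tap Daubechies (D4) filters. It is not the project's implementation; the function names, the periodic wrap-around at the block edge and the in-place output layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Daubechies D4 low-pass (H) and high-pass (G) analysis coefficients. */
static const double H[4] = { 0.4829629131445341, 0.8365163037378079,
                             0.2241438680420134, -0.1294095225512604 };
static const double G[4] = { -0.1294095225512604, -0.2241438680420134,
                             0.8365163037378079, -0.4829629131445341 };

/* One level: filter and downsample by 2. n must be even; approx and
 * detail each receive n/2 values. The signal is wrapped periodically
 * at the end of the block (an assumption). */
void d4_step(const double *x, size_t n, double *approx, double *detail)
{
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        double a = 0.0, d = 0.0;
        for (size_t t = 0; t < 4; t++) {
            double s = x[(2 * i + t) % n];
            a += H[t] * s;
            d += G[t] * s;
        }
        approx[i] = a;
        detail[i] = d;
    }
}

/* Full dyadic decomposition in place. After each level the first half
 * of the active range holds the approximation and the second half the
 * detail. scratch must hold n values. Returns the number of levels. */
size_t d4_decompose(double *data, size_t n, double *scratch)
{
    size_t levels = 0;
    while (n >= 4 && n % 2 == 0) {
        d4_step(data, n, scratch, scratch + n / 2);
        for (size_t i = 0; i < n; i++)
            data[i] = scratch[i];
        n /= 2;              /* only the approximation is split further */
        levels++;
    }
    return levels;
}
```

For a block of 4096 samples this gives 11 levels. A constant input produces zero detail coefficients at every level, which is a convenient sanity check for the coefficients.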
9.3
Conclusion
The DWT could be used to determine diverse features of a music signal. A
discussion and choice of mother wavelet is needed. Whether or not a specific
decomposition is suitable as input to a neural network is to be determined.
11
en.wikipedia.org/wiki/Image:Wavelets_-_Filter_Bank.png, july 15 2005
12
Introduction to wavelets and wavelet transforms (1998) p. 40, C. Sidney Burrus et al.
10
Neural Network
This worksheet is about the Neural Network method. The concept will be
described, and the applicability for polyphonic music transcription will be
considered. The purpose of this worksheet is to get an overview of the NN
method, in order to determine whether or not it is a suitable tool for music
transcription in this project.
10.1
Method Overview
An NN is a system which can be trained to recognize or identify nonlinearities when processing a signal. The method is suitable for systems where the
user has some preliminary knowledge relevant for the classification. Before
the network block, a feature extraction of the input signal must be made. It
can be done in various ways, e.g. wavelets or ear models13. The NN method
is inspired by the biological nervous system. It consists of weighted neuron
signals and a comparison algorithm. Neurons are models of the way the
biological nervous system perceives what it is exposed to. The weighting
algorithm is adjusted by training the system, using data and known output
values. The output of the weighted neuron
signals is compared to the known output, and in each iteration the
weighting function is adjusted. The result of these iterations will be the
trained system. By training the system to recognize the nervous signals to
the extent possible, the system should be able to process any related input
by its achieved algorithms. Figure 5 shows a block diagram of these relations.
10.2
Suitability
There are both advantages and disadvantages to applying the NN method
as a tool in this project. Earlier studies by various scientists14 have shown
that it is possible to accomplish usable results of music transcription by
applying a feature extraction and the NN method. However, there are no
13
Article: ”Automatic music transcription and audio source separation”, M. D. Plumbley et al., 2002
14
e.g. Matija Marolt (PhD from University of Ljubljana), A. Klapuri (Tampere University of Technology, Finland)
Figure 5: Block diagram showing the training of a neural network; Source: Matlab
documentation on Neural Networks
lectures regarding this method in this semester, and the complexity of NN
is quite high.
It seems the method is suitable for music transcription, and when examining the feature extraction analysis block, considering e.g. wavelets, it
might be possible to develop a more suitable transcription system than
those already achieved by other scientists.
11
Data Preprocessing for a Neural Network
Proposed by Others
Through studies of literature regarding the field of music transcription, it
seems the research is somewhat scattered across genre, instrument
and note recognition.
One of the sources of information for this project has been Matija Marolt
from University of Ljubljana, Slovenia15. Over the last decade he has published
articles on music transcription using neural networks. The aim of the transcription has varied a bit, but the emphasis has been on piano transcription. Together with a colleague, Marko Privosnik, from University of
Ljubljana, he has worked on a piano transcription system called SONIC16. In their
publication, they describe how they extract partials (meaning the data they
feed into the neural network for training) by feeding the piano signal through
the following steps16:
1. A Gammatone filterbank, which splits the signal into several frequency
channels.
2. Meddis hair cell model, which converts each gammatone filter output
into a probabilistic representation of firing activity in the auditory nerve.
3. Network of up to ten adaptive oscillators, which have phase, frequency
and output as adjustable variables, and extract partials for the note
recognition.
Their system was tested with different piano pieces in different recordings,
and it was able to detect up to 95% of the notes, with 13% extra notes
detected in error. More test results can be viewed in their article16. The
preprocessing seems quite complex, and induces thoughts on whether it could
be done in a simpler way.
15
Source: http://www.fri.uni-lj.si/en/personnel/271/oseba.html
16
Source: M. Marolt, M. Privosnik, SONIC: a system for transcription of piano music, in Kluev, V., D’Attelis, C. E., Mastorakis, N. E. (eds.), Advances in automation, multimedia and video systems and modern computer science, WSES Press, cop. 2001. (http://lgm.fri.uni-lj.si/matic/clanki/malta2001.pdf)
12
Deciding Method for Further Analysis
After a preliminary analysis of different methods for piano transcription,
a decision has to be made on which methods to analyze further and
ultimately implement.
This document attempts to summarize the results of the initial analysis,
in order to form a basis for the decision.
The requirements for the system state that a real time implementation
is wanted, so methods with high computational complexity in the running
system are unwanted. The system is also required to give a representation for
every combination of played tones, and hence of whether each individual tone
has been played at a given time.
12.1
Preliminary Analysis
The initial analysis has been focused on different ways to attack the problem,
ranging from solutions like a purely statistical approach and auditory based
models to methods based on wavelets and neural networks.
12.1.1
Statistical Methods
In the analysis of statistical methods to estimate fundamental frequencies,
a wide range of different methods is explained in the book by Klapuri &
Davy17 .
From the analysis it is concluded that a wide range of different methods
are available, and many of them also quite usable, but for most of them,
a heavy computational load is to be expected, and therefore a real time
implementation may not easily be achieved.
12.1.2
Methods based on Auditory Models
The auditory models are models based on how the human ear works and how
the human perception of music is. Chapter 8 of Klapuri & Davy17 gives a
good introduction to a range of these.
The concrete methods have by now only been examined superficially,
but include for instance separation of the tone bands using a filter bank or
17
Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”,
2006
channel and peak selection as well as pitch-perception models. A method
used by Matija Marolt was to identify tones using adaptive oscillators18 for
preprocessing of the signal to use as input to a neural network.
12.1.3
Neural Networks
A neural network takes as input a feature set, which will have to be generated from the input signal, and a result is calculated via a weighting vector.
Neural networks have a huge advantage when it comes to the computational complexity of running the trained networks, but due to their nature,
a large amount of training data is needed in order to make the results sufficiently reliable, and the training requires a lot of computation.
Earlier studies by Matija Marolt have shown significant results on using neural networks19 with preprocessing of the data by groups of adaptive
oscillators18 .
12.1.4
Wavelets
Wavelets provide a way to transform a given signal into frequency components, like the Fourier transform, and provide the opportunity to study
each frequency component with a resolution that matches the scale. This feature of the wavelet, along with the fact that it, unlike the statistical methods,
is not computationally very complex, makes it a good candidate for feature
extraction from a recorded signal.
12.2
Decision
Based on the key points of the above, it has been decided that the further
analysis will focus on the use of neural networks for the transcription. To use
NN, a preprocessing of the data is necessary to minimize the computational
load. This preprocessing is a feature extraction, and further analysis of
wavelets will be performed in order to decide whether these can be used
for the preprocessing of data for the neural networks.
18
Matija Marolt, “Networks of Adaptive Oscillators for Partial Tracking and Transcription of Music Recordings”, Journal of New Music Research, 2004, vol. 33, no. 1, pp.
49-59, 2004.
19
Matija Marolt, “A connectionist approach to automatic transcription of polyphonic
piano music”, IEEE Transactions on Multimedia, 2004, Vol. 6, no. 3, pp. 439-449, 2004.
The project will from this point on focus on the combination of neural
networks and wavelets.
13
General Construction
13.1
Purpose
To clarify what building blocks are needed to realize the software of the
transcription tool.
13.2
Analysis
A generic construction, based on a neural network (NN), is shown in figure 6.
Figure 6: Block diagram of a system based on neural networks.
13.3
Sampled Piano Music
This is a sampled piece of piano music of arbitrary length. It is assumed
that the bit resolution is 16 bits and the sample rate is 44100 Hz, so as to comply with the
wave format featured on industrially manufactured CDs.
13.4
Feature Extraction
The music signal has to be transformed to another representation, one that
somehow makes it easier to differentiate the different piano notes. The obvious
representation would be frequency components, e.g. via Fast Fourier
Transform (FFT) or Discrete Wavelet Transform (DWT). The optimal feature set would be one that is easily recognizable/unique for a given note
and also one that has minimal variation from piano to piano.
13.5
NN
The NN could handle all note detection simultaneously or be split up into
88 different networks, each handling a specific note. It is to be determined
which type of neuron is optimal, so the number of neurons
can be minimized without compromising detection effectiveness.
13.6
Decision/Discrimination
As some piano notes have a somewhat strong correlation, especially the octaves of a given note, it is very likely that a “false hit” will be registered
from time to time. A discrimination algorithm should be able to remove some
errors. Some errors can be detected and rectified by simple rules. If the notes
C1 (see footnote 20), C3, E3, G3 and G5 were detected, G5 would most likely be a false
reading, as the first four notes would require two hands to play. The key
would be to find the best balance between false hits and no detection of notes
actually played.
13.7
Presentation
Some kind of representation is needed. The notes could simply be written
directly to a file, a score or be presented on the PC monitor. It is not to be
a focal point, but should be effective as a diagnostics tool during the design
phase.
20
The representation of notes is called the scientific pitch notation. A4 is the note with
fundamental frequency 440 Hz.
14
General Thoughts
The purpose of this worksheet is to brainstorm on the different possibilities
for implementation and further development.
14.1
Optimizing the Networks
The ideal way to make an optimal neural network would be to
make an input neuron for each point of data in the decomposition and train
the network. It would take a massive amount of RAM to accommodate this, and
lots of time. To optimize execution speed of the networks at run-time, it
could be determined which of the input neurons were associated with the
weights holding the biggest absolute values. An example would be to keep
the 100 most sensitive inputs and then retrain.
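A minimal C sketch of this pruning idea is given below. It assumes that the first layer weights are stored row by row in one array, and it measures the sensitivity of an input as the sum of the absolute weights leaving it. Both the storage layout and this measure are assumptions, since the text only speaks of the weights with the biggest absolute value; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Sensitivity of input i: sum of |w| over all neurons it feeds.
 * w is row-major, w[j * n_inputs + i] is the weight from input i to
 * neuron j (assumed layout). */
static double input_importance(const double *w, size_t n_neurons,
                               size_t n_inputs, size_t i)
{
    double sum = 0.0;
    for (size_t j = 0; j < n_neurons; j++) {
        double v = w[j * n_inputs + i];
        sum += (v < 0.0) ? -v : v;
    }
    return sum;
}

/* Write the indices of the k most sensitive inputs to keep[], most
 * sensitive first. Returns the number of indices written. */
size_t select_inputs(const double *w, size_t n_neurons, size_t n_inputs,
                     size_t k, size_t *keep)
{
    assert(w != NULL && keep != NULL);
    size_t kept = 0;
    if (k > n_inputs)
        k = n_inputs;
    while (kept < k) {
        size_t best = n_inputs;
        double best_val = -1.0;
        for (size_t i = 0; i < n_inputs; i++) {
            int taken = 0;
            for (size_t t = 0; t < kept; t++)
                if (keep[t] == i)
                    taken = 1;
            if (taken)
                continue;
            double val = input_importance(w, n_neurons, n_inputs, i);
            if (val > best_val) {
                best_val = val;
                best = i;
            }
        }
        keep[kept++] = best;
    }
    return kept;
}
```

The selection is a plain repeated maximum search, which is slow for large layers but only has to run once between training passes. After selection, a smaller network with only the kept inputs would be retrained, as suggested above.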
14.2
Optimizing the Wavelet Decomposition
At the same time as network optimization is carried out, it would be interesting to try different wavelet decompositions, to determine if some types
were more effective than others. It would also be very interesting to see
if some wavelets were more appropriate at a given interval. Perhaps it is a
good idea to detect higher pitched notes using a shorter decomposition?
14.3
Optimization of the Decision Algorithm
If it was possible to make some kind of “music style detection”, the knowledge
could form the basis for a probabilistic decision. Jazz would most likely feature
some signature chord modulation that is not seen in classical music and vice
versa.
Part II
Design
15
Architectural Considerations for the Neural Network
This worksheet contains an overview of possible methods for structuring the
network of neurons and types of neurons. The main sources for this worksheet
are ”Neural Networks, a Comprehensive Foundation” by Simon Haykin, chapter 1 21 and the article ”An Introduction to Computing with Neural Nets” by
Richard P. Lippmann22.
15.1
Adjustable Network Elements
A network is a construction of neurons and links between them, as shown
in figure 9, page 32. It can be constructed in various ways. The
structure of neurons and links in the network can be adjusted to achieve the
best results for the network.
The network consists of one input layer with a number of neurons, one
output layer with a number of neurons and possibly a number of hidden
layers not necessarily including the same number of neurons per layer. There
are three parameters to adjust:
• Number of input neurons (source nodes)
• Number of hidden layers and neurons (computation nodes) in these
• Number of output neurons (computation nodes)
The number of hidden layers and the number of neurons in all layers, as well as the type
of neurons, must be decided in order to design the network, but before this is
done, some considerations regarding the structure must be made.
21
Source info: ”Neural Networks, a Comprehensive Foundation”, second edition, 1999,
Simon Haykin, Prentice Hall, isbn: 0-13-273350-1, chapter 1
22
Source info: An Introduction to Computing with Neural Nets, IEEE ASSP magazine,
April 1987, Richard P. Lippmann
15.2
Classes of Networks
The simplest network is a single layer network, with an input layer and
an output layer. Depending on which kind of data the network should handle,
different kinds of network types are available. Figure 7 shows a tree diagram
of some different types of networks23. For more information on the specific
classes and their algorithms, Richard P. Lippmann gives a more thorough
description in his article on neural nets22.
The first thing to determine is whether the input signal is binary or continuous. This clarifies which kinds of algorithms are most suitable for solving
the problem.
The second thing to determine is whether or not there are data to train the
system. If data are available, it is possible to apply supervised learning. If
there is no training data, the system must be trained unsupervised. This
is done by initializing the system with a very simple structure, and then
gradually optimizing it by feeding the output data into the system to adjust
the structure.
Figure 7: A taxonomy of six neural nets that can be used as classifiers. Classical
algorithms which are most similar to the neural net models are listed along the
bottom. The figure and caption text are from Richard P. Lippmann's article ”An
Introduction to Computing with Neural Nets”, p. 6, figure 3 23
23
Richard P. Lippmann,An Introduction to Computing with Neural Nets, IEEE ASSP
magazine, April 1987
15.3
Layers
Figure 8: Neural network structure for a fully connected feed forward single-layer
network, consisting of an input layer and an output layer24 .
Figure 9: Neural network structure for a fully connected feed forward multi layer
network, consisting of an input layer, one hidden layer and an output layer25 .
There are two significant categories of layered networks:
• Single-layer networks, as shown in figure 8
• Multi layer networks, as shown in figure 9
Both net structures can be used for binary as well as continuous inputs22.
In multi-layer networks, the hidden layers are included to enable the
possibility to extract higher-order statistics21. The multilayer network, with
hidden layers, is beneficial when the size of the input layer is large21.
24
Source:http://commons.wikimedia.org/wiki/Image:
SingleLayerNeuralNetwork_english.png
25
Source: http://commons.wikimedia.org/wiki/Image:MultiLayerNeuralNetwork_
english.png
In both figures 8 and 9 the networks are structured as feed forward networks. It is possible to use feedback in networks, but this will not be examined further in this project. For more information, see the studies of e.g.
Matija Marolt26.
15.4
Types of Neurons
According to Haykin, a neuron is defined as: An information-processing unit
that is fundamental to the operation of a neural network 21.
Figure 10: Nonlinear model of a neuron. Source: Haykin, p. 11, fig. 1.5
Figure 10 shows a nonlinear model of a neuron. In the figure, three
elements are shown21:
• A set of connecting links
• An adder, including a possible bias
• An activation function
Together these three elements form the neuron. The connecting links each
have their own weight for their respective input signals. After the input
26
”Connectionist Approach to Automatic Transcription of Polyphonic Piano Music”, IEEE Transactions on Multimedia, vol. 6, no. 3, June 2004, Matija Marolt
signals are weighted, they are added, and perhaps biased21 :
$$ u_k = \sum_{j=1}^{m} w_{jk} x_j \qquad (5) $$
$$ v_k = u_k + b_k \qquad (6) $$
The bias will be described further below. The next element is the activation
function:
$$ y_k = \varphi(v_k) = \varphi(u_k + b_k) \qquad (8) $$
In the activation function the output signal is normalized. According to
Haykin21, the typical normalized range is [0,1] or [-1,1].
There are three main categories of activation functions:
• Threshold function, also known as Heaviside function
• Piece-wise linear function
• Sigmoid function
The three function types are shown in figure 11.
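The neuron model of equations 5, 6 and 8 can be sketched in C as below. The function names are illustrative. The piecewise-linear function is written as a ramp clamped to [0,1], which is an assumption about its exact breakpoints, and soft_sigmoid uses v / (1 + |v|) as a stand-in for a sigmoid with range (-1, +1) so that the sketch needs no math library; the logistic function or tanh would be the more common choices.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*activation_fn)(double);

/* Threshold (Heaviside) function: output is 0 or 1. */
double threshold_fn(double v)
{
    return (v >= 0.0) ? 1.0 : 0.0;
}

/* Piecewise-linear function: a ramp clamped to [0, 1] (assumed form). */
double piecewise_linear_fn(double v)
{
    if (v >= 0.5)
        return 1.0;
    if (v <= -0.5)
        return 0.0;
    return v + 0.5;
}

/* Smooth S-shaped function with range (-1, 1). This is a stand-in for
 * a sigmoid, not the logistic function itself. */
double soft_sigmoid(double v)
{
    return v / (1.0 + ((v < 0.0) ? -v : v));
}

/* One neuron: weighted sum of m inputs (eq. 5), bias added (eq. 6),
 * then the activation function applied (eq. 8). */
double neuron_output(const double *w, const double *x, size_t m,
                     double bias, activation_fn phi)
{
    assert(w != NULL && x != NULL && phi != NULL);
    double u = 0.0;
    for (size_t j = 0; j < m; j++)
        u += w[j] * x[j];
    return phi(u + bias);
}
```

Passing the activation as a function pointer makes it easy to try the three types on the same weights, for example neuron_output(w, x, m, b, soft_sigmoid).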
Figure 11: (a) Threshold function, (b) Piecewise-linear function, (c) Sigmoid function for varying slope parameter a. Source:Haykin, p. 13, fig. 1.8
16
Overall Architecture
It has been decided that the core functionality, namely the ability to detect
certain piano notes, is to be implemented as an NN. The network basically
consists of 88 smaller, parallel NNs, each governing the detection of a single
piano note. Feature extraction for the NNs is to be done via discrete wavelet
decomposition. Prior to implementation, it is to be decided which given
wavelet expresses the most aggressive response, in terms of energy, from a
given note. This is done to reduce the data to the NNs. As these two building
blocks are considered essential to the project, these are the only focal points
from now on. Figure 12 is a graphical representation of the architecture.
Figure 12: The implemented principle of note detection. The wavefile is decomposed into 12 wavelets (the last 1-bit output is a residual) of dyadically declining
lengths. The NN that detects a given pitch is fed with the decomposition that
produces the most power if the note is played. A detection threshold value is set
to determine hit/no hit.
16.1
Wavelet Decomposition
Not written yet...
16.2
Neural Network Structure
Structuring a neural network is not an exact science, hence choices must be
made by qualified guessing.
Since the project group has not previously worked with neural networks,
a meeting with Uwe Hartmann was arranged27. Below is listed some of the
advice he gave on the choice of neuron type:
• Go for the sigmoid function. It is simple and commonly used.
• Choose the soft curve; the specific function is less important.
• Range between -1 and +1, not 0 and 1
Based on literature studies and the meeting with Uwe Hartmann, the following choices have been made:
Something is to be written here about the choice of neuron type, number of layers and number of neurons
per layer... - once it has been written.
16.2.1
Feature Set versus Pitch
Not sure whether this section should be included, but somewhere it should
be mentioned how we have decided which of the wavelet taps are used for
each individual NN.
16.2.2
Training and Test
As training (and test) set for a given NN, a wavefile and detection vector
need to be constructed. The wavefile should be a sequence of different mono- and polyphonic notes. The detection vector is simply the input for the back
propagation in the NN. The training notes are acoustically recorded piano
notes, played on an arbitrarily chosen piano. The test notes are a second
recording of the same piano. As the Musician Transcription Tool needs to
be able to handle 10 different simultaneous notes, it is suggested to make
a composition algorithm like this:
1. Decide whether target note is to be included in composition or not
(P=0.5)
2. Decide randomly the number of notes to be composed (between 0 and
10)
3. If target note is included and more than 0 notes are to be composed,
write 1 to the detection vector
4. Decide randomly what other notes to be played
5. Add all the needed notes and normalize
6. If a longer sequence is needed, continue from step 1.
27
Uwe Hartmann is a Senior Professor at Aalborg University, and has conducted research
and held courses in Neural Networks.
38
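The composition steps above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the implementation used in the project: the function name compose_block, the flat note-major sample layout, the use of rand() and the choice of writing 0 when the target note is absent are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_NOTES 88   /* keys on the piano */
#define MAX_POLY  10   /* at most 10 simultaneous notes */

/* Uniform-ish integer in [0, n) using rand(); adequate for a sketch. */
static int rand_below(int n) {
    assert(n > 0);
    return rand() % n;
}

/*
 * Compose one block for the network of note `target`.
 * notes holds NUM_NOTES isolated notes of block_len samples each,
 * note k starting at notes[k * block_len] (hypothetical layout).
 * Writes the normalized mix to out and returns the detection value:
 * 1 if the target note sounds in the block, otherwise 0.
 */
int compose_block(const float *notes, int block_len, int target, float *out) {
    int used[NUM_NOTES] = {0};
    int include_target, count, hit, chosen = 0, i, k;
    float peak = 0.0f;

    assert(block_len > 0 && target >= 0 && target < NUM_NOTES);

    include_target = rand_below(2);        /* step 1: P = 0.5 */
    count = rand_below(MAX_POLY + 1);      /* step 2: 0..10 notes */

    /* step 3: the target only counts if at least one note is composed */
    hit = (include_target && count > 0);
    if (hit) { used[target] = 1; chosen = 1; }

    /* step 4: fill up with distinct notes other than the target */
    while (chosen < count) {
        k = rand_below(NUM_NOTES);
        if (k != target && !used[k]) { used[k] = 1; chosen++; }
    }

    /* step 5: add the chosen notes and normalize to a peak of 1 */
    for (i = 0; i < block_len; i++) {
        float sum = 0.0f;
        for (k = 0; k < NUM_NOTES; k++)
            if (used[k]) sum += notes[k * block_len + i];
        out[i] = sum;
        if (sum > peak) peak = sum;
        if (-sum > peak) peak = -sum;
    }
    if (peak > 0.0f)
        for (i = 0; i < block_len; i++) out[i] /= peak;

    return hit;
}
```

Step 6 corresponds to calling compose_block once per block and appending the returned value to the detection vector.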
Part III
Implementation
17
Software design
This worksheet aims to describe the structure of the transcription program.
The training, test and run functionality will be implemented in the same
program, thereby reducing the amount of written code, as most of the actual
actions are identical for all three cases. A simplified diagram of the program
can be seen in figure 13. As a design tool, a user guide has been written.
Figure 13: A simplified chart of the principle of the transcription program
17.1
Userguide
The program, which is called transcribe, can be run in three ways:
train For training the system
test For testing the system
run To get transcription results from the program
The run type and the files to use are chosen through input arguments to the
program. There is no limit to the number of arguments, but only one
run type is allowed.
To use the program for training a network, the first argument must be
train and the remaining arguments must be in groups of three:
1. The wave file to be analysed
2. The correct results for training (a file containing ones and zeros matching
the wave file – each one or zero must be the result for a block of 4096
samples28)
3. The wanted name of the data file (combination of wavelet data and correct
results for training) and net file (where the neural network is saved)
Example: transcribe train A4.wav A4_hit Net/NN_A4 C8.wav C8_hit Net/NN_C8
This will train the networks for the .wav files by comparing the calculated
wavelets with the defined results, and the resulting networks will be saved in
the folder Net with the names NN_A4.net and NN_C8.net.
The syntax for testing the network is identical, but the first argument
must then be test and the argument groups now mean:
1. The wave file to be analysed
2. The correct results for testing (a file in the same format as for training)
3. The name of the net file (where the neural network was saved)
Example: transcribe test A4tst.wav A4tst_hit Net/NN_A4 C8tst.wav C8tst_hit Net/NN_C8
This will test the two networks that were trained before, and return the
mean square error of each.
28
4096 samples for decomposition is selected from a wish to have a sufficiently fine time
resolution and a sufficiently high frequency resolution and, due to the wavelet transform method,
it must be a power of 2.
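The consequences of the 4096-sample block length can be made concrete with a small calculation. The sketch below is illustrative only; the helper names are hypothetical and not part of transcribe. It checks the power-of-2 requirement, the duration of one block and the number of whole blocks (and thus ones and zeros in the results file) for a given recording.

```c
#include <assert.h>

/* A block length usable by the transform must be a power of 2. */
int is_power_of_two(unsigned long n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Duration of one block in microseconds, rounded down. */
unsigned long block_duration_us(unsigned long block_len, unsigned long sample_rate) {
    assert(sample_rate > 0);
    return (unsigned long)((unsigned long long)block_len * 1000000ULL / sample_rate);
}

/* Number of whole blocks in a recording; a trailing partial block is dropped. */
unsigned long full_blocks(unsigned long num_samples, unsigned long block_len) {
    assert(block_len > 0);
    return num_samples / block_len;
}
```

At 44.1 kHz a block of 4096 samples lasts about 93 ms, so this is the time resolution of the hit/no hit decisions.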
Finally, to run the transcription, use the argument run followed by all the
.wav files to analyse. The results will be written to text files with a fixed
name for each neural network.
Example: transcribe run Musik.wav Muzak.wav
This will output two files (one per input file) for each of the 88 neural networks, containing the
float output values of the networks, in the format “[filename]_[netname].test”,
e.g. Musik.wav_A4.test will contain the results from the network trained for
recognizing A4, after processing the file Musik.wav.
17.2
Implementation
The software has been implemented in C, and the decision to combine
the functionality into one program has prevented a lot of duplicated work. In
particular, the main parts of training and test only differ by a few lines of
code.
The overall implementation is working, but the output of the neural networks seems to be wrong. Reading a wave file and decomposing it into
wavelets is working and has been successfully compared with MATLAB results.
The neural networks can be trained and tested with a test file, but when data is input for classification, the output values are not spread over
the range of -1 to 1 as well as they ought to be. In fact, a great many different
inputs can generate the exact same output, and from a wave file with 7000
decompositions, only 123 different output values were detected.
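One way to quantify this symptom is to count the distinct values in the network output. The following is a minimal sketch of such a check; count_distinct is a hypothetical helper and not part of the program or of the neural network library. Exact float comparison is used on purpose, since the observation concerns exactly identical outputs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison of two floats for qsort. */
static int cmp_float(const void *a, const void *b) {
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Returns the number of distinct values among n outputs (0 if n is 0 or on allocation failure). */
size_t count_distinct(const float *values, size_t n) {
    float *sorted;
    size_t i, distinct;

    assert(values != NULL || n == 0);
    if (n == 0) return 0;

    /* sort a copy so that equal values become neighbours */
    sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) return 0;
    memcpy(sorted, values, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_float);

    distinct = 1;
    for (i = 1; i < n; i++)
        if (sorted[i] != sorted[i - 1]) distinct++;

    free(sorted);
    return distinct;
}
```

Applied to the output values of one network for one wave file, a count far below the number of decompositions indicates the behaviour described above.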
Since the neural networks are implemented via an external library,
which we have not developed, a quick solution was to skip further development
of the program and use corresponding algorithms in MATLAB. In a real-time
implementation, as was originally wanted, the C implementation can be used
as a good basis.
18
Real time considerations
An initially desired feature of the system was the possibility to transcribe
the music in real time on “regular hardware”. Although interaction with the
sound card has not been implemented, the analysis of wave files indicates that
it is indeed possible to run the transcription in real time.
The wave files used for training were mono, 44.1 kHz, 16 bit, with a length
of 10 minutes and 50 seconds. In this case the program only runs a single
network, but running the trained networks takes almost no time compared
to the wavelet transform and file writing, so this is not assumed to be
a significant contribution. On a 3 year old laptop with a 1.5 GHz single-core
processor and 512 MB DDR RAM, running Ubuntu Linux, the performance was
measured by processing a single note while using the programs top (displaying
resource usage) and time (measuring “active” time for a process). The results
were:
• CPU usage: about 90% (88.5-92.9)
• Memory usage: max 114 MB (the entire wave file, which is 55 MB, is loaded
and parts of it buffered)
• Time usage: 59 seconds (real time) – 54 seconds of actual processing
Since the processing time is less than a 10th of the playing time on this
hardware, a real time implementation should be achievable, even without
optimization of algorithms.
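The figures above can be cross-checked with simple arithmetic. The sketch below is an illustration only; the function names are hypothetical and not part of the transcription program. With 650 seconds of 16-bit mono audio at 44100 Hz, pcm_bytes gives 57,330,000 bytes (about 55 MB), and 59 seconds of processing for 650 seconds of audio is about 9% of the playing time.

```c
#include <assert.h>

/* Bytes needed to hold `seconds` of uncompressed PCM audio. */
long pcm_bytes(long seconds, long sample_rate, int channels, int bytes_per_sample) {
    assert(seconds >= 0 && sample_rate > 0 && channels > 0 && bytes_per_sample > 0);
    return seconds * sample_rate * channels * bytes_per_sample;
}

/* Processing time as a percentage of the playing time, rounded down. */
long realtime_percent(long processing_seconds, long playing_seconds) {
    assert(processing_seconds >= 0 && playing_seconds > 0);
    return 100 * processing_seconds / playing_seconds;
}
```

A value below 100 from realtime_percent means that processing is faster than playback on the measured hardware.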
19
FANN – Fast Artificial Neural Network library
This is the documentation for the FANN library, written as a quick guide to the parts relevant for using it in the project. Because the output showed
strange results (many identical numbers as output; only a handful of different outputs in the range -1 to 1 for several thousand different inputs), the
C code based on this was not completely finished.
The library can be found at www.leenissen.dk/fann/.
At www.leenissen.dk/fann/html/, a reference manual exists for all functions.
The files mentioned below can be found as a zip archive at http://kom.aau.dk/group/08gr742/fann.zip
Contents:
---------
1   Contents of the folder
    1.1 How to read
2   Compiling FANN
    2.1 Linux
    2.2 Windows
3   Using FANN
    3.1 Training an ANN
    3.2 Using an ANN
    3.3 Testing an ANN
4   Compiling code
***************************************************************
* 1 * Contents of the folder                                  *
***************************************************************
fann-2.0.0.zip              C and Python library source code
                            for all platforms.
                            Support for:
                            GNU Makefile, Visual Studio 6/.Net,
                            Borland C++ Builder and other
                            standard compilers.
fann_doc_complete_1.0.pdf   Complete documentation of V1.0
Test/                       Folder for test code showing
                            how FANN is used
Test/*.data                 Data files for input to FANN
Test/make_testdata.m        Generation of data files for the example
                            (from pattern recognition files)
Test/Makefile               Makefile that enables automatic
                            building and linking
Test/test_train.c           Program that trains the ANN from
                            the train[1-4].data files
Test/test_test.c            Program that tests the ANN from
                            the test[1-4].data files
Test/test_run.c             Program that classifies from
                            input - NOT FINISHED
Test/*.net                  Trained networks
1.1 How to read
- When the name $FANNDIR is used in this file, it refers to
the folder in which fann-2.0.0.zip has been unpacked.
- Code examples are indented with 1 tab
***************************************************************
* 2 * Compiling FANN                                          *
***************************************************************
Unpack fann-2.0.0.zip and compile it for the platform on which
it is to be used
2.1 Linux
Go into the folder and run the following:
	./configure
	make
	sudo make install
	sudo ldconfig
(ldconfig ensures that the library is also found when the
program is run)
2.2 Windows
The folders "MicrosoftVisualC++6.0" and "MicrosoftVisualC++.Net"
contain project files for the respective Visual C++ versions
***************************************************************
* 3 * Using FANN                                              *
***************************************************************
FANN includes functions that are used directly to train,
save and use a network. When an ANN is no longer needed,
it must be destroyed again so that it does not take up memory.
3.1 Training an ANN
Before training can be carried out, the training data must be
available in the correct form in a file - see 3.1.1
Training then takes place in 3 steps (see 3.1.2):
- Define parameters
- Train the network on the file
- Save and destroy the network
3.1.1 Training data
Training data must be saved in a file in the following format
	[number of training patterns] [inputs per pattern] [outputs per pattern]
	[input 1,1] [input 1,2] [...]
	[output 1,1] [output 1,2] [...]
	[input 2,1] [input 2,2] [...]
	[output 2,1] [output 2,2] [...]
	[...]
E.g. the training data for training an xor function looks like
this (4 sets, 2 inputs, 1 output)
	4 2 1
	-1 -1
	-1
	-1 1
	1
	1 -1
	1
	1 1
	-1
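A file in the layout described above can be produced with standard C file output. The following is a minimal sketch; write_training_file is a hypothetical helper and not a function of the FANN library, and the row-major layout of the input and output arrays is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes num_patterns patterns to `path`: a header line followed by one
 * line of inputs and one line of outputs per pattern.
 * inputs holds num_patterns * num_inputs values, outputs holds
 * num_patterns * num_outputs values, both pattern by pattern.
 * Returns 0 on success and -1 on failure.
 */
int write_training_file(const char *path, const float *inputs, const float *outputs,
                        unsigned num_patterns, unsigned num_inputs, unsigned num_outputs) {
    unsigned p, k;
    FILE *fp;

    assert(path != NULL && inputs != NULL && outputs != NULL);
    fp = fopen(path, "w");
    if (fp == NULL) return -1;

    /* header: number of patterns, inputs per pattern, outputs per pattern */
    fprintf(fp, "%u %u %u\n", num_patterns, num_inputs, num_outputs);

    for (p = 0; p < num_patterns; p++) {
        for (k = 0; k < num_inputs; k++)
            fprintf(fp, "%f ", inputs[p * num_inputs + k]);
        fprintf(fp, "\n");
        for (k = 0; k < num_outputs; k++)
            fprintf(fp, "%f ", outputs[p * num_outputs + k]);
        fprintf(fp, "\n");
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

For the xor example, the inputs array would hold the eight values -1 -1 -1 1 1 -1 1 1 and the outputs array the four values -1 1 1 -1.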
3.1.2 Example of use:
An ANN is defined as a struct - the numbers here are from an example
	const unsigned int num_input = 2;
	const unsigned int num_output = 1;
	const unsigned int num_layers = 3;
	const unsigned int num_neurons_hidden = 3;
	const float desired_error = (const float) 0.001;
	const unsigned int max_epochs = 500000;
	const unsigned int epochs_between_reports = 1000;
	struct fann *ann = fann_create_standard(num_layers,
		num_input, num_neurons_hidden, num_output);
The activation functions of the neurons are then defined:
	fann_set_activation_function_hidden(ann,
		FANN_SIGMOID_SYMMETRIC);
	fann_set_activation_function_output(ann,
		FANN_SIGMOID_SYMMETRIC);
Other options for activation functions are: FANN_LINEAR,
FANN_LINEAR_PIECE, FANN_LINEAR_PIECE_SYMMETRIC, FANN_SIGMOID,
FANN_SIGMOID_SYMMETRIC, FANN_SIGMOID_SYMMETRIC_STEPWISE,
FANN_SIGMOID_STEPWISE, FANN_THRESHOLD,
FANN_THRESHOLD_SYMMETRIC, FANN_GAUSSIAN,
FANN_GAUSSIAN_SYMMETRIC, FANN_ELLIOT, FANN_ELLIOT_SYMMETRIC
(see descriptions in $FANNDIR/src/include/fann_data.h)
fann_set_activation_function_hidden sets the
activation function for all hidden neurons.
Instead, an activation function can be set for a single neuron
with the function
	fann_set_activation_function(ann, FUNCTION, layer, neuron)
or for a whole layer with
	fann_set_activation_function_layer(ann, FUNCTION, layer)
A file is loaded and the network is trained on it:
	fann_train_on_file(ann, "xor.data", max_epochs,
		epochs_between_reports, desired_error);
By default the training algorithm FANN_TRAIN_RPROP is used, but
this can be changed by calling
	fann_set_training_algorithm(ann, ALGORITHM)
where ALGORITHM is one of the following: FANN_TRAIN_INCREMENTAL,
FANN_TRAIN_BATCH, FANN_TRAIN_RPROP, FANN_TRAIN_QUICKPROP
(see descriptions in $FANNDIR/src/include/fann_data.h)
The network is saved:
	fann_save(ann, "xor_float.net");
The network is destroyed to free the memory:
	fann_destroy(ann);
3.2 Using an ANN
When an ANN is to be used, it must first either be trained as above
or loaded from a file containing a network trained in this way.
After that, the input is defined and the output is calculated.
Variables are created for inputs and outputs:
	fann_type *calc_out;
	fann_type input[2];
fann_type is the type of the weights, and is float, double or
int depending on whether fann.h/floatfann.h, doublefann.h or
fixedfann.h is included - in our case it will probably be float...
Create the network from the saved file:
	struct fann *ann = fann_create_from_file("xor_float.net");
Define inputs:
	input[0] = -1;
	input[1] = 1;
Calculate and print the output:
	calc_out = fann_run(ann, input);
	printf("xor test (%f,%f) -> %f\n", input[0], input[1],
		calc_out[0]);
The network is destroyed to free the memory:
	fann_destroy(ann);
3.3 Testing an ANN
An ANN can be tested by giving a single input with the function
	fann_test(ann, input, desired_output);
A whole data set can also be tested with the function
	fann_test_data(ann, data);
Both update the MSE of the network, which can be read with
	fann_get_MSE(ann)
There is no function for testing from a file, but
based on how fann_train_on_file is implemented
(in $FANNDIR/src/fann_train_data.c), it should be possible
to do it like this:
	struct fann_train_data *data =
		fann_read_train_from_file("filename.data");
	fann_test_data(ann, data);
	fann_destroy_train(data);	/* free the loaded data set */
***************************************************************
* 4 * Compiling code                                          *
***************************************************************
When code is to be compiled, it is important that the right
libraries are loaded.
A Makefile has been created for this purpose, which works on
Linux:
	make            compiles all files
	make all        compiles all files
	make clean      cleans up
	make filename   compiles filename.c
	make runtest    compiles and runs test_train and test_test
20
Port Audio
PortAudio is a cross-platform audio API, and was examined as an option for
a real-time implementation of the system. Time did not allow a real-time
implementation, but an unfinished worksheet on how to use the API is seen
below.
PortAudio can be found at www.portaudio.com/
The files mentioned below can be found as a zip archive at http://kom.aau.dk/group/08gr742/port_audio.zip
Contents:
---------
1   Contents of the folder
2   Compiling PortAudio
    2.1 Linux
    2.2 Windows
3   Compiling examples
    3.1 Linux
    3.2 Windows
4   Structure of programs
    4.1 Writing a callback function
    4.2 Initializing PortAudio
    4.3 Opening a stream
    4.4 Starting, stopping and aborting a stream
    4.5 Closing a stream and terminating PortAudio
    4.6 Miscellaneous functions
    4.7 Searching for devices
    4.8 Blocking I/O functions (alternative to callback)
***************************************************************
* 1 * Contents of the folder                                  *
***************************************************************
pa_stable_v19_20071207.tar.gz   the PortAudio distribution itself;
                                must be compiled for the system in use (Windows/Linux)
Eksempler                       folder with examples and guides
                                for compiling them
***************************************************************
* 2 * Compiling PortAudio                                     *
***************************************************************
2.1 Linux
Guide:
www.portaudio.com/trac/wiki/TutorialDir/Compile/Linux
1) Unpack pa_stable_v19_20071207.tar.gz
2) Open the folder in a terminal
3) Type ./configure
4) Type make
If all goes well, it will now be built
2.2 Windows
Guide (Visual Studio):
www.portaudio.com/trac/wiki/TutorialDir/Compile/Windows
Guide (free tools):
www.portaudio.com/trac/wiki/TutorialDir/Compile/WindowsMinGW
1) Unpack pa_stable_v19_20071207.tar.gz (if you cannot open
tar files, get e.g. http://peazip.sourceforge.net/)
2) Follow the guide
hopefully all goes well...
***************************************************************
* 3 * Compiling examples                                      *
***************************************************************
3.1 Linux
Generating a sawtooth (example from the distribution - sound out):
* Main file: patest_saw.c
* portaudio.h and libportaudio.a must be in the same folder
*
* Command:
* gcc -lasound -ljack -lpthread -o patest_saw.bin
patest_saw.c libportaudio.a
*
* Explanation:
* -lasound : link with the asound library (ALSA sound)
* -ljack : link with the jack lib (JACK sound)
* -lpthread : link with the pthread lib (POSIX threads - threaded
programming, not strictly necessary for this file...)
* -o <filename> : the executable file is saved as <filename>
* patest_saw.c : this file
* libportaudio.a : library that allows the executable file to
be used without PortAudio having to be installed
Record a sound and play it back
(from the distribution: sound in --> save in file --> sound out)
* Main file: patest_read_record.c
* portaudio.h and libportaudio.a must be in the same folder
*
* Command:
* gcc -lasound -ljack -lpthread -o patest_read_record.bin
patest_read_record.c libportaudio.a
*
* Explanation - see above
3.2 Windows (from the guide - not tested)
In any project in which you require PortAudio, you can just
link with portaudio_x86.lib (or _x64) and of course include
the relevant headers (portaudio.h, and/or pa_asio.h,
pa_x86_plain_converters.h). Your new exe should now use
portaudio_xXX.dll.
3.2.1 MinGW
3.2.2 Visual Studio etc.
***************************************************************
* 4 * Structure of programs                                   *
***************************************************************
This is based on the official tutorial at
http://www.portaudio.com/trac/wiki/TutorialDir/TutorialStart
It is also recommended to read portaudio.h, which contains
information about all functions.
General structure of a PortAudio program:
* Write a "callback" function that PortAudio calls when
audio is to be processed - this must not be too
computationally heavy!
* Initialize the PA library and open a stream for audio I/O.
* Start the stream. The callback function is now called
repeatedly by PortAudio in the background.
* In the callback, audio data can be read from inputBuffer
and/or written to outputBuffer.
* Stop the stream by returning 1 from the callback, or by
calling a stop function.
* Close the stream and terminate the PA library.
4.1 Writing a callback function
4.2 Initializing PortAudio
4.3 Opening a stream
4.4 Starting, stopping and aborting a stream
4.5 Closing a stream and terminating PortAudio
4.6 Miscellaneous functions
4.7 Searching for devices
4.8 Blocking I/O functions (alternative to callback)
Part IV
C Source Code
This is the collection of source code for the off-line transcription system.
Except for the actual execution of the neural networks, which gives “funny”
results, everything is working as expected.
21
Makefile
The Makefile is used for building and linking the program with GNU make.
# The make file requires that the fann library is installed

GCC=gcc

SOURCES.c= main.c wavelet.c waveread.c ann_train.c ann_test.c ann_run.c fileio.c
INCLUDES=
CFLAGS= -O3 -lm -lfann
SLIBS=
PROGRAM= transcribe

OBJECTS= $(SOURCES.c:.c=.o)

.KEEP_STATE:

debug := CFLAGS= -g

all debug: $(PROGRAM)

$(PROGRAM): $(INCLUDES) $(OBJECTS)
	$(LINK.c) -o $@ $(OBJECTS) $(SLIBS)

clean:
	rm -f $(PROGRAM) $(OBJECTS)
22
main.c
The main part ties the entire program together and handles the input arguments,
which determine whether the program is used for training, testing or running,
as well as which files to operate on.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "wavelet.h"
#include "waveread.h"
#include "ann.h"
#include "fileio.h"

int main(int argc, char **argv) {
    char *filename, *netname, *datname, *checkname, *action, run[]="run", train[]="train", test[]="test";
    wavheader whd;
    int argnum, i, j, k, *buffer, bufsize, siglen, half, sigstart, shift;
    float *wavelet, *indata, calc_out;
    FILE *ifp, *ofp, *ftest;

    if (argc==1) {
        printf("This program needs arguments to work...\n"
               "The first argument must be one of:\n"
               "run\tto run the program\n"
               "test\tto test the program\n"
               "train\tto train the program\n");
        exit(EXIT_FAILURE);
    }
    action=argv[1];
    if (strcmp(action,run)&&strcmp(action,train)&&strcmp(action,test)) {
        printf("The first argument must be one of:\n"
               "run\tto run the program\n"
               "test\tto test the program\n"
               "train\tto train the program\n\n"
               "running program, since no other action is specified...\n\n");
        action=run;
        i=1;
    } else {
        i=2;
    }
    for (argnum=i; argnum<argc; argnum++) {
        filename=argv[argnum];
        printf("file %d of %d: %s\n", argnum-1, argc-2, filename);
        if ((ifp = fopen(filename, "rb"))==NULL) {
            fprintf(stderr, "Could not open the file %s for reading\n", filename);
        } else { // prevent doing things with nonexisting file

            /******************************************
             * In this part, the wave-file is read... *
             ******************************************/

            // Read the wave header:
            whd=wavread_head(ifp, filename);
            bufsize = whd.datasize / whd.blockalign;

            // Print relevant info:
            printf("%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n",
                   "Filesize", whd.filesize, "Number of channels", whd.num_chan,
                   "Samplerate", whd.samplerate, "Byterate", whd.byterate,
                   "Block alignment", whd.blockalign, "Bits per sample", whd.bits,
                   "Data size", whd.datasize, "Number of samples", bufsize);

            // Read the contents into an int buffer:
            buffer=(int *) malloc(bufsize*sizeof(int));
            // 8 bit wave is unsigned - 16 & 24 bit are signed - left and right
            // shift to get the sign bit correct
            if (whd.bits>8) {
                // number of unused high bits in an int holding one sample
                shift=(int)sizeof(int)*8-whd.bits;
                for (i=0; i<bufsize; i++) {
                    buffer[i]=(int)((unsigned int)getfileints(ifp, whd.blockalign) << shift) >> shift;
                }
            } else {
                for (i=0; i<bufsize; i++) {
                    buffer[i]=getfileints(ifp, whd.blockalign);
                }
            }
            fclose(ifp);
#ifdef DEBUG
            // Write the values in the file tmp for comparing with the original wave in matlab
            intout(buffer, bufsize, "debug_wave");
            // The test was a success: variance of cread./original (with correction for 0/0-instances) was 0
#endif
            printf("wave-file loaded...\n");

            /*********************************************************************
             * Here the check-file is opened and a header is written to the      *
             * test/training data-file, if a network is to be trained or tested  *
             *********************************************************************/
            if ((!strcmp(action,train)) || (!strcmp(action,test))) {
                // next value must be the results for training/testing
                if ((argnum+1)<argc) {
                    checkname=argv[argnum+1];
                    if ((ifp = fopen(checkname, "r"))==NULL) {
                        fprintf(stderr, "Could not open the check-file %s for reading\n", checkname);
                        exit(EXIT_FAILURE); // nothing to train/test against
                    } else {
                        // next value must be the data/net-name
                        if (argnum+2<argc) {
                            netname=(char *) malloc((strlen(argv[argnum+2])+strlen(".net")+1)*sizeof(char));
                            sprintf(netname, "%s.net", argv[argnum+2]);
                            datname=(char *) malloc((strlen(argv[argnum+2])+strlen(".data")+1)*sizeof(char));
                            sprintf(datname, "%s.data", argv[argnum+2]);
                            if ((ofp = fopen(datname, "w"))==NULL) {
                                fprintf(stderr, "Could not open the file %s for writing\n", datname);
                                exit(EXIT_FAILURE);
                            } else { // write train/test-file header #number #input #output
                                fprintf(ofp, "%d %d %d\n", 6999, A4_LEN, 1);
                                fclose(ofp);
                            }
                        } else {
                            fprintf(stderr, "Not enough arguments - net/data name missing\n");
                            exit(EXIT_FAILURE);
                        }
                    }
                } else {
                    // the wave-file is already closed and no check-file was opened
                    fprintf(stderr, "Not enough arguments - check-file missing\n");
                    exit(EXIT_FAILURE);
                }
            }

            /*******************************************************
             * Here the wav-file is "chopped in pieces" and each   *
             * piece transformed using a D4 wavelet decomposition  *
             *******************************************************/

            // Define the signal length
            siglen=4096; // 0.093 s at 44100 Hz - 0.043 s at 96000 Hz
            wavelet=(float *) malloc(siglen*sizeof(float));

            printf("executing wavelet transform:      ");
            i=0;
            while (((i+1)*siglen) <= bufsize) { // Go through the buffer until a whole siglen cannot be used
                sigstart=i*siglen;
                printf("\b\b\b\b\b%5d", i);
                // Copy part of the signal to wavelet array
                for (j=0; j<siglen; j++) {
                    wavelet[j]=(float) buffer[sigstart+j];
                }
                // Perform the wavelet decomposition (implemented in wavelet.c)
                // - the result of the decomposition is in the wavelet array
                wavelet_db4(wavelet, siglen);
                // Write output of each decomposition to files
                filename=(char *) malloc((strlen("_wl1234")+strlen(argv[argnum])+5)*sizeof(char));
                sprintf(filename, "%s_wl%d", argv[argnum], i);
                if ((ftest=fopen(filename, "r"))==NULL) { // only write if the file does not exist...
                    floatout(wavelet, siglen, filename);
                } else {
                    fclose(ftest);
                }
                free(filename);
                // Perform training/test/run-actions for the specific transform
                if (!strcmp(action,run)) {
                    indata=(float *) malloc(A4_LEN*sizeof(float));
                    for (k=A4_ST; k<A4_END; k++) {
                        indata[k-A4_ST]=wavelet[k];
                    }
                    // Define names of the net to load and result file to write
                    filename=(char *) malloc((strlen("A#4.test")+strlen(argv[argnum])+1)*sizeof(char));
                    sprintf(filename, "%s_%s%d.test", argv[argnum], "A", 4);
                    netname=(char *) malloc((strlen("A#4.net")+1)*sizeof(char));
                    sprintf(netname, "%s%d.net", "A", 4);
                    ann_run(netname, indata, filename);
                    free(filename);
                    free(netname);
                    free(indata); // allocated once per block
                } else if ((!strcmp(action,train)) || (!strcmp(action,test))) {
                    // here we must generate a test/training file
                    // A4/440Hz/pitch 69 --> wl-interval: 32-63 (32 values)
                    k=fscanf(ifp, "%d", &j);
                    if (!feof(ifp)) {
                        if ((ofp = fopen(datname, "a"))==NULL) {
                            fprintf(stderr, "\nCould not open the file %s for writing\n", datname);
                        } else { // write train/test-file lines [inputs] \n [output]
                            for (k=A4_ST; k<A4_END; k++) {
                                fprintf(ofp, "%f ", wavelet[k]);
                            }
                            fprintf(ofp, "\n%d\n", j);
                            fclose(ofp);
                        }
                    }
                }
                i++;
            } // end wavelet-loop
            printf("\n\n");

            // Perform training/test/run-actions for the entire wave file
            if ((!strcmp(action,train)) || (!strcmp(action,test))) {
                fclose(ifp); // close check-file
                argnum=argnum+2; // in train and test, 2 extra args are needed for data/netname and checkfile
            }
            if (!strcmp(action,run)) {
            } else if (!strcmp(action,train)) {
                printf("inputs from %d to %d (%d total)\n", A4_ST, A4_END, A4_LEN);
                ann_train(datname, netname, A4_LEN);
                free(netname);
                free(datname);
            } else if (!strcmp(action,test)) {
                ann_test(datname, netname);
                free(netname);
                free(datname);
            }
            free(buffer);
            free(wavelet);
        } // end else (to skip bad arguments/filenames)
    } // for-loop running through args
    exit(EXIT_SUCCESS);
}
23
fileio.h
This file merely holds the prototypes for the file input and output functions.
#ifndef FILEIO_H
#define FILEIO_H

#include <stdio.h> // for FILE

// subfunction for returning an integer of size bytes from the file
int getfileints(FILE *ifp, int size);

void floatout(float *buffer, int bufsize, char *filename);

void intout(int *buffer, int bufsize, char *filename);

#endif
24
fileio.c
This file contains 3 functions: one for reading an integer of different length
from a binary file and two for writing an array of either integers or floats to
a text file.
#include <stdio.h>
#include <stdlib.h>
#include "fileio.h"

/* Returns an integer of "size" bytes from the file */
int getfileints(FILE *ifp, int size) {
    int retval=0;

    if (fread(&retval, size, 1, ifp) != 1) {
        if (feof(ifp)) {
            printf("Premature end of file.");
        } else {
            printf("File read error.");
        }
        exit(EXIT_FAILURE);
    }
    return (retval);
}

/* The following functions are used for outputting data to files for reading
   in matlab or a regular text editor...
*/

void floatout(float *buffer, int bufsize, char *filename) {
    // Write the values in the file [filename] for comparing with the original results in matlab
    int i;
    FILE *ofp;
    if ((ofp = fopen(filename, "w"))==NULL) {
        fprintf(stderr, "Could not open the file %s for writing\n", filename);
    } else {
        for (i=0; i<bufsize; i++) {
            fprintf(ofp, "%f\n", buffer[i]);
        }
        fclose(ofp);
    }
}

void intout(int *buffer, int bufsize, char *filename) {
    // Write the values in the file [filename] for comparing with the original results in matlab
    int i;
    FILE *ofp;
    if ((ofp = fopen(filename, "w"))==NULL) {
        fprintf(stderr, "Could not open the file %s for writing\n", filename);
    } else {

        for (i=0; i<bufsize; i++) {
            fprintf(ofp, "%d\n", buffer[i]);
        }
        fclose(ofp);
    }

}
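The reading function above copies the bytes straight into an int, so it relies on the host storing integers with the least significant byte first, and the caller must correct the sign afterwards. As a hedged alternative, the sketch below assembles a little-endian value byte by byte and sign-extends it in one step. read_le_signed is a hypothetical helper that is not part of the program, and it only applies to the signed 16 and 24 bit formats (8 bit wave data is unsigned).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Reads a little-endian two's complement integer of `size` bytes (1..4)
 * from ifp into *value. Returns 0 on success, -1 on end of file or error.
 */
int read_le_signed(FILE *ifp, int size, long *value) {
    unsigned long raw = 0, half;
    int i, c;

    assert(ifp != NULL && value != NULL && size >= 1 && size <= 4);

    /* assemble the bytes, least significant byte first */
    for (i = 0; i < size; i++) {
        c = fgetc(ifp);
        if (c == EOF) return -1;
        raw |= (unsigned long)c << (8 * i);
    }

    /* half = 2^(bits-1); values at or above it are negative */
    half = 1UL << (8 * size - 1);
    if (raw & half)
        *value = (long)(raw - half) - (long)(half - 1) - 1;
    else
        *value = (long)raw;
    return 0;
}
```

Because the result does not depend on the width of int or on the byte order of the host, no shifting is needed in the caller.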
25
waveread.h
The header for the waveread functions contains a typedef of a struct to hold
relevant information from the header of the wave file, as well as prototypes
for the functions.
/* Header file for waveread.c */
#ifndef WAVEREAD_H
#define WAVEREAD_H

#include <stdio.h> // for FILE

typedef struct wh {
    char *filename;
    int filesize;
    int samplerate;
    int num_chan;
    int byterate;
    int blockalign;
    int bits;
    int datasize;
} wavheader;

// prototypes:

// "main" function - returns a struct containing relevant info from the
// header and quits the program if the file is not a valid wave-file.
wavheader wavread_head(FILE *ifp, char *filename);

// subfunction for checking string parts of the header - exits if they are
// not as expected.
void chkheadstr(FILE *ifp, char *filename, char *header);

#endif
26 waveread.c
The waveread file contains functions to read the header of a wave file and
check that the file follows a supported format.
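
The checks described above can be illustrated on an in-memory buffer. The following is only a minimal sketch, assuming the canonical 44-byte PCM header layout with little-endian fields; the names wav_info, read_le and parse_wav_header are hypothetical and are not part of waveread.c, which reads from a FILE pointer instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for this sketch; it mirrors a subset of wavheader. */
typedef struct {
    uint32_t samplerate;
    uint16_t num_chan;
    uint16_t bits;
    uint32_t datasize;
} wav_info;

/* Read an n-byte (n <= 4) little-endian unsigned integer starting at p. */
static uint32_t read_le(const unsigned char *p, int n) {
    uint32_t v = 0;
    for (int i = n - 1; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Parse a canonical 44-byte PCM header held in memory.
   Returns 0 on success and -1 if any check fails. */
static int parse_wav_header(const unsigned char *h, size_t len, wav_info *out) {
    assert(h != NULL && out != NULL);
    if (len < 44) return -1;
    if (memcmp(h, "RIFF", 4) != 0) return -1;      /* chunk ID */
    if (memcmp(h + 8, "WAVE", 4) != 0) return -1;  /* format */
    if (memcmp(h + 12, "fmt ", 4) != 0) return -1; /* subchunk 1 ID */
    if (read_le(h + 16, 4) != 16) return -1;       /* subchunk 1 size for PCM */
    if (read_le(h + 20, 2) != 1) return -1;        /* audio format 1 = uncompressed */
    out->num_chan = (uint16_t)read_le(h + 22, 2);
    out->samplerate = read_le(h + 24, 4);
    out->bits = (uint16_t)read_le(h + 34, 2);
    if (memcmp(h + 36, "data", 4) != 0) return -1; /* subchunk 2 ID */
    out->datasize = read_le(h + 40, 4);
    return 0;
}
```

Working on a byte buffer keeps the sketch independent of file handling; the order of the checks follows the order in which the listing below reads the header fields.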

#include <stdio.h>
#include <stdlib.h>
#include <string.h> // memcmp, used in chkheadstr
#include "waveread.h"
#include "fileio.h"

/* The main function below is necessary to compile this file standalone */
/*
int main(int argc, char **argv) {
    char *filename;
    int filesize;
    wavheader whd;
    FILE *ifp, *ofp;

    filename = argv[1];
    printf("%d: %s\n", argc-1, filename);
    if ((ifp = fopen(filename, "rb")) == NULL) {
        fprintf(stderr, "Could not open the file %s for reading", filename);
    }

    // Read the header:
    whd = wavread_head(ifp, filename);

    // Print relevant info:
    printf("Filesize: %d\n", whd.filesize);
    printf("Number of channels: %d\n", whd.num_chan);
    printf("Samplerate: %d\n", whd.samplerate);
    printf("Byterate: %d\n", whd.byterate);
    printf("Block alignment: %d\n", whd.blockalign);
    printf("Bits per sample: %d\n", whd.bits);
    printf("Data size: %d\n", whd.datasize);

    // Read the contents into an int buffer:
    buf_size = whd.datasize / whd.blockalign;
    printf("Number of samples: %d\n", buf_size);

    int buffer[buf_size];
    for (i = 0; i < buf_size; i++) {
        buffer[i] = getheadint(ifp, whd.blockalign) << (sizeof(unsigned long)*8 - whd.bits);
    }

    fclose(ifp);
    exit(EXIT_SUCCESS);
}
*/

wavheader wavread_head(FILE *ifp, char *filename) {

    wavheader wavhead;

    // Store the file name so that every field of the returned struct is initialized
    wavhead.filename = filename;

    // Check for RIFF header
    chkheadstr(ifp, filename, "RIFF");

    // Get filesize (rest of file)
    wavhead.filesize = getfileints(ifp, 4);

    // Check for the WAVE and "fmt " headers
    chkheadstr(ifp, filename, "WAVE");
    chkheadstr(ifp, filename, "fmt ");

    // Check for whether the codec is PCM:
    if (getfileints(ifp, 4) != 16) {
        printf("The file %s is not a valid Wave-file (Not PCM codec)\n", filename);
        fclose(ifp);
        exit(EXIT_FAILURE);
    }
    // Check whether the data is uncompressed
    if (getfileints(ifp, 2) != 1) {
        printf("The file %s is compressed, and cannot be used\n", filename);
        fclose(ifp);
        exit(EXIT_FAILURE);
    }

    // get the number of channels
    wavhead.num_chan = getfileints(ifp, 2);

    // get samplerate
    wavhead.samplerate = getfileints(ifp, 4);

    // get byterate
    wavhead.byterate = getfileints(ifp, 4);

    // get Block alignment
    wavhead.blockalign = getfileints(ifp, 2);

    // get Bits per sample
    wavhead.bits = getfileints(ifp, 2);

    // Check SubChunk2 ID ("data")
    chkheadstr(ifp, filename, "data");

    // get Data Size
    wavhead.datasize = getfileints(ifp, 4);

    return (wavhead);
}

/* This function checks the current header for matching the correct string */
void chkheadstr(FILE *ifp, char *filename, char *header) {
    char headchk[4];

    if (fread(headchk, 4, 1, ifp) != 1) {
        if (feof(ifp)) {
            printf("Premature end of file.");
        } else {
            printf("File read error.");
        }
        exit(EXIT_FAILURE);
    }
    if (memcmp(headchk, header, 4)) {
        printf("The file %s is not a valid Wave-file (No \"%s\" header)\n", filename, header);
        fclose(ifp);
        exit(EXIT_FAILURE);
    }
}

27 wavelet.h
This file just contains the prototype for the wavelet decomposition.

#ifndef WAVELET_H
#define WAVELET_H

void waveletdb4(float *wavelet, int siglen);

#endif

28 wavelet.c
The implemented wavelet decomposition, using the lifting scheme on the D4
algorithm.
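
To illustrate why the lifting scheme is convenient, the sketch below runs a single D4 level forward and then backward. It is only a sketch under stated assumptions: the even and odd samples are already split into two separate arrays s and d, the boundaries wrap around periodically, and the function names d4_forward and d4_inverse are hypothetical. Each lifting step changes one array using only the other, so the inverse is obtained by undoing the steps in reverse order.

```c
#include <assert.h>

#define SQRT3 1.7320508075688772
#define SQRT2 1.4142135623730951

/* One forward D4 lifting level.
   s holds the even samples and d the odd samples, each of length half.
   On return s holds the smooth coefficients and d the detail coefficients. */
static void d4_forward(double *s, double *d, int half) {
    int i;
    assert(s != 0 && d != 0 && half >= 1);
    for (i = 0; i < half; i++)             /* update 1 */
        s[i] += SQRT3 * d[i];
    for (i = 0; i < half; i++)             /* predict, periodic wrap at i = 0 */
        d[i] -= (SQRT3 / 4.0) * s[i]
              + ((SQRT3 - 2.0) / 4.0) * s[(i + half - 1) % half];
    for (i = 0; i < half; i++)             /* update 2, periodic wrap at the end */
        s[i] -= d[(i + 1) % half];
    for (i = 0; i < half; i++) {           /* normalize */
        s[i] *= (SQRT3 - 1.0) / SQRT2;
        d[i] *= (SQRT3 + 1.0) / SQRT2;
    }
}

/* Inverse of d4_forward: the same steps undone in reverse order. */
static void d4_inverse(double *s, double *d, int half) {
    int i;
    assert(s != 0 && d != 0 && half >= 1);
    for (i = 0; i < half; i++) {           /* undo normalize */
        s[i] *= (SQRT3 + 1.0) / SQRT2;
        d[i] *= (SQRT3 - 1.0) / SQRT2;
    }
    for (i = 0; i < half; i++)             /* undo update 2 */
        s[i] += d[(i + 1) % half];
    for (i = 0; i < half; i++)             /* undo predict */
        d[i] += (SQRT3 / 4.0) * s[i]
              + ((SQRT3 - 2.0) / 4.0) * s[(i + half - 1) % half];
    for (i = 0; i < half; i++)             /* undo update 1 */
        s[i] -= SQRT3 * d[i];
}
```

For a constant input every detail coefficient becomes zero and every smooth coefficient becomes the input value times the square root of two, which gives a quick sanity check of an implementation.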

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "waveread.h"
#include "wavelet.h"

// Compute the Daubechies D4 wavelet decomposition of wavelet[0] to
// wavelet[siglen-1] in place
void waveletdb4(float *wavelet, int siglen) {
    int n, i, first, last, half;
    float tmp, sqrt3 = sqrt(3), sqrt2 = sqrt(2); // calculate the square roots only once instead of doing it in every iteration to save computation.

    // The D4 decomposition
    for (n = siglen; n > 1; n = n >> 1) { // forward transform - length is halved each iteration

        // Split step (even elements are placed in the first half and odd
        // elements in the second half)
        first = 1;
        last = n - 1;
        while (first < last) {
            for (i = first; i < last; i = i + 2) {
                tmp = wavelet[i];
                wavelet[i] = wavelet[i + 1];
                wavelet[i + 1] = tmp;
            }
            first++;
            last--;
        }

        // Forward transform step - coded from equations in section "A
        // Lifting Scheme Version of the Daubechies D4 Transform" at
        // http://www.bearcave.com/misl/misl_tech/wavelets/daubechies/index.html
        // The transform is performed in 4 steps:
        // 1) update (add u1(odd) to even)
        // 2) Predict (subtract p(even) from odd)
        // 3) update (add u2(odd) to even)
        // 4) Normalize

        half = n / 2;

        // Update 1
        for (i = 0; i < half; i++) {
            wavelet[i] = wavelet[i] + sqrt3 * wavelet[half + i];
        }

        // Predict
        wavelet[half] = wavelet[half] - (sqrt3 / 4.0) * wavelet[0] - (((sqrt3 - 2) / 4.0) * wavelet[half - 1]);
        for (i = 1; i < half; i++) {
            wavelet[half + i] = wavelet[half + i] - (sqrt3 / 4.0) * wavelet[i] - (((sqrt3 - 2) / 4.0) * wavelet[i - 1]);
        }

        // Update 2
        for (i = 0; i < half - 1; i++) {
            wavelet[i] = wavelet[i] - wavelet[half + i + 1];
        }
        wavelet[half - 1] = wavelet[half - 1] - wavelet[half];

        // Normalize
        for (i = 0; i < half; i++) {
            wavelet[i] = ((sqrt3 - 1.0) / sqrt2) * wavelet[i];
            wavelet[i + half] = ((sqrt3 + 1.0) / sqrt2) * wavelet[i + half];
        }
    }
}

29 ann.h
The header file for the neural networks contains definitions of relevant parameters that can be changed at compile time, such as the sigmoid functions,
training algorithms and network characteristics. The prototypes for the
training, testing and run functions are also specified.

#ifndef ANN_H
#define ANN_H

/*************************************
 * For training:                     *
 *************************************/

// Comment in the training algorithm
//#define ANN_TRAIN_ALG FANN_TRAIN_INCREMENTAL
//#define ANN_TRAIN_ALG FANN_TRAIN_BATCH
#define ANN_TRAIN_ALG FANN_TRAIN_RPROP // this is the default setting
//#define ANN_TRAIN_ALG FANN_TRAIN_QUICKPROP

// Comment in the wanted transfer function
//#define ANN_NEURON_TF FANN_LINEAR
//#define ANN_NEURON_TF FANN_LINEAR_PIECE
//#define ANN_NEURON_TF FANN_LINEAR_PIECE_SYMMETRIC
//#define ANN_NEURON_TF FANN_SIGMOID
#define ANN_NEURON_TF FANN_SIGMOID_SYMMETRIC
//#define ANN_NEURON_TF FANN_SIGMOID_SYMMETRIC_STEPWISE
//#define ANN_NEURON_TF FANN_SIGMOID_STEPWISE
//#define ANN_NEURON_TF FANN_THRESHOLD
//#define ANN_NEURON_TF FANN_THRESHOLD_SYMMETRIC
//#define ANN_NEURON_TF FANN_GAUSSIAN
//#define ANN_NEURON_TF FANN_GAUSSIAN_SYMMETRIC
//#define ANN_NEURON_TF FANN_ELLIOT
//#define ANN_NEURON_TF FANN_ELLIOT_SYMMETRIC

#define NUM_OUTPUT 1 // - output neurons
#define NUM_LAYERS 3 // - layers
#define NUM_HIDDEN 5 // - hidden neurons
#define DES_ERR 0.001 // desired error
#define MAX_EPOCHS 20000 // 500000 // maximum number of training steps
#define EPOCHS_BETWEEN_REPORTS 1000 // how many steps between reports? (display current error)

/*********************************
 * Limits for the networks       *
 *********************************/
#define A4_ST 16 // 32
#define A4_END 128 // 64
#define A4_LEN (A4_END - A4_ST) // parenthesized so the macro expands safely inside larger expressions

/*********************************
 * Prototypes                    *
 *********************************/

int ann_train(char *infilename, char *outfilename, const unsigned int num_input);
int ann_test(char *testfilename, char *netfilename);
float ann_run(char *netfilename, float *indata, char *outfilename);

#endif

30 ann_train.c
This function is called when a network is to be trained. It reads from the
specified input file, trains with the selected number of inputs and writes
the network to the output file. The files must be opened before calling the
function.

#include <stdio.h>
#include "fann.h"
#include "ann.h"

int ann_train(char *infilename, char *outfilename, const unsigned int num_input) {
    const unsigned int num_output = NUM_OUTPUT;
    const unsigned int num_layers = NUM_LAYERS;
    const unsigned int num_neurons_hidden = NUM_HIDDEN;
    const float desired_error = (const float) DES_ERR;
    const unsigned int max_epochs = MAX_EPOCHS;
    const unsigned int epochs_between_reports = EPOCHS_BETWEEN_REPORTS;

    // Create FANN struct, defining the ANN
    struct fann *ann;

    // ANN is initialized from definitions in ann.h
    ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output);

    // Define transfer functions for the neurons - ANN_NEURON_TF is defined in ann.h
    fann_set_activation_function_hidden(ann, ANN_NEURON_TF);
    fann_set_activation_function_output(ann, ANN_NEURON_TF);

    // Define training algorithm (not necessary - RPROP is default) -
    // ANN_TRAIN_ALG is defined in ann.h
    fann_set_training_algorithm(ann, ANN_TRAIN_ALG);

    // train network
    printf("Training on %s:\n", infilename);
    fann_train_on_file(ann, infilename, max_epochs, epochs_between_reports, desired_error);

    // Save network
    printf("The network is saved in %s:\n\n", outfilename);
    fann_save(ann, outfilename);

    // Destroy network to free memory:
    fann_destroy(ann);

    return (0);
}

31 ann_test.c
This function does almost the same as the training, except that the network
isn’t trained, but the mean square error of running the network on data of
the same form as the training data is calculated.
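
For reference, the quantity reported by the test can be written out directly. The sketch below uses the plain textbook definition of the mean square error over n output and target pairs; the function name mean_square_error is hypothetical, and the sketch does not claim to reproduce the internal bookkeeping of the FANN library, which may scale the error differently for some activation functions.

```c
#include <assert.h>
#include <stddef.h>

/* Mean square error between n network outputs and their target values.
   Plain textbook definition: the average of the squared differences. */
static double mean_square_error(const float *output, const float *target, size_t n) {
    double sum = 0.0;
    size_t i;
    assert(output != NULL && target != NULL && n > 0);
    for (i = 0; i < n; i++) {
        double diff = (double)output[i] - (double)target[i];
        sum += diff * diff;
    }
    return sum / (double)n;
}
```

With outputs {1, 0, 0.5, 0.5} and targets {1, 0, 1, 0} the squared differences are 0, 0, 0.25 and 0.25, so the function returns 0.125.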

#include <stdio.h>
#include "fann.h"
#include "ann.h"

int ann_test(char *testfilename, char *netfilename) {
    // Create the structs holding the ANN and the test data
    struct fann *ann;
    struct fann_train_data *data;

    // Load network
    printf("Opening net: %s\n", netfilename);
    ann = fann_create_from_file(netfilename);
    if (ann == NULL)
    {
        fprintf(stderr, "Error: The net file %s cannot be opened for reading\n", netfilename);
        return (1);
    }

    // Test on file
    printf("Testing on file: %s\n", testfilename);
    data = fann_read_train_from_file(testfilename);
    if (data == NULL)
    {
        fprintf(stderr, "Error: the test file %s cannot be opened for reading\n", testfilename);
        // Destroy network to free memory before returning:
        fann_destroy(ann);
        return (1);
    }
    fann_test_data(ann, data);
    printf("Test result: MSE of ANN %s = %f\n", netfilename, fann_get_MSE(ann));

    // Destroy test data and network to free memory:
    fann_destroy_train(data);
    fann_destroy(ann);

    return (0);
}

32 ann_run.c
This function is used for giving an output from an already trained network.
There appears to be something strange somewhere in the neural network part,
as the output always gives a lot of floating point numbers of the exact same
value.

#include <stdio.h>
#include <stdlib.h>
#include "fann.h"
#include "ann.h"

float ann_run(char *netfilename, float *indata, char *outfilename) {
    float *calc_out;
    float result;
    struct fann *ann;
    FILE *outfile;

    // printf("ann:run: indata[31]=%f\n", indata[31]);

    // Load network
    // printf("Opening net: %s\n", netfilename);
    ann = fann_create_from_file(netfilename);

    // Open file for writing results
    // printf("Open outfile %s for writing\n", outfilename);
    if ((outfile = fopen(outfilename, "a")) == NULL)
    {
        fprintf(stderr, "Error: %s cannot be opened for writing\n", outfilename);
        // Destroy network to free memory:
        fann_destroy(ann);
        return (1);
    } else {
        calc_out = fann_run(ann, indata);
        // Copy the output before the network is destroyed;
        // calc_out points into memory owned by ann
        result = calc_out[0];
        fprintf(outfile, "%f\n", result);
        // printf("result: %f\n", calc_out[0]);
        fclose(outfile);
    }
    // Destroy network to free memory:
    fann_destroy(ann);

    return (result);
}

Part V
Matlab Source Code
33 pianocomp.m

% the script takes in a wave sequence of the 88 recorded test and training
% notes, cuts them into pieces of 30,000 samples and builds a matrix from them
% "backup" is the raw wave file containing the training and test data, respectively
% before the script is run, notestring is copied into backup

detectednotes = 88; % can be changed if, for example, not all 88 notes are "hit", e.g. because of noise.
notelength = 30000; % the 30,000 samples to be extracted from each individual note
maxarray = zeros(1, detectednotes); % array holding the indices of the largest peaks

notestring = [zeros(1, 50000) backup]; % zeros are padded

newstring = zeros(1, detectednotes*notelength); % will hold a string with all the trimmed notes

for i = 1:detectednotes
    [m p] = max(notestring); % returns the index of the largest (positive) amplitude
    maxarray(i) = p; % the index is stored
    notestring(p-50000:p+50000) = zeros(1, 100001); % the peak is overwritten with 0 so that it does not give more hits
end

maxarray = sort(maxarray); % the indices are sorted

% all the notes are joined into one long string
for i = 1:detectednotes
    newstring(((i-1)*notelength)+1:i*notelength) = backup(maxarray(i)-54999:maxarray(i)-25000);
end

testmatrix = zeros(88, notelength); % "test" matrix is replaced with "training" matrix, and the script is run again

for i = 1:88
    testmatrix(i,:) = newstring(((i-1)*notelength)+1:i*notelength); % the notes are placed in a matrix
end

34 pianomix.m

function [wavefile, detectionvector] = pianomix(pitch, notes, no_of_simultaneous, notematrix)
% Takes "pitch", "notes" = number of sequences to be generated, "no_of..."
% max. number of simultaneous notes, and the matrix of notes
% returns a wave file and a vector telling in which sequences the
% desired note is present

pitch = pitch - 20; % adjusted to the row index of the matrix

if ((pitch < 1) || pitch > 88)
    disp('Pitch out of range!');
    return
end

if ((notes < 1) || notes > 50001)
    disp('Notes out of range!');
    return
end

wavefile = zeros(1, 4096*notes);

detectionvector = zeros(1, notes); % 0 if the note is not present, 1 otherwise

for i = 1:notes

    pitchvector = zeros(1, 10); % every non-zero value represents a component

    if rand < 0.5 % the desired note is included
        detectionvector(i) = 1;
        pitchvector(1) = pitch;

        extranotes = floor(no_of_simultaneous*rand); % random integer between 0 and 9, both incl.

        if extranotes;
            j = 1;
            while j <= extranotes
                pitchvector(j+1) = ceil(88*rand); % random integer between 1 and 88, both incl.
                if (length(unique(nonzeros(pitchvector(1:j+1))))==j+1)
                    j = j+1; % find a new random note if the one found is already included
                end
            end
        end

    else % the desired note is not included
        detectionvector(i) = 0;

        extranotes = floor((no_of_simultaneous+1)*rand); % random integer between 0 and 10, both incl.

        if extranotes;
            j = 1;
            while j <= extranotes
                pitchvector(j) = ceil(88*rand); % random integer between 1 and 88, both incl.
                if ((length(unique(nonzeros(pitchvector(1:j))))==j) && pitchvector(j)~=pitch)
                    j = j+1; % find a new random note if the one found is already included, or if it is the desired note
                end
            end
        end
    end

    % the 4096 samples are taken from different places in the 30,000 samples
    for k = 1:length(nonzeros(pitchvector))
        time = round(15000*rand);
        wavefile(((i-1)*4096)+1:i*4096) = wavefile(((i-1)*4096)+1:i*4096) + notematrix(pitchvector(k), time+10001:time+14096); % mix composition
    end

    % normalization
    if (max(wavefile(((i-1)*4096)+1:i*4096)) > 1)
        wavefile(((i-1)*4096)+1:i*4096) = wavefile(((i-1)*4096)+1:i*4096)/(max(abs(wavefile(((i-1)*4096)+1:i*4096)) + 0.01)); % normalization
    end
end

35 featureextraction.m

function [output] = featureextraction(signal, pitch)

% Signal must be 4096 samples

%------------------------------ wavelet.m ---------------------------------

% Based on matlab code from http://www.control.auc.dk/~alc/Fnct-31.m
% which is associated with the book
% "Ripples in Mathematics - The Discrete Wavelet Transform"
% Arne Jensen, Anders la Cour-Harbo, Springer-Verlag 2001.
% ISBN 3-540-41662-5.
% See also http://www.control.auc.dk/~alc/ripples.html

S = signal(1:4096);
wl = [];

N = length(S);

while N>1
    s1 = S(1:2:N-1) + sqrt(3)*S(2:2:N);                                 % update 1
    d1 = S(2:2:N) - sqrt(3)/4*s1 - (sqrt(3)-2)/4*[s1(N/2) s1(1:N/2-1)]; % predict
    s2 = s1 - [d1(2:N/2) d1(1)];                                        % update 2
    s = (sqrt(3)-1)/sqrt(2)*s2;                                         % normalize
    d = (sqrt(3)+1)/sqrt(2)*d1;                                         % normalize

    wl = [d wl]; % save WL transform in vector (like the c-code)
    N = N/2;
    S = s;       % prepare for next step..
end
wl = [s wl];
%--------------------------------------------------------------------------

output = [wl(17:32) wl(33:64) wl(65:96) wl(129:160)];

36 NNgen.m

% The script creates, trains and generates test data for the networks
pitch = [21 108]; % The interval of pitches to be trained and tested
Antal_testtoner = 1000;
max_simultane_toner = 10;
Result = zeros(88, Antal_testtoner);    % the actual result
Netoutput = zeros(88, Antal_testtoner); % the network response
waveletarray = zeros(Antal_testtoner, 112);

for m = pitch(1):pitch(2)
    load trainingmatrix;

    % the wave file for training and the vector of 0s and 1s are created
    [wavefile, detectionvector] = pianomix(m, Antal_testtoner, max_simultane_toner, trainingmatrix);

    % wavelet decomposition
    for i = 1:Antal_testtoner
        waveletarray(i,:) = featureextraction(wavefile(((i-1)*4096)+1:i*4096), m);
    end
    clear wavefile trainingmatrix

    net = newff(waveletarray', detectionvector, [20 30 30]); % creates a new
    % neural network of the feed-forward type, with 112 input neurons, 3 hidden
    % layers with 20, 30 and 30 neurons respectively, and 1 output neuron
    net = init(net); % ensures default initialisation of weights and biases before training

    net.trainParam.epochs = 5; % the network is exposed to the data set 5 times

    net = train(net, waveletarray', detectionvector); % the network is trained

    % the network is saved
    save(strcat('NN', num2str(m)));

    %-------------------------------------- test ------------------------------------

    load testmatrix;

    % the wave file for the test is generated and the detection vector is written to a matrix;
    % row m holds the result for pitch m, matching the indexing in resultpresentation.m
    [wavefile, Result(m,:)] = pianomix(m, Antal_testtoner, max_simultane_toner, testmatrix);

    for i = 1:Antal_testtoner
        % wavelet decomposition
        waveletarray(i,:) = featureextraction(wavefile(((i-1)*4096)+1:i*4096), m);
    end

    clear wavefile testmatrix

    % The network is simulated with the test data and returns the network response
    Netoutput(m,:) = sim(net, waveletarray');
    clear net

    m
end

37 resultpresentation.m

errortype = zeros(7); % 1: correct hits, 2: correct misses, 3: false hits
% 4: false miss, 5: LSE, 6: mean error, 7: sign examination

for i = 21:108

    true_hits = 0;
    true_misses = 0;
    false_hits = 0;
    false_misses = 0;

    for j = 1:1000

        if (round(Netoutput(i,j))==1 && Result(i,j)==1)
            true_hits = true_hits + 1;
        end
        if (round(Netoutput(i,j))==0 && Result(i,j)==0)
            true_misses = true_misses + 1;
        end
        if (round(Netoutput(i,j))==1 && Result(i,j)==0)
            false_hits = false_hits + 1;
        end
        if (round(Netoutput(i,j))==0 && Result(i,j)==1)
            false_misses = false_misses + 1;
        end
    end

    errortype(1, i) = true_hits;
    errortype(2, i) = true_misses;
    errortype(3, i) = false_hits;
    errortype(4, i) = false_misses;
    errortype(5, i) = sum(power(Netoutput(i,:)-Result(i,:), 2))/1000;
    errortype(6, i) = sum(abs(Netoutput(i,:)-Result(i,:)))/1000;
    errortype(7, i) = sum(Result(i,:)-Netoutput(i,:))/1000;
end

% x = [21:108];
%
% plot_formatering;
% plot(x, errortype(3,:), 'r', x, errortype(4,:), 'b');
% set(gca, 'XLim', [21 108]);
%
% plot_formatering;
% plot(x, errortype(5,:), 'r', x, errortype(6,:), 'b');
% set(gca, 'XLim', [21 108]);
%
% plot_formatering;
% plot(x, errortype(7,:), 'r');
% set(gca, 'XLim', [21 108]);
