Documentation for the MARL-NYU file format
Description of the HRIR repository
Areti Andreopoulou and Agnieszka Roginska
Music and Audio Research Laboratory
New York University
October 2011
1 Introduction
Every publicly available HRTF database has been captured with different standards (varying
azimuth and elevation increments, covering different ranges, at several filter lengths and sample
rates), a fact that makes the compilation of a repository a challenge. This document describes the
MARL-NYU MATLAB® file format for storing Head-Related Impulse Responses, implemented
at the Music and Audio Research Laboratory, at NYU.
2 The MARL-NYU file format
In the MARL-NYU format, presented at the 131st AES Convention in New York [2], all
measurements are organized in two components: the data array of structures and the
specs structure. The data array holds all the location-specific information of the measurements, while specs
holds all the general information of a measured HRTF set. An outline of the format can be found in
Figure 1.
Our format is currently fully supported by ScanIR [3], an application for multi-channel impulse
response measurements in MATLAB®, which is available for download at:
http://marl.smusic.nyu.edu/projects/scanir/.
Figure 1: Outline of the MARL-NYU file format for storing HRTF datasets
data: azimuth, elevation, distance, IR, ITD, comments
specs: sample rate, filter type, subject name, database, signal type, comments
2.1 The data array
Every location in a subject's set of measurements is stored in an array called data. Each element
in the data array is a structure (struct) that encapsulates all the location-specific information,
namely: the azimuth and elevation positions, the distance of the sound source from the subject, the
left and right ear Head-Related-Impulse-Responses (HRIRs) or Head-Related-Transfer-Functions
(HRTFs) and the corresponding ITD value. Every consecutive measurement is stored in a separate
struct and is appended to the original array.
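The layout described above can be mirrored outside MATLAB. The following is a minimal Python sketch, assuming the field names shown in Figure 1; the `Location` class and the sample values are hypothetical and are not part of the MARL-NYU distribution.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Location:
    """Hypothetical Python mirror of one struct in the data array."""
    azimuth: float                  # degrees
    elevation: float                # degrees
    distance: float = 1.0           # meters; 1 m when unknown
    IR: List[Tuple[float, float]] = field(default_factory=list)  # N rows of (left, right)
    ITD: float = 0.0                # samples; 0 when the delay is part of the IR
    comments: str = ""              # location-specific notes


# Every consecutive measurement is stored in its own struct and appended.
data: List[Location] = []
data.append(Location(azimuth=0.0, elevation=0.0, IR=[(1.0, 1.0), (0.0, 0.0)]))
data.append(Location(azimuth=90.0, elevation=0.0, IR=[(0.5, 1.0), (0.0, 0.0)], ITD=20.0))
```

Each row of `IR` holds one sample for the left and right ear, which corresponds to the N x 2 matrix of the format.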
2.1.1 Azimuth and Elevation coordinates
An HRTF can be realized as a function of azimuth and elevation over time. Azimuth is the
angle θ between the vector to the sound source and the median plane, while elevation is the angle
φ between the vector to the sound source and the horizontal plane (see Figure 2).
In the MARL-NYU format azimuth ranges between -180° and +179°, and elevation between
-90° and +90°, such that:
(0°, 0°) corresponds to a location directly in front
(90°, 0°) corresponds to a location directly on the right
(-90°, 0°) corresponds to a location directly on the left
(-180°, 0°) corresponds to a location directly in the back
(0°, 90°) corresponds to a location directly above the head
(0°, -90°) corresponds to a location directly below
The θ and φ values are stored in the azimuth and elevation fields of each data structure.
Figure 2: Azimuth angle θ and Elevation angle φ
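Databases that report azimuth on a 0° to 360° scale need to be remapped to this range. The sketch below shows one way to do that in Python, assuming the source azimuth increases toward the right of the listener; the function name `wrap_azimuth` is hypothetical.

```python
def wrap_azimuth(azimuth_deg: float) -> float:
    """Map any azimuth in degrees onto the half-open interval [-180, 180)."""
    return (azimuth_deg + 180.0) % 360.0 - 180.0


print(wrap_azimuth(270.0))  # directly on the left
print(wrap_azimuth(180.0))  # directly in the back
```

With integer-degree inputs the largest value returned is +179, which matches the upper bound stated above.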
2.1.2 Distance
Distance is a scalar that denotes the length of the vector from the subject to the sound source. In the
MARL-NYU format it is measured in meters (m). For cases where this information is not available,
the default value is 1 m.
2.1.3 HRIRs or HRTFs
The IR field in the data array of structs is an N x 2 matrix, where N denotes the length of the
filters. The 1st column corresponds to the response of the left ear and the 2nd to that of the right ear.
In our suggested format the responses can be stored either as HRIRs or as their frequency-domain
equivalent HRTFs.
2.1.4 Interaural Time Differences (ITDs)
The ITD values are specified in samples. A positive sign is assigned to the ITD for positive
azimuths, and a negative sign for negative ones. More specifically:
(ITD < 0) corresponds to sounds coming from the left
(ITD > 0) corresponds to sounds coming from the right
In cases where the interaural delay is incorporated in the IR filter set, the ITD field defaults to 0.
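Because the ITD is stored in samples, converting it to seconds requires the sample rate of the dataset. The following Python sketch illustrates the conversion and the sign convention; both function names are hypothetical.

```python
def itd_seconds(itd_samples: float, sample_rate_hz: float) -> float:
    """Convert an ITD given in samples to seconds."""
    return itd_samples / sample_rate_hz


def itd_side(itd_samples: float) -> str:
    """Interpret the sign of an ITD value under the MARL-NYU convention."""
    if itd_samples < 0:
        return "left"
    if itd_samples > 0:
        return "right"
    # Zero means either a median-plane source or a delay already embedded in the IR.
    return "center or embedded in IR"


print(itd_side(-22), itd_seconds(-22, 44100))
```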
2.1.5 Comments
A comments field is also included in every struct of the data array, allowing the user to store
any location-specific notes.
2.2 The specs structure
All information that is particular to a whole set of HRIR / HRTF measurements (dataset) is
stored in a separate structure called specs. This information is most likely to remain unchanged
throughout the measurement process. The specs struct also contains all the dataset identifiers, such
as the subject and database names, the type of the HRIR filters etc. An important point is that all
fields in specs, except for Sample Rate, are of type string. A description of all the fields follows.
2.2.1 Sample rate
The sampling frequency of the HRIR filters is stored in the sample rate field. The MARL-NYU
format assumes that the same sample rate is used for a full dataset measurement. When
that is not the case, per-location sample rate values can be stored in the data.comments fields.
2.2.2 Filter type
Currently there are two ways of storing HRIRs: either in their original recorded form or
as minimum-phase filters, which discard the original phase information. These two options are labeled
“Fixed Filters” and “Minimum-Phase” respectively. This field is used as an identifier for the type of filters in each dataset.
2.2.3 Subject name
An identifier of the subject that each dataset corresponds to can be stored in the specs.subjectName
field.
2.2.4 Database name
Similarly, an identifier of the database that each dataset originates from can be stored in the
specs.database field. This information can be useful when operating on a collection of HRTF sets
from different databases.
2.2.5 Signal type
Information regarding the excitation signal used in each HRIR dataset measurement can be
stored in the signal type field. In the MARL-NYU format the most common excitation signals are
labeled as follows: “Sine Sweep”, “MLS” and “Golay Codes”.
2.2.6 Comments
The comments field in the specs struct can be used to store any further dataset-specific
information.
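As an illustration of the fields described in this section, the specs structure can be pictured as a flat record. The Python sketch below is hypothetical: only subjectName and database appear as identifiers in this document, so the remaining key names and all values are assumptions made for the example.

```python
# Hypothetical mirror of the specs struct; key names other than
# "subjectName" and "database" are assumed, not taken from the format.
specs = {
    "sampleRate": 44100,            # the only non-string field
    "filterType": "Minimum-Phase",  # or "Fixed Filters"
    "subjectName": "S001",          # illustrative identifier
    "database": "LISTEN",
    "signalType": "Sine Sweep",     # or "MLS", "Golay Codes"
    "comments": "",
}


def non_string_fields(specs_struct: dict) -> list:
    """Return the names of all fields whose values are not strings."""
    return sorted(k for k, v in specs_struct.items() if not isinstance(v, str))


print(non_string_fields(specs))
```

A check of this kind can confirm that a converted dataset follows the rule that every field except the sample rate is a string.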
3 HRIR Repository
The HRIR repository is a collection of 113 datasets from four publicly available HRTF
databases, namely LISTEN, CIPIC, FIU and KEMAR-MIT. A more detailed description of the
characteristics of each database can be found in Section 3.1. All datasets were converted
to the MARL-NYU file format. No standardization process was applied to the measurements.
More specifically, datasets:
S001 marl-nyu to S051 marl-nyu originate from the LISTEN database
S052 marl-nyu to S096 marl-nyu originate from the CIPIC database
S097 marl-nyu to S111 marl-nyu originate from the FIU database
S112 marl-nyu to S113 marl-nyu originate from the KEMAR-MIT database
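The numbering above can be expressed as a simple lookup. The following Python sketch maps a dataset number to its source database, using only the ranges listed in this section; the names `DATABASE_RANGES` and `database_for_dataset` are hypothetical.

```python
# (first dataset number, last dataset number, source database)
DATABASE_RANGES = [
    (1, 51, "LISTEN"),
    (52, 96, "CIPIC"),
    (97, 111, "FIU"),
    (112, 113, "KEMAR-MIT"),
]


def database_for_dataset(number: int) -> str:
    """Return the source database of dataset S<number> in the repository."""
    for first, last, name in DATABASE_RANGES:
        if first <= number <= last:
            return name
    raise ValueError(f"dataset number {number} is outside the repository (1-113)")


# The four ranges together account for all datasets in the repository.
total = sum(last - first + 1 for first, last, _ in DATABASE_RANGES)
print(total, database_for_dataset(52))
```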
The MARL-NYU file format avoids any data redundancy by storing the filters only in their
original, recorded form, thereby significantly reducing the size of the data to be stored and
handled. Variations in the range, and in the azimuth and elevation increments, among different
databases are fully preserved in this HRIR Repository.
Note:
Files S112 marl-nyu.mat and S113 marl-nyu.mat correspond to the KEMAR-MIT versions with
“normal” and “large” pinnae respectively. These files are not equalized. For users interested in
using them, a complete set of the speaker and headphone responses can be found in the KEMAR
equalization.mat file. Two additional sets of measurements of the KEMAR dummy-head can be
found in files S096 marl-nyu.mat (normal pinnae) and S065 marl-nyu.mat (large pinnae).
3.1 Databases
3.1.1 LISTEN
The Institute for Research and Coordination in Acoustics/Music (IRCAM), in collaboration with AKG,
has released an HRIR measurement database as part of the Listen research project [6]. The set,
which consists of 51 subjects, was captured using logarithmic sine-sweep signals at 44100 Hz. 10
different elevations were measured, starting at -45° and ending at 90° in 15° vertical increments.
The number of azimuth locations varies from 24 (15° azimuth increments at 0° elevation) to just
1 (at 90° elevation). The Impulse Responses are publicly available as 512-point minimum-phase
filters with the corresponding ITD values.
3.1.2 CIPIC
The CIPIC database was captured at the Center for Image Processing and Integrated Computing,
University of California Davis [1]. The set consists of 43 human subjects plus 2 KEMAR
mannequins, measured at 50 different elevations from -45° to 230.625° in 5.625° increments, and
at 25 azimuth locations (±80°, ±65°, ±55°, and from -45° to +45° in 5° increments). Each
Impulse Response is 200 samples long and was captured using Golay-Code signals at 44100 Hz.
The distance from the speakers to the subject was adjusted to 1 m.
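The grid described above can be reproduced with a few lines of arithmetic. The Python sketch below rebuilds the elevation and azimuth lists from the stated ranges and increments and checks their sizes; the variable names are hypothetical.

```python
# 50 elevations starting at -45 degrees in 5.625-degree steps
elevations = [-45.0 + 5.625 * i for i in range(50)]

# 25 azimuths: +/-80, +/-65, +/-55, and -45 to +45 in 5-degree steps
azimuths = sorted([-80, -65, -55] + list(range(-45, 50, 5)) + [55, 65, 80])

# Number of measured locations per subject implied by the full grid
locations_per_subject = len(elevations) * len(azimuths)

print(elevations[-1], len(azimuths), locations_per_subject)
```

The last elevation evaluates to 230.625°, which confirms that 50 steps of 5.625° span the stated range.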
3.1.3 FIU
The Florida International University DSP Lab released its own HRTF database in 2010 [5]. The
Impulse Responses were captured with the HeadZap system from AuSIM 3D using Golay-Code
signals at 96000 Hz. The set includes measurements of 15 human subjects at 6 different elevations
(-36°, -18°, 0°, 18°, 36°, and 54°) and at 12 azimuth locations (every 30°). The deliverable
responses are 256-point minimum-phase filters with their corresponding ITD values.
3.1.4 KEMAR-MIT
The KEMAR-MIT database was captured at the Media Lab of the Massachusetts Institute of
Technology (MIT) [4], using Maximum Length pseudo-random binary Sequences (MLS), with
the speaker placed 1.4 m away from the mannequin. A total of 710 locations were recorded: 14
different elevations from -40° to 90° in 10° vertical increments, with the number of corresponding
azimuth positions varying from 72 (5° horizontal increments) to just 1 (at 90° elevation). The
resulting Impulse Responses are 512 points long at a sampling frequency of 44100 Hz.
3.2 Functions
Six MATLAB® functions are offered along with the HRIR repository, which allow for basic
interaction with the different datasets. All functions were implemented and tested in a
Mac OS X environment, running MATLAB 2010b.
The offered functions are the following:
findIR.m: Returns the HRIR pair, the sampling rate and the corresponding ITD value, given a
specific azimuth-elevation location.
viewIR.m: Plots an HRIR pair, given a specific azimuth-elevation location, in both Time and
Frequency domains.
soundIR.m: Plays back the binaural response of a specific azimuth-elevation location convolved
with white noise.
findSubject.m: Performs a search given a subject name and returns the corresponding data array
and specs struct.
findDatabase.m: Performs a search given a database name and returns a cell array with the names
of the files that originate from the specific database.
exportAudio.m: Converts a given .mat file to a series of audio (.wav) files.
Note:
It is assumed that the .mat files with the datasets are located in a folder called HRIRrepository
whose path is one level up from the functions' folder.
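To illustrate the kind of lookup that findIR.m is described as performing, the following Python sketch searches a list of location records for an exact azimuth-elevation match. It is not the repository's MATLAB code: the function `find_ir`, the dictionary keys and the sample values are all hypothetical.

```python
def find_ir(data, specs, azimuth, elevation):
    """Return (IR, sample rate, ITD) for an exact location match, or None."""
    for location in data:
        if location["azimuth"] == azimuth and location["elevation"] == elevation:
            return location["IR"], specs["sampleRate"], location["ITD"]
    return None


# Illustrative two-location dataset; IR rows are (left, right) samples.
example_specs = {"sampleRate": 44100}
example_data = [
    {"azimuth": 0, "elevation": 0, "IR": [(1.0, 1.0), (0.0, 0.0)], "ITD": 0},
    {"azimuth": 90, "elevation": 0, "IR": [(0.5, 1.0), (0.1, 0.2)], "ITD": 20},
]

result = find_ir(example_data, example_specs, 90, 0)
print(result)
```

Because the databases use different angular grids, an exact match is not guaranteed for arbitrary angles; this sketch returns None in that case rather than choosing a nearest neighbor.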
Selected References
[1] Algazi, V., Duda, R., Thompson, D., and Avendano, C. (2001). The CIPIC HRTF database. In
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 99–102.
[2] Andreopoulou, A. and Roginska, A. (2011). Towards the creation of a standardized HRTF
repository. In 131st AES Convention, New York, NY.
[3] Boren, B. and Roginska, A. (2011). Multichannel Impulse Response Measurement in
MATLAB. In 131st AES Convention, New York, NY.
[4] Gardner, B. and Martin, K. D. (1995). HRTF Measurements of a KEMAR. Journal of the
Acoustical Society of America, 97(6):3907–3908.
[5] Gupta, N., Barreto, A., Joshi, M., and Agudelo, J. (2010). HRTF database at FIU DSP lab. In
International Conference on Acoustics Speech and Signal Processing (ICASSP), pages 169–172,
Dallas, TX. IEEE.
[6] Warusfel, O. (2003). LISTEN HRTF database, http://recherche.ircam.fr/equipes/salles/listen/.