Documentation for the MARL-NYU file format
Description of the HRIR repository

Areti Andreopoulou and Agnieszka Roginska
[email protected], [email protected]
Music and Audio Research Laboratory
New York University
October 2011

1 Introduction

Every publicly available HRTF database has been captured with different standards (varying azimuth and elevation increments, different spatial ranges, and different filter lengths and sample rates), which makes compiling a repository a challenge. This document describes the MARL-NYU MATLAB® file format for storing Head-Related Impulse Responses, implemented at the Music and Audio Research Laboratory at NYU.

2 The MARL-NYU file format

In the MARL-NYU format, presented at the 131st AES Convention in New York [2], all measurements are organized in two components: the data array of structures and the specs structure. The data array holds all the location-specific information of the measurements, while specs holds all the general information of a measured HRTF set. An outline of the format can be found in Figure 1.

Our format is currently fully supported by ScanIR [3], an application for multi-channel impulse response measurements in MATLAB®, which is available for download at: http://marl.smusic.nyu.edu/projects/scanir/.

[Figure 1: Outline of the MARL-NYU file format for storing HRTF datasets. The data array holds the fields azimuth, elevation, distance, IR, ITD and comments; the specs structure holds sample rate, filter type, subject name, database, signal type and comments.]

2.1 The data array

Every location in a subject's set of measurements is stored in an array called data. Each element of the data array is a structure (struct) that encapsulates all the location-specific information, namely: the azimuth and elevation positions, the distance of the sound source from the subject, the left- and right-ear Head-Related Impulse Responses (HRIRs) or Head-Related Transfer Functions (HRTFs), and the corresponding ITD value.
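
The layout of one data element can be illustrated with a minimal sketch. The format itself is a MATLAB struct array; Python is used here only for illustration, and the class name Measurement and the helper append_measurement are hypothetical. The field names follow Figure 1, and the units and defaults follow the subsections of 2.1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Measurement:
    """One element of the data array: everything specific to a single location."""
    azimuth: float                   # degrees
    elevation: float                 # degrees
    IR: List[Tuple[float, float]]    # N rows of (left ear, right ear) samples
    distance: float = 1.0            # metres; 1 m when the distance is unknown
    ITD: float = 0.0                 # samples; 0 when the delay is inside the IRs
    comments: str = ""               # free-form, location-specific notes


def append_measurement(data: List[Measurement], m: Measurement) -> List[Measurement]:
    """Append a new location to the data array after a basic shape check."""
    if any(len(row) != 2 for row in m.IR):
        raise ValueError("IR must have exactly two columns (left, right)")
    data.append(m)
    return data


# Made-up two-sample responses, only to show the shape of the array.
data: List[Measurement] = []
append_measurement(data, Measurement(azimuth=0.0, elevation=0.0,
                                     IR=[(1.0, 1.0), (0.5, 0.5)]))
append_measurement(data, Measurement(azimuth=90.0, elevation=0.0,
                                     IR=[(0.2, 1.0), (0.1, 0.5)], ITD=30.0))
```

Each call adds one location, so the array grows by one element per measurement.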
Every consecutive measurement is stored in a separate struct and is appended to the original array.

2.1.1 Azimuth and Elevation coordinates

An HRTF can be regarded as a function of azimuth and elevation over time. Azimuth is the angle θ between the vector to the sound source and the median plane, while elevation is the angle φ between the vector to the sound source and the horizontal plane (see Figure 2). In the MARL-NYU format azimuth ranges between -180° and +179°, and elevation between -90° and +90°, such that:

(0°, 0°) corresponds to a location directly in front
(90°, 0°) corresponds to a location directly on the right
(-90°, 0°) corresponds to a location directly on the left
(-180°, 0°) corresponds to a location directly behind
(0°, 90°) corresponds to a location directly above the head
(0°, -90°) corresponds to a location directly below

[Figure 2: Azimuth angle θ and elevation angle φ]

The θ and φ values are stored in the azimuth and elevation fields of each data structure.

2.1.2 Distance

Distance is a scalar that denotes the length of the vector to the sound source. In the MARL-NYU format it is measured in meters (m). For cases where this information is not available, the default value is 1 m.

2.1.3 HRIRs or HRTFs

The IR field in the data array of structs is an N x 2 matrix, where N denotes the length of the filters. The 1st column corresponds to the response of the left ear and the 2nd to that of the right ear. In our suggested format the responses can be stored either as HRIRs or as their frequency-domain equivalent, HRTFs.

2.1.4 Interaural Time Differences (ITDs)

The ITD values are specified in samples. A positive sign is assigned to the ITD for positive azimuths, and a negative sign for negative ones.
More specifically:

(ITD < 0) corresponds to sounds coming from the left
(ITD > 0) corresponds to sounds coming from the right

In cases where the interaural delay is incorporated in the IR filter set, the ITD field defaults to 0.

2.1.5 Comments

A comments field is also included in every struct of the data array, allowing the user to store any location-specific notes.

2.2 The specs structure

All information that applies to a whole set of HRIR / HRTF measurements (a dataset) is stored in a separate structure called specs. This information is most likely to remain unchanged throughout the measurement process. The specs struct also contains all the dataset identifiers, such as the subject and database names, the type of the HRIR filters, etc. An important point is that all fields in specs, except for the sample rate, are of type string. A description of all the fields follows.

2.2.1 Sample rate

The sampling frequency of the HRIR filters is stored in the sample rate field. The MARL-NYU format assumes that the same sample rate is used for a full dataset measurement. When that is not the case, per-location sample rate values can be stored in the data.comments fields.

2.2.2 Filter type

Currently there are two ways of storing HRIRs: either in their original recorded form (“Fixed Filters”) or with the phase information eliminated (“Minimum-Phase”). This field is used as an identifier for the type of filters in each dataset.

2.2.3 Subject name

An identifier of the subject that each dataset corresponds to can be stored in the specs.subjectName field.

2.2.4 Database name

Similarly, an identifier of the database that each dataset originates from can be stored in the specs.database field. This information can be useful when operating on a collection of HRTF sets from different databases.
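
As an illustration of operating on a collection of sets from different databases, the following minimal Python sketch checks the typing rule of the specs structure (numeric sample rate, string fields otherwise) and groups subjects by database. It is not part of the distributed MATLAB functions. Only subjectName and database appear verbatim in this document; the other key spellings (sampleRate, filterType, signalType) are assumptions, and the subject names and field values are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed key spellings; only subjectName and database are given in the text.
STRING_FIELDS = ("filterType", "subjectName", "database", "signalType", "comments")


def validate_specs(specs: dict) -> None:
    """Check that the sample rate is numeric and all other fields are strings."""
    rate = specs["sampleRate"]
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError("sampleRate must be numeric")
    for name in STRING_FIELDS:
        if not isinstance(specs[name], str):
            raise TypeError(f"{name} must be a string")


def group_by_database(all_specs: List[dict]) -> Dict[str, List[str]]:
    """Map each database name to the subject names that originate from it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for specs in all_specs:
        validate_specs(specs)
        groups[specs["database"]].append(specs["subjectName"])
    return dict(groups)


# Placeholder specs structures, one per dataset.
all_specs = [
    {"sampleRate": 44100, "filterType": "Minimum-Phase", "subjectName": "subject_A",
     "database": "LISTEN", "signalType": "Sine Sweep", "comments": ""},
    {"sampleRate": 44100, "filterType": "Minimum-Phase", "subjectName": "subject_B",
     "database": "LISTEN", "signalType": "Sine Sweep", "comments": ""},
    {"sampleRate": 44100, "filterType": "Fixed Filters", "subjectName": "subject_C",
     "database": "CIPIC", "signalType": "Golay Codes", "comments": ""},
]
groups = group_by_database(all_specs)
```

Keeping the identifiers as plain strings means that grouping and searching need nothing more than string comparison.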
2.2.5 Signal type

Information regarding the excitation signal used in each HRIR dataset measurement can be stored in the Signal-type field. In the MARL-NYU format the most common excitation signals are labeled as follows: “Sine Sweep”, “MLS” and “Golay Codes”.

2.2.6 Comments

The comments field in the specs struct can be used to store any further dataset-specific information.

3 HRIR Repository

The HRIR repository is a collection of 113 datasets from 4 publicly available HRTF databases, namely LISTEN, CIPIC, FIU and KEMAR-MIT. A more detailed description of the characteristics of each database can be found in Section 3.1. All datasets were converted to the MARL-NYU file format. No standardization process was applied to the measurements. More specifically, datasets:

S001 marl-nyu to S051 marl-nyu originate from the LISTEN database
S052 marl-nyu to S096 marl-nyu originate from the CIPIC database
S097 marl-nyu to S111 marl-nyu originate from the FIU database
S112 marl-nyu to S113 marl-nyu originate from the KEMAR-MIT database

The MARL-NYU file format avoids data redundancy by storing the filters only in their original, recorded form, thereby significantly reducing the size of the data to be stored and handled. Variations in the range, and in the azimuth and elevation increments, among the different databases are fully preserved in this HRIR Repository.

Note: Files S112 marl-nyu.mat and S113 marl-nyu.mat correspond to the KEMAR-MIT versions with “normal” and “large” pinnae respectively. These files are not equalized. For users interested in using them, a complete set of the speaker and headphone responses can be found in the KEMAR equalization.mat file. Two additional sets of measurements of the KEMAR dummy head can be found in files S096 marl-nyu.mat (normal pinnae) and S065 marl-nyu.mat (large pinnae).
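
The dataset numbering listed above can be captured in a small lookup, sketched here in Python as an illustration only. The function names database_for and datasets_per_database are hypothetical, and the sketch works on the numeric dataset index rather than on file names.

```python
from typing import Dict, List, Tuple

# (first index, last index, database) exactly as listed in this section.
DATASET_RANGES: List[Tuple[int, int, str]] = [
    (1, 51, "LISTEN"),
    (52, 96, "CIPIC"),
    (97, 111, "FIU"),
    (112, 113, "KEMAR-MIT"),
]


def database_for(index: int) -> str:
    """Return the database that dataset number `index` originates from."""
    for first, last, name in DATASET_RANGES:
        if first <= index <= last:
            return name
    raise ValueError(f"dataset index {index} is outside 1..113")


def datasets_per_database() -> Dict[str, int]:
    """Count how many datasets each database contributes."""
    return {name: last - first + 1 for first, last, name in DATASET_RANGES}


counts = datasets_per_database()
print(counts)                # {'LISTEN': 51, 'CIPIC': 45, 'FIU': 15, 'KEMAR-MIT': 2}
print(sum(counts.values()))  # 113
```

The per-database counts add up to the 113 datasets of the repository, which is a quick consistency check on the ranges.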
3.1 Databases

3.1.1 LISTEN

The Institute for Research and Coordination in Acoustics/Music (IRCAM), in collaboration with AKG, has released an HRIR measurement database as part of the Listen research project [6]. The set, which consists of 51 subjects, was captured using logarithmic sine-sweep signals at 44100 Hz. Ten different elevations were measured, starting at -45° and ending at 90° in 15° vertical increments. The number of azimuth locations varies from 24 (15° azimuth increments at 0° elevation) to just 1 (at 90° elevation). The Impulse Responses are publicly available as 512-point minimum-phase filters with the corresponding ITD values.

3.1.2 CIPIC

The CIPIC database was captured at the Center for Image Processing and Integrated Computing, University of California, Davis [1]. The set consists of 43 human subjects plus 2 KEMAR mannequins, measured at 50 different elevations from -45° to 230.625° in 5.625° increments, and at 25 azimuth locations (±80°, ±65°, ±55°, and from -45° to +45° in 5° increments). Each Impulse Response is 200 samples long and was captured using Golay-Code signals at 44100 Hz. The distance from the speakers to the subject was set to 1 m.

3.1.3 FIU

The Florida International University DSP Lab released its own HRTF database in 2010 [5]. The Impulse Responses were captured with the HeadZap system from AuSIM 3D using Golay-Code signals at 96000 Hz. The set includes measurements of 15 human subjects at 6 different elevations (-36°, -18°, 0°, 18°, 36°, and 54°) and at 12 azimuth locations (every 30°). The deliverable responses are 256-point minimum-phase filters with their corresponding ITD values.

3.1.4 KEMAR-MIT

The KEMAR-MIT database was captured at the Media Lab of the Massachusetts Institute of Technology (MIT) [4], using Maximum Length pseudo-random binary Sequences (MLS), with the speaker placed 1.4 m away from the mannequin.
A total of 710 locations were recorded: 14 different elevations from -40° to 90° in 10° vertical increments, with the number of corresponding azimuth positions varying from 72 (5° horizontal increments) to just 1 (at 90° elevation). The resulting Impulse Responses are 512 points long at a sampling frequency of 44100 Hz.

3.2 Functions

Six MATLAB® functions are offered along with the HRIR repository, which allow for basic interaction with the different datasets. All functions were implemented and thoroughly tested in a Mac OS X environment, running MATLAB 2010b. The offered functions are the following:

findIR.m: Returns the HRIR pair, the sampling rate and the corresponding ITD value, given a specific azimuth-elevation location.
viewIR.m: Plots an HRIR pair, given a specific azimuth-elevation location, in both the time and frequency domains.
soundIR.m: Plays back the binaural response of a specific azimuth-elevation location convolved with white noise.
findSubject.m: Performs a search given a subject name and returns the corresponding data array and specs struct.
findDatabase.m: Performs a search given a database name and returns a cell array with the names of the files that originate from the specific database.
exportAudio.m: Converts a given .mat file to a series of audio (.wav) files.

Note: It is assumed that the .mat files with the datasets are located in a folder called HRIRrepository whose path is one level up from the function's folder.

Selected References

[1] Algazi, V., Duda, R., Thompson, D., and Avendano, C. (2001). The CIPIC HRTF database. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 99–102.
[2] Andreopoulou, A. and Roginska, A. (2011). Towards the creation of a standardized HRTF repository. In 131st AES Convention, New York, NY.
[3] Boren, B. and Roginska, A. (2011). Multichannel Impulse Response Measurement in MATLAB. In 131st AES Convention, New York, NY.
[4] Gardner, B. and Martin, K. D. (1995). HRTF Measurements of a KEMAR. Journal of the Acoustical Society of America, 97(6):3907–3908.
[5] Gupta, N., Barreto, A., Joshi, M., and Agudelo, J. (2010). HRTF database at FIU DSP lab. In International Conference on Acoustics Speech and Signal Processing (ICASSP), pages 169–172, Dallas, TX. IEEE.
[6] Warusfel, O. (2003). LISTEN HRTF database, http://recherche.ircam.fr/equipes/salles/listen/.