The advent of Communication Acoustics in retrospect

Buenos Aires – 5 to 9 September, 2016
Acoustics for the 21st Century…
PROCEEDINGS of the 22nd International Congress on Acoustics
Communication Acoustics: Paper ICA2016-187
Jens Blauert
Ruhr-Universität Bochum, Germany, [email protected]
Abstract
Communication Acoustics is a cover label for those aspects of acoustics that involve relations
between the classical fields of acoustics and the information and communication technologies.
The usage of the term started around 1974, but it took 42 years until it finally became an explicit
topic at the International Congress on Acoustics, namely, here in Buenos Aires at the ICA 2016.
In the current talk, the history of Communication Acoustics will be recalled, considering the roles
of electro-acoustics, auditory perception and audio-signal processing in the course of the development of this field. In this context, two areas of application will be taken as examples to discuss the essence of Communication Acoustics, namely, (a) Virtual-Reality (VR) generation and
(b) Computational Auditory-Scene Analysis (CASA), both dealing with parametric representations of auditory scenes. In both of these fields, a trend can be identified towards including more explicit
knowledge as well as learning algorithms in Communication-Acoustics systems and their components. For this purpose, proficiency in computational symbol processing is required in terms
of scientific craftsmanship, besides pure signal-processing skills.
Keywords: communication acoustics; communication acoustics, definition of;
communication acoustics, history of;
1 Introduction
This paper does not present scientific results. It is, in essence, a subjective report by an eyewitness, namely, the author himself, on how he experienced the advent of Communication
Acoustics. He had written a PhD thesis and an inaugural thesis on spatial hearing in the 1960s
and was, in 1974, appointed professor in Bochum, Germany, with teaching obligations in electrical-field and network theory. When he disclosed to his faculty colleagues that he intended to
start a research program in Perceptual Acoustics, they were quite concerned, as they did not
accept sensory perception as a topic of scientific research. The relevance of this field for the
information technologies was not yet recognized, although this author had already proposed the
basic idea of perceptual coding [1] at that time. It has long been established that Acoustics
has two aspects to it, the physical one, see [2], and the perceptual one, see [3]; in fact, the word
Acoustics is derived from the ancient Greek word for "to hear" (AKOÝEIN ... ak’u:in). Nevertheless, engineers had
strong reservations regarding the perceptual side of it. Thus, neither the term Perceptual Acoustics nor
the term Communication Acoustics (sic!) was accepted for a research field in engineering,
and we had to settle for Electroacoustics. It took years until this attitude changed and the
Institute of Communication Acoustics in Bochum was finally officially established, the first of its kind
in those days. Nevertheless, this put this author into the position of recognizing his own professional activities as being flanked by two important milestones of modern acoustics, with Communication Acoustics right in the middle between the two.
2 Milestone #1 ─ Electroacoustics
Figure 1: Milestone #1 − Electrical engineering joined forces with Acoustics
When this author received his basic university education, most academic teachers in Acoustics
were specialized in Electroacoustics, for the reason that Acoustics had by then taken advantage
of technologies from electrical engineering. However, this could only happen at a large scale
after the independent invention of the vacuum triode by Robert von Lieben and Lee de Forest
[4], although important communication-technology-related inventions (telephone, telegraphone,
photographophone) had been made much earlier (Fig. 1). Only then was a device finally
available for amplifying “weak” currents. This paved the way for developing applications for a
broader public, such as radio, television, public-address systems, and many relevant military
applications (e.g., radar). Consequently, the adoption of electrical-engineering technologies by
classical acoustics (physical acoustics and perceptual acoustics) marks a milestone of modern
acoustics and led to an enormous upswing in the field.
3 Milestone #2 ─ Communication Acoustics
At the beginning of the 1960s, laboratory computers became available, and it was most likely
M. Schroeder at Bell Labs who started applying them to acoustic-signal processing at a larger scale. After having listened to his famous talk at the Tokyo ICA (Fig. 2), many of us realized
that this would shape the future of acoustics. This author, by the way, spent all his Bochum startup money (about $600,000 in today’s value) on acquiring an 8-bit computer with a one-screen
DOS system and 16k (!) floppy disks. For this new field in Acoustics, which developed from an
integration of physics, electrical engineering, computers, and perception, the term Communication Acoustics was soon accepted. An operational definition reads as follows: “Communication
Acoustics deals with those areas of acoustics which relate to the modern communication and information
sciences and technologies.” At least two comprehensive books are meanwhile available in print [5, 6].
Figure 2: Milestone #2 ─ Computers and digital signal processing entered the game
Looking at the essence of communication-acoustics research, a foremost task appears to be
the analysis and synthesis of auditory objects and scenes, and their representation in parametric form [7, 8]. This leads to two prominent application areas, namely, computational auditory-scene analysis (CASA) and generation of (so-called) virtual reality (VR). We start here with the
discussion of the schematic of a bimodal (audio-tactile) VR generator (Fig. 3). Controlled by a
world model, the system renders acoustic and tactile stimuli to the human observer. In doing so, it
continuously receives information from position trackers [9] mounted on the head and hand of the
observer. This makes the system interactive, which is an important feature, as only interactive
systems provide VR, which makes them more than just displays. But note that interactivity requires fast processing speeds, ideally processing in almost real time. The world model includes
a parametric representation of the space to be generated, typically based on a ray-tracing [10] or
image-source model [11], or a combination of the two [12]. The acoustic signals are presented
via headphones [13], and for the tactile rendering a special data glove is needed that employs
tactile and thermal actuators.
Figure 3: Schematic of a bimodal VR generator [8]
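To make the idea of a parametric room representation more concrete, the image-source principle mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used in the systems cited: it assumes a rectangular ("shoebox") room with walls at 0 and L along each axis, a frequency-independent reflection coefficient, spherical 1/r spreading, and first-order reflections only. All function names and parameter values are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def first_order_image_sources(room, source):
    """Mirror the source at each of the six walls of a shoebox room.

    room   -- (Lx, Ly, Lz) room dimensions in metres, walls at 0 and L
    source -- (x, y, z) source position in metres
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflection at the wall plane
            images.append(tuple(image))
    return images


def arrival(source, receiver, reflection_coeff=1.0):
    """Return (delay in s, amplitude) for one propagation path with 1/r spreading."""
    distance = math.dist(source, receiver)
    return distance / SPEED_OF_SOUND, reflection_coeff / distance


def impulse_response_taps(room, source, receiver, reflection_coeff=0.8):
    """Direct sound plus the six first-order reflections, sorted by arrival time."""
    taps = [arrival(source, receiver)]  # direct sound, no wall involved
    for image in first_order_image_sources(room, source):
        taps.append(arrival(image, receiver, reflection_coeff))
    return sorted(taps)
```

Calling `impulse_response_taps((5.0, 4.0, 3.0), (1.0, 1.0, 1.0), (4.0, 1.0, 1.0))` yields seven (delay, amplitude) pairs, the first of which belongs to the direct sound. In an interactive system, such a list would have to be recomputed whenever the tracked position of the observer changes, which illustrates why processing speed matters.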
In many applications of virtual-reality generators the aim is to expose the observers to situations such that they feel perceptually “present” in them. This is especially important for scenarios in which the observers are supposed to act intuitively, as they would do in a corresponding
real environment. Human–system interfaces which are based on the principle of virtual reality have
the potential of simplifying human–system interaction considerably. One may think of teleoperation systems, design systems and dialog systems in this context, and also of computer games. The
effort involved in creating perceptual presence depends on the task and on user requirements. For example, for vehicle simulators the perceptual requirements are less stringent than
for virtual control rooms for sound engineers. In general, virtual reality must appear sufficiently
“plausible” to the observer in order to provide perceptual presence. Since VR systems are just
about to enter the consumer market, various solutions to this problem can be expected.
The schematic shown in Fig. 3 houses, as its core, a “world model”. This is basically a repository that contains detailed descriptions of the space and of all objects which are to exist in the
virtual reality. In one layer of the world model, termed application, rules are listed which regulate
the interaction of the virtual objects with respect to the specific applications intended. Further, a
central-control layer collects the reactions of the subjects who use the virtual reality
interactively and prompts the system to render appropriate responses. It goes without saying
that, in order to render suitable stimuli to the human observer, the system has to make decisions
based on the actual situation and the tasks assigned to it. Depending on the tasks, this requires specific world knowledge and/or capabilities for autonomous learning in order to enable
suitable cognitive functions. Indeed, current implementations of such systems can be distinguished by the level of intelligence and knowledge that they are furnished with.
We shall now discuss the second representative application area of Communication Acoustics
as announced above, namely, systems for computational auditory-scene analysis (CASA). As
an example, we take a system whose architecture was originally conceptualized in
Bochum [14, 15] and, after substantial refinement, was adopted by both the AABBA initiative [16]
and the EU-project “TWO!EARS” < www.twoears.eu >. Fig. 4 provides a block diagram of it.
Figure 4: Architecture of an advanced CASA system [15]
(A) The input signals to the system are given by the two ear signals as recorded from a
dummy head (head-and-torso simulator). The head is mounted on a robotic platform
and is movable in 3 degrees of freedom (2 translatory ones plus head rotation).
(B) Filters that mimic the middle ears (sloppy band-pass filters) follow.
(C) The next processing step consists of a simulation of the two cochleae (spectral
decomposition in critical bands, compression, generation of neural spikes and their
probability of appearance as a function of time).
(D) The outputs of the two cochlea modules enter a binaural processor that computes
binaural activity, providing information on binaural attributes, such as loudness, pitch,
interaural arrival-time differences (ITDs) and interaural level differences (ILDs). Monaural processing is considered in parallel.
(E) Here the output of module (D) is visualized as a time-variant binaural-activity map with
the coordinates time, laterality, and intensity. There is evidence that similar representations exist in biological systems.
(F) The information from the binaural-activity map is analyzed in order to extract relevant features from it that are suited for further analysis. Since Gestalt rules are considered
in this process, this stage is labelled “Gestalt experts”. Please note that here a
transition from signal processing to symbol processing takes place, since features are
denoted by labels.
(G) From the feature sets rendered by stage (F), “proto-events” are formed by applying appropriate rule sets and machine-learning procedures. The output of this stage is evidence for specific events being identified, including confidence data on whether they
actually occur. This stage is thus denoted “event experts”.
(H) All information available from lower stages on proto-events, the features that
characterize them, and the respective confidence intervals is stored on a “blackboard”. The blackboard consists of various graphical models, in which these items and
their mutual probabilistic relationships are stored.
(I) The blackboard is accessible not only from the lower model stages, but also from a
stage on top of it that consists of a set of expert programs, each of which is knowledgeable with regard to specific scenes and/or tasks, for instance, search-and-rescue scenarios, or the assessment of the quality of aural experience in multi-channel loudspeaker
settings. The experts not only have specific world knowledge but also know the
rules which govern their specific scenes and tasks. Under the control of a scheduler
program (so to say, the chairperson of the experts) the experts check the information
available on the blackboard, try to make sense of it, and write their inferences back into it.
(K) Finally, the blackboard puts out a task-specific response, for example, a scene description or a quality judgement.
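As an illustration of the kind of cues computed in stage (D), the following Python sketch estimates an ITD from the peak of the interaural cross-correlation and an ILD from the energy ratio of the two ear signals. It is a strongly simplified sketch under stated assumptions: it operates on broadband waveforms rather than on the outputs of the cochlea modules, it does not model binaural activity over time, and the function names and sign conventions are chosen here for illustration only.

```python
import math


def estimate_itd(left, right, sample_rate, max_lag):
    """Interaural time difference in seconds from the cross-correlation peak.

    Positive values mean that the right-ear signal lags the left-ear signal.
    """
    best_lag, best_value = 0, -math.inf
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:  # ignore samples shifted outside the signal
                value += left[i] * right[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / sample_rate


def estimate_ild(left, right):
    """Interaural level difference in dB (left relative to right) from signal energies."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)
```

For a right-ear signal that is a delayed and attenuated copy of the left-ear signal, `estimate_itd` returns a positive delay and `estimate_ild` a positive level difference. A full binaural processor would compute such cues per critical band and per time frame, which then leads to the binaural-activity map of stage (E).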
An important property of the architecture plotted in Fig. 4 is that it allows for feedback loops in
the course of processing. We know that feedback also exists in biological systems. The following
feedback loops are currently being discussed. Some of them are cognitively controlled [15].
‒ To improve localization accuracy, head movements are performed, properly controlled
by mimicking human strategies when exploring aural scenes.
‒ Feedback is used to change processing parameters in stages (B) to (D), like adjusting auditory-filter bandwidths, changing spectral weights in combining information across
filters, adjustment of operating points of temporal adaptation processes, or providing additional information to support auditory-stream segregation.
‒ On the “cognitive” level of our model, feedback can be integrated by treating the graphical models as an active blackboard architecture. Higher-level processes in application-specific subsystems, such as an expert on scene analysis, can set variables according to their specific intentions, and after an inference in the graphical model has been
carried out accordingly, it will be visible how higher-level feedback corresponds with the
rules and observations of the system, and what implications can be drawn from it.
‒ Feedback has to be employed in sound-quality assessment, where auditory percepts are
to be compared to internal references that represent listeners’ preferences [16, 17, 18].
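The interplay of blackboard, experts and scheduler, including feedback such as a request for a head movement, can be sketched as follows. This is a deliberately minimal illustration and not the architecture of the system described: the graphical models with their probabilistic relationships are replaced by a plain dictionary of labels and confidence values, and all class names, labels and thresholds are hypothetical.

```python
class Blackboard:
    """Shared store of labelled hypotheses with confidence values (assumed structure)."""

    def __init__(self):
        self.entries = {}

    def post(self, label, confidence):
        self.entries[label] = confidence


class SirenExpert:
    """Hypothetical expert: infers an emergency scene from a 'siren' proto-event."""

    def run(self, board):
        confidence = board.entries.get("proto-event:siren", 0.0)
        if confidence > 0.7 and "scene:emergency" not in board.entries:
            board.post("scene:emergency", confidence)
            return True
        return False


class HeadTurnExpert:
    """Hypothetical expert: requests a head movement when localization is uncertain."""

    def run(self, board):
        confidence = board.entries.get("feature:azimuth", 1.0)
        if confidence < 0.5 and "request:turn-head" not in board.entries:
            board.post("request:turn-head", 1.0 - confidence)
            return True
        return False


def schedule(board, experts, max_rounds=10):
    """Call all experts round after round until none of them changes the blackboard."""
    for _ in range(max_rounds):
        changed = [expert.run(board) for expert in experts]
        if not any(changed):
            break
    return board.entries
```

In this sketch, the lower stages would post entries such as `"proto-event:siren"` with their confidence values, and the scheduler lets the experts react to them. An entry such as `"request:turn-head"` stands for top-down feedback that a robotic platform could act upon before new evidence is posted.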
While former binaural models for scene analysis consisted of components corresponding to the stages
(A) to (E), leaving it to human experts to further analyse and judge on the basis of the binaural-activity maps [8], the approach reported in Fig. 4 includes cognitive processing. The goal is to
finally substitute the human expert. Although it has to be admitted that the realization of the
cognitive part is still in its early stage, it is clear that the new approach contains elements
that reach beyond the scope of today’s Communication Acoustics, and thus marks a further
milestone of this field.
4 Milestone #3 ─ Communication Acoustics becomes cognitive!
From what has been discussed above, it becomes clear that Communication Acoustics is about
to break the limits of acoustic-signal processing. New topics are joining the traditional mix of
physics, sensory perception, electrical engineering, and acoustic-signal processing. In the first
place, the systems get more intelligent. This means that they are now equipped with cognitive
functions, or in other words, “brains” are implemented on them. Referring to our two examples,
namely, VR and CASA, the good news is that different systems may share knowledge, for instance, as regards ground-truth data. Fig. 6 illustrates that in the CASA system such ground truth, for instance, regarding auditory scenes, is embedded in the experts, while in the VR generator it is part of the world model.
Figure 6: Milestone #3: Communication Acoustics becomes cognitive!
(a) CASA system, cognitive part, (b) VR generator
A further issue is that systems with inherent knowledge need to be able to adapt their knowledge
according to the specific situations and tasks at stake. Machine-learning techniques [19] are
increasingly applied for this purpose. Fortunately, cognitive functions as well as learning algorithms are basically independent of specific sensory modalities. Further, starting from the assumption that human beings form their perceptual world in an active explorative process and
use all their senses to acquire information for this purpose, it goes without saying that advanced
CASA systems have to consider multi- and cross-modal information. This strongly supports a
current demand in technology, that is, the development of multimodal and multi-media
applications.
This third milestone is rather an ongoing process and can be associated with a general trend in
the information and communication technologies. Actually, in some sub-areas of Communication Acoustics it has already been passed.
For example, modern speech-recognition systems incorporate information such as domain
knowledge, semantic networks, language models, word models, grammatical, syntactic,
phonotactic and phonetic models, represented in the form of rules, fuzzy logic, transition
probabilities, look-up tables, dictionaries, and so on. It was probably also in speech technology
that it first became obvious that, for the task of speech recognition, bottom-up (signal-driven)
processing does not suffice but has to be complemented by top-down (hypothesis-driven)
procedures and, consequently, by the modelling of functions which are located more centrally in
the human nervous system. An important task in this context is the collection of data and the
training of the intelligent systems. It is not far from the mark to expect the upcoming Big-Data
technologies to initiate a major boost in this respect.
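The complementary roles of bottom-up and top-down processing in speech recognition, as described above, can be illustrated by a toy example in which signal-driven evidence for word hypotheses is combined with a hypothesis-driven language prior. The sketch below is not taken from any particular recognizer: the function name, the weighting scheme, the floor value for unseen word pairs and all probabilities are assumptions made purely for illustration.

```python
import math


def rescore(acoustic_scores, language_prior, previous_word, weight=1.0):
    """Pick the word hypothesis with the best combined log score.

    acoustic_scores -- {word: P(signal | word)}, the bottom-up (signal-driven) evidence
    language_prior  -- {(previous, word): P(word | previous)}, the top-down evidence
    weight          -- how strongly the language prior is taken into account
    """
    combined = {}
    for word, likelihood in acoustic_scores.items():
        # Small floor value so that unseen word pairs do not produce log(0).
        prior = language_prior.get((previous_word, word), 1e-6)
        combined[word] = math.log(likelihood) + weight * math.log(prior)
    return max(combined, key=combined.get)


# Invented numbers: the signal alone slightly favours "beach".
acoustic = {"beach": 0.55, "speech": 0.45}
prior = {("recognize", "speech"): 0.3, ("recognize", "beach"): 0.001}
```

With `weight=0.0`, the decision rests on the signal alone and falls on "beach", whereas with the language prior included, `rescore(acoustic, prior, "recognize")` selects "speech". Collecting enough data to estimate such priors reliably is exactly the kind of task addressed in the paragraph above.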
A further major challenge for CASA is the assignment of meaning to auditory objects [20, 21].
Human beings do not react according to what they perceive, but rather, they react on the
grounds of what their percepts mean to them in their current action-specific, emotional and
cognitive situation. Again, it is to be expected that Big-Data technologies will contribute
essentially to this kind of task, that is, assigning meaning to objects and scenes.
5 Conclusions
Modern information, communication and control systems frequently contain components which
deal with the analysis and synthesis of auditory scenes, whereby these components are commonly "embedded" in more complex systems. Evaluating their function in isolation is often impossible or leads to irrelevant results. Thus, sophisticated test-beds are needed for this
purpose. In any case, Communication Acoustics represents an integral constituent of the
modern information technologies and should be seen and rated in the context of these.
Taking these current trends into account, it is obvious that Communication Acoustics will have
to open the doors for new relevant topics, such as cognition, multimodality, and interactivity. It
certainly would not survive as a stringently bounded discipline. This also means that just
42 years after the introduction of the term Communication Acoustics, the meaning of it will
change considerably towards a broader concept.
Students who aim at working their way into Communication Acoustics are strongly advised to
acquire skills not only in signal processing but also in symbol processing. Further, besides
acoustics, electrical engineering and perceptual acoustics, they should keep an open eye on machine learning,
cognitive psychology, cognitive physiology and, last but not least, modern trends in robotics.
Acknowledgments
The author thanks his former PhD students ─ for a complete list see [5]. The compilation of this
paper was supported by the EU-Project “Two!Ears” (FP7-ICT-2013-C-#618075).
References
[1] Blauert, J., Trittart, P. (1975) Ausnutzung von Verdeckungseffekten bei der Sprachkodierung
(Exploiting masking in speech coding). Fortschr. Akustik, DAGA'75, 377−380, Physiker-Verlag,
Weinheim, Germany
[2] Lord Rayleigh (J.W. Strutt) (1869, 1877) The theory of sound, Vols. 1, 2. MacMillan, New York
[3] Von Helmholtz, H. (1863) Die Lehre von den Tonempfindungen als physiologische Grundlage
für die Theorie der Musik (Sensation of tone as a physiological basis for the theory of music).
Vieweg und Sohn, Braunschweig, Germany
[4] Bosch, B. (2001) Lee de Forest – “Vater des Radios” (Lee de Forest – “father of radio”). Funk
Gesch. 24:5–22 and 24:57–73
[5] Blauert, J. (ed.), Communication Acoustics, Springer, Berlin−Heidelberg
[6] Pulkki, V., Karjalainen, M. (2015) Communication Acoustics: An introduction to speech, audio
and psychoacoustics, Wiley, Hoboken NJ
[7] Blauert, J. (2002) Instrumental analysis and synthesis of auditory scenes: “Communication
Acoustics”, Proc. 22nd Int. Conf. Audio Engr. Soc. Virtual Synthesis, Entertainment and Audio,
387-395, Audio Engr. Soc, New York NY
[8] Blauert, J. (2005) Analysis and synthesis of auditory scenes, in: J. Blauert (ed.), Communication Acoustics, 1−26, Springer, Berlin−Heidelberg
[9] Börger, G., Blauert, J., Laws, P. (1977) Stereophone Kopfhörerwiedergabe mit Steuerung bestimmter Übertragungsfaktoren durch Kopfdrehbewegungen (Stereophonic headphone reproduction with variations of specific transfer factors by head rotations). Acustica 39:22–26
[10] Krokstad, A., Strøm, S., Sørsdal, S. (1968) Calculating the acoustical room response by use
of a ray-tracing technique. J. Sound Vibr. 8:118–125
[11] Allen, J. B., Berkley, D. A. (1979) Image method for efficiently simulating small-room acoustics,
J. Acoust. Soc. Am. 65, 943−950
[12] Lehnert, H. (1992) Binaurale Raumsimulation: Ein Computermodell zur Erzeugung virtueller
auditiver Umgebungen (A computer model for the generation of auditory virtual environments).
Doct diss, Ruhr-Univ. Bochum, Shaker, Aachen, Germany
[13] Hammershøi, D., Møller, H. (2005) Binaural technique: Basic methods for recording, synthesis
and reproduction, Chap. 9 in: Blauert, J. (ed.), Communication Acoustics, Springer,
Berlin−Heidelberg−New York NY
[14] Blauert, J. (1999) Binaural auditory models: architectural considerations. Proc. 18th Danavox
Symp., 189–206. Scanticon, Kolding, Denmark
[15] Blauert, J., Obermayer, K. (2012) Rückkopplungswege in Binauralmodellen (Feedback
loops in binaural models), Fortschr. Akust. DAGA’12, 2015–2016, Dtsch. Ges. Akustik, Berlin,
Germany
[16] Blauert, J., Braasch, J., Buchholz, J., Colburn, H.S., Jekosch, U., Kohlrausch, A., Mourjopoulos,
J., Pulkki, V., Raake, A. (2010) Aural assessment by means of binaural algorithms – the
AABBA project. In: Buchholz, J.M., Dau, T., Dalsgaard, J.C., Poulsen, T. (eds.) Binaural Processing
and Spatial Hearing, Proc. 2nd Int. Symp. Auditory & Audiolog. Res. − ISAAR’09, 113–
124, Danavox Jubilee Foundation, Ballerup, Denmark
[17] Raake, A., Wierstorf, H., Blauert, J. (2014) A case for TWO!EARS in audio-quality assessment.
Proc. 7th FORUM ACUSTICUM, Paper SS16-19, Kraków, Poland
[18] Raake, A., Blauert, J. (2013) Comprehensive modeling of the formation process of sound-quality. Proc. QoMEX 2013. Klagenfurt, Austria
[19] Blauert, J., Kolossa, D., Obermayer, K., Adiloglu, K. (2013) Further challenges and the road
ahead. In: J. Blauert (ed.), The technology of binaural listening, 477–502. Springer, Berlin–
Heidelberg–New York−Dordrecht−London, and ASA Press, New York NY
[20] Jekosch, U. (2005) Assigning meaning to sounds – Semiotics in the context of product sound
design. In: Blauert, J. (ed.), Communication Acoustics, 193−221, Springer, Berlin−Heidelberg
[21] Jekosch, U. (1999) Meaning in the context of sound quality, Acta Acustica united with
Acustica 85:681−684