APLcomp Oy 17.10.2017 Mariankatu 17 00170 Helsinki

AUTHENTICATED IP TELEPHONY AS WEB SERVICE
Two or more parties can conduct authenticated IP-based calls, which they initiate by performing
strong authentication using automatic speaker verification (ASV) technology. Discussions are
transcribed to text using automatic speech recognition (ASR) technology. The resulting text
document can be signed or notarized by the intermediating server, which acts as a signing agent,
and possibly also with each participant’s private key. In this way, the discussion is documented
and can be made non-repudiable between the parties. In certain cases it can also be regarded as a
mutually binding agreement. The transcribed content can also be translated into each recipient’s
own language and synthesized as speech at the recipient’s end. A telephony session can therefore
take place between multiple persons speaking different languages.
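The signing step above can be illustrated with a minimal sketch, assuming the transcript is held as a list of (speaker, text) pairs; that layout, the function name, and the sample utterances are hypothetical and not taken from this disclosure. The sketch only computes a digest over a canonical encoding of the transcript, which is the value the server and each participant would sign. The signature operation itself is left out, because the key management and signature scheme are not specified here.

```python
import hashlib
import json


def transcript_digest(utterances):
    """Return a SHA-256 hex digest over a canonical encoding of a transcript.

    `utterances` is a list of (speaker_id, text) pairs (an assumed layout).
    """
    # Canonical JSON: fixed key order and separators, so that the same
    # transcript always yields the same bytes and therefore the same digest.
    canonical = json.dumps(
        [{"speaker": speaker, "text": text} for speaker, text in utterances],
        ensure_ascii=False,
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical two-party transcript.
transcript = [("alice", "I agree to the terms."), ("bob", "So do I.")]
digest = transcript_digest(transcript)

# Any change to the wording changes the digest, so a signature over the
# original digest would no longer verify against the altered text.
tampered_digest = transcript_digest(
    [("alice", "I do not agree to the terms."), ("bob", "So do I.")]
)
```

Signing a digest rather than the full text keeps the signed value small and the same size for every call, whatever the length of the discussion.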
The necessary web services are available today from multiple providers, such as IBM, Google,
Microsoft, Baidu, Amazon, and ResponsiveVoice. ASV is the least mature of the needed
services, especially if the paradigm described in the current patent application is utilized. The
invention is a novel combination of
a) hardware that includes a new prototype throat microphone;
b) an associated collection interface for recording dual-channel audio using a web browser; and
c) associated back-end signal processing and pattern classification modules for ASV, ASR,
speech activity detection (SAD), and voice liveness detection (VLD).
Component c) uses a known automatic speaker verification technique, the Gaussian mixture model
with a universal background model (GMM-UBM), adapted with a specifically engineered “domain
adaptation” technique that fits the acoustic speaker models to the secondary (throat microphone)
data. Domain adaptation enables large prior datasets of conventional acoustic microphone data to
be used efficiently for modeling throat-microphone speech.
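As background for the GMM-UBM technique named above, the following is a minimal sketch of the standard relevance-MAP mean adaptation and log-likelihood-ratio scoring commonly used with GMM-UBM systems. It is not the specifically engineered domain adaptation of this disclosure, whose details are not given here. The one-dimensional features, the two-component UBM, the relevance factor, and all numeric values are assumptions chosen only to keep the example small.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, weights, means, variances):
    """Per-component values of log(weight) + log N(x | mean, var)."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(x, weights, means, variances):
    """Log-density of x under the GMM, using log-sum-exp for stability."""
    terms = component_logs(x, weights, means, variances)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM toward the enrollment data."""
    counts = [0.0] * len(means)
    sums = [0.0] * len(means)
    for x in data:
        terms = component_logs(x, weights, means, variances)
        peak = max(terms)
        post = [math.exp(t - peak) for t in terms]
        total = sum(post)
        for i, p in enumerate(post):
            gamma = p / total            # responsibility of component i for x
            counts[i] += gamma
            sums[i] += gamma * x
    adapted = []
    for i, ubm_mean in enumerate(means):
        alpha = counts[i] / (counts[i] + relevance)   # data-dependent weight
        data_mean = sums[i] / counts[i] if counts[i] > 0 else ubm_mean
        adapted.append(alpha * data_mean + (1.0 - alpha) * ubm_mean)
    return adapted


def llr_score(test, weights, speaker_means, ubm_means, variances):
    """Average log-likelihood ratio of the speaker model against the UBM."""
    spk = sum(log_likelihood(x, weights, speaker_means, variances) for x in test)
    ubm = sum(log_likelihood(x, weights, ubm_means, variances) for x in test)
    return (spk - ubm) / len(test)


# Toy UBM and enrollment data (all values are illustrative assumptions).
ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
enrollment = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0, 1.7, 2.3]
speaker_means = map_adapt_means(enrollment, ubm_weights, ubm_means, ubm_vars,
                                relevance=4.0)

target_score = llr_score([2.0, 1.9, 2.1], ubm_weights, speaker_means,
                         ubm_means, ubm_vars)
impostor_score = llr_score([-1.0, -0.8, -1.2], ubm_weights, speaker_means,
                           ubm_means, ubm_vars)
```

With these toy values the target trial scores above zero and the impostor trial below zero, so a threshold at zero would separate them. A real system would use multi-dimensional spectral features, many more mixture components, and a threshold tuned on held-out data.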
These web services will be integrated into a single web browser user interface, using JavaScript
libraries to access all of the utilized web services through their published web APIs. Essential to
this invention is token-free, strong authentication, which is produced from the speaker’s
voice. A combined dual-channel input (close-talking acoustic + throat microphone) can be used to
provide more robust identification as well as to clean up the speech input for more accurate
transcription and translation. The voice-related functions performed in this environment consist of:
- An ASV module responsible for speaker authentication
- A voice liveness detection (VLD) module that utilizes both the acoustic and the
  throat-microphone signal to detect potential ‘replay attacks’ by a fraudster
- A method for continuous user authenticity monitoring in a “background listening” mode
- Automatic speech recognition (ASR)
- Text-to-text language translation
- Text-to-speech synthesis (TTS)
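To illustrate how a dual-channel input could support liveness detection, the sketch below uses one simple cue: when a wearer is really speaking, the short-time energy of the throat-microphone channel should rise and fall together with that of the acoustic channel, whereas audio replayed through a loudspeaker would reach only the acoustic microphone. This disclosure does not specify the VLD algorithm, so the cue, the frame length, the threshold, and the synthetic signals are all assumptions made for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def looks_live(acoustic, throat, frame_len=160, threshold=0.5):
    """Accept as live only if the two channels' energy envelopes move together."""
    score = pearson(frame_energies(acoustic, frame_len),
                    frame_energies(throat, frame_len))
    return score >= threshold


# Synthetic one-second signals at an assumed 8 kHz sampling rate:
# alternating 100 ms bursts of "speech" and near-silence.
RATE = 8000
envelope = [1.0 if (i // 800) % 2 == 0 else 0.05 for i in range(RATE)]
acoustic = [e * math.sin(2 * math.pi * 200 * i / RATE)
            for i, e in enumerate(envelope)]
throat_live = [0.6 * e * math.sin(2 * math.pi * 120 * i / RATE)
               for i, e in enumerate(envelope)]
throat_replay = [0.0] * RATE   # loudspeaker playback: throat sensor stays silent

live_result = looks_live(acoustic, throat_live)
replay_result = looks_live(acoustic, throat_replay)
```

In this toy setting the genuine recording is accepted and the simulated replay is rejected. A practical VLD module would need to cope with sensor noise, channel delay, and more capable attacks, which a single correlation threshold does not address.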
Potential end users include, for instance, banks for customer interaction over telephony,
health care personnel for dictating medical reports, technical user support, emergency call center
personnel, stock brokers, and remote examination surveillance to verify a student’s identity. In
general, this technology can be applied to any IP telephony application that can make use of the
above web services.
The throat microphone trunk can be complemented by other sensors, such as a heart rate sensor. The
combined equipment, using a Bluetooth connection, can be used for remote monitoring of elderly
people, for example in order to detect early coronary symptoms.
FIGURE 1: Diagram of the envisioned invention from the viewpoint of the user.