APLcomp Oy
17.10.2017
Mariankatu 17
00170 Helsinki

AUTHENTICATED IP TELEPHONY AS WEB SERVICE

Two or more parties can conduct authenticated IP-based calls, which they initiate by performing strong authentication with automatic speaker verification (ASV) technology. The discussions are transcribed to text using automatic speech recognition (ASR) technology. The resulting text document can be signed or notarized by the intermediating server, which acts as a signing agent, and possibly also with each participant's private key. In this way, the discussion is documented and made non-repudiable between the parties. In certain cases it can also be regarded as a mutually binding agreement.

The transcribed content can also be translated into each recipient's own language and synthesized as speech at the recipient's end, so a telephony session can take place between multiple persons speaking different languages.

The necessary web services are available today from multiple providers, such as IBM, Google, Microsoft, Baidu, Amazon, and ResponsiveVoice. Of these services, ASV is clearly the least mature, especially when the paradigm described in the current patent application is used.

The invention is a novel combination of a) hardware that combines a new prototype throat microphone with a close-talking acoustic microphone; b) an associated collection interface for recording dual-channel audio in a web browser; and c) associated back-end signal processing and pattern classification modules for ASV, ASR, speech activity detection (SAD), and voice liveness detection (VLD). Component c) uses a well-known automatic speaker verification technique, the Gaussian mixture model – universal background model (GMM-UBM), extended with a specifically engineered "domain adaptation" of the acoustic speaker models to the secondary (throat-microphone) data. Domain adaptation allows large existing datasets of conventional acoustic-microphone speech to be used efficiently for modeling throat-microphone speech.
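The disclosure does not state how the domain adaptation is performed. A common way to adapt a GMM-UBM is mean-only maximum a posteriori (MAP) adaptation controlled by a relevance factor, and the Python sketch below assumes a two-stage version of it: a UBM trained on plentiful acoustic-microphone data is first MAP-adapted toward pooled throat-microphone data, and each speaker model is then MAP-adapted from that throat-domain UBM. Verification uses the average per-frame log-likelihood ratio between the speaker model and the UBM. The names DiagGMM, map_adapt_means and llr_score, the relevance value, and the toy 2-D features are hypothetical illustrations, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class DiagGMM:
    """Gaussian mixture model with diagonal covariance matrices."""
    weights: List[float]
    means: List[Vector]
    variances: List[Vector]

    def weighted_log_densities(self, x: Sequence[float]) -> List[float]:
        """Return log(w_c * N(x; mu_c, diag(var_c))) for each component c."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            out.append(ll)
        return out

    def log_likelihood(self, x: Sequence[float]) -> float:
        return logsumexp(self.weighted_log_densities(x))


def map_adapt_means(prior: DiagGMM, frames: Sequence[Sequence[float]],
                    relevance: float = 16.0) -> DiagGMM:
    """Mean-only MAP adaptation of `prior` toward `frames` (assumed scheme)."""
    dim = len(prior.means[0])
    n = [0.0] * len(prior.weights)                # zeroth-order statistics
    f = [[0.0] * dim for _ in prior.weights]      # first-order statistics
    for x in frames:
        logs = prior.weighted_log_densities(x)
        total = logsumexp(logs)
        for c, lc in enumerate(logs):
            gamma = math.exp(lc - total)          # posterior of component c
            n[c] += gamma
            for d in range(dim):
                f[c][d] += gamma * x[d]
    new_means = []
    for c, mu in enumerate(prior.means):
        if n[c] == 0.0:
            new_means.append(list(mu))            # no data: keep the prior mean
            continue
        alpha = n[c] / (n[c] + relevance)         # data-dependent adaptation weight
        new_means.append([alpha * (fd / n[c]) + (1.0 - alpha) * m
                          for fd, m in zip(f[c], mu)])
    # Weights and variances are copied unchanged from the prior model.
    return DiagGMM(list(prior.weights), new_means,
                   [list(v) for v in prior.variances])


def llr_score(target: DiagGMM, background: DiagGMM,
              frames: Sequence[Sequence[float]]) -> float:
    """Average per-frame log p(x | target) - log p(x | background)."""
    return sum(target.log_likelihood(x) - background.log_likelihood(x)
               for x in frames) / len(frames)


# Toy data: hypothetical 2-D features stand in for real cepstral vectors.
acoustic_ubm = DiagGMM(weights=[0.5, 0.5],
                       means=[[-2.0, 0.0], [2.0, 0.0]],
                       variances=[[1.0, 1.0], [1.0, 1.0]])
throat_pool = [[-1.5, 1.0], [-1.4, 0.9], [2.5, 1.0], [2.6, 1.1]]
throat_ubm = map_adapt_means(acoustic_ubm, throat_pool, relevance=4.0)  # stage 1
enrol = [[-1.0, 1.5], [-0.9, 1.4], [-1.1, 1.6]]
speaker = map_adapt_means(throat_ubm, enrol, relevance=4.0)             # stage 2
genuine = llr_score(speaker, throat_ubm, [[-1.05, 1.45], [-0.95, 1.55]])
impostor = llr_score(speaker, throat_ubm, [[2.0, 0.0], [2.2, -0.1]])
print(f"genuine={genuine:.3f} impostor={impostor:.3f}")
```

Adapting only the means and keeping the prior weights and variances is the simplest choice; with little enrollment data, components that see few frames (small n_c) stay close to the throat-domain UBM, which is what lets the large acoustic-microphone corpus carry most of the modeling burden.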
These web services will be integrated into a single web-browser user interface, using JavaScript libraries that access each service through its published web API. Essential to this invention is token-free strong authentication based on the speaker's voice. A combined dual-channel input (close-talking acoustic microphone plus throat microphone) can be used both to make identification more robust and to clean up the speech input for more accurate transcription and translation.

The voice-related functions performed in this environment consist of:
- an ASV module responsible for speaker authentication;
- a voice liveness detection (VLD) module that uses both the acoustic and the throat-microphone channel to detect potential ‘replay attacks’ by a fraudster;
- a method for continuous monitoring of user authenticity in a “background listening” mode;
- automatic speech recognition;
- text-to-text language translation;
- text-to-speech synthesis (TTS).

Potential end users include, for instance, banks (customer interaction over telephony), health-care personnel (dictating medical reports), technical user support, emergency call center personnel, stockbrokers, and remote examination surveillance (verifying a student's identity). In general, the technology can be applied to any IP telephony service that can make use of the above web services.

The throat-microphone unit can be complemented by other sensors, such as a heart-rate sensor. Connected over Bluetooth, the combined equipment can be used, for example, for remote monitoring of elderly people in order to detect incipient coronary symptoms.

FIGURE 1: Diagram of the envisioned invention from the viewpoint of the user.
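The disclosure does not describe how the VLD module combines the two channels. One plausible cue, assumed here purely for illustration, is that a recording replayed through a loudspeaker excites the acoustic microphone but produces little or no vibration at the throat microphone, so the frame-energy envelopes of the two channels should be strongly correlated only for a live talker. The Python sketch below computes per-frame log energies for both channels and accepts the input as live when their Pearson correlation reaches a threshold. The function names, the frame length and hop, the threshold min_corr, and the synthetic signals are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def frame_log_energies(signal: Sequence[float], frame_len: int = 400,
                       hop: int = 160) -> List[float]:
    """Mean-square energy (dB) per frame; 400/160 samples = 25/10 ms at 16 kHz."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-10))  # floor avoids log(0)
    return energies


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def is_live(acoustic: Sequence[float], throat: Sequence[float],
            frame_len: int = 400, hop: int = 160, min_corr: float = 0.6) -> bool:
    """Hypothetical liveness rule: the throat channel must track the acoustic envelope."""
    ea = frame_log_energies(acoustic, frame_len, hop)
    et = frame_log_energies(throat, frame_len, hop)
    n = min(len(ea), len(et))
    if n < 2:
        return False  # too little audio to decide
    return pearson(ea[:n], et[:n]) >= min_corr


# Synthetic 1-second demo at 16 kHz: speech-like bursts in alternate 0.25 s slots.
fs = 16000
rng = random.Random(0)
envelope = [1.0 if (i // 4000) % 2 == 0 else 0.0 for i in range(fs)]
acoustic = [e * math.sin(2 * math.pi * 200 * i / fs) + rng.gauss(0.0, 1e-3)
            for i, e in enumerate(envelope)]
# Live talker: the throat mic picks up an attenuated copy of the same bursts.
live_throat = [0.3 * e * math.sin(2 * math.pi * 150 * i / fs) + rng.gauss(0.0, 1e-3)
               for i, e in enumerate(envelope)]
# Replay attack: loudspeaker audio reaches only the acoustic mic; throat mic hears noise.
replay_throat = [rng.gauss(0.0, 1e-3) for _ in range(fs)]

live_ok = is_live(acoustic, live_throat)
replay_ok = is_live(acoustic, replay_throat)
print(f"live accepted: {live_ok}, replay accepted: {replay_ok}")
```

A production detector would presumably combine a cue like this with spectral features and a trained classifier; a fixed correlation threshold is only the simplest possible decision rule.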