Understand your conveyed meaning, and talk expressively

Agent-AI
A-7
Diversified input/output (DIO) speech recognition and speech synthesis technologies for automatic
interactive agents
We are developing diversified input/output (DIO) speech recognition technologies that understand not only the text content of
input speech but also the speaker’s emotion and intention, and DIO speech synthesis technologies that put considerate feeling
into output speech. These technologies can contribute to automatic interactive agents that empathize with users.
Features
■ Speech nuance extraction: it can understand the speaker’s emotion and true intention from speech input.
■ Speaker identification for families: it can identify from a short speech which family member is speaking.
■ DNN-based speech recognition: it can achieve accurate and fast dictation based on deep learning technology.
■ User-designed speech synthesis: it can synthesize expressive speech suited to each user.

Application Scenarios
■ Interactive agent that can enrich daily family life
■ Robot to provide care for senior citizens
■ Automatic answering agent in contact center

Diversified input/output (DIO) speech technologies
[Figure: two example dialogues and the processing flow.
Senior caring robot: “Hmm…” (Nuance: distress; User: grandpa) / (Anxiously) “What’s the matter, grandpa?” / “Oh, I’m looking for my glasses.” / (Slowly and kindly) “You always put them on your desk.”
Contact center automatic answering agent: “What is this regarding?” / “Hmm…, let me see…” (Nuance: impatience) / (Cheerfully) “Do you mean an advance booking?”
Processing flow: “Hmm…” -> DIO speech recognition (Dictation: “Hmm”; Nuance extraction: Distress; Speaker identification: Grandpa) -> Dialogue manager -> DIO speech synthesis (“What’s the matter, grandpa?”, Anxiously)]

Role in Agent-AI
Our technologies aim to contribute to interactive dialogue agents
by utilizing various kinds of information in speech.
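
As a rough illustration of the data flow described on this sheet, the sketch below passes the three recognition outputs (dictation, nuance, speaker) to a dialogue manager, which chooses a reply and a speaking style for synthesis. It is a toy sketch under stated assumptions, not the actual system: the names `RecognitionResult`, `SynthesisRequest`, `STYLE_FOR_NUANCE` and `dialogue_manager` are hypothetical, and the rule table simply reproduces the two example dialogues shown in the figure.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """Hypothetical container for the three DIO speech recognition outputs."""
    text: str     # dictation result, e.g. "Hmm"
    nuance: str   # extracted nuance, e.g. "distress"
    speaker: str  # identified speaker, e.g. "grandpa"


@dataclass
class SynthesisRequest:
    """Hypothetical input to DIO speech synthesis: what to say and how."""
    text: str
    style: str


# Assumed mapping taken from the figure's examples only; a real system
# would not use a fixed lookup table like this.
STYLE_FOR_NUANCE = {
    "distress": "anxiously",
    "impatience": "cheerfully",
}


def dialogue_manager(result: RecognitionResult) -> SynthesisRequest:
    """Choose a reply and a speaking style from the recognition outputs."""
    style = STYLE_FOR_NUANCE.get(result.nuance, "neutral")
    if result.nuance == "distress":
        reply = f"What's the matter, {result.speaker}?"
    elif result.nuance == "impatience":
        reply = "Do you mean an advance booking?"
    else:
        reply = "How can I help you?"
    return SynthesisRequest(text=reply, style=style)


if __name__ == "__main__":
    heard = RecognitionResult(text="Hmm", nuance="distress", speaker="grandpa")
    print(dialogue_manager(heard))
```

The point of the sketch is that the dictated text alone ("Hmm") carries almost no information; the reply and its speaking style are selected from the nuance and the speaker identity.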
〈Contact〉[email protected]
Copyright © 2016 NTT. All Rights Reserved.