Real-Time Speech-to-Text Services

December 2015
Speech-to-text services increase the real-time access options for students who are deaf or hard
of hearing. While speech-to-text services may not be the solution for all students, those who
depend on visual communication and do not know sign language will definitely benefit. Other
students may choose interpreters in highly interactive settings but prefer captioning in more
lecture-based classes.
FAQs about real-time speech-to-text
What is speech-to-text?
“Speech-to-text” is an umbrella term used to describe
an accommodation where spoken communication and
other auditory information are translated into text in
real-time. A service provider types what is heard and
the text appears on a screen for the consumer to read.
Are there different types of speech-to-text?
Speech-to-text services can be divided into three
general categories: verbatim, meaning-for-meaning, and automatic speech recognition (ASR).
Verbatim speech-to-text service providers type every word spoken, including false starts, misspeaks, and
filler phrases. One hour of lecture will produce approximately 25 pages of transcript. CART
(Communication Access Realtime Translation) is a verbatim system that uses the same
technology as court reporting.
Meaning-for-meaning service providers listen to the spoken language, then translate it into
grammatically correct written language. They will typically eliminate false starts and misspeaks.
They use visual formatting such as bold, italics, and lists to reduce the number of words typed.
One hour of lecture will produce approximately 15 pages of transcript. C-Print and TypeWell
are meaning-for-meaning systems.
Automatic Speech Recognition (ASR) is software that translates spoken words into text. The
most effective method is to use a shadow voicer who repeats everything that is said, by all
members of the class, into a microphone. A shadow voicer also verbalizes all speaker
identification and punctuation. Current technology is not at the point where a microphone can
be put on an instructor and produce an accurate transcript.
Each system then displays the text on a computer monitor or other device for the consumer to
read.
How do I choose between a verbatim and a meaning-for-meaning system?
Choosing verbatim or meaning-for-meaning will depend on the needs of the consumer.
Some consumers will prefer verbatim because they can use their residual hearing to follow the
speaker and use the transcript as support for words or phrases that are missed. Also, students
in higher level classes may prefer to see every word that is spoken.
Some consumers will prefer meaning-for-meaning because the transcript is less dense and
easier to follow since spoken English is translated into more standard written English.
Before choosing, review the pros and cons of each system, in the specified setting, with the
consumer. It is also important to hire qualified service providers since the quality of the
captioning depends on the service provider, not the system being used.
Speech-to-Text in the College Setting
Traditionally students who are deaf have used
interpreting services in the educational setting.
However, there were few accommodations available
for students who do not use interpreters. They might
have sat in the front of the classroom, used an
assistive listening device, and received copies of
notes from a classmate but there were few options for
real-time access. Speech-to-text services have
opened up new options in communication access for
many individuals who have a significant hearing loss
but do not know sign language.
In the 1980s, CART began to be used in the postsecondary setting. C-Print was developed in
the early 1990s, and TypeWell was founded in 1999. With each new option, the demand for
speech-to-text in the classroom has grown. Research shows that real-time captioning benefits
not only the student with a hearing loss but all students who have access to the captions. A
study done by Aaron Steinfeld compared recall accuracy of students in a traditional lecture
environment with those who were in a lecture with the addition of real-time captions. The
results showed that the recall accuracy of hearing students went up by 9.8% while the recall
accuracy of deaf students increased by 149% (https://www.dcmp.org/caai/nadh275.pdf).
Here is what students and disability service professionals have to say about speech-to-text
services:
“When I went to the disability services office at the college, I couldn’t believe they actually had
captioning. I wish they would have had that in high school. Academically the closed captioning
helped me keep up with the class, the discussions in class, the open discussions. Because it
was challenging to kind of keep up with a group and kind of look at everybody’s lips to read their
lips. So the captions helped me out a lot in that area. It was very convenient.” (Student)
http://www.pepnet.org/resources/speech-to-text
“In a university setting, I’m held to the same high standards as all the other students yet, I do not
receive the same amount of information without captioning. With captioning, I was able to keep
up and compete on the same level as my fellow students. When put on the same level as my
fellow students, I obtained my goal by graduating from a major state university with top
honors!” (Student)
https://weconnectnow.wordpress.com/2011/08/27/the-impact-of-c-print-captioning-for-college-studentswho-are-deaf-or-hard-of-hearing/
“Most students who are deaf or hard of hearing have not seen speech-to-text services until they
came to our university. They are surprised and impressed with how much the real-time
captioning helps them in their classes. I wish all colleges would offer this accommodation. It
makes a big difference for the students.” (University Disability Services Coordinator)
If you’d like to learn more, check out these resources:
Speech to Text Services: An Overview of Real-Time Captioning
http://www.pepnet.org/resources/speech-to-text
A Guide to Speech-to-Text Services in the Postsecondary Setting
http://www.pepnet.org/resources/guidespeechtextservices
National Task Force on Quality Services in the Postsecondary Education of Deaf and Hard of
Hearing Students: Report on Real-Time Speech-to-Text Services
http://www.pepnet.org/resources/ntfspeech%20to%20text
The Benefit to the Deaf of Real-Time Captions in a Mainstream Classroom Environment
https://www.c2ccaptioning.com/aslnews.pdf
Communication Apps
Recently a popular reality TV show featured a deaf
contestant using an app that converts speech into
text and vice versa. The technology appeared to
give the deaf contestant the opportunity to interact
with other contestants without using an interpreter.
Does the app work that well, or is this just another
TV misconception, like “all deaf individuals can
lipread perfectly, even in the dark”?
This warranted a closer look at the available
speech/text apps and technology to answer the
question, “Is this feasible for daily use?” A review of various speech-to-text apps and
text-to-speech programs resulted in a list of benefits and drawbacks.
BENEFITS:
1. If a deaf or hard of hearing individual has near-standard speech usage, speech-to-text
apps can be an excellent way to take personal notes on the move. Several dictation
programs advertise to populations who live with processing challenges, such as ADHD
or dyslexia, and this can benefit those who process their information better verbally.
2. Some text-to-speech apps can be programmed with a set of established phrases. Using
this feature would be a great way to bypass writing in some interactions that occur
frequently, such as ordering coffee from the barista or asking for an appointment. This
would be effective in a limited number of scenarios.
3. The speech-to-text app can increase comprehensibility if a deaf or hard of hearing
individual needs some assistance understanding their conversation partner. This would
work during one-on-one interactions, or where only one person is speaking at a time.
4. Many programs and apps are available. Some come pre-installed on
smartphones, such as Siri or Google’s VoiceNote, while others can be purchased and
downloaded through online stores.
DRAWBACKS:
1. Across the board, the apps and programs are designed to either convert speech into
text, or text into speech, so two separate programs would need to be utilized to convert
from text to speech and then from speech to text. The time it will take to switch between
programs does not lend itself to effective outcomes.
2. The apps and programs, for the most part, are not successful in understanding speech
that is non-standard or heavily accented. Some speech-to-text programs, after a long
period of input, will be able to decipher one individual’s speech variations, but this does
not factor in the speech of other users. This means the apps can’t be taken around the
community and used to understand just anyone.
3. Very few programs offer a trial period, which necessitates purchasing several versions to
see which one works best with non-standard speech. With the average speech/text app
costing $9.99, high-quality programs going for $199 and up, and no guarantee
that a program will work with non-standard speech, it can become expensive.
RESULTS:
These apps and programs can be used successfully in highly specific situations, but the technology
has not yet reached the point where these programs are easy, flexible, effective, or feasible to use on a
daily basis in a variety of contexts.
If you’d like to learn more about these types of apps, try these search phrases in your favorite
search engine, the iTunes App Store, and Google Play: text to speech, speech to text, text to
voice, and voice to text.