Multimedia Search and Retrieval: Innovative Applications and Integration of
Searchable Images, Graphs, Video, and Audio to Augment and Complement
Textual Search Results
Lorrie Apple Johnson and Brian Hitson
U.S. Department of Energy
Office of Scientific and Technical Information
Behrooz Chitsaz
Microsoft Research
In 2008, under the auspices of the International Council for Scientific and Technical Information
(ICSTI)’s Technical Activities Coordinating Committee (TACC), the concept for a project on
multimedia search and retrieval was conceived. The project’s focus was to explore approaches
for enhancing access to multimedia scientific data by demonstrating an identification scheme for
video, audio, images, graphs, and other visualization tools as well as the integration of these
media types into the search and retrieval capabilities of prominent scientific search engines. This
final project report discusses objectives, methodology, technology use and adaptation, and
outcomes, including the successful release of the U.S. Department of Energy’s ScienceCinema
(http://www.osti.gov/sciencecinema/) website, developed in partnership with Microsoft
Research.
Video, audio, images, and other types of multimedia have the potential to greatly enhance the
usefulness and communicative abilities of traditional text-based information collections. These
new forms of scientific information, including multimedia, numeric data, and social media, are
emerging rapidly, with a significant increase observed just over the past four years. Many
scientific conferences and symposia, for example, are now recorded, and the presentations are
offered to attendees and others in video format. Most of the Department of Energy (DOE)
research laboratories have created their own YouTube sites, and are posting videos of guest
lectures, experimental procedures, and roundtable discussions. Likewise, the European
Organization for Nuclear Research (CERN) maintains a large collection of video and audio
files showing scientific experiments and lectures. Other organizations, such as professional
societies and universities, collect and maintain multimedia science information, and continued
proliferation of multimedia as a communication medium within the sciences is foreseen.
While visually appealing, and offering numerous benefits above and beyond text-based
information, multimedia information does present some special opportunities and challenges.
Some of the challenges to traditional search and retrieval mechanisms include the lack of written
transcripts, minimal metadata (no abstracts, keywords, or subject categories), and complex
scientific/technical/medical vocabulary. Additionally, many of these videos run an hour or
more. For a scientist interested in only one part of a video or experiment,
watching a sixty-minute video could pose a substantial time burden.
To overcome some of these challenges and barriers, the DOE’s Office of Scientific and
Technical Information (OSTI) partnered with Microsoft Research on this project. OSTI’s
mission is to disseminate the research results emanating from the Department of Energy’s
research investment in basic and applied sciences (over 10 billion USD per year). Since the
1940s, OSTI has carried out this mission for DOE and its predecessor agencies. Initially,
collection of research results in the form of technical reports and other publications in paper
format was the standard procedure. In the 1990s, the transition to publications in electronic
formats occurred. Today, yet another transition is underway, towards the collection of
multimedia science information.
Facilitated by the ICSTI TACC, OSTI and Microsoft Research formed a collaborative
partnership to work on search and retrieval methods for multimedia science information. A
Microsoft research project called Microsoft Research Audio Video Indexing System (MAVIS)
was identified as an excellent candidate for use in a prototype system. MAVIS uses
state-of-the-art audio indexing technology to enable search and retrieval of spoken words within a video,
performing much like a search for text-based information.
Although speech recognition accuracy has increased over the years as a result of improved
algorithms and faster, cheaper computational capabilities, automatic transcripts of technical
speech content are often not accurate enough to provide high quality search results. To enable
high quality search results, along with the ability to “click and play,” MAVIS uses a multi-pass
approach with automatic vocabulary adaptation that uses web search and natural language
processing to learn the terminology used in each multimedia file being processed. During the
speech recognition phase, MAVIS stores word alternatives to increase the probability that users
will find what they are searching for. The result of the recognition phase is referred to as the
Audio Index Blob (AIB), which is indexed by the SQL full text indexing service, ultimately
resulting in speech content that can be searched in much the same way as textual metadata.
Given that the multi-pass speech recognition process is computationally intensive, the MAVIS
speech recognition process runs as a Windows Azure service, so organizations do not have to
invest in speech indexing infrastructure.
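The value of storing word alternatives can be illustrated with a minimal sketch. This is not the MAVIS implementation or the actual AIB format, neither of which is described in this report; the `Hypothesis` structure, the confidence scores, and the `min_confidence` threshold are assumptions made purely for illustration. The sketch shows how keeping lower-scoring candidates lets a search term be found even when it was not the recognizer's top choice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate word for a stretch of audio (hypothetical structure)."""
    word: str
    start_sec: float
    confidence: float  # assumed recognizer score in [0, 1]


def build_index(slots):
    """Map each candidate word to the hypotheses where it may have been spoken.

    `slots` is a list of lists: each inner list holds the alternatives
    kept for one position in the audio.
    """
    index = defaultdict(list)
    for alternatives in slots:
        for hyp in alternatives:
            index[hyp.word.lower()].append(hyp)
    return index


def search(index, term, min_confidence=0.2):
    """Return hits for `term`, best first, dropping very unlikely alternatives."""
    hits = [h for h in index.get(term.lower(), []) if h.confidence >= min_confidence]
    return sorted(hits, key=lambda h: h.confidence, reverse=True)


# Toy data: at 12.4 s the top-scoring word is "buy", but "biofuels" is kept
# as a lower-scoring alternative, so a search for it still succeeds.
slots = [
    [Hypothesis("advanced", 11.8, 0.93)],
    [Hypothesis("buy", 12.4, 0.48), Hypothesis("biofuels", 12.4, 0.41)],
    [Hypothesis("research", 13.1, 0.90)],
]
index = build_index(slots)
hits = search(index, "biofuels")
```

Had only the single best word per position been stored, the query for "biofuels" in this toy example would have returned nothing.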
Over the course of about two years, OSTI collected over 1,000 hours of video files from the DOE’s
national laboratories and research facilities. RSS feeds with metadata and URLs for the videos
were sent to Microsoft Research, where audio indexing was performed using MAVIS. The
resulting AIBs were returned to OSTI, and imported into OSTI’s SQL servers. The end result
allows users to search for a precise term within the video, and be directed to the exact point in
the video where the term was spoken. Using MAVIS technology, OSTI developed a web
product called ScienceCinema, which was officially announced and released at the ICSTI Winter
Workshop in Redmond, WA on 9 February 2011. ScienceCinema represents a ground-breaking
capability in multimedia search and retrieval. For the first time, the public has access to a large
audio-indexed and searchable video collection of DOE-sponsored science information.
ScienceCinema website (http://www.osti.gov/sciencecinema): a search for U.S. Department of
Energy videos containing the spoken term “biofuels.” Snippets indicate the points in the video
where the speaker said the word “biofuels.”
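The idea of directing a user to the exact point where a term was spoken can be sketched as follows. This is an illustrative example only, not ScienceCinema's code: it assumes the index can be reduced to a list of (start time, word) pairs, and the function names `format_offset` and `find_snippets` and the sample data are hypothetical.

```python
def format_offset(seconds):
    """Render a time offset as H:MM:SS for display next to a snippet."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_snippets(timed_words, term, context=2):
    """Return (offset, snippet) pairs for each place `term` was spoken.

    `timed_words` is a list of (start_seconds, word) tuples in spoken order;
    `context` is the number of neighboring words shown on each side.
    """
    term = term.lower()
    results = []
    for i, (start, word) in enumerate(timed_words):
        if word.lower() == term:
            lo = max(0, i - context)
            hi = min(len(timed_words), i + context + 1)
            snippet = " ".join(w for _, w in timed_words[lo:hi])
            results.append((format_offset(start), snippet))
    return results


# Hypothetical time-stamped words from one long video.
timed_words = [
    (754.0, "next"), (754.5, "generation"), (755.2, "biofuels"),
    (756.0, "from"), (756.3, "algae"), (3671.0, "biofuels"), (3671.8, "scale"),
]
snippets = find_snippets(timed_words, "biofuels")
```

Each returned offset could serve as the playback start position, so a viewer can skip the rest of an hour-long recording.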
Following the release of ScienceCinema, OSTI and Microsoft were approached by CERN,
which, as stated earlier, also maintains a large collection of scientific and technical videos and
other multimedia information. The U.S. Department of Energy has long had a productive
relationship with CERN, which is a world leader in physics research. CERN volunteered its
multimedia material for inclusion in ScienceCinema, and a partnership was formed with OSTI to
apply the speech indexing technology to CERN files and to make them searchable through
ScienceCinema. The first installment of CERN multimedia content was added in May 2011, and
additional content will be added on an ongoing basis.
Examples of Videos/Audio Files from CERN containing the spoken term “particles”
As part of OSTI’s ongoing processes for collecting and disseminating DOE’s R&D results,
additional multimedia content from DOE researchers, from within both the research laboratory
and university communities, is also expected to be added to ScienceCinema over time.
In June 2011, at the ICSTI Annual Conference in Beijing, yet another milestone was reached in
multimedia search and retrieval of science information. OSTI announced a new tool in scientific
discovery that combines federated search with speech-indexing technology. Online searches
for scientific information within large search portals, such as WorldWideScience.org, had
heretofore been limited to text-based information. Now, users of such portals have access to
multimedia information, and can search and view multimedia alongside textual information on
the same topic.
ScienceCinema results appearing in WorldWideScience.org
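A federated search that returns multimedia alongside text can be sketched in a few lines. The example below is a simplified illustration and does not reflect how WorldWideScience.org is actually built; the two source functions, their sample records, and the `federated_search` helper are all hypothetical stand-ins. It shows one query being sent to a text source and a speech-indexed video source, with the results merged into a single list.

```python
def search_text(query):
    """Stand-in for a text-based source (hypothetical data)."""
    docs = [
        {"title": "Biofuels from algae: a technical report", "type": "text"},
        {"title": "Particle detector calibration", "type": "text"},
    ]
    return [d for d in docs if query.lower() in d["title"].lower()]


def search_video(query):
    """Stand-in for a speech-indexed video source (hypothetical data)."""
    videos = [
        {"title": "Lab lecture on energy crops", "type": "video",
         "spoken_terms": {"biofuels", "switchgrass"}, "offset_sec": 755},
    ]
    return [v for v in videos if query.lower() in v["spoken_terms"]]


def federated_search(query, sources):
    """Send one query to every source and collect results, tagged by source."""
    merged = []
    for name, source in sources.items():
        try:
            results = source(query)
        except Exception:
            # One failing source should not block the others.
            continue
        for r in results:
            merged.append({**r, "source": name})
    return merged


sources = {"reports": search_text, "video": search_video}
results = federated_search("biofuels", sources)
```

Note that the video is matched on its spoken terms rather than its title, which is what allows a recording with sparse metadata to appear next to text documents on the same topic.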
With the speech-recognition search technology made possible through ScienceCinema (via
MAVIS), portals like WorldWideScience.org can now search multimedia information, which
vastly extends the availability and accessibility of multimedia science
information. ScienceCinema, by enabling easy access to multimedia science information,
maximizes the use of DOE and CERN research results, which could ultimately lead to new
breakthroughs and benefits to the scientific community.