Reach-Through-the-Screen: A New Metaphor
for Remote Collaboration
Jonathan Foote, Qiong Liu, Don Kimber, Patrick Chiu, and Frank Zhao
FX Palo Alto Laboratory Inc., 3400 Hillview Ave., Palo Alto, CA 94304 USA
Abstract. For some years, our group at FX Palo Alto Laboratory has
been developing technologies to support meeting recording, collaboration, and videoconferencing. This paper presents several systems that
use video as an active interface, allowing remote devices and information
to be accessed “through the screen.” For example, SPEC enables collaborative and automatic camera control through an active video window.
The NoteLook system allows a user to grab an image from a computer
display, annotate it with digital ink, then drag it to that or a different
display. The ePIC system facilitates natural control of multi-display and
multi-device presentation spaces, while the iLight system allows remote
users to “draw” with light on a local object. All our systems serve as
platforms for researching more sophisticated algorithms to support additional functionality and ease of use.
1 Introduction
The Immersive Conferencing group at FX Palo Alto Laboratory investigates
novel technologies for communication and collaboration. At FXPAL, our approach
is to extend commodity hardware and software with novel techniques and
interfaces that add functionality and improve ease of use. Our work builds on a
long tradition of research in this area at Xerox PARC, MIT, Sony, UNC, Stanford,
and elsewhere, and we apologize that space does not permit more than a cursory
review of related work.
1.1 “Reach-Through-the-Screen” Interaction
Many of our systems are unified by a common interaction metaphor we call
“RTS,” for “reach-through-the-screen.” In RTS, a video image of a particular
location becomes an active user interface. We use the intuition that “things on
the network are closer than they appear:” that is, devices visible in the video
are available for control over the network, even if they are in a remote location.
Devices and functions visible in the video image become available for the user
to control via the video. For example, the video image of a display device becomes an active interface to that device: via RTS, a user may drag and drop
a presentation file onto the video image, where it is then displayed. Similarly,
dragging a file onto a printer image will cause the printer to print that file. By
interacting with an image, users are not encumbered with headgear or special
eyeglasses. This is a non-immersive “window-on-the-world” (WoW) style of
augmented reality [1]. Figure 1 shows some of the interaction possibilities available
with RTS (though we have not fully implemented all of them). For example,
right-clicking on an audio device could call up the control panel for that device.
A hallmark of RTS interaction is that in many cases it gives remote participants
richer control than local users. In this paper, we review four related systems that
use the RTS metaphor. The FlySPEC system allows a camera to be controlled
by pointing or circling regions of interest in the video. The NoteLook system
allows presentation graphics to be “dragged” from a video image, annotated,
and then “dropped” onto a display device to display the annotations. The ePIC
system allows simple control of multi-screen presentation displays and devices
using the RTS metaphor, while the iLight camera-projector system allows
real-world objects to be annotated from a video image. Drawing on the video image
causes the drawn annotations to be projected onto the scene. With all these
systems, we have implemented a basic robust version as a platform for both use
and research. In many cases, we plan to augment the systems with automatic
functionality from computer vision, sensors, and/or machine learning techniques.

Fig. 1. (Left) “Reach through the screen” interface allows device access through an active video window.
Fig. 2. (Right) FlySPEC user interface. Circling a region in the panoramic video points the PTZ camera for a close-up view.
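To make the metaphor concrete, the minimal sketch below shows one way an RTS client could route a file dropped on the video image to the device pictured at that spot. The hotspot rectangles, device names, and handler functions are hypothetical placeholders for illustration, not our actual implementation.

```python
# Sketch of RTS-style dispatch: hotspots map regions of the video image to
# networked devices, so a file dropped "on" a device in the video is routed
# to that device. All names and handlers here are hypothetical.

from dataclasses import dataclass

@dataclass
class Hotspot:
    name: str        # device identifier on the network
    rect: tuple      # (x0, y0, x1, y1) in video-image pixels
    actions: dict    # drop-type -> handler callable

    def contains(self, x, y):
        x0, y0, x1, y1 = self.rect
        return x0 <= x <= x1 and y0 <= y <= y1

def show_file(device, path):     # hypothetical display handler
    print(f"displaying {path} on {device}")

def print_file(device, path):    # hypothetical printer handler
    print(f"printing {path} on {device}")

hotspots = [
    Hotspot("front-display", (40, 30, 320, 240), {"file": show_file}),
    Hotspot("room-printer", (400, 180, 470, 260), {"file": print_file}),
]

def on_drop(x, y, path):
    """Route a file dropped at video coordinates (x, y) to the device there."""
    for spot in hotspots:
        if spot.contains(x, y):
            spot.actions["file"](spot.name, path)
            return
    print("drop ignored: no device at that point")

on_drop(100, 100, "slides.ppt")  # lands on the display hotspot
on_drop(430, 200, "notes.pdf")   # lands on the printer hotspot
```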
2 The FlySPEC Camera System
A traditional pan/tilt/zoom (PTZ) camera system cannot allow multiple remote users to point the camera to different positions at the same time. To serve
different viewing requests, a straightforward approach is to capture the event
with a panoramic camera that covers every possible view and serve different
viewing requests through electronic pan/tilt/zoom. However, a panoramic camera
generally lacks the required resolution. FlySPEC is the name of our SPot
Enhanced Camera system, constructed by combining motorized PTZ cameras
with a panoramic camera [3]. Using this combined system, users view a
low-resolution panoramic view of a scene simultaneously with a customized
close-up video. The panoramic video is the same for all users. The close-up video
for a user is selected by marking a region in the panoramic video with a gesture.
Thus the panoramic video serves as an interface: clicking or circling objects in
the panorama yields a zoomed-in video close-up.
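As a rough illustration of how a circled panorama region could become a camera request, the sketch below assumes a simple linear pixel-to-angle model for the panorama. The image dimensions, vertical field of view, and the mapping itself are assumptions for illustration, not FlySPEC's actual calibration.

```python
# Sketch: turn a region circled in the panorama into a PTZ request.
# Assumes a linear pixel-to-angle model; real calibration is more involved.

PANO_W, PANO_H = 1000, 250          # panorama size in pixels (assumed)
PANO_HFOV, PANO_VFOV = 110.0, 30.0  # degrees; 110 per the paper, VFOV assumed

def region_to_ptz(x0, y0, x1, y1):
    """Map a marked rectangle (pixels) to pan/tilt (deg) and a zoom factor."""
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    pan = (cx / PANO_W - 0.5) * PANO_HFOV    # 0 deg = panorama center
    tilt = (0.5 - cy / PANO_H) * PANO_VFOV
    # Zoom so the requested region fills the close-up view.
    zoom = min(PANO_W / max(x1 - x0, 1), PANO_H / max(y1 - y0, 1))
    return pan, tilt, zoom

print(region_to_ptz(600, 100, 700, 150))  # -> pan right, centered tilt, 5x zoom
```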
2.1 Control Interface

A FlySPEC system has a limited number of physical PTZ cameras. A basic
function of the FlySPEC system is to minimize conflicting PTZ requests and
maximize the view quality for all remote users. This is done by serving view
requests with a combination of cropped and zoomed images from both cameras.
Each user therefore controls a “virtual camera” that behaves like a personal
PTZ camera (but may have reduced resolution). This is a problem for remote
applications like classes, seminars, or sports games. To tackle the problem caused
by multiple users, we optimize the view returned to each user by maximizing a
cost function. Our FlySPEC system currently uses the overall electronic zoom
factor as the cost function. Specifically, our management software tries to move
the PTZ cameras to positions that minimize the average electronic zoom for all
users, thus resulting in the best images for all [2]. For example, Figure 3 shows
multiple human zoom requests as rectangles; this is satisfied by zooming the
camera to the bounding box of all requests. Our current research is to use machine
learning to associate sensor data with human control actions to automatically
develop good camera control strategies. For example, given motion and audio
sensors, we could learn that motion near the podium results in camera movement
from human controllers, while motion near the door does not.

Fig. 3. Multiple user requests for SPEC camera zoom
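The following sketch illustrates the idea behind this cost: point the physical camera at the bounding box of all requests, then measure the electronic (digital) zoom each user still needs to crop a close-up from that shot. It is a toy version of the optimization described in [2], with rectangle-based requests assumed for simplicity.

```python
# Toy version of FlySPEC camera management: aim at the bounding box of all
# user requests and score the result by the average electronic zoom still
# needed per user. Illustrative only, not the actual management software.

def bounding_box(requests):
    """Smallest rectangle (x0, y0, x1, y1) covering every user request."""
    return (min(r[0] for r in requests), min(r[1] for r in requests),
            max(r[2] for r in requests), max(r[3] for r in requests))

def electronic_zoom(view, request):
    """Digital magnification needed to crop `request` out of `view`."""
    vw, vh = view[2] - view[0], view[3] - view[1]
    rw, rh = request[2] - request[0], request[3] - request[1]
    return max(vw / rw, vh / rh)

def average_electronic_zoom(view, requests):
    return sum(electronic_zoom(view, r) for r in requests) / len(requests)

requests = [(100, 100, 200, 180), (220, 120, 300, 200), (150, 90, 260, 170)]
view = bounding_box(requests)          # where the physical camera points
print(view, average_electronic_zoom(view, requests))
```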
2.2 Camera registration and calibration

Panoramic video is supplied by the FlyCam system developed at our lab, which
employs multiple cameras and stitches the video images together [4]. A two-camera
system provides panoramic video with a horizontal field of view of approximately
110 degrees. (Recently, we have been constructing cheaper but low-resolution
panoramic cameras by equipping inexpensive webcams with wide-angle lenses.)
A chief asset of the panoramic camera is that it is fixed with respect to the scene.
This allows many sophisticated functions to be accomplished without having to
compensate for camera motion. For example, registering devices with the areas
where they appear in the panoramic image allows them to be manipulated through
those areas, without having to locate them with expensive and fragile computer
vision techniques.
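A minimal sketch of the stitching idea follows: two horizontally overlapping frames are pasted into one panorama, cross-fading across the overlap. The actual FlyCam registration and blending [4] are considerably more sophisticated.

```python
# Minimal sketch of two-camera stitching with a cross-faded seam.
# Images are lists of rows of grayscale pixels; equal heights assumed.

def stitch(left, right, overlap):
    """Paste two overlapping frames into one panorama row by row."""
    width_l = len(left[0])
    pano = []
    for lrow, rrow in zip(left, right):
        row = lrow[:width_l - overlap]               # left-only part
        for i in range(overlap):                     # blended seam
            a = (i + 1) / (overlap + 1)              # fade weight 0 -> 1
            row.append((1 - a) * lrow[width_l - overlap + i] + a * rrow[i])
        row.extend(rrow[overlap:])                   # right-only part
        pano.append(row)
    return pano

left = [[10, 10, 20, 30]] * 2     # toy 4-pixel-wide frames, 2 rows each
right = [[25, 35, 40, 40]] * 2
print(stitch(left, right, overlap=2))   # 6-pixel-wide panorama rows
```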
2.3 Devices and Hotspots
Availability and accessibility of devices are graphically depicted by the hotspots
overlaid on the active video window. In practice, we use the fixed panoramic
video image and define hotspots to be particular regions of this image. The
possible device operations are indicated by the colors and styles of the highlight
borders. We are currently working on automatically registering the PTZ camera
image with the panoramic image, so that hotspots become active in the zoomed-in
video as well.
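One simple way to realize this registration, sketched below, is to transfer a hotspot rectangle from panorama coordinates into the current PTZ image through a planar homography. Treating the mapping as a single fixed 3x3 homography is an assumption made here for illustration; the real panorama-to-PTZ geometry depends on the optics and the current camera pose.

```python
# Sketch: transfer a panorama-space hotspot into PTZ-image coordinates via
# a planar homography H (assumed known from calibration for this example).

def apply_homography(H, x, y):
    """Map a point through a 3x3 homography given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def transfer_hotspot(H, rect):
    """Axis-aligned bound of a hotspot rectangle mapped into the PTZ view."""
    x0, y0, x1, y1 = rect
    corners = [apply_homography(H, x, y)
               for (x, y) in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]]
    xs, ys = zip(*corners)
    return (min(xs), min(ys), max(xs), max(ys))

# Toy homography: 2x scale plus translation, standing in for a calibrated one.
H = [[2.0, 0.0, -100.0],
     [0.0, 2.0, -50.0],
     [0.0, 0.0, 1.0]]
print(transfer_hotspot(H, (120, 80, 200, 140)))  # -> (140, 110, 300, 230)
```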
3 The NoteLook Collaboration System

The NoteLook annotation system is designed to run on a wireless pen tablet
[5]. NoteLook allows collaboration and annotation around digital presentation
graphics, using digital ink. In operation, the NoteLook system displays a
thumbnail view of the conference room in the top left corner. Panoramic video
shows a view of the environment with hotspots overlaid on the displays appearing
in the video image. “Hotspots” that correspond to active devices are outlined
with colored rectangle overlays on the live video image. Besides the two video
channels from the front and back meeting room cameras, additional channels
may be hooked up during a teleconference to interact with displays at a remote
location. A user can grab an image of a slide showing on one of the wall displays
by dragging from that display’s hotspot into the note page area. The image
appears in full resolution, and can be drawn upon with digital ink. After
annotating the slide, the user can send the annotated image up to a selected wall
display with a drag-and-drop operation. Users may also scribble notes on blank
note pages with the pen. The image of any note page, whether it contains an
annotated slide or only ink strokes, can be “beamed up” to any wall display. At
the end of a session, the user can save the note pages for publishing on the Web.
Note pages are time-stamped and automatically associated with any recorded
meeting video in a Web interface, such that clicking on a particular annotation
starts video playback at the time the annotation was made. More details and
features of NoteLook are described in [6].

Fig. 4. (Left) NoteLook pen annotation client
Fig. 5. (Right) NoteLook pen annotation interface showing active panoramic image with hotspots
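A minimal sketch of the time-stamping idea follows: each ink stroke records the meeting time at which it was drawn, so a later click on the stroke can seek the recorded video. The data structures and the seek call are illustrative stand-ins, not NoteLook's actual format.

```python
# Sketch of NoteLook-style time-stamped ink: clicking a stroke seeks the
# recorded meeting video to the moment the stroke was drawn. Hypothetical
# structures for illustration only.

from dataclasses import dataclass, field

@dataclass
class Stroke:
    points: list      # [(x, y), ...] pen samples
    timestamp: float  # seconds since the meeting started

@dataclass
class NotePage:
    strokes: list = field(default_factory=list)

    def add_stroke(self, points, now):
        self.strokes.append(Stroke(points, now))

    def stroke_at(self, x, y, radius=5.0):
        """Find the stroke nearest to a click, within `radius` pixels."""
        def dist(s):
            return min((px - x) ** 2 + (py - y) ** 2 for px, py in s.points)
        hits = [s for s in self.strokes if dist(s) <= radius ** 2]
        return min(hits, key=dist) if hits else None

def seek_video(t):    # stand-in for the real player control
    print(f"seek recorded meeting video to t={t:.1f}s")

page = NotePage()
page.add_stroke([(10, 10), (40, 12)], now=62.5)
page.add_stroke([(80, 50), (90, 60)], now=340.0)

clicked = page.stroke_at(82, 52)
if clicked:
    seek_video(clicked.timestamp)   # -> t=340.0s
```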
4 The ePIC Presentation Control System
Today, more and more meeting environments are equipped with a richer variety
of displays, often including secondary or multiple projectors or large-screen
displays. This gives a presenter richer options for conveying information to others.
For example, a presenter can use the primary display for a text presentation,
while using another display to show a supporting figure or video. Existing
authoring tools, such as PowerPoint, are excellent for creating presentations for a
single display, but provide no support for multiple devices or remote
presentations. To enable better use of media-rich environments, we designed ePIC,
a tool for authoring and replaying presentations on arbitrary device configurations
that also supports real-time control [7] (as in FlySPEC and NoteLook).
ePIC complements, but does not replace, tools used to author specific media.
It can organize media prepared for simple devices and synchronously present
them in one or more multimedia venues. For example, the ePIC system can
import a conventional PowerPoint file and re-author it for effective presentation on
multiple displays. Our prototype supports arbitrary configurations of displays,
printers, speakers, and room lights. ePIC, which stands for Environment Picking
Image Canvas, uses live or static images of the presentation environment as a
graphical user interface (GUI). This allows users to visually select and control
presentation devices, even in remote locations. Users may drag slides or media
files onto a visual representation of the intended device. For example, a user
can drag a slide thumbnail on top of any of the displays visible in Figure 6 to
show a slide on that display. The live panoramic video is especially useful for
remote presentations, as it allows the presenter to see exactly what the audience
is seeing at any time. Figure 6 shows the ePIC user interface. At top left is the
active video canvas, showing room control hotspots (besides displays, ePIC can
control other room features such as loudspeakers, printers, cameras, and even
room lighting). At top right is the slide timeline. Users can arrange slides and
room control actions by time in the presentation and by device. The bottom left
panel shows a detailed slide view, or a zoomed video image from the PTZ camera.
At bottom right, the slides in a PowerPoint presentation are available either
for immediate drag-and-drop operations or for arranging into a multiscreen
presentation. ePIC provides both a preview mode and a playback mode. In the
preview mode, the actual room devices are not controlled. Rather, a preview
of the presentation is rendered in the video image canvas. By displaying slide
thumbnails in the hotspot bounding boxes (see Figure 4), the user can navigate
the presentation using the usual controls, and see a visual indication of what
slides are rendered on each display device. The image canvas can contain a
rendering of a VRML model of the venue, if available. In this case, slide thumbnails
are rendered in 3-D on the display surfaces. Because the rendering viewpoint can
be selected, the user can preview the presentation from any point in the venue.
Figure 7 shows an example of a VRML output, including a pop-up menu for
selecting the viewpoint. Users may also zoom in and out of the rendering to see
details or overviews of the presentation. Currently, we are investigating ways to
incorporate machine learning to assist multi-screen presentations and optimal
slide placement [9].

Fig. 6. ePIC presentation authoring and control
Fig. 7. Previewing ePIC presentations on video (left) and VRML (right) representations
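The sketch below illustrates one way such a timeline could be modeled: events schedule media on a device at a time into the presentation, and the same event stream drives either the real devices (playback) or thumbnails on the image canvas (preview). The event fields and both mode handlers are assumptions for illustration, not ePIC's internals.

```python
# Sketch of an ePIC-style timeline: each event schedules media on a device
# at a given presentation time. Playback drives real devices; preview only
# updates thumbnails on the image canvas. Hypothetical names throughout.

from dataclasses import dataclass

@dataclass
class Event:
    time: float    # seconds into the presentation
    device: str    # e.g. "left-display", "room-lights"
    media: str     # slide image, video file, or control value

def playback(event):
    print(f"[room]   {event.device} <- {event.media}")   # drive the device

def preview(event):
    print(f"[canvas] draw {event.media} thumbnail in hotspot of {event.device}")

timeline = sorted([
    Event(0.0, "left-display", "slide-01.png"),
    Event(0.0, "right-display", "agenda.png"),
    Event(30.0, "left-display", "slide-02.png"),
    Event(30.0, "room-lights", "dim"),
], key=lambda e: e.time)

def run(timeline, t, mode):
    """Apply every event scheduled at or before time t in the given mode."""
    handler = playback if mode == "playback" else preview
    for event in (e for e in timeline if e.time <= t):
        handler(event)

run(timeline, 30.0, "preview")
```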
5 The iLight Projection System
The iLight system allows remote users to draw using light projected on a real-life
object or scene [8]. Using an aligned camera/projector system, live video from
the scene is presented as a drawable video image to one or more remote users.
Graphics drawn on the video canvas are projected onto the scene visible in the
video. Thus remote users may annotate particular scene objects or features for
local viewers, and can also display text and project arbitrary digital images.
Unlike related systems, there are no video feedback problems, nor are computer
vision techniques required to detect objects or actions.

Imagine a shared whiteboard as an example scenario. The iLight system’s video
projector “draws” remote images on a local whiteboard, while sending live
whiteboard video to the remote drawer. In operation, remote users draw on the
camera image with familiar graphical tools such as rectangles and (digital) pens,
while local users draw directly on the whiteboard using (physical) dry-erase
markers. The local projector, fixed with respect to the video camera, projects the
remotely drawn images onto the whiteboard. Because the iLight camera and
projector are exactly aligned, the projected image is drawn exactly where the
remote user intends. Local users see the ink and projected light intermixed on the
whiteboard, while remote users see a camera image of the same. Thus local and
remote users can freely draw and annotate each other’s work as if they shared a
local whiteboard. Though neither can erase the other’s marks, the remote user
has functionality not available with physical ink. For example, the remote user
may project any arbitrary image, as well as copy-and-paste regions of the camera
image. We are currently investigating using camera feedback to enhance the
projection onto surfaces with nonuniform reflectivity. By monitoring the reflection
visible in the camera image, it is possible to modify the projected image
to compensate for color and brightness variations in the scene. For example,
when projecting on a checkerboard surface, the projection can be brightened in
the dark squares and dimmed in the light squares to result in a more uniform
perceived illumination.

Fig. 8. (Left) iLight camera/projector system
Fig. 9. (Right) iLight user interface. User drawings on the video window are projected onto the actual scene.
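The sketch below works through this compensation under idealized assumptions (perfectly aligned devices, linear response, grayscale): the camera's view of a known projection yields a per-pixel reflectance estimate, and the projector output is divided by it, clipped at the projector's maximum. This is an illustration of the idea, not the iLight implementation.

```python
# Sketch of reflectivity compensation: brighten the projection where the
# surface is dark and dim it where the surface is bright, so the perceived
# image approaches the intended one. Idealized linear model for illustration.

def estimate_albedo(camera, projected):
    """Per-pixel surface reflectance: observed brightness / light sent."""
    return [[c / max(p, 1e-6) for c, p in zip(crow, prow)]
            for crow, prow in zip(camera, projected)]

def compensate(target, albedo, max_out=1.0):
    """Projector output so that output * albedo approximates the target.
    Clipping at max_out means very dark patches may still fall short."""
    return [[min(t / max(a, 1e-6), max_out) for t, a in zip(trow, arow)]
            for trow, arow in zip(target, albedo)]

projected = [[1.0, 1.0], [1.0, 1.0]]    # calibration step: project full white
camera = [[0.25, 1.0], [1.0, 0.25]]     # observed reflection (checkerboard)
albedo = estimate_albedo(camera, projected)
target = [[0.5, 0.5], [0.5, 0.5]]       # uniform desired brightness
print(compensate(target, albedo))       # -> [[1.0, 0.5], [0.5, 1.0]]
```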
6 Conclusion
We have presented four systems that support remote communication and
collaboration, using a powerful and intuitive “reach-through-the-screen” interface.
Perhaps most importantly, these systems serve as platforms for investigating
algorithms, such as computer vision and machine learning, that will hopefully
support additional advanced functions and ease of use.
References
1. M. Tani, K. Yamaashi, K. Tanikoshi, M. Futakawa, and S. Tanifuji, “Object-oriented video: Interaction with real-world objects through live video.” In Proc. of CHI ’92, ACM Press, pp. 593–598, 711–712.
2. Q. Liu and D. Kimber, “Learning Automatic Video Capture from Human Camera Operations.” In Proc. IEEE Intl. Conf. on Image Processing.
3. Q. Liu, D. Kimber, J. Foote, L. Wilcox, and J. Boreczky, “FLYSPEC: A Multi-User Video Camera System with Hybrid Human and Automatic Control.” In Proc. ACM Multimedia 2002, pp. 484–492, Juan-les-Pins, France, December 1–6, 2002.
4. J. Foote and D. Kimber, “FlyCam: Practical Panoramic Video.” In Proc. IEEE Intl. Conf. on Multimedia and Expo, vol. III, pp. 1419–1422, 2000.
5. P. Chiu, Q. Liu, J. Boreczky, J. Foote, T. Fuse, D. Kimber, S. Lertsithichai, and C. Liao, “Manipulating and Annotating Slides in a Multi-Display Environment.” In Proc. of INTERACT ’03, pp. 583–590.
6. P. Chiu, A. Kapuskar, S. Reitmeier, and L. Wilcox, “NoteLook: Taking Notes in Meetings with Digital Video and Ink.” In Proc. ACM Multimedia ’99, Orlando, Florida, November 1999.
7. C. Liao, Q. Liu, D. Kimber, P. Chiu, J. Foote, and L. Wilcox, “Shared Interactive Video for Teleconferencing.” In Proc. ACM Multimedia 2003, pp. 546–554.
8. J. Foote and D. Kimber, “Annotating Remote Reality with iLight,” in preparation.
9. Q. Liu, F. Zhao, and D. Kimber, “Computer Assisted Presentation Authoring for Enhanced Multimedia Venues,” in preparation.