Stream Description

CLUE Framework
First Draft
IETF - 81
July, 2011
Allyn Romanow
Mark Duckworth
Andy Pepperell
Brian Baldino
Multiple Media Streams
Figure: video and audio at each of three sites: London, Dallas, Paris
Challenges
• Usable now: current functionality
• Extensible: future functionality
• Simple: practical to implement
What’s Needed?

MEDIA CAPTURE DESCRIPTION

CHOOSING STREAMS
Process
1. Consumer sends hints to provider
2. Provider sends capabilities
3. Consumer chooses streams
(Not a negotiation in the strict sense: two one-way messages)
Structure of Information
Figure: media captures (audio or video), each with attributes, organized into capture sets, simultaneous transmission sets, and encoding groups
Media Capture Description
Mark Duckworth
Media Capture & Attributes
Figure: capture sets, simultaneous transmission sets, and encoding groups, each built from media captures (audio or video) with attributes
Audio attributes
• Purpose (role): main, presentation
• Mixed: true/false
• Channel format: linear array, stereo, mono
• Linear position: 0 to 100
EXTENSIBILITY
Video attributes
• Purpose (role): main, presentation
• Composed: true/false
• Auto switched: true/false
• Spatial scale: image width
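The audio and video attribute lists can be modeled as plain data. This is a minimal Python sketch, not a syntax defined by the framework: the class and field names (Purpose, ChannelFormat, AudioCapture, VideoCapture) are hypothetical, the attribute values in the usage lines are invented for illustration, and the unit of image width is left open because the slides do not give one. Because the attribute set is marked as extensible, a real encoding would also have to carry attributes a receiver does not recognize; this sketch does not attempt that.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Purpose(Enum):
    """Purpose (role) values listed for both audio and video captures."""
    MAIN = "main"
    PRESENTATION = "presentation"


class ChannelFormat(Enum):
    LINEAR_ARRAY = "linear array"
    STEREO = "stereo"
    MONO = "mono"


@dataclass
class AudioCapture:
    name: str
    purpose: Purpose
    mixed: bool
    channel_format: ChannelFormat
    linear_position: int  # 0 to 100 along the sound stage

    def __post_init__(self) -> None:
        # Enforce the 0-100 range given for linear position.
        if not 0 <= self.linear_position <= 100:
            raise ValueError("linear_position must be between 0 and 100")


@dataclass
class VideoCapture:
    name: str
    purpose: Purpose
    composed: bool
    auto_switched: bool
    image_width: Optional[float] = None  # spatial scale; unit not given on the slides


# Hypothetical values: VC5 as a voice-switched capture with a composed PiP.
vc5 = VideoCapture("VC5", Purpose.MAIN, composed=True, auto_switched=True)
ac0 = AudioCapture("AC0", Purpose.MAIN, mixed=False,
                   channel_format=ChannelFormat.STEREO, linear_position=50)
```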
Capture Scene
Figure: the same capture scene (people in front of cameras) captured three alternative ways:
• Three cameras: VC0, VC1, VC2
• Two cameras, moved and zoomed out: VC3, VC4
• Switched (based on voice) with composed PiP: VC5
Capture Set
Each alternative representation of a Capture Scene is a row in a Capture Set.
Capture Set rows:
• (VC0, VC1, VC2): three cameras
• (VC3, VC4): two cameras, moved and zoomed out
• (VC5): switched (based on voice), composed PiP
• (AC0): audio capture
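A capture set can be held as a list of rows, each row being one alternative representation of the scene. How a consumer picks among rows is not specified here, so the sketch below assumes one plausible policy: take the largest all-video row that still fits the consumer's screen count. The function name is hypothetical, and treating names that start with "VC" as video captures is an assumption based only on the labels used in these slides.

```python
from typing import List, Optional

# Each row is one alternative representation of the same capture scene.
CaptureRow = List[str]

capture_set: List[CaptureRow] = [
    ["VC0", "VC1", "VC2"],  # three cameras
    ["VC3", "VC4"],         # two cameras, moved and zoomed out
    ["VC5"],                # switched (based on voice), composed PiP
    ["AC0"],                # audio capture
]


def pick_video_row(rows: List[CaptureRow], screens: int) -> Optional[CaptureRow]:
    """Hypothetical policy: the largest all-video row that fits on `screens` screens."""
    # Assumes video captures are the names beginning with "VC".
    video_rows = [r for r in rows if all(c.startswith("VC") for c in r)]
    fitting = [r for r in video_rows if len(r) <= screens]
    return max(fitting, key=len, default=None)
```

With three screens this picks the three-camera row; with one screen it falls back to the switched view.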
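Video Capture Adjacency
Figure: two cameras facing the people; VC0 covers the left of the scene and VC1 the right.
Capture Set:
• (VC0, VC1)
• Other capture set rows
The order of captures within a row, left to right, reflects their spatial adjacency.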
Matching Audio with Video
• Same capture scene
• Video adjacency matches the audio sound stage
Figure: the spatial extent of video (VC0, VC1, VC2 from left to right) lined up with the spatial extent of audio, either stereo (left, right) or a linear array (positions 0, 50, 100)
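The alignment between video adjacency and the audio sound stage can be made concrete with a small calculation. This sketch assumes that the captures in a row are listed left to right and split the 0 to 100 range evenly; the slides show the alignment only as a figure, so both the even split and the function name video_capture_for_position are assumptions.

```python
from typing import List


def video_capture_for_position(row: List[str], position: float) -> str:
    """Return the video capture whose spatial extent contains an audio
    linear position (0 = left edge, 100 = right edge).

    Assumes the captures in `row` are ordered left to right and each
    covers an equal share of the 0-100 range.
    """
    if not 0 <= position <= 100:
        raise ValueError("position must be between 0 and 100")
    width = 100 / len(row)
    # Clamp so that position 100 falls in the rightmost capture.
    index = min(int(position // width), len(row) - 1)
    return row[index]


row = ["VC0", "VC1", "VC2"]
```

A position on a boundary between two captures is assigned to the capture on the right.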
Choosing Streams
Andy Pepperell
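Basic message flow
1. Consumer capability advertisement: media stream consumer to media stream provider
2. Media capture advertisement: media stream provider to media stream consumer
3. Consumer configuration of provider's streams: media stream consumer to media stream provider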
Capabilities Sent by Consumer
• Physical factors, e.g. number of screens
• User preferences
• Software limitations, e.g. which media capture attributes the consumer understands
The media stream consumer sends these in the consumer capability advertisement.
Advertisement Sent by Provider
Inputs:
• Consumer capability advertisement
• Provider's fixed characteristics, e.g. number of cameras
• Dynamic factors, e.g. whether a presentation source is present
The media stream provider sends the result as the media capture advertisement.
Configure Message Sent by Consumer
Inputs:
• Provider capture advertisement: simultaneous transmission sets + encoding groups
• Consumer's fixed characteristics, e.g. number of screens
• Dynamic factors, e.g. change of user preferences
The media stream consumer sends the result as the stream configure message.
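The three messages can be sketched as data types exchanged in order. The field names, the fixed provider, and the consumer's selection rule are all hypothetical; the slides fix only which side sends which message and the kinds of information each one carries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConsumerCapabilityAdvertisement:
    """Step 1, consumer to provider (hypothetical field names)."""
    screens: int                                                # physical factor
    known_attributes: List[str] = field(default_factory=list)   # software limitation


@dataclass
class MediaCaptureAdvertisement:
    """Step 2, provider to consumer."""
    captures: Dict[str, Dict[str, object]]           # capture name -> attributes
    simultaneous_transmission_sets: List[List[str]]
    capture_sets: List[List[List[str]]]              # each capture set is a list of rows
    encoding_groups: Dict[str, Dict[str, int]]       # group name -> limits


@dataclass
class StreamConfigureMessage:
    """Step 3, consumer to provider."""
    chosen_captures: List[str]


def provider_advertise(caps: ConsumerCapabilityAdvertisement) -> MediaCaptureAdvertisement:
    # A fixed, hypothetical provider with three cameras and a switched view.
    # A real provider would tailor this to `caps` and to dynamic factors such as
    # whether a presentation source is present; this sketch ignores both.
    return MediaCaptureAdvertisement(
        captures={"VC0": {}, "VC1": {}, "VC2": {}, "VC5": {}},
        simultaneous_transmission_sets=[["VC0", "VC1", "VC2"], ["VC5"]],
        capture_sets=[[["VC0", "VC1", "VC2"], ["VC5"]]],
        encoding_groups={"EG0": {"maxMbps": 489600, "maxBandwidth": 6000000}},
    )


def consumer_configure(caps: ConsumerCapabilityAdvertisement,
                       adv: MediaCaptureAdvertisement) -> StreamConfigureMessage:
    # Hypothetical rule: choose the first row of the first capture set that fits
    # on the consumer's screens, or nothing if no row fits.
    for row in adv.capture_sets[0]:
        if len(row) <= caps.screens:
            return StreamConfigureMessage(chosen_captures=list(row))
    return StreamConfigureMessage(chosen_captures=[])


# Run the three steps in order for a single-screen consumer.
caps = ConsumerCapabilityAdvertisement(screens=1)
adv = provider_advertise(caps)
config = consumer_configure(caps, adv)
```

Each message flows in one direction only, which matches the point that this is not a negotiation in the strict sense.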
Provider Capture Advertisement
• Captures and attributes
• Simultaneous transmission sets
• Capture sets
• Encoding groups
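Simultaneous Transmission Sets
Figure: three cameras facing the people: left (VC0), center (VC1 or VC3), right (VC2).
The center camera can do either a regular or a zoomed capture, but not both at once, so the simultaneous transmission sets are:
(VC0, VC1, VC2)
(VC0, VC3, VC2)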
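Reading a simultaneous transmission set as "these captures can be sent at the same time", a consumer's choice is valid only if it fits inside one of the advertised sets. A minimal sketch: the function name allowed_together is hypothetical, and the subset-of-one-set test is this sketch's interpretation of the slide.

```python
from typing import Iterable, List


def allowed_together(chosen: Iterable[str], sts_list: List[List[str]]) -> bool:
    """Return True if all chosen captures fit inside a single simultaneous
    transmission set, i.e. the provider can send them at the same time."""
    wanted = set(chosen)
    return any(wanted <= set(sts) for sts in sts_list)


# The two sets from the three-camera example.
sts_list = [["VC0", "VC1", "VC2"], ["VC0", "VC3", "VC2"]]
```

Asking for VC1 and VC3 together fails, because no single set contains both views from the center camera.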
Encoding Groups
Figure: a media stream provider with several encoding groups
Encoding group attributes:
• maxBandwidth: maximum number of bits per second for all encodes combined
• maxVideoMbps: maximum number of macroblocks per second for all video encodes combined:
  ((width + 15) / 16) * ((height + 15) / 16) * framesPerSecond
  (integer division, so each dimension is rounded up to whole 16x16 macroblocks)
• videoEncodes[]: set of potential video encodes that can be generated
• audioEncodes[]: set of potential audio encodes that can be generated
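The macroblock formula and the group-level limits can be checked with a few lines of Python. This is a sketch under stated assumptions: "/" in the formula is taken as integer division (which is what makes "+ 15" round up to whole 16x16 macroblocks), each encode is given as a hypothetical (width, height, fps, bits-per-second) tuple, and audio encodes, which also count toward maxBandwidth, are left out for brevity. The function names are hypothetical.

```python
from typing import List, Tuple

# A hypothetical encode description: (width, height, frames per second, bits per second).
Encode = Tuple[int, int, int, int]


def macroblocks_per_second(width: int, height: int, frames_per_second: int) -> int:
    # Round each dimension up to whole 16x16 macroblocks, then scale by frame rate.
    return ((width + 15) // 16) * ((height + 15) // 16) * frames_per_second


def group_fits(encodes: List[Encode], max_video_mbps: int, max_bandwidth: int) -> bool:
    """Check video encodes against an encoding group's combined limits
    (maxVideoMbps and maxBandwidth)."""
    total_mbps = sum(macroblocks_per_second(w, h, fps) for w, h, fps, _bps in encodes)
    total_bps = sum(bps for _w, _h, _fps, bps in encodes)
    return total_mbps <= max_video_mbps and total_bps <= max_bandwidth
```

For example, 1920x1080 at 30 frames per second works out to 120 * 68 * 30 = 244800 macroblocks per second.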
Encoding Group Structure
Figure: a media stream provider contains several encoding groups; each encoding group contains one or more encodes (shown as Encode 1, Encode 2, Encode 3)
Video Encode Attributes
• maxBandwidth: maximum number of bits per second for the video encode
• maxMbps: maximum number of macroblocks per second for the video encode:
  ((width + 15) / 16) * ((height + 15) / 16) * framesPerSecond
• maxWidth: maximum video resolution width, in pixels
• maxHeight: maximum video resolution height, in pixels
• maxFrameRate: maximum frame rate
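Sample Encoding Group
• At most 2 encodes, each at most 1080p30
• Bandwidth trade-off between the individual encodes and the group as a whole
EG0: maxMbps=489600, maxBandwidth=6000000
    ENC0: maxWidth=1920, maxHeight=1080, maxFrameRate=60, maxMbps=244800, maxBandwidth=4000000
    ENC1: maxWidth=1920, maxHeight=1080, maxFrameRate=60, maxMbps=244800, maxBandwidth=4000000
Although each encode allows maxFrameRate=60, its maxMbps of 244800 caps a full 1920x1080 encode at 30 frames per second (120 * 68 * 30 = 244800). EG0's maxMbps of 489600 (2 * 244800) allows both encodes to run at 1080p30 together, but its maxBandwidth of 6000000 is less than the 8000000 the two encodes could use individually, so bandwidth must be traded between them.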
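The sample group can be checked mechanically at both levels. In this sketch the names are hypothetical, and the rule "per-encode limits first, then the group totals" is this sketch's reading of the slide. The limit values are copied from EG0, ENC0, and ENC1. A requested stream is a hypothetical (width, height, fps, bits-per-second) tuple.

```python
from dataclasses import dataclass
from typing import List, Tuple


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    # Same rounding as the formula: whole 16x16 macroblocks per frame.
    return ((width + 15) // 16) * ((height + 15) // 16) * fps


@dataclass
class VideoEncodeLimits:
    max_width: int
    max_height: int
    max_frame_rate: int
    max_mbps: int
    max_bandwidth: int


# ENC0 and ENC1 have identical limits in the sample.
ENC_LIMITS = VideoEncodeLimits(1920, 1080, 60, 244800, 4000000)
EG0_MAX_MBPS = 489600
EG0_MAX_BANDWIDTH = 6000000

# A requested stream: (width, height, frames per second, bits per second).
Request = Tuple[int, int, int, int]


def encode_fits(req: Request, lim: VideoEncodeLimits) -> bool:
    # Per-encode check against one encode's attributes.
    w, h, fps, bps = req
    return (w <= lim.max_width and h <= lim.max_height
            and fps <= lim.max_frame_rate
            and macroblocks_per_second(w, h, fps) <= lim.max_mbps
            and bps <= lim.max_bandwidth)


def eg0_fits(requests: List[Request]) -> bool:
    # EG0 offers at most two encodes.
    if len(requests) > 2:
        return False
    if not all(encode_fits(r, ENC_LIMITS) for r in requests):
        return False
    # Group-level totals across all encodes in EG0.
    total_mbps = sum(macroblocks_per_second(w, h, fps) for w, h, fps, _ in requests)
    total_bps = sum(bps for _, _, _, bps in requests)
    return total_mbps <= EG0_MAX_MBPS and total_bps <= EG0_MAX_BANDWIDTH
```

Two 1080p30 streams fit only if their bit rates sum to at most 6000000; a single 1080p60 stream is rejected because it needs 489600 macroblocks per second, twice what one encode allows.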
Examples
Brian Baldino
Single Camera Endpoint
Three Camera Endpoint
MCU Scenarios
Three Camera Endpoint with Presentation
QUESTIONS