Video Carving

Billy Chen (Microsoft, Redmond, WA) and
Pradeep Sen (Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM)
Eurographics 2008
Outline
1. Introduction
2. Related Work
3. Video Carving
   3.1 Min-Cut/Max-Flow Algorithm
       3.1.1 Growth stage
       3.1.2 Augmentation stage
       3.1.3 Adoption stage
4. Implementation
5. Results
6. Conclusions and Future Work
1. Introduction
Motivation
• Why do we need a condensed video or synopsis?
Raw, unedited footage contains long stretches where nothing important happens, with only a few short moments of interest in between.
Example: extracting the key information from a video is particularly important in security and surveillance applications.
• Several techniques have been proposed to condense a long video into a shorter, more useful synopsis.

Downsampling (fast-forwarding)
The video is cut down in size by keeping only every nth frame.
Drawback:
It can fail to capture a rapidly moving object, since the temporal samples might miss the object entirely (an example of temporal aliasing).
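The drawback can be seen on a toy example. The sketch below is only an illustration: the 30-frame "video", the per-frame visibility flag, and the function name `fast_forward` are assumptions made for the example, not part of the paper.

```python
def fast_forward(frames, n):
    """Keep only every n-th frame (frames 0, n, 2n, ...)."""
    return frames[::n]

# Toy video: 30 frames, each summarized by one flag "is the object visible?".
frames = [False] * 30
frames[13] = frames[14] = True  # the object appears for two frames only

kept = fast_forward(frames, 10)  # keeps frames 0, 10 and 20

# The object is in the original but in none of the sampled frames.
print(any(frames), any(kept))
```

Any event shorter than the sampling step can fall between two kept frames and vanish from the synopsis.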
• The authors of this paper present a novel scheme that takes a long video stream with m frames and condenses it into a short viewable clip with n frames (where n << m) that preserves the most important information.
Idea
• Most approaches reduce the video size by eliminating whole frames from the video stream. The authors observe that a deleted frame does not have to consist of pixels from a single time step.
• They treat the frames to be deleted as “sheets” within the space-time volume: each pixel on the sheet has one and only one time step, but different pixels can have different time steps.
2. Related Work
• Several techniques have been proposed to create video summaries.

Frame-based approaches

Simply play the video faster.
Drawback: fast activities may be lost in the process.
=> To avoid this problem, techniques have been developed that identify activities and adaptively adjust the frame rate.

Object-based approaches

Represent activities as 3D objects in the space-time domain (e.g. the video cube) and seek a tighter packing of these objects along the time axis.
• The idea of incrementally removing regions is inspired by Avidan and Shamir's work on seam carving (SIGGRAPH 2007).
• To resize an image, they incrementally remove seams, which are 8-connected paths through the image.
• Complementary to summarizing video:
• Video retargeting is the task of adapting a video to different output resolutions.
3. Video Carving
• A long video can be summarized through video carving: 2D sheets are incrementally removed from the video cube to reduce its total duration.
• Each sheet must fully cut across the xy-plane of the video cube.
• To compute this sheet, the authors use a min-cut formulation.
• A min-cut passes through regions of low difference (i.e. high similarity). Once the low-difference sheet has been found and removed, the resulting video has few visual artifacts, since the removed pixels are similar to their surroundings both spatially and temporally.
• By creating an appropriate graph of video pixels and augmenting it with source and sink nodes, they can find the min-cut of this graph and from it compute the corresponding sheet to remove from the video cube.
• First, they define a node for each pixel of the video cube. Nodes have edges to their top, bottom, left, and right neighbors. They also have edges to the nodes at the same pixel location in the next and previous frames. Edge weights are computed using a measure of spatio-temporal difference.
V 2 V 2 V 2
e(V ) 


x
y
t
Node(pixel)
Edge
• A source node is connected to all the nodes in the first frame, and a sink node is connected to all the nodes in the last frame.
[Figure: source attached to the first frame, sink attached to the last frame]
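The graph construction described above can be sketched on a tiny grayscale video. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_graph`, the nested-list video layout `video[t][y][x]`, the squared intensity difference used as the edge weight (a finite-difference stand-in for the spatio-temporal difference measure), and the infinite capacity on the source and sink edges are all choices made for the example.

```python
def build_graph(video):
    """Build edge capacities for a video stored as video[t][y][x] (grayscale).

    Nodes are (t, y, x) tuples plus the terminals "s" and "t".
    Returns a dict mapping directed edges (u, v) to capacities.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    cap = {}
    for t in range(T):
        for y in range(H):
            for x in range(W):
                # Right, bottom and next-frame neighbors; adding both directions
                # also covers the left, top and previous-frame neighbors.
                for dt, dy, dx in ((0, 0, 1), (0, 1, 0), (1, 0, 0)):
                    t2, y2, x2 = t + dt, y + dy, x + dx
                    if t2 < T and y2 < H and x2 < W:
                        diff = video[t][y][x] - video[t2][y2][x2]
                        w = diff * diff  # assumed weight: squared difference
                        cap[((t, y, x), (t2, y2, x2))] = w
                        cap[((t2, y2, x2), (t, y, x))] = w
    inf = float("inf")
    for y in range(H):
        for x in range(W):
            cap[("s", (0, y, x))] = inf      # source -> every first-frame node
            cap[((T - 1, y, x), "t")] = inf  # every last-frame node -> sink
    return cap

# Three frames of a 1x2 image; one pixel changes in the last frame.
video = [[[0, 0]], [[0, 0]], [[5, 0]]]
cap = build_graph(video)
print(len(cap), cap[((1, 0, 0), (2, 0, 0))])
```

Because the terminal edges are given infinite capacity in this sketch, a finite cut can never separate a terminal from its frame, so the cut has to pass between the first and last frames.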
3.1 Min-Cut/Max-Flow Algorithm
To compute the min-cut on the graph, they use Boykov and Kolmogorov’s min-cut/max-flow algorithm (IEEE Transactions on PAMI, 2004).
First, we have a directed graph G = <V, E>.
• Terminology:

Active node: a node on the outer border of a search tree. Active nodes allow trees to “grow” by acquiring new children (along non-saturated edges) from the set of free nodes.
Passive node: an internal node of a tree. Passive nodes cannot grow, as they are completely blocked by other nodes from the same tree.
Free node: a node that is in neither S nor T.
We have
S, T: search trees rooted at the source and the sink
s, t: source node and sink node
O: set of orphans
A: set of active nodes
S ⊂ V, s ∈ S, T ⊂ V, t ∈ T, S ∩ T = ∅
It is convenient to store the contents of the search trees S and T via flags TREE(p) indicating the affiliation of each node p, so that

TREE(p) = S if p ∈ S
TREE(p) = T if p ∈ T
TREE(p) = ∅ if p is a free node

If node p belongs to one of the search trees, then the information about its parent is stored as PARENT(p).

Roots of the search trees (the source and the sink), orphans, and all free nodes have no parents, i.e. PARENT(p) = ∅.
We will also use the notation tree_cap(p → q) to describe the residual capacity of either edge (p, q) if TREE(p) = S or edge (q, p) if TREE(p) = T.
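The bookkeeping above can be sketched with plain dictionaries. The representation is an assumption made for illustration: `tree` and `parent` map node names to values (with `None` standing for ∅), and `residual` maps directed edges to residual capacities.

```python
def tree_cap(residual, tree, p, q):
    """Residual capacity of the edge that p's tree would use to reach q.

    Assumes p belongs to S or T. The S tree grows along edges (p, q);
    the T tree grows backwards from the sink, so it uses edge (q, p).
    """
    if tree[p] == "S":
        return residual.get((p, q), 0)
    return residual.get((q, p), 0)

# Tiny graph s -> a -> t, before any growth: only the terminals are in a tree.
residual = {("s", "a"): 3, ("a", "t"): 2}
tree = {"s": "S", "a": None, "t": "T"}       # None plays the role of ∅ (free node)
parent = {"s": None, "a": None, "t": None}   # roots and free nodes have no parent

print(tree_cap(residual, tree, "s", "a"), tree_cap(residual, tree, "t", "a"))
```

Both trees can reach the free node `a`: S through edge (s, a) and T through edge (a, t).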
The algorithm iteratively repeats the following three stages:
• “growth” stage: search trees S and T grow until they touch, giving an s → t path
• “augmentation” stage: the found path is augmented, and the search tree(s) break into forest(s)
• “adoption” stage: trees S and T are restored.
initialize: S = {s}, T = {t}, A = {s, t}, O = ∅
while true
    grow S or T to find an augmenting path P from s to t
    if P = ∅ terminate
    augment on P
    adopt orphans
end while
3.1.1 Growth stage
Growth stage:
At this stage, active nodes acquire new children from the set of free nodes. The growth stage terminates if an active node encounters a neighboring node that belongs to the opposite tree. In this case we have detected a path from the source to the sink.
while A != ∅
    pick an active node p ∈ A
    for every neighbor q such that tree_cap(p → q) > 0
        if TREE(q) = ∅ then add q to search tree as an active node:
            TREE(q) := TREE(p), PARENT(q) := p, A := A ∪ {q}
        if TREE(q) != ∅ and TREE(q) != TREE(p) return P = PATH s→t
    end for
    remove p from A
end while
return P = ∅
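The growth pseudocode can be sketched in Python as follows. This is a simplified sketch, not a tuned implementation: the dictionary-based graph (`adj`, `residual`, `tree`, `parent`), the first-in-first-out active queue, and the helper `trace_path` that rebuilds the s → t path from the parent pointers are assumptions made for the example.

```python
from collections import deque

def tree_cap(residual, tree, p, q):
    """Residual capacity of (p, q) if p is in S, of (q, p) if p is in T."""
    if tree[p] == "S":
        return residual.get((p, q), 0)
    return residual.get((q, p), 0)

def grow(adj, residual, tree, parent, active):
    """Grow S and T; return the edge (u, v) where they touch (u in S, v in T), or None."""
    while active:
        p = active[0]  # pick an active node, leave it in A for now
        for q in adj[p]:
            if tree_cap(residual, tree, p, q) <= 0:
                continue
            if tree[q] is None:        # free node: adopt it as an active child
                tree[q] = tree[p]
                parent[q] = p
                active.append(q)
            elif tree[q] != tree[p]:   # the two trees touch: a path exists
                return (p, q) if tree[p] == "S" else (q, p)
        active.popleft()               # all neighbors explored: p becomes passive
    return None

def trace_path(parent, u, v):
    """Follow parent pointers from u back to s and from v on to t."""
    left, right = [u], [v]
    while parent[left[-1]] is not None:
        left.append(parent[left[-1]])
    while parent[right[-1]] is not None:
        right.append(parent[right[-1]])
    return left[::-1] + right

adj = {"s": ["a"], "a": ["s", "b"], "b": ["a", "t"], "t": ["b"]}
residual = {("s", "a"): 3, ("a", "b"): 1, ("b", "t"): 2}
tree = {"s": "S", "a": None, "b": None, "t": "T"}
parent = {n: None for n in adj}
touch = grow(adj, residual, tree, parent, deque(["s", "t"]))
path = trace_path(parent, *touch)
print(touch, path)
```

On this chain the S tree reaches `a`, the T tree reaches `b`, and the trees meet on edge (a, b).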
3.1.2 Augmentation stage
Augmentation stage:
The augmentation phase may split the search trees S and T into
forests. The source s and the sink t are still roots of two of the
trees while orphans form roots of all other trees.
find the bottleneck capacity ∆ on P
update the residual graph by pushing flow ∆ through P
for each edge (p, q) in P that becomes saturated
    if TREE(p) = TREE(q) = S
        then set PARENT(q) := ∅ and O := O ∪ {q} (q is orphan)
    if TREE(p) = TREE(q) = T
        then set PARENT(p) := ∅ and O := O ∪ {p} (p is orphan)
end for
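A sketch of the augmentation step in Python, under the same illustrative assumptions as before: the path is a list of node names, `residual` maps directed edges to residual capacities, and orphans are collected in a plain list. The function name `augment` is hypothetical.

```python
def augment(path, residual, tree, parent, orphans):
    """Push the bottleneck flow along path and record the orphans it creates."""
    edges = list(zip(path, path[1:]))
    delta = min(residual[e] for e in edges)  # bottleneck capacity on P
    for p, q in edges:
        residual[(p, q)] -= delta
        residual[(q, p)] = residual.get((q, p), 0) + delta  # reverse edge gains capacity
        if residual[(p, q)] == 0:  # edge became saturated
            if tree[p] == tree[q] == "S":
                parent[q] = None   # q loses its link to the source tree
                orphans.append(q)
            elif tree[p] == tree[q] == "T":
                parent[p] = None   # p loses its link to the sink tree
                orphans.append(p)
    return delta

residual = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 2}
tree = {"s": "S", "a": "S", "b": "T", "t": "T"}
parent = {"s": None, "a": "s", "b": "t", "t": None}
orphans = []
delta = augment(["s", "a", "b", "t"], residual, tree, parent, orphans)
print(delta, orphans)
```

Here edges (s, a) and (a, b) both saturate, but only (s, a) lies inside a single tree, so only `a` becomes an orphan.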
3.1.3 Adoption stage
Adoption stage:
During this stage, all orphan nodes in O are processed until O becomes empty. Each node p being processed tries to find a new valid parent within the same search tree. If it succeeds, p remains in the tree but with a new parent; otherwise it becomes a free node and all its children are added to O. The goal of the adoption stage is to restore the single-tree structure of sets S and T, with roots in the source and the sink.
while O != ∅
    pick an orphan node p ∈ O and remove it from O
    process p
end while
Process p
Try to find a new valid parent for p among its neighbors. A valid parent q must satisfy TREE(q) = TREE(p) and tree_cap(q → p) > 0, and the “origin” of q must be either the source or the sink.
If node p finds a new valid parent q, then
    set PARENT(p) := q.
    (In this case p remains in its search tree, and the active (or passive) status of p remains unchanged.)
If p does not find a valid parent, then
    • scan all neighbors q of p such that TREE(q) = TREE(p):
        – if tree_cap(q → p) > 0, add q to the active set A
        – if PARENT(q) = p, add q to the set of orphans O and set PARENT(q) := ∅
    • TREE(p) := ∅, A := A − {p} (p becomes a free node)
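Processing a single orphan can be sketched as below. The data layout (dictionaries for `adj`, `residual`, `tree`, `parent`; lists for `active` and `orphans`) and the helper `origin_is_terminal`, which walks the parent pointers to check the “origin” condition, are assumptions made for this illustration.

```python
def tree_cap(residual, tree, p, q):
    """Residual capacity of (p, q) if p is in S, of (q, p) if p is in T."""
    if tree[p] == "S":
        return residual.get((p, q), 0)
    return residual.get((q, p), 0)

def origin_is_terminal(q, parent, terminals=("s", "t")):
    """True if following parent pointers from q ends at the source or the sink."""
    while parent[q] is not None:
        q = parent[q]
    return q in terminals

def process_orphan(p, adj, residual, tree, parent, active, orphans):
    # Try to find a valid new parent in the same tree.
    for q in adj[p]:
        if (tree[q] == tree[p] and tree_cap(residual, tree, q, p) > 0
                and origin_is_terminal(q, parent)):
            parent[p] = q
            return
    # No valid parent: p becomes free, its children become orphans.
    for q in adj[p]:
        if tree[q] == tree[p]:
            if tree_cap(residual, tree, q, p) > 0 and q not in active:
                active.append(q)   # q may later re-acquire p
            if parent[q] == p:
                parent[q] = None
                orphans.append(q)
    tree[p] = None
    if p in active:
        active.remove(p)

# Orphan "a" lost its parent "s" but can be adopted by "c".
adj = {"s": ["a", "c"], "a": ["s", "b", "c"], "b": ["a", "t"], "c": ["s", "a"], "t": ["b"]}
residual = {("s", "a"): 0, ("s", "c"): 5, ("c", "a"): 5, ("a", "b"): 0, ("b", "t"): 1}
tree = {"s": "S", "a": "S", "c": "S", "b": "T", "t": "T"}
parent = {"s": None, "a": None, "c": "s", "b": "t", "t": None}
active, orphans = [], []
process_orphan("a", adj, residual, tree, parent, active, orphans)
print(parent["a"], tree["a"])
```

The origin check matters because a neighbor that is itself a descendant of the orphan would otherwise be accepted as a parent and create a cycle.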
3.1 Min-Cut/Max-Flow Algorithm
• Termination condition
The algorithm terminates when the search trees S and T cannot grow (no active nodes remain) and the trees are separated by saturated edges. This implies that a maximum flow has been achieved. The corresponding minimum cut is given by the two trees: the source side is S and the sink side is T.
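The link between a maximum flow and the minimum cut can be checked on a small graph. Note that the sketch below does not implement the Boykov-Kolmogorov algorithm; it swaps in the simpler Edmonds-Karp scheme (breadth-first augmenting paths) purely to illustrate that, once no augmenting path remains, the nodes still reachable from the source form the source side of a minimum cut. The function name and graph format are assumptions.

```python
from collections import deque

def min_cut_source_side(cap, s, t):
    """Return (max_flow_value, set of nodes on the source side of a min cut)."""
    residual = dict(cap)
    adj = {}
    for u, v in cap:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        residual.setdefault((v, u), 0)  # reverse edges start with zero capacity

    flow = 0
    while True:
        # Breadth-first search over non-saturated edges.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path left: reachable nodes are the source side.
            return flow, set(parent)
        # Bottleneck capacity along the path found.
        delta, v = float("inf"), t
        while parent[v] is not None:
            delta = min(delta, residual[(parent[v], v)])
            v = parent[v]
        # Push the flow.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= delta
            residual[(v, u)] += delta
            v = u
        flow += delta

cap = {("s", "a"): 3, ("a", "b"): 1, ("b", "t"): 2}
value, source_side = min_cut_source_side(cap, "s", "t")
print(value, sorted(source_side))
```

The cut separates `a` from `b`, which is the lowest-capacity edge on the chain, and its capacity equals the flow value.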
3. Video Carving
• Finally, they find a min-cut on this graph and compute a corresponding sheet with the property that it has only one temporal value at every projected pixel location.
• To do this, they first find the set of nodes S that have edges crossing the min-cut. They then use a “front-surface” strategy to determine which nodes to remove.
• For each pixel location, they project along the time axis of the video cube, from the first frame to the last frame. The first node n ∈ S encountered is the pixel removed from the video cube.
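The front-surface scan can be sketched as follows. The representation is an assumption for illustration: `cut_nodes` is a set of `(t, y, x)` tuples standing for the node set S, and the result is a 2D list `sheet[y][x]` holding the time index to remove at each pixel location.

```python
def front_surface(cut_nodes, T, H, W):
    """For each pixel location, return the first time index whose node is in cut_nodes."""
    sheet = [[None] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            for t in range(T):  # scan from the first frame to the last
                if (t, y, x) in cut_nodes:
                    sheet[y][x] = t
                    break       # keep only the first hit
    return sheet

# 3 frames of a 1x2 image; pixel (0, 0) has two candidate nodes.
cut_nodes = {(1, 0, 0), (2, 0, 0), (0, 0, 1)}
sheet = front_surface(cut_nodes, T=3, H=1, W=2)
print(sheet)
```

Taking only the first hit per location is what guarantees a single temporal value at every projected pixel.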
• Once a sheet is removed from the video cube, the remaining pixels are packed to cover the empty space. Because every pixel location has had one and only one frame removed, the total video cube is shortened by one frame.
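Removing a sheet and packing the remaining pixels can be sketched on a toy video. The layout `video[t][y][x]` and `sheet[y][x]` (the time index to delete at each location) are assumptions chosen for the example, as is the function name `remove_sheet`.

```python
def remove_sheet(video, sheet):
    """Delete one pixel per (y, x) location and pack the rest along the time axis."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    out = [[[None] * W for _ in range(H)] for _ in range(T - 1)]
    for y in range(H):
        for x in range(W):
            # Time column at this location without the removed sample.
            column = [video[t][y][x] for t in range(T) if t != sheet[y][x]]
            for t, value in enumerate(column):
                out[t][y][x] = value
    return out

video = [[[10, 20]], [[11, 21]], [[12, 22]]]  # 3 frames of a 1x2 image
sheet = [[1, 0]]                              # remove t=1 at x=0 and t=0 at x=1
carved = remove_sheet(video, sheet)
print(carved)
```

The first output frame mixes pixels from two different input frames, which is exactly what distinguishes a sheet from a whole deleted frame.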
4. Implementation
Restriction:
The memory required to store the entire data structure can be significant.
=> The video stream is stored as a 3D doubly-linked grid of pixels, with each “pixel” storing the color and gradient information as well as pointers to its neighbors, resulting in a structure of 40 bytes per pixel.
This limits the maximum number of pixels in the graph to about 50 million (32-bit Windows gives applications only 2 GB of total memory).
Example:
For a 720×480 video at 30 frames per second, this yields only about 150 frames (5 seconds), which is unacceptable.
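The figures above follow from simple arithmetic. The check below assumes, as an upper bound, that the full 2 GB address space is available for pixel structures; in practice less is usable, which is consistent with the rounded-down figures quoted on the slide.

```python
BYTES_PER_PIXEL = 40
ADDRESS_SPACE = 2 * 1024**3          # 2 GB available to a 32-bit Windows application
WIDTH, HEIGHT, FPS = 720, 480, 30

max_pixels = ADDRESS_SPACE // BYTES_PER_PIXEL   # upper bound on graph nodes
max_frames = max_pixels // (WIDTH * HEIGHT)     # whole frames that fit
seconds = max_frames / FPS

print(max_pixels, max_frames, round(seconds, 1))
```

This gives roughly 54 million pixels, 155 frames, and a little over 5 seconds of video.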
• To process larger videos, they break the input video up into smaller video subsets, each of which can fit entirely within memory. They then extract a single frame from each subset with the min-cut algorithm. Therefore, after the first pass through the entire video is finished, they have removed as many frames as there were video subsets.
• They continue making passes through the video, removing frames until the video reaches the desired size.
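The number of passes this scheme needs can be estimated with a short simulation. This is only a counting sketch: the function name `passes_needed`, the fixed subset length `chunk`, and the rule that the final pass stops as soon as the target length is reached are assumptions, and no actual carving is performed.

```python
import math

def passes_needed(m, n, chunk):
    """Count passes to shrink m frames to n when each subset loses one frame per pass."""
    passes = 0
    while m > n:
        subsets = math.ceil(m / chunk)  # subsets in this pass, last one may be shorter
        m -= min(subsets, m - n)        # one frame per subset, never below the target
        passes += 1
    return passes

# 300 frames split into subsets of 150: each pass removes two frames.
print(passes_needed(300, 294, 150))
```

Since each pass removes only as many frames as there are subsets, a large reduction (n << m) requires many passes over the video.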
5. Results
• Video carving preserves important information that is lost in the fast-forwarded version.
• However, the video carving technique has artifacts that show up as “motion tails” following rapidly moving objects. These are caused by video sheets that traverse the path of the object, placing it together with a previous image of itself in the same frame.
=> These artifacts are a direct result of having to process the video in small subsets: each subset was only a few seconds long, yet a video sheet had to be removed from it.
6. Conclusions and Future Work
• First, the motion tails in the condensed video might be reduced by processing larger blocks of video at one time.
• In addition, it would be of interest to be able to enforce temporal order in the final video.
• Because no object information is used during processing, the carving of video sheets can cause discontinuities to appear as objects move.
• By carving out low-gradient video sheets from a long video, they are able to produce a much shorter version that preserves important information, even going as far as compositing objects that appear at different times into the same frame.