- Videocites Confidential Information -
Video-by-Video Search Engine
Based on RediSearch by
Presentation By Eitan Shapiro
Videocites
• Cloud-based SaaS platform for video-by-video search across all online videos
• Enables content owners to keep track of and better monetize their videos
• Novel, highly efficient, patent-protected fingerprinting technology
Fingerprinting
• 1,000,000 TB of video → ultra-lightweight process → 4 TB video fingerprints DB
• Representation ratio = 1/250,000
• 1 HD frame = 32 bits
• Partial or completely identical videos
The Problem
Katy Perry – Hot N Cold
• Official views in YT: 107M
• 39,997 replications found in YT
• Non-official views: 697M (!)
LOST: 697,000 Premium CPMs (last seen in YouTube upload area)
➢ No video fingerprinting technology implemented for digital rights management
➢ 72% of top rated videos on Facebook are freebooted from YouTube
➢ Massive boundless video proliferation
Our Solution
Regain control on video content
• We help content owners to manage all Video Duplications/Citations
• We look everywhere (3rd party privilege)
• We find more thanks to modality-indifference and outstanding accuracy
  (Ratio - 4:3, Subtitles, Flipped, Res - 240P, …)
1 Increased monetization and eyeballs
2 Video-based multi-platform true analytics
Basic Architecture
• Sampling video databases x100s faster than playback and extracting metadata
• Creating an ultra-lightweight video fingerprint
Technology is protected by a
• Index and metadata Databases
• Video sample → QUERY → RESULTS → User Dashboard
• Monetize, Ignore, Block, Take down
Interactive Video Search System
From Video Search to a Textual-Like Search
• Each video frame is represented by an ultra-lightweight fingerprint of 32 bits
• Once a video is fingerprinted, it can be treated as a text document
• Each frame is represented by a hashcode, which acts as a Term in the document
• We build an inverted index from hashcodes to video identifiers
[Figure: inverted index. Columns: Video Frame, Hashcode, Video IDs that contain a specific hashcode. Each hashcode (e.g. 7C34BD18) points to the list of video IDs whose frames produce it (e.g. x57nelr, x57ap7l, x57iuw).]
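The mapping from hashcodes to video identifiers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_inverted_index` and `candidate_videos` are hypothetical, the hashcode values are made up, and counting shared hashcodes is a stand-in for whatever matching logic the real system uses.

```python
from collections import defaultdict


def build_inverted_index(videos):
    """Map each frame hashcode to the set of video IDs that contain it."""
    index = defaultdict(set)
    for video_id, frame_hashcodes in videos.items():
        for hashcode in frame_hashcodes:
            index[hashcode].add(video_id)
    return index


def candidate_videos(index, query_hashcodes):
    """Return (video_id, shared_hashcode_count) pairs, best match first."""
    hits = defaultdict(int)
    for hashcode in set(query_hashcodes):
        for video_id in index.get(hashcode, ()):
            hits[video_id] += 1
    # Sort by descending hit count, then by video ID for a stable order.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: each video is a "document" whose terms are 32-bit hashcodes.
videos = {
    "x57nelr": [0x7C34BD18, 0x1A2B3C4D, 0x0F0F0F0F],
    "x57ap7l": [0x7C34BD18, 0x1A2B3C4D],
    "x57iuw": [0x0F0F0F0F],
}
index = build_inverted_index(videos)
ranked = candidate_videos(index, [0x7C34BD18, 0x1A2B3C4D, 0x0F0F0F0F])
```

Here `ranked` lists the video sharing the most hashcodes with the query first, which is the textual-search analogy the slide draws: frames are terms, videos are documents.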
Example of RediSearch Index
• Index holds around 2.6M fingerprinted videos
• Index size in memory is around 3.9 GB
• Index uses only two fields
▪ h – for hashcodes
▪ publish_date – for date filter
• The high efficiency of the RediSearch index allows handling millions of videos on a single machine
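As a rough sketch of what such a two-field index could look like, the snippet below builds the argument list for a RediSearch `FT.CREATE` command and the field values for one video document, without connecting to a server. The field names `h` and `publish_date` come from the slide; the field types (TEXT and NUMERIC), the index name `videos`, the space-separated hexadecimal encoding of hashcodes, and the helper names are assumptions made for illustration. It also works out the average index memory per video from the figures above, taking 1 GB as 10^9 bytes.

```python
def create_index_args(index_name):
    """Arguments for FT.CREATE with one text field and one numeric field."""
    return [
        "FT.CREATE", index_name,
        "SCHEMA",
        "h", "TEXT",                # assumed: hashcodes stored as text terms
        "publish_date", "NUMERIC",  # assumed: date stored as a number
    ]


def document_fields(frame_hashcodes, publish_date):
    """Field values for one video; hashcodes become space-separated hex terms."""
    terms = " ".join(f"{hashcode:08X}" for hashcode in frame_hashcodes)
    return {"h": terms, "publish_date": publish_date}


# Average in-memory index cost per video, from the slide's approximate figures.
INDEX_BYTES = 3_900_000_000   # about 3.9 GB
VIDEO_COUNT = 2_600_000       # about 2.6M videos
bytes_per_video = INDEX_BYTES / VIDEO_COUNT  # about 1.5 KB per video

args = create_index_args("videos")
fields = document_fields([0x7C34BD18, 0xAB], publish_date=1500000000)
```

At roughly 1.5 KB of index memory per video, the claim that millions of videos fit on a single machine is easy to sanity-check.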
RediSearch Query
• Query terms (hashcodes) are expanded to logical Synonyms
• As a result, a Boolean query can grow to hundreds of search terms
Why RediSearch is a Great Fit (1)
• RediSearch index sharding
▪ When scaling up, database sharding is required
▪ Video documents are sharded by document identifier (Video ID)
▪ Redis shards by key, not by value; in an inverted index, however, the keys are the Terms (hashcodes) and the values are the Video IDs
▪ Luckily, RediSearch naturally shards by Video ID
• RediSearch fast numeric range filter
▪ Allows us to filter on numeric metadata of the query video
▪ Significantly speeds up search
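To illustrate what sharding by Video ID means for an inverted index, here is a minimal in-memory sketch: each shard holds its own small inverted index for the videos assigned to it, and a query fans out to every shard and merges the results. The class name, the CRC32-based shard choice, and the integer date encoding are assumptions for the example and do not describe RediSearch internals. The date range check stands in for a numeric range filter.

```python
import zlib
from collections import defaultdict


class DocumentShardedIndex:
    """Inverted index partitioned by document (video) ID rather than by term."""

    def __init__(self, num_shards):
        self.shards = [
            {"postings": defaultdict(set), "publish_date": {}}
            for _ in range(num_shards)
        ]

    def shard_of(self, video_id):
        # Assumed shard function: a stable hash of the video ID.
        return zlib.crc32(video_id.encode("utf-8")) % len(self.shards)

    def add(self, video_id, hashcodes, publish_date):
        shard = self.shards[self.shard_of(video_id)]
        shard["publish_date"][video_id] = publish_date
        for hashcode in hashcodes:
            shard["postings"][hashcode].add(video_id)

    def search(self, hashcodes, min_date, max_date):
        """Query every shard, keep videos inside the date range, merge results."""
        matches = set()
        for shard in self.shards:
            for hashcode in hashcodes:
                for video_id in shard["postings"].get(hashcode, ()):
                    if min_date <= shard["publish_date"][video_id] <= max_date:
                        matches.add(video_id)
        return sorted(matches)


index = DocumentShardedIndex(num_shards=4)
index.add("x57nelr", [0x7C34BD18, 0x1A2B3C4D], publish_date=20160101)
index.add("x57ap7l", [0x7C34BD18], publish_date=20170601)
index.add("x57iuw", [0x0F0F0F0F], publish_date=20170601)
recent = index.search([0x7C34BD18], min_date=20170101, max_date=20171231)
```

Because every posting for a given video lives on one shard, adding a video touches a single shard, while a term-keyed layout would scatter one video's hashcodes across the whole cluster.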
Why RediSearch is a Great Fit (2)
• A key feature of search engines is the ability to sort by relevance
• Relevance in video search is measured by hashcode differences
• RediSearch supports:
▪ Server-side custom query expansion to logical Synonyms
▪ Scoring based on relevance calculated by a custom function
• Sorting by score expedites the search process
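The custom scoring function itself is not shown in the slides. As one possible reading of "relevance measured by hashcode differences", the sketch below scores a candidate video by the bitwise (Hamming) distance between query hashcodes and the closest hashcode in the candidate, then sorts candidates by that score. The formula and all names are assumptions for illustration only.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two hashcodes differ."""
    return bin(a ^ b).count("1")


def relevance(query_hashcodes, video_hashcodes, bits=32):
    """Average similarity (1.0 = identical) of each query frame to its closest video frame."""
    total = 0.0
    for query_hashcode in query_hashcodes:
        closest = min(hamming_distance(query_hashcode, h) for h in video_hashcodes)
        total += 1.0 - closest / bits
    return total / len(query_hashcodes)


def rank(query_hashcodes, videos):
    """Return video IDs ordered by descending relevance, ties broken by ID."""
    scored = [(relevance(query_hashcodes, hs), vid) for vid, hs in videos.items()]
    return [vid for score, vid in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


query = [0xFFFFFFFF, 0x00000000]
videos = {
    "far": [0x0000FFFF],               # 16 bits away from both query frames
    "near": [0xFFFFFFFE, 0x00000001],  # 1 bit away from each query frame
    "exact": [0xFFFFFFFF, 0x00000000],
}
ordering = rank(query, videos)
```

Sorting by such a score puts the most likely duplicates first, so later verification stages can stop early.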
Why Redis Over SSD Saves the Day
• In addition to the index, we hold fingerprinted videos in Redis (+50 GB every month)
• Redis over SSD lets us balance speed requirements against memory usage and eventually drive costs down
• Fingerprinted video objects are processed concurrently
• Lazy loading from SSD works nicely with a prefetching mechanism that helps eliminate delay
SSD Disk
Redis Over SSD
Multiple Search Processes
Fingerprinted Video Objects
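The prefetching idea can be sketched as follows: while one fingerprinted object is being processed, the next few are already being loaded in background threads, so a slow read (for example one served from SSD) overlaps with useful work. This is a generic illustration, not the presented system: `load_fingerprint` is a hypothetical stand-in for the real read, and a plain dictionary plays the role of the store.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the ID stream


def load_fingerprint(video_id, store):
    """Stand-in for a read that may have to be served from SSD."""
    return store[video_id]


def iter_with_prefetch(video_ids, store, lookahead=4):
    """Yield (video_id, fingerprint) in order while loading upcoming ones in the background."""
    ids = iter(video_ids)
    pending = deque()
    with ThreadPoolExecutor(max_workers=lookahead) as pool:
        # Prime the pipeline with up to `lookahead` loads.
        for video_id in ids:
            pending.append((video_id, pool.submit(load_fingerprint, video_id, store)))
            if len(pending) >= lookahead:
                break
        while pending:
            video_id, future = pending.popleft()
            # Schedule one more load before blocking on the current result.
            upcoming = next(ids, _DONE)
            if upcoming is not _DONE:
                pending.append((upcoming, pool.submit(load_fingerprint, upcoming, store)))
            yield video_id, future.result()


store = {f"v{i}": [i, i + 1] for i in range(10)}
loaded = list(iter_with_prefetch(list(store), store, lookahead=3))
```

The `lookahead` value trades memory for latency hiding: a larger window keeps more objects in flight at once.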
Collaboration with RedisLabs Team
• Support in developing new features, such as custom scoring for RediSearch
• Support for unit testing through rmtest - a library for disposable local Redis server(s) based on port number
• Support for rollout to production, with quick turnaround on bug fixes
Future Challenges
• RediSearch cluster
• Support the ability to search over multiple pages in a RediSearch cluster
• Handle High-Availability and Disaster-Recovery
• Faster loading time from a Redis backup
Thank You !
Videocites is hiring…
Contact us: [email protected]