Solving Vision

How do we know that we solved vision?
16-721: Learning-Based Methods in Vision
A. Efros, CMU, Spring 2009
Columbia Object Image Library (COIL-100) (1996)
Corel Dataset
Yu & Shi, 2004
Average Caltech categories (Torralba)
{ all photos}
[Plot: Flickr.com; horizontal axis 12/15/2003 to 12/15/2007, vertical axis 0 to 2.5E+09]
Flickr Paris
Real Paris
Automated Data Collection
Kang, Efros, Hebert, Kanade, 2009
Something More Objective?
Famous Tsukuba Image
Middlebury Stereo Dataset
Issue 1
• We might be testing too soon…
• Need to evaluate the entire system:
– Give it enough data
– Ground it in the physical world
– Allow it to affect / manipulate its environment
• Do we need to solve Hard AI?
– Maybe not. We don’t need Human Vision per
se – how about Rat Vision?
Issue 2
• We might be looking for “magic” where
none exists…
Valentino Braitenberg, Vehicles
Source Material: http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Margin/Vehicles/index.html
Introduces a series of (hypothetical) simple robots that seem,
to the outside observer, to exhibit complex behavior.
The complex behavior does not come from a complex brain, but
from a simple agent interacting with a rich environment.
Vehicle 1: Getting around
A single sensor is attached to a single motor.
Propulsion of the motor is proportional to the signal
detected by the sensor.
The vehicle will always move in a straight line,
slowing down in the cold, speeding up in the warm.
Braitenberg: “Imagine, now, what you would think if you saw such a vehicle swimming
around in a pond. It is restless, you would say, and does not like warm water. But it is
quite stupid, since it is not able to turn back to the nice cold spot it overshot in its
restlessness. Anyway, you would say, it is ALIVE, since you have never seen a particle
of dead matter move around quite like that.”
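The Vehicle 1 rule described above (motor drive proportional to the sensor signal, motion along a straight line) can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the function names, the gain, the time step, and the temperature field warm_patch are all hypothetical and do not come from Braitenberg's text.

```python
from typing import Callable, List


def simulate_vehicle_1(temperature: Callable[[float], float],
                       gain: float = 1.0,
                       x0: float = 0.0,
                       dt: float = 0.1,
                       steps: int = 100) -> List[float]:
    """Return the positions of a one-sensor, one-motor vehicle on a line."""
    x = x0
    positions = [x]
    for _ in range(steps):
        signal = temperature(x)   # sensor reading at the current position
        speed = gain * signal     # motor drive is proportional to the signal
        x += speed * dt           # the vehicle can only move straight ahead
        positions.append(x)
    return positions


def warm_patch(x: float) -> float:
    """Hypothetical temperature field: 1.0 everywhere, 3.0 for 2 <= x < 4."""
    return 3.0 if 2.0 <= x < 4.0 else 1.0


if __name__ == "__main__":
    path = simulate_vehicle_1(warm_patch, steps=60)
    step_sizes = [b - a for a, b in zip(path, path[1:])]
    print("smallest step:", min(step_sizes))
    print("largest step:", max(step_sizes))
```

The vehicle never turns back: every step is positive, only its size changes with the temperature. An observer who sees only the trajectory might still describe it as "restless" in warm water, which is the point of the slide.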
More complex vehicles
Moral of the Story
• “Law of Uphill Analysis and Downhill
Invention: machines are easy to
understand if you’re creating them; much
harder to understand ‘from the outside’.
• Psychological consequence: if we don’t
know the internal structure of a machine,
we tend to overestimate its complexity.”
Turing Tests for Vision
• Your thoughts…
Have we solved vision if we solve
all the boundary cases?
Varum
Computer Vision Database
Zhaoyin Jia
• Object segmentation/recognition
Detailed segmentation and labeling of all the scenes in life.
• Semantic meaning in image/video
Human understanding of the image/story behind the image
During the Spring break
Before the deadline
In the class
Failed in 16721
• Feeling/reaction after understanding
Cute
Adorable
Safe
Best project in 16721
Threatened
Run
Call for help
Love
Kiss
More threatened
Run faster
Need more help
How do we know that we solved vision?
General Rule: Turing test
Yuandong Tian
If CVS == HVS in
Training & Performance & Speed & Failure case
Then We declare vision is solved. Beers and Being laid off.
Verifiable Specific Rules:
Challenges in Training
• Fully automatic object Discovery & Categorization from unlabeled,
long video sequences.
• Multi-view robust real-time Recognition of tens of thousands of
objects, given few training examples of each object.
Challenges in Performance
• Pixel-wise Localization and Registration in cluttered and
degraded scenes;
• Long-term real-time robust Tracking for generic objects in cluttered
and degraded video sequences.
Human failure – human visual illusions
• Able to explain human visual illusions, and Reproduce them.
Conclusion:
Good luck to all!
Turing Test for Vision
• From the blog:
– No overall test. Vision is task-dependent. Do one
problem at a time.
– Use Computer Graphics to generate tons of test data
– A well-executed Grand Challenge
• Genre Classification in Video
– The Ultimate Dataset (25-year-old grad student)
– Need to handle corner cases / illusions. “Dynamic
range of difficulty”. 
– It’s all about committees, independent evaluations,
and releasing source code
– It’s hopeless…