Camera Based
Document Image Analysis
David Doermann
University of Maryland, College Park
What defines the problem?
• Traditional Document Analysis
– Deals primarily with paper representations
– Acquired with flatbed or sheetfed scanners
• Camera Based Analysis
– Clearly defined by the acquisition device, its properties,
the impacts of its use, etc., but…
The devices open up a wide range of new and interesting
applications (and problems) and extend what we may
consider document analysis….
Scanner Acquisition
• Advantages
– Reasonable quality –
• Controlled lighting, high resolution, fixed imaging
plane
– Rapid Acquisition
– Relatively cheap
• Disadvantages
– Specialized Device
– Fixed: documents must come to the device
– Requires handling of documents, or documents in
sheet form for feeders
“Book” Camera/Scanner Acquisition
• One step removed from traditional
scanners
– Controlled environment – lighting, image
plane, orientation
– Changes the nature of the content
• Easier to image atypical documents
– Rare, Historic, fragile…
• Often very expensive
• Can be relatively slow
– sheet-fed scanners handle hundreds of pages per
minute
– …although robotic cameras can image tens of
pages a minute….
Industrial Cameras
• Removes the constraints on configuration
• Often still in a controlled environment
• Custom (and expensive) solutions are common
• Processing power bounded only by cost
• Usage
– Postal applications, document inspection (newspapers,
etc), industrial applications
(Portable) Digital Cameras
• Provide much greater flexibility than scanners
– Multiple uses
– Device goes to the documents
• Potentially removes the bottleneck of acquisition for simple
tasks
… fewer constraints (lighting, image plane, focus, …)
increase the complexity of the resulting image and of image
processing…
… yet allow a wider variety of applications
A significant tradeoff
Roadmap…
• Discussion of some key related research and
applications of non-scanner DIA
• Influences of mobile devices on applications
• Issues with processing “traditional” documents vs
processing text
• Future of camera based capture….
– Open issues
– New opportunities
What has been done?
• Applications primarily centered on “Image Text”
– Text in Video Graphics
– Text from WWW pages
– Text in Scenes
• Some work on key challenges
– Imaging of text in controlled environments such as
parking lots, meeting rooms, assembly lines, etc
• Limited work on actually processing traditional
documents….
Video Text Recognition
• Indexing video content using graphic or scene text, to
supplement speech, closed captions, …
• Countless papers published
• Challenges are well known
– Low Resolution, Complex background, Different font style and size,
Lighting, Camera motion, Text/Object motion, Occlusion/distortion, …
all magnified for scene text
• Benefits of multiple frames, repeated content
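The benefit of multiple frames can be illustrated with the simplest integration scheme: averaging repeated observations of the same text region. This is a minimal sketch under strong assumptions: the frames are already registered (aligned), the text is static, and the noise is independent between frames. Real video-text systems must first track and align the region. `average_frames` is a hypothetical helper, not taken from any cited system.

```python
import numpy as np


def average_frames(frames):
    """Pixel-wise mean of already registered frames of one text region.

    Assumes independent, zero-mean noise in each frame; averaging N frames
    then lowers the noise standard deviation by roughly a factor of sqrt(N).
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return stack.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.zeros((32, 32))
    clean[12:20, 4:28] = 255.0  # synthetic bright caption bar
    frames = [clean + rng.normal(0.0, 40.0, clean.shape) for _ in range(16)]
    print("single-frame noise:", np.std(frames[0] - clean))
    print("16-frame noise:    ", np.std(average_frames(frames) - clean))
```

With 16 frames the residual noise should drop to roughly a quarter of the single-frame level; moving text would need alignment before this step.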
WWW Text Image Analysis
• Applications
– Identifying graphic text for indexing and retrieval
– Identifying spam email from text in image attachments
– Uncovering hidden information….
• Issues
– Text style variations (font, font style, orientation).
– Text Quality (Color, Size, Anti-aliasing)
– Image resolution
Visual Input
• Applications
– General input for computer systems
– Passive verification of signatures from
cameras mounted over the writing surface
– Has general implications for mobile devices
that don’t have “traditional” keyboard input
• Challenges:
– Pen tip tracking
– Identifying the temporal relations
– Online recognition
Whiteboard Reading
• Reading handwritten and printed material for meeting scenarios
• Challenges:
– Must deal with unconstrained handwriting
– Distinguish text from graphics and sketches
– Parse and interpret graphics (electronic ink)
– Content can appear and disappear: it is dynamically produced
Meeting and Lecture Processing
• Meetings
– Reading name plates and tags
– Identifying and linking references to documents
– Processing whiteboards
• Lectures
– Reading text on projected presentations
– Detection, normalization and matching of text with
source content (PowerPoint)
• Challenges:
– Variable content
– Animations
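Matching recognized slide text with the source presentation can be sketched as a nearest-neighbour search over text similarity. This minimal version assumes the text of each source slide has already been extracted into plain strings. It uses the standard-library `difflib.SequenceMatcher` ratio as the similarity measure. A practical system would probably prefer word- or n-gram-level features that are more tolerant of OCR errors. `best_matching_slide` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so layout differences matter less."""
    return " ".join(text.lower().split())


def best_matching_slide(ocr_text, slide_texts):
    """Return (index, score) of the source slide most similar to the OCR text.

    slide_texts must be non-empty; score is the SequenceMatcher ratio in [0, 1].
    """
    query = normalize(ocr_text)
    scores = [SequenceMatcher(None, query, normalize(s)).ratio() for s in slide_texts]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    slides = [
        "Camera Based Document Image Analysis",
        "Scanner Acquisition: reasonable quality, rapid acquisition",
        "License Plate Reading: parking lot tracking, vehicle surveillance",
    ]
    noisy = "Llcense Plate Readlng: parklng lot trackinq"  # simulated OCR errors
    print(best_matching_slide(noisy, slides))
```

Animations and partially revealed slides would lower the score, so a threshold on the returned ratio, tuned on real data, would be needed before accepting a match.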
License Plate Reading
• Applications
– Parking lot tracking
– Red-light and Speed Camera
– Vehicle Surveillance
• Challenges
– Moving vehicles
– Complex plates
– Night and all-weather imaging
– Limited use of context
Road Sign Recognition
• Applications
– Driver assistance systems,
automated mapping, enforcement
of sign guidelines (location,
quality)
[Figure: sign image, Hausdorff matching results and detected sign shape; nominal conditions, typical rectangular freeway sign]
• Challenges
– Low resolution, motion blur
– Real-time systems
– Detecting signs under a variety of
conditions…
[Figures: nominal conditions, rural caution sign; adverse glint conditions, freeway sign]
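The Hausdorff matching shown in the figure compares a detected sign outline with template shapes. It uses the largest distance from any point in one set to the nearest point in the other. A minimal pure-Python sketch of the symmetric Hausdorff distance follows. The brute-force O(nm) search, the toy corner-point templates, and the helper names are assumptions for illustration. Practical systems typically use distance transforms and partial (ranked) Hausdorff variants to tolerate occlusion and clutter.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def classify_shape(points, templates):
    """Return the name of the template with the smallest Hausdorff distance."""
    return min(templates, key=lambda name: hausdorff(points, templates[name]))


if __name__ == "__main__":
    # Hypothetical normalized templates: a rectangle and a diamond (caution sign)
    square = [(0, 0), (0, 1), (1, 1), (1, 0)]
    diamond = [(0.5, 0), (1, 0.5), (0.5, 1), (0, 0.5)]
    templates = {"rectangular": square, "diamond (caution)": diamond}
    detected = [(0.05, 0.0), (0.0, 0.95), (1.0, 1.02), (0.98, 0.0)]
    print(classify_shape(detected, templates))  # rectangular
```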
Sign Recognition and Translation
• Application: integrated identification,
recognition and translation of text found on
foreign signs, maps, menus, transportation
schedules, etc.
• Extremely useful for other character
sets…
• Primarily PDA- or mobile-phone-based
hardware
• Networked and standalone solutions are
being marketed…
• Ultimately, software solutions are
desirable….
Systems for the Visually Impaired
• Allows legally blind consumers
access to a variety of
information sources
– Transportation, shopping, …
• The system builds an end-to-end
application of detection,
enhancement, recognition and
conversion to speech
A. Zandifar, A. Chahine, R. Duraiswami and L.S. Davis, “A Video-based interface to textual information for the
visually impaired”, IEEE Computer Society ICMI 2002, pp. 325-330.
Commonalities
• Most of these systems can be (or have been) engineered,
and with the right constraints more are technically
feasible… but perhaps not cost-effective.
But what is the catalyst that will promote more general
applications?
• Mobile devices and wireless networking are
providing a platform which no longer requires
special hardware
Mobile Devices
• Examples: PDAs, Digital Cameras, Cellular Phones
• Devices are becoming common and pervasive
• They are becoming increasingly powerful (processor,
memory, power, resolution…)
• 3G networks promise multimedia support
• They are easy to use
– Devices go to the documents
– Rapid and Flexible Acquisition
• Acquisition becomes just another application of the device
How do they compare? (subjectively)
              Scanner             Camera
Resolution    150-600 dpi         Adequate(?), improving
Distortion    Minimal             Lens/perspective
Lighting      Controlled          Sensor and environment
Background    Domain dependent    Often complex
Zoom/Focus    N/A                 Variable
Blur          N/A                 Motion, focus
Noise         Minimal             Sensor
Are Digital Cameras being used for text?
Yes….
• Active capture of information sources [Paris 03]
• Note taking during presentations [ICDAR 03]
• Japan – Signs prohibit the use of digital cameras in
bookstores! They are being used as portable photocopiers….
How about hardcopy documents?
• The Falcon MT system is currently testing high-resolution cameras
for input to standard OCR systems …
What are the challenges of imaging traditional documents?
Resolution and Large Documents
• Related Work
– Super-resolution
• Irani (1991), Patti (1997), Capel (2000), Fekri (2000)
– Mosaicing
• Taylor (1997, 1999), Mirmehdi (2001), …
• State of the Art
– Digital cameras: > 6 megapixels (can provide an
effective 300 dpi)
– PDAs: > 1.3 megapixels
– Mobile phones: 1 megapixel
• But better cameras are on the way….
– (4-megapixel phones by 2005)
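Whether a sensor reaches an “effective 300 dpi” depends on how much of the page must fit in the frame. The worked calculation below makes two assumptions: the page is aligned with the sensor and fills it with no margin, and the sensors have illustrative dimensions. A 3000 × 2000 pixel sensor stands in for about 6 megapixels, and 1280 × 1024 for about 1.3 megapixels; real sensor dimensions vary. The helper names are hypothetical.

```python
def effective_dpi(sensor_px, page_in):
    """Sampling resolution when the page fills the frame.

    sensor_px and page_in are (long side, short side) pairs; the axis with
    fewer pixels per inch limits the usable resolution.
    """
    (sensor_long, sensor_short), (page_long, page_short) = sensor_px, page_in
    return min(sensor_long / page_long, sensor_short / page_short)


def megapixels_needed(page_in, dpi):
    """Megapixels required to image the whole page at the given dpi."""
    page_long, page_short = page_in
    return (page_long * dpi) * (page_short * dpi) / 1e6


if __name__ == "__main__":
    letter = (11.0, 8.5)  # US Letter, inches
    print(round(effective_dpi((3000, 2000), letter)))  # 235
    print(round(effective_dpi((1280, 1024), letter)))  # 116
    print(round(megapixels_needed(letter, 300), 1))    # 8.4
```

Under these assumptions a 6-megapixel camera reaches 300 dpi only over about a 10 × 6.7 inch area. A full Letter page at 300 dpi needs roughly 8.4 megapixels, which is one reason super-resolution and mosaicing are of interest.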
Blur from Focus/Depth of Field
• The imaging plane may not be
parallel to the document, resulting in
increased blur
• Frequency domain strategy
– Tsai (1984), Tom (1994), Kim
(1990, 1993), …
• Iterative solution
– Stark (1989), Tekalp (1992),
Irani (1991), …
• Bayesian methods
– Schultz (1994, 1995), …
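The methods cited above largely combine multiple frames. A simpler single-image relative of the frequency-domain approach is the classical Wiener deconvolution filter, shown here in NumPy instead of any of the cited algorithms. The sketch makes two assumptions: the blur is a known, spatially uniform Gaussian point-spread function, and the noise-to-signal ratio `nsr` is constant. On a tilted page the defocus actually varies across the image, so this is only a building block. The helper names are hypothetical.

```python
import numpy as np


def gaussian_psf(shape, sigma):
    """Centered Gaussian point-spread function (rough defocus model), summing to 1."""
    rows, cols = shape
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()


def blur(image, psf):
    """Circular convolution with a same-sized, centered PSF via the FFT."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))


def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Frequency-domain Wiener filter: conj(H) / (|H|^2 + nsr)."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    filt = np.conj(otf) / (np.abs(otf) ** 2 + nsr)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * filt))


if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[20:44, 30:34] = 1.0  # synthetic vertical stroke
    img[30:34, 16:48] = 1.0  # synthetic horizontal stroke
    psf = gaussian_psf(img.shape, sigma=2.0)
    blurred = blur(img, psf)
    restored = wiener_deconvolve(blurred, psf)
    print("blurred RMS error: ", np.sqrt(np.mean((blurred - img) ** 2)))
    print("restored RMS error:", np.sqrt(np.mean((restored - img) ** 2)))
```

The `nsr` term keeps the filter from amplifying frequencies the blur has almost erased; with real sensor noise it has to be raised, trading sharpness for stability.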
Lighting
• Natural lighting can be
uneven
• Providing lighting can be
challenging
• Lighting correction
– Global brightness / contrast
– Uneven brightness
• Adaptive thresholding
– Too many to list …
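As one example of adaptive thresholding, Sauvola's method computes a threshold from each pixel's neighbourhood rather than using one global value: T = m · (1 + k · (s / R − 1)), where m and s are the local mean and standard deviation. The NumPy sketch below computes the local statistics with integral images. The window size, k = 0.2 and R = 128 are commonly quoted defaults, but treat them as assumptions to tune per capture setup; the function names are hypothetical.

```python
import numpy as np


def _box_sum(a, w):
    """Sum over every w x w window of a; the output is (w - 1) smaller per axis."""
    ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)  # integral image
    return ii[w:, w:] - ii[:-w, w:] - ii[w:, :-w] + ii[:-w, :-w]


def sauvola_binarize(gray, window=25, k=0.2, dynamic_range=128.0):
    """Return a boolean mask that is True for dark (text) pixels.

    Per-pixel threshold T = m * (1 + k * (s / R - 1)), with m and s the mean
    and standard deviation of the window around the pixel and R the assumed
    dynamic range of s.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    g = np.asarray(gray, dtype=np.float64)
    padded = np.pad(g, window // 2, mode="edge")  # replicate borders
    n = float(window * window)
    mean = _box_sum(padded, window) / n
    sq_mean = _box_sum(padded**2, window) / n
    std = np.sqrt(np.maximum(sq_mean - mean**2, 0.0))  # clamp rounding error
    threshold = mean * (1.0 + k * (std / dynamic_range - 1.0))
    return g < threshold


if __name__ == "__main__":
    h, w = 60, 120
    # Synthetic uneven lighting: background brightens from 90 (left) to 230 (right)
    page = np.tile(np.linspace(90.0, 230.0, w), (h, 1))
    truth = np.zeros((h, w), dtype=bool)
    truth[25:35, 10:110:6] = True  # thin dark vertical "strokes"
    page[truth] -= 70.0
    print("Sauvola agreement:", (sauvola_binarize(page) == truth).mean())
    print("global agreement: ", ((page < page.mean()) == truth).mean())
```

On this synthetic page a single global threshold marks most of the darker left half as text, while the local threshold follows the brightness gradient.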
…Motion Blur
• Controlled with adequate lighting and shutter speed
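The role of shutter speed can be estimated with simple arithmetic. The blur length in pixels is roughly the apparent speed of the page times the exposure time, converted to pixels. The sketch assumes purely translational motion and uses hypothetical numbers: hand shake of about 20 mm/s measured on the page, and capture at 200 dpi.

```python
def blur_pixels(speed_mm_per_s, exposure_s, dpi):
    """Approximate motion-blur length in pixels for translational motion.

    speed_mm_per_s is the apparent speed of the page measured on the page
    itself; dpi is the sampling resolution on the page (25.4 mm per inch).
    """
    px_per_mm = dpi / 25.4
    return speed_mm_per_s * px_per_mm * exposure_s


def longest_exposure(speed_mm_per_s, dpi, max_blur_px=1.0):
    """Longest exposure (seconds) that keeps the blur within max_blur_px."""
    return max_blur_px / (speed_mm_per_s * dpi / 25.4)


if __name__ == "__main__":
    for exposure in (1 / 30, 1 / 250):
        print(f"1/{round(1 / exposure)} s -> {blur_pixels(20, exposure, 200):.2f} px")
    print(f"exposure for <= 1 px blur: 1/{1 / longest_exposure(20, 200):.0f} s")
```

Under these assumptions a 1/30 s exposure smears strokes by about five pixels, while 1/250 s keeps the blur under one pixel. The catch is that shorter exposures need more light.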
Warping and Perspective Distortion
• Completely arbitrary
viewing angles may not be
realistic however….
• Remove perspective
distortion of planar
document pages
– Clark (2000, 2001), …
• Unwarp curled pages using
3D shape:
– Brown (2001), Pilu (2001),
…
• … but the page surface is not guaranteed to be smooth
• Can we simply enhance the results and use existing tools?
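For a flat page, perspective distortion is a plane-to-plane mapping (a homography). Rectification can therefore start by estimating the 3 × 3 matrix from the four page corners with the direct linear transform. The pure-Python sketch below only solves for the homography. It assumes the corners have already been located in the image, which is usually the hard part in practice, and it fixes h33 = 1. The corner coordinates and function names are hypothetical. Robust implementations also normalize coordinates first for numerical stability.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (three collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 matrix H (h33 fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map an (x, y) point through h, including the perspective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


if __name__ == "__main__":
    # Hypothetical page corners found in a camera image (clockwise from top-left)
    corners = [(102, 95), (700, 140), (760, 980), (60, 1010)]
    # Target: an 8.5 x 11 inch page sampled at 100 dpi
    page = [(0, 0), (850, 0), (850, 1100), (0, 1100)]
    H = homography(corners, page)
    for c in corners:
        print(c, "->", tuple(round(t, 2) for t in apply_homography(H, c)))
```

A complete rectifier would then visit every output pixel, map it back through the inverse homography, and interpolate the camera image at that location.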
For controlled imaging….
[Figure: the same page captured from a scanner (300 dpi) and from a camera (~200 dpi)]
Commercial OCR
• OCR results are almost identical….
[Figure: OCR output from the scanner image and from the camera image]
More Typical Example
[Figure: camera image and its OCR result]
Simple Rectification
[Figure: original and rectified images]
Better OCR Results
[Figure: OCR results for the original and rectified images]
Simple Unwarping
• Text line straightening
• Unwarping book-spine type deformation
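Text line straightening can be sketched by fitting a smooth curve to a traced baseline. Each pixel column is then shifted vertically so that the curve becomes flat. This NumPy sketch assumes the baseline points of a single line have already been traced. It uses a low-order polynomial fit and moves whole pixel columns by rounded offsets without interpolation. `straighten_line` is a hypothetical helper, not the method behind the results shown here.

```python
import numpy as np


def straighten_line(image, xs, ys, degree=2, fill=255):
    """Shift image columns so that a curved text baseline becomes horizontal.

    (xs, ys) are traced baseline points of one text line. A polynomial of
    the given degree is fitted to them, and each column is moved vertically
    by the rounded distance between the fitted curve and its mean row.
    Whole pixels are moved (no interpolation); vacated pixels get `fill`.
    """
    coeffs = np.polyfit(xs, ys, degree)
    cols = np.arange(image.shape[1])
    baseline = np.polyval(coeffs, cols)
    target = baseline.mean()
    rows = image.shape[0]
    out = np.full_like(image, fill)
    for x in cols:
        shift = int(round(target - baseline[x]))
        if abs(shift) >= rows:
            continue  # column would be shifted entirely out of the image
        column = image[:, x]
        if shift >= 0:
            out[shift:, x] = column[:rows - shift]
        else:
            out[:rows + shift, x] = column[-shift:]
    return out


if __name__ == "__main__":
    img = np.full((40, 100), 255, dtype=np.uint8)
    xs = np.arange(100)
    ys = np.round(20 + 0.002 * (xs - 50) ** 2).astype(int)  # synthetic curled baseline
    img[ys, xs] = 0
    flat = straighten_line(img, xs, ys)
    before = np.nonzero(img == 0)[0]
    after = np.nonzero(flat == 0)[0]
    print("row spread before:", before.max() - before.min())
    print("row spread after: ", after.max() - after.min())
```

Shifting whole columns is only adequate for mild, book-spine-style curl; stronger warps also compress characters horizontally, which this sketch does not undo.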
Curved Surface
OCR Result
t(,_~.t catc+go'r'jZ(tN071 Cap("rinI,ClLt
c~~'.
et,io-r1. "i,•~~,tO 1 fl classes A
hd°d 10 zr~ for
n.tc goriz~ r (l f n(,-l'(Lll
~+`~fSet ' illi111t of t11-~'1epo7't 011, OUT ea;Pc?'i,rn.e~~ts
), 'J1~~r
~n
ij11111•/iii' ~~~
jfl
c(ItPy..i,(Lt-io?L of optically
address the
1111119 l1fs' . the of feCts OCR errors ifl(Ly )LL v(" Olt
{1~~1Et1t q~I.d-at197,,Sionahi,ty red ction. (Lnd c~,teynr.izfL_
iTII1i
d t, 1 ~ep~~rt on 'uvaq~s that cateyvri.zatiorf,
If~f l. f
~o~ recti,orL and rctr•ie•11al cf fect%VCflF,ss.
f`11f1, help
f,foJ
_iiiCtion
Rough Text Extraction
[Figure panels: original, edge areas, text areas, threshold surface, extracted binary text]
Local Direction Detection
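Local text direction can be estimated by testing candidate angles and keeping the one whose projection profile is sharpest. When the projection follows the text lines, ink piles up in a few bins. This NumPy sketch applies that classic projection-profile idea to a binary patch, using a shear (y − x · tan θ) as a small-angle stand-in for rotation. The angle range, the 0.5° step, and the sum-of-squares criterion are assumptions. It is not necessarily the technique used in the work shown here.

```python
import numpy as np


def dominant_angle(binary, angles_deg=np.arange(-15.0, 15.5, 0.5)):
    """Estimate the text-line angle (degrees) of a binary patch.

    Each ink pixel (x, y) is projected onto y - x * tan(angle); the candidate
    angle whose projection profile has the largest sum of squared bin counts
    (the most sharply peaked profile) wins. Positive angles mean lines that
    descend to the right in image coordinates (rows grow downward).
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # no ink: nothing to estimate
    best_angle, best_score = 0.0, -1.0
    for angle in angles_deg:
        proj = ys - xs * np.tan(np.radians(angle))
        profile = np.bincount(np.round(proj - proj.min()).astype(int))
        score = float(np.sum(profile.astype(np.float64) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


if __name__ == "__main__":
    patch = np.zeros((100, 200), dtype=bool)
    slope = np.tan(np.radians(5.0))
    for y0 in (10, 30, 50, 70):  # four synthetic text lines tilted by 5 degrees
        for x in range(200):
            patch[int(round(y0 + x * slope)), x] = True
    print("estimated angle:", dominant_angle(patch))
```

Applying this in overlapping windows yields a field of local directions, which line tracing can then follow across a curved page.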
Cylindrical Model
[Figure: cylindrical page model showing the generatrix and directrix, points A, B, C and D, origin O and image axes u, v]
Text Line Tracing
Straightened Text Lines
OCR Result
4 successful text categor°izataon, capcri'raent divides
tcrtual collection into pre-defined clu,.sses. A true
,lirr>sentatzve for eachh class is generally obtained
tiiryttg training of the ca.tcgorzzer.
jri this paper, Yale report on o•a.r ca;periro.erats o-n,
t,pivtrtg and categorization of optically recognized
(pr tartaerrts. In partzctalar. ure in'll address the is;,ie,s regarding the of fcct.s OCR, cr-ror.s m ay have on
10itaing. dirraensionality reduction, gnd categoriza_
tip. lire further report on ways that, categorization
pla,Uhelp error* cor•rectlor( and retrieval effectiveness.
Line Extraction
• Using Extrapolation with overlapping direction
estimates
Improved OCR Result
A successful text categorization experiment divides
textual collection into pre-defined. classes. A true
~presentative for each class is generally obtained
ing training of the categorizer.
In this paper, we report on our experiments on
joining and categorization of optically recognized
pcuments. In particular, we will address the isoes regarding the effects OCR errors may have on
joining, dimensionality reduction, and categorizahon. We further- report on ways that categorization
moy help error correction and retrieval effectiveness.
Open Questions
• Are existing tools good enough?
– Can we simply “enhance” the images
– or do we need to develop new tools….
– or new constraints?
• Can we make use of degradation knowledge?
– apply constraints from clear parts of the document to
recognize similar text blurred by perspective?
• Will such devices replace scanners? No
• Will they open up the market to new applications?
– They already have…
– Integrated Information Services
• Map locations
• Tourist information
– Promise of DIA on text captured with digital camera
(business cards, nametags, pages of notes, …)
• Grand Challenges
– Image Quality
• Immediate feedback for “processability”
– Moving processing to the device
– Killer Applications
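Immediate feedback on “processability” could start from a cheap focus measure computed on the device. One example is the variance of a Laplacian-filtered image, which drops as blur increases. This NumPy sketch is a common heuristic, not a validated processability test. The acceptance threshold `min_sharpness` is a hypothetical value that would have to be calibrated for each device and application.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior; higher means sharper."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def box_blur(gray, passes=3):
    """Repeated 3x3 mean filter with edge padding, used here to simulate defocus."""
    g = np.asarray(gray, dtype=np.float64)
    for _ in range(passes):
        p = np.pad(g, 1, mode="edge")
        rows, cols = g.shape
        g = sum(p[dy:dy + rows, dx:dx + cols]
                for dy in range(3) for dx in range(3)) / 9.0
    return g


def is_processable(gray, min_sharpness=100.0):
    """Hypothetical accept/reject rule; min_sharpness must be calibrated per device."""
    return laplacian_variance(gray) >= min_sharpness


if __name__ == "__main__":
    sharp = np.zeros((64, 64))
    sharp[:, ::4] = 255.0  # synthetic high-contrast vertical strokes
    blurred = box_blur(sharp)
    print("sharp:  ", laplacian_variance(sharp))
    print("blurred:", laplacian_variance(blurred))
```

A single global score cannot tell whether the blurred region is the part that matters, so a real feedback loop would probably score text regions separately.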
MyLifeLog
• Record and Index anything and everything
– Text from everywhere you have been
– Identification of every document you have ever looked at
(not necessarily read…)
– Recall of everything you have ever written
• Is it possible?
IJDAR Special Issue
• Special Issue on Camera Based Text and Document
Recognition
• Papers Due: November 2003
http://ijdar.cfar.umd.edu/special_issues/TD-SI.html