(12) United States Patent
Boncyk et al.

(10) Patent No.: US 8,798,322 B2
(45) Date of Patent: *Aug. 5, 2014

(54) OBJECT INFORMATION DERIVED FROM OBJECT IMAGES

(71) Applicant: Nant Holdings IP, LLC, Culver City, CA (US)

(72) Inventors: Wayne C. Boncyk, Evergreen, CO (US); Ronald H. Cohen, Pasadena, CA (US)

(73) Assignee: Nant Holdings IP, LLC, Culver City, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 13/971,758

(22) Filed: Aug. 20, 2013

(65) Prior Publication Data: US 2013/0336533 A1, Dec. 19, 2013

(60) Related U.S. Application Data: Division of application No. 13/686,851, filed on Nov. 27, 2012, now Pat. No. 8,588,527, which is a continuation of application No. 13/207,211, filed on Aug. 10, 2011, now Pat. No. 8,331,679, which is a division of application No. 13/037,330, filed on Feb. 28, 2011, now Pat. No. 8,218,873, which is a division of application No. 12/568,130, filed on Sep. 28, 2009, now Pat. No. 7,899,252, which is a division of application No. 11/204,901, filed on Aug. 15, 2005, now Pat. No. 7,680,324, which is a continuation-in-part of application No. 09/992,942, filed on Nov. 5, 2001, now Pat. No. 7,016,532. Provisional application No. 60/317,521, filed on Sep. 5, 2001; provisional application No. 60/246,295, filed on Nov. 6, 2000; provisional application No. 60/630,524, filed on Nov. 22, 2004; provisional application No. 60/625,526, filed on Nov. 4, 2004.

(51) Int. Cl.: G06K 9/00 (2006.01)

(52) U.S. Cl.: USPC 382/103

(58) Field of Classification Search: USPC 382/305, 181, 103; 704/1-10. See application file for complete search history.

Primary Examiner: Ishrat I Sherali

(74) Attorney, Agent, or Firm: Fish & Tsang, LLP

(57) ABSTRACT

Search terms are derived automatically from images captured by a camera equipped cell phone, PDA, or other image capturing device, submitted to a search engine to obtain information of interest, and at least a portion of the resulting information is transmitted back locally to, or nearby, the device that captured the image.

40 Claims, 8 Drawing Sheets
[FIG. 1 (Sheet 1 of 8): top-level flowchart. An input image is captured (10); an image type determination (26) routes symbolic images to a decode symbol step (28) and object images to input image decomposition (34), database matching (36), select best match (38), URL lookup (40), and URL return (42).]

[FIG. 2 (Sheet 2 of 8): idealized view of image capture, showing the user 12, the capture device 14, the object of interest 16, and the captured image 18.]

[FIGS. 3A-3B (Sheets 3-4 of 8): nested processing loops. For each input image segment group, each object in the database, each view of that object, and each segment group in that view, the process runs greyscale, colorcube, shape, and wavelet comparisons, calculates a combined match score, and then advances to the next segment group, view, object, and input image segment group.]

[FIG. 4 (Sheet 5 of 8): block diagram (reference numerals 100-110) of a target object, camera, image processing, browser, and terminal exchanging image data and target object information with an identification server containing object recognition and a database, plus a content server.]

[FIG. 5 (Sheet 6 of 8): block diagram similar to FIG. 4 for cellular telephone and PDA applications (reference numerals 200-211).]

[FIG. 6 (Sheet 7 of 8): block diagram for spacecraft applications, in which image processing and the object recognition/database identification server exchange image data and target object information with a spacecraft data system (reference numerals 300-310).]

[FIG. 7 (Sheet 8 of 8): schematic of a system 400 in which a device captures an image 412 of an object, a search term 421A/421B is derived and submitted over a network 425 to a search engine 430 that uses an index 432 to produce a results set 434 returned to the device (reference numerals 410-434).]
OBJECT INFORMATION DERIVED FROM OBJECT IMAGES

This application is a divisional of Ser. No. 13/686,851, filed Nov. 27, 2012, which is a continuation of Ser. No. 13/207,211, filed Aug. 10, 2011 and issued Dec. 11, 2012 as U.S. Pat. No. 8,331,679, which is a divisional of Ser. No. 13/037,330, filed Feb. 28, 2011 and issued Jul. 10, 2012 as U.S. Pat. No. 8,218,873, which is a divisional of Ser. No. 12/568,130, filed on Sep. 28, 2009 and issued Mar. 1, 2011 as U.S. Pat. No. 7,899,252, which is a divisional of Ser. No. 11/204,901, filed Aug. 15, 2005 and issued Mar. 16, 2010 as U.S. Pat. No. 7,680,324, which is a continuation-in-part of Ser. No. 09/992,942, filed Nov. 5, 2001 and issued Mar. 21, 2006 as U.S. Pat. No. 7,016,532, which claims priority to provisional application No. 60/317,521, filed Sep. 5, 2001, and provisional application No. 60/246,295, filed Nov. 6, 2000. U.S. patent application Ser. No. 11/204,901, filed Aug. 15, 2005 and issued Mar. 16, 2010 as U.S. Pat. No. 7,680,324, also claims priority to provisional application No. 60/630,524, filed Nov. 22, 2004, and provisional application No. 60/625,526, filed Nov. 4, 2004. These and all other referenced patents and applications are incorporated herein by reference in their entirety. Where a definition or use of a term in a reference that is incorporated by reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein is deemed to be controlling.

FIELD OF THE INVENTION

The field of the invention is digital imaging.

BACKGROUND

Several years ago the present inventors pioneered the concept of using digitally captured images to identify objects within the images, and then using such identifications to retrieve information from various databases. Examples include:

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of an object in an art museum, identifying the object from the image data, and then providing the user with information regarding the object (i.e., about or relating to the object);

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of an automobile as it drives along a road, identifying the make and model from the image data, and then providing a user with a link to a website relating to that particular make and model;

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of a bar code, logo, or other indicia in a magazine, using information contained in the indicia to identify a product, and providing a telephone number or other contact information relating to that product;

Using a local device (cell phone, digital camera, PDA or other device) to photograph a billboard of a restaurant, identifying the restaurant from a barcode, special target, written language, or other information contained in the photograph, and using that information to access a database to provide the user with the restaurant's location, menu, or telephone number; and

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of a sign at a sports stadium, using information extracted from the image to automatically purchase an entry ticket for the user, and providing the user with an entry code that can be used to bypass the long lines of ordinary ticket purchasers.

In such embodiments it was specifically contemplated that analysis of the images could be performed locally (i.e. on the cell phone, PDA or other device capturing the image), distally at a server, or more preferably using some combination of the two. It was also contemplated that any available database could be accessed to provide the returned information, including publicly accessible databases on the Internet. It was not appreciated, however, that one could integrate these concepts with the searching capabilities of standard Search Engines.

In the 1990s Yahoo!™ introduced the idea of indexing web pages accessible on the Internet, and providing a Search Engine to access the index. Since that time dozens of other searching systems have been developed, which use all manner of various search methods, algorithms, hardware and/or software. All such systems and methods that accept user inputs of Key Information, and then utilize such Key Information to provide the user with information of interest, are referred to herein as Search Engines. The user, of course, can be a natural person, as well as a device (computing or otherwise), algorithm, system, organization, or any other entity. In searching for information, a Search Engine can utilize any suitable search domain, including for example:

A database (including for example a relational database, an object database, or an XML database);

A network of resources including for example web pages accessible within the Internet; and

A public or private collection of documents or information (e.g., documents, information, and/or messages of a company or other organization(s)) such as that maintained by LEXIS™.

In a typical search, Key Information is provided to the Search Engine in the form of key words comprising text, numbers, strings, or other machine-readable information types. The Search Engine then searches its indices of web pages for matches, and returns to the user a hyperlinked listing of Internet Uniform Resource Locators ("URLs"), as well as some brief display of context in which the key word(s) are used. The information of interest can sometimes be found in the hyperlinked listing, but is more frequently found by linking directly to the listed web pages.

Providing Key Information to Search Engines in the form of text strings has inherent difficulties. It involves strategy in the selection of the text to be entered, and even with respect to the format of the keywords (for example using wildcards). Another difficulty is that small computing and/or telephony devices (e.g. telephones, both mobile and non-mobile) have small and/or limited keyboards, thus making text entry difficult.

SUMMARY OF THE INVENTION

The present invention provides apparatus, systems and methods in which: (a) a digital photograph, video, MPEG, AVI, or other image is captured using a camera equipped cell phone, PDA, or other image capturing device; (b) key words or other search criteria are automatically extracted or derived from the image; (c) the search criteria are submitted to a Search Engine to obtain information of interest; and (d) at least a portion of the resulting information is transmitted back locally to, or nearby, the device that captured the image.
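A minimal sketch of steps (a) through (d) in Python, assuming a hypothetical engine endpoint (SEARCH_URL) and stubbed helpers (capture_image, derive_search_terms) that are not part of the specification itself:

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://search.example.com/search"  # assumed engine endpoint

def capture_image() -> bytes:
    """(a) Obtain a captured image from the device (stubbed as a file read)."""
    with open("captured.jpg", "rb") as f:
        return f.read()

def derive_search_terms(image: bytes) -> list[str]:
    """(b) Automatically extract key words or other search criteria.

    A real implementation would use symbol decoding and/or object
    recognition; a placeholder result is returned here.
    """
    return ["example", "object"]

def submit_to_search_engine(terms: list[str]) -> bytes:
    """(c) Submit the search criteria to a Search Engine."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": " ".join(terms)})
    with urllib.request.urlopen(url) as resp:
        return resp.read()

if __name__ == "__main__":
    results = submit_to_search_engine(derive_search_terms(capture_image()))
    print(results[:500])  # (d) return a portion of the results to the device
```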
Some images so utilized will include symbolic content that is sufficient in and of itself to be relatively non-ambiguous. Such symbolic content, for example, can be a telephone number or a web-site address. In such instances the symbolic content can advantageously be utilized as a literal in the search criteria.
In other instances significant additional processing can be needed. For example, an image of an automobile will likely need to be processed to determine the make and model, and that information (e.g. Mercedes™ S500™) can then be transmitted to the Search Engine to be used as key words for a search. It is also contemplated that processing of some images will result in only best guesses. Thus, a side view of an automobile might not be analyzable into a particular make and model, and in that case the system can provide more generic terms such as SUV or automobile.

In general, the present invention provides technology and processes that can accommodate linking objects and images to information via a network such as the Internet, which require no modification to the linked object. Traditional methods for linking objects to digital information, including applying a barcode, radio or optical transceiver or transmitter, or some other means of identification to the object, or modifying the image or object so as to encode detectable information in it, are not required because the image or object can be identified solely by its visual appearance. The users or devices can even interact with objects by "linking" to them. For example, a user can link to a vending machine by "pointing and clicking" on it. His device would be connected over the Internet to the company that owns the vending machine. The company would in turn establish a connection to the vending machine, and thus the user would have a communication channel established with the vending machine and could interact with it.

Therefore, it is an object of the present invention to provide a system and process for identifying digitally captured images without requiring modification to the object. Another object is to use digital capture devices in ways never contemplated by their manufacturer. Another object is to allow identification of objects from partial views of the object. Another object is to provide communication means with operative devices without requiring a public connection therewith.

Various other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention, along with the accompanying drawings in which like numerals represent like components.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic block diagram top-level algorithm flowchart;

FIG. 2 is an idealized view of image capture;

FIGS. 3A and 3B are a schematic block diagram of process details of the present invention;

FIG. 4 is a schematic block diagram of a different explanation of the invention;

FIG. 5 is a schematic block diagram similar to FIG. 4 for cellular telephone and personal data assistant (PDA) applications; and

FIG. 6 is a schematic block diagram for spacecraft applications.

FIG. 7 is a schematic of a system in which a local device captures an image, a search term is automatically derived from the image and submitted to a search engine to produce a results set, and information from the results set is sent back to the device.

DETAILED DESCRIPTION

FIGS. 1-6 are copied from the priority PCT application, PCT/US02/35407, filed Nov. 5, 2002. Discussion of those figures is set forth later in the application.

The present invention contemplates any suitable decomposition algorithms. Clearly, faster and more accurate algorithms are preferred over slower and less accurate algorithms. It is especially preferred that algorithms are chosen such that at least some processing can take place locally to the device that captures the image. Such processing can in many instances eliminate the need to wirelessly transmit detailed images, and can eliminate reliance on a distal server that might be oversubscribed. Thus, some or all of the image processing, including image/object detection and/or decoding of symbols detected in the image, can be distributed arbitrarily between the mobile (client) device and the server. In other words, some processing can be performed in the client device and some in the server, without specification of which particular processing is performed in each, or all processing can be performed on one platform or the other, or the platforms can be combined so that there is only one platform. The image processing can be implemented in a parallel computing manner, thus facilitating scaling of the system with respect to database size and input traffic loading.
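The client/server division described above can be sketched as a simple fallback, assuming hypothetical decode_barcode and send_to_server stubs standing in for local symbol decoding and server-side recognition respectively:

```python
def decode_barcode(image: bytes) -> str | None:
    """Stub for an on-device symbol decoder (bar code, matrix code, text)."""
    return None  # pretend nothing was decoded locally

def send_to_server(image: bytes) -> list[str]:
    """Stub for shipping the (possibly reduced) image to the distal server."""
    return ["terms", "from", "server"]

def derive_terms(image: bytes) -> list[str]:
    """Decode locally when the device can; otherwise defer to the server.

    The division is arbitrary: any stage may run on either platform,
    or both platforms may be combined into one.
    """
    code = decode_barcode(image)
    return [code] if code is not None else send_to_server(image)
```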
It is further contemplated that some suitable algorithms will take into account the position and orientation of an object with respect to the user at the time the image was captured, which can be determined based on the appearance of the object in an image. This can be the location and/or identity of people scanned by multiple cameras in a security system, a passive locator system more accurate than GPS or usable in areas where GPS signals cannot be received, the location of specific vehicles without requiring a transmission from the vehicle, and many other uses.

Search Engine-Related Embodiments

In FIG. 7 a system 400 generally comprises a portable imaging device 410, a distal server 420, an electronic communications network 425, and a search engine 430. In general, the portable device 410 captures an image 412 of an object 415, and transmits information 413 regarding the image to the server 420. At least one of the device 410 and the server 420 derives a search term 421A, 421B from at least one of the image 412 and the transmitted information 413, respectively. At least one of the device 410 and the server 420 causes the search term 421A, 421B to be submitted via a network 425 to a search engine 430 that uses an index 432 of web pages or other information. The search engine then uses the search term 421A, 421B to produce a results set 434, and causes at least a portion of the results set 434 to be transmitted back to the portable device 410.

In the above discussion it should be appreciated that information regarding the image can include the entire image, one or more subsets of the image, as well as a name or other information derived from the image, but not contained within the image. It should also be appreciated that one could use a proxy server between his/her portable device and the server. In short, the present application contemplates using any complexity of circuitous communication between the mobile client and server, not necessarily a direct connection.
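One way to model the flexibility of information 413 is a container in which any combination of fields may be populated; the structure below is illustrative only, as the specification prescribes no message format:

```python
from dataclasses import dataclass, field

@dataclass
class TransmittedInfo:
    """Possible payloads for the information 413 sent from device to server:
    the entire image, one or more subsets (e.g. a cropped bar code), and/or
    a name or other data derived from, but not contained in, the image."""
    full_image: bytes | None = None
    image_subsets: list[bytes] = field(default_factory=list)
    derived_terms: list[str] = field(default_factory=list)

# Example: transmit only a cropped region plus a locally derived name.
msg = TransmittedInfo(image_subsets=[b"<jpeg bytes>"],
                      derived_terms=["handbag"])
```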
Device 410 can be a cell phone, PDA, laptop computer, or any other portable device that optically captures an image. By "optically captures" is meant some sort of light sensitive array, the output of which can be processed to comprise a visually perceptible image. Viewed from another perspective, device 410 can be any camera having telephony capability, and especially having cell phone capability. With current technology, device 410 would usually have a lens or other
light focusing mechanism, although it is contemplated that advances in electronics can eliminate the need for any physical focusing mechanism. The term "optically captures" is not satisfied by a device that has no optical components, and is merely capable of downloading images from the Internet or other sources.

It is certainly contemplated that the cell phone or other device providing the services discussed herein would operate software permitting it to do so. That software could be resident on the device, in external memory (memory card), or paged in as needed.

Object 415 (referred to as a "Thing of Interest" in one or more of the priority applications) can be any visually perceptible object, regardless of dimension. Contemplated "two dimensional objects" include objects in which the relevant information is substantially in two dimensional format, which includes advertisements and articles in magazine or other print media, as well as photographs or designs on billboards, street signs, restaurant or other business signs, user manuals, paintings at a museum, and so forth.

Contemplated three dimensional objects include substantially all physical objects in which the relevant information is derived from the shape of the object and/or the appearance of the surface of the object. Thus, an automobile is considered herein to have three dimensions of relevance where the shape or other dimensions convey information about the make and model. Similarly, a window in a building can be considered to have three dimensions of relevance where the identity of the manufacturer or distributor can be gleaned from the overall physical dimensions, detail, and so forth. As another example, a beverage container can be considered to have three dimensions; information can be obtained from the shape of the container but further information can also be obtained from the label, printing, logos, text, or other such visible markings on the container (obtaining information from visible markings on the container enables discrimination between different containers that have identical physical shape).

Contemplated four dimensional objects include substantially all physical objects in which the relevant information is derived from changes over time. For example, the speed of a bird or its flight patterns, or a gesture of a person, can be captured in multiple images over a period of time, can be relevant information, and can be reduced to search terms (referred to as Key Information in one or more of the priority documents) for submission to a search engine. Of course, many objects will be considered to have two, three or four dimensions of relevance herein. Thus, relevant information for an automobile can be provided by each of a two dimensional logo on the side of the vehicle, the three dimensional shape of the vehicle, and its four dimensional acceleration or handling features.

It is especially contemplated that objects can include animate and inanimate objects. Among animate objects are included faces of people, and biometric information such as the fingerprint pattern on a human finger, an iris of a person, and so forth.

Image 412 is contemplated to be any array of pixels. In most cases the pixels will be regularly arranged, but that is not absolutely necessary. In most cases the pixels also will number greater than 19,200 (160x120), such as 76,800 (320x240), but they can number fewer than that. More preferred images have greater pixel counts, including for example 256,000 (640x400), more preferably at least 2 million, and even more preferably at least 4 million. It is not necessary that the image be actually constructed at the portable device. Thus, a statement that "the portable device captures an image of an object" includes situations where the device receives and derives data from light emitted or reflected from the object, even if the data is never presented to a user as a visually perceptible image, and even if the data is sent to a distal server without ever being collected into an image by the device.

The information transmitted to the server can comprise any relevant information regarding the contents of the image. Thus, information 413 could comprise the entire image, or a portion of the image. For example, where a user takes a picture of a bar code (whether 2D, 3D or any other configuration), the device 410 could process the image 412 to remove color and all background except the bar code itself, and then merely send the portion of the image containing the bar code as the transmitted information 413. In other cases it is contemplated that the device 410 could sufficiently process the image 412 to derive one or more keywords, and then send only the keyword(s) as the transmitted information 413. All possible combinations are also contemplated. Thus, a user might take a photograph of a Gucci™ handbag, and the device 410 might derive the word "Gucci" from the image, subtract out background except for the handbag, and then transmit: (a) the word "Gucci"; and (b) the image of the handbag as the transmitted information 413. In such instances the process can be iterative. Thus, the device might initially transmit the word "Gucci" as the first transmitted information, receive a results set from the search engine indicating clothing accessories, and then subtract out background except for the handbag, and transmit the image of the handbag as the second transmitted information. As discussed above, it is specifically contemplated that the device 410 could send the server 420 numerical/digital data that is mathematically derived from the image. Examples include image features and characteristics that the server 420 could use in the server recognition process, without transmitting the original image.

As should be apparent by now, the transmitted information need not be limited to image information. Sights, sounds, text, and all sorts of other information can be included in the transmitted information, some of which can be derived directly from the image, and some of which can be derived indirectly from the image. In addition, the device 410 can also capture non-visual information such as sounds, and that information can also be transmitted. Thus, it is contemplated that the device could capture the sounds of a frog, capture an image of a lake or forest, and send both to be used as (or further analyzed into) search terms.

Distal server 420 is distal in the sense that it has no hardwired link to device 410. Server 420 can be a single device, as well as any number of devices coupled together, as for example in a server farm. All manner of suitable servers are contemplated. Thus, servers can use any reasonable hardware, operate using any reasonable software, communications protocols, and so forth.

In terms of interaction with the device, the various analytical tasks discussed above can be allocated in any suitable manner between server 420 and device 410. For example, in the iterative operation discussed above with respect to the Gucci™ handbag, it is contemplated that the device 410 could analyze the image sufficiently to transmit the term "Gucci" as an initial search term to the search engine 430, and the server 420 could then undertake the tasks of subtracting out background of the image except for the handbag, and transmitting the image of the handbag as a second search term.
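The iterative handbag example can be sketched as a two-pass loop; all helpers below are stand-ins for the recognition, segmentation, and search steps described above, not an implementation from the specification:

```python
def derive_word(image: bytes) -> str:
    return "Gucci"                    # stand-in for on-device recognition

def subtract_background(image: bytes) -> bytes:
    return image                      # stand-in for object segmentation

def search(*criteria) -> list[str]:
    return ["clothing accessories"]   # stand-in for a search engine call

def too_broad(results: list[str]) -> bool:
    """Heuristic: treat a generic category listing as insufficient."""
    return bool(results) and "accessories" in results[0]

def iterative_search(image: bytes) -> list[str]:
    """First pass sends the derived word; if the results set is too
    generic, a second pass also sends the segmented object image."""
    results = search(derive_word(image))
    if too_broad(results):
        results = search(derive_word(image), subtract_background(image))
    return results
```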
In another example, the server 420 could determine that the original image provided insufficient information, and send a message to the user through the device 410, directing the user to take another image (such as from another angle, closer, or with greater detail). Indeed, the server 420 could direct the user to take an image of another object entirely, in order to help determine identity of the first object. Thus, the user could take a first image of a payment display at a ball game, provide that image to the server for identification, and then be instructed to take an image of a credit card against which the user wants to be billed for entrance into the ball game. The server could then process the payment against that credit card, and provide an entry code that the user could type to pass through an electronically controlled gate.
In still another example, a user could use his cell phone to capture an image of a screwdriver set at a hardware store, and the cell phone could transmit the information derived from the image to Google™ or some other search engine to find comparison prices. The server 420 could then instruct the user to turn over the packaging and take another image of the set, this time from the back side of the packaging. In this way there is iterative interaction among the user's device, the server, and the search engine.

It should also be appreciated that there are embodiments in which the search engine never communicates with the portable device. For example, the server might do the search query, get results, and provide them to the portable device, or even to a television or other device besides the portable device.

The phrase "search engine" is contemplated herein to include any system dedicated to the indexing, searching and retrieval of information. The most familiar search engines such as Google™, Yahoo!™, MSN™, and Alta Vista™ focus mostly or entirely on indexing web pages from the World Wide Web portion of the Internet. Other search engines, such as Lexis/Nexis™, focus on indexing proprietary collections of data, which can include links to Internet Web pages. The phrase "search term" is contemplated herein to include any keys or other information used by the search engines to access their indexing system. In the case of most web based search engines, the keys are currently text. In such instances a user typically enters one or more key words, where the term "key word" is used in an extremely broad sense to include: (a) words that would likely be found in a dictionary; (b) proper names, number strings and other terms that are not found in any dictionary; as well as (c) characters that are interpreted as wild cards, truncations and so forth. Such search engines are already starting to experiment with use of non-text keys, including for example images and/or sounds. All such possible keys fall within the scope of contemplated search terms. Thus, contemplated search terms include key words, a portion of an image, as well as a logo, bar code or other symbol. It is specifically contemplated that in some instances an image will contain a literal of a search term (e.g. the name of a movie on a movie poster); in some instances an image will not contain such a literal (e.g. a picture of a tree or other plant, where the search term is the name of the plant). In either case the device and/or the server in any combination can perform one or more of the tasks of deriving the search term and submitting it to one or more search engines.

Network 425 can be any workable electronic network, including public and private access networks, and combinations of the two. Preferred networks include the Internet, the upcoming Internet II, cell phone networks, and so forth. Although not expressly shown, the communication lines in FIG. 7 are all contemplated to be one- or two-way communications as appropriate. Moreover, it is contemplated that multiple networks will usually be involved. Thus, for example, communications between device 410 and server 420 will very likely take place over some combination of cell phone (not shown) and Internet networks (e.g. 425), while communications between server and search engine will very likely take place over some combination of Internet and local server farm networks.

The results set 434 can be of any size and composition, but most likely will be tailored to accommodate the device 410. It does very little good, for example, to transmit dozens of web pages to a cell phone, which has insufficient display area to properly view them. Thus, it is contemplated that the results set 434 can be whittled down or otherwise processed by the server (which of course is indicated generically by numeral 420 and need not be the very same box as utilized earlier in the transmission of the transmitted information 413) before being sent to the device 410. Thus, the server 420 or some other processor can process results before providing them to the device 410, such as where the search terms are submitted to the search engine by the server 420 rather than by the device 410. But the device 410 can also access the search engine directly using search information provided by the server. Four contemplated search modes include the following:

1. The server 420 composes a search URL (consisting of search engine address and key words) and sends it to the portable device 410. The portable device then executes the search engine query by sending the search URL to the search engine, and the search engine sends one or more web pages back to the portable device.

2. The server 420 sends keywords, and optionally also a search engine address, to portable device 410. The portable device composes a search URL, sends the search query to the search engine, and receives one or more web pages in response.

3. The server 420 sends the search query to the search engine, and receives a response. The server optionally processes the search response (which could be in any form) and provides some result to the portable device. The result could, for example, comprise a file sent to the portable device, or a web page on some server, with the URL of that web page sent to the portable device.

4. In any of the above modes, or in "direct linking" mode, the result might not be a search results page, but instead some other type of information or action. For example, a server could identify an object, and thereupon send a code to another server, which causes an action to occur. An example of this is clicking on a vending machine with a cell phone to buy something from the machine. Another example is clicking on a TV listing in a newspaper, causing the server to change the channel of the television in front of the user.
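Mode 4 can be sketched as a request that triggers an action rather than returning a results page; the endpoint, payload, and machine identifier below are entirely hypothetical:

```python
import json
import urllib.request

def direct_link_action(object_code: str) -> None:
    """Server-side sketch: after identifying the object, send a code to
    another server that causes an action (e.g. a vending machine vend)."""
    req = urllib.request.Request(
        "https://vendor.example.com/machines/42/vend",  # assumed endpoint
        data=json.dumps({"code": object_code}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)  # no results page; the machine simply acts
```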
Thus, a statement that "the search engine causes at least a portion of the results set 434 to be transmitted back to the portable device 410" should be interpreted herein to mean that at least some information relating to the results set, which information can or can not be included verbatim in the results set, is transmitted back to the device, whether directly or indirectly by the search engine. It is particularly contemplated that a results set could include at least one hyperlinked address.

It is specifically contemplated that results sets can include the following types of information: Uniform Resource Locator (URL); Uniform Resource Identifier (URI); Internet Protocol (IP) address; telephone number; radio frequency or channel; television frequency or channel; and physical location or address. The result(s) displayed to the user can be interactive. In such a case, the user can take further action by interacting directly with the object, by linking to a referenced web page, or some combination of the two. Or, as discussed above, the results could cause another server/computer or machine to perform some action, such as dispensing a product or changing a channel.
From a method perspective, methods of using a search engine to obtain information are contemplated comprising: using a cell phone enabled portable device to take an image of an object; running computer software that automatically derives a first search term from at least a portion of the image; submitting the first search term to the search engine; and transmitting the information to the device. Some preferred methods further comprise using the device to take a second image of the object; running the computer software to derive a second search term from at least a portion of the second image; and submitting the second search term along with the first search term to the search engine. In other preferred methods, the step of submitting the first search term can advantageously comprise: sending at least the portion of the image to a distal server; running the software on the server; and the server sending the search term to the search engine. Still other preferred methods include a distal server providing the search term(s) to the device, with the device submitting the search term(s) to the search engine.

Analysis of data (whether visual or otherwise) to produce search terms can be accomplished in any suitable manner. Useful techniques include, for example, signal analysis, Fourier analysis, pattern matching, pattern recognition, image recognition, object recognition, wavelet analysis, component analysis, etc.

EXAMPLES

Search terms can be advantageously derived from attribute(s) including name, type, size, color, position, and location, with the derivation performed by algorithm, table/database look-up, hardware device, or other suitable means. For example, consider an example wherein the object being imaged is a poster for the color version of a movie named "Modern Times," starring Charlie Chaplin. The device 410 and/or the server 420 can identify as attributes the text "Modern Times Movie Poster" and "Color Version", and can from that determine search terms such as "Modern Times", "Colorized", "Charlie Chaplin", and "Classic movies". The attributes and search terms in this case could be determined by a human user, a machine algorithm, or some combination of the two.

In another example, a user takes an image of a notebook computer. An algorithm detects the notebook computer in the image and identifies it as being a Model 5, made by ZZZ Corporation. The algorithm then determines the attribute "ZZZ Model 5" and the corresponding search terms, "online shopping", "ZZZ", "notebook", and "5".
An embodiment of particular interest comprises a search using image and/or video input. The device captures one or more of single images, multiple images, motion imagery, and/or video (each and all of these information types are known henceforth as "imagery"). Indeed, the imagery can be captured by more than one electronic imaging device, such as a digital camera, a camera-equipped mobile telephone, or a security camera, or multiple such devices. An object or objects are identified in the imagery via image/object recognition techniques (software and/or hardware). The identity of the object(s) is used to look up, in a table/database, a set of text keyword search terms, which are then provided to a search engine. The search engine returns information addresses (e.g., in the form of a web page with hyperlinks) that are pertinent to the objects identified in the imagery. The user then accesses information and/or computing resources based upon at least one of the information addresses.

Another contemplated embodiment comprises a search using sign language input. Imagery is captured of a person gesturing in sign language. Image/motion recognition techniques are used to translate the sign language into machine-understandable data, such as text. The machine-understandable data is either sent directly to a search engine or is used to determine search terms that in turn are sent to a search engine. The search engine returns information addresses pertinent to the meaning of the sign language or portions thereof.

Still another embodiment comprises a search using speech input. There, human speech is captured by a sound capture and/or recording device. Speech recognition processing is then used to recognize the speech and translate it into machine-understandable data (such as text). The machine-understandable data is either sent directly to a search engine or is used to determine search terms that are in turn sent to a search engine. The search engine returns information addresses pertinent to the meaning of the human speech or portions thereof.

An especially preferred embodiment of this invention comprises a search using a camera-equipped portable device. There, imagery is captured by a portable device with a network connection (for example, a cellular telephone). Image recognition processing is then used to recognize at least one object in the imagery. The recognition process can be performed in the portable device, in a distant server, or distributed and/or otherwise shared and performed partly in each. Based on the identity of the object(s), text keywords corresponding to the object(s) are retrieved from a database. As with the image recognition, it is preferred that this process occur on a distant server, although it can be performed on the portable device or on a combination of the portable device and the server. The text keywords are then sent to a search engine. This is accomplished by sending the keywords to an Internet search engine web site as an HTTP transaction, with the search keywords embedded in the URL that is sent to the search engine web site. It is preferred that the HTTP transaction be initiated from the portable device, so that the search results are returned directly to the portable device. In this case, the search keywords would generally first be made available on the portable device; if they were determined on the distant server then they are first sent from the server to the portable device. The search engine results are returned to the portable device as a web page which can then be displayed in the web browser of the portable device. If the HTTP transaction was initiated by the server, then the results web page is made available for viewing on the portable device by one or more various means (the address of the results web page can be sent to the portable device, or the entire web page can be sent to the portable device, or the web page can be stored or converted into another form on the server after which the portable device is directed to the address of the stored or converted page, etc.).
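The HTTP transaction with keywords embedded in the URL might look as follows; the engine address and the "q" parameter name are assumptions, not specified by the text:

```python
import urllib.parse

def search_transaction_url(engine: str, keywords: list[str]) -> str:
    """Compose the search URL with the keywords embedded, suitable for an
    HTTP GET initiated from the portable device (or from the server)."""
    return engine + "?" + urllib.parse.urlencode({"q": " ".join(keywords)})

url = search_transaction_url("https://search.example.com/search",
                             ["ZZZ", "Model", "5"])
print(url)  # https://search.example.com/search?q=ZZZ+Model+5
```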
a digital camera, a camera-equipped mobile telephone, or a
Image Analysis
55
Preferred image analysis techniques are described in the
following, in which FIG. 1 shows the overall processing ?ow
security camera, or multiple such devices. An object or
objects are identi?ed in the imagery via image/object recog
nition techniques (software and/ or hardware). The identity of
and steps. These steps are described in further detail in the
the object(s) is used to look up, in a table/ database, a set of text
keywords search terms, which are then provided to a search
computer, mobile telephone, personal digital assistant, or
following sections.
In FIG. 2, for image capture 10, the user 12 utilizes a
60
other similar device 14 equipped with an image sensor (such
engine. The search engine returns information addresses
as a CCD or CMOS digital camera). The user 12 aligns the
(e.g., in the form of a web page with hyperlinks) that are
pertinent to the objects identi?ed in the imagery. The user
sensor of the image capture device 14 with the object 16 of
interest. The linking process is then initiated by suitable
then accesses information and/ or computing resources based
upon at least one of the information addresses.
Another contemplated embodiment comprises a search using sign language input. Imagery is captured of a person communicating via sign language, and the recognized signs are translated into text keywords; this translation can occur on a distant server, although it can be performed on the portable device or on a combination of the portable device and the server. The text keywords are then sent to a search engine. This is accomplished by sending the keywords to an Internet search engine web site as an HTTP transaction, with the search keywords embedded in the URL that is sent to the search engine web site. It is preferred that the HTTP transaction be initiated from the portable device, so that the search results are returned directly to the portable device. In this case, the search keywords would generally first be made available on the portable device; if they were determined on the distant server then they are first sent from the server to the portable device. The search engine results are returned to the portable device as a web page which can then be displayed in the web browser of the portable device. If the HTTP transaction was initiated by the server, then the results web page is made available for viewing on the portable device by one or more various means (the address of the results web page can be sent to the portable device, or the entire web page can be sent to the portable device, or the web page can be stored or converted into another form on the server, after which the portable device is directed to the address of the stored or converted page, etc.).
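As a minimal sketch of such an HTTP transaction, assuming a search engine that accepts its keywords in a query-string parameter (the host name and the "q" parameter below are placeholders, not part of this disclosure):

    from urllib.parse import urlencode
    from urllib.request import urlopen

    def submit_search(keywords):
        """Embed the keywords in a URL and fetch the results web page."""
        # Hypothetical search engine address and parameter name.
        url = "https://search.example.com/search?" + urlencode(
            {"q": " ".join(keywords)})
        with urlopen(url) as response:
            return response.read()  # HTML page for the device's browser

    results_page = submit_search(["Modern Times", "Colorized", "Charlie Chaplin"])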
Image Analysis

Preferred image analysis techniques are described in the following, in which FIG. 1 shows the overall processing flow and steps. These steps are described in further detail in the following sections.

In FIG. 2, for image capture 10, the user 12 utilizes a computer, mobile telephone, personal digital assistant, or other similar device 14 equipped with an image sensor (such as a CCD or CMOS digital camera). The user 12 aligns the sensor of the image capture device 14 with the object 16 of interest. The linking process is then initiated by suitable means including: the user 12 pressing a button on the device 14 or sensor; by the software in the device 14 automatically recognizing that an image is to be acquired; by user voice command; or by any other appropriate means.
The device 14 captures a digital image 18 of the scene at which it is pointed. This image 18 is represented as three separate 2-D matrices of pixels, corresponding to the raw RGB (Red, Green, Blue) representation of the input image. For the purposes of standardizing the analytical processes in this embodiment, if the device 14 supplies an image in other than RGB format, a transformation to RGB is accomplished. These analyses could be carried out in any standard color format, should the need arise.

If the server 20 is physically separate from the device 14, then user acquired images are transmitted from the device 14 to the Image Processor/server 20 using a conventional digital network or wireless network means. If the image 18 has been compressed (e.g. via lossy JPEG DCT) in a manner that introduces compression artifacts into the reconstructed image 18, these artifacts can be partially removed by, for example, applying a conventional despeckle filter to the reconstructed image prior to additional processing.
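By way of illustration only, both steps can be sketched with the Pillow imaging library; the disclosure names no particular library, and a small median filter is merely one conventional form of despeckle filtering:

    from PIL import Image, ImageFilter

    def prepare_image(path):
        """Standardize a captured image to RGB and suppress JPEG artifacts."""
        image = Image.open(path)
        if image.mode != "RGB":
            image = image.convert("RGB")  # transformation to RGB
        # Median filtering acts as a conventional despeckle filter and
        # partially removes blocky compression artifacts.
        return image.filter(ImageFilter.MedianFilter(size=3))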
Image type determination 26 can be accomplished with a discriminator algorithm which operates on the input image 18 and determines whether the input image contains recognizable symbols, such as barcodes, matrix codes, or alphanumeric characters. If such symbols are found, the image 18 is sent to the decode symbol 28 process. Depending on the confidence level with which the discriminator algorithm finds the symbols, the image 18 also can, or alternatively can, contain an object of interest and can therefore also or alternatively be sent to the Object Image branch of the process flow. For example, if an input image 18 contains both a barcode and an object, depending on the clarity with which the barcode is detected, the image can be analyzed by both the Object Image and Symbolic Image branches, and that branch which has the highest success in identification will be used to identify and link from the object.

The image can then be analyzed to determine the location, size, and nature of the symbols in the decode symbol 28. The symbols are preferably analyzed according to their type, and their content information is extracted. For example, barcodes and alphanumeric characters will result in numerical and/or text information.
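The confidence-based routing between the Symbolic Image and Object Image branches can be written as a small dispatch function. This sketch is illustrative; the thresholds are arbitrary, the two branch functions stand in for the decode symbol 28 process and the object image branch, and each branch is assumed to return a dictionary with a "score" field measuring its identification success:

    def route_image(image, symbol_confidence, decode_symbol, match_object):
        """Send the image to one or both branches and keep the better result.

        symbol_confidence: discriminator confidence, 0.0-1.0, that the
        image contains recognizable symbols (barcode, matrix code, text).
        """
        results = []
        if symbol_confidence > 0.2:      # plausible symbols: try decoding
            results.append(decode_symbol(image))
        if symbol_confidence < 0.9:      # may also contain an object of interest
            results.append(match_object(image))
        # The branch with the highest identification success wins.
        return max(results, key=lambda result: result["score"])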
For object images, one can advantageously perform a "decomposition", in the input image decomposition step 34, of a high-resolution input image into several different types of quantifiable salient parameters. This allows for multiple independent convergent search processes of the database to occur in parallel, which greatly improves image match speed and match robustness in the database matching 36. The best match 38 from either the decode symbol 28, or the image database matching 36, or both, is then determined. If a specific URL (or other online address) is associated with the image, then an URL Lookup 40 is performed and the Internet address is returned by the URL Return 42. Code examples are set forth in the priority documents, as well as further detail, including segmentation, segment group generation, bounding box generation, geometric normalization, wavelet decomposition, color cube decomposition, shape decomposition, low-resolution grayscale image generation, grayscale comparison, wavelet comparison, color cube comparison, and calculation of combined match score.
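The independent, convergent searches over the decomposed parameters and the calculation of a combined match score can be sketched as follows; the parameter names, the weighting scheme, and the thread-based parallelism are illustrative assumptions, not the patented algorithm itself:

    from concurrent.futures import ThreadPoolExecutor

    def combined_match_score(candidate, target, comparators, weights):
        """Run each decomposition comparison in parallel and combine them.

        comparators: mapping of parameter name to a function returning a
        similarity in [0, 1], e.g. grayscale, wavelet, color cube, shape.
        """
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(compare, candidate, target)
                       for name, compare in comparators.items()}
            scores = {name: future.result() for name, future in futures.items()}
        return sum(weights[name] * scores[name] for name in scores)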
FIGS. 3A and 3B show a preferred process flow that can occur within a database matching operation. The algorithm is presented here as containing four nested loops with four parallel processes inside the innermost loop. This structure is for presentation and explanation only. Any actual implementation, although most likely performing the same operations at the innermost layer, can have a different structure in order to achieve the maximum benefit from processing speed enhancement techniques such as parallel computing and data indexing techniques. It is also important to note that the loop structures can be implemented independently for each inner comparison, rather than the shared approach shown in FIGS. 3A and 3B.

Preferably, parallel processing is used to divide tasks between multiple CPUs (central processing units) and/or computers. The overall algorithm may be divided in several ways, such as:

Sharing the Outer Loop: In this technique, all CPUs run the entire algorithm, including the outer loop, but one CPU runs the loop for the first N cycles, another CPU for the second N cycles, all simultaneously.

Sharing the Comparison: In this technique, one CPU performs the loop functions. When the comparisons are performed, they are each passed to a separate CPU to be performed in parallel.

Sharing the Database: This technique entails splitting database searches between CPUs, so that each CPU is responsible for searching one section of the database, and the sections are searched in parallel by multiple CPUs. This is, in essence, a form of the "Sharing the Outer Loop" technique described above.

Actual implementations can be some combination of the above techniques that optimizes the process on the available hardware.
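A minimal sketch of the "Sharing the Outer Loop" division, assuming the database records can be scored independently and that a process pool stands in for the multiple CPUs (the compare function below is a placeholder for the real innermost image comparison):

    from multiprocessing import Pool

    def compare(record):
        """Placeholder for the innermost image comparison; returns a score."""
        return record["score"]

    def score_chunk(chunk):
        """One CPU runs the full loop over its share of the cycles."""
        best = max(chunk, key=compare)
        return best, compare(best)

    def best_match(records, workers=4):
        """Divide the outer loop across several CPUs and merge the results."""
        size = (len(records) + workers - 1) // workers
        chunks = [records[i:i + size] for i in range(0, len(records), size)]
        with Pool(workers) as pool:
            partial = pool.map(score_chunk, chunks)
        return max(partial, key=lambda pair: pair[1])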
Another technique employed to maximize speed is data indexing. This technique involves using a priori knowledge of where data resides to only search in those parts of the database that contain potential matches. Various forms of indexing may be used, such as hash tables, data compartmentalization (i.e., data within certain value ranges are stored in certain locations), data sorting, and database table indexing. For example, in the shape comparison algorithm, if a database is to be searched for an entry with an area value of A, the algorithm would know which database entries or data areas have this approximate value and would not need to search the entire database.
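The area example can be made concrete with a simple bucketed index; the bucket width and the entry layout below are arbitrary illustrative choices:

    from collections import defaultdict

    BUCKET = 10.0  # illustrative width of one area bucket

    def build_area_index(entries):
        """Compartmentalize entries by approximate area value."""
        index = defaultdict(list)
        for entry in entries:
            index[round(entry["area"] / BUCKET)].append(entry)
        return index

    def candidates_for_area(index, area_a):
        """Return only entries whose area is approximately A, so the
        shape comparison never scans the entire database."""
        key = round(area_a / BUCKET)
        return index[key - 1] + index[key] + index[key + 1]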
FIG. 4 shows a simplified configuration of an alternative analytical technique. Boxes with solid lines represent processes, software, physical objects, or devices. Boxes with dashed lines represent information. The process begins with an object of interest: the target object 100. In the case of consumer applications, the target object 100 could be, for example, a beverage can, a music CD box, a DVD video box, a magazine advertisement, a poster, a theatre, a store, a building, a car, or any other object that a user is interested in or wishes to interact with. In security applications the target object 100 could be, for example, a person, passport, or driver's license, etc. In industrial applications the target object 100 could be, for example, a part in a machine, a part on an assembly line, a box in a warehouse, or a spacecraft in orbit, etc.

The terminal 102 is a computing device that has an "image" capture device such as digital camera 103, a video camera, or any other device that can convert a physical object into a digital representation of the object. The imagery can be a single image, a series of images, or a continuous video stream. For simplicity of explanation this document describes the digital imagery generally in terms of a single image; however, the invention and this system can use all of the imagery types described above.
After the camera 103 captures the digital imagery of the target object 100, image preprocessing 104 software converts the digital imagery into image data 105 for transmission to and analysis by an identification server 106. Typically a network connection is provided capable of providing communications with the identification server 106. Image data 105 is data extracted or converted from the original imagery of the target object 100 and has information content appropriate for identification of the target object 100 by the object recognition 107, which can be software or hardware. Image data 105 can take many forms, depending on the particular embodiment of the invention. Specific examples are given in the priority documents.

The image data 105 is sent from the terminal 102 to the identification server 106. The identification server 106 receives the image data 105 and passes it to the object recognition 107.
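The transfer itself can take many forms; the sketch below assumes a plain HTTP POST of JPEG bytes to a hypothetical identification server URL, which the disclosure does not mandate:

    from urllib.request import Request, urlopen

    def send_image_data(jpeg_bytes,
                        server_url="https://id-server.example/identify"):
        """POST the image data 105 from the terminal to the server.

        The URL and content type are hypothetical; any network or
        in-process transfer satisfies the architecture described here.
        """
        request = Request(server_url, data=jpeg_bytes,
                          headers={"Content-Type": "image/jpeg"})
        with urlopen(request) as response:
            return response.read()  # the target object information 109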
The identification server 106 is a set of functions that usually will exist on a computing platform separate from the terminal 102, but could exist on the same computing platform. If the identification server 106 exists on a separate computing device, such as a computer in a data center, then the transmission of the image components 105 to the identification server 106 is accomplished via a network or combination of networks, such as a cellular telephone network, wireless Internet, Internet, and wire line network. If the identification server 106 exists on the same computing device as the terminal 102 then the transmission consists simply of a transfer of data from one software component or process to another.

Placing the identification server 106 on a computing platform separate from the terminal 102 enables the use of powerful computing resources for the object recognition 107 and database 108 functions, thus providing the power of these computing resources to the terminal 102 via network connection. For example, an embodiment that identifies objects out of a database of millions of known objects would be facilitated by the large storage, memory capacity, and processing power available in a data center; it is very difficult to have such computing power and storage in a portable device. Whether the terminal 102 and the identification server 106 are on the same computing platform or separate ones is an architectural decision that depends on system response time, number of database records, image recognition algorithm computing power and storage available in terminal 102, etc., and this decision must be made for each embodiment of the invention. Based on current technology, in most embodiments these functions will be on separate computing platforms.
The overall function of the identification server 106 is to determine and provide the target object information 109 corresponding to the target object 100, based on the image data 105. The object recognition 107 and the database 108 function together to:

1. Detect, recognize, and decode symbols, such as barcodes or text, in the image.

2. Recognize the object (the target object 100) in the image.

3. Provide the target object information 109 that corresponds to the target object 100. The target object information 109 usually (depending on the embodiment) includes an information address corresponding to the target object 100.
The object recognition 107 detects and decodes symbols, such as barcodes or text, in the input image. This is accomplished via algorithms, software, and/or hardware components suited for this task. Such components are commercially available (the HALCON software package from MVTec is an example). The object recognition 107 also detects and recognizes images of the target object 100 or portions thereof. This is accomplished by analyzing the image data 105 and comparing the results to other data, representing images of a plurality of known objects, stored in the database 108, and recognizing the target object 100 if a representation of target object 100 is stored in the database 108. This recognition and comparison is performed by an object recognition technique, of which many are available commercially and in the prior art. Such object recognition techniques usually consist of comparing a new input image to a plurality of known images and detecting correspondences between the new input image and one or more of the known images. The known images are views of known objects from a plurality of viewing angles and thus allow recognition of 2-dimensional and 3-dimensional objects in arbitrary orientations relative to the camera 103. Other object recognition techniques also exist and include methods that store 3-dimensional models (rather than 2-dimensional images) of objects in a database and correlate input images with these models of the target object.

In some embodiments the terminal 102 includes software, such as a web browser (the browser 110), that receives an information address, connects to that information address via a network or networks, such as the Internet, and exchanges information with another computing device at that information address. In consumer applications the terminal 102 can be a portable cellular telephone or Personal Digital Assistant equipped with a camera 103 and wireless Internet connection. In security and industrial applications the terminal 102 can be a similar portable hand-held device or can be fixed in location and/or orientation, and can have either a wireless or wire line network connection.
FIG. 4 shows the object recognition 107 and the database 108 as separate functions for simplicity. However, in many embodiments the object recognition 107 and the database 108 are so closely interdependent that they can be considered a single process.

It is usually desirable that the database 108 be scalable to enable identification of the target object 100 from a very large plurality (for example, millions) of known objects in the database 108. The algorithms, software, and computing hardware must be designed to function together to quickly perform such a search. An example software technique for performing such searching quickly is to use a metric distance comparison technique for comparing the image data 105 to data stored in the database 108, along with database clustering and multi-resolution distance comparisons. This technique is described in "Fast Exhaustive Multi-Resolution Search Algorithm Based on Clustering for Efficient Image Retrieval," by Song, Kim, and Ra, 2000.
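A greatly simplified sketch of a clustered metric-distance search, in the spirit of (though far simpler than) the cited algorithm; the Euclidean metric and the single-level cluster structure are illustrative assumptions:

    import math

    def distance(a, b):
        """Euclidean (metric) distance between two feature vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def clustered_search(query, clusters):
        """clusters: list of (centroid, radius, members) tuples.

        Because the distance is a true metric, the triangle inequality
        lets whole clusters be skipped once none of their members could
        beat the best match found so far, avoiding an exhaustive scan.
        """
        best, best_d = None, float("inf")
        ordered = sorted(clusters, key=lambda c: distance(query, c[0]))
        for centroid, radius, members in ordered:
            if distance(query, centroid) - radius >= best_d:
                continue  # no member of this cluster can improve the match
            for member in members:
                d = distance(query, member)
                if d < best_d:
                    best, best_d = member, d
        return best, best_d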
In addition to such software techniques, a parallel processing computing architecture can be employed to achieve fast searching of large databases. Parallel processing is particularly important in cases where a non-metric distance is used in object recognition 107, because techniques such as database clustering and multi-resolution search may not be possible and thus the complete database must be searched by partitioning the database across multiple CPUs.

As described above, the object recognition 107 can also detect identifying marks on the target object 100. For example, the target object 100 can include an identifying number or a barcode. This information can be decoded and used to identify or help identify the target object 100 in the database 108. This information also can be passed on as part of the target object information 109. If the information is included as part of the target object information 109 then it can be used by the terminal 102 or content server 111 to identify the specific target object 100, out of many such objects that have similar appearance and differ only in the identifying marks.
This technique is useful, for example, in cases where the target object 100 is an active device with a network connection (such as a vending machine) and the content server establishes communication with the target object 100. A combination with a Global Positioning System can also be used to identify like objects by their location.

The object recognition 107 can be implemented in hardware, software, or a combination of both. Examples of each category and additional details are set forth in one or more of the priority documents.
In most embodiments the browser 110 will be a web browser, embedded in the terminal 102, capable of accessing and communicating with web sites via a network or networks such as the Internet. In some embodiments, however, such as those that only involve displaying the identity, position, orientation, or status of the target object 100, the browser 110 can be a software component or application that displays or provides the target object information 109 to a human user or to another software component or application.
In embodiments wherein the browser 110 is a web browser, the browser 110 connects to the content server 111 located at the information address (typically an Internet URL) included in the target object information 109. This connection is effected by the terminal 102 and the browser 110 acting in concert. The content server 111 is an information server and computing system. The connection and information exchanged between the terminal 102 and the content server 111 generally is accomplished via standard Internet and wireless network software, protocols (e.g. HTTP, WAP, etc.), and networks, although any information exchange technique can be used. The physical network connection depends on the system architecture of the particular embodiment but in most embodiments will involve a wireless network and the Internet. This physical network will most likely be the same network used to connect the terminal 102 and the identification server 106.
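In code, the hand-off from the target object information 109 to the browser 110 can be as simple as the sketch below; the dictionary layout is a hypothetical convenience rather than a format defined by this disclosure:

    import webbrowser

    def handle_target_object_information(info):
        """Open the browser at the information address, when one is present."""
        address = info.get("information_address")  # typically an Internet URL
        if address:
            webbrowser.open(address)
        return info.get("content")

    handle_target_object_information(
        {"information_address": "https://content-server.example/item/123"})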
The content server 111 sends content information to the terminal 102 and browser 110. This content information usually is pertinent to the target object 100 and can be text, audio, video, graphics, or information in any form that is usable by the browser 110 and terminal 102. The terminal 102 and browser 110 send, in some embodiments, additional information to the content server 111. This additional information can be information such as the identity of the user of the terminal 102 or the location of the user of the terminal 102 (as determined from a GPS system or a radio-frequency ranging system). In some embodiments such information is provided to the content server by the wireless network carrier.
The user can perform ongoing interactions with the content server 111. For example, depending on the embodiment of the invention and the applications, the user can:

Listen to streaming audio samples if the target object 100 is an audio recording (e.g., compact audio disc).

Purchase the target object 100 via on-line transaction, with the purchase amount billed to an account linked to the terminal 102, to the individual user, to a bank account, or to a credit card.
In some embodiments the content server 111 can reside within the terminal 102. In such embodiments, the communication between the terminal 102 and the content server 111 does not occur via a network but rather occurs within the terminal 102.
In embodiments wherein the target object 100 includes or is a device capable of communicating with other devices or computers via a network or networks such as the Internet, and wherein the target object information 109 includes adequate identification (such as a sign, number, or barcode) of the specific target object 100, the content server 111 connects to and exchanges information with the target object 100 via a network or networks such as the Internet. In this type of embodiment, the terminal 102 is connected to the content server 111 and the content server 111 is connected to the target object 100. Thus, the terminal 102 and target object 100 can communicate via the content server 111. This enables the user to interact with the target object 100 despite the lack of a direct connection between the target object 100 and the terminal 102.
FIG. 5 shows an embodiment that uses a cellular telephone, PDA, or such portable device equipped with computational capability, a digital camera, and a wireless network connection, as the terminal 202 corresponding to the terminal 102 in FIG. 4. In this embodiment, the terminal 202 communicates with the identification server 206 and the content server 211 via networks such as a cellular telephone network and the Internet.

This embodiment can be used for applications such as the following ("user" refers to the person operating the terminal 202, and the terminal 202 is a cellular telephone, PDA, or similar device, and "point and click" refers to the operation of the user capturing imagery of the target object 200 and initiating the transfer of the image data 205 to the identification server 206).
The user "points and clicks" the terminal 202 at a compact disc (CD) containing recorded music or a digital video disc (DVD) containing recorded video. The terminal 202 browser connects to the URL corresponding to the CD or DVD and displays a menu of options from which the user can select. From this menu, the user can listen to streaming audio samples of the CD or streaming video samples of the DVD, or can purchase the CD or DVD.

The user "points and clicks" the terminal 202 at a print media advertisement, poster, or billboard advertising a movie, music recording, video, or other entertainment. The browser 210 connects to the URL corresponding to the advertised item and the user can listen to streaming audio samples, view streaming video samples, obtain show times, or purchase the item or tickets.
The user "points and clicks" the terminal 202 at a television screen to interact with television programming in real-time. For example, the programming could consist of a product promotion involving a reduced price during a limited time. Users that "point and click" on this television programming during the promotion are linked to a web site at which they can purchase the product at the promotional price. Another example is an interactive television program in which users "point and click" on the television screen at specific times, based on the on-screen content, to register votes, indicate actions, or connect to a web site through which they perform real-time interactions with the on-screen program.

The user "points and clicks" on an object such as a consumer product, an advertisement for a product, a poster, etc.; the terminal 202 makes a telephone call to the company selling the product, and the consumer has a direct discussion with a company representative regarding the company's product or service. In this case the company telephone number is included in the target object information 209. If the target object information 209 also includes the company URL then the user can interact with the company via both voice and Internet (via browser 210) simultaneously.

The user "points and clicks" on a vending machine (target object 200) that is equipped with a connection to a network such as the Internet and that has a unique identifying mark, such as a number. The terminal 202 connects to the content server 211 of the company that operates the vending machine.
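The content server's relay role in this example can be sketched as follows; the registry, identifiers, and helper function are all hypothetical, since the disclosure requires only that the terminal and the vending machine communicate through the content server:

    # Hypothetical registry mapping a machine's identifying mark (the
    # number read from the imagery) to its network address.
    MACHINES = {"4741": "https://vending.example/machines/4741"}

    def relay_to_machine(machine_id, command, post):
        """Content-server-side relay between terminal 202 and the machine.

        post: any HTTP helper, e.g. post(address, payload) -> reply.
        The terminal and the machine thereby interact even though they
        have no direct connection to one another.
        """
        address = MACHINES.get(machine_id)
        if address is None:
            raise KeyError("unknown vending machine: " + machine_id)
        return post(address, command)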