US008798322B2

(12) United States Patent — Boncyk et al.
(10) Patent No.: US 8,798,322 B2
(45) Date of Patent: *Aug. 5, 2014

(54) OBJECT INFORMATION DERIVED FROM OBJECT IMAGES

(71) Applicant: Nant Holdings IP, LLC, Culver City, CA (US)

(72) Inventors: Wayne C. Boncyk, Evergreen, CO (US); Ronald H. Cohen, Pasadena, CA (US)

(73) Assignee: Nant Holdings IP, LLC, Culver City, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 13/971,758

(22) Filed: Aug. 20, 2013

(65) Prior Publication Data: US 2013/0336533 A1, Dec. 19, 2013

Related U.S. Application Data

(60) Division of application No. 13/686,851, filed on Nov. 27, 2012, now Pat. No. 8,588,527, which is a continuation of application No. 13/207,211, filed on Aug. 10, 2011, now Pat. No. 8,331,679, which is a division of application No. 13/037,330, filed on Feb. 28, 2011, now Pat. No. 8,218,873, which is a division of application No. 12/568,130, filed on Sep. 28, 2009, now Pat. No. 7,899,252, which is a division of application No. 11/204,901, filed on Aug. 15, 2005, now Pat. No. 7,680,324, which is a continuation-in-part of application No. 09/992,942, filed on Nov. 5, 2001, now Pat. No. 7,016,532.

(60) Provisional application No. 60/317,521, filed on Sep. 5, 2001, provisional application No. 60/246,295, filed on Nov. 6, 2000, provisional application No. 60/630,524, filed on Nov. 22, 2004, and provisional application No. 60/625,526, filed on Nov. 4, 2004.

(51) Int. Cl.: G06K 9/00 (2006.01)

(52) U.S. Cl.: USPC 382/103

(58) Field of Classification Search: USPC 382/305, 181, 103; 704/1-10. See application file for complete search history.

(56) References Cited: U.S. patent documents, foreign patent documents, and other publications.

Primary Examiner: Ishrat I Sherali

(74) Attorney, Agent, or Firm: Fish & Tsang, LLP

(57) ABSTRACT

Search terms are derived automatically from images captured by a camera equipped cell phone, PDA, or other image capturing device, submitted to a search engine to obtain information of interest, and at least a portion of the resulting information is transmitted back locally to, or nearby, the device that captured the image.

40 Claims, 8 Drawing Sheets
OBJECT INFORMATION DERIVED FROM OBJECT IMAGES

This application is a divisional of Ser. No. 13/686,851, filed Nov. 27, 2012, which is a continuation of Ser. No. 13/207,211, filed Aug. 10, 2011 and issued Dec. 11, 2012 as U.S. Pat. No. 8,331,679, which is a divisional of Ser. No. 13/037,330, filed Feb. 28, 2011 and issued Jul. 10, 2012 as U.S. Pat. No. 8,218,873, which is a divisional of Ser. No. 12/568,130, filed Sep. 28, 2009 and issued Mar. 1, 2011 as U.S. Pat. No. 7,899,252, which is a divisional of Ser. No. 11/204,901, filed Aug. 15, 2005 and issued Mar. 16, 2010 as U.S. Pat. No. 7,680,324, which is a continuation-in-part of Ser. No. 09/992,942, filed Nov. 5, 2001 and issued Mar. 21, 2006 as U.S. Pat. No. 7,016,532, which claims priority to provisional application No. 60/317,521, filed Sep. 5, 2001, and provisional application No. 60/246,295, filed Nov. 6, 2000. U.S. patent application Ser. No. 11/204,901, filed Aug. 15, 2005 and issued Mar. 16, 2010 as U.S. Pat. No. 7,680,324, also claims priority to provisional application No. 60/630,524, filed Nov. 22, 2004, and provisional application No. 60/625,526, filed Nov. 4, 2004. These and all other referenced patents and applications are incorporated herein by reference in their entirety. Where a definition or use of a term in a reference that is incorporated by reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein is deemed to be controlling.

FIELD OF THE INVENTION

The field of the invention is digital imaging.

BACKGROUND

Several years ago the present inventors pioneered the concept of using digitally captured images to identify objects within the images, and then using such identifications to retrieve information from various databases. Examples include:

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of an object in an art museum, identifying the object from the image data, and then providing the user with information regarding the object (i.e., about or relating to the object);

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of an automobile as it drives along a road, identifying the make and model from the image data, and then providing a user with a link to a website relating to that particular make and model;

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of a bar code, logo, or other indicia in a magazine, using information contained in the indicia to identify a product, and providing a telephone number or other contact information relating to that product;

Using a local device (cell phone, digital camera, PDA or other device) to photograph a billboard of a restaurant, identifying the restaurant from a barcode, special target, written language, or other information contained in the photograph, and using that information to access a database to provide the user with the restaurant's location, menu, or telephone number; and

Using a local device (cell phone, digital camera, PDA or other device) to capture an image of a sign at a sports stadium, using information extracted from the image to automatically purchase an entry ticket for the user, and providing the user with an entry code that can be used to bypass the long lines of ordinary ticket purchasers.

In such embodiments it was specifically contemplated that analysis of the images could be performed locally (i.e., on the cell phone, PDA or other device capturing the image), distally at a server, or more preferably using some combination of the two. It was also contemplated that any available database could be accessed to provide the returned information, including publicly accessible databases on the Internet. It was not appreciated, however, that one could integrate these concepts with the searching capabilities of standard Search Engines.

In the 1990s Yahoo!™ introduced the idea of indexing web pages accessible on the Internet, and providing a Search Engine to access the index. Since that time dozens of other searching systems have been developed, which use all manner of various search methods, algorithms, hardware and/or software. All such systems and methods that accept user inputs of Key Information, and then utilize such Key Information to provide the user with information of interest, are referred to herein as Search Engines. The user, of course, can be a natural person, as well as a device (computing or otherwise), algorithm, system, organization, or any other entity. In searching for information, a Search Engine can utilize any suitable search domain, including for example:

A database (including for example a relational database, an object database, or an XML database);

A network of resources including for example web pages accessible within the Internet; and

A public or private collection of documents or information (e.g., documents, information, and/or messages of a company or other organization(s)) such as that maintained by LEXIS™.

In a typical search, Key Information is provided to the Search Engine in the form of key words comprising text, numbers, strings, or other machine-readable information types. The Search Engine then searches its indices of web pages for matches, and returns to the user a hyperlinked listing of Internet Uniform Resource Locators ("URLs"), as well as some brief display of context in which the key word(s) are used. The information of interest can sometimes be found in the hyperlinked listing, but is more frequently found by linking directly to the listed web pages.

Providing Key Information to Search Engines in the form of text strings has inherent difficulties. It involves strategy in the selection of the text to be entered, and even with respect to the format of the keywords (for example using wildcards). Another difficulty is that small computing and/or telephony devices (e.g. telephones, both mobile and non-mobile) have small and/or limited keyboards, thus making text entry difficult.
SUMMARY OF THE INVENTION

The present invention provides apparatus, systems and methods in which: (a) a digital photograph, video, MPEG, AVI, or other image is captured using a camera equipped cell phone, PDA, or other image capturing device; (b) key words or other search criteria are automatically extracted or derived from the image; (c) the search criteria are submitted to a Search Engine to obtain information of interest; and (d) at least a portion of the resulting information is transmitted back locally to, or nearby, the device that captured the image.

Some images so utilized will include symbolic content that is sufficient in and of itself to be relatively non-ambiguous. Such symbolic content, for example, can be a telephone number or a web-site address. In such instances the symbolic content can advantageously be utilized as a literal in the search criteria. In other instances significant additional processing can be needed. For example, an image of an automobile will likely need to be processed to determine the make and model, and that information (e.g. Mercedes™ S500™) can then be transmitted to the Search Engine to be used as key words for a search. It is also contemplated that processing of some images will result in only best guesses. Thus, a side view of an automobile might not be analyzable into a particular make and model, and in that case the system can provide more generic terms such as SUV or automobile.
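The distinction between symbolic content usable as a literal and content needing further processing can be made concrete with a short sketch. The following Python is illustrative only, not the patented implementation; the regular expressions and function names are assumptions chosen for the example.

```python
# Minimal sketch: decoded symbolic content such as a telephone number or
# web address passes through verbatim as a literal search criterion;
# anything else is flagged for further recognition processing.
# The patterns below are illustrative assumptions, not the patent's method.
import re

PHONE = re.compile(r"^\+?[\d\-\s()]{7,}$")
WEB = re.compile(r"^(https?://)?[\w.-]+\.[a-z]{2,}(/\S*)?$", re.IGNORECASE)

def literal_or_none(decoded_text: str):
    """Return the text itself if it can serve verbatim as a search term."""
    text = decoded_text.strip()
    if PHONE.match(text) or WEB.match(text):
        return text
    return None  # ambiguous content: needs e.g. make/model recognition

print(literal_or_none("www.example.com"))          # -> 'www.example.com'
print(literal_or_none("(626) 555-0100"))           # -> '(626) 555-0100'
print(literal_or_none("silver sedan, side view"))  # -> None
```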
In general, the present invention provides technology and processes that can accommodate linking objects and images to information via a network such as the Internet, which require no modification to the linked object. Traditional methods for linking objects to digital information, including applying a barcode, radio or optical transceiver or transmitter, or some other means of identification to the object, or modifying the image or object so as to encode detectable information in it, are not required because the image or object can be identified solely by its visual appearance. The users or devices can even interact with objects by "linking" to them. For example, a user can link to a vending machine by "pointing and clicking" on it. His device would be connected over the Internet to the company that owns the vending machine. The company would in turn establish a connection to the vending machine, and thus the user would have a communication channel established with the vending machine and could interact with it.

The present invention contemplates any suitable decomposition algorithms. Clearly, faster and more accurate algorithms are preferred over slower and less accurate algorithms. It is especially preferred that algorithms are chosen such that at least some processing can take place locally to the device that captures the image. Such processing can in many instances eliminate the need to wirelessly transmit detailed images, and can eliminate reliance on a distal server that might be oversubscribed. Thus, some or all of the image processing, including image/object detection and/or decoding of symbols detected in the image, can be distributed arbitrarily between the mobile (client) device and the server. In other words, some processing can be performed in the client device and some in the server, without specification of which particular processing is performed in each, or all processing can be performed on one platform or the other, or the platforms can be combined so that there is only one platform. The image processing can be implemented in a parallel computing manner, thus facilitating scaling of the system with respect to database size and input traffic loading.

It is further contemplated that some suitable algorithms will take into account the position and orientation of an object with respect to the user at the time the image was captured, which can be determined based on the appearance of the object in an image. This can be the location and/or identity of people scanned by multiple cameras in a security system, a passive locator system more accurate than GPS or usable in areas where GPS signals cannot be received, the location of specific vehicles without requiring a transmission from the vehicle, and many other uses.

Therefore, it is an object of the present invention to provide a system and process for identifying digitally captured images without requiring modification to the object. Another object is to use digital capture devices in ways never contemplated by their manufacturer. Another object is to allow identification of objects from partial views of the object. Another object is to provide communication means with operative devices without requiring a public connection therewith.

Various other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention, along with the accompanying drawings in which like numerals represent like components.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic block diagram top-level algorithm flowchart;

FIG. 2 is an idealized view of image capture;

FIGS. 3A and 3B are a schematic block diagram of process details of the present invention;

FIG. 4 is a schematic block diagram of a different explanation of the invention;

FIG. 5 is a schematic block diagram similar to FIG. 4 for cellular telephone and personal data assistant (PDA) applications;

FIG. 6 is a schematic block diagram for spacecraft applications; and

FIG. 7 is a schematic of a system in which a local device captures an image, a search term is automatically derived from the image and submitted to a search engine to produce a results set, and information from the results set is sent back to the device.

DETAILED DESCRIPTION

FIGS. 1-6 are copied from the priority PCT application, PCT/US02/35407, filed Nov. 5, 2002. Discussion of those figures is set forth later in the application.

Search Engine-Related Embodiments

In FIG. 7, a system 400 generally comprises a portable imaging device 410, a distal server 420, an electronic communications network 425, and a search engine 430. In general, the portable device 410 captures an image 412 of an object 415, and transmits information 413 regarding the image to the server 420. At least one of the device 410 and the server 420 derives a search term 421A, 421B from at least one of the image 412 and the transmitted information 413, respectively. At least one of the device 410 and the server 420 causes the search term 421A, 421B to be submitted via a network 425 to a search engine 430 that uses an index 432 of web pages or other information. The search engine then uses the search term 421A, 421B to produce a results set 434, and causes at least a portion of the results set 434 to be transmitted back to the portable device 410.
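The flow of FIG. 7 can be illustrated with a minimal sketch. This is a hypothetical illustration under stated assumptions: the recognizer stub, the function names, and the example search URL format are all invented for the example and are not the patented implementation.

```python
# Minimal sketch of the FIG. 7 flow: capture -> derive search term ->
# submit to a search engine -> return part of the results set 434.
# All names and the URL format are illustrative assumptions.
from urllib.parse import urlencode

def derive_search_term(image_bytes: bytes) -> str:
    """Stand-in for recognition on device 410 or server 420 (421A/421B).

    A real implementation would run object recognition; this stub
    simply pretends every image depicts an automobile.
    """
    return "automobile"

def submit_to_search_engine(term: str) -> list[str]:
    """Compose a query URL for a hypothetical engine (430).

    An actual deployment would issue an HTTP GET, e.g. with
    urllib.request.urlopen(url), and parse the returned page.
    """
    url = "https://searchengine.example/search?" + urlencode({"q": term})
    return [url]  # placeholder for a hyperlinked results set

def handle_capture(image_bytes: bytes, max_results: int = 3) -> list[str]:
    term = derive_search_term(image_bytes)   # on device 410 or server 420
    results = submit_to_search_engine(term)  # via network 425
    return results[:max_results]             # portion sent back to device 410

print(handle_capture(b"\x00fake-image-bytes"))
```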
In the above discussion it should be appreciated that information regarding the image can include the entire image, one or more subsets of the image, as well as a name or other information derived from the image but not contained within the image. It should also be appreciated that one could use a proxy server between his/her portable device and the server. In short, the present application contemplates using any complexity of circuitous communication between the mobile client and server, not necessarily a direct connection.

Device 410 can be a cell phone, PDA, laptop computer, or any other portable device that optically captures an image. By "optically captures" is meant some sort of light sensitive array, the output of which can be processed to comprise a visually perceptible image. Viewed from another perspective, device 410 can be any camera having telephony capability, and especially having cell phone capability. With current technology, device 410 would usually have a lens or other light focusing mechanism, although it is contemplated that advances in electronics can eliminate the need for any physical focusing mechanism. The term "optically captures" is not satisfied by a device that has no optical components and is merely capable of downloading images from the Internet or other sources. It is certainly contemplated that the cell phone or other device providing the services discussed herein would operate software permitting it to do so. That software could be resident on the device, in external memory (memory card), or paged in as needed.

Object 415 (referred to as a "Thing of Interest" in one or more of the priority applications) can be any visually perceptible object, regardless of dimension. Contemplated "two dimensional objects" include objects in which the relevant information is substantially in two dimensional format, which includes advertisements and articles in magazine or other print media, as well as photographs or designs on billboards, street signs, restaurant or other business signs, user manuals, paintings at a museum, and so forth.

Contemplated three dimensional objects include substantially all physical objects in which the relevant information is derived from the shape of the object and/or the appearance of the surface of the object. Thus, an automobile is considered herein to have three dimensions of relevance where the shape or other dimensions convey information about the make and model. Similarly, a window in a building can be considered to have three dimensions of relevance where the identity of the manufacturer or distributor can be gleaned from the overall physical dimensions, detail, and so forth. As another example, a beverage container can be considered to have three dimensions; information can be obtained from the shape of the container, but further information can also be obtained from the label, printing, logos, text, or other such visible markings on the container (obtaining information from visible markings on the container enables discrimination between different containers that have identical physical shape).
Contemplated four dimensional objects include substantially all physical objects in which the relevant information is derived from changes over time. For example, the speed of a bird or its flight patterns, or a gesture of a person, can be captured in multiple images over a period of time, can be relevant information, and can be reduced to search terms (referred to as Key Information in one or more of the priority documents) for submission to a search engine. Of course, many objects will be considered to have two, three or four dimensions of relevance herein. Thus, relevant information for an automobile can be provided by each of a two dimensional logo on the side of the vehicle, the three dimensional shape of the vehicle, and its four dimensional acceleration or handling features.

It is especially contemplated that objects can include animate and inanimate objects. Among animate objects are included faces of people, and biometric information such as the fingerprint pattern on a human finger, an iris of a person, and so forth.

Image 412 is contemplated to be any array of pixels. In most cases the pixels will be regularly arranged, but that is not absolutely necessary. In most cases the pixels also will number greater than 19,200 (160x120), such as 78,800 (320x240), but they can number fewer than that. More preferred images have greater pixel counts, including for example 256,000 (640x400), more preferably at least 2 million, and even more preferably at least 4 million. It is not necessary that the image be actually constructed at the portable device. Thus, a statement that "the portable device captures an image of an object" includes situations where the device receives and derives data from light emitted or reflected from the object, even if the data is never presented to a user as a visually perceptible image, and even if the data is sent to a distal server without ever being collected into an image by the device.

The information transmitted to the server can comprise any relevant information regarding the contents of the image. Thus, information 413 could comprise the entire image, or a portion of the image. For example, where a user takes a picture of a bar code (whether 2D, 3D or any other configuration), the device 410 could process the image 412 to remove color and all background except the bar code itself, and then merely send the portion of the image containing the bar code as the transmitted information 413. In other cases it is contemplated that the device 410 could sufficiently process the image 412 to derive one or more keywords, and then send only the keyword(s) as the transmitted information 413. All possible combinations are also contemplated. Thus, a user might take a photograph of a Gucci™ handbag, and the device 410 might derive the word "Gucci" from the image, subtract out background except for the handbag, and then transmit: (a) the word "Gucci"; and (b) the image of the handbag as the transmitted information 413. In such instances the process can be iterative. Thus, the device might initially transmit the word "Gucci" as the first transmitted information, receive a results set from the search engine indicating clothing accessories, and then subtract out background except for the handbag, and transmit the image of the handbag as the second transmitted information. As discussed above, it is specifically contemplated that the device 410 could send the server 420 numerical/digital data that is mathematically derived from the image. Examples include image features and characteristics that the server 420 could use in the server recognition process, without transmitting the original image.

As should be apparent by now, the transmitted information need not be limited to image information. Sights, sounds, text, and all sorts of other information can be included in the transmitted information, some of which can be derived directly from the image, and some of which can be derived indirectly from the image. In addition, the device 410 can also capture non-visual information such as sounds, and that information can also be transmitted. Thus, it is contemplated that the device could capture the sounds of a frog, capture an image of a lake or forest, and send both to be used as (or further analyzed into) search terms.

Distal server 420 is distal in the sense that it has no hard-wired link to device 410. Server 420 can be a single device, as well as any number of devices coupled together, as for example in a server farm. All manner of suitable servers are contemplated. Thus, servers can use any reasonable hardware, operate using any reasonable software, communications protocols, and so forth.

In terms of interaction with the device, the various analytical tasks discussed above can be allocated in any suitable manner between server 420 and device 410. For example, in the iterative operation discussed above with respect to the Gucci™ handbag, it is contemplated that the device 410 could analyze the image sufficiently to transmit the term "Gucci" as an initial search term to the search engine 430, and the server 420 could then undertake the tasks of subtracting out background of the image except for the handbag, and transmitting the image of the handbag as a second search term.
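The iterative exchange just described can be sketched as follows. This is a hypothetical illustration: the cropping routine, the category check, and the stub search engine are invented for the example.

```python
# Hypothetical sketch of the iterative handbag exchange: send a derived
# keyword first; if the results set is too broad, refine with an
# object-only image as the second transmitted information.
# The detection, cropping, and search stubs are illustrative only.

def first_pass_keyword(image: bytes) -> str:
    return "Gucci"  # keyword derived from the image (stub)

def crop_to_object(image: bytes) -> bytes:
    return image  # stand-in for background subtraction around the handbag

def search(query) -> dict:
    # Stub engine: query may be a keyword or image bytes; a real engine
    # would return hyperlinked results with some category context.
    return {"category": "clothing accessories", "links": ["https://..."]}

def iterative_search(image: bytes) -> dict:
    results = search(first_pass_keyword(image))  # first transmitted information
    if results["category"] == "clothing accessories":
        # Too broad: retry with the cropped image as second information.
        results = search(crop_to_object(image))
    return results

print(iterative_search(b"fake-image"))
```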
In another example, the server 420 could determine that the original image provided insufficient information, and send a message to the user through the device 410, directing the user to take another image (such as from another angle, closer, or with greater detail). Indeed, the server 420 could direct the user to take an image of another object entirely, in order to help determine the identity of the first object. Thus, the user could take a first image of a payment display at a ball game and provide that image to the server for identification, and the server could then instruct the user to take an image of a credit card against which the user wants to be billed for entrance into the ball game. The server could then process the payment against that credit card, and provide an entry code that the user could type to pass through an electronically controlled gate.

In still another example, a user could use his cell phone to capture an image of a screwdriver set at a hardware store, and the cell phone could transmit the information derived from the image to Google™ or some other search engine to find comparison prices. The server 420 could then instruct the user to turn over the packaging and take another image of the set, this time from the back side of the packaging. In this way there is iterative interaction among the user's device, the server, and the search engine.

It should also be appreciated that there are embodiments in which the search engine never communicates with the portable device. For example, the server might do the search query, get results, and provide them to the portable device, or even to a television or other device besides the portable device.

The phrase "search engine" is contemplated herein to include any system dedicated to the indexing, searching and retrieval of information. The most familiar search engines, such as Google™, Yahoo!™, MSN™, and Alta Vista™, focus mostly or entirely on indexing web pages from the World Wide Web portion of the Internet. Other search engines, such as Lexis/Nexis™, focus on indexing proprietary collections of data, which can include links to Internet Web pages.

The phrase "search term" is contemplated herein to include any keys or other information used by the search engines to access their indexing system. In the case of most web based search engines, the keys are currently text. In such instances a user typically enters one or more key words, where the term "key word" is used in an extremely broad sense to include: (a) words that would likely be found in a dictionary; (b) proper names, number strings and other terms that are not found in any dictionary; as well as (c) characters that are interpreted as wild cards, truncations and so forth. Such search engines are already starting to experiment with use of non-text keys, including for example images and/or sounds. All such possible keys fall within the scope of contemplated search terms. Thus, contemplated search terms include key words, a portion of an image, as well as a logo, bar code or other symbol. It is specifically contemplated that in some instances an image will contain a literal of a search term (e.g. the name of a movie on a movie poster), while in other instances an image will not contain such a literal (e.g. a picture of a tree or other plant, where the search term is the name of the plant). In either case the device and/or the server in any combination can perform one or more of the tasks of deriving the search term and submitting it to one or more search engines.

Network 425 can be any workable electronic network, including public and private access networks, and combinations of the two. Preferred networks include the Internet, the upcoming Internet II, cell phone networks, and so forth. Although not expressly shown, the communication lines in FIG. 7 are all contemplated to be one- or two-way communications as appropriate. Moreover, it is contemplated that multiple networks will usually be involved. Thus, for example, communications between device 410 and server 420 will very likely take place over some combination of cell phone networks (not shown) and Internet networks (e.g. 425), while communications between server and search engine will very likely take place over some combination of Internet and local server farm networks.

The results set 434 can be of any size and composition, but most likely will be tailored to accommodate the device 410. It does very little good, for example, to transmit dozens of web pages to a cell phone that has insufficient display area to properly view them. Thus, it is contemplated that the results set 434 can be whittled down or otherwise processed by the server (which of course is indicated generically by numeral 420, and need not be the very same box as utilized earlier in the transmission of the transmitted information 413) before being sent to the device 410. Thus, the server 420 or some other processor can process results before providing them to the device 410, such as where the search terms are submitted to the search engine by the server 420 rather than by the device 410. But the device 410 can also access the search engine directly using search information provided by the server.

Four contemplated search modes include the following:

1. The server 420 composes a search URL (consisting of search engine address and key words) and sends it to the portable device 410. The portable device then executes the search engine query by sending the search URL to the search engine, and the search engine sends one or more web pages back to the portable device.

2. The server 420 sends keywords, and optionally also a search engine address, to portable device 410. The portable device composes a search URL, sends the search query to the search engine, and receives one or more web pages in response.

3. The server 420 sends the search query to the search engine, and receives a response. The server optionally processes the search response (which could be in any form) and provides some result to the portable device. The result could, for example, comprise a file sent to the portable device, or a web page on some server, with the URL of that web page sent to the portable device.

4. In any of the above modes, or in "direct linking" mode, the result might not be a search results page, but instead some other type of information or action. For example, a server could identify an object, and thereupon send a code to another server, which causes an action to occur. An example of this is clicking on a vending machine with a cell phone to buy something from the machine. Another example is clicking on a TV listing in a newspaper, causing the server to change the channel of the television in front of the user.
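Modes 1 and 2 above differ only in which party composes the search URL. A minimal sketch of that composition follows; the engine address and the "q" parameter name are assumptions made for illustration.

```python
# Sketch of search modes 1 and 2: the only difference is whether the
# server or the portable device composes the search URL.
# The engine address and query parameter are illustrative assumptions.
from urllib.parse import urlencode

ENGINE = "https://searchengine.example/search"  # hypothetical address

def compose_search_url(keywords: list[str], engine: str = ENGINE) -> str:
    return engine + "?" + urlencode({"q": " ".join(keywords)})

# Mode 1: the server composes the URL and hands it to the device.
url_from_server = compose_search_url(["ZZZ", "notebook", "Model 5"])

# Mode 2: the server sends only keywords (and optionally the engine
# address); the device composes the URL itself.
keywords_from_server = ["ZZZ", "notebook", "Model 5"]
url_from_device = compose_search_url(keywords_from_server)

assert url_from_server == url_from_device
print(url_from_device)
```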
Thus, a statement that "the search engine causes at least a portion of the results set 434 to be transmitted back to the portable device 410" should be interpreted herein to mean that at least some information relating to the results set, which information may or may not be included verbatim in the results set, is transmitted back to the device, whether directly or indirectly by the search engine. It is particularly contemplated that a results set could include at least one hyperlinked address.

It is specifically contemplated that results sets can include the following types of information: Uniform Resource Locator (URL); Uniform Resource Identifier (URI); Internet Protocol (IP) address; telephone number; radio frequency or channel; television frequency or channel; and physical location or address. The result(s) displayed to the user can be interactive. In such a case, the user can take further action by interacting directly with the object, by linking to a referenced web page, or some combination of the two. Or, as discussed above, the results could cause another server/computer or machine to perform some action, such as dispensing a product or changing a channel.

From a method perspective, methods of using a search engine to obtain information are contemplated comprising: using a cell phone enabled portable device to take an image of an object; running computer software that automatically derives a first search term from at least a portion of the image; submitting the first search term to the search engine; and transmitting the information to the device. Some preferred methods further comprise using the device to take a second image of the object; running the computer software to derive a second search term from at least a portion of the second image; and submitting the second search term along with the first search term to the search engine. In other preferred methods, the step of submitting the first search term can advantageously comprise: sending at least the portion of the image to a distal server; running the software on the server; and the server sending the search term to the search engine. Still other preferred methods include a distal server providing the search term(s) to the device, with the device submitting the search term(s) to the search engine.

Analysis of data (whether visual or otherwise) to produce search terms can be accomplished in any suitable manner. Useful techniques include, for example, signal analysis, Fourier analysis, pattern matching, pattern recognition, image recognition, object recognition, wavelet analysis, component analysis, etc.

EXAMPLES

Search terms can be advantageously derived from attribute(s) including name, type, size, color, position, and location, with the derivation performed by algorithm, table/database look-up, hardware device, or other suitable means. For example, consider an example wherein the object being imaged is a poster for the color version of a movie named "Modern Times," starring Charlie Chaplin. The device 410 and/or the server 420 can identify as attributes the text "Modern Times Movie Poster" and "Color Version", and can from that determine search terms such as "Modern Times", "Colorized", "Charlie Chaplin", and "Classic movies". The attributes and search terms in this case could be determined by a human user, a machine algorithm, or some combination of the two.

In another example, a user takes an image of a notebook computer. An algorithm detects the notebook computer in the image and identifies it as being a Model 5, made by ZZZ Corporation. The algorithm then determines the attribute "ZZZ Model 5" and the corresponding search terms "online shopping", "ZZZ", "notebook", and "5". A table look-up of this kind is sketched below.
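The following Python restates the two examples above as a simple attribute-to-search-term table. The table contents merely echo the text; in a production system the mapping would live in a database, and the structure shown is an assumption for illustration.

```python
# Hypothetical attribute-to-search-term table restating the two examples
# in the text. A real system would hold this mapping in a database.
ATTRIBUTE_TERMS = {
    "Modern Times Movie Poster": ["Modern Times", "Charlie Chaplin",
                                  "Classic movies"],
    "Color Version": ["Colorized"],
    "ZZZ Model 5": ["online shopping", "ZZZ", "notebook", "5"],
}

def terms_for_attributes(attributes: list[str]) -> list[str]:
    terms: list[str] = []
    for attribute in attributes:
        terms.extend(ATTRIBUTE_TERMS.get(attribute, []))  # unknown -> skip
    return terms

print(terms_for_attributes(["Modern Times Movie Poster", "Color Version"]))
# -> ['Modern Times', 'Charlie Chaplin', 'Classic movies', 'Colorized']
```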
As EXAMPLES with the image recognition, it is preferred that this process Search terms can be advantageously derived from attri bute(s) including name, type, size, color, position, and loca tion, with the derivation performed by algorithm, table/data 30 occur on a distant server, although it can be performed on the base look-up, hardware device, or other suitable means. For portable device or on a combination of the portable device and the server. The text keywords are then sent to a search engine. example, consider an example wherein the object being This is accomplished by sending the keywords to an Internet imaged is a poster for the color version of a movie named “Modern Times,” starring Charlie Chaplin. The device 410 35 search engine web site as an HTTP transaction, with the search keywords embedded in the URL that is sent to the search engine web site. It is preferred that the HTTP transac tion be initiated from the portable device, so that the search results are returned directly to the portable device. In this case, the search keywords would generally ?rst be made available on the portable device; if they were determined on and/ or the server 420 can identify as attributes the text “Mod ern Times Movie Poster” and “Color Version”, and can from that determine search terms such as “Modern Times”, “Col orized”, “Charlie Chaplin”, and “Classic movies”. The attributes and search terms in this case could be determined 40 the distant server then they are ?rst sent from the server to the by a human user, a machine algorithm, or some combination of the two. portable device. The search engine results are returned to the In another example, a user takes an image of a notebook portable device as a web page which can then be displayed in computer. An algorithm detects the notebook computer in the image and identi?es it as being a Model 5, made by ZZZ Corporation. The algorithm then determines the attribute “ZZZ Model 5” and the corresponding search terms, “online 45 shopping”, “ZZZ”, “notebook”, and “5”. An embodiment of particular interest comprises a search using image and/or video input. The device captures one or 50 the web browser of the portable device. If the HTTP transac tion was initiated by the server, then the results web page is made available for viewing on the portable device by one or more various means (the address of the results web page can be sent to the portable device, or the entire web page can be sent to the portable device, or the web page can be stored or converted into another form on the server after which the more of single images, multiple images, motion imagery, portable device is directed to the address of the stored or and/or video (each and all of these information types are known henceforth as “imagery”). Indeed, the imagery can be captured by more than one electronic imaging device, such as converted page, etc.) a digital camera, a camera-equipped mobile telephone, or a Image Analysis 55 Preferred image analysis techniques are described in the following, in which FIG. 1 shows the overall processing ?ow security camera, or multiple such devices. An object or objects are identi?ed in the imagery via image/object recog nition techniques (software and/ or hardware). The identity of and steps. These steps are described in further detail in the the object(s) is used to look up, in a table/ database, a set of text keywords search terms, which are then provided to a search computer, mobile telephone, personal digital assistant, or following sections. In FIG. 
Image Analysis

Preferred image analysis techniques are described in the following, in which FIG. 1 shows the overall processing flow and steps. These steps are described in further detail in the following sections.

In FIG. 2, for image capture 10, the user 12 utilizes a computer, mobile telephone, personal digital assistant, or other similar device 14 equipped with an image sensor (such as a CCD or CMOS digital camera). The user 12 aligns the sensor of the image capture device 14 with the object 16 of interest. The linking process is then initiated by suitable means including: the user 12 pressing a button on the device 14 or sensor; by the software in the device 14 automatically recognizing that an image is to be acquired; by user voice command; or by any other appropriate means. The device 14 captures a digital image 18 of the scene at which it is pointed. This image 18 is represented as three separate 2-D matrices of pixels, corresponding to the raw RGB (Red, Green, Blue) representation of the input image. For the purposes of standardizing the analytical processes in this embodiment, if the device 14 supplies an image in other than RGB format, a transformation to RGB is accomplished. These analyses could be carried out in any standard color format, should the need arise.

If the server 20 is physically separate from the device 14, then user acquired images are transmitted from the device 14 to the image processor/server 20 using a conventional digital network or wireless network means. If the image 18 has been compressed (e.g. via lossy JPEG DCT) in a manner that introduces compression artifacts into the reconstructed image 18, these artifacts can be partially removed by, for example, applying a conventional despeckle filter to the reconstructed image prior to additional processing.

Image type determination 26 can be accomplished with a discriminator algorithm which operates on the input image 18 and determines whether the input image contains recognizable symbols, such as barcodes, matrix codes, or alphanumeric characters. If such symbols are found, the image 18 is sent to the decode symbol 28 process. Depending on the confidence level with which the discriminator algorithm finds the symbols, the image 18 can also or alternatively contain an object of interest, and can therefore also or alternatively be sent to the Object Image branch of the process flow. For example, if an input image 18 contains both a barcode and an object, then depending on the clarity with which the barcode is detected, the image can be analyzed by both the Object Image and Symbolic Image branches, and that branch which has the highest success in identification will be used to identify and link from the object.

The image can then be analyzed to determine the location, size, and nature of the symbols in the decode symbol 28 process. The symbols are preferably analyzed according to their type, and their content information is extracted. For example, barcodes and alphanumeric characters will result in numerical and/or text information.
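The routing decision made by image type determination 26 can be sketched as a confidence-based branch. The detector stub and the thresholds below are assumptions invented for illustration, not the patent's discriminator.

```python
# Sketch of image type determination 26: route an image to the symbolic
# branch, the object branch, or both, based on detector confidence.
# The detector stub and thresholds are illustrative assumptions.

def symbol_confidence(image) -> float:
    return 0.55  # stub: strength of barcode/matrix-code/text detection

SYMBOL_THRESHOLD = 0.8   # clearly symbolic above this
OBJECT_THRESHOLD = 0.3   # clearly an object image below this

def route(image) -> list[str]:
    confidence = symbol_confidence(image)
    branches = []
    if confidence > OBJECT_THRESHOLD:
        branches.append("decode_symbol_28")     # Symbolic Image branch
    if confidence < SYMBOL_THRESHOLD:
        branches.append("object_image_branch")  # decomposition 34 onward
    # In the ambiguous middle, both branches run, and the branch with
    # the higher identification success is used for linking.
    return branches

print(route(object()))  # -> ['decode_symbol_28', 'object_image_branch']
```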
Image type determination 26 can be accomplished with a discriminator algorithm which operates on the input image 18 and determines whether the input image contains recognizable symbols, such as barcodes, matrix codes, or alphanumeric characters. If such symbols are found, the image 18 is sent to the decode symbol 28 process. Depending on the confidence level with which the discriminator algorithm finds the symbols, the image 18 may also or alternatively contain an object of interest and can therefore also or alternatively be sent to the Object Image branch of the process flow. For example, if an input image 18 contains both a barcode and an object, depending on the clarity with which the barcode is detected, the image can be analyzed by both the Object Image and Symbolic Image branches, and that branch which has the highest success in identification will be used to identify and link from the object.

The image can then be analyzed to determine the location, size, and nature of the symbols in the decode symbol 28. The symbols are preferably analyzed according to their type, and their content information is extracted. For example, barcodes and alphanumeric characters will result in numerical and/or text information.

For object images, one can advantageously perform a "decomposition", in the input image decomposition step 34, of a high-resolution input image into several different types of quantifiable salient parameters. This allows for multiple independent convergent search processes of the database to occur in parallel, which greatly improves image match speed and match robustness in the database matching 36. The best match 38 from either the decode symbol 28, or the image database matching 36, or both, is then determined. If a specific URL (or other online address) is associated with the image, then a URL lookup 40 is performed and the Internet address is returned by the URL return 42. Code examples are set forth in the priority documents, as well as further detail, including segmentation, segment group generation, bounding box generation, geometric normalization, wavelet decomposition, color cube decomposition, shape decomposition, low-resolution grayscale image generation, grayscale comparison, wavelet comparison, color cube comparison, and calculation of combined match score.

FIGS. 3A and 3B show a preferred process flow that can occur within a database matching operation. The algorithm is presented here as containing four nested loops with four parallel processes inside the innermost loop. This structure is for presentation and explanation only. Any actual implementation, although most likely performing the same operations at the innermost layer, can have a different structure in order to achieve the maximum benefit from processing speed enhancement techniques such as parallel computing and data indexing. It is also important to note that the loop structures can be implemented independently for each inner comparison, rather than the shared approach shown in FIGS. 3A and 3B.

Preferably, parallel processing is used to divide tasks between multiple CPUs (central processing units) and/or computers. The overall algorithm may be divided in several ways, such as the following (a sketch of the database-sharing variant appears after this list):

Sharing the Outer Loop: In this technique, all CPUs run the entire algorithm, including the outer loop, but one CPU runs the loop for the first N cycles, another CPU for the second N cycles, all simultaneously.

Sharing the Comparison: In this technique, one CPU performs the loop functions. When the comparisons are performed, they are each passed to a separate CPU to be performed in parallel.

Sharing the Database: This technique entails splitting database searches between CPUs, so that each CPU is responsible for searching one section of the database, and the sections are searched in parallel by multiple CPUs. This is, in essence, a form of the "Sharing the Outer Loop" technique described above.

Actual implementations can be some combination of the above techniques that optimizes the process on the available hardware.
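A hedged sketch of the "Sharing the Database" technique follows, assuming an in-memory list of (entry id, signature) pairs and a stand-in distance function; none of these names come from this description:

    from multiprocessing import Pool

    def search_section(args):
        """Return the best (distance, entry_id) pair within one section."""
        query, section = args
        return min((abs(query - signature), entry_id)
                   for entry_id, signature in section)

    def parallel_match(query, database, cpus=4):
        """Split the database into sections searched in parallel."""
        step = max(1, len(database) // cpus)
        sections = [database[i:i + step]
                    for i in range(0, len(database), step)]
        with Pool(cpus) as pool:
            results = pool.map(search_section,
                               [(query, s) for s in sections])
        return min(results)[1]   # entry id of the overall best match

    if __name__ == "__main__":
        db = list(enumerate([0.10, 0.40, 0.35, 0.90, 0.62, 0.33]))
        print(parallel_match(0.6, db))   # prints 4 (signature 0.62)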
Another technique employed to maximize speed is data indexing. This technique involves using a priori knowledge of where data resides to search only in those parts of the database that contain potential matches. Various forms of indexing may be used, such as hash tables, data compartmentalization (i.e., data within certain value ranges are stored in certain locations), data sorting, and database table indexing. An example of such a technique, in the shape comparison algorithm: if the database is to be searched for an entry with an area value of A, the algorithm would know which database entries or data areas have approximately this value and would not need to search the entire database.
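For instance, a sorted-array index reduces the area query above to a pair of binary searches. The sketch below is illustrative only; the entry layout is an assumption:

    import bisect

    def candidates_near_area(areas, entry_ids, area_a, tolerance):
        """areas: sorted area values; entry_ids: ids in the same order.
        Returns only the entries whose area is within +/- tolerance of A,
        so the rest of the database is never examined."""
        lo = bisect.bisect_left(areas, area_a - tolerance)
        hi = bisect.bisect_right(areas, area_a + tolerance)
        return entry_ids[lo:hi]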
FIG. 4 shows a simplified configuration of an alternative analytical technique. Boxes with solid lines represent processes, software, physical objects, or devices. Boxes with dashed lines represent information. The process begins with an object of interest: the target object 100. In the case of consumer applications, the target object 100 could be, for example, a beverage can, a music CD box, a DVD video box, a magazine advertisement, a poster, a theatre, a store, a building, a car, or any other object that the user is interested in or wishes to interact with. In security applications the target object 100 could be, for example, a person, passport, or driver's license, etc. In industrial applications the target object 100 could be, for example, a part in a machine, a part on an assembly line, a box in a warehouse, or a spacecraft in orbit, etc.

The terminal 102 is a computing device that has an "image" capture device such as a digital camera 103, a video camera, or any other device that can convert a physical object into a digital representation of the object. The imagery can be a single image, a series of images, or a continuous video stream. For simplicity of explanation this document describes the digital imagery generally in terms of a single image; however, the invention and this system can use all of the imagery types described above.

After the camera 103 captures the digital imagery of the target object 100, image preprocessing 104 software converts the digital imagery into image data 105 for transmission to and analysis by an identification server 106. Typically a network connection is provided capable of providing communications with the identification server 106. Image data 105 is data extracted or converted from the original imagery of the target object 100 and has information content appropriate for identification of the target object 100 by the object recognition 107, which can be software or hardware. Image data 105 can take many forms, depending on the particular embodiment of the invention. Specific examples are given in the priority documents.

The image data 105 is sent from the terminal 102 to the identification server 106. The identification server 106 receives the image data 105 and passes it to the object recognition 107.

The identification server 106 is a set of functions that usually will exist on a computing platform separate from the terminal 102, but could exist on the same computing platform. If the identification server 106 exists on a separate computing device, such as a computer in a data center, then the transmission of the image data 105 to the identification server 106 is accomplished via a network or combination of networks, such as a cellular telephone network, wireless Internet, the Internet, and a wire line network. If the identification server 106 exists on the same computing device as the terminal 102, then the transmission consists simply of a transfer of data from one software component or process to another.

Placing the identification server 106 on a computing platform separate from the terminal 102 enables the use of powerful computing resources for the object recognition 107 and database 108 functions, thus providing the power of these computing resources to the terminal 102 via network connection. For example, an embodiment that identifies objects out of a database of millions of known objects would be facilitated by the large storage, memory capacity, and processing power available in a data center; it is very difficult to have such computing power and storage in a portable device. Whether the terminal 102 and the identification server 106 are on the same computing platform or separate ones is an architectural decision that depends on system response time, the number of database records, the image recognition algorithm, the computing power and storage available in the terminal 102, etc., and this decision must be made for each embodiment of the invention. Based on current technology, in most embodiments these functions will be on separate computing platforms.

The overall function of the identification server 106 is to determine and provide the target object information 109 corresponding to the target object 100, based on the image data 105. The object recognition 107 and the database 108 function together to do the following (a sketch follows this list):

1. Detect, recognize, and decode symbols, such as barcodes or text, in the image.

2. Recognize the object (the target object 100) in the image.

3. Provide the target object information 109 that corresponds to the target object 100. The target object information 109 usually (depending on the embodiment) includes an information address corresponding to the target object 100.
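The following Python sketch summarizes these three functions. Every name is hypothetical, and the symbol decoder and object matcher are stubs standing in for real components:

    from dataclasses import dataclass

    @dataclass
    class TargetObjectInfo:
        object_id: str
        information_address: str   # typically a URL handed to the browser

    def decode_symbol(image_data):
        # Stand-in for a real barcode/text decoder component.
        return image_data.get("barcode")

    def recognize_object(image_data, database):
        # Stand-in for matching image data against stored known objects.
        return image_data.get("best_match")

    def identify(image_data, database):
        """Decode symbols and/or recognize the object, then provide the
        corresponding target object information."""
        object_id = (decode_symbol(image_data)
                     or recognize_object(image_data, database))
        return TargetObjectInfo(object_id, database.get(object_id, ""))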
The object recognition 107 detects and decodes symbols, such as barcodes or text, in the input image. This is accomplished via algorithms, software, and/or hardware components suited for this task. Such components are commercially available (the HALCON software package from MVTec is an example). The object recognition 107 also detects and recognizes images of the target object 100 or portions thereof. This is accomplished by analyzing the image data 105 and comparing the results to other data, representing images of a plurality of known objects, stored in the database 108, and recognizing the target object 100 if a representation of the target object 100 is stored in the database 108.

This recognition is performed by an object recognition technique, of which many are available commercially and in the prior art. Such object recognition techniques usually consist of comparing a new input image to a plurality of known images and detecting correspondences between the new input image and one or more of the known images. The known images are views of known objects from a plurality of viewing angles and thus allow recognition of 2-dimensional and 3-dimensional objects in arbitrary orientations relative to the camera 103. Other object recognition techniques also exist and include methods that store 3-dimensional models (rather than 2-dimensional images) of objects in a database and correlate input images with these models of the target object.

In some embodiments the terminal 102 includes software, such as a web browser (the browser 110), that receives an information address, connects to that information address via a network or networks, such as the Internet, and exchanges information with another computing device at that information address. In consumer applications the terminal 102 can be a portable cellular telephone or Personal Digital Assistant equipped with a camera 103 and a wireless Internet connection. In security and industrial applications the terminal 102 can be a similar portable hand-held device or can be fixed in location and/or orientation, and can have either a wireless or wire line network connection.

FIG. 4 shows the object recognition 107 and the database 108 as separate functions for simplicity. However, in many embodiments the object recognition 107 and the database 108 are so closely interdependent that they can be considered a single process.

It is usually desirable that the database 108 be scalable to enable identification of the target object 100 from a very large plurality (for example, millions) of known objects in the database 108. The algorithms, software, and computing hardware must be designed to function together to quickly perform such a search. An example software technique for performing such searching quickly is to use a metric distance comparison technique for comparing the image data 105 to data stored in the database 108, along with database clustering and multi-resolution distance comparisons. This technique is described in "Fast Exhaustive Multi-Resolution Search Algorithm Based on Clustering for Efficient Image Retrieval," by Song, Kim, and Ra, 2000.
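The sketch below illustrates the general idea of metric-distance matching with database clustering (it is not the cited algorithm): because a metric obeys the triangle inequality, an entire cluster can be skipped when the query is provably far from every member. The data layout is an assumption:

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def clustered_search(query, clusters):
        """clusters: list of (centroid, radius, members) tuples, where
        members is a list of (entry_id, vector) pairs."""
        best_id, best_dist = None, float("inf")
        for centroid, radius, members in clusters:
            # Triangle inequality: no member can be closer than this
            # bound, so the whole cluster may be pruned unsearched.
            if euclidean(query, centroid) - radius >= best_dist:
                continue
            for entry_id, vector in members:
                d = euclidean(query, vector)
                if d < best_dist:
                    best_id, best_dist = entry_id, d
        return best_id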
In some embodiments, however, such as capability, a digital camera, and a wireless network connec those that only involve displaying the identity, position, ori tion, as the terminal 202 corresponding to the terminal 102 in FIG. 4. In this embodiment, the terminal 202 communicates entation, or status of the target object 100, the browser 110 with the identi?cation server 206 and the content server 211 can be a software component or application that displays or provides the target object information 109 to a human user or via networks such as a cellular telephone network and the Internet. This embodiment can be used for applications such as the to another software component or application. In embodiments wherein the browser 11 0 is a web browser, 20 following (“user” refers to the person operating the terminal the browser 110 connects to the content server 111 located at 202, and the terminal 202 is a cellular telephone, PDA, or the information address (typically an Internet URL) included in the target object information 109. This connection is effected by the terminal 102 and the browser 110 acting in similar device, and “point and click” refers to the operation of the user capturing imagery of the target object 200 and initi ating the transfer of the image data 205 to the identi?cation concert. The content server 111 is an information server and 25 server 206). computing system. The connection and information The user “points and clicks” the terminal 202 at a compact exchanged between the terminal 102 and the content server disc (CD) containing recorded music or a digital video disc (DVD) containing recorded video. The terminal 202 browser 111 generally is accomplished via standard Internet and wire less network software, protocols (e.g. HTTP, WAP, etc.), and networks, although any information exchange technique can 30 be used. The physical network connection depends on the system architecture of the particular embodiment but in most embodiments will involve a wireless network and the Inter net. This physical network will most likely be the same net work used to connect the terminal 102 and the identi?cation connects to the URL corresponding to the CD or DVD and displays a menu of options from which the user can select. From this menu, the user can listen to streaming audio samples of the CD or streaming video samples of the DVD, or can purchase the CD or DVD. 35 The user “points and clicks” the terminal 202 at a print media advertisement, poster, or billboard advertising a server 106. movie, music recording, video, or other entertainment. The The content server 111 sends content information to the terminal 102 and browser 110. This content information usu browser 210 connects to the URL corresponding to the adver tised item and the user can listen to streaming audio samples, ally is pertinent to the target object 100 and can be text, audio, video, graphics, or information in any form that is usable by purchase streaming video samples, obtain show times, or 40 the browser 110 and terminal 102. The terminal 102 and browser 110 send, in some embodiments, additional informa For example, the programming could consist of a product promotion involving a reduced price during a limited time. tion to the content server 111. This additional information can be information such as the identity of the user of the terminal 102 or the location of the user of the terminal 102 (as deter mined from a GPS system or a radio-frequency ranging sys tem). 
In some embodiments such information is provided to the content server by the wireless network carrier.

The user can perform ongoing interactions with the content server 111. For example, depending on the embodiment of the invention and the applications, the user can:

Listen to streaming audio samples if the target object 100 is an audio recording (e.g., a compact audio disc).

Purchase the target object 100 via on-line transaction, with the purchase amount billed to an account linked to the terminal 102, to the individual user, to a bank account, or to a credit card.

In some embodiments the content server 111 can reside within the terminal 102. In such embodiments, the communication between the terminal 102 and the content server 111 does not occur via a network but rather occurs within the terminal 102.

In embodiments wherein the target object 100 includes or is a device capable of communicating with other devices or computers via a network or networks such as the Internet, and wherein the target object information 109 includes adequate identification (such as a sign, number, or barcode) of the specific target object 100, the content server 111 connects to and exchanges information with the target object 100 via a network or networks such as the Internet. In this type of embodiment, the terminal 102 is connected to the content server 111 and the content server 111 is connected to the target object 100. Thus, the terminal 102 and the target object 100 can communicate via the content server 111. This enables the user to interact with the target object 100 despite the lack of a direct connection between the target object 100 and the terminal 102 (a sketch of this relay appears at the end of this section).

FIG. 5 shows an embodiment that uses a cellular telephone, PDA, or similar portable device equipped with computational capability, a digital camera, and a wireless network connection as the terminal 202 corresponding to the terminal 102 in FIG. 4. In this embodiment, the terminal 202 communicates with the identification server 206 and the content server 211 via networks such as a cellular telephone network and the Internet.

This embodiment can be used for applications such as the following ("user" refers to the person operating the terminal 202, the terminal 202 is a cellular telephone, PDA, or similar device, and "point and click" refers to the operation of the user capturing imagery of the target object 200 and initiating the transfer of the image data 205 to the identification server 206).

The user "points and clicks" the terminal 202 at a compact disc (CD) containing recorded music or a digital video disc (DVD) containing recorded video. The terminal 202 browser connects to the URL corresponding to the CD or DVD and displays a menu of options from which the user can select. From this menu, the user can listen to streaming audio samples of the CD or streaming video samples of the DVD, or can purchase the CD or DVD.

The user "points and clicks" the terminal 202 at a print media advertisement, poster, or billboard advertising a movie, music recording, video, or other entertainment. The browser 210 connects to the URL corresponding to the advertised item, and the user can listen to streaming audio samples, view streaming video samples, obtain show times, or purchase the item or tickets.

The user "points and clicks" the terminal 202 at a television screen to interact with television programming in real-time. For example, the programming could consist of a product promotion involving a reduced price during a limited time. Users that "point and click" on this television programming during the promotion are linked to a web site at which they can purchase the product at the promotional price. Another example is an interactive television program in which users "point and click" on the television screen at specific times, based on the on-screen content, to register votes, indicate actions, or connect to a web site through which they perform real-time interactions with the on-screen program.

The user "points and clicks" on an object such as a consumer product, an advertisement for a product, a poster, etc.; the terminal 202 makes a telephone call to the company selling the product, and the consumer has a direct discussion with a company representative regarding the company's product or service. In this case the company telephone number is included in the target object information 209. If the target object information 209 also includes the company URL, then the user can interact with the company via both voice and the Internet (via the browser 210) simultaneously.

The user "points and clicks" on a vending machine (target object 200) that is equipped with a connection to a network such as the Internet and that has a unique identifying mark, such as a number. The terminal 202 connects to the content server 211 of the company that operates the vending machine.
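The vending machine case combines the identifying mark and the relay described earlier: the terminal reaches the machine only through the content server. The toy sketch below is purely hypothetical, with invented names and a one-method device interface:

    # Hypothetical relay: the terminal and a networked target object
    # (e.g., a vending machine) communicate only via the content server.
    class ContentServer:
        def __init__(self):
            self.devices = {}   # identifying mark -> networked object

        def register(self, mark, device):
            self.devices[mark] = device

        def relay(self, mark, command):
            """Forward a terminal's command to the identified target
            object and return the object's reply to the terminal."""
            return self.devices[mark].handle(command)

    class VendingMachine:
        def handle(self, command):
            return "executed " + command

    server = ContentServer()
    server.register("VM-1042", VendingMachine())   # mark is invented
    print(server.relay("VM-1042", "dispense"))     # terminal request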