1/25 Detection and Extraction of Artificial Text for Semantic Indexing
Christian Wolf and Jean-Michel Jolion
Laboratoire Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex, France
January 9th 2002, Dagstuhl Seminar on Content-Based Image and Video Retrieval
This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations

2/25 Plan of the presentation
Introduction (6 slides)
Detection and tracking (3 slides)
Enhancement and binarization of the text boxes (4 slides)
Experiments and results (2 slides)
Open problems (9 slides)
Conclusion and Outlook (1 slide)
Total: 25 slides
This work resulted in a patent submitted by France Télécom on May 23rd, 2001 under the reference FR 01 06776.
Introduction Detection Enh/Binarization Exp.Results Open problems Conclusion

3/25 Content based image retrieval
[Diagram labels: Example image, Similarity function, Indexing phase, Result]

4/25 Similarity measures
[Figure labels: similar, similar, not similar]

5/25 Indexing using Text
[Diagram labels: Key word ("Patrick Mayhew"), Keyword based search, Indexing phase, Result]
Indexed text: Patrick Mayhew, Min. chargé de l'irlande de Nord, ISRAEL, Jerusalem, montage T.Nouel, ...

6/25 Video properties
[Figure labels: 80 px, 12 px, 8 px]

7/25 Text extraction: general scheme
Video -> Detection of the text in single frames -> Tracking -> Image enhancement - Multiple frame integration -> Segmentation/Binarisation -> OCR
Example output: "EVENEMENT" "ACTU" "SPELEOS" "Gouffre Berger (Isére)" "aujourd'hui" "France 3 Alpes" "un spéléologue sauveteur"

8/25 Text detection by accumulation of horizontal gradients (LeBourgeois, 1997).
Justification: Text forms a regular texture containing vertical edges which are aligned horizontally.
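The accumulation of horizontal gradients can be illustrated with a minimal sketch. It is not the implementation of LeBourgeois or of the presented system: the forward difference, the window parameter `half_width` and the toy input are assumptions made for illustration only.

```python
def accumulated_horizontal_gradients(image, half_width=6):
    """Sum absolute horizontal gradients over a horizontal window.

    `image` is a list of rows of gray values. `half_width` is a
    hypothetical parameter: the window spans 2 * half_width + 1 pixels.
    """
    height, width = len(image), len(image[0])
    # Absolute horizontal derivative (forward difference, 0 in the last column).
    grad = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width - 1):
            grad[y][x] = abs(image[y][x + 1] - image[y][x])
    # Accumulate along each row, so that horizontally aligned vertical
    # edges (character strokes) reinforce one another.
    acc = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo = max(0, x - half_width)
            hi = min(width, x + half_width + 1)
            acc[y][x] = sum(grad[y][lo:hi])
    return acc


# Toy input: one flat row and one row of alternating dark/bright strokes.
flat = [100] * 16
strokes = [0, 255] * 8
response = accumulated_horizontal_gradients([flat, strokes], half_width=3)
```

The flat row gives a zero response everywhere, while the stroke row gives a high response. Thresholding such a response map would yield candidate text pixels.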
Post-processing by mathematical morphology.

9/25 Detection in video sequences
Detection per single frame
List of rectangles per frame
Tracking: keeping track of text occurrences
Suppression of false alarms
Image enhancement: multiple frame integration
[Diagram: text occurrences plotted against frame nr. (time)]

10/25 Image enhancement
Integration of multiple frames to create a single image of higher quality.
Multiple frame integration: averaging
Super-resolution (interpolation)
[Diagram labels: M1, M2, M3, M4]
An additional weight is included into the interpolation scheme, which decreases the weights of temporal outlier pixels.

11/25 Binarization
Niblack: $T = m + k \cdot s$
Sauvola et al.: $T = m \cdot \left(1 + k \cdot \left(\frac{s}{R} - 1\right)\right)$
m: mean of the window
s: standard deviation of the window
k: parameter
R: dynamics of the gray values of the image
Contrast in the center of the image: $C_L = \frac{m - I}{s}$
The contrast of the window: $C_F = \frac{m - M}{R}$
The maximum local contrast: $C_{max} = \frac{m - M}{s}$
M: minimum gray value of the image
$T = I : C_L = a \, (C_{max} - C_F)$
$T = (1 - a) \, m + a \, M + a \, \frac{s}{R} \, (m - M)$

12/25 Binarization methods: examples
[Figure panels: Original image, Fisher, Fisher (windowed), Yanowitz B., Niblack, Sauvola et al., Our method]

13/25 Binarization using a priori knowledge
Bayesian MAP estimation using prior knowledge on the spatial relationships in the image, modeled as a Markov random field.
(In collaboration with David Doermann from the Language and Media Processing Laboratory of the University of Maryland)

5 different MPEG 1 videos of resolution 384x288.
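The three threshold formulas of the binarization slide (Niblack, Sauvola et al., and the contrast-normalized threshold with the parameters a, M and R) can be written out for a single window of gray values. This is a sketch only: the default values of `k` and `a` and the toy window are assumptions, not values stated on the slides, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def window_stats(window):
    """Return (m, s): mean and standard deviation of the window."""
    return mean(window), pstdev(window)


def niblack_threshold(window, k=-0.2):
    # T = m + k * s  (k = -0.2 is an assumed default)
    m, s = window_stats(window)
    return m + k * s


def sauvola_threshold(window, k=0.5, R=128):
    # T = m * (1 + k * (s / R - 1))  (k = 0.5 is an assumed default)
    m, s = window_stats(window)
    return m * (1 + k * (s / R - 1))


def contrast_threshold(window, M, R, a=0.5):
    # T = (1 - a) * m + a * M + a * (s / R) * (m - M)
    # M: minimum gray value of the image, R: dynamics of the gray values,
    # a = 0.5 is an assumed default.
    m, s = window_stats(window)
    return (1 - a) * m + a * M + a * (s / R) * (m - M)


# Toy window with two dark and two bright pixels: m = 105, s = 95.
window = [10, 10, 200, 200]
t_niblack = niblack_threshold(window)
t_sauvola = sauvola_threshold(window)
t_contrast = contrast_threshold(window, M=10, R=95)

# A flat window (s = 0) far from the image minimum M.
t_flat = contrast_threshold([50, 50, 50, 50], M=10, R=95)
```

Two limiting cases show the effect of the last formula: when the window contrast s reaches R, the threshold equals the window mean m; when the window is flat (s = 0), the threshold drops to (1 - a) m + a M, between the mean and the image minimum.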
14/25
62 minutes
93000 frames
413 text appearances

15/25 Detection and OCR results

Detection results
DETECTION
Pred. Text        301
Pred. Non-Text     21
Total             322
%                93.5

Positives         350
False alarms      947
Logos              75
Scene text         72
Pos+Log+Scene     497
Total            1444
%                34.4

[Figure legend: True pos., True neg., False pos., False neg.]

OCR results, classified by binarization method
Input   Bin. method      Recall   Precision   Cost
AIM2    Niblack          67.4     87.5        499
AIM2    Sauvola R=128    53.8     87.6        616.5
AIM2    R=ad             75.0     87.8        384.5
AIM2    R=ad, shift      78.4     90.4        344.5
AIM3    Niblack          92.5     78.1        196
AIM3    Sauvola R=128    69.9     89.6        206
AIM3    R=ad             85.3     92.5        110
AIM3    R=ad, shift      96.2     95.3        51
AIM4    Niblack          78.5     92.0        252
AIM4    Sauvola R=128    48.6     87.7        490.5
AIM4    R=ad             69.8     84.8        360.5
AIM4    R=ad, shift      80.1     90.4        211.5
AIM5    Niblack          62.1     71.4        501.5
AIM5    Sauvola R=128    66.7     89.3        324.5
AIM5    R=ad             64.8     90.1        328
AIM5    R=ad, shift      69.0     91.0        294.5
Total   Niblack          73.1     82.6        1448.5
Total   Sauvola R=128    58.4     88.5        1637.5
Total   R=ad             73.0     88.4        1183
Total   R=ad, shift      79.6     91.5        901.5

16/25 Open questions
Scene text (general orientations, deformations)
Moving text

17/25 What is scene text?
[Diagram labels: Video frames, Frames containing artificial text, Frames containing scene text]
We do not have enough information about the importance of text in the destination domain. How many frames contain text and scene text?
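The percentages on the detection results slide can be checked against the counts. The sketch below assumes that 93.5 is the recall over text appearances and 34.4 the precision over detected rectangles (with logos and scene text counted as useful detections). The slide does not label the two percentages, so this reading is an assumption. It also checks that the per-video OCR costs add up to the "Total" rows.

```python
def ratio_percent(part, total):
    """Return part / total as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)


# Counts from the detection results slide.
detected_text, missed_text = 301, 21
positives, logos, scene_text, false_alarms = 350, 75, 72, 947

# Assumed reading: recall over text appearances in the ground truth.
recall = ratio_percent(detected_text, detected_text + missed_text)

# Assumed reading: precision over all detected rectangles.
useful = positives + logos + scene_text
precision = ratio_percent(useful, useful + false_alarms)

# Per-video OCR costs (AIM2, AIM3, AIM4, AIM5) per binarization method.
costs = {
    "Niblack": [499, 196, 252, 501.5],
    "Sauvola R=128": [616.5, 206, 490.5, 324.5],
    "R=ad": [384.5, 110, 360.5, 328],
    "R=ad, shift": [344.5, 51, 211.5, 294.5],
}
total_costs = {method: sum(values) for method, values in costs.items()}

print(recall, precision)
print(total_costs)
```

Under this reading the computed values agree with the slide: 93.5 and 34.4 for detection, and total costs of 1448.5, 1637.5, 1183 and 901.5.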

18/25 Detection: From artificial text to scene text
Several constraints have to be removed when passing from artificial text to scene text:
The constraints on temporal stability need to be abandoned or at least softened (no initial frame integration).
Text can be aligned in all orientations (creation of an oriented feature in multiple directions, similar to invariant features).
Contrast is possibly lower because scene text is not designed to be read easily (is detection of unreadable text necessary?).

19/25 Text models
Simple models: sets of edges or vertical strokes...
+ Generalize well, respond to many kinds of text
- Many false alarms
Complex models: templates, probabilistic models (MRF)...
+ Powerful, fewer false alarms
- Do not generalize well
Main problem: Distinction between characters and structures similar to text according to the chosen model. Assumptions are necessary (on the font, size, style, contrast, color, length, etc.) but not sufficient.

20/25 Sven Dickinson: evolution of models

21/25 What is text?
Whatever model we choose, we cannot detect/recognize all kinds of text without solving the general image understanding problem.
The best we can do is to include richer features in the detection process: a composite model for text.
Structural analysis (e.g. detection and recognition of characters by strokes). Very hard and very unlikely to work in the case of noisy images, low resolutions and difficult fonts.
Statistical modeling of text features (e.g. by learning techniques). Problem: robust detection needs large neighborhood sizes, which lead to combinatorial explosion.
E.g.: texture-based methods for small text, and segmentation + perceptual grouping or structural methods for big text.

22/25 Learning techniques: pros and cons
Bibliography:
Learning directly the gray levels of the input image (Jung 2001)
Learning features, i.e. coefficients of the Haar wavelet (Li and Doermann 2000) or edge strength (Lienhart 2000)
+ Learning is an easy way to handle the complexity of text.
- Text can appear in videos in many different fonts, sizes, styles, colors, orientations etc. Learning all the different forms may not be feasible.

23/25 Color processing for detection?
[Figure panels: Original image, Sobel on grayscale image, Sobel on L*u*v* image]
Sobel on grayscale image: $\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$
Sobel on L*u*v* image: $\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \cdot D_{euclid}(I_{x-1,0}, I_{x+1,0})$
Saturating distance or non-saturating distance?
Reflection processing?

24/25 Tracking of moving scene text
Do we detect the text in single frames (like artificial text), or do we treat the flow in its entirety?
Single frames: Multiple frame integration of moving text needs robust registration of the text boxes in different frames (e.g. rough segmentation into text and background pixels before the registration of the text pixels only). Robust methods, which are able to track objects in clutter, are needed.
Detection of moving objects, e.g. by optical flow, spatiotemporal methods.
Mosaicing techniques can be employed for image enhancement.

25/25 Conclusion and Outlook
We developed a system for detection, tracking, enhancement and binarization of artificial text in videos.
The total recognition rate for artificial text is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.
The remaining problems in text extraction seem to be typical of applications in visual information management: we went as far as we could with low-level features, but we cannot yet take the necessary step to semantic information.
What is text? Possible definition: text is what a human or an OCR system can recognize as text.
We have to include as much a priori knowledge as possible in the process.