Which Log for which Information? Gathering Multilinguality Data from Different Log File Types
Maria Gäde, Vivien Petras, and Juliane Stiller
Humboldt-Universität zu Berlin
CLEF 2010, Padova, 21 September 2010

Premise
Assume you are building a multilingual digital library and could log every user action with particular consideration for multilingual activities.
Which questions could one ask? (Which questions cannot be answered by logging?)

Outline:
• Europeana
• Log file types
• Logging multilingual information
• Europeana ClickStreamLogger

Europeana
“A digital library that is a single, direct and multilingual access point to the European cultural heritage.” (European Parliament, 27 September 2007)
• 1,000+ content providers
• Portal + APIs
• Services
September 2010:
• 7.8 million images
• 4.6 million texts
• 127,000 videos
• 68,000 sounds

Multilingual Europeana
• Interface
• Search
• Browse
• Results

Log File Types: Example Apache web server log
123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] "GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 "http://www.europeana.eu/portal/briefdoc.html?start=1&view=table&query=italy" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)"

Log File Types: Example Google Analytics
• Map overlay (IP address)
• Languages (system language)

Log File Types – Missing Information
• Web server log (Apache)
    • Interface language missing
    • Certain actions cannot be distinguished (browse = search)
    • Ajax / Flash actions (saved searches, tags, filter)
    • Reconstruct sessions
• Search engine log (Solr)
    • Only queries
• Google Analytics
    • Queries missing

Logging Multilingual Information
Stages of the interaction:
• Approaching the system / background information
• Launching queries / browsing
• Viewing results
• Interacting with the results (filter, save, tag, repeat)
Information to log:
• User background
• Interface language
• Query language
• Query type
• Query content
• Query translation
• Search results
• Result set views
• Result translation
• Query reformulation
• User-generated content
• Saved searches / docs

Logging Multilingual Information – Background
• User background information
    • Country of access, system language, referrer site
• Interface language
    • Change → stronger intervention

Logging Multilingual Information – Query
• Query language
    • Query processing
    • Adapting languages to system
• Query type
    • Simple, advanced, fielded (e.g. language restriction)
    • Pre-selected categories for browsing
• Query content
    • Named entities, dates, numbers (language ambiguous)
• Query translation

Logging Multilingual Information – Results
• Search results
    • Document languages
• Result set views
    • Detailed view, external click → stronger intervention
• Result translation

Logging Multilingual Information – User Activities
• Query reformulation / refinement
    • Language switch
    • Filtering (language), related-item search
• User-generated content
    • Language of tags
    • Language of documents being tagged
• Saved searches / documents
    • ???

Europeana ClickStreamLogger
• Interface language
    • State + change for every activity
• Search
    • Result numbers, distribution of results by language / country
    • Filtering and related searches
• Browse
    • Browsing activities + starting points
• Navigation
    • Move outside Europeana
• Ajax
    • Save / remove searches / tags
• User management
    • Account creation etc.

What happens now…
• Soft roll-outs of new releases change site
• Analysis of log data
• Interpretation
• Re-iteration of “useful information” categories
• Re-design user interaction?

www.europeana.eu
www.europeanaconnect.eu
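
Appendix: reading the Apache example
As an illustration of what the Apache web server log example can and cannot reveal, the following minimal Python sketch parses a line in the combined log format and pulls out the two multilingual hints that are visible in it: the query term carried in the referrer URL and the browser locale embedded in the user agent. The function name multilingual_hints, the returned field names, and the user-agent locale heuristic are assumptions made for illustration; they are not part of Europeana's logging. The interface language chosen by the user does not appear anywhere in the line, so the sketch can only report it as unknown.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Pattern for the Apache "combined" log format:
# host ident user [time] "request" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed heuristic: older Mozilla-style user agents carry the browser locale
# as a lowercase token between separators, e.g. "...; it; rv:1.9.2)".
UA_LOCALE = re.compile(r';\s*([a-z]{2}(?:-[A-Za-z]{2})?)\s*[;)]')


def multilingual_hints(line):
    """Return the language-related hints found in one log line, or None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # The query term is only visible through the referrer's URL parameters.
    params = parse_qs(urlsplit(m.group("referrer")).query)
    locale = UA_LOCALE.search(m.group("agent"))
    return {
        "ip": m.group("ip"),  # usable for a country-of-access lookup
        "query": params.get("query", [None])[0],
        "browser_locale": locale.group(1) if locale else None,
        "interface_language": None,  # not recorded in a web server log
    }


# The example entry from the Apache web server log slide.
EXAMPLE = (
    '123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] '
    '"GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/'
    '00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 '
    '"http://www.europeana.eu/portal/briefdoc.html?start=1&view=table&query=italy" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) '
    'Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)"'
)

hints = multilingual_hints(EXAMPLE)
print(hints)
```

For the example entry this yields the query "italy" and the browser locale "it", while the interface language stays empty, which is the gap an application-level logger has to fill.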