PSU/Villanova/VT Discussion
Virginia Tech’s Digital Library Research Laboratory
Jan. 10, 2005 -- PSU
Edward A. Fox, [email protected]
Virginia Tech, Blacksburg, VA 24061 USA
http://fox.cs.vt.edu/talks/
http://fox.cs.vt.edu/cv.htm

Acknowledgements (Selected)
• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Saverio Perugini, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …

Stepping Stones & Pathways: Improving retrieval by chains of relationships between document topics
Fernando Das-Neves, Virginia Tech DLRL

A Little Experiment
(Compare a simple query with a longer version that explicitly includes stepping stones)
• “Literary Style in Sherlock Holmes stories”
[Diagram: number of relevant documents per link. Direct link: Literary Style to Sherlock Holmes, 2. Versus stepping-stone paths between Literary Style and Sherlock Holmes: via Conan Doyle (link counts 4 and 20) and via Victorian Novel (link counts 5 and 5).]
• Note: Numbers are total relevant web pages in top 20 Google results for the query made up of terms on either end of the link.

Another Example
• “What is the Relationship between Data Mining and Recommender Systems?”
[Diagram: number of relevant documents per link. Direct link: Data Mining to Recommender Systems, 7. Versus stepping stones Machine Learning, Collaborative Filtering, and Social Networks between Data Mining and Recommender Systems; link counts as extracted: 10, 9, 11, 10, 15.]
• Naïve Results: There are many matches that are possible answers.
• Discussion: But many of the pages with co-occurrences give no real information about the requested relationship.

An Alternative Interpretation of a Query in IR:
• A query represents two related, separable concepts.
• Objective: Retrieve a sequence of documents that support a valid set of chains of relationships between the two concepts.
• Input: a query representing two concepts.
• Output: two groups of documents + a set of stepping stones (document groups, i.e., clusters) connecting the topics by pathways (relations among clusters).

Type of Questions Matching Alternative Interpretation
• Ill-defined questions, with non-enumerated answers:
  – “How or why is X related to Y?”
  – “What is the X of Y?”
• Even if queries of the form “give me something about X” lead to relevant docs, making the relations explicit (as a result of our semi-automatic method) can increase the quantity and quality of information in the query result.

Why is this useful?
• Questions of this type are common.
  – For example, such questions often occur during research studies.
  – These occur often in educational settings, e.g., for homework.
  – These occur often in workplace settings, requiring gathering and relating of information.
• Current systems often handle this type of question inadequately.

How to Build Stepping Stones and Pathways?
• Our approach involves a belief network, to combine content+structure in document similarity calculation, including citation and co-citation similarities.
• Find two relevant document sets, each related to one of the two original sub-queries.
• Find a diverse set of strong candidates, each connecting the two subsets, but as different as possible from other candidates.
• Create stepping stones by finding similar documents to those candidates; keep the clusters that are heavily cited, or whose documents are highly correlated (in all aspects).
• Repeat the process, finding a new stepping stone in between each pair of clusters that are weakly related, until the pathway becomes too long or the similarity is sufficient.

Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications
Marcos André Gonçalves
Doctoral defense
Virginia Tech, Blacksburg, VA 24061 USA

Informal 5S Definition: DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)

5Ss (Ss; Examples; Objectives)
• Streams
  – Examples: Text; video; audio; image
  – Objectives: Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures
  – Examples: Collection; catalog; hypertext; document; metadata
  – Objectives: Specifies organizational aspects of the DL content
• Spaces
  – Examples: Measure; measurable, topological, vector, probabilistic
  – Objectives: Defines logical and presentational views of several DL components
• Scenarios
  – Examples: Searching, browsing, recommending
  – Objectives: Details the behavior of DL services
• Societies
  – Examples: Service managers, learners, teachers, etc.
  – Objectives: Defines service managers, responsible for running DL services; actors, that use those services

Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.
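As a rough illustration of how the five constructs could serve as a basis for modeling, the 5S table can be written down as plain data and used to sort a list of requirement terms by construct. This is only a sketch: the names `S`, `Construct`, `FIVE_S`, `which_s`, and `group_requirements` are hypothetical and do not come from 5SL or 5SGraph; the roles and example terms are the ones listed on the slides.

```python
from dataclasses import dataclass
from enum import Enum


class S(Enum):
    STREAMS = "Streams"
    STRUCTURES = "Structures"
    SPACES = "Spaces"
    SCENARIOS = "Scenarios"
    SOCIETIES = "Societies"


@dataclass(frozen=True)
class Construct:
    role: str        # informal role, from the "Informal 5S Definition" slide
    examples: tuple  # example terms, from the 5Ss table (lowercased)


# Hypothetical encoding of the 5Ss table; not an official 5S schema.
FIVE_S = {
    S.STREAMS: Construct("communicate info with users",
                         ("text", "video", "audio", "image")),
    S.STRUCTURES: Construct("organize info in usable ways",
                            ("collection", "catalog", "hypertext",
                             "document", "metadata")),
    S.SPACES: Construct("present info in usable ways",
                        ("measure", "measurable", "topological",
                         "vector", "probabilistic")),
    S.SCENARIOS: Construct("provide info services",
                           ("searching", "browsing", "recommending")),
    S.SOCIETIES: Construct("help satisfy info needs of users",
                           ("service managers", "learners", "teachers")),
}


def which_s(term):
    """Return the construct whose example list contains the term, else None."""
    needle = term.strip().lower()
    for s, construct in FIVE_S.items():
        if needle in construct.examples:
            return s
    return None


def group_requirements(terms):
    """Group requirement terms by construct; unmatched terms are returned separately."""
    grouped = {s: [] for s in S}
    unknown = []
    for term in terms:
        s = which_s(term)
        if s is None:
            unknown.append(term)
        else:
            grouped[s].append(term)
    return grouped, unknown
```

A lookup table like this only covers the example terms from the slide; a real model would need the formal definitions rather than string matching.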
5S Framework and DL Development (Gonçalves)
[Diagram: life-cycle stages Requirements, Analysis, Design, Implementation, Test. Labels: Formal Theory/Metamodel, 5S, 5SGraph, 5SL, DL XML Log, 5SLGen, OO Classes, Workflow, Components, DL Evaluation.]

5SLGen: Automatic DL Generation
[Diagram: four stages: Requirements (1), Analysis (2), Design (3), Implementation (4). Labels: 5S Meta Model, DL Expert, 5SLGraph, DL Designer, 5SL DL Model, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), 5SLGen, Tailored DL Services; users: Practitioner, Teacher, Researcher.]

Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?

Outline
• Motivation: the problem
  – Hypotheses and research questions
• Part 1: Theory
  – 5S: introduction, formal definitions
  – The formal ontology
• Part 2: Tools/Applications
  – Language
  – Visualization
  – Generation
  – Logging
• Part 3: Quality
• Conclusions, Future Work

5S and DL formal definitions and compositions (April 2004 TOIS)
[Diagram: dependency graph of formal definitions, with definition numbers as extracted. Preliminaries: relation (d. 1), function (d. 2), sequence (d. 3), tuple (d. 4)*, language (d. 5), graph (d. 6), grammar (d. 7); measurable (d. 12), measure (d. 13), probability (d. 14), vector (d. 15), topological (d. 16) spaces; state (d. 18), event (d. 10). 5S: streams (d. 9), structures (d. 10), spaces (d. 18), scenarios (d. 21), societies (d. 24). Higher-level concepts: services (d. 22), transmission (d. 23), structural metadata specification (d. 25), descriptive metadata specification (d. 26), structured stream (d. 29), digital object (d. 30), collection (d. 31), metadata catalog (d. 32), repository (d. 33), indexing service (d. 34), searching service (d. 35), hypertext (d. 36), browsing service (d. 37), digital library (minimal) (d. 38).]

Digital Library Formal Ontology
[Diagram: ontology of concepts grouped under Streams (image, text, video, audio), Structures, Spaces (Measurable, Measure, Top, Pr, Vec, Metric), Scenarios, and Societies. Node abbreviations as extracted: do, C, Ic, ms, mss, DMc, R, Se, Sc, Ac, e, SM, op. Relations: is_version_of/cites/links_to, contains, describes, belongs_to, stores, is_a, employs, produces, inherits_from/includes, runs, extends, reuses, uses, precedes, happens_before, participates_in, recipient, association, executes, redefines, invokes.]

Composition of key infrastructure services
[Diagram: services Authoring, Digitizing, Describing, Cataloguing, Acquiring, Submitting, Indexing, and Linking composed around a universal collection. Other labels: do_i, ms_kj, C, DMC (describes), Ic, Hypertext; edges marked p and e.]

Composition of additional services
[Diagram: Infrastructure Services (Add_Value): Rating, Indexing, Training. Information Satisfaction Services: Browsing, Recommending, Searching, Requesting, Filtering, Binding, Visualizing, Expanding query. Other labels: Society actor, handle, anchor, user model/expr, query/category, query, query’; edges marked p and e. Legend: fundamental, composite, transformer.]

Ontology: Taxonomy of Services
• Infrastructure Services
  – Repository-Building
    • Creational: Acquiring, Authoring, Cataloging, Crawling (focused), Describing, Digitizing, Harvesting, Submitting
    • Preservational: Conserving, Converting, Copying/Replicating, Translating (format)
  – Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Linking, Logging, Measuring, Rating, Reviewing (peer), Surveying, Training (classifier), Translating, Visualizing
• Information Satisfaction Services: Binding, Browsing, Customizing, Disseminating, Expanding (query), Filtering, Recommending, Requesting, Searching

5SL: a DL Modeling language
• Domain specific languages
  – Address a particular class of problems by offering specific abstractions and notations for the domain at hand
  – Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping.
• XML-based realization of 5S
  – Interoperability
  – Use of many standard sub-languages (e.g., MIME types, XML Schemas, UML notations)

Overview of 5SGraph
[Screenshot: 5SGraph, showing the workspace (instance model) and the structured toolbox (metamodel).]

5SGen – Version 2: ODL, Services, Scenarios
[Diagram: generation pipeline with numbered elements as extracted: 5SL-Societies Model (1), XPATH/JDOM Transform (2), Java XMI:Class Model (3), Xmi2Java (4), Java Classes Model (5), 5SL-Scenario Model (6), XPath/JDOM Transform (7), StateChart Model (8), Scenario Synthesis (9), Deterministic FSM (10), SMC (11), Java Finite State Machine Class Controller (12), JSP User Interface View (13). Other labels: DL Designer, Component Pool (ODL Search Wrapping, ODL Browse Wrapping, …), import, binds, 5SGen, Generated DL Services.]

The XML Log Format
[Diagram: element hierarchy of the log format. Elements as extracted: Log, Transaction, SessionId, MachineInfo, Timestamp, Event, StatusInfo, Search, SearchBy, SessionInfo, RegisterInfo, ErrorInfo, Action, Browse, QueryString, Statement, Update, Collection, Catalog, StoreSysInfo, Timeout, PresentationInfo.]

Quality and the Information Life Cycle
[Diagram: information life cycle with phases Active, Semi-Active, and Inactive. Life-cycle labels: Creation, Authoring, Modifying, Describing, Organizing, Indexing, Storing, Accessing, Filtering, Archiving, Retention, Mining, Discard, Distribution, Networking, Seeking, Searching, Browsing, Recommending, Utilization. Quality dimensions: Accuracy, Completeness, Conformance, Timeliness, Similarity, Preservability, Pertinence, Significance, Accessibility, Relevance.]

Rao Shen’s Preliminary Exam: Hypothesis and Research Questions
• The 5S framework provides effective solutions to DL integration.
  – Formally define the DL integration problem?
  – Guide integration of domain focused DLs?
    • How to formally model such domain specific DLs?
    • How to integrate formally defined DL models into a union DL model?
    • How to use the union DL model to help design and implement high quality integrated DLs?
  – Assess the integration?
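One way to picture the question of integrating formally defined DL models into a union DL model is the following minimal sketch. It assumes each DL is reduced to four parts (a repository, a metadata catalog, a set of services, and a society) and that every source DL comes with a hand-written mapping from its local metadata fields to a shared schema. The names `DL`, `map_record`, and `union_dl`, and the idea of prefixing object ids with the DL name, are illustrative assumptions, not the formal definitions or the tools used in the work described here.

```python
from dataclasses import dataclass, field


@dataclass
class DL:
    name: str
    repository: set = field(default_factory=set)  # ids of stored digital objects
    catalog: dict = field(default_factory=dict)   # object id -> metadata record
    services: set = field(default_factory=set)
    society: set = field(default_factory=set)


def map_record(record, mapping):
    """Rename local metadata fields to union-schema fields; drop unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def union_dl(dls, mappings, union_services=()):
    """Build a union DL from source DLs and per-DL field mappings (hypothetical)."""
    union = DL(name="UnionDL")
    for dl in dls:
        for obj_id in dl.repository:
            # Prefix with the source name so ids from different DLs cannot collide.
            union_id = f"{dl.name}:{obj_id}"
            union.repository.add(union_id)
            if obj_id in dl.catalog:
                union.catalog[union_id] = map_record(dl.catalog[obj_id],
                                                     mappings[dl.name])
        union.services |= dl.services
        union.society |= dl.society
    # Services that exist only at the union level, e.g. harvesting or mapping.
    union.services |= set(union_services)
    return union
```

For example, two source DLs whose catalogs call the title field `titel` and `name` would both end up with a `title` field in the union catalog, given mappings `{"titel": "title"}` and `{"name": "title"}`. Producing such mappings is the hard part and is not addressed by this sketch.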
Related Work
[Concept map: the DL interoperability approach consists of intermediary-based and mapping-based approaches, which are interrelated. Intermediary-based approaches use mediator, wrapper, and agent. Schema mapping is used in two architectures: federation and union archiving. Mapping uses a hybrid mapper (example: SemInt) or a composite mapper (example: LSD).]

DL integration formalization based on DL interoperability approach
[Concept map: same structure as the Related Work map, with the composite mapper trained by GA.]

Formal Definition of DL Integration
• DL_i = (R_i, DM_i, Serv_i, Soc_i), 1 ≤ i ≤ n
  – R_i is a network accessible repository
  – DM_i is a set of metadata catalogs for all collections
  – Serv_i is a set of services
  – Soc_i is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety

Architecture of a Union DL
[Diagram: DL1 and DL2 on either side of a Union DL. Each source DL has its own Society, Service (Searching in one, Browsing in the other), Catalog (Catalog1, Catalog2), and Repository (Repository1, Repository2); the societies shown include archaeologists and the general public. The Union DL has a Union Society (Archaeologists, General Public), Union Service (Harvesting, Mapping, Searching, Browsing, Clustering, Visualization), Union Catalog, and Union Repository.]

Example of Union Service: CitiViz
CitiViz: A Visual User Interface to the CITIDEL System
ECDL 2004, Bath, England, September 2004
Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, and Edward A. Fox
[email protected]
http://fox.cs.vt.edu

A Minimal DL in the 5S Framework
[Diagram: Streams, Structures, Spaces, Scenarios, Societies. Labels: Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, services (indexing, browsing, searching), hypertext, Digital Object, Collection, Metadata Catalog, Repository, Minimal DL.]

A Minimal ArchDL in the 5S Framework
[Diagram: Streams, Structures, Spaces, Scenarios, Societies. Labels: Structured Stream, Descriptive Metadata specification, SpaTemOrg, StraDia, Arch Descriptive Metadata specification, ArchObj, services (indexing, browsing, searching), hypertext, ArchDO, Arch Metadata catalog, ArchColl, ArchDColl, ArchDR, Minimal ArchDL.]

[Diagram: ETANA-DL generation and architecture. Labels: ArchDL Expert, ArchDL Designer, 5S Archaeology MetaModel, Scenario Sub-model, Structure Sub-model, 5SGraph, 5SGen, ETANA-DL Union Services Descriptions, Component Pool, VN Metadata Format, HD Metadata Format, ETANA-DL Metadata Format, VN Catalog, HD Catalog, Wrapper4VN, Wrapper4HD, Mapping Tool, Union Catalog, Inverted Files, Search Service, Browse DB, Browse Service, Services DB, XOAI, Other ETANA-DL Services, Web Interface; services: Harvesting, Mapping, Searching, Browsing, …]

Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org

www.CITIDEL.org
• Led by Virginia Tech, with co-PIs:
  – Fox (director, DL systems)
  – Lee (history)
  – Perez (user interface, Spanish support)
  – Students: Ryan Richardson, Kate McDevitt, Jon Pryor, Baoping Zhang
• Partners
  – College of New Jersey (Knox)
  – Hofstra (Impagliazzo)
  – Villanova (Cassel)
  – Penn State (Giles)

Digital library architecture for local and interoperable CITIDEL services
[Diagram: three layers: PORTALS, SERVICES, REPOSITORIES. Portals: EDUCATORS, LEARNERS, ADMINISTRATORS. Services: Multilingual Searching, Browsing, Filtering, Annotating, Revising, Administering. Repositories: Union Metadata, Filtering Profiles, User Profiles, Annotations. An OAI Data Provider and an OAI Data Harvester connect to Remote and Peer Digital Libraries (e.g., NSDL-CIS).]

CITIDEL Technology Features
• Component architecture (Open Digital Library)
  – Re-use and compose re-deployable digital library components.
• Built Using Open Standards & Technologies
  – OAI: Used to collect DL resources and for DL interoperability
  – XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, …)
  – Perl: Component integration
• ESSEX: Search Engine Functionality
  – Very fast, utilizing in-memory processing
  – Includes snap-shots for persistence
• Multi-scheming (Aaron Krowne, now at Emory U. Library)
  – Integrates multiple classifications / views through maps, closure
• Extensions: clustering, visualization, personalization, …

Cluster Search Results from CITIDEL
[Screenshot.]

Cluster NDLTD-Computing
[Screenshot.]

CITIDEL + PIPE
Naren Ramakrishnan and Saverio Perugini (U. Dayton)
• Adds interaction personalization to CITIDEL
• Automatically handles multi-modal conversion to cell phone, PDA, etc.
• Can be adapted to any digital data set; only requires an XML file of content with hierarchy maintained.

OCKHAM Library Network (NSDL)
[Diagram: NSDL (NSDL Services), OCKHAM Library Network (OCKHAM Services), and Library Services, with Teachers, Learners, and Librarians as users.]

OCKHAM (Ming Luo)
• Simplicity (a la Occam’s razor)
• Support by Mellon and DLF
• Four main ideas:
  1. Components
  2. Lightweight protocols
  3. Open reference models (e.g., 5S, OAIS)
  4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P, with Emory, Notre Dame, Oregon State, …

OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others such as from adapted ODL)

A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

(supported by Ming Luo)
OCLC SRU Interface => Dr. A.K. Tyagi

ETD Union Search Mirror Site in China (CALIS)
(http://ndltd.calis.edu.cn – popular site!)

LOCKSS Extensions: Bing Liu, Xiaoyu Zhang, Ji-Sun Kim
• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels, journals
• Shift to OAI, esp. for ETDs
• Collab with Emory (Martin Halbert)
  – NDIIPP: AmericanSouth, MetaArchive
  – Help deploy and adapt, apply in other contexts
• Another registry
• Set of publisher manifests (information providers)
• Set of storage systems (archival storage)

Hussein Suleman (Cape Town, S. Africa)
[Diagram: open digital library. Document, Program, Image, and Video collections linked through PMH-based protocol connections.]

Example Open Digital Library
[Diagram: ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org). ETD collections ETD-1 (Document), ETD-2 (Program), ETD-3 (Image), ETD-4 (Video). Component labels: ODLUnion (Union), Filter, ODLRecent (Recent), ODLBrowse (Browse), ODLSearch (Search), connected via PMH to a USER INTERFACE for students and researchers.]

Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box

Interest-based User Grouping Model for Collaborative Filtering in Digital Libraries
7th ICADL 2004
Shanghai, P.R. China
Dec. 15, 2004
Edward A. Fox, Seonho Kim
Virginia Tech, Blacksburg, VA 24061 USA

Some Other Students/Projects
• Wensi Xi: Matrices, reinforcement, clusters (Microsoft)
• Paul Mather: mod/sim of large DLs on clusters; characterization: uses, files (NASA)
• Ming Luo: personalization aided by demographics
• Ryan Richardson: CLIR with concept maps
• Xiaoyan Yu: Stepping Stones and Pathways (NSF, Fernando Das Neves completed & returned to Argentina)
• Baoping Zhang: Physics and classification (NSF, DFG)
• Several: TREC with GP
• New projects:
  – Superimposed information w. PSU (NSF NSDL)
  – Quality and metasearch and structure w. Emory (IMLS)
• …

Conclusion
• Many DL/IR: areas, projects, students
• Theory
• Architecture
• Modeling and simulation
• Systems development and testing to: validate above, demonstrate innovations
• Users, interfaces, visualization, usability