Introduction to Digital Libraries
Week 1: (Digital) Libraries Defined

Old Dominion University
Department of Computer Science
CS 695 Fall 2003
Michael L. Nelson <[email protected]>
09/26/03

ODU CS CS 695 Fall 2003 Michael L. Nelson [email protected]

Overview
  As much as possible, this class will be handled electronically
    http://www.cs.odu.edu/~mln/teaching/cs695-f03/
    http://list.odu.edu/listinfo/cs695-f03/
  Contact me anytime:
    [email protected]
    683 6393

Objective
  An interactive class
    digital libraries is (still) too immature a subject for traditional lecture style classes
  The students should gain
    insight and overview of the DL field
    ideas for areas of DL research and development
    preparation for the development, use, application or management of DLs to their work environment

  Michael Lesk, Practical Digital Libraries: Books, Bytes & Bucks
    very highly recommended book!!!
    it is possible to get through the class without it, but it is an excellent addition to your collection
  Frakes & Baeza-Yates, Information Retrieval: Data Structures & Algorithms
    will be used in lecture #3. not required, but a very nice book with real code
  Baeza-Yates & Ribeiro-Neto, Modern Information Retrieval
    a follow-on to the above; most likely the book to be used in Dr. Bollen’s IR class in the spring

Caveats
  I’m a computer scientist
    no traditional library experience
  I’m an information radical
    “information wants to be free”
  Projects I’ve been involved with will receive preferential treatment ;-)

Vannevar Bush (1890-1974)
  Director of the Office of Scientific Research and Development
    led 6000 scientists in R&D for WWII
    previously, science lacked large scale teams
    also director of NACA (1939)!
  Predicted many technological advances
    the “memex” is one whose spirit we are implementing
    the purpose was to provide scientists the capability to exchange information; to have access to the totality of recorded information

Memex
  Integrated computer, keyboard, and desk
    “mechanized private file and library”
    remove drudgery from information retrieval
    suggested implementation was microfilm
    various user operations are suggested
  Associative indexing was the main purpose
    “the process of tying two items together is the important thing”
    prelude to hypertext...

Memex
  Information could come pre-associatively indexed, but the key point was user customization
    WWW still does not provide that today
  Bush observes that tools change our way of doing, and expand the horizons before us
    full impact of WWW and DLs still not known
  Interesting: Bush’s Atlantic Monthly article did not predict freetext searching...
    knowledge trails only; Yahoo minus keyword searching

  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session2/

  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session1/

What is a Digital Library (DL)?
  “a collection of information that is both digitized and organized” (Lesk, p. 1)
    there are any number of alternate definitions, but this seems fair enough
    no mention of architecture, implementation, content, etc.

  [figure] from: http://community.bellcore.com/lesk/libbuzz.gif (M. Lesk)
  I’m unaware of new data, but “digital library” seems to have totally replaced “electronic library” -- MLN

How is a DL different from a database?
  A traditional SQL database has as its basic element data items in a relation:
    select name from employee, project where employee.deptnumber = '25' AND project.number = '100'
  databases exploit known structures and relations
  DBMS retrieval is not probabilistic (Frakes & Baeza-Yates, p. 3)

How is a DL different from traditional IR systems?
  The difference is less clear
    IR systems can be considered the precursors to DLs
  The basic unit of an IR system is a document and the focus is on textual retrieval
    exact matching - Boolean, text pattern searching
    inexact matching - probabilistic, vector space, clustering

How is a DL different from traditional IR systems?
  We will consider DLs to be a superset of IR systems
  Nomenclature change partly due to change in implementation technology
    IR -> DL generally coincides with the spread of WWW
  Typically, IRs provide metadata access only, where DLs provide access to metadata + data

How is a DL different from the WWW?
  The keyword is organization
  The WWW as a whole has no real organization
  Some meta searchers (Yahoo, Lycos) attempt to add an organizational framework to their web holdings
  However, most are focused on keyword searching (e.g., Altavista)

How is a DL different from the WWW?
  Another key difference is who controls the input into the system
    most meta searchers hunt down their holdings
      Lycos is short for Lycosidae lycosa (the wolf spider), which pursues its prey and does not build a web (Mauldin, IEEE Expert, 1/97)
    some (Yahoo) have humans in the loop for review and classification
  To date, DLs are generally more tightly controlled, and have a targeted customer set
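The exact versus inexact matching distinction drawn for IR systems above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the three-document corpus, the document ids, and the helper names (`boolean_and`, `vector_rank`) are all made up here, and raw term counts stand in for the weighting schemes a real vector space system would use.

```python
import math
from collections import Counter

# Toy corpus; ids and texts are hypothetical, chosen only for illustration.
DOCS = {
    "d1": "digital library metadata search",
    "d2": "wind tunnel technical report",
    "d3": "digital library technical report server",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop words)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Exact matching: ids of documents that contain every query term."""
    terms = set(tokenize(query))
    return sorted(
        doc_id for doc_id, text in docs.items()
        if terms <= set(tokenize(text))
    )


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query, docs):
    """Inexact matching: ids ranked by similarity; zero scores are dropped."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), doc_id)
        for doc_id, text in docs.items()
    ]
    # Highest score first; ties are broken by document id.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


if __name__ == "__main__":
    print(boolean_and("digital report", DOCS))
    print(vector_rank("digital report", DOCS))
```

The Boolean query either matches a document or it does not, while the vector space query returns partial matches in ranked order, which is the sense in which that retrieval is "inexact".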
DL = Content + Services
  digital library = collection of information both digitized and organized -- M. Lesk, 1997
  [diagram labels]
    WWW (http) Access (most common); non-WWW Access (now uncommon)
    Digital Library Services (searching, browsing, citation analysis, usage analysis, alerts)
    Vector and/or Boolean Search Engines (traditional IR); RDBMS; File Systems; Other Technologies
    Content
  Why not just use the WWW?
    WWW by itself has low archival & management characteristics
  Why not use a RDBMS?
    In the same way that a card catalog is not a TL, a RDBMS is not a DL; it is a candidate technology for use in DLs
  DL is the union of the content and services defined on the content

How is a DL Different from a Traditional Library?
  TL has as its focus physical objects
    even if the card catalog (metadata) is electronic, the purpose is to point you to a physical location
  trafficking in physical objects has both obvious and subtle implications
    object can exist only in 1 place
    if you have it, I can’t have it (zero-sum distribution)
    I have to go to the object, or wait for it to come to me

TLs vs. DLs
  DLs clearly better than TLs at:
    Dissemination, storing information variety
  However, TL objects are more survivable
  Who will archive the research information?
    the publishers? the institutions? the authors?
  Will the average DL object still be accessible in 10 years?

How is a DL Different from a Traditional Library?
  Digital Library
    removing the physical restriction has obvious benefits
      multiple access, multiple listings, electronic transmission
    also complicates many other issues...
      intellectual property, terms and conditions, etc.
  Note that a TL offers additional social and educational benefits
  Most TLs offer hybrid services too.
  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session1/

TLs vs. DLs
  Where does publishing stop, and libraries begin?
    there have always been tensions between TLs and traditional publishers, but the roles were fairly well defined
    DLs can muddle the separation of these responsibilities
    result: conflict, and/or new models

Public and Special Libraries
  A service like Yahoo is more like a public library
    accession is still controlled by the staff
    the customer set is the general public
    the holdings are broadly scoped
  Most DLs that we will study are more like Special Libraries
    customer set is small and focused
    holdings are narrowly scoped
    accession is perhaps even more important

Public and Special Libraries
  Special Libraries
    traditional - NASA LaRC Technical Library
    digital - NASA Technical Report Server
  Public Libraries
    traditional - Norfolk Public Library
    digital - Yahoo
  When in doubt, apply the “Popular Mechanics” test...

Scientific and Technical Information (STI)
  Unless otherwise specified, a DL in the context of this class will be short for “Special DL”
  Specifically, we will investigate DLs that serve STI
  STI uses generally represent the vanguard of application of information technology
    but almost never the commercial driver
    examples: Internet, Mosaic, etc.

STI DLs
  The technology used for STI DLs today will soon be used in broader, more general interest applications
  So while limiting our discussion to STI DLs is generally helpful, we must remember that STI DLs are a subset of the application of DL technology
DL Economic Drivers
  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session1/

DL Economic Drivers
  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session3/

DL Economic Drivers
  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session1/

What is STI?
  STI is the collection of materials, independent of format, used in research, development, and other technical activities
    reports, data sets, images, videos, software, etc.
  It is also the output of such R&D activities
  STI includes both white and grey literature

White and Grey Literature
  The line between the two is not always clear
  Grey Net offers an admittedly obsolete definition of grey literature:
    that type of publication unavailable through normal book-selling channels, often produced in small quantities with limited distribution, promotion, and exploitation
    http://www.greynet.org/ (no longer supported by MCB as of late 2000)

White and Grey Literature
  Grey Net also admits that electronic publishing has changed this definition, and a suitable replacement is still under debate
  Intuitively though:
    White: author and publisher are often different, the work has been independently reviewed, obtaining the work is straightforward
    Grey: may not be reviewed, often published from the source origin, may be difficult to obtain

Literature Examples
  White
    Journals, books, edited conference proceedings, etc.
  Grey
    technical reports, government reports, unedited proceedings, non-document STI, etc.

So Why Worry About Grey Literature?
  While White is generally perceived as having a higher pedigree, easier to obtain (in a sense), etc., it is generally less timely and is often a summary or abstract of a larger body of work
  Some technologies can become obsolete in the time it takes to move from Grey to White
    NASA LaRC wind tunnel example

Pyramid of STI
  [figure labels: Journal Articles; Conference Papers; Technical Reports; software, raw data, notes, video / images; axis: time]
  Figure 2: Pyramid of Publications Rests on Unpublished STI

History of STI Distribution
  Originally, scientists published books to document their findings
    but the delay was terribly long
  Then, scientists exchanged personal letters among themselves for rapidity
    but this is point-to-point communication, not broadcast

History of STI Distribution
  The current system of journals evolved in the 17th century as the synthesis of both previous models
    more timely than books, more available than letters
    in fact, some journals with the emphasis on “speed” still have “Letters” in their title
  historical information from (Odlyzko, 1995)

But Are Journals Still Relevant?
  People still publish in them (tenure and promotions are still largely “count the journal publications” exercises)
  But do people read them?
  The current use of journals is now: “a medium for priority claiming, quality control, and archiving scientific work” (Bennion, 1994)

Unavailable, or Not Worth Citing?
  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session13/
  figure 9.7 in text

But Are Journals Still Relevant?
  How important is refereeing anyway?
    Most rejected papers end up published somewhere else (Lesk, p.
    214)
    Referees have rejected many worthy papers, including some that are the most cited in their respective journals (Campanario, 1996)
    this is another well studied problem, contact me for more details

But Are Journals Still Relevant?
  Different disciplines have adapted:
    physics - “the small amount of filtering provided by refereed journals plays no effective role in our research” (Ginsparg, 1994)
    math - “it is rare for experts in any mathematical subject to learn of a major new development in their area through a journal publication” (Odlyzko, 1995)

But Are Journals Still Relevant?
  computer science
    “in his area, journals have become irrelevant” (Odlyzko, quoting Rob Pike)
    “if it did not happen at a conference, it didn’t happen” (Odlyzko, quoting Joan Feigenbaum)
    “if I read it in a journal, I’m not in the loop” (Grycz, 1992)

Solutions by Discipline
  Physics: pre-prints
  Mathematics: pre-prints
  Computer Science: technical reports, conference proceedings
  Chemistry: still mainly journals, but review is cursory (Quinn, 1995)
  Economics: working papers

Journal System - Economic Problems
  20,000 primary research journals (Bennion, 1994)
  the number of scientific papers published annually doubles every 10-15 years (Price, 1956)
  STI does not enjoy economies of scale
    intended audiences are generally static; the content becomes more specialized (Odlyzko, 1995)
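As a rough worked example of the doubling figure quoted from Price above, the implied annual growth rate can be derived as follows. The symbols r (annual growth rate) and T (doubling time in years) are introduced here only for illustration, and steady compound growth is an assumption, not something the slide states.

```latex
% Assumption: steady compound growth, so that (1 + r)^T = 2.
\[
  (1 + r)^{T} = 2 \quad\Longrightarrow\quad r = 2^{1/T} - 1
\]
\[
  T = 10:\ r \approx 0.072\ (7.2\%\ \text{per year}), \qquad
  T = 15:\ r \approx 0.047\ (4.7\%\ \text{per year})
\]
% Over 30 years this compounds to 2^{30/T}: a factor of 8 for T = 10, or 4 for T = 15.
```

Under that assumption, annual output grows roughly 4x to 8x in 30 years while, as noted above, the intended audiences stay generally static.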
Journal System - Economic Problems
  Because of the academic pressures, journals tend to stay the same size, but the number of titles goes up (Quandt, 1996)
  The acquisition budget of a library is constant (or decreasing), so it must be more selective in which titles it provides
  If libraries cancel subscriptions, the cost to the remainder of the subscribers goes up

Journal System - Economic Problems
  The rising cost causes other libraries to cancel subscriptions, causing the price to go up further...
    Journals driving themselves out of business is a well studied problem - contact me for more information
  Odlyzko estimates that:
    American universities spend as much buying mathematics journals as the NSF spends doing mathematical research

Journal System - Economic Problems
  Chemical Abstracts (Lesk, pp. 203-204)
    begun in 1950s, used to cost dozens of dollars per year, and individual chemists subscribed
    today, it costs $17,400 / year.
  Okerson & Stubbs, 1992
    university book purchases down 15% 1986-1991
    journals/faculty 14 -> 12 in same period
    by year 2017, libraries would buy nothing at all!

  [figure] from Lesk, http://community.bellcore.com/lesk/columbia/session1/
  figure 9.2 in text

Journal System - Coverage Problems
  But journals only cover a fraction of available STI
    approximately 100K domestic, unrestricted STI technical reports (grey literature) produced annually (Esler & Nelson, 1998)
  Print journals, by definition, cannot provide access to non-report STI
    software, datasets, etc.

Electronic Journals?
  An experiment that most scholars agree is good, is the eventual path, and is a great idea for everyone else’s papers...
    until tenure is given based on publications in electronic journals, they will not be fully accepted

Electronic Journals
  [figure] from McEldowney, http://poe.acc.virginia.edu/~pm9k/libsci/charts.html

But How Much is STI?
  [figure] from McEldowney, http://poe.acc.virginia.edu/~pm9k/libsci/charts.html
  no data for electronic journals is given, but it seems likely that it follows a similar distribution

Many DL Projects Are “Journal Centric”
  Many DL projects (JSTOR, TULIP, etc.) are focused on automating the traditional journal methods
    this is acceptable for archiving past issues, but seems unsatisfying for future STI

My Prediction for Journals
  Highly specialized titles will go completely electronic, driven by the rising cost and static readership
    economics and academic acceptance will determine when this happens
  Popular titles with broader appeal will exist in a hybrid format, both paper and electronic version
    subscribers are likely to receive the value added material (soft copy, additional materials, etc.)

Common Shortcomings of Current DLs
  Focused on journals, despite their decreasing relevance to some fields
  Inadequate treatment of grey literature, the grist of technical exchange
  Non-document STI (software, datasets, etc.) not handled