bby220-Lecture03-bilgi-erisim-sistemleri-2

Information Retrieval: Basic Concepts
Yaşar Tonta
Hacettepe University
[email protected]
yunus.hacettepe.edu.tr/~tonta/
DOK324/BBY220 Principles of Information Retrieval
AB 2005, Gaziantep
2-4 February 2005
Outline
• Definition of information
• Definition of a document
• Logical structure of information retrieval systems
• Basic concepts
• Retrieval rules
• Performance measures
Knowledge in Philosophy
• Knowledge
– The activity of knowing
– The output obtained as a result of this activity
• Knowing activities
– perceiving
– understanding
– thinking
– reasoning
– interpreting
– explaining
– verifying
– evaluating
Source: Kuçuradi, 1995, p. 97
Information in Information Studies
• Information-as-process
• Information-as-knowledge
• Information-as-object
Different Perspectives on Information
           INTANGIBLE                    TANGIBLE
ENTITY     Information-as-knowledge:     Information-as-object:
           knowledge                     data, documents, recorded information
PROCESS    Information-as-process:       Information processing:
           becoming informed             data processing, document processing,
                                         knowledge engineering
Source: Buckland, 1991, p. 6
Document
• docere: to teach, to inform
• -ment: means
• “all concrete and symbolic indexical signs preserved or
recorded in order to represent, reconstruct, or prove a
physical or intellectual phenomenon” (Suzanne Briet)
• Examples of documents: clay tablet, sculpture, papyrus, map,
manuscript, book, journal, picture, film, cassette, CD-ROM,
DVD, Web page, digital documents, etc.
Documents in Different Disciplines
• Document: form + sign + medium
• Form:
– Calligraphers, music and film producers, pattern
recognition specialists, librarians, archivists,
museum professionals
• Sign:
– Linguists, computer scientists, artificial intelligence specialists
• Medium:
– Archivists, historians, lawyers, diplomatics
scholars, publishers, librarians, et al.
Information Management
• the application of management principles to the
acquisition, organization, control, dissemination,
and use of information relevant to the effective
operation of organizations of all kinds
• “providing the right information, in the right form,
to the right person, at the right cost, at the right time,
in the right place, in order to make the right decision”
Knowledge Management
• a management practice based on the use of an
organization's intellectual capital to accomplish
the organization's mission
• Intellectual capital: the knowledge derived from the
experience, services, and products that the
organization's employees have developed or accumulated.
• Knowledge:
– Explicit (information-as-object)
– Tacit (information-as-knowledge)
What Does an Information Manager Manage?
• The tacit knowledge stored in the human brain?
• The objects (documents) assumed to carry information?
• Or both?
– Librarianship
– Archival science
– Documentation, document management, records management,
administrative documentation
– Data management, information resources management, information technology management
– Information science, information studies
– Information management (management of the documents that carry information)
Information Management
• Acquiring, organizing, maintaining, using,
preserving, and archiving documents
• Identifying and meeting users' information
needs
• Designing, building, and operating
information systems
• Information technology management
Information Retrieval
• “the technique and process of collecting,
classifying, cataloging, and storing
information, searching large amounts
of data, and producing (or displaying)
the desired information from those data”
The Fundamental Dilemma of Information Retrieval
• “The need to describe
that which you do not know
in order to find it”
(Hjerppe)
Information Discovery, Description, Organization, and Retrieval
[Diagram with the stages: Discovery, Description, Organization, Retrieval]
Logical Organization of a Document Retrieval System
[Diagram components: Documents, Indexing, Index records; Users, Query formulation, Formal query statement; Thesaurus/Dictionary; Retrieval rule]
Source: Maron, 1984
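The components named in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy documents, the small `THESAURUS` mapping, the function names, and the "all query terms must match" retrieval rule are all hypothetical and are not taken from Maron.

```python
# Hypothetical toy collection; document ids and texts are made up for illustration.
DOCUMENTS = {
    "d1": "thesaurus construction for subject indexing",
    "d2": "evaluation of retrieval performance",
    "d3": "subject indexing and retrieval rules",
}

# A tiny thesaurus/dictionary mapping free-text words to index terms (assumed).
THESAURUS = {
    "thesaurus": "thesauri",
    "indexing": "indexing",
    "retrieval": "retrieval",
    "evaluation": "evaluation",
    "searching": "retrieval",
}


def to_index_terms(text):
    """Map the words of a text to the controlled index terms they stand for."""
    return {THESAURUS[w] for w in text.lower().split() if w in THESAURUS}


def index_collection(documents):
    """Indexing: turn each document into an index record (a set of index terms)."""
    return {doc_id: to_index_terms(text) for doc_id, text in documents.items()}


def formulate_query(request):
    """Query formulation: turn a user's request into a formal query (a set of terms)."""
    return to_index_terms(request)


def retrieve(index_records, formal_query):
    """Retrieval rule (assumed): return the records that contain every query term."""
    if not formal_query:
        return []
    return sorted(d for d, terms in index_records.items() if formal_query <= terms)


index_records = index_collection(DOCUMENTS)
query = formulate_query("searching with indexing")
print(retrieve(index_records, query))  # ['d3']
```

Both the documents and the user's request pass through the same vocabulary, so the retrieval rule only ever compares index terms with index terms.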
The Ideal Information Retrieval System
• Should retrieve all of the relevant documents
and only the relevant documents
• The concept of “relevance”
– Objective relevance
– Subjective relevance
• Bring similar information together and
separate dissimilar information
Background Concepts for IR
• User Information Needs
• Controlled Vocabularies (Pre- and Postcoordination)
• Indexing Languages
• IR definitions and concepts
– Documents
– Queries
– Collections
– Evaluation
– Relevance
User Information Need
• Why build IR systems at all?
• People have different and highly varied
needs for information
• People often do not know what they want,
or may not be able to express it in a
usable form
– Boulding’s “Image”
• How to satisfy these user needs for
information?
Controlled Vocabularies
• Vocabulary control is the attempt to provide a
standardized and consistent set of terms (such
as subject headings, names, classifications, etc.)
with the intent of aiding the searcher in finding
information.
• Controlled vocabularies are a kind of metadata:
– Data about data
– Information about information
Pre- and Postcoordination
• Precoordination relies on the indexer
(librarian, etc.) to construct some
adequate representation of the meaning of
a document.
• Postcoordination relies on the user or
searcher to combine more atomic
concepts in the attempt to describe the
documents that would be considered
relevant.
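The contrast between the two approaches can be sketched as follows. This is a minimal illustration, assuming a made-up set of compound headings and atomic terms; the heading strings, document ids, and function names are hypothetical.

```python
# Precoordination: the indexer assigns ready-made compound headings.
# All headings and document ids below are invented for illustration.
PRECOORDINATED = {
    "Libraries--Automation": {"d1", "d4"},
    "Libraries--History": {"d2"},
    "Museums--Automation": {"d3"},
}

# Postcoordination: documents are indexed under atomic terms only.
POSTCOORDINATED = {
    "libraries": {"d1", "d2", "d4"},
    "museums": {"d3"},
    "automation": {"d1", "d3", "d4"},
    "history": {"d2"},
}


def search_precoordinated(heading):
    """Look up one compound heading exactly as the indexer built it."""
    return sorted(PRECOORDINATED.get(heading, set()))


def search_postcoordinated(*terms):
    """Combine atomic terms with Boolean AND at search time."""
    postings = [POSTCOORDINATED.get(t, set()) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


print(search_precoordinated("Libraries--Automation"))      # ['d1', 'd4']
print(search_postcoordinated("libraries", "automation"))   # ['d1', 'd4']
print(search_postcoordinated("automation", "history"))     # []
```

In the first case the combination of concepts was fixed at indexing time; in the second the searcher builds the combination, including ones the indexer never anticipated.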
Structure of an IR System
[Diagram: Information Storage and Retrieval System]
Search line: Interest profiles & queries; Formulating query in terms of descriptors; Storage of profiles; Store 1: Profiles/Search requests
Storage line: Documents & data; Indexing (descriptive and subject); Storage of documents; Store 2: Document representations
Rules of the game = Rules for subject indexing + Thesaurus (which consists of lead-in vocabulary and indexing language)
Comparison/Matching of the two stores yields potentially relevant documents
Adapted from Soergel, p. 19
Uses of Controlled Vocabularies
• Library Subject Headings, Classification
and Authority Files.
• Commercial Journal Indexing Services
and databases
• Yahoo, and other Web classification
schemes
• Online and Manual Systems within
organizations
– SunSolve
– MacArthur
Types of Indexing Languages
• Uncontrolled Keyword Indexing
• Indexing Languages
– Controlled, but not structured
• Thesauri
– Controlled and Structured
• Classification Systems
– Controlled, Structured, and Coded
• Faceted Classification Systems
Thesauri
• A Thesaurus is a collection of selected
vocabulary (preferred terms or descriptors)
with links among Synonymous, Equivalent,
Broader, Narrower and other Related
Terms
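One way to picture these links is as a small data structure. The sketch below is an assumption-laden illustration: the `Descriptor` class, its field names, and the sample entries are hypothetical and do not come from any published thesaurus or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One preferred term with its links to other terms."""
    term: str
    used_for: set = field(default_factory=set)   # synonymous/equivalent entry terms
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)


# Hypothetical entries, invented for illustration.
THESAURUS = {
    "vehicles": Descriptor("vehicles", narrower={"cars", "bicycles"}),
    "cars": Descriptor("cars", used_for={"automobiles", "autos"},
                       broader={"vehicles"}, related={"roads"}),
    "bicycles": Descriptor("bicycles", used_for={"bikes"}, broader={"vehicles"}),
    "roads": Descriptor("roads", related={"cars"}),
}


def preferred_term(word):
    """Map a searcher's word to the descriptor it leads to, or None."""
    word = word.lower()
    if word in THESAURUS:
        return word
    for descriptor in THESAURUS.values():
        if word in descriptor.used_for:
            return descriptor.term
    return None


def expand(word):
    """Return the descriptor for a word plus its narrower terms (one level)."""
    term = preferred_term(word)
    if term is None:
        return set()
    return {term} | THESAURUS[term].narrower


print(preferred_term("Automobiles"))   # cars
print(sorted(expand("vehicles")))      # ['bicycles', 'cars', 'vehicles']
```

The synonym links steer a searcher from an entry term to the preferred term, while the broader and narrower links let a search be widened or narrowed.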
Thesauri (cont.)
• National and International Standards for
Thesauri
– ANSI/NISO Z39.19-1994: American National Standard
Guidelines for the Construction, Format and Management of
Monolingual Thesauri
– ANSI/NISO Draft Standard Z39.4-199x: American National
Standard Guidelines for Indexes in Information Retrieval
– ISO 2788: Documentation. Guidelines for the establishment
and development of monolingual thesauri
– ISO 5964: Documentation. Guidelines for the establishment
and development of multilingual thesauri
Development of a Thesaurus
• Term Selection.
• Merging and Development of Concept
Classes.
• Definition of Broad Subject Fields and
Subfields.
• Development of Classificatory structure
• Review, Testing, Application, Revision.
Categorization Summary
• Processes of categorization underlie many of the
issues having to do with information organization
• Categorization is messier than our computer
systems would like
• Human categories have graded membership,
consisting of family resemblances.
• Family resemblance is expressed in part by
which subset of features is shared
• It is also determined by underlying
understandings of the world that do not get
represented in most systems
Classification Systems
• A classification system is an indexing
language often based on a broad ordering
of topical areas. Thesauri and
classification systems both use this broad
ordering and maintain a structure of
broader, narrower, and related topics.
Classification schemes commonly use a
coded notation for representing a topic
and its place in relation to other terms.
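How a coded notation can carry a topic's place in the hierarchy is sketched below. The decimal-style codes and captions are invented for illustration and do not belong to any real classification scheme; the function names are likewise hypothetical.

```python
# Hypothetical decimal-style notation; the codes and captions are invented.
SCHEME = {
    "1": "Science",
    "1.2": "Physics",
    "1.2.1": "Mechanics",
    "1.2.2": "Optics",
    "1.3": "Chemistry",
}


def broader(code):
    """Return the codes of all broader classes, nearest first."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def narrower(code):
    """Return the codes exactly one level below the given class."""
    depth = code.count(".") + 1
    return sorted(
        c for c in SCHEME
        if c.startswith(code + ".") and c.count(".") == depth
    )


print(broader("1.2.1"))   # ['1.2', '1']
print(narrower("1"))      # ['1.2', '1.3']
```

In this sketch the notation alone is enough to find broader and narrower classes, with no separate table of relationships.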
Classification Systems (cont.)
• Examples:
– The Library of Congress Classification System
– The Dewey Decimal Classification System
– The ACM Computing Reviews Categories
– The American Mathematical Society
Classification System
Central Concepts in IR
• Documents
• Queries
• Collections
• Evaluation
• Relevance
Documents
• What do we mean by a document?
– Full document?
– Document surrogates?
– Pages?
• Buckland: “What is a Document?”, “What is a
‘Digital Document’?”
• Are IR systems better called Document Retrieval
systems?
• A document is a representation of some
aggregation of information, treated as a unit.
Collection
• A collection is some physical or logical
aggregation of documents
– A database
– A Library
– An index?
– Others?
Queries
• A query is some expression of a user’s
information needs
• Can take many forms
– Natural language description of need
– Formal query in a query language
• Queries may not be accurate expressions of the
information need
– Differences between conversation with a person and
formal query expression
Evaluation
• Why Evaluate?
• What to Evaluate?
• How to Evaluate?
Why Evaluate?
• Determine if the system is desirable
• Make comparative assessments
• Others?
What to Evaluate?
• How much of the information need is
satisfied.
• How much was learned about a topic.
• Incidental learning:
– How much was learned about the collection.
– How much was learned about other topics.
• How inviting the system is.
What to Evaluate?
What can be measured that reflects users’ ability
to use the system? (Cleverdon 66)
– Coverage of Information
– Form of Presentation
– Effort required/Ease of Use
– Time and Space Efficiency
– Recall (effectiveness)
• proportion of relevant material actually retrieved
– Precision (effectiveness)
• proportion of retrieved material actually relevant
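The two effectiveness measures can be computed directly from the set of retrieved documents and the set of relevant documents. The sketch below assumes binary relevance judgments and made-up document ids; returning 0.0 when a denominator is empty is a convention chosen here, not something the slide specifies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).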
Relevance
• In what ways can a document be
relevant to a query?
– Answer precise question precisely.
– Partially answer question.
– Suggest a source for more information.
– Give background information.
– Remind the user of other knowledge.
– Others ...
Relevance
• “Intuitively, we understand quite well what
relevance means. It is a primitive ‘y’know’
concept, as is information for which we
hardly need a definition. … if and when
any productive contact [in communication]
is desired, consciously or not, we involve
and use this intuitive notion of relevance.”
Saracevic, 1975, p. 324
Relevance
• How relevant is the document
– for this user, for this information need.
• Subjective, but
• Measurable to some extent
– How often do people agree a document is relevant to
a query?
• How well does it answer the question?
– Complete answer? Partial?
– Background Information?
– Hints for further exploration?
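One simple way to put a number on how often people agree is to compare two assessors' judgments for the same query. The sketch below is an illustration only: the judgments are made up, relevance is treated as binary, and both measures (share of identical judgments, and Jaccard overlap of the relevant sets) are choices made here, not measures named on the slide.

```python
def agreement_rate(judgments_a, judgments_b):
    """Share of commonly judged documents that both assessors labeled the same way."""
    shared = judgments_a.keys() & judgments_b.keys()
    if not shared:
        return 0.0
    same = sum(1 for d in shared if judgments_a[d] == judgments_b[d])
    return same / len(shared)


def relevant_overlap(judgments_a, judgments_b):
    """Jaccard overlap: documents both call relevant / documents either calls relevant."""
    rel_a = {d for d, r in judgments_a.items() if r}
    rel_b = {d for d, r in judgments_b.items() if r}
    union = rel_a | rel_b
    # Assumed convention: if neither assessor marks anything relevant, overlap is 1.0.
    return len(rel_a & rel_b) / len(union) if union else 1.0


# Hypothetical judgments for one query (True = relevant).
assessor_a = {"d1": True, "d2": True, "d3": False, "d4": False}
assessor_b = {"d1": True, "d2": False, "d3": False, "d4": True}

print(agreement_rate(assessor_a, assessor_b))    # 0.5
print(relevant_overlap(assessor_a, assessor_b))
```

The two assessors give the same label to half of the documents, yet only one of the three documents marked relevant by either of them is marked relevant by both.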
Relevance Research and Thought
• Review to 1975 by Saracevic
• Reconsideration of user-centered
relevance by Schamber, Eisenberg and
Nilan, 1990
• Special Issue of JASIS on relevance (April
1994, 45(3))
Saracevic
• Relevance is considered as a measure of
effectiveness of the contact between a source
and a destination in a communications process
– Systems view
– Destinations view
– Subject Literature view
– Subject Knowledge view
– Pertinence
– Pragmatic view
Define your own relevance
• Relevance is the (A) gage of relevance of
an (B) aspect of relevance existing
between an (C) object judged and a (D)
frame of reference as judged by an (E)
assessor
• Where…
From Saracevic, 1975 and Schamber 1990
A. Gages
• Measure
• Degree
• Extent
• Judgement
• Estimate
• Appraisal
• Relation
B. Aspect
• Utility
• Matching
• Informativeness
• Satisfaction
• Appropriateness
• Usefulness
• Correspondence
C. Object judged
• Document
• Document representation
• Reference
• Textual form
• Information provided
• Fact
• Article
D. Frame of reference
• Question
• Question representation
• Research stage
• Information need
• Information used
• Point of view
• Request
E. Assessor
• Requester
• Intermediary
• Expert
• User
• Person
• Judge
• Information specialist
Schamber, Eisenberg and Nilan
• “Relevance is the measure of retrieval
performance in all information systems,
including full-text, multimedia, question-answering,
database management and
knowledge-based systems.”
• Systems-oriented relevance: Topicality
• User-Oriented relevance
• Relevance as a multi-dimensional concept
Schamber, et al. Conclusions
• “Relevance is a multidimensional concept whose
meaning is largely dependent on users’
perceptions of information and their own
information need situations
• Relevance is a dynamic concept that depends
on users’ judgements of the quality of the
relationship between information and information
need at a certain point in time.
• Relevance is a complex but systematic and
measurable concept if approached
conceptually and operationally from the user’s
perspective.”
Froehlich
• Centrality and inadequacy of Topicality as
the basis for relevance
• Suggestions for a synthesis of views
Janes’ View of Relevance
[Diagram: Relevance shown in relation to Satisfaction, Topicality, Utility, and Pertinence]