Using Literary Warrant to Define a Version of the DDC for Automated Classification Services
OCLC Online Computer Library Center
Diane Vizine-Goetz
Research Scientist, OCLC Research
Julianne Beall
Assistant Editor, DDC
ISKO Conference
London, 13-16 July 2004
© 2004 OCLC Online Computer Library Center, Inc.
Exploratory Study
Defining a version of the DDC
– To facilitate automatic assignment of DDC numbers to electronic documents
– Based on literary warrant for topics in electronic resources
DDC for Automated Classification
Machine classification service
– A database of concepts used to classify a document
– Software that generates a prioritized list of concepts that characterize the content of the document (Scorpion)
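As a rough illustration of how these two components fit together, the sketch below ranks concept records by how often their terms occur in a document. It is not Scorpion's actual ranking method, which these slides do not describe. The record contents, the `rank_concepts` name, and the simple term-count scoring are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical concept database: class number -> terms describing the class.
# The terms are illustrative, not actual DDC record content.
CONCEPTS = {
    "520": ["astronomy", "celestial", "space"],
    "551.6": ["climatology", "weather", "forecasting", "rain"],
    "510": ["mathematics", "numbers"],
}

def rank_concepts(text, concepts=CONCEPTS):
    """Return (class number, score) pairs, highest score first.

    The score counts the word occurrences in the text that match a term
    in the concept record; concepts with a zero score are dropped.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        number: sum(words[term] for term in terms)
        for number, terms in concepts.items()
    }
    # Sort by descending score, then by class number for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(number, score) for number, score in ranked if score > 0]

print(rank_concepts("Weather forecasting: rain and more rain, seen from space"))
```

In this toy scoring, richer terminology in a record gives a document more ways to match it, which is why the later slides concentrate on enriching the records.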
Checking Literary Warrant
Primary source for checking literary warrant: BUBL
– Ca. 12,000 Internet resources
Canadian Information By Subject
– Ca. 10,000 Internet resources
KidsClick!
– Ca. 6,400 Internet resources
http://bubl.ac.uk/link/ddc.html
BUBL Site Statistics
Dewey Class   Number of sites   Site Status ok   US Sites   UK Sites
500           135               103              59         27
510           167               136              65         43
520           186               133              84         25
530           139               111              68         20
540           118               82               38         33
550           247               196              127        30
Total         992               761              441        178
http://www.nlc-bnc.ca/caninfo/esub.htm
http://sunsite.berkeley.edu/KidsClick!/dewey.html
Defining a Version of the DDC
Starting point: classification numbers in Abridged Edition 14
True abridgment: the truncated number for a topic is always the same as the full number for the topic, except shorter, e.g.:
– 551.64 Forecasting and forecasts of specific phenomena
  • Cut back to 551.6 Climatology and weather
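Because the abridged number is always a prefix of the full number, the cut-back can be sketched as repeated removal of the last digit until a number in the abridged set is reached. The function name and the tiny set of abridged numbers below are hypothetical; a real service would load the full list of Abridged Edition 14 numbers.

```python
def truncate_to_abridged(number, abridged_numbers):
    """Cut a full DDC number back to its longest prefix in the abridged set.

    Digits are dropped from the right one at a time, and a trailing decimal
    point is removed. Returns None if no prefix of three or more characters
    is found in the set.
    """
    candidate = number
    while len(candidate) >= 3:
        if candidate in abridged_numbers:
            return candidate
        candidate = candidate[:-1].rstrip(".")
    return None

# Hypothetical subset of abridged numbers, for illustration only.
abridged = {"551", "551.6"}
print(truncate_to_abridged("551.64", abridged))
```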
Database Record
Class number
Caption
Superordinate hierarchy
Notes that describe what is found in a class
Relative Index entries
Mapped terminology
Keywords from 551.64 Added to 551.6; 551.64 Deleted
Class-here note: methods of forecasting specific phenomena specific areas
Relative Index entries, e.g.,
– Acid rain—weather forecasting
– Hurricanes—weather forecasting
– Rain—weather forecasting
Subject Headings for Children LCSH
– Storms—Forecasting
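The move of terminology from 551.64 to 551.6 can be sketched as a merge of two database records followed by deletion of the truncated one. The `ClassRecord` fields mirror the record elements listed on the Database Record slide, but the class itself, the `fold_into_parent` helper, and the dictionary used as a database are assumptions for illustration, not the project's actual data model.

```python
from dataclasses import dataclass, field

DASH = "\u2014"  # separator used in Relative Index entries

@dataclass
class ClassRecord:
    number: str
    caption: str
    hierarchy: list = field(default_factory=list)      # superordinate captions
    notes: list = field(default_factory=list)
    index_entries: list = field(default_factory=list)  # Relative Index entries
    mapped_terms: list = field(default_factory=list)   # e.g. subject headings

def fold_into_parent(db, child_number, parent_number):
    """Move the child's terminology into the parent and delete the child."""
    child = db.pop(child_number)
    parent = db[parent_number]
    for name in ("notes", "index_entries", "mapped_terms"):
        target = getattr(parent, name)
        for term in getattr(child, name):
            if term not in target:  # skip terminology the parent already has
                target.append(term)
    return parent

db = {
    "551.6": ClassRecord("551.6", "Climatology and weather"),
    "551.64": ClassRecord(
        "551.64",
        "Forecasting and forecasts of specific phenomena",
        index_entries=[f"Hurricanes{DASH}weather forecasting",
                       f"Rain{DASH}weather forecasting"],
        mapped_terms=[f"Storms{DASH}Forecasting"],
    ),
}
merged = fold_into_parent(db, "551.64", "551.6")
print(sorted(db), merged.index_entries, merged.mapped_terms)
```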
Enriching Terminology for Numbers Built from Table 1
Example: built number 520.6
520 Astronomy and allied sciences
Relative Index terms that approximate the whole of 520:
– Astronomy
– Celestial bodies
– Outer space
– Space—astronomy
Built Number 520.6
Relative Index terms from T1—06, e.g.:
– Associations
– Organizations
Combined entries for 520.6, e.g.:
– Astronomy—associations
– Astronomy—organisations
– Astronomy—organizations
– Celestial bodies—associations
– Celestial bodies—organisations
– Celestial bodies—organizations
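The combined entries follow a simple pattern: every term that approximates the whole of the base class is paired with every term for the Table 1 subdivision. A minimal sketch of that pairing is shown below. The `combine_entries` name is hypothetical, and the subdivision terms are passed in lower case with the UK spelling already included, which is an assumption about how the inputs were prepared.

```python
from itertools import product

DASH = "\u2014"  # separator used in Relative Index entries

def combine_entries(base_terms, subdivision_terms):
    """Pair each base-class term with each subdivision term."""
    return [
        f"{base}{DASH}{sub}"
        for base, sub in product(base_terms, subdivision_terms)
    ]

base_terms = ["Astronomy", "Celestial bodies"]  # terms for 520
subdivision_terms = ["associations", "organisations", "organizations"]  # Table 1, 06

for entry in combine_entries(base_terms, subdivision_terms):
    print(entry)
```

With these inputs the sketch yields the six combined entries listed for 520.6.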
Subdivisions Added or Enriched
Number     Caption                                                                  Table 1 notation
505        Science Serial publications                                              1_05
506        Science Organizations                                                    1_06
507.2      Science Research; statistical methods                                    1_072
507.4      Science Museums, collections, exhibits                                   1_074
509        Science Historical, geographic, persons treatment                        1_09
509.2      Science Persons                                                          1_092
510.28     Mathematics Auxiliary techniques and procedures; apparatus, equipment    1_0285
510.5      Mathematics Serial publications                                          1_05
510.6      Mathematics Organizations                                                1_06
510.71     Mathematics Education                                                    1_071
510.9      Mathematics Historical, geographic, persons treatment                    1_09
520.6      Astronomy Organizations                                                  1_06
526.06     Cartography Organizations                                                1_06
530.05     Physics Serial publications                                              1_05
530.06     Physics Organizations                                                    1_06
530.071    Physics Education                                                        1_071
540.5      Chemistry Serial publications                                            1_05
540.6      Chemistry Organizations                                                  1_06
540.71     Chemistry Education                                                      1_071
550.5      Earth sciences Serial publications                                       1_05
550.6      Earth sciences Organizations                                             1_06
550.71     Earth sciences Education                                                 1_071
551.4606   Hydrosphere and submarine geology Oceanography Organizations             1_06
551.4607   Hydrosphere and submarine geology Oceanography Education and research    1_071; 1_072
Added UK Spellings for Index Entries
512.7 Number theory
– Factorisation—number theory
– Factorization—number theory
– Number theory
– Prime numbers
519.6 Mathematical optimization
– Mathematical optimisation
– Mathematical optimization
– Optimisation—mathematics
– Optimization—mathematics
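The added UK spellings in these examples all follow one pattern, "-ization" becoming "-isation". The sketch below applies only that single rule, which is an assumption. The slides do not say how the UK variants were produced, and a full treatment of UK spelling would need many more rules or a word list. The `with_uk_spellings` name is hypothetical.

```python
DASH = "\u2014"  # separator used in Relative Index entries

def with_uk_spellings(entries):
    """Return the entries plus "-isation" variants, sorted, without duplicates."""
    expanded = set(entries)
    for entry in entries:
        # Entries without "ization" are unchanged and collapse in the set.
        expanded.add(entry.replace("ization", "isation"))
    return sorted(expanded)

entries_519_6 = ["Mathematical optimization", f"Optimization{DASH}mathematics"]
for entry in with_uk_spellings(entries_519_6):
    print(entry)
```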
Results: Scorpion & BUBL
Match Type         A14 base   A14.v1   A14.v2   A14.v3
Exact              139        139      129      183
Partial            155        155      186      167
Exact or Partial   294        294      315      350
Non-match          455        455      434      399
Total              749        749      749      749
A14.v1: base file + UK spelling
A14.v2: base file + UK spelling + standard subdivisions added/enriched
A14.v3: base file + UK spelling + standard subdivisions added/enriched + truncation
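To compare the versions on a common scale, the counts can be turned into exact-or-partial match rates. The short sketch below does that arithmetic using the figures from the table; the dictionary layout and the `match_rate` name are illustrative choices.

```python
# Counts per database version, taken from the results table.
RESULTS = {
    "A14 base": {"exact": 139, "partial": 155, "non_match": 455},
    "A14.v1": {"exact": 139, "partial": 155, "non_match": 455},
    "A14.v2": {"exact": 129, "partial": 186, "non_match": 434},
    "A14.v3": {"exact": 183, "partial": 167, "non_match": 399},
}

def match_rate(counts):
    """Share of sites with an exact or partial match."""
    matched = counts["exact"] + counts["partial"]
    total = matched + counts["non_match"]
    return matched / total

for version, counts in RESULTS.items():
    print(f"{version}: {match_rate(counts):.1%}")
```

On these figures the exact-or-partial rate rises from about 39% for the base file to about 47% for A14.v3, while adding UK spellings alone (A14.v1) leaves the counts unchanged.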
Next Steps
Analyze where the truncation and the enriched terminology were useful and where not; revise the v3 database accordingly
Extend approach to additional classes and projects (ePrints UK)
Links
Research : Projects : ePrints-UK
– http://www.oclc.org/research/projects/mswitch/epuk.htm
Dewey
– http://www.oclc.org/dewey/