
Exploration of metadata change in a digital repository
Oksana L. Zavalina, Assistant Professor
Priya Kizhakkethil, PhD student

Preliminary results presentation
• Ongoing exploratory research into the changes in metadata records over time
  • large regional distributed digital library that versions its metadata records
• Findings regarding:
  • metadata elements receiving the most editing
  • the most prevalent types of metadata change
  • distribution of metadata change types across metadata elements
O.L.Zavalina, & P.Kizhakkethil © 2015

Background: Computer Science
• Change in texts, strings, files, scripts, etc.
• Mechanisms for identifying change
  • edit distance (e.g., Bille, 2005)
• File comparison tools (DIFF, COMM, PRETTY DIFF, PROMPTDIFF, etc.) for isolating differences between:
  • files
  • programs
  • applications
  • ontologies
  • multiple versions of the same entities
  (e.g., Cheney, 2010; Horwitz, 1990; Noy et al., 2004)
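
As a minimal illustration of the two mechanisms named on this slide, the sketch below computes a string edit distance and prints a diff with Python's standard difflib module. Bille (2005) surveys tree edit distance; the simpler string (Levenshtein) variant is swapped in here for brevity. The function name edit_distance and the two field values are hypothetical.

```python
import difflib


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical initial and latest values of one metadata field.
old_value = "Metadata quality in digtal libraries"
new_value = "Metadata quality in digital libraries"

print(edit_distance(old_value, new_value))  # 1 (one inserted character)

# Line-level comparison, in the spirit of the DIFF family of tools.
for line in difflib.unified_diff([old_value], [new_value],
                                 fromfile="initial", tofile="latest",
                                 lineterm=""):
    print(line)
```

Edit distance gives a single number for how far apart two values are, while the diff output shows which lines differ between two versions.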

Background: Information Science
Metadata quality research (Stvilia et al., 2004; Stvilia & Gasser, 2008):
• suggested the link between metadata change and metadata quality
• emphasized the need to measure metadata change and its outcomes for the users
BUT
• Almost no published research identifying and measuring metadata change
• In part due to the unavailability, until recently, of open-source (or inexpensive proprietary) information systems that allow metadata versioning

Background: Information Science: evaluating metadata change
• Zavalina et al., 2008: small-scale qualitative analysis as part of metadata quality study in IMLS DCC aggregation (140+ collection-level records)
  • frequency of revisions in collection-level IMLS DCC metadata records
  • 2 broad categories of metadata revisions: change & addition
• Tarver et al., 2014: “big data” quantitative analysis of metadata change in UNT Digital Collections over a period of 4 years (600,000+ item-level records):
  • overall frequency distribution of metadata change events
  • change in metadata record length, access status, etc.

Gaps to address
• Lack of studies identifying and measuring metadata change in information science research
• Broad scope of existing studies (macro-analysis)
Need for deeper, more granular investigation (micro-analysis) of metadata change at the level of individual records, metadata elements, and data values.

Data collection: main sample
Dublin Core records created by human metadata creators:
• created between October 1, 2009, & December 31, 2012
• visible to end-users both at the time of creation & at the time of data collection (April 2014)
• representing different collections in the repository, and describing different kinds/genres of information objects
• edited at least once, with last editing in January-April 2014
157 records X 2 (initial & latest versions) = 314
[Diagram: record versions: Initial, Intermediate, Latest]

Data collection: main sample
[Figure: pie chart "# of Record Versions", the share of records by number of versions. Legend: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 17 versions. Slice labels range from 0.6% to 22%.]

Data collection: sample #2
Subsample (expanded) from the main sample
• 33% of all records edited 3 times
11 records X 4 (all versions) = 44
[Diagram: record versions: Initial, 1st revision, 2nd revision, Latest (3rd revision)]

Research questions
How and when do metadata records change in a digital repository?
• What categories of change can be identified & what is the relative frequency of their occurrence?
• In which metadata fields does change occur the most often?
• How is metadata change related to:
  • record age?
  • number of editing events?
  • fluctuations in the record length?
• How is the metadata change distributed across editing events (over time)?
Data sources marked on the slide: main sample & sample 2; main sample; main sample & sample 2.
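
One way the questions about change categories and fields could be operationalized is to compare two versions of a record field by field. The sketch below is illustrative only: the record representation (a dict mapping field names to lists of values), the function name classify_changes, and the pairing rule are assumptions, not the coding scheme used in the study.

```python
from collections import Counter


def classify_changes(initial: dict, latest: dict) -> dict:
    """Count additions, deletions, and modifications per field.

    Assumed rule: within one field, a removed value paired with an added
    value counts as one modification; unpaired added values are additions
    and unpaired removed values are deletions.
    """
    result = {}
    for field in sorted(set(initial) | set(latest)):
        before = Counter(initial.get(field, []))
        after = Counter(latest.get(field, []))
        removed = sum((before - after).values())  # values only in the old version
        added = sum((after - before).values())    # values only in the new version
        modified = min(removed, added)
        counts = {"addition": added - modified,
                  "deletion": removed - modified,
                  "modification": modified}
        if any(counts.values()):  # skip fields with no change
            result[field] = counts
    return result


# Hypothetical initial and latest versions of one record.
initial = {"title": ["Exploration of metadata chnge"],
           "subject": ["metadata"],
           "note": ["draft"]}
latest = {"title": ["Exploration of metadata change"],
          "subject": ["metadata", "digital libraries"]}

changes = classify_changes(initial, latest)
for field, counts in changes.items():
    print(field, counts)
```

Summing these per-field counts over all records would give both the frequency of each change category and the fields where change occurs most often.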

Metadata change: basic statistics

Type of change   Total no. of metadata       Number of fields with change per record
                 change instances observed   Min   Max   Mean   Median   St. Dev.
addition          254                        0     8     1.62   1        0.98
deletion          534                        0     9     3.38   4        1.76
modification      475                        0     14    3.01   3        2.17
change overall   1263                        1     18    6.68   7        3.64
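
The per-record summary columns (Min, Max, Mean, Median, St. Dev.) can be computed with Python's standard statistics module. The counts below are hypothetical, not the study data, and the slide does not say whether the sample or population standard deviation was reported; this sketch assumes the sample version.

```python
import statistics


def summarize(counts):
    """Return the five summary statistics for a list of per-record counts."""
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": round(statistics.mean(counts), 2),
        "median": statistics.median(counts),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "st_dev": round(statistics.stdev(counts), 2),
    }


# Hypothetical number of changed fields in each of five records.
fields_changed_per_record = [1, 3, 4, 7, 10]

summary = summarize(fields_changed_per_record)
print(summary)
```

The instance totals on the slide are internally consistent: 254 + 534 + 475 = 1263.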

Metadata change types by record field (% of records)
[Figure: chart of the percentage of records with deletions, additions, and modifications in each field. Vertical axis: 0.0% to 80.0%. Fields: title, creator, contributor, publisher, date, language, description, subject, primarySource, coverage, source, citation, relation, collection, institution, rights, resourceType, format, identifier, degree, note.]

Metadata change examples (iConference 2012 paper record)

No change vs. multiple change
[Figure: two charts, each giving a percentage of records for the 21 metadata fields (title, creator, contributor, publisher, date, language, description, subject, primarySource, coverage, source, citation, relation, collection, institution, rights, resourceType, format, identifier, degree, note). Data labels in one chart range from 0.0% to 36.9%; in the other, from 23% to 100%.]

Change in the number of field instances (% of records)
[Figure: chart with one value per field, in descending order: 54.8%, 40.8%, 31.8%, 11.5%, 7.0%, 3.2%, 2.5%, 2.5%, 1.3%, 1.3%, 1.3%, 1.3%, 0.6%, 0.6%, 0.6%, 0.6%, 0.6%, 0.6%, 0.6%, 0.0%, 0.0%.]
• results in substantial change (mostly increase, sometimes decrease) in the length of the record

Metadata change: correlations
Pearson’s r

Variables:
(1) Record age
(2) # of record versions
(3) Record length increase/decrease: initial to latest version
(4) # of edited fields in record

       (1)        (2)        (3)       (4)
(1)    X          -0.23944   0.49472   -0.248
(2)    -0.23944   X          0.2099    0.7741
(3)    0.49472    0.2099     X         0.1167
(4)    -0.248     0.7741     0.1167    X
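
For reference, Pearson's r between two per-record variables is their covariance divided by the product of their standard deviations. The sketch below implements that formula directly; the function name pearson_r, the variable names, and the five data points are hypothetical and are not drawn from the study sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The 1/n factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-record values.
record_versions = [2, 3, 4, 6, 9]
edited_fields = [1, 2, 2, 5, 7]

r = pearson_r(record_versions, edited_fields)
print(round(r, 4))
```

Among the values reported on the slide, the strongest association is between the number of record versions and the number of edited fields (r = 0.7741).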

When does metadata change?
[Figure: chart of the number of addition, deletion, and modification instances in the 11 records changed 3 times, by editing event (1st, 2nd, 3rd). Vertical axis: 0 to 40 instances. Data labels: 32, 22, 14, 9, 9, 6, 2, 0, 0.]

Implications
Digital library/repository development:
• improving and maintaining metadata quality
• strategically distributing scarce resources

Future/concurrent research
Similar studies in various environments:
• digital libraries
• institutional repositories:
  • research products (publications, presentations, etc.)
  • research data
• bibliographic databases (e.g., WorldCat)
which use other metadata schemes & enable metadata versioning

Works cited
Bille, P. (2005). A survey on tree edit distance and related problems. Theoretical Computer Science, 337(1-3), 217-239.
Cheney, A. (2010). Pretty Diff - Documentation. Retrieved from http://prettydiff.com/documentation.php
Higgins, S. (2008). The DCC curation lifecycle model. International Journal of Digital Curation, 3(1), 134-140.
Horwitz, S. (1990). Identifying the semantic and textual differences between two versions of a program. ACM SIGPLAN Notices, 25(6), 234-245. DOI: http://dx.doi.org/10.1145/93548.93574
Noy, N., Kunnatur, S., Klein, M., & Musen, M. (2004). Tracking changes during ontology evolution. Lecture Notes in Computer Science, 3298, 259-273.
Stvilia, B., Gasser, L., Twidale, M., Shreeves, S., & Cole, T. (2004). Metadata quality for federated collections. Proceedings of ICIQ04, 111-112.
Stvilia, B., & Gasser, L. (2008). Value based metadata quality assessment. Library & Information Science Research, 30(1), 67-74. DOI: http://dx.doi.org/10.1016/j.lisr.2007.06.006
Tarver, H., Zavalina, O., Phillips, M., Alemneh, D., & Shakeri, S. (2014). How descriptive metadata changes in the UNT Libraries’ Collections: A case study. Proceedings of the International Conference and Workshop on Dublin Core and Metadata Applications, Austin, Texas.
Zavalina, O.L., Palmer, C.L., Jackson, A.S., & Han, M.-J. (2008). Evaluating descriptive richness in collection-level metadata. Journal of Library Metadata, 8(4), 263-292.
THANK YOU!
Questions?
Comments?
Ideas?
Your feedback is very welcome!
Oksana L. Zavalina: [email protected]