Exploration of metadata change in a digital repository
Oksana L. Zavalina, Assistant Professor
Priya Kizhakkethil, PhD student
O.L. Zavalina & P. Kizhakkethil © 2015

Preliminary results presentation
Ongoing exploratory research into changes in metadata records over time in a large regional distributed digital library that versions its metadata records. Findings regarding:
• metadata elements receiving the most editing
• the most prevalent types of metadata change
• the distribution of metadata change types across metadata elements

Background: Computer Science
Change in texts, strings, files, scripts, etc. Mechanisms for identifying change:
• edit distance (e.g., Bille, 2005)
• file comparison tools (DIFF, COMM, PRETTY DIFF, PROMPTDIFF, etc.) for isolating differences between files, programs, applications, ontologies, and multiple versions of the same entities (e.g., Cheney, 2010; Horwitz, 1990; Noy et al., 2004)

Background: Information Science
Metadata quality research (Stvilia et al., 2004; Stvilia & Gasser, 2008):
• suggested the link between metadata change and metadata quality
• emphasized the need to measure metadata change and its outcomes for users
BUT almost no published research identifies and measures metadata change, in part because open-source (or inexpensive proprietary) information systems that allow metadata versioning were unavailable until recently.

Background: Information Science: evaluating metadata change
• Zavalina et al., 2008: small-scale qualitative analysis, as part of a metadata quality study, in the IMLS DCC aggregation (140+ collection-level records):
   • frequency of revisions in collection-level IMLS DCC metadata records
   • 2 broad categories of metadata revisions: change & addition
• Tarver et al., 2014: "big data" quantitative analysis of metadata change in the UNT Digital Collections over a period of 4 years (600,000+ item-level records):
   • overall frequency distribution of metadata change events
   • change in metadata record length, access status, etc.

Gaps to address
• Lack of studies identifying and measuring metadata change in information science research
• Broad scope of existing studies (macro-analysis)
• Need for a deeper, more granular investigation (microanalysis) of metadata change at the level of individual records, metadata elements, and data values

Data collection: main sample
Dublin Core records:
• created by human metadata creators
• created between October 1, 2009, and December 31, 2012
• visible to end users both at the time of creation and at the time of data collection (April 2014)
• representing different collections in the repository and describing different kinds/genres of information objects
• edited at least once, with the last edit made in January-April 2014
157 records × 2 (initial & latest versions) = 314 record versions
Version sequence: Initial → Intermediate → Latest
[Chart: number of record versions per record in the main sample; values not recoverable from the extraction]

Data collection: sample #2
Subsample (expanded) from the main sample: 33% of all records edited 3 times
11 records × 4 (all versions) = 44 record versions
Versions analyzed: Initial → 1st revision → 2nd revision → Latest (3rd revision)
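Comparing the initial and latest versions of each record, as in the main sample, amounts to a field-level diff in the spirit of the file-comparison tools cited in the background. The Python sketch below shows one way such a comparison could be automated and each difference labelled with the three change types used in this study (addition, deletion, modification). It is a minimal sketch under stated assumptions: the record representation (a dict of field name to list of values), the function name `classify_changes`, the pairing rule that turns a removed value plus an added value into one modification, and the sample records are all hypothetical, not the authors' coding procedure or data.

```python
from collections import Counter

# Hypothetical representation of a Dublin Core record version:
# each field name maps to the list of values it holds in that version.
Record = dict[str, list[str]]


def classify_changes(initial: Record, latest: Record) -> dict[str, Counter]:
    """Count additions, deletions, and modifications per field between two versions.

    Illustrative assumption (not the authors' coding rules): inside one field,
    each removed value is paired with an added value and counted as a single
    modification; unpaired removed values are deletions and unpaired added
    values are additions.
    """
    changes: dict[str, Counter] = {}
    for field in sorted(set(initial) | set(latest)):
        old = Counter(initial.get(field, []))
        new = Counter(latest.get(field, []))
        removed = sum((old - new).values())  # values present only in the initial version
        added = sum((new - old).values())    # values present only in the latest version
        paired = min(removed, added)
        counts = Counter(
            modification=paired,
            deletion=removed - paired,
            addition=added - paired,
        )
        counts = +counts  # unary plus keeps only positive counts
        if counts:        # skip fields that did not change
            changes[field] = counts
    return changes


# Made-up example versions of one record (not data from the study).
initial = {
    "title": ["Exploration of metadata change"],
    "creator": ["Zavalina, Oksana L."],
    "subject": ["metadata"],
}
latest = {
    "title": ["Exploration of metadata change in a digital repository"],
    "creator": ["Zavalina, Oksana L.", "Kizhakkethil, Priya"],
    "subject": [],
}

for field, counts in classify_changes(initial, latest).items():
    print(field, dict(counts))
# creator {'addition': 1}
# subject {'deletion': 1}
# title {'modification': 1}
```

Treating each field as a multiset of values ignores value order and edits inside a single value; a finer comparison could score within-value edits with an edit-distance measure such as those surveyed by Bille (2005).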
Research questions
How and when do metadata records change in a digital repository?
• What categories of change can be identified, and what is the relative frequency of their occurrence?
• In which metadata fields does change occur most often?
• How is metadata change related to record age, the number of editing events, and fluctuations in record length? (main sample)
• How is metadata change distributed across editing events (over time)?
Data: main sample & sample 2

Metadata change: basic statistics
Type of change    Total change      Number of fields with change per record
                  instances         Min    Max    Mean    Median    St. Dev.
addition           254              0      8      1.62    1         0.98
deletion           534              0      9      3.38    4         1.76
modification       475              0      14     3.01    3         2.17
change overall    1263              1      18     6.68    7         3.64

Metadata change types by record field (% of records)
[Chart: deletions, additions, and modifications by record field (% of records). Fields: note, degree, identifier, format, resourceType, rights, institution, collection, relation, citation, source, coverage, primarySource, subject, description, language, date, publisher, contributor, creator, title. Per-field values not recoverable from the extraction]

Metadata change examples (iConference 2012 paper record)
[Figure: examples of metadata change in the record for an iConference 2012 paper]
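The per-record figures in the basic statistics table (minimum, maximum, mean, median, and standard deviation of the number of fields with a given change type) can be computed from per-record counts along the following lines. This is a minimal sketch, assuming the counts are already available as a list with one integer per record; the function name `summarize` and the toy numbers are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the slides do not state which variant was reported.

```python
import statistics


def summarize(per_record_counts: list[int]) -> dict[str, float]:
    """Min, max, mean, median, and standard deviation of per-record counts
    (e.g., the number of fields with an addition in each record)."""
    return {
        "min": min(per_record_counts),
        "max": max(per_record_counts),
        "mean": round(statistics.mean(per_record_counts), 2),
        "median": statistics.median(per_record_counts),
        # Assumption: sample standard deviation (n - 1 denominator).
        "st_dev": round(statistics.stdev(per_record_counts), 2),
    }


# Toy, made-up counts for five hypothetical records (not the study's data).
fields_with_addition = [0, 1, 1, 2, 4]
print(summarize(fields_with_addition))
# {'min': 0, 'max': 4, 'mean': 1.6, 'median': 1, 'st_dev': 1.52}
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population standard deviation instead.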
No change vs. multiple change
[Chart: share of records with no change vs. multiple changes, by metadata field; values not recoverable from the extraction]

Change in the number of field instances (% of records)
[Chart: values not recoverable from the extraction]
Change in the number of field instances results in a substantial change (mostly an increase, sometimes a decrease) in the length of the record.

Metadata change: correlations (Pearson's r)
Variables: (1) record age; (2) # of record versions; (3) record length increase/decrease, initial to latest version; (4) # of edited fields in record
         (1)         (2)         (3)         (4)
(1)      X           -0.23944    0.49472     -0.248
(2)      -0.23944    X           0.2099      0.7741
(3)      0.49472     0.2099      X           0.1167
(4)      -0.248      0.7741      0.1167      X

When does metadata change?
[Chart: number of change instances (addition, deletion, modification) at the 1st, 2nd, and 3rd editing events in the 11 records changed 3 times]

Implications
Digital library/repository development:
• improving and maintaining metadata quality
• strategically distributing scarce resources

Future/concurrent research
Similar studies in various environments:
• digital libraries
• institutional repositories:
   • research products (publications, presentations, etc.)
   • research data
• bibliographic databases (e.g., WorldCat) which use other metadata schemes & enable metadata versioning

Works cited
Bille, P. (2005). A survey on tree edit distance and related problems. Theoretical Computer Science, 337(1-3), 217-239.
Cheney, A. (2010). Pretty Diff - Documentation. Retrieved from http://prettydiff.com/documentation.php
Higgins, S. (2008). The DCC curation lifecycle model. International Journal of Digital Curation, 3(1), 134-140.
Horwitz, S. (1990). Identifying the semantic and textual differences between two versions of a program. ACM SIGPLAN Notices, 25(6), 234-245. DOI: http://dx.doi.org/10.1145/93548.93574
Noy, N., Kunnatur, S., Klein, M., & Musen, M. (2004). Tracking changes during ontology evolution. Lecture Notes in Computer Science, 3298, 259-273.
Stvilia, B., Gasser, L., Twidale, M., Shreeves, S., & Cole, T. (2004). Metadata quality for federated collections. Proceedings of ICIQ04, 111-112.
Stvilia, B., & Gasser, L. (2008). Value based metadata quality assessment. Library & Information Science Research, 30(1), 67-74. DOI: http://dx.doi.org/10.1016/j.lisr.2007.06.006
Tarver, H., Zavalina, O., Phillips, M., Alemneh, D., & Shakeri, S. (2014). How descriptive metadata changes in the UNT Libraries' Collections: A case study. Proceedings of the International Conference and Workshop on Dublin Core and Metadata Applications, Austin, Texas.
Zavalina, O.L., Palmer, C.L., Jackson, A.S., & Han, M.-J. (2008). Evaluating descriptive richness in collection-level metadata. Journal of Library Metadata, 8(4), 263-292.

THANK YOU!
Questions? Comments? Ideas? Your feedback is very welcome!
Oksana L. Zavalina: [email protected]