#EMEARC17

Resolving Identities of Entities
Entities and their Attributes
Marta Cichoń, National Library of Poland

Objectives
• Data atomization in the bibliographic database
• Using controlled vocabulary and matching it with other standard vocabularies
• Populating standard MARC 21 fields with values that assign attributes to entities
• Creating additional links between entities in our database
• From unstructured data to structured information
• Improving possibilities for data retrieval and linking

New opportunities with MARC 21

Bibliographic Records:
336  Content Type (R)
337  Media Type (R)
338  Carrier Type (R)
380  Form of Work (R)
381  Other Distinguishing Characteristics of Work or Expression (R)
385  Audience Characteristics (R)
386  Creator/Contributor Characteristics (R)
388  Time Period of Creation (R)
658  Index Term-Curriculum Objective (R) (here as field of knowledge – content objective)

Authority Records:
368  Other Attributes of Person or Corporate Body (R)
370  Associated Place (R)
371  Address (R)
372  Field of Activity (R)
373  Associated Group (R)
374  Occupation (R)
375  Gender (R)
376  Family Information (R)
377  Associated Language (R)
378  Fuller Form of Personal Name (NR)
380  Form of Work (R)
385  Audience Characteristics (R)
386  Creator/Contributor Characteristics (R)
388  Time Period of Creation (R)

The same MARC 21, new approach
• Shared entities for name and subject authority control
• Allowing additional “attributes”
• Allowing better data segmentation
• Using new attributes to populate additional facets in the Faceted Search
• Using the enriched content of records to extract specific data in various formats

Entities and their Attributes
Type of Entity: Persons
o 043 – Geographic Area Code (NR)
o 046 – Special Coded Dates (R)
o 368|c – Other designation (R)
o 368|d – Title of Person (R)
o 370 – Associated Place (R)
o 372 – Field of Activity (R)
o 373 – Associated Group (R)
o 374 – Occupation (R)
o 375 – Gender (R)
o 377 – Associated Language (R)

Entities and their Attributes
Type of Entity: Corporate Bodies
o 043 – Geographic Area Code (NR)
o 046 – Special Coded Dates (R)
o 368|a – Type of corporate body (R)
o 370 – Associated Place (R)
o 371 – Address (R)
o 372 – Field of Activity (R)
o 377 – Associated Language (R)

Entities and their Attributes
Type of Entity: Events
o 043 – Geographic Area Code (NR)
o 046 – Special Coded Dates (R)
o 368|a – Type of meeting/event (R)
o 370 – Associated Place (R)
o 372 – Field of Activity (R)
o 377 – Associated Language (R)

Entities and their Attributes
Type of Entity: Places
o 034 – Coded Cartographic Mathematical Data (R)
o 043 – Geographic Area Code (NR)
o 045 – Time Period (NR)
o 368|b – Type of jurisdiction (R)

Semi-automatic, automatic and manual method
• Semi-automatic method: populating records with attributes extracted from existing record data, with the help of the open-source tool MarcEdit
• Manual method: populating records with new attributes using the knowledge of experienced catalogers
• Automatic method: mapping attributes from other datasets

Semi-automatic method
Parsing the unstructured bibliographic data into structured data elements. Example: parsing the birth and death dates recorded in MARC 100|d into separate subfields MARC 046|f and 046|g using regular expressions:
• Selecting and exporting headings where 100|d matches the regular expression (\(.*), because dates are recorded in this subfield in parentheses
• Copying all data from field 100|d to field 046 in a new file
• Using the ‘Edit Indicators Utility’ tool to correct the indicators and the ‘Edit Subfield’ tool to delete the surplus subfield ‘a’ in the copied 046 field
• Using the ‘Swap Field Utility’ tool to add subfields ‘f’ and ‘g’ to the 046 field, each holding the same copied data
• Using the ‘Edit Subfield’ tool to edit the data in subfields ‘f’ and ‘g’ by means of a regular expression

Automatic method
Based on mappings of property values. Example: dcterms

MARC 21   dcterms namespace properties
024       dcterms:identifier
034       dcterms:spatial
045       dcterms:temporal
336       dcterms:type
337       dcterms:requires
338       dcterms:medium
385       dcterms:educationLevel, dcterms:audience
388       dcterms:created

Automatic method
Based on mappings of property values. Example: schema.org

MARC 21                         schema.org vocabulary equivalent property
024                             schema:sameAs
034                             schema:geo
046 (for separate subfields)    schema:birthDate, schema:deathDate; schema:foundingDate
368                             schema:AdministrativeArea
370 (for separate subfields)    schema:birthPlace, schema:deathPlace, schema:foundingPlace, schema:location
371                             schema:address
372                             schema:industry
373                             schema:affiliation
374                             schema:jobTitle
375                             schema:gender
376                             schema:relatedTo
377                             schema:inLanguage
385                             schema:audienceType
386                             schema:nationality
388                             schema:dateCreated

Structured record data
“Wytwórnia filmowa” is the Polish-language equivalent of “Film Studio Company”.

Better record data segmentation

More possibilities of data retrieval
• Retrieving selected datasets based on attribute-related criteria, in various formats, by means of the data extraction service data.bn.org.pl
• Using a standard RESTful API
• Queries returning data in standard formats

More possibilities of data retrieval
Sample queries:
• http://data.bn.org.pl/api/authorities.xml?jurisdiction=Morza
• http://data.bn.org.pl/api/authorities.xml?fieldOfActivity=niemiecki
• http://data.bn.org.pl/api/bibs.json?audienceGroup=Dzieci

More possibilities of data retrieval
http://data.bn.org.pl
GUI: a search query interface is available

Populating additional facets in the Faceted Search
New Facets:
• Audience
• Cultural/Ethnic Identity
• Time Period of Creation
• Content Objective
• Form/Genre: Form, Genre

Thank you!
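
Sketch: splitting dates from 100|d into 046|f and 046|g
The semi-automatic method in the slides is carried out in MarcEdit. The same idea, parsing parenthesized birth and death dates with a regular expression, can be illustrated in Python. This is only a sketch, not the workflow from the slides: the function name split_dates, the four-digit year pattern, and the sample values are assumptions made for illustration.

```python
import re

# Assumed shape of 100|d: dates in parentheses, e.g. "(1867-1934)" or "(1950- )".
# Four-digit years and a hyphen separator are assumptions, not taken from the slides.
DATES_RE = re.compile(r"^\((\d{4})?\s*-\s*(\d{4})?\)[.,]?$")


def split_dates(subfield_d):
    """Return a dict of 046 subfields ('f' = birth, 'g' = death) parsed from 100|d."""
    match = DATES_RE.match(subfield_d.strip())
    if match is None:
        # Anything that does not fit the pattern is left for manual review.
        return {}
    birth, death = match.groups()
    subfields = {}
    if birth:
        subfields["f"] = birth
    if death:
        subfields["g"] = death
    return subfields


if __name__ == "__main__":
    for sample in ["(1867-1934)", "(1950- )", "ca 1900"]:
        print(sample, "->", split_dates(sample))
```

Values that do not match the pattern return an empty dict, which mirrors the division of labour in the slides: regular cases are handled in bulk, irregular ones go to catalogers.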
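
Sketch: applying the schema.org mapping
The automatic method rests on a tag-to-property mapping. A minimal Python sketch of how such a table could be applied to a record follows. It covers only the rows of the schema.org table that map a tag to a single property; 046 and 370 are left out because they map per subfield and the slides do not spell out that detail. The function name map_fields, the (tag, value) record shape, and the sample values are assumptions.

```python
# Single-property rows of the schema.org table from the slides.
# Tags 046 and 370 are omitted: they map per subfield, and the
# subfield-level rules are not given in the slides.
MARC_TO_SCHEMA = {
    "024": "schema:sameAs",
    "034": "schema:geo",
    "368": "schema:AdministrativeArea",
    "371": "schema:address",
    "372": "schema:industry",
    "373": "schema:affiliation",
    "374": "schema:jobTitle",
    "375": "schema:gender",
    "376": "schema:relatedTo",
    "377": "schema:inLanguage",
    "385": "schema:audienceType",
    "386": "schema:nationality",
    "388": "schema:dateCreated",
}


def map_fields(record):
    """Map a list of (tag, value) pairs to {schema.org property: [values]}.

    The (tag, value) record shape is a simplification for illustration;
    real MARC fields carry indicators and subfields.
    """
    mapped = {}
    for tag, value in record:
        prop = MARC_TO_SCHEMA.get(tag)
        if prop is not None:  # unmapped tags are skipped
            mapped.setdefault(prop, []).append(value)
    return mapped


if __name__ == "__main__":
    # Hypothetical sample values.
    sample = [("100", "Example, Name"), ("374", "Pisarz"), ("374", "Tłumacz")]
    print(map_fields(sample))
```

Values are collected in lists because the mapped fields are repeatable (R).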
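
Sketch: building data.bn.org.pl queries
The sample queries follow one pattern: a resource (authorities, bibs), a format extension (xml, json), and one attribute criterion. A small Python helper can assemble such URLs with proper encoding. Only the resource names, formats, and parameters that appear in the sample queries come from the slides; the helper build_query is hypothetical, and whether the service accepts several criteria at once is not stated.

```python
from urllib.parse import urlencode

BASE_URL = "http://data.bn.org.pl/api"


def build_query(resource, fmt, **criteria):
    """Build a query URL such as .../authorities.xml?jurisdiction=Morza.

    resource and fmt are taken from the sample queries ('authorities' or
    'bibs', 'xml' or 'json'); criteria are attribute-based search parameters.
    """
    return f"{BASE_URL}/{resource}.{fmt}?{urlencode(criteria)}"


if __name__ == "__main__":
    print(build_query("authorities", "xml", jurisdiction="Morza"))
    print(build_query("bibs", "json", audienceGroup="Dzieci"))
```

urlencode percent-encodes spaces and Polish diacritics, so criteria values can be passed as plain strings.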