Finding similar images in a photo collection using IPTC

Martijn Kleppe [email protected]
Erasmus University Rotterdam
Recent works of historiography, both popular and scholarly, increasingly rely on similar-looking
photographs of (mostly symbolic) events to tell a story about important events or structural trends.
However, finding similar images in a large set of books or photographs can be a challenge, given the
high level of subjectivity in interpreting photographs (Finnegan 2006; Rose 2007) and the lack of
standardized thesauri for describing photographs (Kleppe 2012). This poster presents an approach to
finding the most frequently recurring images in a set of 5,000 photographs by recording information
about the content of each photo in IPTC fields that are embedded within the digital file of the
photograph.
The International Press Telecommunications Council (IPTC) develops technical standards for news
organisations.1 By applying the same standards, a photographer can embed information about a photo
inside the file and send it to an editor at a newspaper, who can load that information into the local
ICT infrastructure. This technique facilitates the exchange of files between journalists, and cultural
heritage institutions can likewise use IPTC to include information about their objects in the digital
files (Grijsen 2012; Reser 2012).
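To make the idea of embedded fields concrete, the following is a minimal sketch of how the IPTC Information Interchange Model (IIM) stores a field as a small tagged record: a 0x1C marker, a record number, a dataset number, a two-byte length and the value. It is an illustration under stated assumptions and not the procedure used in this research: the function names are hypothetical, the text is assumed to be UTF-8, extended-length datasets are not handled, and the wrapper that carries such a block inside an image file is left out.

```python
import struct

# Dataset numbers in IIM record 2 (the "application record"); a small subset.
KEYWORDS = 25
CAPTION = 120


def encode_dataset(dataset: int, value: str, record: int = 2) -> bytes:
    """Encode one standard IIM dataset: marker, record, dataset, length, data."""
    data = value.encode("utf-8")  # assumption: UTF-8 text
    if len(data) > 0x7FFF:
        raise ValueError("a standard dataset holds at most 32767 bytes")
    return struct.pack(">BBBH", 0x1C, record, dataset, len(data)) + data


def decode_datasets(block: bytes) -> dict:
    """Decode consecutive standard datasets into {(record, dataset): [values]}."""
    fields = {}
    pos = 0
    while pos + 5 <= len(block) and block[pos] == 0x1C:
        _, record, dataset, length = struct.unpack_from(">BBBH", block, pos)
        pos += 5
        value = block[pos:pos + length].decode("utf-8", errors="replace")
        # Repeatable fields such as keywords occur once per value.
        fields.setdefault((record, dataset), []).append(value)
        pos += length
    return fields


# Hypothetical description of one photograph.
block = (
    encode_dataset(CAPTION, "Troelstra gives a speech on universal suffrage")
    + encode_dataset(KEYWORDS, "Troelstra")
    + encode_dataset(KEYWORDS, "universal suffrage")
)
fields = decode_datasets(block)
```

Because the description travels as bytes inside the file itself, any program that understands the same record layout can read it back without access to the original database.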
For our research on the recurring use of photographs in history textbooks (Kleppe 2013a) we
analyzed over 5,000 photographs that were included in Dutch history textbooks published in the
period 1970 to 2000. All photos were digitized and analyzed by assigning 41 variables (such as topic,
person and year). We used the software program Fotostation Pro to store the values of each variable
in the IPTC fields.2 This allowed us not only to run full-text searches through all assigned values,
but also to share our research data with other researchers, who can 'read' the information in the
IPTC fields in their own photo-editing and viewing software. Moreover, we were able to export all
values to CSV files that can be imported into statistical software such as SPSS.
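As an illustration of this workflow, the sketch below runs a full-text search over coded values and writes them to CSV using only the Python standard library. It does not reproduce Fotostation Pro's export; the column names, the three sample rows and the helper names `to_csv` and `full_text_search` are hypothetical.

```python
import csv
import io

# Hypothetical rows: one per photo use, one column per coded variable.
records = [
    {"photo_id": "p001", "textbook": "Book A", "topic": "universal suffrage",
     "person": "Troelstra", "year": "1912"},
    {"photo_id": "p001", "textbook": "Book B", "topic": "universal suffrage",
     "person": "Troelstra", "year": "1912"},
    {"photo_id": "p002", "textbook": "Book B", "topic": "industrialisation",
     "person": "Person X", "year": "1900"},
]


def to_csv(rows):
    """Serialise the coded values as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def full_text_search(rows, term):
    """Return every row in which any value contains the term, ignoring case."""
    needle = term.casefold()
    return [
        row for row in rows
        if any(needle in str(value).casefold() for value in row.values())
    ]


csv_text = to_csv(records)
hits = full_text_search(records, "troelstra")
```

The resulting text can be saved to a file and opened in a spreadsheet or statistics package, since CSV is a plain tabular format.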
By making frequency tables we calculated which topics were most common in the set of photographs
and then manually went through these topics to find the images that were used most often. The results
show that a 1912 photograph of the socialist politician Pieter Jelles Troelstra is used most often in
the analyzed textbooks. In the photo Troelstra gives a speech in which he pleads for universal suffrage.
Since our database records how this particular photo was used in each textbook, we could then return
to the database and examine the context in which the photo appeared. We found that in one-third of all
history textbooks this photo is incorrectly dated, since it is used to illustrate Troelstra's failed
attempt to start a revolution in 1918. This outcome gives grounds for several follow-up
historiographical research questions focusing on both the afterlife of photographs (Kroes 2007) and
the selection processes of historical gatekeepers (Kleppe 2013a).
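The two steps described above, a frequency table followed by a look at how the most frequent photo is used, can be sketched as follows. This is a toy illustration and not the analysis itself: the rows are invented, the fields `photo_id`, `photo_year` and `context_year` are hypothetical, and counting by `photo_id` stands in for the manual comparison of images.

```python
from collections import Counter

# Invented rows: one per use of a photo in a textbook.
uses = [
    {"textbook": "Book A", "photo_id": "p001", "topic": "socialism",
     "photo_year": 1912, "context_year": 1918},
    {"textbook": "Book B", "photo_id": "p001", "topic": "socialism",
     "photo_year": 1912, "context_year": 1912},
    {"textbook": "Book B", "photo_id": "p002", "topic": "socialism",
     "photo_year": 1903, "context_year": 1903},
    {"textbook": "Book C", "photo_id": "p003", "topic": "industrialisation",
     "photo_year": 1900, "context_year": 1900},
]

# Step 1: frequency table of topics.
topic_counts = Counter(use["topic"] for use in uses)
top_topic, _ = topic_counts.most_common(1)[0]

# Step 2: within the most frequent topic, find the most used photo.
photo_counts = Counter(
    use["photo_id"] for use in uses if use["topic"] == top_topic
)
top_photo, top_photo_uses = photo_counts.most_common(1)[0]

# Step 3: return to the individual uses and compare the year the photo was
# taken with the year of the event it illustrates.
contexts = [use for use in uses if use["photo_id"] == top_photo]
books_using = {use["textbook"] for use in contexts}
books_misdating = {
    use["textbook"] for use in contexts
    if use["context_year"] != use["photo_year"]
}
share_misdated = len(books_misdating) / len(books_using)
```

Keeping one row per use, rather than one row per photo, is what makes the third step possible: the context of every appearance stays available after the counting.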
Even though our database is relatively small, the case study of the Troelstra photo shows that by
adding data to the IPTC fields we were able to quickly track down all the textbooks in which the photo
is used and determine the context in which it is used. Studying this afterlife can be taken a step
further when databases that follow the same approach are linked, e.g. collections that are
described with the ICONCLASS system (Brandhorst 2012) or the GTAA (Oomen 2010).
1 http://www.iptc.org
2 http://www.fotoware.com/en/Products/FotoStation/
Therefore we have also made our database available to future researchers (Kleppe 2013b). All images
and metadata can be downloaded by humanities scholars as well as by computational researchers
who want to further explore the possibilities of data enriched with IPTC information or use the
images to train image recognition software.
The photo of Pieter Jelles Troelstra (top) and a screenshot of a menu in Fotostation Pro through which
the information about the photo is included in the IPTC fields of the file.
Literature
Brandhorst, H. (2012). The Iconography of the Pleasures and Problems of Drink: Thoughts on the
Opportunities and Challenges for Access and Collaboration in the Digital Age. Visual
Resources, 28(4), 384-390.
Finnegan, C.A. (2006). What is this a picture of? Some Thoughts on Images and Archives. Rhetoric &
Public Affairs, 116-123.
Grijsen, C. (2012). In perspectief: behoud en beheer van born-digital fotoarchieven. Fotografisch
Geheugen 75 (2012) 24 - 26.
Kleppe, M. (2013a). Canonieke Icoonfoto's. De rol van (pers)foto's in de Nederlandse
geschiedschrijving (Delft).
Kleppe, M. (2013b), Foto’s in Nederlandse Geschiedenisschoolboeken (FiNGS)
http://www.persistent-identifier.nl/?identifier=urn:nbn:nl:ui:13-l37n-bi.
Kleppe, M. (2012). Wat is het onderwerp op een foto? De kansen en problemen bij het opzetten van
een eigen fotodatabase, Tijdschrift voor Mediageschiedenis, 93 – 107.
Kroes, R. (2007). Photographic memories: Private pictures, public images, and American history.
Dartmouth College.
Oomen, J., & Brugman, H. (2010). Thesauri gekoppeld. Digitale Bibliotheek, 2(5), 18-21.
Reser, G., & Bauman, J. (2012). The Past, Present, and Future of Embedded Metadata for the
Long-Term Maintenance of and Access to Digital Image Files. International Journal of Digital Library
Systems (IJDLS), 3(1), 53-64.
Rose, G. (2007). Visual Methodologies – An Introduction to the interpretation of Visual Materials
(London).