Advantages of Application of Data Enrichment Methods for the Art

Advantages of Application of Data Enrichment Methods for the Art Market: An Empirical Study
Dominik Filipiak · Agata Filipowska
Abstract Research carried out on art market indices is mostly focused on methods stemming from econometrics. Data availability and quality seem to play an important role in all conducted studies. An intense effort to digitalise information about artworks constitutes a chance to take advantage of rich data sources that may influence this research. We propose a method for building an accurate description of the art market, including indices, using state-of-the-art achievements of data enrichment, data fusion and data science. This extends the amount of data available for further analysis. Such high-quality data can be applied, inter alia, to building high-precision art market indices, especially those that use hedonic regression or probit models. The output of the proposed approach may help to spot new art market trends, especially because of the scope of the art market description. Although the research is focused on the Polish art market, the method itself can be applied to any country. The paper presents both the method and the initial application results for the Polish art market.
Keywords Art market · Hedonic indices · Data enrichment
JEL classification Z11 · C80
D. Filipiak
Department of Information Systems
Poznań University of Economics and Business
E-mail: [email protected]
A. Filipowska
Department of Information Systems
Poznań University of Economics and Business
E-mail: [email protected]
Dominik Filipiak, Agata Filipowska
1 Introduction
Treating artworks as an investment in periods of prosperity has a long history. However, research aimed at precisely measuring the art market has only been conducted for the past several decades. These efforts include describing trends, outlining the shape of the market and appraisal. Creating reliable indices that allow artworks to be compared with other forms of investment (alternative as well as more traditional) plays an especially important role in this research.
Data availability and quality seem to play an important role in all conducted studies. An intense effort to digitalise information about artworks constitutes a chance to take advantage of rich data sources that may influence this research. We propose a method for building an accurate description of the art market, including indices, using state-of-the-art achievements of data enrichment, data fusion and data science. This extends the amount of data available for further analysis.
Such high-quality data can be applied, inter alia, to building high-precision art market indices, especially those that use hedonic regression or probit models. The output of the proposed approach may help to spot new art market trends, especially because of the scope of the art market description.
Although the research is focused on the Polish art market, the method itself can be applied to any country. This paper presents both the method and the initial results of its application to the Polish art market.
This paper is organised as follows. Section 2 reviews work related to the subject matter. Section 3 describes the proposed method. Section 4 presents the dataset and the experimental results. Section 5 concludes with a brief summary and a view of future research.
2 Related Work
Efforts to measure the art market focus on creating various types of indices.
Their purpose is to [5]:
– outline general art market trends and provide a way to compare investment in art with other forms of investment, such as stocks and bonds,
– measure art market volatility in comparison with other types of investment,
– investigate how different social and economic incentives influence the art market,
– appraise the overall value of artworks.
Two main approaches to building art market indices can be distinguished, based on how the indices are created. The first group comprises repeated-sales regression (RSR). Intuitively, it relies on artworks sold at least twice and on the ratio between their two sale prices. Though this approach may seem rational, it suffers from a lack of data: art is a long-term investment, so it is hard to collect enough observations to build a reliable index. A well-known example of this kind is the Mei & Moses index, which is based on repeated sales at Christie’s and Sotheby’s [10].
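The repeat-sales idea can be illustrated with a minimal, unweighted sketch in Python (NumPy). It assumes each pair holds the periods and prices of two sales of the same artwork (`t_buy`, `t_sell`, `p_buy`, `p_sell` are hypothetical names) and regresses the log price ratio on period dummies that are +1 for the resale period and -1 for the purchase period, with period 0 fixed as the base. This is the textbook form of the technique, not the exact methodology behind the Mei & Moses index.

```python
import numpy as np


def repeat_sales_index(pairs, n_periods):
    """Unweighted repeat-sales regression.

    pairs: iterable of (t_buy, t_sell, p_buy, p_sell) for artworks sold twice,
           with periods numbered 0 .. n_periods - 1 (period 0 is the base).
    Returns the index level for every period, with the base period equal to 1.
    """
    pairs = list(pairs)
    X = np.zeros((len(pairs), n_periods - 1))
    y = np.empty(len(pairs))
    for row, (t_buy, t_sell, p_buy, p_sell) in enumerate(pairs):
        # Dependent variable: log of the ratio of the second price to the first
        y[row] = np.log(p_sell / p_buy)
        # +1 for the period of the second sale, -1 for the period of the first sale;
        # the base period has no column, so its log-index is fixed at 0
        if t_sell > 0:
            X[row, t_sell - 1] += 1.0
        if t_buy > 0:
            X[row, t_buy - 1] -= 1.0
    log_index, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.exp(np.concatenate(([0.0], log_index)))


# Three hypothetical artworks, each sold twice, priced along an index of 1.0, 1.1, 1.3
pairs = [(0, 1, 100.0, 110.0), (1, 2, 200.0, 200.0 * 1.3 / 1.1), (0, 2, 50.0, 65.0)]
print(repeat_sales_index(pairs, n_periods=3))  # approximately [1.0, 1.1, 1.3]
```

Only artworks observed twice contribute a row, which is exactly why this approach struggles with the scarcity of repeated sales.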
The second approach employs so-called hedonic regression, which is a form of linear regression. Usually, it regresses the natural logarithm of the hammer price of a given artwork on explanatory variables that measure the lot’s features, together with the period of sale. These variables may be numerical (like the size of a painting) or dummy (like the medium used or the year of sale). Dummies are equal to 0 or 1 depending on the presence of a given attribute. This approach is widely used in art market analysis [12, 9] because it can take all sold lots into account. An example of an index built on hedonic regression is the German Art All 2-step hedonic index [7].
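How a single lot becomes a row of numerical and dummy variables can be sketched as follows. The field names (`height_cm`, `width_cm`, `medium`) and the use of canvas area as the size variable are hypothetical. One medium is left out as the reference category, because in a model with an intercept a full set of dummies would be perfectly collinear.

```python
def encode_lot(lot, media, reference_medium):
    """Turn one auction lot (a dict) into hedonic variables: a numeric size and medium dummies."""
    row = {"area_cm2": lot["height_cm"] * lot["width_cm"]}  # numerical variable
    for medium in media:
        if medium == reference_medium:
            continue  # omitted category: its effect is absorbed by the intercept
        # Dummy variable: 1 if the lot has this attribute, 0 otherwise
        row[f"medium_{medium}"] = 1 if lot["medium"] == medium else 0
    return row


lot = {"height_cm": 50, "width_cm": 40, "medium": "gouache"}
print(encode_lot(lot, ["oil", "watercolour", "gouache"], reference_medium="oil"))
# -> {'area_cm2': 2000, 'medium_watercolour': 0, 'medium_gouache': 1}
```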
Collins et al. [3] address the problem of selection bias and time instability in the index using the Heckman two-stage procedure. Jones and Zanola [6] argue that using a logarithm in hedonic regression (HR) yields indices that are hard to interpret economically. Bocart and Hafner [2] suggested a heteroskedastic HR model with a non-parametric local likelihood estimator.
The concept of enriching data with semantic information is not entirely new [1]. Owing to its wide range of possible applications, data enrichment has been used in various domains. For example, van der Waal et al. applied it to government open data [13]. Paulheim et al. investigate a more general approach to adding background knowledge in data mining [11]. They outline linking as the first step in combining original information with ancillary data sources. Based on the tool presented in their paper, the following types of linking can be distinguished:
– pattern-based linking,
– label-based linking,
– lookup linking,
– SameAs linking.
Though widely described in the literature, methods based on hedonic regression remain sensitive to selection bias and to the quality and size of the dataset under consideration. With limited data about the art market, it is crucial to ensure that the research is conducted on the best available set of observations and that all possible variables are taken into account. This is where semantic enrichment, the core of the presented method, comes in.
3 Method
The central issues of art market research remain the same: index construction and comparison with other asset classes have been studied many times. Our solution offers a novel approach to these old problems. The presented method for measuring the art market consists of the following steps:
1. data collection,
2. data linking,
3. data enrichment,
4. index building.
The first step is the basis of all art market research. Keeping in mind the so-called garbage in, garbage out rule, data collection must be performed carefully and precisely, especially since further steps include data linking. Therefore, sufficient time needs to be spent on data cleansing and preparation. Data enrichment is our main contribution to art market research. To the best of our knowledge, Linked Open Data (and DBpedia in particular) has not been used in this field of study before. Finally, the index building step demonstrates the usefulness of the previous steps: thanks to data enrichment, the resulting indices should be more accurate and describe the art market more precisely.
DBpedia, tightly coupled with Wikipedia, provides access to information about numerous extracted concepts, especially from infoboxes, which are available in an easy-to-parse form. By structuring a vast amount of crowd-sourced data, it offers a SPARQL API that facilitates querying sophisticated structures [8].
Wikipedia (and therefore DBpedia) is not regarded as a suitable place for publishing art market data. Only brief information about auction houses can be found there: apart from art world giants such as Christie’s or Sotheby’s, it rarely goes beyond simple descriptions and historical context. For local institutions, such as those on the Polish art market, very little information is available.
Art in general, however, is described well. Rich information about artists and particular artworks is available in the English DBpedia. Described topics may belong to subject categories organised by genre, medium, nationality or period in the art category hierarchy. This makes it possible to use the accumulated knowledge in an automated way. As the following sections explain, this knowledge may constitute an input for equations describing the art market.
The data collection process in this case consists of several steps:
1. source identification,
2. crawler preparation,
3. crawling,
4. post-processing.
Since auction houses publish sales results on the Internet, one can gain access to historical information. This helps to reduce information asymmetry (between professional traders and newcomers to the field) and, as a consequence, to popularise the art market. However, to find data about, for example, Picasso’s paintings sold at auction, one has to browse numerous websites. To bridge this gap, companies such as artnet1 or Artprice2 try to collect all data published on auction houses’ pages. However, such services are often paid, are still not easily machine-processable and, finally, cannot be used in research due to legal issues.
1 http://artnet.com
2 http://artprice.com
Therefore, to perform experiments, researchers have to gather the data themselves. To overcome this problem, crawlers may be used. A crawler (sometimes called a spider) is a program written to systematically visit given Web pages and collect selected information. Popular examples of software or libraries supporting data collection are Apache Nutch3 and Scrapy4.
In the conducted research, special attention is given to the Polish art market. Therefore, the websites of the four biggest Polish auction houses (Desa Unicum, Rempex, Agra-Art and Polswiss Art) are considered the primary sources of data. Fortunately, auction houses often publish information in a systematised, crawling-friendly way. Writing a crawler and preparing a set of XPath rules to extract data is engineering work and is beyond the scope of this paper.
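Although crawler engineering is out of scope, the idea of per-site extraction rules can be shown with a small sketch. It uses the limited XPath subset supported by Python’s `xml.etree.ElementTree` on a hypothetical, well-formed page fragment; the markup, class names and rule set are invented for illustration. Real auction pages are often not well-formed XML and would normally need an HTML-tolerant parser with full XPath support, such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment of an auction results page
PAGE = """<html><body>
<div class="lot"><span class="artist">Wojciech Kossak</span><span class="title">Landscape</span><span class="price">12 500 zł</span></div>
<div class="lot"><span class="artist">NN</span><span class="title">Still life</span><span class="price"></span></div>
</body></html>"""

# Extraction rules for this (hypothetical) site: field name -> ElementTree path
RULES = {
    "artist": "./span[@class='artist']",
    "title": "./span[@class='title']",
    "price": "./span[@class='price']",
}


def extract_lots(page, rules):
    """Apply the per-site rules to every lot element and return one dict per lot."""
    root = ET.fromstring(page)
    lots = []
    for lot_el in root.iterfind(".//div[@class='lot']"):
        lot = {}
        for field, path in rules.items():
            el = lot_el.find(path)
            # Missing elements or empty text become None, to be handled in post-processing
            lot[field] = el.text.strip() if el is not None and el.text else None
        lots.append(lot)
    return lots


for lot in extract_lots(PAGE, RULES):
    print(lot)
```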
As a result of this step, crawled lots are stored in a structured file and can be treated as observations, statistically speaking. The results often need refinement, however: missing data, encoding problems and typos are common issues, to name a few. Some auction houses do not publish hammer prices directly on their pages but in the form of PDF files. Therefore, data extraction involves not only crawling but also processing PDF files, which may be done using software such as Apache Tika5.
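Much of this post-processing is string work. The sketch below parses hammer prices written in a Polish style, such as `12 500 zł` or `12.500,00 PLN`, into numbers. The accepted formats are assumptions: in particular, dots are treated as thousands separators and commas as decimal separators. Values that cannot be parsed (e.g. an unsold lot marked with a dash) become `None`, i.e. missing data.

```python
import re


def parse_price(raw):
    """Parse a Polish-formatted hammer price string into a float, or None if impossible."""
    if raw is None:
        return None
    # Drop the currency label (case-insensitive) and every kind of whitespace,
    # including non-breaking spaces often used as thousands separators
    text = re.sub(r"(?i)zł|pln", "", raw)
    text = re.sub(r"\s", "", text)
    # Assumed convention: '.' separates thousands, ',' separates decimals
    text = text.replace(".", "").replace(",", ".")
    if not re.fullmatch(r"\d+(\.\d+)?", text):
        return None  # e.g. '-', 'unsold' or an empty cell
    return float(text)


print(parse_price("12 500 zł"), parse_price("12.500,00 PLN"), parse_price("-"))
# -> 12500.0 12500.0 None
```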
To make use of DBpedia in the presented research, data linking has to be performed. For auction lots, it can be done in two dimensions: artwork title and artist name. This process helps to establish a link between the raw data obtained by crawling auction house websites and DBpedia. Through linking, data enrichment can be achieved.
It can be assumed that if a given artwork has its own Wikipedia page, it is widely recognised to some degree, and that this fact influences its hammer price. On the Polish art market, it is quite rare for a lot to have its own page. The situation is quite different for artists’ pages: much information can be extracted, since many creators of lots have been described in detail (for example Andy Warhol6).
Data linking essentially relies on translating artists’ names into the URIs of the relevant DBpedia resources (pattern-based linking, according to the classification provided by Paulheim et al.). Fortunately, an artist’s URI is commonly constructed by concatenating the first name and the surname with an underscore in between. For example, for the famous Polish painter Wojciech Kossak, the resource URI is constructed as presented in the listing.
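Pattern-based linking of this kind reduces to string concatenation. The sketch below assumes the Polish DBpedia resource namespace seen in the result tables (`http://pl.dbpedia.org/resource/`) and Wikipedia’s convention of writing spaces in page titles as underscores; `artist_uri` is a hypothetical helper name.

```python
# Resource namespace of the Polish DBpedia, as it appears in the result tables
PL_DBPEDIA_RESOURCE = "http://pl.dbpedia.org/resource/"


def artist_uri(first_name, surname):
    """Pattern-based linking: 'First Surname' -> <namespace>First_Surname."""
    label = " ".join([first_name.strip(), surname.strip()])
    # Wikipedia page titles (and hence DBpedia resource names) use underscores for spaces
    return PL_DBPEDIA_RESOURCE + label.replace(" ", "_")


print(artist_uri("Wojciech", "Kossak"))
# -> http://pl.dbpedia.org/resource/Wojciech_Kossak
```

Articles whose titles carry a disambiguation qualifier in parentheses will not be reached by this pattern, so a candidate URI is best checked against DBpedia before it is used.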
However, the collected data about artwork creators can contain mistakes (e.g. a swapped first name and surname) and typos. Moreover, data from different auction houses may simply be incompatible: for example, one auction house publishes the artist’s name and surname in capital letters, whereas another stores them together with the artist’s date of birth. Regular expressions can cope with simple and repeatable data transformations, but they are not enough. Therefore, fuzzy string matching algorithms are employed.

3 http://nutch.apache.org
4 http://scrapy.org
5 https://tika.apache.org
6 https://en.wikipedia.org/wiki/Andy_Warhol
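The two incompatibilities mentioned above (capitalised names and attached dates) can be handled by simple regular expressions. The sketch assumes, hypothetically, that life dates appear either in parentheses or as a trailing comma-separated year or year range; it removes them and converts all-caps words to title case. It deliberately does not try to fix swapped names or typos, which is where fuzzy matching comes in.

```python
import re


def normalise_artist(raw):
    """Normalise a scraped artist string (the input formats are hypothetical examples)."""
    # Remove life dates given in parentheses, e.g. "(1856-1942)" or "(ur. 1950)"
    text = re.sub(r"\([^)]*\d{4}[^)]*\)", " ", raw)
    # Remove a trailing year or year range, e.g. ", 1950" or " 1856 - 1942"
    text = re.sub(r",?\s*\d{4}(\s*-\s*\d{4})?\s*$", "", text)
    # Collapse whitespace and turn ALL-CAPS words into title case
    words = [w.title() if w.isupper() else w for w in text.split()]
    return " ".join(words)


print(normalise_artist("WOJCIECH KOSSAK"))              # -> Wojciech Kossak
print(normalise_artist("Wojciech Kossak (1856-1942)"))  # -> Wojciech Kossak
print(normalise_artist("KOSSAK Wojciech, 1856"))        # -> Kossak Wojciech (order is not fixed)
```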
Unfortunately, fuzzy string matching is not an ultimate solution to all problems related to typos. A clustering algorithm based on Levenshtein distance may deal very well with typos in artists’ names. However, an unknown author is not a rare event, and information like "XIX century" is sometimes presented instead of the expected name and surname. For a matching algorithm based on Levenshtein distance, there is barely any difference between "XX century" and "XIX century". On the other hand, more sophisticated solutions may fail to handle the simplest cases. Therefore, no single conventional solution can be relied upon.
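Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A standard dynamic-programming implementation, shown below as a self-contained sketch, makes both sides of the problem described above visible: a typo in a surname and two different centuries are each exactly one edit apart.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Wojciech Kosak", "Wojciech Kossak"))  # -> 1 (a genuine typo)
print(levenshtein("XX century", "XIX century"))          # -> 1 (two different centuries)
```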
OpenRefine7, a free data cleansing tool, is used in this research to maintain and pre-process the data collected by crawlers. It comes with a set of clustering algorithms to group and pinpoint similar entries in the collected data. Not to be confused with the broad definition of clustering, in this particular application clustering is understood as finding groups of different values that might be alternative representations of the same thing. The provided clustering methods include key collision methods (such as n-gram fingerprint) and k-nearest-neighbour methods (based on Levenshtein distance and PPM8). As a result, a user may select which rows present the same (semantic) information and choose a common lexical form to unify them.
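Key collision methods group values that map to the same key. The sketch below is a simplified fingerprint key in the spirit of OpenRefine’s fingerprint method (it does not reproduce every step of the tool): lowercase, strip accents and punctuation, then sort the unique tokens, so that a swapped first name and surname collide. Note that Unicode NFKD decomposition does not remove the stroke from `ł`, so that letter is kept as-is.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key-collision key: lowercase, no accents or punctuation, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a token separator
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys shared by at least two distinct spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


print(cluster(["Wojciech Kossak", "KOSSAK, Wojciech", "XX century", "XIX century"]))
# -> {'kossak wojciech': ['Wojciech Kossak', 'KOSSAK, Wojciech']}
```

Unlike Levenshtein distance, the fingerprint keeps "XX century" and "XIX century" apart (their keys differ), but it would not catch a misspelling such as "Kosak", which illustrates why neither method is sufficient on its own.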
This solution requires user interaction and minimal knowledge of art history; since the process cannot be fully automated, OpenRefine provides a convenient set of tools to overcome this problem. Linked data paves the way to enriching art market indices by providing more detailed information. With regard to this step, future work may consider rule-based fuzzy string matching, which would select the best-suited algorithm for specific content and perform the transformation on the fly.
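One possible shape for such a rule-based matcher is sketched below. The rule set, the 0.9 threshold and the use of `difflib.SequenceMatcher` from Python’s standard library as the similarity measure are all assumptions, not part of the presented method. A first rule recognises century placeholders (e.g. `XIX century`, or the Polish abbreviation `XIX w.`) and refuses to match them to any artist; everything else falls through to fuzzy matching against a list of known names.

```python
import re
from difflib import SequenceMatcher

# Rule 1 (hypothetical): an unknown author given as a century, e.g. "XIX century" or "XIX w."
CENTURY = re.compile(r"^[IVXLC]+\s*(century|wiek|w\.?)$", re.IGNORECASE)


def best_match(raw, known_names, threshold=0.9):
    """Return the best-matching known artist name, or None if no rule allows a match."""
    value = " ".join(raw.split())
    if not value or not known_names or CENTURY.match(value):
        return None  # never fuzzy-match a century placeholder to a person
    # Rule 2: fuzzy matching on case-folded strings
    score, name = max(
        (SequenceMatcher(None, value.casefold(), known.casefold()).ratio(), known)
        for known in known_names
    )
    return name if score >= threshold else None


known = ["Wojciech Kossak", "Jacek Malczewski"]
print(best_match("Wojciech Kosak", known))  # -> Wojciech Kossak
print(best_match("XIX century", known))     # -> None
```

Without the first rule, "XIX century" and "XX century" would score well above the threshold against each other, which is exactly the failure described earlier.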
A set of newly obtained explanatory variables (such as the mentioned style or important works) can be employed in any regression model that uses lots’ qualities as explanatory variables, for example in the following very basic hedonic model, estimated by the Ordinary Least Squares (OLS) method:
$$\ln P_{it} = \alpha + \sum_{j=1}^{z} \beta_j X_{ij} + \sum_{t=0}^{\tau} \gamma_t D_{it} + \varepsilon_{it} \qquad (1)$$
where $\ln P_{it}$ is the natural logarithm of the price of a given painting $i \in \{1, 2, \dots, N\}$ at time $t \in \{1, 2, \dots, \tau\}$; $\alpha$, $\beta_j$ and $\gamma_t$ are regression coefficients. $X_{ij}$ represents the hedonic variables included in the model, $D_{it}$ stands for time dummy variables and $\varepsilon_{it}$ is the error term.
The set of hedonic variables includes the artist’s name, the painting’s size, the year of creation and all other information obtained in the data collection and enrichment processes to describe a given painting. Some of the variables are numeric (such as size or price), while others are so-called dummy variables, equal to 0 or 1 (for example, denoting an artist’s affiliation).

7 https://github.com/OpenRefine/OpenRefine
8 https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth

Given a large number of well-described observations, the next step is building an index. A simple art market index is constructed according to the following equation:
$$\mathrm{Index}_t = e^{\gamma_t} \qquad (2)$$
With an enriched set of $X_{ij}$, the estimated $\gamma_t$ is expected to be more accurate and, as a consequence, the resulting index for period $t$ more precise. The use of an ancillary data source can also help to discover new statistically significant concepts related to artworks. Owing to its simplicity, this model clearly illustrates how the presented method can be applied. As mentioned, enriched information is suitable for any method that takes a lot’s features into account.
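A minimal OLS sketch of equations (1) and (2) in Python (NumPy) follows. It is an illustration under stated assumptions, not the authors’ estimation code: the hedonic variables are already numeric (dummies encoded as 0/1), and the time dummy of base period 0 is dropped so that it is not collinear with the intercept, which fixes gamma_0 = 0 and Index_0 = 1. The check at the end uses simulated data only.

```python
import numpy as np


def hedonic_index(log_prices, X, periods, n_periods):
    """Fit ln P = alpha + X beta + sum_t gamma_t D_t + eps by OLS; return Index_t = exp(gamma_t).

    log_prices: (n,) natural logs of hammer prices
    X:          (n, z) numeric hedonic variables (dummies already encoded as 0/1)
    periods:    (n,) integer sale period of each lot, 0 .. n_periods - 1
    """
    log_prices = np.asarray(log_prices, dtype=float)
    X = np.asarray(X, dtype=float)
    n, z = X.shape
    # Time dummies D_t for t = 1 .. n_periods - 1; period 0 is the omitted base period
    D = np.zeros((n, n_periods - 1))
    for i, t in enumerate(periods):
        if t > 0:
            D[i, t - 1] = 1.0
    design = np.column_stack([np.ones(n), X, D])  # [alpha | beta_j | gamma_t]
    coef, *_ = np.linalg.lstsq(design, log_prices, rcond=None)
    gamma = np.concatenate(([0.0], coef[1 + z:]))
    return np.exp(gamma)  # equation (2)


# Synthetic check: recover a known index from simulated lots
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))         # two simulated hedonic variables
periods = np.arange(n) % 3          # three sale periods, 100 lots each
true_gamma = np.array([0.0, 0.10, 0.25])
log_p = 1.0 + X @ np.array([0.5, -0.2]) + true_gamma[periods] + rng.normal(scale=0.01, size=n)
print(hedonic_index(log_p, X, periods, n_periods=3))  # close to exp(true_gamma)
```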
4 Evaluation
In the experiment, we used data from the four biggest Polish auction houses: Desa Unicum, Agra Art, Polswiss Art and Rempex. Data cleansing tools were applied as described in the previous section. Table 1 shows some basic characteristics of the dataset. An observation is a single lot offered at an auction, whether sold or not. A quick look at the data reveals some disproportions in the number of unique authors. Desa Unicum has about twice as many artists as Agra Art (5691 vs. 2792), despite a very similar number of observations. This may be because Desa Unicum is known for regularly promoting young artists. It is also worth mentioning that the data cover different years for different auction houses. Since this article focuses on improving the quality of the data, this is not an obstacle. For index building, however, the data should be well balanced with regard to years of sale.
Table 1 Dataset characteristics
Auction House    Observations    Unique authors
Desa Unicum             25837              5691
Agra Art                23746              2792
Polswiss Art             2599              1584
Rempex                  14758              3846
Birth and death dates, style and nationality are among the most commonly used hedonic variables in the art market domain. Therefore, we examined artists in terms of the presence of four RDF properties:
– dbpedia-owl:birthDate for artists’ birth dates,
– dbpedia-owl:deathDate for artists’ death dates,
– prop-pl:styl for artists’ styles,
– prop-pl:narodowość for artists’ nationalities.
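Retrieving the four properties above for one linked artist can be sketched as a single SPARQL query, written here in Python with only the standard library. The endpoint URL and the namespaces bound to `dbpedia-owl:` and `prop-pl:` are assumptions (the paper names the properties but not the endpoint or the prefix declarations). Each property is wrapped in `OPTIONAL`, so an artist with, say, a birth date but no style still yields a result. The network call is isolated in `fetch`, so the parsing step can be exercised on a made-up response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: SPARQL endpoint of the Polish DBpedia chapter (availability not guaranteed)
ENDPOINT = "http://pl.dbpedia.org/sparql"

# Assumed namespace bindings for the prefixes used in the paper
QUERY_TEMPLATE = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX prop-pl: <http://pl.dbpedia.org/property/>
SELECT ?birthDate ?deathDate ?style ?nationality WHERE {{
  OPTIONAL {{ <{uri}> dbpedia-owl:birthDate ?birthDate }}
  OPTIONAL {{ <{uri}> dbpedia-owl:deathDate ?deathDate }}
  OPTIONAL {{ <{uri}> prop-pl:styl ?style }}
  OPTIONAL {{ <{uri}> prop-pl:narodowość ?nationality }}
}}"""


def make_request(artist_uri):
    """Build an HTTP GET request asking for the four properties of one artist resource."""
    query = QUERY_TEMPLATE.format(uri=artist_uri)
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL 1.1 Query Results JSON format
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results_json):
    """Collect the distinct values bound to each variable in a SPARQL JSON result."""
    data = json.loads(results_json)
    found = {}
    for binding in data["results"]["bindings"]:
        for var, cell in binding.items():
            found.setdefault(var, set()).add(cell["value"])
    return found


def fetch(artist_uri, timeout=30):
    """Send the query to the endpoint (requires network access) and parse the answer."""
    with urlopen(make_request(artist_uri), timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))


# Made-up response, used only to show the parsing step offline
sample = json.dumps({"head": {"vars": ["birthDate", "style"]}, "results": {"bindings": [
    {"birthDate": {"type": "literal", "value": "1900-01-01"},
     "style": {"type": "uri", "value": "http://pl.dbpedia.org/resource/Symbolizm"}}]}})
print(parse_bindings(sample))
```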
SPARQL queries were written to connect raw artist information with DBpedia entities. Tables 2 and 3 summarise the experiment in quantitative terms. A few caveats concern the assessment of the presented method. Since a popular author may have many lots in a given auction house’s offer, the share of enriched unique authors is disproportionately small compared with the share of updated observations. The presented method updates nearly one quarter of all observations (24.25% for birth dates), whereas it covers only up to roughly 5% of unique artists (4.49% for birth dates). It is therefore no surprise that the popular artists whose works are sold most frequently are the ones with the most complete descriptions on DBpedia.
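The percentages in Table 3 follow directly from Tables 1 and 2: the share of updated observations is the number of entities found divided by the number of observations in the dataset, and the share of updated authors is the number of distinct entities divided by the number of unique authors. A quick Python check for the birth-date rows, using only figures reported in the tables, reproduces the "nearly one quarter" figure for the whole dataset:

```python
# Figures reported in Table 1 (observations) and Table 2 (birth dates found)
observations = {"Desa Unicum": 25837, "Agra Art": 23746, "Polswiss Art": 2599, "Rempex": 14758}
birth_dates_found = {"Desa Unicum": 4145, "Agra Art": 7253, "Polswiss Art": 270, "Rempex": 4566}


def share(found, total):
    """Percentage of records updated, rounded to two decimals as in Table 3."""
    return round(100 * found / total, 2)


for house, total in observations.items():
    print(house, share(birth_dates_found[house], total))  # 16.04, 30.54, 10.39, 30.94

total_obs = sum(observations.values())           # 66940 observations in all
total_found = sum(birth_dates_found.values())    # 16234, the "(all)" row of Table 2
print("(all)", share(total_found, total_obs))    # 24.25
print("Desa Unicum authors", share(380, 5691))   # 6.68% of unique authors
```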
Table 2 Number of entities found in each dataset
Auction House   Attribute     Entities found   Distinct entities
Desa Unicum     Birth date              4145                 380
                Death date              3556                 277
                Style                   1942                  58
                Nationality             2850                  25
Agra Art        Birth date              7253                 303
                Death date              6491                 230
                Style                   3809                  49
                Nationality             5964                  26
Polswiss Art    Birth date               270                  83
                Death date               223                  64
                Style                    120                  49
                Nationality              188                  20
Rempex          Birth date              4566                 354
                Death date              4075                 284
                Style                   2714                  59
                Nationality             2917                  30
(all)           Birth date             16234                 605
                Death date             14345                 434
                Style                   8609                  80
                Nationality            11925                  42
Table 3 Changes in the datasets
Auction House   Attribute     Observations updated   Unique authors updated
Desa Unicum     Birth date                  16.04%                    6.68%
                Death date                  13.76%                    4.87%
                Style                        7.52%                    1.02%
                Nationality                 11.03%                    0.44%
Agra Art        Birth date                  30.54%                   10.85%
                Death date                  27.34%                    8.24%
                Style                       16.04%                    1.76%
                Nationality                 25.12%                    0.93%
Polswiss Art    Birth date                  10.39%                    5.24%
                Death date                   8.58%                    4.04%
                Style                        5.54%                    1.77%
                Nationality                  7.46%                    0.44%
Rempex          Birth date                  30.94%                    9.20%
                Death date                  27.61%                    7.38%
                Style                       18.39%                    1.53%
                Nationality                 19.77%                    0.78%
(all)           Birth date                  24.25%                    4.49%
                Death date                  21.43%                    3.22%
                Style                       12.86%                    0.59%
                Nationality                 17.81%                    0.31%

Table 4 shows the results of the experiment regarding artists’ nationality. As expected, Polish nationality dominates this ranking. Artists of Lemko, Austrian, Lithuanian and Jewish descent also appear frequently. A closer look at the results reveals some inconsistency in Wikipedia/DBpedia conventions. For example, with regard to nationality, the strings polska, Polak and Polka all mean the same (Polish), as do http://pl.dbpedia.org/resource/Polacy (an entity for Poles) and http://pl.dbpedia.org/resource/Polska (an entity for Poland). Therefore, as the next step of this research, this issue will have to be tackled by reconciling semantically similar entities and strings, taking grammatical and language-specific differences into account. This is a complex problem since, for instance, Poland intuitively has the same meaning as Poles in this context, whereas technically these words mean different things.
Table 4 Most popular nationalities found in the whole dataset
Rank   prop-pl:narodowość                          Count
1      polska                                       5032
2      http://pl.dbpedia.org/resource/Polacy        5028
3      Polak                                         715
4      http://pl.dbpedia.org/resource/Lemkowie       243
5      Polka                                         185
6      http://pl.dbpedia.org/resource/Polska         167
7      Polska                                         95
8      austriacka                                     81
9      litewska                                       64
10     http://pl.dbpedia.org/resource/Żydzi           52
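As a first approximation of the reconciliation step described above, surface forms can be mapped to canonical labels by hand and their counts merged. The mapping below is a hypothetical, hand-made table, including the contested decision to treat the entity for Poland (Polska) as equivalent to Polish nationality. Applied to the top ten values of Table 4, it folds six variants into one label.

```python
from collections import Counter

# Top-10 prop-pl:narodowość values and counts, as reported in Table 4
TABLE_4 = [
    ("polska", 5032),
    ("http://pl.dbpedia.org/resource/Polacy", 5028),
    ("Polak", 715),
    ("http://pl.dbpedia.org/resource/Lemkowie", 243),
    ("Polka", 185),
    ("http://pl.dbpedia.org/resource/Polska", 167),
    ("Polska", 95),
    ("austriacka", 81),
    ("litewska", 64),
    ("http://pl.dbpedia.org/resource/Żydzi", 52),
]

PREFIX = "http://pl.dbpedia.org/resource/"

# Hypothetical hand-made mapping from (lowercased) surface forms to canonical labels
CANONICAL = {
    "polska": "Polish", "polacy": "Polish", "polak": "Polish", "polka": "Polish",
    "lemkowie": "Lemko", "austriacka": "Austrian", "litewska": "Lithuanian", "żydzi": "Jewish",
}


def canonical_nationality(value):
    """Strip the resource namespace, lowercase, and look up the canonical label."""
    key = value[len(PREFIX):] if value.startswith(PREFIX) else value
    return CANONICAL.get(key.lower(), key)


def merge_counts(rows):
    """Sum the counts of all surface forms that map to the same canonical label."""
    totals = Counter()
    for value, count in rows:
        totals[canonical_nationality(value)] += count
    return totals


print(merge_counts(TABLE_4).most_common())
# -> [('Polish', 11222), ('Lemko', 243), ('Austrian', 81), ('Lithuanian', 64), ('Jewish', 52)]
```

Values seen only in the appendix tables, such as the adjective Żydowska (Jewish), would need their own entries in such a mapping.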
The experiment shows that Polish artists representing Realism, Impressionism and Symbolism are among the most popular at Polish auction houses, at least among those structurally described on Wikipedia (Table 5). However, these results are affected by the same inconsistencies as the nationalities. Plain strings are mixed with DBpedia entities (the string symbolizm and the named entity Symbolizm both denote Symbolism). Another example of inconsistency is that Realism is represented in two ambiguous forms, Realizm (malarstwo), i.e. "Realism (painting)", and Realizm.
Although a large number of observations were updated, some post-processing is needed to disambiguate all these entities and strings before the results of this method are used in, for example, regression analysis. Detailed results for particular auction houses can be found in Appendix A in Tables 6 to 13.
Table 5 Most popular styles found in the whole dataset
Rank   prop-pl:styl                                                Count
1      http://pl.dbpedia.org/resource/Realizm (malarstwo)           1429
2      http://pl.dbpedia.org/resource/Impresjonizm                  1008
3      http://pl.dbpedia.org/resource/Symbolizm                      704
4      http://pl.dbpedia.org/resource/Modernizm (sztuka)             581
5      symbolizm                                                     504
6      http://pl.dbpedia.org/resource/Ekspresjonizm (sztuka)         448
7      http://pl.dbpedia.org/resource/Realizm                        301
8      http://pl.dbpedia.org/resource/Sztuka konceptualna            250
9      koloryzm                                                      244
10     http://pl.dbpedia.org/resource/Prymitywizm (malarstwo)        243
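For styles, much of the inconsistency is purely lexical: a resource URI versus a plain string, letter case, and Wikipedia disambiguation qualifiers such as `(malarstwo)` (painting) or `(sztuka)` (art). A sketch of a normalisation key that strips these, applied to the counts in Table 5, is shown below. Whether Realism in painting and Realism in general should really be merged is an assumption the researcher has to make.

```python
import re
from collections import Counter

PREFIX = "http://pl.dbpedia.org/resource/"

# Top-10 prop-pl:styl values and counts, as reported in Table 5
TABLE_5 = [
    (PREFIX + "Realizm (malarstwo)", 1429), (PREFIX + "Impresjonizm", 1008),
    (PREFIX + "Symbolizm", 704), (PREFIX + "Modernizm (sztuka)", 581),
    ("symbolizm", 504), (PREFIX + "Ekspresjonizm (sztuka)", 448),
    (PREFIX + "Realizm", 301), (PREFIX + "Sztuka konceptualna", 250),
    ("koloryzm", 244), (PREFIX + "Prymitywizm (malarstwo)", 243),
]


def style_key(value):
    """Normalisation key: no namespace, no disambiguation qualifier, lowercase."""
    label = value[len(PREFIX):] if value.startswith(PREFIX) else value
    label = label.replace("_", " ")  # underscores may appear in raw resource names
    # Drop a trailing Wikipedia qualifier such as "(malarstwo)" or "(sztuka)"
    label = re.sub(r"\s*\([^)]*\)\s*$", "", label)
    return label.strip().lower()


merged = Counter()
for value, count in TABLE_5:
    merged[style_key(value)] += count
print(merged.most_common(3))
# -> [('realizm', 1730), ('symbolizm', 1208), ('impresjonizm', 1008)]
```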
5 Summary
In this paper, we have presented a method for improving the quality of art market datasets. As an example, the method has been tested on Polish art market data and yielded sound results. Realism, Impressionism and Symbolism were the most popular styles among artists whose works are sold at Polish auction houses, at least among those described on DBpedia.
Initial results are promising, but there is still room for improvement. The most important direction is refining the results obtained in this research, for instance by linking nationalities that share the same semantic meaning but differ in grammatical form. This research was based only on the Polish DBpedia; employing other language editions might help to analyse other artists, especially those who were not born in Poland. Auction houses also often provide artist and/or artwork descriptions, which, processed with Natural Language Processing methods, may be used as a source of additional data. Providing the widest possible range of hedonic variables is the ultimate goal of this research. Future work will include the use of deep neural networks, which should precisely connect particular artworks with styles [4] based only on a single image. Once the enrichment is done, traditional index-based art market research can be conducted.
References
1. Abramowicz, W., Kaczmarek, T., Wecel, K.: How Much Intelligence in the Semantic Web? In: P.S. Szczepaniak, J. Kacprzyk, A. Niewiadomski (eds.) Advances in Web Intelligence, Lecture Notes in Computer Science, vol. 3528, pp. 1–6. Springer Berlin Heidelberg (2005). DOI 10.1007/11495772_1. URL http://dx.doi.org/10.1007/11495772_1
2. Bocart, F.Y., Hafner, C.M.: Econometric analysis of volatile art markets. Computational Statistics & Data Analysis 56(11), 3091–3104 (2012). DOI 10.1016/j.csda.2011.10.019. URL http://www.sciencedirect.com/science/article/pii/S0167947311003902
3. Collins, A., Scorcu, A., Zanola, R.: Reconsidering hedonic art price indexes. Economics Letters 104(2), 57–60 (2009). DOI 10.1016/j.econlet.2009.03.025. URL http://www.sciencedirect.com/science/article/pii/S0165176509001165
4. Filipiak, D., Agt-Rickauer, H., Hentschel, C., Filipowska, A., Sack, H.: Quantitative analysis of art market using ontologies, named entity recognition and machine learning: A case study (2016)
5. Ginsburgh, V., Mei, J., Moses, M.: The Computation of Prices Indices. In: Handbook of the Economics of Art and Culture, vol. 1, pp. 947–979. Elsevier (2006). URL http://ideas.repec.org/h/eee/artchp/1-27.html
6. Jones, A.M., Zanola, R.: Retransformation bias in the adjacent art price index. ACEI Working Paper Series (1) (2011). URL http://ideas.repec.org/p/cue/wpaper/awp-01-2011.html
7. Kräussl, R., Wiehenkamp, C.: A call on art investments. Review of Derivatives Research 15(1), 1–23 (2011). DOI 10.1007/s11147-011-9061-x. URL http://link.springer.com/10.1007/s11147-011-9061-x
8. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal (2014)
9. Locatelli-Biey, M., Zanola, R.: The sculpture market: An adjacent year regression index. Journal of Cultural Economics pp. 65–78 (2002). URL http://www.springerlink.com/index/H3764046423N7P74.pdf
10. Mei, J., Moses, M.: Art as an Investment and the Underperformance of Masterpieces (2002). URL http://www.collectorcarbook.com/author.html
11. Paulheim, H., Ristoski, P., Mitichkin, E., Bizer, C.: Data mining with background knowledge from the web (2014)
12. Renneboog, L., Spaenjers, C.: Buying beauty: On prices and returns in the art market. Management Science 59(1) (2013)
13. van der Waal, S., Wecel, K., Ermilov, I., Janev, V., Milosevic, U., Wainwright, M.: Lifting open data portals to the data web. In: S. Auer, V. Bryl, S. Tramp (eds.) Linked Open Data – Creating Knowledge Out of Interlinked Data, Lecture Notes in Computer Science, vol. 8661, pp. 175–195. Springer International Publishing (2014). DOI 10.1007/978-3-319-09846-3_9. URL http://dx.doi.org/10.1007/978-3-319-09846-3_9
A Appendix: Detailed Results
Table 6 Most popular nationalities found in the Desa Unicum dataset
Rank   prop-pl:narodowość                          Count
1      polska                                       1181
2      http://pl.dbpedia.org/resource/Polacy        1099
3      Polak                                         217
4      http://pl.dbpedia.org/resource/Lemkowie       117
5      http://pl.dbpedia.org/resource/Polska          72
6      Polka                                          44
7      Polska                                         38
8      amerykańska                                    16
9      http://pl.dbpedia.org/resource/Żydzi           15
10     Żydowska                                        9
Table 7 Most popular styles found in the Desa Unicum dataset
Rank   prop-pl:styl                                                Count
1      http://pl.dbpedia.org/resource/Realizm (malarstwo)            259
2      http://pl.dbpedia.org/resource/Symbolizm                      213
3      http://pl.dbpedia.org/resource/Impresjonizm                   183
4      http://pl.dbpedia.org/resource/Prymitywizm (malarstwo)        117
5      http://pl.dbpedia.org/resource/Modernizm (sztuka)             116
6      symbolizm                                                     111
7      http://pl.dbpedia.org/resource/Ekspresjonizm (sztuka)         108
8      http://pl.dbpedia.org/resource/Art déco                        66
9      http://pl.dbpedia.org/resource/Realizm                         63
10     http://pl.dbpedia.org/resource/Sztuka konceptualna             53

Table 8 Most popular nationalities found in the Agra dataset
Rank   prop-pl:narodowość                          Count
1      http://pl.dbpedia.org/resource/Polacy        2734
2      polska                                       2443
3      Polak                                         320
4      Polka                                         110
5      austriacka                                     73
6      litewska                                       64
7      http://pl.dbpedia.org/resource/Polska          46
8      Polska                                         36
9      http://pl.dbpedia.org/resource/Żydzi           32
10     Żydowska                                       29
Table 9 Most popular styles found in the Agra dataset
Rank   prop-pl:styl                                                Count
1      http://pl.dbpedia.org/resource/Realizm (malarstwo)            840
2      http://pl.dbpedia.org/resource/Impresjonizm                   583
3      symbolizm                                                     234
4      http://pl.dbpedia.org/resource/Modernizm (sztuka)             192
5      http://pl.dbpedia.org/resource/Symbolizm                      167
6      http://pl.dbpedia.org/resource/Sztuka konceptualna            155
7      koloryzm                                                      146
8      http://pl.dbpedia.org/resource/Secesja (sztuka)               131
9      http://pl.dbpedia.org/resource/Styl zakopiański               125
10     http://pl.dbpedia.org/resource/Surrealizm                     109
Table 10 Most popular nationalities found in the Polswiss dataset
Rank   prop-pl:narodowość                          Count
1      polska                                        106
2      http://pl.dbpedia.org/resource/Polacy          62
3      Polak                                          15
4      http://pl.dbpedia.org/resource/Polska           4
5      Polka                                           3
6      http://pl.dbpedia.org/resource/Lemkowie         2
7      http://pl.dbpedia.org/resource/Żydzi            2
Table 11 Most popular styles found in the Polswiss dataset
Rank   prop-pl:styl                                                Count
1      http://pl.dbpedia.org/resource/Realizm (malarstwo)             17
2      symbolizm                                                      13
3      http://pl.dbpedia.org/resource/Symbolizm                       11
4      http://pl.dbpedia.org/resource/Realizm                         11
5      http://pl.dbpedia.org/resource/Modernizm (sztuka)               9
6      http://pl.dbpedia.org/resource/Ekspresjonizm (sztuka)           8
7      http://pl.dbpedia.org/resource/Neoekspresjonizm                 7
8      http://pl.dbpedia.org/resource/Abstrakcja konkretna             7
9      http://pl.dbpedia.org/resource/Ekspresjonizm abstrakcyjny       7
10     http://pl.dbpedia.org/resource/Sztuka konkretna                 7
Table 12 Most popular nationalities found in the Rempex dataset
Rank   prop-pl:narodowość                          Count
1      polska                                       1302
2      http://pl.dbpedia.org/resource/Polacy        1133
3      Polak                                         163
4      http://pl.dbpedia.org/resource/Lemkowie       124
5      http://pl.dbpedia.org/resource/Polska          45
6      Polka                                          28
7      Polska                                         21
8      Żydowska                                       13
9      czeska                                          9
10     http://pl.dbpedia.org/resource/Rosjanie         9
Table 13 Most popular styles found in the Rempex dataset
Rank   prop-pl:styl                                                Count
1      http://pl.dbpedia.org/resource/Symbolizm                      313
2      http://pl.dbpedia.org/resource/Realizm (malarstwo)            313
3      http://pl.dbpedia.org/resource/Modernizm (sztuka)             264
4      http://pl.dbpedia.org/resource/Ekspresjonizm (sztuka)         244
5      http://pl.dbpedia.org/resource/Impresjonizm                   236
6      symbolizm                                                     146
7      http://pl.dbpedia.org/resource/Prymitywizm (malarstwo)        124
8      http://pl.dbpedia.org/resource/Realizm                        118
9      http://pl.dbpedia.org/resource/Art déco                       101
10     http://pl.dbpedia.org/resource/Piktorializm                    87