A wordnet for the Balkans

As the volume of information on the Internet has grown, so too have
the difficulties of finding precisely what you are looking
for, especially in a language other than your own. Helping
Balkan researchers is BalkaNet, a multilingual lexical
database for several of the region’s languages.
Building on the results of the EuroWordNet project that developed
wordnets for West European languages, the BalkaNet IST project
extends the wordnet approach to the less-studied Balkan languages
in an effort to strengthen ties among the academic and research
communities in the region and with those elsewhere in Europe.
First developed for English by Princeton University in the United States, a wordnet is a
semantic lexicon that groups words into sets of synonyms called synsets, provides short
definitions and records the semantic relations between those sets. The result is a
combination of a multilingual dictionary, thesaurus and translation tool, which can be used,
as in the case of BalkaNet, to build an Internet search engine that matches concepts rather than specific words.
BalkaNet incorporates Bulgarian, Greek, Romanian, Serbian and Turkish, and extends a
Czech wordnet previously developed by the EuroWordNet project.
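To make the structure concrete, the following is a minimal Python sketch of synsets linked across languages through a shared concept identifier. The `Synset` class, its `concept_id` and `hypernym` fields, the `synonyms_in` function and the sample entries are all illustrative assumptions, not BalkaNet's actual data model or content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Synset:
    """A set of synonyms in one language, tied to a language-neutral concept."""
    concept_id: str                 # hypothetical identifier shared across languages
    language: str                   # ISO 639-1 code, e.g. "el" for Greek
    words: List[str]                # the synonyms that make up this synset
    definition: str = ""            # short gloss, optional in this sketch
    hypernym: Optional[str] = None  # concept_id of the more general concept


# Illustrative sample entries only; not taken from BalkaNet.
SYNSETS = [
    Synset("c-dog", "en", ["dog", "domestic dog"], "a domesticated canine", "c-animal"),
    Synset("c-dog", "el", ["σκύλος", "σκυλί"], hypernym="c-animal"),
    Synset("c-dog", "bg", ["куче"], hypernym="c-animal"),
    Synset("c-dog", "tr", ["köpek"], hypernym="c-animal"),
]


def synonyms_in(word: str, source_lang: str, target_lang: str,
                synsets: List[Synset] = SYNSETS) -> List[str]:
    """Return target-language words that share a concept with `word`."""
    # Step 1: find every concept the source word belongs to.
    concept_ids = {
        s.concept_id
        for s in synsets
        if s.language == source_lang and word in s.words
    }
    # Step 2: collect the target-language words attached to those concepts.
    result: List[str] = []
    for s in synsets:
        if s.language == target_lang and s.concept_id in concept_ids:
            result.extend(s.words)
    return result


print(synonyms_in("σκύλος", "el", "bg"))
```

Because every language points at the same concept identifier, adding a new language in this sketch means adding synsets only; no pairwise translation table between languages is required.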
“A researcher who is unsure of what keywords to use in a search could, for example, use this to
find additional keywords related to the information he is looking for because the system links
words to a concept in any of the languages we have incorporated,” explains project manager
Sofia Stamou. “It could be used by a Greek who needs to find synonyms in Bulgarian or Turkish,
something that is particularly useful for people in cross-border areas or researchers working in
different countries,” adds BalkaNet coordinator Dimitrios Christodoulakis.
The two project leaders at the University of Patras in Greece note that the wordnet approach
allows researchers to use their own words when carrying out a search, rather than being tied to
the specific wording and rationale of electronic databases that produce a match only if the
right keywords are used.
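The contrast with plain keyword matching can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `CONCEPTS` table, the `expand_query` and `search` functions and the sample documents are hypothetical, and a real system would draw its word sets from the wordnet rather than a hand-written dictionary.

```python
from typing import Dict, List, Set

# Hypothetical concept table: each concept maps to the words that express it
# in several languages. Illustrative entries only.
CONCEPTS: Dict[str, Set[str]] = {
    "c-car": {"car", "automobile", "αυτοκίνητο", "автомобил", "otomobil"},
    "c-dog": {"dog", "σκύλος", "куче", "köpek"},
}


def expand_query(query: str) -> Set[str]:
    """Add every word that shares a concept with a word in the query."""
    terms: Set[str] = set()
    for word in query.lower().split():
        terms.add(word)
        for words in CONCEPTS.values():
            if word in words:
                terms |= words
    return terms


def search(query: str, documents: List[str]) -> List[str]:
    """Return documents containing any term from the expanded query."""
    terms = expand_query(query)
    return [doc for doc in documents if terms & set(doc.lower().split())]


docs = ["New automobile factory opens", "Weather report for Patras"]
print(search("car", docs))
```

A literal search for "car" would miss the first document, since the word never appears in it; the expanded query finds it because "car" and "automobile" are attached to the same concept.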
In the case of BalkaNet, the project partners had to overcome a lack of existing
resources for Balkan languages, especially digitised ones, and in some instances had to produce
their own lexicons. A pilot application was used to test the quality of the translation and the
completeness of the system. “We used it to align the wording and annotate George Orwell’s book
1984 across the six languages,” Stamou says.
The partners are currently using the system themselves and are planning to make it available to
the broader research community in the near future.
Contact:
Dimitrios Christodoulakis
Database Laboratory
Computer Engineering and Informatics Department
University of Patras
GR-26500 Patras, Greece
Tel: +30-2610960385
Fax: +30-2610960438
Email: [email protected]
Source: Based on information from BalkaNet, 18 Jan 2005
Legal notice:
This feature article is published by the IST Results service and offers news and views on innovations emerging from EU-funded Information Society Research.
The views expressed in the article have not been adopted or in any way approved by the European Commission and should
not be relied upon as a statement of the Commission or the Information Society and Media DG.
© European Communities, 2005
Reproduction is authorised provided the source is acknowledged.