A wordnet for the Balkans

As the volume of information on the Internet has grown, so too has the difficulty of finding precisely what you are looking for, especially in a language other than your own. Helping Balkan researchers is BalkaNet, a multilingual lexical database for several of the region’s languages.

Building on the results of the EuroWordNet project, which developed wordnets for West European languages, the BalkaNet IST project extends the wordnet approach to the less studied Balkan languages in an effort to strengthen ties between the academic and research communities in the region and with others elsewhere in Europe.

First developed for English by Princeton University in the United States, a wordnet is a semantic lexicon that groups words into sets of synonyms called synsets, provides short definitions and records the various semantic relations between words. The result is a combination of a multilingual dictionary, thesaurus and translation tool, which can be used, as in the case of BalkaNet, to power an Internet search engine that matches concepts rather than specific words. BalkaNet covers Bulgarian, Greek, Romanian, Serbian and Turkish, and also extends the Czech wordnet previously developed by the EuroWordNet project.

“A researcher who is unsure of what keywords to use in a search could, for example, use this to find additional keywords related to the information he is looking for because the system links words to a concept in any of the languages we have incorporated,” explains project manager Sofia Stamou.

“It could be used by a Greek who needs to find synonyms in Bulgarian or Turkish, something that is particularly useful for people in cross-border areas or researchers working in different countries,” adds BalkaNet coordinator Dimitrios Christodoulakis.
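
The cross-language lookup described above can be pictured with a small Python sketch. It is only an illustration of the general wordnet idea, not BalkaNet's actual data model or software: the `Synset` class, the `concept_id` field, the `expand_query` function and the handful of sample words are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Synset:
    """A set of synonyms in one language, tied to a shared concept.

    `concept_id` is a hypothetical language-independent identifier;
    a real wordnet would use its own indexing scheme.
    """
    concept_id: str
    language: str
    words: List[str]
    gloss: str = ""


# Toy data for illustration only; not taken from BalkaNet.
SYNSETS = [
    Synset("c-dog", "en", ["dog", "domestic dog"], "a domesticated canine"),
    Synset("c-dog", "el", ["σκύλος", "σκυλί"]),
    Synset("c-dog", "bg", ["куче", "пес"]),
    Synset("c-dog", "tr", ["köpek"]),
]


def expand_query(word: str, source_lang: str,
                 target_langs: List[str]) -> Dict[str, List[str]]:
    """Return synonyms of `word` in each target language.

    The word is first mapped to the concepts it belongs to in the source
    language; every synset sharing one of those concepts in a target
    language then contributes its words.
    """
    concepts = {
        s.concept_id
        for s in SYNSETS
        if s.language == source_lang and word in s.words
    }
    expanded: Dict[str, List[str]] = {}
    for s in SYNSETS:
        if s.concept_id in concepts and s.language in target_langs:
            expanded.setdefault(s.language, []).extend(s.words)
    return expanded


if __name__ == "__main__":
    # A Greek speaker looking for Bulgarian and Turkish keywords.
    print(expand_query("σκύλος", "el", ["bg", "tr"]))
```

Because the lookup goes through the shared concept rather than through a word-to-word dictionary, adding a further language only requires adding synsets that carry the same concept identifier.
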
The two project leaders at the University of Patras in Greece note that the wordnet approach allows researchers to use their own words when carrying out a search, rather than being tied to the specific wording and rationale of electronic databases that only produce a match if the right keywords are used.

In the case of BalkaNet, the project partners had to overcome a lack of existing resources for Balkan languages, especially digitised ones, and in some instances had to produce their own lexicons.

A pilot application was used to test the quality of the translation and the completeness of the system. “We used it to align the wording and annotate George Orwell’s book 1984 across the six languages,” Stamou says.

The partners are currently using the system themselves and are planning to make it available to the broader research community in the near future.

Contact:
Dimitrios Christodoulakis
Database Laboratory
Computer Engineering and Informatics Department
University of Patras
GR-26500 Patras, Greece
Tel: +30-2610960385
Fax: +30-2610960438
Email: [email protected]

Source: Based on information from BalkaNet, 18 Jan 2005

Legal notice: This feature article is published by the IST Results service and offers news and views on innovations emerging from EU-funded Information Society Research. The views expressed in the article have not been adopted or in any way approved by the European Commission and should not be relied upon as a statement of the Commission or the Information Society and Media DG.

© European Communities, 2005. Reproduction is authorised provided the source is acknowledged.