Hyphenators: language specifications, version 6.2.3.8/9

Languages: Our hyphenators are not based on a hyphenated dictionary database.

New hyphenator languages: Welsh, Latin, Kazakh (Latin/Cyrillic), Khmer (Cambodia), Northern Kurdish, Swahili, Zulu, Xhosa, Hebrew, Irish/Gaelic (see the Windows Unicode Demo).

Recently upgraded hyphenator languages: German, Swiss German, Austrian German, French, Canadian French, Dutch, Flemish, Surinam Dutch, English (5x), Catalan (nova ortografia), Danish, Swedish (3x), Norwegian (2x), Nynorsk (2x), Spanish, Afrikaans, Tagalog, Bulgarian, Lithuanian, Latvian, Polish, Basque, Frisian, Azerbaijanian, Iberian Portuguese, Brazilian Portuguese, Slovene, Estonian, Finnish, Bahasa Indonesia, Bahasa Melayu, Icelandic, Czech, Slovak, Italian.

81 language modules

Dutch, Flemish, Surinam Dutch (Upgrade May 2017, ε < 0.00012 ‰) (3x) supports the generally accepted spelling (the Netherlands), progressive spelling (Belgium), and the 1996 & October 2005 spelling reforms; four principles have been integrated in one hyphenator. It supports the Belgian (Flemish), Surinam, and Dutch idiom. The hyphenator recognizes compound boundaries and covers the Dutch idiom in the most extensive way. The Dutch module has separate entries for the languages Dutch (1), Flemish (2), and Surinam Dutch (3).
Hyphenation/Orthographic reform 1995 & 2005, 2015, applies special hyphenations.

English (Upgrade March 2017, ε 0.00015 ‰) (6x) supports phonetical hyphenation according to the world’s most trusted dictionaries: Webster’s Third New International Dictionary, Webster’s New Twentieth Century Unabridged Dictionary (2nd edition), and Longman’s Dictionary of Contemporary English. It is based on an unabridged learning corpus that comes close in size to Webster’s Unabridged Dictionaries. A common hyphenator is available for the British, Canadian, and American idiom. The hyphenator solves the irregularity of the alternation of English strong and weak syllables.
The new double layer model enables the user to disregard certain secondary divisions (adapt~able instead of adapt~a~ble). Hyphenation agrees with the Oxford Colour Spelling Dictionary (1995). Compared to the other dictionaries, this last dictionary has fewer syllables. The English module has separate entries for British, American, Canadian, Australian, New Zealand, and South African English, and supplies you with superior hyphenation and hyphenation density. British English (1), American English (2), Canadian English (3), Australian English (4)*, New Zealand English (5)*, South African English (6)*.
* The British English varieties come with the British English module (1).

German old and 2 varieties new (Upgrade June 2017, ε < 0.0014 ‰) (3x) supports every characteristic German hyphenation according to the most recent Duden Rechtschreibung August 2006-2013 and Wahrig 2009. For reformed German, two hyphenation styles have been implemented: one agrees with the Duden editions of 1996-2004, and the other prefers the Duden 2013 hyphenations, in which strictly eingedeutschte (Germanized) syllables are used where meaningful. The German hyphenator recognizes compound boundaries independent of the spelling reform. The new feature for “der Verwendung von Großbuchstaben SS für ß” (the use of capital SS for ß) correctly hyphenates both spellings (“Schreibungsweisen”). A special effort has been made to support medical and other scientific domains. The German hyphenators have been compared against over two million German, Swiss German, and Austrian German words as an independent estimate of accuracy.
Hyphenation/Orthographic reform 1996 & 2006, applies special hyphenations.

Swiss German old and 2 varieties new (Upgrade June 2017) (3x) responds accurately to the typical Swiss German deviations and local idiom (including the ß to ss transcription).
Hyphenation/Orthographic reform 1996 & 2006, applies special hyphenations.

Austrian German old and 2 varieties new (Upgrade June 2017) (3x) responds accurately to the typical Austrian German deviations and local idiom.
Hyphenation/Orthographic reform 1996 & 2006, applies special hyphenations.

French (two versions, Upgrade June 2017) (2x) accepts etymological syllabification according to Grevisse’s “Le bon usage.” A second version accepts phonetical hyphenation rules recommended by the leading French linguist Nina Catach in Paris. Both French versions use the new multi-layer technique to enable or disable hyphenation of muettes. The hyphenator covers French idiom nearly completely and has been tested on a full-size French corpus.
Orthographic recommendations 1990, special hyphenations (muettes).

Canadian French (two versions, Upgrade June 2017) (2x) accepts etymological syllabification according to Grevisse’s “Le bon usage.” A second version accepts phonetical hyphenation rules recommended by the leading French linguist Nina Catach in Paris. Both Canadian French versions use the new double layer technique to enable or disable hyphenation of muettes.
Orthographic recommendations 1990, special hyphenations (muettes).

Spanish: Peninsular, Argentine, Mexican & Latin American (Upgrade July 2016, ε < 0.0008 ‰) (2x) supports the official hyphenation rules as published in the Ortografía de la lengua española (2010) and by large dictionary publishers; it completely covers the Spanish and Latin American idiom. Be careful with foreign words in Spanish! Do not hyphenate as “Burn the Wit-|ch” (El País, 23, 4-6-2016): “ch” is not a syllable!
Hyphenation/Orthographic reform 2010, 2015.

Italian (Upgrade May 2014, ε 0.0008 ‰) supports phonetical hyphenation, in Italian: “la sillabazione: basata prevalentemente sul criterio di tenere uniti i gruppi consonantici attestati, anche una sola volta, come iniziale di parola” (syllabification based mainly on keeping together consonant clusters that are attested, even only once, at the beginning of a word). In addition, the new hyphenator accurately handles hiatuses, elisions (al-l’I.ta-lia), conjugations, declensions, and words that came from English and other foreign languages (beat-nik and not be-at-nik).
Applies special hyphenations.

Iberian Portuguese (previous and acordo ortográfico) (Upgrade February 2016, ε 0.006 ‰) (2x) is based on the vowel as the syllabic unit, but falling diphthongs and final diphthongs are kept together. Doubling of the hyphen is supported (repetir o hífen na linha seguinte, i.e. repeating the hyphen on the following line).
Hyphenation/Orthographic reform 2009, 2010, applies special hyphenations.

Brazilian Portuguese (previous and acordo ortográfico) (Upgrade February 2016, ε 0.006 ‰) (2x) is based on the vowel as the syllabic unit, but falling diphthongs and final diphthongs are kept together. Doubling of the hyphen is supported (repetir o hífen na linha seguinte).
Hyphenation/Orthographic reform 2009, 2010, applies special hyphenations.

Czech (Upgrade July 2014) supports the reformed spelling. As is the case in every Slavic language, a number of additive vowels and consonants exist, which have a large impact on hyphenation. Syllables that consist solely of consonants are supported (ji-tr-nice).

Slovak (Upgrade July 2014) supports the standard Slovak orthography. As is the case in every Slavic language, a number of additive vowels and consonants exist, which have a large impact on hyphenation. Syllables that consist solely of consonants are supported (ji-tr-nice).

Swedish (Upgrade January 2017 (3x), ε < 0.0009 ‰) accepts the mekaniska principen (the mechanical principle), but compounded words are divided into their morphological roots. An overwhelming occurrence of compounds and newly created forms makes it a challenge worth accepting. You can switch between c-k and ck- hyphenation, and switch within-word vowel-vowel hyphenation on or off. A third hyphenator model also supports morphological hyphenation as specified in the Svenska Akademiens ordlista över svenska språket (SAOL). However, Mediespråksgruppen (tt.se) rejects SAOL’s hyphenation system, even though it was introduced into the school system many years ago.
The fact that Mediespråksgruppen rejected SAOL hyphenation might be related to the technical hindrances to building such a hyphenator. Still, it is possible given the use of a proper language model. All hyphenators are tested against a 1.5 million word Swedish hyphenation corpus.
Hyphenation/Reform 1998, applies special hyphenation.

Finnish (Upgrade January 2015, ε < 0.0010 ‰) is tuned to the peculiarities of the Finnish language and shares attributes with all Finno-Ugric languages. It has a rich structure, including a large number of falling and rising diphthongs. The phonetical base of the syllable is accepted here, and words are fully hyphenated despite the language’s overwhelming inflection structure. Compounded words are divided into their elements: not kym~menot~te~lu but kym-men-ot-te-lu; not kynt~ti~läkruu~nu but kynt-ti-lä-kruu-nu. You may also find its resemblance to the neighboring Estonian remarkable.
Applies special hyphenations.

Catalan (Upgrade (nova ortografia), May 2017) supports the mixed French and Spanish origins of the Catalan language. A peculiarity of Catalan that needs special care is the l geminada (l·l). This version supports the spelling reform of October 2016.
Orthographic reform October 2016, applies special hyphenations.

Danish (Upgrade January 2017, ε < 0.0001 ‰) accepts the hyphenation rules of the renewed Dansk Sprognævns Retskrivningsordbog (2012). Nearly any compound or newly created neologism is correctly hyphenated; please note that it even hyphenates Norwegian according to consonant rules.
Hyphenation/Orthographic reform 2012, applies special hyphenations.

Norwegian (Upgrade January 2017, ε 0.0017 ‰) (2x) accepts consonant rules (20) or the morphological rules of the Nordisk institutt of the University of Bergen (21).
Orthographic reform 2005, applies special hyphenations.

Nynorsk (Upgrade January 2017, ε < 0.0017 ‰) (2x) accepts consonant rules (20) or the morphological rules of the Nordisk institutt of the University of Bergen (21).
Orthographic reform 2012, applies special hyphenations.

Icelandic (Upgrade July 2014, ε < 0.0018 ‰) accepts morphological rules which separate the attached article and the nominative, dative, accusative, and genitive cases, and is capable of dividing a pileup of compounds, even neologisms, e.g., að~al~aust~ur~dælu~kerf~is~ins, not að~a~l~aust~ur~dælu~kerf~is~ins; flugu~fót~anna, not flug~u~fót~anna; kassa~gít~ar, not kassagít~ar; hinu~meg~in, not hinumegin.

Estonian (Upgrade April 2015) behaves like the Finnish hyphenator and is capable of correctly hyphenating Estonian compounds and diphthongs. However, there are more diphthongs in the Estonian language than in the Finnish language, which increases complexity. Taking orphans into account, we hyphenate as las~te~aia~laps and not as las~te~ai~a~laps, or as aa~to~mi~ener~gia~agen~tuu~ri and not as aa~to~mi~e~ner~gi~a~a~gen~tuu~ri.

New Greek (Upgrade March 2011, ε < 0.0015 ‰) is tuned in to the Greek script, the Elot codepage or Unicode (Greek and Greek Extended). It hyphenates more than between alpha and omega: not just the beginning and the end (Classical Greek), but a new era in progress (Modern Greek). Present-day Greek has evolved and is flourishing with diacritics, which are highly significant for hyphenation.

Latin (March 2011) is an extinct language, but still spoken in the Roman Catholic church. Together with Greek, Latin had an enormous influence on nearly every European language. The Latin hyphenator divides Latin prefixes, suffixes, and enclitics.

Polish (Upgrade January 2014): hyphenation of the Polish language is hindered by an immense number of consonants, quite often unpronounceable for non-Polish speakers. However, the hyphenator’s model has been fully adapted to these difficult syllables, and the pronunciation of Polish words is the guiding principle for hyphenation.
Applies special hyphenations.

Latvian (Upgrade February 2016) is tuned to the properties of Baltic languages. Words are richly declined.
Latvian uses additional consonants and vowels, which are recognized by the hyphenator.

Azerbaijanian (Upgrade February 2016) is the language of Azerbaijan, one of the Transcaucasian republics that are now independent from the former USSR. Azerbaijanian is related to Turkish. The Azerbaijani now use a Latin script. There is no standard byte code page for it.

Turkish (Upgrade January 2015): present-day Turkish is spoken in SW Asia, but in earlier times the Turkish region reached into the north of China. In Chinese history, the name Tu-kiu was mentioned 600 years ago. Turkish is characterized by a lot of additive particles that change the meaning of a word. A word can take numerous forms and different parallel hyphenations.
Applies non-visual special hyphenations.

Lithuanian (Upgrade February 2016) is one of the Baltic languages and is richly declined. The (semi-)diphthongs, palatals, and affricates have been taken into consideration for hyphenation.

Afrikaans (Upgrade April 2016, ε < 0.0040 ‰): the Afrikaans language evolved from 17th-century Dutch and is an official language of South Africa. Its hyphenation has much in common with the Dutch language. Afrikanization of spelling has given the Afrikaans language its own identity. The Afrikaans hyphenator takes all Afrikaans peculiarities into consideration, including diaeresis hyphenation.
Applies special hyphenations.

Russian (Upgrade July 2014) accepts Cyrillic characters; the script itself does not complicate hyphenation. What does is the nature of the Russian language: an abundance of prefixes and suffixes, modifying different moods in a fine gradation.

Basque (Upgrade February 2016): the Basque language is one of Europe’s most exotic minority languages, probably unrelated to any other language in the world. The Basque hyphenator is tuned in to all those peculiarities of real-life language.

Hungarian (Upgrade April 2006): the Hungarian language has lost many of its Uralic characteristics, and many words have been borrowed from Turkish and European languages.
The language is flavoured with compounds and special hyphenations (briddzsel -> bridzs-dzsel).
Applies special hyphenations.

Bahasa Indonesia (Upgrade July 2014): Bahasa Indonesia (Standard Indonesian) is an Austronesian language full of prefixes, suffixes, and infixes (affixes in general), including large classes of sound changes. Hyphenation is inextricably tied to meaning: even when the boundaries are masked by sound changes (mengarang from meng + karang), hyphenation is affected.

Bahasa Melayu (Upgrade July 2014): what holds for Bahasa Indonesia applies to Bahasa Melayu as well.

Byelorussian (Upgrade July 2007) is the language of the new nation of Belarus. It was proclaimed the country’s sole official language, but Russian remains dominant. Byelorussian is written in the Cyrillic alphabet.

Bulgarian (Upgrade February 2016) is spoken by 90 % of the population of Bulgaria, 7 million people. The modern Bulgarian alphabet is the same as the Russian alphabet.

Serbian (Upgrade July 2014), or srpski jezik, is written in the Cyrillic alphabet. Serbian is closely related to Croatian; however, Serbian writes the sounds Dž, Lj, and Nj with the single symbols Џ, Љ, and Њ. Like words in any Slavic language, Serbian words can have many prefixes to be hyphenated.

Galician (Upgrade July 2010) is now spoken in Spanish Galicia, situated north of Portugal. It is a Romance language related to Portuguese. The orthography differs slightly from Spanish.
Applies special hyphenations.

Rhaeto-Romance (February 2002) is the collective name for three Romance dialects spoken in northeastern Italy and southeastern Switzerland.

Greenlandic (April 2002) is an Eskimo language spoken in Greenland. Greenlandic is written in the Latin alphabet. Words can be very long, and one word can be a complete sentence.

Ukrainian (Upgrade July 2007) is the national language of Ukraine. It is spoken by a population of 35 million people.
Ukrainian has many Polish loan words, but the influences of Russian can be found in the east of Ukraine too.

Romanian (Upgrade June 2009) is the national language of Romania. It is a Romance language written in the Latin script. One third of all Romanian words are of French origin. The Romanian hyphenator supports the Unicode characters s-comma below and t-comma below, which replace s-cedilla and t-cedilla.

Croatian (Upgrade July 2014), or hrvatski jezik, is written in the Latin alphabet. Croatian is closely related to Serbian. Croatian includes a few digraphs which sound like a single consonant (Dž, Lj, Nj). Like words in any Slavic language, Croatian words can have many prefixes to be hyphenated.

Bosnian (Upgrade July 2014), or bosanski jezik, has existed as a separate language since Bosnia & Herzegovina became independent. Bosnian has developed its own identity; it is written in the Latin script and closely related to Croatian.

Frisian (Upgrade February 2016), or Frysk, is spoken in Friesland, the northernmost province of the Netherlands. Frisian is more closely related to English than to Dutch. However, some of the hyphenation rules are similar to Dutch hyphenation rules.
Applies special hyphenations.

Tagalog/Pilipino (Upgrade March 2016) is the national language of the Philippines. Three centuries of Spanish rule left a strong imprint on the vocabulary. The prefixes, infixes, and suffixes that modify word meaning make hyphenation irregular.

Slovene (Upgrade May 2015), or slovenski jezik, is written in the Latin alphabet. Slovene includes a few digraphs (Dž, Lj, Nj). Slovene has many prefixes and inflections. Some syllables consist of consonants only: hm-kniti, kr-tina, tr-den.
Applies special hyphenations (beseda z vezajem).

Thai (Upgrade December 2004): the Thai people build sentences in a different way. Therefore, the Thai module is not a hyphenator in the traditional sense; it is a word segmentation tool that takes context into consideration.
Applies non-visual special hyphenations.

Macedonian (Upgrade July 2007) is the principal language of the new nation of Macedonia. It is closely related to Bulgarian and written in the Cyrillic alphabet.

Maltese (Upgrade January 2006) is one of the official languages of the islands of Malta. It is a Semitic language written in the Latin alphabet, including <ċ>, <ħ>, <ġ>, and <ż>. The variety of root words has a great impact on hyphenation.
Applies special hyphenations.

Sámi (Upgrade July 2014): hyphenation agrees with the North Saami language as spoken in Finnmark county in the north of Norway.

Hebrew (December 2006) is written in Hebrew consonants only, and therefore hyphenation is partially uncertain. Within this uncertainty the hyphenator accepts graphical hyphenations.

Irish/Gaelic (December 2006) is a Celtic language mainly spoken in Ireland.

Zulu (September 2008) is a Bantu language mainly spoken in the Republic of South Africa. Zulu is one of the 11 official South African languages. It is very different from Afrikaans and the other Indo-European languages, and so is its hyphenation: be~nga~ka~la~li, ma~fu~ngwa~se.

Xhosa (September 2008) is a Bantu language mainly spoken in the Transkei, Ciskei, and Eastern Cape regions of the Republic of South Africa. Xhosa is one of the 11 official South African languages. It is very different from Afrikaans and the other Indo-European languages, and so is its hyphenation: isi~Mpo~ndo, ye~Bha~nki.

Swahili (March 2009) is a Bantu language mainly spoken in East Africa. Swahili is the principal language of Tanzania, Zanzibar, Uganda, and many neighbouring countries. Hyphenation examples are: ne~nda, ni~ra~ku~pi~ga, ni~li~m~pi~ga, u~nga~ma.

Kurdish (Northern) (July 2009) belongs to the Iranian group of languages. Kurdish is spoken in Turkey, Iraq, Iran, Armenia, Georgia, and Azerbaijan. The Latin script is used for the Northern variety of Kurdish.

Khmer (Cambodia) (November 2009) belongs to the Austroasiatic languages. Khmer has its own script, known as Aksar Khmer.
In Khmer no spaces are inserted between words. Yet words have to be segmented, even unknown words.

Kazakh (Latin/Cyrillic) (Upgrade May 2010) belongs to the Turkic family of languages. Kazakh is written in the Cyrillic, Arabic, or Latin script. An official transition to the Latin script could happen over a 10 to 12 year period. Although the hyphenator was developed for the Latin script, Cyrillic hyphenation is close at hand.

Welsh (August 2015) is a Celtic language which is hyphenated according to morphological principles. In addition, a lot of sound changes (mutations) replicate these principles.

June 2017, *TALO bv, Bussum, The Netherlands
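
Illustration: the Thai and Khmer entries describe word segmentation of text written without spaces, including unknown words. The following is a minimal sketch of one common baseline, greedy longest-match against a word list. It is not the method used by these modules (which, per the Thai entry, take context into consideration); the function name `segment`, the `lexicon` parameter, and the English stand-in word list are all hypothetical and chosen only for readability.

```python
from typing import List, Set


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Split unspaced text into words by greedy longest match.

    Characters that start no lexicon word are collected into a single
    'unknown' token, so unknown words are still segmented off.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens: List[str] = []
    unknown = ""  # buffer for a run of unmatched characters
    i = 0
    while i < len(text):
        match = ""
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            if unknown:  # flush the pending unknown run first
                tokens.append(unknown)
                unknown = ""
            tokens.append(match)
            i += len(match)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append(unknown)
    return tokens


# Hypothetical English stand-in lexicon, for illustration only.
WORDS = {"the", "cat", "sat", "on", "mat"}
print(segment("thecatsatonthemat", WORDS))
print(segment("thexycat", WORDS))
```

A greedy matcher like this picks the wrong split whenever a long word overlaps the start of the next one, which is one reason a context-aware tool is preferable for real Thai or Khmer text.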