Corpus-based analysis of the Czech syllable

Aleš Bičan, Institute of the Czech Language, Academy of Sciences of the Czech Republic,
<[email protected]>
(1) Goals of the presentation
(a) To provide an overview of the syllable structure in Czech backed up with statistical data.
(b) To show that the co-occurrence of syllables within words is not random, but is subject to certain tendencies.
(2) Facts about the syllable in Czech
(a) A Czech syllable must contain either a vowel or a syllabic liquid (/r/ or /l/):
ven “outside”, pár “couple”, brouk “beetle”
prst “finger”, vlk “wolf”
(b) A Czech syllable can begin with no consonant or with up to five consonants:
on “he”, jen “only”, prase “pig”, strach “fear”, vzpřímený “erect”, vzkvět [fskvjɛt] “prosperity”
(c) A Czech syllable can end in no consonant or in up to three consonants:
já “I”, pes “dog”, kost “bone”, zábst “to freeze”
(d) For syllabification see Kučera – Monroe (1968), Ludvíková in Těšitelová et al. (1985).
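The structural facts above can be turned into a small classifier. The following is a minimal sketch, not the procedure used for the corpus: it assumes that a syllable is given as a list of phoneme symbols in a simplified ASCII transcription, and the names `VOWELS`, `LIQUIDS` and `syllable_type` are hypothetical. The nucleus is taken to be the first vowel, or the first liquid if the syllable has no vowel.

```python
# Assumption: simplified ASCII phoneme symbols; not the corpus transcription.
VOWELS = {"a", "e", "i", "o", "u", "a:", "e:", "i:", "o:", "u:", "ou", "au", "eu"}
LIQUIDS = {"r", "l"}


def syllable_type(segments):
    """Return (onset length, coda length, CV skeleton) for one syllable."""
    # A vowel is preferred as the nucleus; a syllabic liquid is the fallback.
    nucleus = next((i for i, s in enumerate(segments) if s in VOWELS), None)
    if nucleus is None:
        nucleus = next((i for i, s in enumerate(segments) if s in LIQUIDS), None)
    if nucleus is None:
        raise ValueError("a syllable needs a vowel or a syllabic liquid")
    onset = nucleus
    coda = len(segments) - nucleus - 1
    # The nucleus is written as V even when it is a syllabic liquid.
    return onset, coda, "C" * onset + "V" + "C" * coda


print(syllable_type(["s", "t", "r", "a", "x"]))            # strach
print(syllable_type(["p", "r", "s", "t"]))                 # prst
print(syllable_type(["f", "s", "k", "v", "j", "e", "t"]))  # vzkvět
```

The onset and coda lengths returned here correspond to the row and column labels used in the tables under (4) to (6).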
(3) Previous statistical analyses:
Kučera – Monroe 1968 (43,000 syllables)
Ludvíková 1985 (5,000 syllables)
– both based on actual texts
My source: Slovník spisovné češtiny (third edition 2003)
– 146,754 syllables in over 46,000 words (a part of the Czech Phonological Lexical
Corpus, <http://www.ujc.cas.cz/phword>)
– based on lexemes
(4) Logically possible syllable types for Czech
         -Ø       -C       -CC       -CCC
Ø-       V        VC       VCC       VCCC
C-       CV       CVC      CVCC      CVCCC
CC-      CCV      CCVC     CCVCC     CCVCCC
CCC-     CCCV     CCCVC    CCCVCC    CCCVCCC
CCCC-    CCCCV    CCCCVC   CCCCVCC   CCCCVCCC
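The grid in (4) is the product of five onset sizes (no consonant to four consonants) and four coda sizes (no consonant to three consonants), which gives 20 logically possible types. A minimal sketch of that enumeration follows; the names `ONSET_SIZES`, `CODA_SIZES` and `syllable_grid` are hypothetical. Like the table, it stops at four onset consonants and leaves out the exceptional five-consonant onset mentioned under (2).

```python
ONSET_SIZES = range(0, 5)  # rows Ø- to CCCC-
CODA_SIZES = range(0, 4)   # columns -Ø to -CCC


def syllable_grid():
    """Build the onset-by-coda grid of CV skeletons."""
    return [["C" * onset + "V" + "C" * coda for coda in CODA_SIZES]
            for onset in ONSET_SIZES]


grid = syllable_grid()
for row in grid:
    # Left-align each skeleton so the columns line up.
    print("  ".join(f"{cell:<8}" for cell in row))
print(sum(len(row) for row in grid), "types")
```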
(5) Percentual proportion of the syllable types
         -Ø        -C        -CC       -CCC       Total
Ø-       3.01 %    1.41 %    0.1 %     0.0007 %   4.52 %
C-       47.99 %   18.22 %   3.74 %    0.08 %     70.03 %
CC-      17.02 %   5.51 %    1.05 %    0.03 %     23.62 %
CCC-     1.31 %    0.38 %    0.05 %    0.005 %    1.76 %
CCCC-    0.05 %    0.018 %   0.001 %   0 %        0.07 %
Total    69.39 %   25.55 %   4.94 %    0.12 %     100 %
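The marginal totals in (5) can be recomputed from the individual cells. The sketch below is only a consistency check on the rounded percentages as read from the table, with rows as onsets and columns as codas; it does not use the underlying counts, and all names in it are hypothetical. Small deviations are expected because every cell is itself rounded.

```python
# Cell values from table (5); columns are -Ø, -C, -CC, -CCC.
TABLE = {
    "Ø-":    [3.01, 1.41, 0.1, 0.0007],
    "C-":    [47.99, 18.22, 3.74, 0.08],
    "CC-":   [17.02, 5.51, 1.05, 0.03],
    "CCC-":  [1.31, 0.38, 0.05, 0.005],
    "CCCC-": [0.05, 0.018, 0.001, 0.0],
}
REPORTED_ROW_TOTALS = {"Ø-": 4.52, "C-": 70.03, "CC-": 23.62,
                       "CCC-": 1.76, "CCCC-": 0.07}
REPORTED_COLUMN_TOTALS = [69.39, 25.55, 4.94, 0.12]


def row_totals(table):
    """Share of each onset size, summed over all coda sizes."""
    return {onset: sum(cells) for onset, cells in table.items()}


def column_totals(table):
    """Share of each coda size, summed over all onset sizes."""
    return [sum(column) for column in zip(*table.values())]


def max_deviation(table):
    """Largest gap between a recomputed and a reported total."""
    rows = row_totals(table)
    row_dev = max(abs(rows[k] - REPORTED_ROW_TOTALS[k]) for k in rows)
    col_dev = max(abs(c - r) for c, r in
                  zip(column_totals(table), REPORTED_COLUMN_TOTALS))
    return max(row_dev, col_dev)


print(f"open syllables (-Ø column): {column_totals(TABLE)[0]:.2f} %")
print(f"largest deviation: {max_deviation(TABLE):.3f} percentage points")
```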
(6) Percentual proportion of the syllable types in various classes of words

Class 1: Nouns, adjectives, numerals
         -Ø        -C        -CC       -CCC      Total
Ø-       2.8 %     1.37 %    0.11 %    0 %       4.28 %
C-       51.68 %   14.85 %   4.93 %    0.07 %    71.54 %
CC-      16.53 %   4.62 %    1.33 %    0.01 %    22.49 %
CCC-     1.2 %     0.38 %    0.05 %    0.001 %   1.64 %
CCCC-    0.04 %    0.018 %   0.001 %   0 %       0.06 %
Total    72.25 %   21.24 %   6.42 %    0.08 %    100 %

Class 2: Pronouns
         Ø-        C-        CC-       CCC-      CCCC-     Total
-Ø       1.57 %    50.63 %   21.07 %   0 %       0 %       73.27 %
-C       0.31 %    18.55 %   4.72 %    0 %       0 %       23.58 %
-CC      0 %       2.83 %    0.31 %    0 %       0 %       3.14 %
-CCC     0 %       0 %       0 %       0 %       0 %       0 %
Total    1.89 %    72.01 %   26.1 %    0 %       0 %       100 %

Class 3: Verbs
         Ø-        C-        CC-       CCC-      CCCC-     Total
-Ø       3.78 %    59.7 %    13.1 %    2.52 %    0.5 %     79.6 %
-C       0 %       10.08 %   2.02 %    0 %       0 %       12.09 %
-CC      0 %       4.79 %    1.26 %    0 %       0 %       6.05 %
-CCC     0 %       1.76 %    0.50 %    0 %       0 %       2.27 %
Total    3.78 %    76.32 %   16.88 %   2.52 %    0.5 %     100 %

Class 4: Adverbs
         Ø-        C-        CC-       CCC-      CCCC-     Total
-Ø       2.69 %    49.46 %   24.57 %   1.45 %    0.04 %    78.22 %
-C       1.33 %    12.06 %   4.04 %    0.38 %    0.008 %   17.83 %
-CC      0.12 %    3.1 %     0.57 %    0.07 %    0.008 %   3.86 %
-CCC     0 %       0.07 %    0 %       0.03 %    0 %       0.1 %
Total    4.15 %    64.69 %   29.17 %   1.93 %    0.06 %    100 %

Class 5: Prepositions, conjunctions, particles, onomatopoeia
         -Ø        -C        -CC       -CCC      Total
Ø-       4.16 %    2.16 %    0.45 %    0 %       6.82 %
C-       51.75 %   17.8 %    2.83 %    0 %       72.38 %
CC-      14.81 %   4.66 %    0.45 %    0 %       19.97 %
CCC-     0.33 %    0.33 %    0.17 %    0 %       0.83 %
CCCC-    0 %       0 %       0 %       0 %       0 %
Total    71.05 %   24.96 %   3.99 %    0 %       100 %
(7) Conclusions from the tables under (6)
(a) All parts of speech prefer the same syllable types: i) open syllables, ii) syllables ending in one consonant, iii) the syllable type CV, which is the most common.
(b) Indeclinable words have a simpler syllable structure; they allow neither syllables beginning with four consonants nor syllables ending in three consonants.
(c) Pronouns have the simplest syllable structure: they only allow syllables to begin and end with up to two consonants.
(d) All closed syllables in verbs begin with a single consonant or a combination of no more than two consonants.
(8) Average number of syllables per word, and percentual proportions of words containing 1–4 syllables
                      Average number    Words containing
                      of syllables      1           2           3           4
                      per word          syllable    syllables   syllables   syllables
Nouns, adj., numer.   3.25              3.81 %      21.5 %      36.56 %     25.72 %
Pronouns              2.19              24.83 %     39.31 %     27.59 %     8.28 %
Verbs                 3.04              2.01 %      24.48 %     46.35 %     22.76 %
Adverbs               3.21              2.16 %      21.11 %     40.72 %     27.55 %
Indeclinable words    1.96              37.46 %     38.11 %     17.26 %     5.54 %
Total                 3.19              3.44 %      22.33 %     39.17 %     25.02 %
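The averages in (8) can be cross-checked against the length distributions. For pronouns the four percentages already add up to about 100 %, so their weighted mean should reproduce the reported average. For the other classes some words are longer than four syllables, and only a lower bound is available. The sketch below computes that bound under the assumption that every remaining word has exactly five syllables; the function name `mean_lower_bound` is hypothetical.

```python
# Percentages of words with 1 to 4 syllables, taken from table (8).
PRONOUNS = {1: 24.83, 2: 39.31, 3: 27.59, 4: 8.28}
NOUNS_ADJ_NUM = {1: 3.81, 2: 21.5, 3: 36.56, 4: 25.72}


def mean_lower_bound(shares, next_length=5):
    """Lower bound on the mean word length in syllables.

    Assumption: words not covered by `shares` are counted as having
    `next_length` syllables, the shortest length they can have.
    """
    covered = sum(shares.values())
    remainder = max(0.0, 100.0 - covered)  # rounding may push `covered` past 100
    weighted = sum(length * share for length, share in shares.items())
    return (weighted + next_length * remainder) / 100.0


print(f"pronouns: {mean_lower_bound(PRONOUNS):.2f}")
print(f"nouns, adjectives, numerals: at least {mean_lower_bound(NOUNS_ADJ_NUM):.2f}")
```

The pronoun value matches the reported 2.19, and the bound for nouns, adjectives and numerals stays below the reported 3.25, as it should.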
(9) Percentual proportions of different nuclei
                        Occurrence in syllabic nuclei
Short vowels (S)        77.63 %
Long vowels (G)         19.10 %
Diphthongs (D)          2.00 %
Syllabic liquids (R)    1.28 %
(10) Vocalic patterns
E.g. SGS = words with three syllables, the first being short, the second long and the third short: polévka “soup”
SS = words with two syllables, both being short: maso “meat”
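A vocalic pattern of this kind can be approximated directly from the spelling. The sketch below is a rough illustration and not the method used for the corpus, which works with phonological transcriptions. It assumes Czech orthography in precomposed characters, treats only "ou" as a diphthong, and counts r or l as syllabic when it is not word-initial and has no vowel next to it. Words with a hiatus spelled "ou" and loanwords with "au" or "eu" are therefore not handled correctly. The function name `vocalic_pattern` is hypothetical.

```python
SHORT = set("aeiouyě")
LONG = set("áéíóúůý")
VOWELS = SHORT | LONG
DIPHTHONGS = {"ou"}  # assumption: "au" and "eu" in loanwords are ignored
LIQUIDS = set("rl")


def vocalic_pattern(word):
    """Return a string over S, G, D, R with one letter per syllable nucleus."""
    word = word.lower()
    pattern = []
    i = 0
    while i < len(word):
        ch = word[i]
        if word[i:i + 2] in DIPHTHONGS:
            pattern.append("D")
            i += 2
            continue
        if ch in LONG:
            pattern.append("G")
        elif ch in SHORT:
            pattern.append("S")
        elif ch in LIQUIDS and i > 0:
            # Syllabic only if neither neighbour is a vowel.
            prev_is_vowel = word[i - 1] in VOWELS
            next_is_vowel = i + 1 < len(word) and word[i + 1] in VOWELS
            if not prev_is_vowel and not next_is_vowel:
                pattern.append("R")
        i += 1
    return "".join(pattern)


for example in ["polévka", "maso", "prst", "brouk"]:
    print(example, vocalic_pattern(example))
```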
(11) Five most common vocalic patterns for words containing 2–4 syllables
Two-syllable words    Three-syllable words    Four-syllable words
SS (48.39 %)          SSS (40.2 %)            SSSS (39.11 %)
SG (20.95 %)          SSG (20.42 %)           SSSG (24.02 %)
GS (16.14 %)          SGS (13.22 %)           SSGS (8.39 %)
SD (3.59 %)           GSS (6.41 %)            SGSG (5.08 %)
DS (2.86 %)           GSG (4.99 %)            SSGG (4.54 %)
(12) Percentual proportions of the number of closed syllables per word
                      Number of syllables in words
                      2          3          4
0 closed syllables    21.48 %    24.28 %    28.07 %
1 closed syllable     63 %       55.54 %    48.41 %
2 closed syllables    15.52 %    18.92 %    20.67 %
3 closed syllables    –          1.25 %     2.75 %
4 closed syllables    –          –          0.1 %
(13) Findings of the research
(a) A syllable can begin with four consonants (exceptionally with five) and end in three
consonants, but no syllable can contain more than six consonants in total.
(b) Preference for open syllables.
(c) Preference for syllables beginning with a single consonant.
(d) CV is the most common syllable type.
(e) These preferences are shared by all parts of speech, but the syllable structure of pronouns and
indeclinable words is less complex.
(f) Preference for words containing only short vowels.
(g) Preference for words containing one closed syllable at the end of words or no closed syllable
at all.
References
Czech Phonological Lexical Corpus: <http://www.ujc.cas.cz/phword>
Kučera, H. – Monroe, G. K. (1968): A Comparative Quantitative Phonology of Russian, Czech, and
German. New York: Elsevier.
Těšitelová, M. et al. (1985): Kvantitativní charakteristiky současné češtiny. Praha: Academia.