Malayalam Text Compression

Sajilal Divakaran
School of Engineering and Computing Sciences, FTMS College, Kuala Lumpur, Malaysia

Biji C. L.
University of Kerala, Thiruvananthapuram, Kerala, India 695581

Anjali C.
University of Kerala, Thiruvananthapuram, Kerala, India 695581

Achuthsankar S. Nair
University of Kerala, Thiruvananthapuram, Kerala, India 695581
Abstract

In natural language processing and analysis, a very large number of problems remain unaddressed, particularly in Malayalam computing. For instance, the informational analysis of Malayalam text has itself not been widely studied. Language studies of English based on the concepts of information theory are well established, as evidenced by the success of text compression methods for English. However, to the best of our knowledge, not a single attempt at Malayalam text compression has been reported, even though Unicode-based Malayalam content is increasing in Malayalam blogs, Wikipedia and websites. The general motivation behind every compression scheme is the optimum use of resources such as data, space or transmission capacity.

The availability of a standard Unicode script and of the Google online translation service on the internet has encouraged the use of the Malayalam language. The statistics of Malayalam Wikipedia clearly indicate that Malayalam content has been increasing steadily since 2006. Moreover, the searchable archives of Malayalam publications, including eBooks and journals, are likely to grow in the coming years. This makes it worthwhile to think seriously about Malayalam text compression for the optimum use of resources. Every language normally has certain hidden, statistically significant features and a certain redundancy. Exploiting these features helps us to frame a suitable text compression tool. Motivated by the language studies of English based on Shannon's theory, we propose an informational analysis of Malayalam text in our framework. Every language structure has a certain bias: some characters are more likely to occur than others. In general, the symbols of a language follow an unequal probability distribution.

Every compression algorithm tries to represent the input message in a new form with fewer bits by exploiting this probability distribution. The proposed Malayalam text compressor follows a variable-length encoding technique in which the most probable Unicode characters are represented by fewer bits. Moreover, we were able to derive a theoretical limit of 21% for Malayalam text compression. A compression tool was developed using Java/J2EE with Apache Tomcat as the web server. Since no similar work has been reported, we created a small dataset from Malayalam blogs, Wikipedia and websites for testing the performance of the developed tool. The proposed Malayalam text compressor based on variable-length coding achieved a compression ratio of 17% in the best case. The performance of the proposed algorithm is analysed in terms of percentage of compression and compression ratio.

Keywords: Compression, Entropy coding, Natural language processing

I. Introduction

Malayalam is the mother tongue of about 3 crore (30 million) people residing in Kerala, the southern state of India. As historical background, Malayalam is the youngest of the four major Dravidian languages spoken in South India and is the official language of Kerala. It is from the traditions of Sanskrit, the Indo-Aryan language, that Malayalam draws its rich diversity of words and compound alphabets (conjuncts). Malayalam is closer to the pre-Tamil Malayalam in phonology, morphology and syntax; the major feature that sets the two apart is the heavy Sanskrit borrowing in Malayalam [1]. It is only from the 8th century AD that Malayalam literature developed independently of Tamil.

Languages are carriers of communication. Computer technology has advanced so far that people can now convey messages and share their thoughts in their own mother tongue. It is commendable that the Kerala Government has given more importance to Malayalam computing in information technology. This opens up a new world of opportunities for many who are not proficient in English to get in touch with the wider world. Malayalam content started appearing on the internet during the early 2000s. The statistics of Malayalam Wikipedia content show a progressive rise since 2006. According to the statistics published by Malayalam Wikipedia in April 2012 [2], there are nearly 24,000 articles.

Figure 1.1: Wiki Malayalam Content Statistics (Malayalam content, on a scale of 0 to 30,000, plotted against year)
The growth rate of Malayalam articles displayed in Figure 1.1 clearly indicates a need for Malayalam compression tools in the near future. Compression is required for the effective storage of information and for its smooth transmission over a channel. Compression is employed everywhere: images on the web generally follow the JPEG or GIF standards, and audio files follow the MP3 standard. Moreover, several file systems automatically compress files when they are stored. The possibility of compression was first studied in detail for the English language by the great American mathematician Claude Elwood Shannon. Shannon's seminal paper [3] clearly stated that sequences of English text are not formed at random but usually follow a statistical structure. For example, 'e' occurs more frequently than 'q'. This structure can be exploited to achieve a smaller representation of the input file. Under the same assumption, as a first step we took the frequency of occurrence of Malayalam characters from a study report [4]. In addition to the characters specified in the report, the space and the full stop were included. The informational analysis of Malayalam text is carried out by creating a dataset from popular Malayalam blogs and websites [5, 6]. To date, no state-of-the-art work is known to perform compression of Malayalam text. Hence no benchmarks exist for comparison with the proposed work.
II. Entropy and Compressibility

Communication is the process of sharing ideas, thoughts, facts and information from one person to another. Languages developed as a means of effective communication. Irrespective of the diversity in human biological traits, every communication system follows a common process of transmitting a message from one point to another. The hidden statistical nature of the communication process was first recognized by the great research mathematician Claude Elwood Shannon, who used mathematics to unify the theory [3]. In this famous work, Shannon emphasized that languages are not framed in a random manner; a specific style is followed in framing a language. Most of the advances in digital technology since then, including the art of connecting people through social networking sites, blogs and email, draw their inspiration from Shannon's novel idea.

The information content of a message is the amount of surprise it creates in us [7]; in other words, an unusual scenario carries more information than a usual one. Shannon defined the measure of information contained in a message based on the probability of each symbol in it. Suppose there are n symbols {a_1, a_2, ..., a_n} emanating independently of each other from a source, with probabilities {p_1, p_2, ..., p_n} respectively. Then the information content of any message of size k made out of these symbols is given by

    I = -\sum_{i=1}^{k} \log_2 p_i        (1)

where p_i in (1) is the probability of the symbol at position i of the message. For example, the information content of a phrase such as "vande matharam" can be computed, using the standard probabilities of occurrence of the English letters [7], as 54.79 bits.

The symbol a_i, which has probability p_i of occurring, is expected to occur n*p_i times in the whole message. Thus the total information I_T of the message is given by

    I_T = -\sum_{i} (n p_i) \log_2 p_i        (2)

and the average information per symbol is the information entropy H, given by

    H = I_T / n = -(1/n) \sum_{i} (n p_i) \log_2 p_i = -\sum_{i} p_i \log_2 p_i        (3)
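As a rough illustration of Equations (1) and (3), the following Python sketch computes the information content of a message and the entropy of a source. The four-symbol distribution `toy_probs` and the function names are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from the study.

```python
import math

def information_content(message, probs):
    """Eq. (1): sum of -log2(p) over every symbol position in the message."""
    return -sum(math.log2(probs[ch]) for ch in message)

def entropy(probs):
    """Eq. (3): H = -sum(p_i * log2(p_i)); zero-probability symbols add nothing."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical four-symbol source, used only for illustration.
toy_probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

print(information_content("abad", toy_probs))  # 1 + 2 + 1 + 3 = 7.0 bits
print(entropy(toy_probs))                      # 1.75 bits per symbol
```

A fixed-length code for four symbols would need 2 bits per symbol, so this toy source could in principle be compressed by (2 - 1.75)/2 = 12.5%.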
Entropy is a measure of uncertainty. The significance of information entropy is that it tells us the minimum number of bits required to encode the message digitally. Thus entropy provides a lower bound for the best possible lossless compression strategy. Intuitively, entropy reveals the extent to which a message can be compressed. Shannon used the English language to define a measure of information [3]. The number of bits required to represent English text, if all letters and the space are considered equally probable, is log2(27) = 4.75 bits. In practice, different letters of the English alphabet have different occurrence rates. English text can be compressed to 42.55% by taking advantage of this redundancy [7].

A similar approach can be adapted to the Malayalam language. In the experimental analysis, the frequencies of occurrence of Malayalam characters are taken from the report [4]. In addition, we considered the frequencies of occurrence of the space and the full stop and computed the entropy. Thus the total character set under consideration in our study is limited to 125 (Appendix Table 1). For Malayalam, the number of bits required per character, if all letters, the space and the full stop are considered equally probable, is log2(125) = 6.97 bits, and the entropy calculated from the frequencies of occurrence (Appendix Table 1) is 5.47 bits. The percentage of compressibility for Malayalam may therefore be computed as (6.97 - 5.47)/6.97, or about 21%. Thus Malayalam text can theoretically be compressed by almost 21%. In practice, algorithm overheads will make the achievable compression somewhat lower. Based on these observations we created a simple, statistically motivated Malayalam text compression tool.

III. Malayalam Text Compressor
The general motivation for developing a compression tool is the effective use of resources. Compression also makes file transfer easier and faster. We provide a prototype compressor and decompressor for the Malayalam language. Any compression tool has two main components: (i) an encoding algorithm, which takes a message and generates a new compressed representation with fewer bits, and (ii) a decoding algorithm, which reconstructs the original message from the compressed representation [8]. Encoding forms the heart of any compression algorithm. Encoding is of two types: (i) fixed-length encoding and (ii) variable-length encoding. To take advantage of the probabilistic model, a variable-length code is preferred for a better compressed representation. This helps to reduce the storage requirement of files. Figure 3.1 shows a schematic representation of the proposed algorithm.

Figure 3.1: Schematic representation of the proposed Malayalam text compressor (Malayalam text -> Unicode translation -> variable-length encoding -> compressed data)
The input to the compressor is a Malayalam text file in UTF-8 encoding. The Malayalam alphabet includes vowels (svaram), consonants (vyanjanam) and chillus. In our experiments, we selected a total of 125 Malayalam characters. Since our main intention is to develop a prototype, no separate study was conducted to find the probability values of Malayalam characters. We used the results of the study report [4] to find the entropy of Malayalam. Similar studies of English show that the space has a frequency of occurrence slightly higher than that of the most frequent letter 'e', and that punctuation (here, the full stop only) ranks fourth. The same is assumed in our work.

The first step is to convert the Malayalam sequence to be compressed into the corresponding Unicode code points (Appendix Table 2). Unicode assigns a unique number (code point) to every character in use; the Malayalam code points all fit in a 16-bit code unit. It is a standard used for storing and transmitting documents in natural languages such as Malayalam, Spanish and Chinese. Based on the probability of occurrence of each Unicode character, a variable-length encoding is performed. The most probable Unicode characters are represented by shorter codes and the least probable by longer ones. Based on our experiments, we found that the 125 characters can be represented by codes of length 1 to 6 bits (Appendix Table 2). A fixed-length code for all 125 characters needs 7 bits per character. In order to provide a realistic performance analysis, we compare our results with this 7-bit representation rather than with the 16-bit Unicode representation. The output of the compression algorithm consists of the variable-length codes along with an overhead of 3-bit codes. The decompression algorithm takes the compressed file along with the overhead and performs the reverse operation to recover the Unicode Malayalam characters.

IV. Results & Discussion

To test our compression algorithm in the absence of a standard dataset, we report results on five selected web resources, chosen to ensure a mix of classical and modern writing. Textual content was taken from the following web resources [9, 10, 11, 12, 13]:
• Wiki Grandasala (contains classical poems and classical articles)
• Wiki esopkathakal (contains many of Aesop's fables in Malayalam)
• Mini-minilokaam (a popular contemporary blog containing many short stories)
• Pattepedam ramji blogspot (a popular blog containing many stories)
• Mathrubhumi News (a popular Malayalam newspaper)

The selected files range in size from 1 KB to 1 MB. In each case we chose 5 different text passages. The results are given in Table 4.1 and Figure 4.1.
Figure 4.1.a: Percentage compression of the proposed Malayalam compressor (Malayalam content: classical poems; % compression for Input Files 1 to 5)

It was noticed that when the input files contain numbers and other symbols in addition to the 125 Malayalam characters we selected, the percentage compression drops to a much lower level. To analyze the performance, we selected some famous literary poems by the great poets Kumaranasan and Ulloor: Nalini, Leela, Karuna, ChandalaBikshuki and Bhakthi Deepika. In this worst case, the proposed compression algorithm provides a compression of 8.4%.

Figure 4.1.b: Percentage compression of the proposed Malayalam compressor (Malayalam content: short stories; % compression for Input Files 1 to 5)

Table 4.1: Performance measure of the proposed algorithm

In the normal case we need 7 bits to represent each character of the whole character set (125 characters). In our proposed method a variable-length code is assigned based on the probability distribution. The percentage compression [7] of the proposed algorithm is calculated as follows:

    % Compression = ((Bits required before compression - Bits required after compression) / Bits required before compression) x 100

The proposed compressor provides a best compression of 17%. This indicates that we have reached more than 75% of the theoretical limit of 21% dictated by the entropy. In the worst case the proposed algorithm provides a compression of 8.4%. Figures 4.1.a, 4.1.b and 4.1.c show the performance measure for the various selected input files.
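The percentage-compression measure used in this section can be checked with a few lines of Python. This is only a sketch: the passage length of 1000 characters and the compressed size of 5810 bits are hypothetical numbers picked to land on the reported best case, and the function names are illustrative.

```python
def fixed_length_bits(num_chars, bits_per_char=7):
    """Size of the 7-bit fixed-length baseline used for comparison."""
    return num_chars * bits_per_char

def percentage_compression(bits_before, bits_after):
    """(bits before - bits after) / bits before, expressed as a percentage."""
    return (bits_before - bits_after) / bits_before * 100.0

# Hypothetical passage: 1000 characters that the compressor stores in 5810 bits.
before = fixed_length_bits(1000)   # 7000 bits
after = 5810
print(round(percentage_compression(before, after), 1))  # 17.0
```

Measuring against the 7-bit baseline rather than 16-bit Unicode keeps the reported figures conservative, since a 16-bit baseline would make every result look far larger.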
When the input files contain fewer symbols and numbers, the percentage of compression improves to an average of 12.7%. For this analysis we used Malayalam content from Wiki esopkathakal, Mini-minilokaam and Pattepedam ramji blogspot. When we selected passages containing only the 125 Malayalam characters under consideration, the percentage of compression improved further to 17%. This Malayalam content was extracted from editorials of the online Mathrubhumi news.

Figure 4.1.c: Percentage compression of the proposed Malayalam compressor (Malayalam content: editorials; % compression for Input Files 1 to 5)

A voluminous dataset needs to be compiled to conduct further studies. However, the present results are themselves unique, as there are no comparable results reported in the literature.

V. Conclusion

Malayalam text compression opens a fresh area of research. A comprehensive study of Malayalam text compression was carried out and a prototype was developed, perhaps for the first time. We estimated the entropy of Malayalam as 5.47 bits/character. The number of bits required to represent Malayalam text, if all letters, the space and the full stop are assumed equally probable, is log2(125) = 6.97 bits. On this basis it can be concluded that Malayalam text can be compressed by a maximum of 21%. No standard dataset is available to date for Malayalam, and one needs to be developed for testing. As a future extension, we intend to develop a benchmark dataset for testing similar works. The dataset used in this work was taken from Malayalam Wikipedia and blogs. We obtained a compression ratio of 17% in the best case. The compression can be further improved by providing an adaptive transition table rather than a static one. Future enhancements include the realization of a Huffman-based Malayalam compressor and a more realistic compressor that handles both Malayalam and English alphabets along with numbers.

References
[1] Grantha, Vattezhuthu, Kolezhuthu, Malayanma, Devanagiri, Brahmi and Tamil alphabets. Available: http://cradhakrishnan.info/alphabet.htm. Accessed on 20 Dec. 2012.
[2] Wiki. Available: http://ml.wikipedia.org/wiki. Accessed on 13 Jan. 2013.
[3] C. E. Shannon (1948). "A Mathematical Theory of Communication". The Bell System Technical Journal, Vol. 27, pp. 379-423.
[4] S. Prema and Manu Joseph (2001). "Malayalam frequency count study report". Department of Linguistics, University of Kerala.
[5] Thanimalayalam. Available: http://thanimalayalam.org/. Accessed on 17 Feb. 2013.
[6] Malayalam blogkut. Available: http://malayalam.blogkut.com/. Accessed on 3 Dec. 2012.
[7] K. S. Arun and Achuthsankar S. Nair (20 2). "It's 60 years since 'kpb wcy xz' became more informative than 'I love you'". IEEE Potentials, Vol. 29, pp. 16-19.
[8] Salomon (2004). "Data Compression: The Complete Reference". Springer, pp. 1-14.
[9] Malayalam Wiki Source. Available: http://ml.wikisource.org. Accessed on 11 Dec. 2012.
[10] Mini-kathakal. Available: http://mini-kathakal.blogspot.in/. Accessed on 23 Dec. 2012.
[11] Pattepadamramji. Available: http://pattepadamramji.blogspot.in/. Accessed on 2 Jan. 2013.
[12] Malayalam. Available: http://org/wiki/esopkathakal/. Accessed on 10 Jan. 2013.
[13] Mathrubhumi. Available: http://www.mathrubhumi.com/. Accessed on 21 Feb. 2013.
Appendix

1. Entropy of Malayalam Characters

Alphabet    Total Frequency (Occurrence Count)    Probability    -log2(p)
അ    14311    0.00820    6.9302
ആ    6724    0.00385    8.0209
ഇ    6539    0.00375    8.0589
ഈ    1109    0.00064    10.6096
ഉ    3691    0.00212    8.8817
ഊ    102    0.00006    14.0247
ഋ    13    0.00001    16.6096
എ    11366    0.00651    7.2631
ഏ    1382    0.00079    10.3059
ഐ    959    0.00055    10.8283
ഒ    3258    0.00187    9.0627
ഓ    1933    0.00111    9.8152
ഔ    115    0.00007    13.8023
അം    464    0.00027    11.8548
ക    53088    0.03042    5.0388
ഖ    2192    0.00126    9.6324
ഗ    10640    0.00610    7.3570
ഘ    1533    0.00088    10.1503
ങ    649    0.00037    11.4002
ച    8780    0.00503    7.6352
ഛ    38    0.00002    15.6096
ജ    10606    0.00608    7.3617
ഝ    12    0.00001    16.6096
ഞ    723    0.00041    11.2521
ട    29255    0.01676    5.8988
ഠ    698    0.00040    11.2877
ഡ    8019    0.00460    7.7642
ഢ    55    0.00003    15.0247
ണ    17570    0.01007    6.6338
ത    42772    0.02451    5.3505
ഥ    987    0.00057    10.7768
ദ    9439    0.00541    7.5302
ധ    5058    0.00290    8.4297
ന    47153    0.02702    5.2098
പ    37563    0.02153    5.5375
ഫ    5127    0.00294    8.4100
ബ    8925    0.00511    7.6125
ഭ    6811    0.00390    8.0023
മ    42978    0.02463    5.3434
യ    51155    0.02931    5.0925
ര    46469    0.02663    5.2308
റ    19509    0.01118    6.4829
ല    26390    0.01512    6.0474
ള    17082    0.00979    6.6745
ഴ    6582    0.00377    8.0512
വ    45964    0.02634    5.2466
ശ    10617    0.00608    7.3617
ഷ    9151    0.00524    7.5762
സ    41359    0.02370    5.3990
ഹ    7194    0.00412    7.9231
ക്ക    28881    0.01655    5.9170
ന്ന    23059    0.01321    6.2422
ത്ത    18699    0.01072    6.5436
ട്ട    10970    0.00629    7.3127
പ്പ    9502    0.00545    7.5195
ച്ച    9011    0.00516    7.5984
ങ്ങ    7922    0.00454    7.7831
ണ്ട    7379    0.00423    7.8851
ന്റ    6074    0.00348    8.1667
റ്റ    5175    0.00297    8.3953
ന്ത    5065    0.00290    8.4297
ല്ല    5050    0.00289    8.4347
ക്ഷ    3899    0.00223    8.8087
ഞ്ഞ    3082    0.00177    9.1420
ള്ള    2934    0.00168    9.2173
മ്പ    2911    0.00167    9.2259
മ്മ    2777    0.00159    9.2968
ങ്ക    2527    0.00145    9.4297
സ്ഥ    2103    0.00121    9.6908
ന്ദ    2050    0.00117    9.7393
സ്റ്റ    1901    0.00109    9.8415
ഞ്ച    1473    0.00084    10.2173
യ്യ    1427    0.00082    10.2521
ദ്ധ    1351    0.00077    10.3429
ന്ധ    1049    0.00060    10.7028
സ്സ    1047    0.00060    10.7028
ണ്ണ    1023    0.00059    10.7270
ദ്ദ    879    0.00050    10.9658
ക്ത    876    0.00050    10.9658
ത്ഥ    477    0.00027    11.8548
ത്സ    447    0.00026    11.9092
ശ്ശ    366    0.00021    12.2173
ന്മ    364    0.00021    12.2173
ത്മ    255    0.00015    12.7028
വ്വ    249    0.00014    12.8023
ജ്ഞ    147    0.00008    13.6096
ച്ഛ    129    0.00007    13.8023
ബ്ബ    124    0.00007    13.8023
ഗ്ഗ    96    0.00006    14.0247
പമ    88    0.00005    14.2877
ന്ഥ    83    0.00005    14.2877
ഗ്ന    78    0.00004    14.6096
ത്ഭ    74    0.00004    14.6096
ഹ്ന    51    0.00003    15.0247
ണ്മ    37    0.00002    15.6096
ഗ്മ    18    0.00001    16.6096
ഡ്ഡ    9    0.00001    16.6096
ൽ    20226    0.00001    16.6096
ൾ    9647    0.00001    16.6096
ർ    24929    0.01159    6.4310
ൺ    2195    0.00553    7.4985
ൻ    14390    0.01429    6.1289
ാ    85839    0.00126    9.6324
ി    139814    0.00825    6.9214
ീ    13794    0.04919    4.3455
ു    89800    0.08012    3.6417
ൂ    11300    0.00648    7.2698
ൃ    2423    0.00139    9.4907
െ    41620    0.02385    5.3899
േ    21948    0.01258    6.3127
ൈ    3787    0.00217    8.8481
ൊ    5609    0.00321    8.2832
ോ    25062    0.01436    6.1218
ൌ    1591    0.00091    10.1018
ം    46418    0.02660    5.2324
ഃ    33    0.00002    15.6096
്റ    18909    0.01084    6.5275
്യ    15228    0.00873    6.8398
്വ    3773    0.00216    8.8548
്    2709    0.00155    9.3335
ൗ    80    0.00005    14.2877
space    209721    0.12018    3.0567
periods    76897    0.04407    4.5041
Total    1745067    1.0000

Entropy (-Σ p_i log2 p_i) = 5.47

Table 1: Entropy of Malayalam Characters

Total number of characters = 125
log2 125 = 6.97 bits
% of Compressibility = (6.97 - 5.47)/6.97 = 21.58%
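The summary figures under Table 1 can be reproduced approximately with the short Python sketch below. It takes the count for അ and the total from Table 1 and the entropy of 5.47 bits as given; the function names are illustrative, and small differences from the printed 21.58% are to be expected because the table works with rounded intermediate values.

```python
import math

TOTAL_COUNT = 1745067  # "Total" row of Table 1

def symbol_information(count, total=TOTAL_COUNT):
    """Probability of a symbol and its -log2(p) value, as in the table columns."""
    p = count / total
    return p, -math.log2(p)

def compressibility(num_symbols, entropy_bits):
    """Percentage saved relative to a fixed-length code of log2(num_symbols) bits."""
    fixed_bits = math.log2(num_symbols)
    return (fixed_bits - entropy_bits) / fixed_bits * 100.0

p, info = symbol_information(14311)          # first row of Table 1
print(round(p, 5), round(info, 2))           # 0.0082 6.93
print(round(math.log2(125), 2))              # 6.97
print(round(compressibility(125, 5.47), 1))  # 21.5
```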
2. Unicode to Bits Mapping

Alphabet    Unicode (decimal)    Code
അ    3333    10001
ആ    3334    00101
ഇ    3335    00111
ഈ    3336    110110
ഉ    3337    100010
ഊ    3338    001011
ഋ    3339    011001
എ    3342    10011
ഏ    3343    110100
ഐ    3344    111011
ഒ    3346    100011
ഓ    3347    101111
ഔ    3348    001010
അം    3330    111
ക    3349    11
ഖ    3350    101100
ഗ    3351    10110
ഘ    3352    110001
ങ    3353    000000
ച    3354    11111
ഛ    3355    010100
ജ    3356    11000
ഝ    3357    011011
ഞ    3358    111110
ട    3359    1010
ഠ    3360    111111
ഡ    3361    00000
ഢ    3362    010010
ണ    3363    0101
ത    3364    010
ഥ    3365    111010
ദ    3366    11011
ധ    3367    01101
ന    3368    101
പ    3370    1001
ഫ    3371    01011
ബ    3372    11110
ഷ    3383    11100
സ    3384    1000
ഹ    3385    00011
ക്ക    3349-3405-3349    1011
ന്ന    3368-3405-3368    1111
ത്ത    3364-3405-3364    0100
ട്ട    3359-3405-3359    10101
പ്പ    3370-3405-3370    11010
ച്ച    3354-3405-3354    11101
ങ്ങ    3353-3405-3353    00001
ണ്ട    3363-3405-3359    00010
ന്റ    3368-3405-3377    01000
റ്റ    3377-3405-3377    01010
ന്ത    3368-3405-3364    01100
ല്ല    3378-3405-3378    01110
ക്ഷ    3349-3405-3383    01111
ഞ്ഞ    3358-3405-3358    100100
ള്ള    3379-3405-3379    100101
മ്പ    3374-3405-3370    100110
മ്മ    3374-3405-3374    100111
ങ്ക    3353-3405-3349    101001
സ്ഥ    3384-3405-3365    101101
ന്ദ    3368-3405-3366    101110
ഞ്ച    3358-3405-3354    110010
യ്യ    3375-3405-3375    110011
ദ്ധ    3366-3405-3367    110101
ന്ധ    3368-3405-3367    110111
സ്സ    3384-3405-3384    111000
ണ്ണ    3363-3405-3363    111001
ദ്ദ    3366-3405-3366    111100
ക്ത    3349-3405-3364    111101
ത്ഥ    3364-3405-3365    000001
ത്സ    3364-3405-3384    000010
ണ്മ    3363-3405-3374    011000
ഗ്മ    3351-3405-3374    011010
ഡ്ഡ    3361-3405-3361    011100
ൽ    3453    0001
ൾ    3454    11001
ർ    3452    1110
ൺ    3450    101011
ൻ    3451    10000
ാ    3390    10
ി    3391    00
ീ    3392    10010
ു    3393    01
ൂ    3394    10100
ൃ    3395    101010
െ    3398    011
േ    3399    0000
ൈ    3400    100000
ൊ    3402    01001
ോ    3403    1101
ൌ    3404    110000
ം    3330    111
ഃ    3331    010110
്റ    3405-3377    0011
്യ    3405-3375    0111
്വ    3405-3381    100001
്    3405    101000
ൗ    3415    001111
space    32    0
periods    46    1

Table 2: Transition Table
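To make the use of the transition table concrete, here is a minimal Python sketch of a table-driven encoder and decoder built from a handful of Table 2 entries. The paper does not spell out how the 3-bit overhead is laid out; the sketch assumes, purely for illustration, that each code is preceded by a 3-bit field giving its length (lengths 1 to 6 fit in 3 bits), which is what lets the decoder tell where one code ends. It also handles only single-code-point symbols, not conjuncts, and the names `CODES`, `encode` and `decode` are hypothetical.

```python
# A few entries of Table 2, keyed by decimal Unicode code point.
CODES = {
    3349: "11",   # ka
    3390: "10",   # vowel sign aa
    3391: "00",   # vowel sign i
    3364: "010",  # ta
    32: "0",      # space
    46: "1",      # full stop
}
ENCODE = {chr(cp): code for cp, code in CODES.items()}
DECODE = {code: ch for ch, code in ENCODE.items()}

def encode(text):
    """Emit an assumed 3-bit length field followed by the table code, per symbol."""
    return "".join(format(len(ENCODE[ch]), "03b") + ENCODE[ch] for ch in text)

def decode(bits):
    """Read the 3-bit length, then that many code bits, until the input ends."""
    out, i = [], 0
    while i < len(bits):
        n = int(bits[i:i + 3], 2)
        out.append(DECODE[bits[i + 3:i + 3 + n]])
        i += 3 + n
    return "".join(out)

# Six symbols: ka, aa, space, ka, i, full stop.
text = chr(3349) + chr(3390) + " " + chr(3349) + chr(3391) + "."
packed = encode(text)
print(len(packed), 7 * len(text))  # 28 bits here versus 42 bits at 7 bits/char
assert decode(packed) == text
```

With a per-symbol length field the saving depends heavily on how the overhead is stored, so the numbers from this toy example should not be compared with the results reported in the paper.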