5thSASTech 2011, Khavaran Higher-education Institute, Mashhad, Iran. May 12-14.
Optimization of Farsi Letter Arrangement on Keyboard by
Simulated Annealing and Genetic Algorithms
Navid Samimi Behbahan
Department of Computer, Omidiyeh Branch, Islamic Azad University, Omidiyeh, Iran
Paper Reference Number: 8
Name of the Presenter: Navid Samimi Behbahan
Abstract
Nowadays the keyboard is one of the most common devices for entering data into a computer. Saving time is undoubtedly one of the most important goals of the present age. Optimizing the keyboard arrangement is therefore important, since it helps us access information in less time. A combined evolutionary algorithm can search the space of possible arrangements of Persian letters on a keyboard. It can reach an optimized arrangement with respect to an evaluation factor: the level of typing comfort of a given letter arrangement. In this paper, genetic and simulated annealing algorithms search for the best permutation of the 33 Persian letters on the keyboard. The evaluation criteria include three factors:
- alternating use of the hands when typing texts;
- not using the same hand to type two adjacent letters;
- the difficulty of typing each letter in the given arrangement.
Experiments on large and varied data sets (Persian texts) showed that the optimized arrangement produced by this hybrid algorithm performs better than the current arrangement.
Key words: permutation, genetic algorithm, simulated annealing algorithm, keyboard, optimum
arrangement
1. Introduction
Since the emergence of computers, the keyboard has been the main interface between humans and computers. An optimized arrangement of Persian letters benefits people who type Persian texts. Before computers, the keyboard was used on typewriters. Christopher Latham Sholes designed the rectangular keyboard for typewriters about 135 years ago, and critics have questioned this invention ever since. Not only the physical design of the keyboard but also the arrangement of letters on it has been criticized. Researchers have presented different algorithms to create optimized layouts, but these have mostly been applied to English. Unfortunately, no new design has been proposed for Persian. The arrangement recommended when Persian was first used on computers is still in use.
Many researchers have applied evolutionary algorithms and similar methods to the problem of arranging letters on the keyboard. Glover (1987) was the first researcher to study this area. He designed a genetic algorithm whose chromosomes were different permutations of the Latin characters on the keyboard. He also designed a series of appropriate genetic operators to reach the optimized permutation. Light and Anderson (1993) were the next researchers to work on this problem. They used a simulated annealing algorithm to search among different arrangements of English letters. Their evaluation function was based on typing time and word frequency. Klausler (2005) used an operator that repeatedly swapped the positions of two letters. He applied this algorithm to 26 English letters and 4 punctuation characters on three rows of ten keys. His evaluation function calculated how often the fingers move away from their basic position (Fig 1). The algorithm then moved toward arrangements that minimize this displacement frequency for a fixed text.
Fig 1: The basic position of the typist's fingers on the keyboard
Wagner et al. (2003) used an ant colony algorithm to search the space of keyboard arrangements. Their evaluation function used factors such as:
- the frequency of pressing different keys;
- the use of the hands for typing two adjacent letters;
- the use of one finger for typing two adjacent letters.
Moradi and Shiri (2006) applied a genetic algorithm to the problem of arranging Persian letters on a keyboard with three rows. They used the mutation operator to modify their initial populations. Their evaluation function measured three things: finger displacement, the balance of work between the two hands, and the sequence of hand use. The last favored arrangements in which the characters of a six-letter word are entered by alternating hands. The following sections define the problem and describe the combined algorithm in detail. They then present a new function for comparing keyboard arrangements, and finally the optimized arrangement along with the results.
2. Problem definition
As in other algorithms that search the space of possible arrangements, the keyboard geometry in this problem is fixed. We want to allocate 33 characters, namely the 32 letters of the Persian alphabet plus ء (hamzeh), to the three rows of the keyboard. These rows have 10, 11 and 12 keys respectively. The goal is to find the arrangement of letters on these keys that makes typing Persian texts most comfortable for the user.
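The fixed geometry can be made concrete with a small helper that maps a key (gene) index to a row and column. This Python sketch makes two illustrative assumptions not taken from the paper. First, keys are numbered 1 to 33 row by row, in the row order listed above (10, 11, then 12 keys). Second, the helper name `key_position` is hypothetical.

```python
# Assumption: rows are listed in the order stated in the text (10, 11, 12 keys)
# and keys are numbered 1..33 row by row.
ROW_LENGTHS = (10, 11, 12)
N_KEYS = sum(ROW_LENGTHS)  # 33 keys for 32 Persian letters plus hamzeh


def key_position(index: int) -> tuple[int, int]:
    """Map a 1-based key (gene) index to a 0-based (row, column) pair."""
    if not 1 <= index <= N_KEYS:
        raise ValueError(f"key index must be in 1..{N_KEYS}, got {index}")
    offset = index - 1
    for row, length in enumerate(ROW_LENGTHS):
        if offset < length:
            return row, offset
        offset -= length  # skip past this row
    raise AssertionError("unreachable: index already range-checked")


print(key_position(1), key_position(11), key_position(33))
```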
3. Statistical review of the Persian Letters
In this part we determine the frequency of each Persian character and of each pair of consecutive characters, as in Malas et al. (2008). The counting code was written in Matlab and run on a text of 19,092 words; the results are shown below. Table 1 gives the frequency of each Persian character. The most frequent character is "ا" (15.80%) and the least frequent is "ء" (0.02%). Table 2 gives the frequency of pairs of consecutive characters: rows correspond to the first character and columns to the second. The most frequent pair is formed by "ا" and "ن".
Table 1. Frequency of the Persian letters
Table 2. Frequency of Persian letter pairs
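In the original work, the counts were produced in Matlab. The Python sketch below shows one way to build both tables from a text. It assumes that any character outside the 33-letter alphabet (a space, punctuation, or the zero-width non-joiner) ends a pair. The function name is hypothetical.

```python
from collections import Counter

# 32 Persian letters plus hamzeh. Variants such as "آ" or the Arabic
# "ي"/"ك" would need to be normalized first (an assumption of this sketch).
PERSIAN_LETTERS = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهیء"


def letter_and_pair_frequencies(text, alphabet=PERSIAN_LETTERS):
    """Return (letter -> percent, (first, second) -> percent) for `text`.

    Any character outside `alphabet` breaks a pair, so pairs are only
    counted between directly adjacent letters.
    """
    letters, pairs = Counter(), Counter()
    prev = None
    for ch in text:
        if ch in alphabet:
            letters[ch] += 1
            if prev is not None:
                pairs[(prev, ch)] += 1
            prev = ch
        else:
            prev = None  # space, punctuation or other symbol: pair broken
    n_letters = sum(letters.values()) or 1  # avoid division by zero
    n_pairs = sum(pairs.values()) or 1
    letter_pct = {c: 100 * n / n_letters for c, n in letters.items()}
    pair_pct = {p: 100 * n / n_pairs for p, n in pairs.items()}
    return letter_pct, pair_pct
```

Table 1 corresponds to the first dictionary. Arranging the second dictionary with first letters as rows and second letters as columns gives a table with the layout of Table 2.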
4. Combination of Genetic and Simulated Annealing Algorithms
A hybrid algorithm was used to solve this problem. The main weakness of a genetic algorithm is that it can become trapped at a relative (local) maximum. This sometimes prevents it from finding the optimal answer, the absolute (global) maximum. A genetic algorithm usually keeps improving the current solutions. Reaching the optimal answer, however, may require first moving to worse answers, then improving them, and finally reaching the absolute maximum.

This method therefore does not use the improvement property that characterizes the genetic algorithm (always keeping an answer better than the current one). The simulated annealing algorithm is applied instead. In other words, the genetic operators only create new arrangements, and the genetic algorithm itself performs no selection of arrangements. Whether each new arrangement is accepted or rejected (as defined in Fig 2) is decided by the simulated annealing algorithm. The evaluation function used in this combined algorithm measures the level of comfort or difficulty of an arrangement. In every generation, the genetic operators are applied to the current population, which consists of different arrangements of Persian letters on the keyboard. The simulated annealing algorithm then moves the population in the direction that minimizes the evaluation function.
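The acceptance rule that replaces genetic selection is the Metropolis criterion used in Fig 2. The short Python sketch below (function name hypothetical) shows why this rule lets the search escape local optima. A worse arrangement with cost increase ΔEF is accepted with probability e^(−ΔEF/T). That probability is high while the temperature is high and vanishes as the system cools.

```python
import math


def acceptance_probability(delta_ef: float, temperature: float) -> float:
    """Metropolis rule: always accept an improvement (delta_ef < 0);
    accept a worse arrangement with probability exp(-delta_ef / T)."""
    if delta_ef < 0:
        return 1.0
    return math.exp(-delta_ef / temperature)


# The same cost increase is usually accepted while the system is hot
# and practically never once it has cooled.
for t in (10.0, 1.0, 0.01):
    print(t, round(acceptance_probability(0.5, t), 4))
```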
The degree of optimization of each member of the population (that is, an arrangement of Farsi letters on the keyboard) is obtained by applying the evaluation function to texts. These texts cover a variety of subjects, including political, scientific, historical and social topics.
// Sinit is the initial set of arrangements
// Sbest is the best set of arrangements found so far
// EFbest is the evaluation fitness of the best set of arrangements
// EFcurrent is the evaluation fitness of the current set of arrangements
// NNCP is the evaluation function (Eq. 1) applied to a set of arrangements
// Tmax is the initial temperature
// Tmin is the final temperature
// α is the cooling rate
// β is a constant
// Time is the time spent on the annealing process so far
// k is the number of calls of Metropolis at each temperature
Begin
    T = Tmax;
    Scurrent = Sinit;
    EFcurrent = NNCP( Scurrent );
    Sbest = Scurrent;
    EFbest = EFcurrent;
    Repeat
        For i = 1 to k
            Call Metropolis( Scurrent, Sbest, T );
        k = β × k;
        T = α × T;
    Until ( T < Tmin );   // stop once the temperature drops below Tmin
    Return( Sbest );
End. // Genetic-Simulated Annealing

Procedure Metropolis( Scurrent, Sbest, T )
// Snew is the new set of arrangements
Begin
    Selection( Scurrent );
    Snew = Mutation( Scurrent );
    EFnew = NNCP( Snew );
    ∆EF = EFnew − EFcurrent;
    If ( ∆EF < 0 ) Then
        Scurrent = Snew;
        EFcurrent = EFnew;
        If ( EFnew < EFbest ) Then
            Sbest = Snew;
            EFbest = EFnew;
        End If
    Else If ( random[0,1] < e^(−∆EF / T) ) Then
        Scurrent = Snew;
        EFcurrent = EFnew;
    End If
End. // Metropolis
Fig 2: Pseudocode for the presented hybrid algorithm (Genetic-Simulated Annealing)
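For readers who want to experiment, here is a compact Python sketch of the loop in Fig 2, using the Table 3 values as defaults. It is not the author's implementation and rests on three assumptions:
- Each chromosome undergoes its own Metropolis test, since the figure does not specify how a whole set of arrangements is evaluated.
- β = 1, because Table 3 gives no value for it.
- Mutation counts are assigned by cost rank.

`genetic_annealing`, `mutation_counts` and `mutate` are hypothetical names, and `cost` stands for the evaluation function of Eq. (1).

```python
import math
import random


def mutation_counts(costs, low=3, high=12):
    """Hypothetical rank rule: lowest cost gets `low` mutated genes, highest gets `high`."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    span = max(len(costs) - 1, 1)
    counts = [0] * len(costs)
    for rank, i in enumerate(order):
        counts[i] = low + round(rank * (high - low) / span)
    return counts


def mutate(chromosome, n_genes, rng):
    """Shuffle the letters held by `n_genes` randomly chosen keys."""
    child = list(chromosome)
    keys = rng.sample(range(len(child)), n_genes)
    letters = [child[k] for k in keys]
    rng.shuffle(letters)
    for k, letter in zip(keys, letters):
        child[k] = letter
    return child


def genetic_annealing(letters, cost, pop_size=10, t_max=10.0, t_min=1e-4,
                      alpha=0.95, k=15, beta=1.0, seed=0):
    """Mutation proposes new arrangements; the Metropolis rule accepts or rejects them."""
    rng = random.Random(seed)
    population = [rng.sample(letters, len(letters)) for _ in range(pop_size)]
    costs = [cost(c) for c in population]
    best_i = min(range(pop_size), key=lambda i: costs[i])
    best, best_cost = list(population[best_i]), costs[best_i]
    t = t_max
    while t >= t_min:                  # stop once T drops below Tmin
        for _ in range(round(k)):      # k Metropolis calls per temperature
            counts = mutation_counts(costs)
            for i in range(pop_size):  # assumption: one test per chromosome
                new = mutate(population[i], counts[i], rng)
                new_cost = cost(new)
                delta = new_cost - costs[i]
                if delta < 0 or rng.random() < math.exp(-delta / t):
                    population[i], costs[i] = new, new_cost
                    if new_cost < best_cost:
                        best, best_cost = list(new), new_cost
        k *= beta
        t *= alpha
    return best, best_cost
```

Pass the 33 letters and an implementation of the evaluation function as `cost`. The function then returns the lowest-cost arrangement seen during the run together with its cost.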
4.1. Population
The members of the population in this problem are different permutations of the Persian letters on the keyboard, that is, different arrangements. Each member can be regarded as a vector of Persian letters in which each index corresponds to one key. A vector of length 33 is one chromosome (one member of the population), and the i-th letter of the vector is assigned to the key labeled i on the keyboard. Fig 3 shows the gene indexes of a chromosome on the keyboard. Fig 4 shows the chromosome corresponding to the present arrangement of the keyboard.
Fig 3: The indexes of genes of each chromosome on keyboard
Gene index:  1   2   3   4   5   6   ...  33
Letter:      ض   ص   ث   ق   ف   غ   ...  پ
Fig 4: Structure of a chromosome of a population corresponding to current arrangement of
Farsi letters on the keyboard
In general, we are looking for the member of the whole space of possible arrangements that has the lowest cost under the evaluation function defined in the following section. Importantly, there are 33!, or about 8.68 × 10^36, different arrangements. This is the space in which the hybrid algorithm must search for the optimized arrangement.
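The size of this search space, and the permutation constraint every chromosome must satisfy, can be checked in a few lines of Python. The helper names are hypothetical. `key_of_letter` simply inverts the vector so that a letter's key label can be looked up.

```python
import math

N_KEYS = 33

# Every assignment of the 33 characters to the 33 keys is one candidate.
search_space = math.factorial(N_KEYS)
print(f"{search_space:.3e}")  # 8.683e+36


def is_valid_chromosome(chromosome, alphabet):
    """True if the vector places every character of `alphabet` on exactly one key."""
    return len(chromosome) == len(alphabet) and set(chromosome) == set(alphabet)


def key_of_letter(chromosome):
    """Invert the chromosome: letter -> 1-based key label (gene index)."""
    return {letter: i + 1 for i, letter in enumerate(chromosome)}
```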
4.2. The Evaluation Function for Keyboard arrangement
In this area we rely on the work of specialists. Norman and Rumelhart (1983) defined four targets for the design of a keyboard:
1- the greatest possible balance of work between the two hands
2- the greatest possible number of consecutive keystrokes made alternately by the two hands
3- the smallest possible number of adjacent letter pairs typed with the same finger
4- the placement of the most commonly used letters on the middle row of the keyboard
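The four targets can each be measured directly by simulating the typing of a text, before they are combined into a single cost. In this Python sketch the caller supplies the key-to-hand, key-to-finger and key-to-row tables, since the paper does not list them. Hands are labelled "L" and "R" by assumption, and `target_metrics` is a hypothetical name.

```python
def target_metrics(text, key_of, hand_of, finger_of, row_of, home_row):
    """Measure the four design targets for one arrangement on one text.

    key_of maps letter -> key index; hand_of, finger_of and row_of map a key
    index to its hand ("L"/"R"), finger and row. Characters not in key_of
    (spaces, punctuation) are skipped and break letter pairs.
    """
    strokes = [key_of.get(ch) for ch in text]
    keys = [k for k in strokes if k is not None]
    pairs = [(a, b) for a, b in zip(strokes, strokes[1:])
             if a is not None and b is not None]
    n_keys, n_pairs = max(len(keys), 1), max(len(pairs), 1)
    return {
        # target 1: share of keystrokes made by the left hand (ideal: 0.5)
        "left_hand_share": sum(hand_of[k] == "L" for k in keys) / n_keys,
        # target 2: share of adjacent pairs typed by alternating hands (higher is better)
        "alternation_rate": sum(hand_of[a] != hand_of[b] for a, b in pairs) / n_pairs,
        # target 3: adjacent pairs typed with the same finger (lower is better)
        "same_finger_pairs": sum(finger_of[a] == finger_of[b] for a, b in pairs),
        # target 4: share of keystrokes on the middle (home) row (higher is better)
        "home_row_share": sum(row_of[k] == home_row for k in keys) / n_keys,
    }
```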
Each chromosome represents a keyboard arrangement and is evaluated by simulating the typing of a text. Evaluation factors are defined for the targets above. For the first two targets we define the following factor:
C_hand: the cost of using the same hand to type two adjacent letters. This factor covers the first two targets. It addresses the second target directly. It also covers the first, because alternating the hands on adjacent letters ultimately spreads the difficulty of typing between the two hands. For the third target, the following factor is defined:
C_finger: the cost of using the same finger to type two adjacent letters.
A further factor covers not only the fourth target but other considerations as well. It accounts not only for how often the basic keys are pressed, but also for:
- which fingers are used;
- the comfort of the hands while working;
- how often the fingers move across the keyboard.
Fig 5 shows the cost of pressing each key on the keyboard. These numbers were obtained from professional typists and apply to a right-handed person. Based on this information, the third factor is defined as follows:
C_ergonomic: the cost of typing a letter, given the position of that letter on the keyboard.
Fig 5: cost related to pressing any key on the keyboard
The evaluation function of a chromosome is obtained by summing these three factors over all the letters of a text, each weighted by its frequency:
$$EF = \sum_{i=1}^{33} F_{letter}(l_i) \times \left( \sum_{j=1}^{33} F_{letter\_pairs}(l_j, l_i) \times \left[ C_{hand}(l_j, l_i) + C_{finger}(l_j, l_i) \right] + C_{ergonomic}(l_i) \right) \qquad (1)$$
F_letter is the relative frequency (percent) of each letter. F_letter_pairs is the relative frequency (percent) with which two characters occur next to each other. The three cost functions are defined as follows:
- C_ergonomic returns the cost of typing letter l_i according to the values in Fig 5 and the given arrangement.
- C_finger returns a fixed value, the average of the numbers shown in Fig 5, if the two letters l_j and l_i are typed with the same finger in the given arrangement. Otherwise its output is zero.
- C_hand returns a fixed value, a quarter of the fixed value used by C_finger, if l_j and l_i are typed with the same hand in the given arrangement. Otherwise its output is zero.
The first letter of a word has no preceding letter within that word, so C_hand and C_finger are both zero for it.
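Eq. (1) translates almost directly into code. In this Python sketch the caller supplies the per-key costs of Fig 5 and the finger and hand that press each key; the paper's values are not reproduced. F_letter_pairs(l_j, l_i) is read as l_j immediately followed by l_i, which is an assumption. Following the definitions above, the same-finger penalty is the average key cost and the same-hand penalty is a quarter of it. A same-finger pair is also a same-hand pair, so both penalties apply to it. The function name is hypothetical.

```python
def evaluate(arrangement, f_letter, f_pairs, key_cost, finger_of, hand_of):
    """Sketch of Eq. (1) for one arrangement (lower is better).

    arrangement: list of letters; arrangement[k] is the letter on key k.
    f_letter:    letter -> relative frequency (percent).
    f_pairs:     (first, second) letter pair -> relative frequency (percent).
    key_cost:    per-key typing cost, playing the role of Fig 5 (values assumed).
    finger_of, hand_of: finger / hand that presses each key (assumed inputs).
    """
    pos = {letter: k for k, letter in enumerate(arrangement)}
    finger_penalty = sum(key_cost) / len(key_cost)  # C_finger value: average key cost
    hand_penalty = finger_penalty / 4               # C_hand value: a quarter of it
    total = 0.0
    for li in arrangement:
        ki = pos[li]
        inner = 0.0
        for lj in arrangement:
            f = f_pairs.get((lj, li), 0.0)
            if f == 0.0:
                continue  # this pair never occurs in the text
            kj = pos[lj]
            c_hand = hand_penalty if hand_of[kj] == hand_of[ki] else 0.0
            c_finger = finger_penalty if finger_of[kj] == finger_of[ki] else 0.0
            inner += f * (c_hand + c_finger)
        # outer term: F_letter(l_i) * (pair costs + C_ergonomic(l_i))
        total += f_letter.get(li, 0.0) * (inner + key_cost[ki])
    return total
```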
4.3. Genetic Operators
Genetic operators are needed to lead the population of the genetic algorithm (the chromosomes) toward lower values of each chromosome's evaluation function. Only mutation operators are used here. The crossover (exchange) operator is not used because, given the structure of the population members, merging two parent chromosomes is costly in time. The child inherits the genes its parents have in common, and the remaining genes are assigned randomly. The mutation operator is designed to take much less time and to replace the crossover operator.

In each generation the population contains ten chromosomes. The mutation operator assigns each member a number between 3 and 12 according to how well optimized its arrangement is. This number is the count of genes (characters) in the chromosome that undergo mutation: 3 for the best chromosome and 12 for the worst. The mutation operator then randomly rearranges the contents of the selected genes of each member.
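The rank-based mutation can be sketched as follows. Two readings are assumptions. First, the counts 3 to 12 are spread linearly over the cost ranking; with ten chromosomes, each rank then gets its own count, from 3 for the best to 12 for the worst. Second, "displacing" the selected genes is taken to mean randomly shuffling the letters they hold. The function names are hypothetical.

```python
import random


def mutation_counts(costs, low=3, high=12):
    """Best (lowest-cost) chromosome gets `low` mutated genes, worst gets `high`."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    span = max(len(costs) - 1, 1)
    counts = [0] * len(costs)
    for rank, i in enumerate(order):
        counts[i] = low + round(rank * (high - low) / span)
    return counts


def mutate(chromosome, n_genes, rng=random):
    """Pick `n_genes` keys at random and shuffle the letters they hold."""
    child = list(chromosome)
    keys = rng.sample(range(len(child)), n_genes)
    letters = [child[k] for k in keys]
    rng.shuffle(letters)
    for k, letter in zip(keys, letters):
        child[k] = letter
    return child
```

`random.shuffle` can leave some letters in place, so a chromosome may change in fewer genes than its assigned count. Rotating the selected letters instead would guarantee that every selected gene changes.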
Differentiating the number of mutated genes in this way means that the elite ("genius") members of each generation undergo fewer changes, while ordinary members undergo more. As generations pass, the larger changes to ordinary members give a good chance of producing new elite members. Meanwhile, the elite members of past generations change little and are largely preserved, so the population can improve from one generation to the next.
5. Conclusion
For the present rectangular keyboards, an appropriate arrangement should avoid a high number of finger displacements on the keyboard. It should also take into account other ergonomic factors that contribute to the user's comfort. These include not typing two adjacent letters with the same finger, or even with the same hand, and spreading the difficulty of typing equally between the two hands. The combined simulated annealing and genetic algorithm leads the arrangement of the 33 Persian letters toward an optimized arrangement. We implemented the presented hybrid algorithm with the parameters listed in Table 3. According to the evaluation function, the best arrangement the combined algorithm finally produced for Farsi letters costs 0.6391 of the cost of the current arrangement. This is a significant improvement: a reduction of about 36%. The arrangement is shown in Fig 6.
Parameter                                            Value
Number of chromosomes                                10
Initial temperature                                  10
Final temperature                                    0.0001
Temperature decrease coefficient                     0.95
Number of Metropolis calls at each temperature       15
Number of mutated genes for each chromosome          3 to 12
Table 3. Parameter values of the proposed method
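The Table 3 values fix the length of the cooling schedule. The temperature falls from 10 to below 0.0001, a factor of 10^5, and ln(10^-5)/ln(0.95) ≈ 224.5, so 225 temperature levels are visited. The Python check below confirms this. It assumes β = 1 (Table 3 does not give β), so every level makes the same 15 Metropolis calls.

```python
T_MAX, T_MIN, ALPHA, K = 10.0, 0.0001, 0.95, 15  # values from Table 3
BETA = 1.0  # assumption: Table 3 gives no value for beta

levels, calls, t, k = 0, 0, T_MAX, K
while t >= T_MIN:  # the annealing stops once T drops below Tmin
    levels += 1
    calls += round(k)
    k *= BETA
    t *= ALPHA

print(levels, calls)  # 225 3375
```

Each of these calls mutates and evaluates the population, so the total number of evaluations also grows with the population size of 10.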
Fig 6: proposed arrangement
References
Glover, D. E., & Kaufmann, M. (1987). Genetic Algorithm and Simulated Annealing (pp. 12-31). Los Altos, CA.
Goettl, J. S., Brugh, A. W., & Julstrom, B. A. (2005). Arranging the Keyboard with a Permutation-Coded Genetic Algorithm. In Proc. of the 2005 ACM Symposium on Applied Computing, Volume 2, pp. 947-951.
Klausler, P. (2005). Available at www.visi.com/~pmk/evovled.html, Sep.
Light, W. L., & Anderson, G. P. (1993). Typewriter keyboard via simulated annealing. AI Expert, September.
Malas, T. M., Taifour, S. S., & Abandah, G. A. (2008). Toward Optimal Arabic Keyboard Layout Using Genetic Algorithm. In Proc. 9th Int'l Middle Eastern Multiconference on Simulation and Modeling, Aug 26-28, Amman, Jordan.
Moradi, S., & Shiri, S. (2006). Optimization of Farsi Letter Arrangement on Keyboard by Genetic Algorithms. 11th International CSI Computer Conference, Tehran.
Norman, D. A., & Rumelhart, D. E. (1983). Cognitive Aspects of Skilled Typing. New York, NY: Springer-Verlag.
Wagner, M. O., Yannou, B., Kehl, S., Feillet, D., & Eggers, J. (2003). Ergonomic Modeling and Optimization of Keyboard Arrangement with an Ant Colony Algorithm. European Journal of Operational Research.