THE IMPLEMENTATION OF CUVOICEBROWSER, A VOICE WEB NAVIGATION TOOL FOR THE DISABLED THAIS

Proadpran Punyabukkana, Jirasak Chirathivat, Chanin Chanma, Juthasit Maekwongtrakarn, Atiwong Suchato
Spoken Language Systems Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Thailand
[email protected], [email protected], [email protected], [email protected], [email protected]

ABSTRACT

Surfing the web is, today, a daily activity for most people. While we use the web as a means of publishing information to the world and accessing information from anywhere, less fortunate people, including the blind and the motor-handicapped, do not have that privilege. This paper explains how we designed and built the CUVoiceBrowser, a web browser that can be controlled by voice in the Thai language to serve disabled Thai users. While many have worked on text-to-speech capabilities for reading web pages to the blind, we focus on taking commands from the blind and the motor-handicapped, in Thai, to navigate the web and to search for information on it. Our prototype shows a high accuracy rate of over 80% when users speak their web navigation commands, and over 70% when Thai characters are input by voice.

KEY WORDS
Web browser, speech recognition, text-to-speech, and assistive technology

1. Introduction

Web accessibility centered around voice input will one day allow anyone to use the web without training [1]. More importantly, this technology will allow the blind to participate in the growing web community. A summary of the progress of web accessibility is given by Asakawa [2]. Many researchers have built prototypes to demonstrate the workability of voice-web concepts. Hemphill and Thrift developed voice-controlled navigation using speaker-independent speech recognition that supports a speakable hotlist, speakable links, and smart pages in speech user agents [3].
Parente [4] has developed a prototype of audio-enriched links that provide a summary of web pages accessed by blind users. Most of the work in this area is based on the English language, but there have been experimental projects for other, non-English systems. Brondsted and Aaskoven built a prototype of a Danish voice-controlled utility for internet browsing, targeting motor-handicapped users who have difficulty using a standard keyboard and/or mouse [5]. Lopez and Kirschning [6] offer a tool to help the visually impaired surf the web in Spanish, but it does not have voice-recognition capability. To date, IBM ViaVoice and Homepage Reader are the only commercial tools that support the Thai language. While ViaVoice does recognize Thai, it is not a tool for navigating the web. And although IBM Homepage Reader reads web pages aloud in Thai, it has no feature for recognizing commands from users.

The blind and the motor-handicapped share similar difficulties when it comes to surfing the web: they cannot use a keyboard or a mouse to input commands, which instantly limits their use of the web. In fact, the blind might be in a better position, since they can use a Braille keyboard together with text-to-speech software; those whose hands are not capable of typing will find the keyboard of no use at all. In addition, the Thai disabled community is not necessarily proficient in English, which further limits them when seeking knowledge through the web. Hence, this project was initiated to build a web browser, the "CUVoiceBrowser", that understands simple web navigation commands in Thai. Although both the blind and the motor-handicapped are considered users of this tool, its design focuses more on the blind, who have greater limitations in viewing the web.

2.
Design and Implementation Criteria

The design of the CUVoiceBrowser web navigation tool takes into consideration that its users include both general users and, in particular, Thai handicapped people. We allow simple voice commands that are appropriate in the Thai language and that may not be exact translations from English. The blind in particular, as they cannot see the screen, cannot grasp visual concepts that sighted users take for granted, such as frames, colors, blinking text, and pictures. This simple framework neither reduces the capabilities available to the motor-handicapped nor makes the tool harder for them to use. Our design criteria mainly focus on ease of use and ease of understanding. Taking these criteria into account, the resulting design is as follows.

2.1 Minimal Training

To minimize the training needed, the tool only requires users to listen to a set of guidelines once, when they first open the program. The guidelines tell the user to locate the 'Ctrl' key and the 'Spacebar', which they will use to activate each command, and list the commands available to them. These guidelines are kept as a help menu that the user can review at any time, so users need no formal training in the use of this tool. The structure of the Help and guideline pages is shown in Figure 1. Whenever users want to activate the help function, they can say "Help" in Thai at any time.

Help.html
  Introduction (Welcome)
  Control: How to record (Ctrl + Spacebar); How to pause
  Essential commands (Read, Read next, Read previous)
  Link command (Go to link)
  Guideline (one.html) - How to use the tool step by step:
    How to record sound
    How to input addresses and go to addresses
    What happens when loading finishes
    Other commands
  Techniques (two.html)
  All commands (three.html)
  Program limitation (four.html)

Figure 1 Structure of Help and Guideline for the users

2.2 Web Navigation Commands

In developing the Thai-language commands, we attempted to use as few commands as possible. We also tried to select commands that do not sound similar to one another, so that recognition accuracy will be higher. One more consideration worth noting is that the commands in Thai are not direct translations from English; this reduces the frustration users might experience if they had to learn new technical jargon. As a result, the following groups of functions have been defined and implemented. The commands enclosed in quotation marks are spoken in Thai; they are presented here in English for the purpose of publication.

a) Control functions:
i. "Open a new page" - Open a new tab
ii. "Close page" - Close the current tab
iii. "Next page" - Change to the next tab
iv. "Exit" - Exit the browser

b) Page access functions:
i. "Previous" - Back to the previous page
ii. "Next" - Forward to the next page
iii. "Reload" - Refresh the current page
iv. "Stop loading" - Stop loading the new page
v. "Input address" - Enter input mode, for voice input of the characters and special characters forming a URL
vi. "Go" - Go to the specified URL
vii. "Search" - Enter search mode, for voice input of the search item, after which the web search engine is activated

c) Web reading functions:
i. "Read" - Read the current paragraph
ii. "Read next" - Read the next paragraph
iii. "Read previous" - Read the previous paragraph
iv. "Read again" - Read from the beginning of the current page
v. "Stop reading" - Stop reading the text
vi. "Next link" - Read and select the next link
vii. "Previous link" - Read and select the previous link
viii. "Open link" - Go to the selected link

d) Other commands:
i. "Help" - Go to the help menu
ii. "Bookmark" - Go to the bookmark page to store the desired page

There are altogether 21 commands to navigate the web. Users do not need to memorize all of them, as the tool asks them what they would like to do at each stage; the options are given at each stage, or the expected response is intuitive enough without instruction. Please note again that all the commands are spoken in Thai. Among the 21 commands, the most used are "Read", "Read next", and "Search".

2.3 Language Input

The highlight, and the challenge, of this tool is the ability for users to input the desired information by voice in Thai, in addition to English. This is particularly useful when users want to search for something. They may use Thai, English, numbers, or special characters as input, saying one character at a time. The tool recognizes each character and echoes the result back to the user; if it is incorrect, the user can repeat that letter. This paper will not go into detail on the English character set, but it is worth explaining how the Thai language is characterized and how we designed our system to manage this task.

2.3.1 Thai language

In the Thai language there are 46 consonants, 21 vowels, and 4 tone indicators that mark 5 tones. Of the 46 consonants, 44 are in use. Our tool allows the use of all 44 consonants, the 21 vowels, and the 4 tone indicators. A complication arises because some of the characters sound exactly the same although their orthography differs. Therefore, to capture the right one, the user is asked to say the letter together with the sample word conventionally attached to it. Though this may sound complicated, it is trivial for Thais, since it is how the language is taught when Thais first learn it in school. The format of this input is similar to "A-Apple, B-Boy" in English. When the user wants to input a vowel, however, they say the word "vowel" in Thai, followed by the desired vowel.

2.3.2 English language

When users want to input an English character, they only have to speak the letter, without saying a sample word as they must for Thai. There are two reasons for this approach. The first is that there is no duplicate sound among the 26 letters. The second is that it lets the tool determine automatically whether the user is inputting an English or a Thai character.

2.3.3 Numbers and special characters

Numbers 0 to 9 can be input by saying them in Thai. Again, since there are no duplicate sounds, the tool understands that the user wants a number once it is said. For special characters, most are allowed, and they are listed in the guidelines so users can see what can be used. Examples are -, _, @, #, ?, ~, , (comma), :, +, and " ". Users say these in Thai as well.

2.4 Utilities

We have designed our speaking Help function not only to minimize the training process but also to serve as a utility for users. In addition to the Help menu, it is important to select utilities that are useful to the users. To save time, users can bookmark their favorite pages. Favorites are kept for them provided that the user sets a keyword, which may be anything, such as the name of a newspaper or the name of a bank. Once a bookmark is set, the user can go directly to it in one step by saying the keyword of the bookmarked page. If the user cannot remember the keyword, another way to reach the kept favorite pages is to say "Bookmark"; the tool then reads the entries in the bookmark list aloud one by one. Once the desired page is read, the user can say "Open link", and the tool will automatically navigate to that page.

3. Architecture of the Tool

There are three main components in the CUVoiceBrowser: the browser, the recognizer, and the text-to-speech engine. We used Microsoft Visual Studio to build the browser and its functions. The recognizer we built ourselves using the HMM technique, with HTK as the tool. For text-to-speech, we simply call a library from IBM ViaVoice to do the work. In this paper, we give details only on the browser and the recognizer that we built.

Figure 2 Architecture of the Tool
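The character-input scheme described in Section 2.3 amounts to a small lookup from one spoken phrase to one character. The Python sketch below is a hypothetical illustration, not the tool's code: the romanized transcript strings and the function name are ours, though the letter-plus-sample-word convention for Thai consonants and the "vowel" prefix follow the paper.

```python
# Several Thai consonants share a sound; the conventional sample word
# ("kho khai" = kho of egg, "kho khwai" = kho of buffalo) picks the
# intended letter, much like "A-Apple, B-Boy" in English.
THAI_CONSONANTS = {
    "kho khai":  "ข",   # same "kho" sound, different letters
    "kho khwai": "ค",
    "so suea":   "ส",   # three distinct letters all pronounced "so"
    "so sala":   "ศ",
    "so so":     "ซ",
}

# Vowels are prefixed with "sara" ("vowel" in Thai).
THAI_VOWELS = {"sara a": "า", "sara i": "ิ"}

def decode_spoken_char(transcript: str) -> str:
    """Map one recognized utterance to the character it spells."""
    if transcript.startswith("sara "):
        return THAI_VOWELS[transcript]
    if transcript in THAI_CONSONANTS:
        return THAI_CONSONANTS[transcript]
    # English letters need no sample word: no two of the 26 sound
    # alike, and a bare letter also signals English rather than Thai.
    if len(transcript) == 1 and transcript.isalpha():
        return transcript
    raise ValueError(f"unrecognized character utterance: {transcript!r}")
```

After each decoded character, the tool would echo the result back by text-to-speech so the user can repeat the letter if it was misrecognized.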
The overall structure of the architecture is shown in Figure 2.

Browser

Microsoft Foundation Class (MFC) was used to build our browser. As shown in Figure 2, WebBrowserView handles the display of the web page. Navigation tasks are performed by calling functions in WebBrowserView, which forwards them to WebBrowserCore; WebBrowserCore executes and controls each navigation task. When the user presses Ctrl+Spacebar, WebBrowserView accepts the keyboard event and understands that the user now wants to issue a voice command. The MainFrame function then calls the Recorder, which records the voice command from the user before sending it to the Recognizer. The Recognizer in turn calls HTK to execute the built-in module for Thai and English commands, which performs the recognition task. The Recognizer does the entire job of recognizing what users say, including web navigation commands and character input: Thai, English, special characters, and numbers. The output from the Recognizer is a text string containing one of the commands; it is sent to MainFrame to accomplish what the user asked the tool to do. At this stage, control returns to the web browser: WebBrowserCore sends an HTML string to WebBrowserView for it to display the screen. The reading function is then performed by HTMLTranslator, which translates the source code it obtains from WebBrowserView and determines from the associated tags which parts of the text are to be read and which parts are links. Within HTMLTranslator there is a Function List that keeps the text to be read. When the user asks the tool to read, the Function List sends a text string to ViaVoice, which reads it to the user. Links are also read, and the tool asks the user whether they want to go to the next link. A sample of the CUVoiceBrowser screen is shown in Figure 3.

Figure 3 Sample screenshot of CUVoiceBrowser

Recognizer
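The control flow just described (hotkey event, recording, recognition, then dispatch of the recognized command string) can be sketched in a few lines. This is an illustration only: the class names mirror the components in Figure 2 (MainFrame, Recognizer, WebBrowserCore), but the Python bodies and the stubbed recognizer output are hypothetical stand-ins for the actual MFC/HTK implementation.

```python
# Illustrative sketch of the CUVoiceBrowser control flow; component
# names follow Figure 2, bodies are stand-ins.

class Recognizer:
    """Stands in for the HTK-based recognizer."""
    def recognize(self, audio: bytes) -> str:
        # The real tool runs HTK on the recorded utterance and returns
        # one of the 21 Thai command strings; here we return a fixed one.
        return "read next"

class WebBrowserCore:
    """Executes and controls each navigation task."""
    def __init__(self):
        self.log = []             # record of executed commands
    def execute(self, command: str):
        self.log.append(command)  # placeholder for real navigation work

class MainFrame:
    """Ties recording, recognition, and command dispatch together."""
    def __init__(self):
        self.recognizer = Recognizer()
        self.core = WebBrowserCore()
        # English stand-ins for a few of the 21 Thai commands.
        self.known_commands = {"read next", "previous", "open link"}
    def on_hotkey(self, audio: bytes) -> str:
        # Ctrl+Spacebar pressed: the recorded audio is recognized and
        # the resulting command string is dispatched to the core.
        command = self.recognizer.recognize(audio)
        if command not in self.known_commands:
            return "unknown command"  # the tool would re-prompt by voice
        self.core.execute(command)
        return command

frame = MainFrame()
result = frame.on_hotkey(b"\x00")     # dummy recorded audio
```

Keeping the recognizer's output as a plain command string, checked against a fixed command set before dispatch, mirrors how the Recognizer's text output is passed to MainFrame in the paper and keeps recognition decoupled from navigation.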
As stated briefly earlier, we used the Hidden Markov Model (HMM) technique and the Hidden Markov Model Toolkit (HTK) to build the recognizer. Seventy-five Thai phonemes, including syllable-initial phonemes, vowel phonemes, and coda phonemes, were used as the basic sound units for the vocabulary in the recognizer's dictionary. Each acoustic model corresponded to a context-dependent triphone, that is, one of the seventy-five Thai phonemes with specific preceding and following phonemes. We collected the needed utterances from a total of 30 native Thai speakers, equally distributed between male and female, to train the acoustic models corresponding to the 75 phonemes. Each speaker was asked to say all of the commands used in the system as well as every Thai and English letter and number. Their voices were recorded using a computer headset and digitized at 16 kHz with 16-bit resolution. The topology of each HMM is a five-state left-to-right model with three emitting states. Thirty-nine-dimensional feature vectors, consisting of Mel-frequency cepstral coefficients (MFCCs) together with their deltas and accelerations, were used to represent observations from speech frames. Gaussian mixtures with two components were used to govern the emission probabilities of the emitting states. The Baum-Welch re-estimation algorithm with a flat-start strategy was used to estimate the required parameters for every HMM and Gaussian mixture. The Token Passing algorithm was used in the decoding phase to find the most likely hypothesis from the HMM-based triphone network generated by the associated task grammar, which covers either navigation commands or character input.

4. Discussion and Conclusion

In this paper, we have presented the design and implementation of a prototype voice-controlled web browsing tool for the Thai language, targeting Thai users who are motor-handicapped and/or blind. The tool is called the CUVoiceBrowser.
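The recognizer's 39-dimensional front end (13 MFCCs plus their deltas and accelerations) can be sketched compactly in plain numpy. This is an illustrative reconstruction, not the authors' code: the analysis parameters below (25 ms Hamming window, 10 ms hop at 16 kHz, 26 mel filters, 13 cepstral coefficients, a two-frame delta regression window) are common HTK-style defaults that the paper does not specify.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_13(signal, sr=16000, n_fft=512, hop=160, win=400, n_filters=26):
    # Frame the signal (25 ms window, 10 ms hop at 16 kHz), apply Hamming.
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    # Power spectrum, mel filterbank energies, log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    logmel = np.log(energies)
    # DCT-II to decorrelate; keep the first 13 cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(13), (2 * n + 1) / (2.0 * n_filters)))
    return logmel @ basis.T

def deltas(feat):
    # Standard two-frame-window regression deltas over time.
    p = np.pad(feat, ((2, 2), (0, 0)), mode="edge")
    n = len(feat)
    return sum(t * (p[2 + t:n + 2 + t] - p[2 - t:n + 2 - t]) for t in (1, 2)) / 10.0

sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s synthetic tone
c = mfcc_13(tone, sr)
features = np.hstack([c, deltas(c), deltas(deltas(c))])  # 39-dim vectors
```

Stacking the 13 static coefficients with their first and second time derivatives yields the 39-dimensional observation vectors used to train the five-state left-to-right HMMs.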
This prototype is the first example of web accessibility technology of this kind implemented in Thailand. In testing its effectiveness, we found the accuracy to be satisfactory. The greatest accuracy occurs when users speak web navigation commands, where accuracy is in the high 80-percent range. With Thai character input, accuracy is approximately 70%. The weak point is English character input, where accuracy is lower than 60%; this might be attributed to the limited English pronunciation proficiency of the native Thai speakers in our test group.

Our future research will concentrate on the systematic processing of unstructured commands and of combinations of commands. We envision implementing a system with feedback from the web interface, so as to guide users more effectively and to help them resolve conflicting accessibility paths. Such feedback will also increase the speed with which handicapped users can work through web pages. Furthermore, we have worked to improve the portability of the tool and to reduce its cost of use, so that all visually-impaired and motor-handicapped Thais will have access to it. We also wish to employ information retrieval techniques to summarize Thai web pages for the users.

Acknowledgements

The authors thank IBM (Thailand), who supplied the ViaVoice library module that provides this project's text-to-speech capability.

References

[1] Frost, R., "Call for a Public Domain SpeechWeb," Communications of the ACM, Vol. 48, No. 11, November 2005, pp. 45-49.
[2] Asakawa, C., "What's the Web Like If You Can't See It?" W4A at WWW2005, 10 May 2005, Chiba, Japan.
[3] Hemphill, C.T. and Thrift, P.R., "Surfing the Web by Voice," ACM Multimedia 95 - Electronic Proceedings, November 5-9, 1995, San Francisco, CA, USA.
[4] Parente, P., "Audio Enriched Links: Web Page Previews for Blind Users," ASSETS'04, October 18-20, 2004, Atlanta, Georgia.
[5] Brondsted, T. and Aaskoven, E., "Voice-Controlled Internet Browsing for Motor-handicapped Users: Design and Implementation Issues," Interspeech 2005.
[6] Lopez, R.A. and Kirschning, A.I., "Finder and Reader of Web Pages in Spanish for People with Visual Disadvantages," Proceedings of the 16th IEEE Conference on Electronics, Communications and Computers, 2006.