5. POST PROCESSING

5.1 INTRODUCTION

Chapter 4 explained the pre-processing of the user's query. In the post-processing stage, the processed query is submitted to a search engine to retrieve results. The post-processing system then converts the retrieved results using the grammar rule structure and the ontology model. Finally, a re-ranking algorithm re-ranks the results, and they are shown to the user in the language of the user's query. The grammar-based system has two roles. It assembles the results in the target language, and it supports the re-ranking system so that the most relevant results are shown first.

5.2 METHODOLOGY OF PROPOSED POST-PROCESSING

The major objective of the post-processing stage is to convert the results retrieved for the query into Telugu. The process has three distinct components (Figure 5.1).

Figure 5.1 Overall process of post-processing

5.2.1 Tokenizer

The tokenizer works in the same way as in the pre-processing stage. Here it is used to tokenize the results retrieved for the given queries. Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters. Figure 5.2 illustrates this process for the results of a sample Telugu query.

Figure 5.2 Tokenizer process (Results retrieved → Tokenization → Output)

Each snippet given to the tokenizer is split into tokens for further processing. Figure 5.3 shows the operation of the tokenizer within the post-processing system:
1. The pre-processing system receives and processes the user's query.
2. The query is converted into an equivalent English query, which is passed to the search engine.
3. The search engine retrieves the results related to the query.
4. The search engine's output is passed to the post-processing system, which processes it and presents the outcome to the user.
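The tokenization rule described above (split on whitespace or punctuation) can be sketched in Python as follows. This is a minimal illustration and not the system's actual tokenizer. Treating every Unicode punctuation category (P*) as a separator is an assumption.

The sketch scans character by character rather than using a `\w+` regular expression. Python's `re` treats as word characters only those for which `str.isalnum()` is true. Telugu vowel signs and the virama are combining marks that fail that test, so a `\w+` pattern would break Telugu words apart.

```python
import unicodedata


def is_separator(ch: str) -> bool:
    """Whitespace (space, tab, line break) or any Unicode punctuation (category P*)."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def tokenize(snippet: str) -> list:
    """Split a snippet into tokens at whitespace and punctuation.

    Telugu combining marks (vowel signs, virama) are not separators,
    so Telugu words stay intact.
    """
    tokens, current = [], []
    for ch in snippet:
        if is_separator(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # flush the final token
        tokens.append("".join(current))
    return tokens
```

For example, `tokenize("మొబైల్ (Mobile) computing, MNP...")` yields the Telugu word intact, followed by `Mobile`, `computing` and `MNP`.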
Figure 5.3 Process Flow of system

5.2.2 Language Grammar Rules

The tokenized snippet terms are sent to the language grammar rule component for processing. Appendix 1 explains the detailed flow of the grammar structure; this subsection summarizes only its essence. The language grammar rule component works in the same way as in the pre-processing stage. Once the terms are identified, the component looks up the ontology for equivalent terms in order to convert the results into the user's query language. Once the results are converted, the re-ranking process starts.

5.2.3 Re-ranking system

When the Web was searched with queries processed by the pre-processing system, most snippets were observed to contain the search query terms. Any method that manipulates the results based on the snippets must therefore also account for how the search terms are linked within the context of the snippet, and this requires the ontology. The snippets are assigned a rank based on inter-term relationships through an organized set of steps. The approach outlined in Chapter 4 is combined with the post-processing system and used here. The ontological method models the set of keywords retrieved by the search process as a unified whole. The content can then be re-ranked using the fuzzy relations between the query term and the ontology.
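Below is a minimal sketch of the ontology lookup described above. It assumes the ontology can be reduced to an English-to-Telugu dictionary; the real ontology also stores relationships that a flat dictionary does not capture. The entries come from the sample term pairs shown with Figure 5.5. The names `SAMPLE_ONTOLOGY` and `convert_terms` are hypothetical. Terms with no equivalent are returned separately so they can be routed to the OOV (transliteration) component.

```python
# Hypothetical sample ontology reduced to English -> Telugu equivalents,
# built from the sample terms shown with Figure 5.5.
SAMPLE_ONTOLOGY = {
    "mobile": "మొబైల్",
    "computing": "కంప్యూటింగ్",
    "technical": "సాంకేతిక",
    "item": "వస్తువు",
    "telephone": "దూరవాణి",
}


def convert_terms(terms, ontology):
    """Replace each term by its ontology equivalent (case-insensitive lookup).

    Returns (converted_terms, oov_terms). OOV terms are kept unchanged in
    converted_terms and also listed separately for transliteration.
    """
    converted, oov = [], []
    for term in terms:
        equivalent = ontology.get(term.lower())
        if equivalent is None:
            oov.append(term)        # no equivalent: send to the OOV component
            converted.append(term)  # placeholder until it is transliterated
        else:
            converted.append(equivalent)
    return converted, oov
```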
Figure 5.4 Term frequency for the query terms relationship (nodes: Query term, Meaning, Relationship, Related, Relevant; weights from 0.5 to 1.0)

The processing of each query term is treated as a step in the computation of the information gain. The consolidated information gain $tf_{ij}$ is calculated over the entire snippet contents. Here $tf_{ij}$ denotes the value of a term in a snippet: the index $i$ identifies the snippet and the index $j$ identifies the term within it. Each snippet is chosen at random from the search results, and the terms visited in snippet $i$ can be written as $tf_{i1}, tf_{i2}, tf_{i3}, \ldots$. For each term in the snippet, a distance vector measure is calculated from the term-relationship frequency, which measures the level of the term-relationship value.

The term relationship is then calculated for the snippet. For each term, its position in the parse tree is found, which shows how the term relates to the contents of the ontology in the dependency tree order of Figure 5.4. A meaning is assigned the value 1.0 and a relationship 0.90. The third position (related) is assigned 0.75. The following positions are assigned 0.60, 0.55, 0.50, 0.45 and so on, until 10 terms are reached. All other terms are assigned 0.05, and anything beyond is not assigned a value and is left out. These values (1.0, 0.90, 0.75, …) were arrived at by experimentation. Figure 5.5 gives the sample term frequency calculated for the ontology.

The similarity of the query results is found next by comparing the non-stop terms of the snippets. Two snippets are considered similar if more than 60% of their terms match. The 60% threshold was arrived at by experimentation; a theoretical basis for it will be derived in future work. Similar snippets are clustered in the order (meaning, related, relationship; snippet number). The results contain a mix of English and Telugu content.
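The weighting, similarity and clustering steps can be sketched as below. Several details are assumptions rather than the author's specification:

- Position 1 is taken to be the meaning (1.0) and position 2 the relationship (0.90).
- The 0.05 decrement is extrapolated down to position 10 (0.30).
- Ontology terms beyond the tenth position receive 0.05, while terms absent from the ontology receive nothing.
- "More than 60% of the terms match" is read as a Jaccard overlap of non-stop terms above 0.6.
- Snippets are ordered by score, with ties broken by snippet number, before greedy clustering.

The stop-word list and all function names are hypothetical.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "for"}


def position_weight(position: int) -> float:
    """Weight of a term from its 1-based position in the ontology tree order."""
    if position < 1:
        raise ValueError("positions start at 1")
    fixed = {1: 1.0, 2: 0.90, 3: 0.75}  # meaning, relationship, related (assumed order)
    if position in fixed:
        return fixed[position]
    if position <= 10:
        # 0.60, 0.55, 0.50, ... down to 0.30 at position 10 (extrapolated).
        return round(0.60 - 0.05 * (position - 4), 2)
    return 0.05  # any later ontology position


def snippet_score(terms, ontology_positions):
    """Consolidated value of a snippet: sum of its terms' position weights.

    Terms missing from ontology_positions contribute nothing (they are left out).
    """
    return sum(position_weight(ontology_positions[t.lower()])
               for t in terms if t.lower() in ontology_positions)


def content_terms(terms):
    """Lower-cased non-stop terms of a snippet."""
    return {t.lower() for t in terms if t.lower() not in STOP_WORDS}


def is_similar(terms_a, terms_b, threshold=0.6):
    """True when the Jaccard overlap of the non-stop terms exceeds the threshold."""
    a, b = content_terms(terms_a), content_terms(terms_b)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) > threshold


def rerank_and_cluster(snippets, ontology_positions):
    """Return clusters of snippet numbers, highest-scoring snippets first.

    snippets[i] is the token list of snippet number i.
    """
    order = sorted(range(len(snippets)),
                   key=lambda i: (-snippet_score(snippets[i], ontology_positions), i))
    clusters = []
    for i in order:
        for cluster in clusters:
            # Compare with the cluster's first (best-scoring) member.
            if is_similar(snippets[cluster[0]], snippets[i]):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Comparing each snippet only with a cluster's first member keeps the pass linear in the number of clusters. A full pairwise comparison would be a stricter alternative.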
The smoothening approach is applied to the English results.

5.2.4 Smoothening Approach

The resultant English snippets are taken one at a time. The basic unit of the process is identifying the root word of each term in the snippet. First, each snippet is delineated into sentences. Sentences are classified as simple or complex based on their structure. A simple sentence follows the subject-verb-object form, and all other sentences are complex. In each sentence, the terms are identified as clauses or stop words.

Figure 5.5 Sample term frequency (ontology terms: Related, Relationship, Mobile (మొబైల్), Computing (కంప్యూటింగ్), Technical (సాంకేతిక), Item (వస్తువు), Telephone (దూరవాణి), Standard (సామర్థ్యాన్ని))

A clause is a verb, adverb or adjective. After the stop words are identified in the sentences, the terms are reduced to their root words using Porter's stemming algorithm. Language-specific rules are then applied to identify the translation heuristics. A single term may occur in different tenses and word forms, so the query-specific information tree sequence is used to disambiguate the sense of the term. Morphological rules are then applied to obtain the translation for known grammar forms and terms. Out-of-vocabulary (OOV) terms are treated in the same manner as proper nouns and are transliterated automatically.

Case 1: Figure 5.6 shows the results retrieved for an example user query after the pre-processing system has pre-processed it and converted it into English. The post-processing system handles these results in the following steps:
Step 1: Relevant results for the pre-processed user query are retrieved from the Web.
Step 2: Each result is tokenized into tokens.
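The sentence delineation, stop-word removal and stemming steps of the smoothening approach can be sketched as follows. To stay self-contained, the sketch implements only step 1a of Porter's algorithm (plural suffixes). The full algorithm is available, for example, as `nltk.stem.PorterStemmer().stem`, which can be passed as the `stem` argument. The stop-word list and the sentence-splitting rule are assumptions. The simple/complex classification and clause tagging are omitted because they require a part-of-speech tagger.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "or", "for", "on", "with"}


def split_sentences(snippet):
    """Delineate a snippet into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", snippet) if s.strip()]


def porter_step_1a(word):
    """Porter's step 1a only (plural suffixes); NOT the full algorithm."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]  # caresses -> caress, ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


def root_terms(snippet, stem=porter_step_1a):
    """For each sentence, drop stop words and reduce the remaining terms to roots."""
    result = []
    for sentence in split_sentences(snippet):
        words = re.findall(r"[A-Za-z]+", sentence.lower())
        result.append([stem(w) for w in words if w not in STOP_WORDS])
    return result
```

For example, `root_terms("Mobile phones are portable. The operators offer services.")` returns one list of root terms per sentence: `mobile`, `phone`, `portable` and `operator`, `offer`, `service`.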
Step 3: Using English grammar rules, the terms (subject, verb and object) are identified. The tokens are first checked for inflection. If an inflection is found, the corresponding grammar rule is used to identify the subject, verb and object.
Step 4: Once the terms (subject, verb, object and inflection) are identified, the ontology is consulted for equivalent terms. In this case, it is searched for Telugu terms.
Step 5: Terms that are not available in the ontology are sent to the OOV component to be transliterated literally.
Step 6: Once the terms are converted, the result is converted into Telugu.
Step 7: The results are re-ranked using the ontology and shown to the user in the user's native language.

Figure 5.6 Results retrieved related to the query (three Telugu Wikipedia results from te.wikipedia.org: "Mobile computing" (మొబైల్ కంప్యూటింగ్), "Mobile TV" (మొబైల్ టీవీ) and "Mobile Number Portability (MNP)" (మొబైల్ నంబర్ పోర్టబిలిటీ))

Figure 5.7 shows the results after they have been processed by the post-processing system.
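Steps 2 to 7 can be wired together as in the skeleton below. It is a sketch under stated assumptions, not the system's implementation. Tokenization is simplified to whitespace splitting, and Step 3 (inflection lookup and subject-verb-object identification, detailed in Appendix 1) is omitted. The caller supplies the ontology, the transliteration function and the scoring function. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def post_process(results: List[str],
                 ontology: Dict[str, str],
                 transliterate: Callable[[str], str],
                 score: Callable[[List[str]], float]) -> List[str]:
    """Convert retrieved results into Telugu and re-rank them (Steps 2 to 7)."""
    scored: List[Tuple[float, str]] = []
    for result in results:                      # Step 1 (retrieval) is assumed done
        tokens = result.split()                 # Step 2, simplified to whitespace splitting
        # Step 3 (inflection table lookup and SVO identification) is omitted here.
        converted = [
            ontology[t.lower()] if t.lower() in ontology  # Step 4: ontology equivalent
            else transliterate(t)                          # Step 5: OOV term
            for t in tokens
        ]
        scored.append((score(tokens), " ".join(converted)))  # Step 6: Telugu result
    # Step 7: re-rank by score, best first; sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: -pair[0])
    return [text for _, text in scored]
```

For example, suppose the scoring function counts ontology hits and the transliteration placeholder leaves terms unchanged. A result containing "Mobile computing telephone" is then ranked above "Mobile TV".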
Figure 5.7 Results after post-processing (R1: మొబైల్ కంప్యూటింగ్ (Mobile computing); R2: మొబైల్ టెలివిజన్ (Mobile television); R3: మొబైల్ నంబర్ పోర్టబిలిటీ (Mobile Number Portability or MNP))

Figure 5.8 shows the flow chart of the post-processing stage for a given query.

Figure 5.8 Flow Chart for the Post-Processing stage (Start → Retrieve relevant results related to the pre-processed query → Tokenize the snippet into tokens using the tokenizer → Inflection table lookup → Language grammar rules to identify subject, verb and object → Rule identification based on the inflection and verb → Lookup into the ontology for equivalent terms (Yes: results conversion; No: transliteration) → Results conversion into the user's native language → Results re-ranking using ontology → Final results to the user → Stop)

5.3 CONCLUSION

This chapter has explained in detail the post-processing system that presents the content to the user. The re-ranking algorithm is snippet-based and takes into account the grammatical structure of the resultant snippets. The highlight of this work is the semantic nature of the entire processing. Across the past two chapters, the Telugu equivalent of a user-generated query has been generated. The next step is to evaluate the system against various parameters, which is done in the next two chapters.