E-LEARNING BASED CHEMICAL INFORMATION EXTRACTING TOOL

E-Learning Based Chemical Information Extracting Tool (eChem)

L. Ratnayaka, P.S.U. De Silva, H.N.M. Wijesiri, A.M. Samaradiwakara, N. Ranpatabendi, and U.U.S.K. Rajapaksha

Abstract - Chemistry is considered one of the most difficult subjects to study. It is a compulsory subject in the GCE A/L science stream, and a pass is a necessity. To help students with their difficulties in Chemistry, an e-learning tool was envisaged. This e-learning tool for Advanced Level (A/L) Chemistry is the result of research carried out to identify the difficulties students face in understanding the basic theories of Organic Chemistry as they relate to problem solving. The tool supports student practice with Organic Chemistry transformations and structure drawing through different modes of learning. It serves as a support tool external to conventional classroom education. Students can use it to master their Organic Chemistry skills and to practise many different kinds of questions once a lesson unit is completed. In addition, the system helps students find their weak areas in answering questions. The system assists not only students but also teachers, who can interact with students through the paper marking component, the decision making component and the evaluation component.

Keywords - E-learning, chemical structure transformation, G.C.E. A/L, Topology, SMILES, IUPAC naming

I. INTRODUCTION

The difficulties associated with learning Chemistry are well known. The difficulties students face in understanding Chemistry concepts, and in answering questions based on the theory they have learnt, have caused much concern. It is therefore not surprising to find students turning to tuition classes for help. The use of a learning tool for this purpose has so far not been considered. The local GCE A/L examination is the biggest hurdle in the Sri Lankan education curriculum.
To pursue higher studies at a university, everyone in the A/L science stream must secure passes in the G.C.E. (A/L) examination. The annual failure rate in Advanced Level Chemistry is very high: nearly 51% in recent years [1]. Support towards a better understanding of the subject will therefore be welcomed by students and teachers alike. New technologies are an important incentive for both teachers and students in achieving an adequate transmission of knowledge. Even though the government's e-Sri Lanka concept has contributed to the local education sector, learning tools do not yet play a major role in the school curriculum. eChem, the learning tool, was developed to suit both students and teachers who are anxious to be better guided. Using a learning tool will be a new experience for students learning Organic Chemistry content in addition to classroom education. The developed system facilitates mastery of skills related to Organic Chemistry and supports practice with many different kinds of questions. Further, the system helps teachers evaluate students effectively by using a digital paper marker, somewhat similar to the current multiple question evaluators used in A/L paper marking.

This paper is about the implementation of an effective tool to suit a local requirement. Section III explains the system components and Section IV explains the research carried out on selected components.

II. BACKGROUND

During the literature survey the authors studied some existing e-learning tools for A/L Chemistry and their core features. They are listed below. The survey revealed that most of the existing software tools do not fulfil local requirements, although a few do so partially.
For example, QRIOM (Qualitative Reasoning in Organic Mechanism) [2], which is used to support the learning and teaching of Organic Chemistry reactions, did not appear to suit local requirements well. It only highlights the reaction mechanism, describing how a reaction takes place by showing what happens to valence electrons as bonds are made and broken. It also provides a qualitative reasoning model to predict the final outcome of a particular reaction and provides sufficient explanations. Other similar tools that were studied are ChemHelp, Kekule [3] and CLiDE [4].

TABLE 1
COMPARISON WITH EXISTING SYSTEMS

Similar Tools for Foreign A/L Subjects | Our System
Do not have a transformation engine | Includes a chemical structure transformation engine
Provide a reaction description for a single reaction at a time | Provides an advanced reaction description for a set of chemical reactions (a chemical transformation) at the same time
Subject content is limited | Subject content is large, advanced and expandable
Do not allow users to try out and evaluate different structure transformation questions | Allows users to select transformation-related questions and evaluate their knowledge by themselves

III. METHODOLOGY

This section elaborates the methods used to achieve the tasks.

A. Information Gathering

Requirements were gathered through interviews, questionnaires and meetings with several identified A/L Chemistry teachers, to get a basic idea of the structure transformation process in Organic Chemistry and of existing teaching techniques. In addition, discussion sessions were conducted with A/L students to find out their learning difficulties in Organic transformations and the practical issues that arise while preparing for transformation-based questions. More details were gathered on currently available e-learning systems for Organic Chemistry from numerous research papers and by trying out some Chemistry tools aligned with foreign A/L syllabuses. Information on suitable technologies that can be used for the system was also acquired [4], [5]. The information gathered contributed towards the system envisaged.

B. System Overview

The system contains five components: Structure Format Identifier, IUPAC name/SMILES format Generator, Decision Making Tool (Evaluator), Chemical Structure Name Recognizer and Chemical Structure Transformation Simulator. The components are described hereafter.

Fig. 1. System Diagram

1) Structure Format Identifier: This component is used to identify the format of the input. If the data is in image format, the output is directed to the Chemical Structure Name Recognizer. If the data is in 2D format, drawn using the JME tool [6], the output is directed to the SMILES format converter.

2) IUPAC name/SMILES format Generator: Users of the system can insert not only 2D graphical structures but also the IUPAC names of chemical structures. SMILES conversion [7] converts the inserted chemical structures into a machine-readable, manageable format. Unless the system converts the chemical structures given by the user, it cannot proceed with the chemical transformation process, because it cannot use the 2D graphical structure or the IUPAC name of a chemical compound, just as it is, for any useful computer-related processing task. It is essential to convert the chemical structure details into a common, machine-readable format. A separate algorithm is used to transform the IUPAC name into the SMILES format.

3) Decision Making Tool (Evaluator): This function is implemented as a helping tool for teachers and lecturers in preparing questions based on the chemical structure transformations of Organic Chemistry. Teachers can set questions that ask for the line-up order (mixing order) of the reagents, or for the intermediate resulting structures, while restricting the transformation to a particular number of steps. In addition, this function allows the questions prepared with the tool to be saved for future reference.

4) Chemical Structure Name Recognizer: This component was designed to translate digital raster images of chemical structures into the standard chemical file formats used by the chemical structure transformation simulator of the system. It is used when the student needs to input a chemical structure in 2D format, as a main reactant or as a final product, to the structure transformation simulator. Its final outcome is the construction of the topology.

5) Chemical Structure Transformation Simulator: This intelligent tool works as an automated chemical reactor using algorithms. Students can use this component as a self-learning tool to learn or practise organic chemical structure transformations. A user logged in to the system as a student can view the student's selection page and provide the input and output chemical structures in order to use the transformation simulator. The input can be given in the standard chemical structure naming format called IUPAC naming, or in a 2D format, drawn either as a linear-bond structure or in regular format. Next, the user provides the reagents that interact with the main input chemical structure and the intermediate resulting structures. The user can then view the results of every step performed, in a preferred format, together with a sufficient explanation of each reaction.

IV. RESEARCH FINDINGS / RESULTS AND EVIDENCE

The main target of this research was to build an e-learning based supporting tool with which A/L Chemistry students in Sri Lanka can master Organic Chemical structure transformations. As the main goal has largely been achieved, students can already use the tool for self-learning to build expertise in structure transformation skills, to enhance their reasoning skills for reactions, and to become familiar with exam-oriented transformation questions. In addition, the tool helps students learn IUPAC naming by providing chemical structures as input, and it supports teachers' decision making when preparing question papers. This tool meets all the requirements of users satisfactorily.

A. Translate digital raster images of chemical structures into standard chemical file formats

This task follows a few steps, most of which are somewhat similar to the techniques of Marc Zimmermann and Martin Hofmann-Apitius [8]. In this research the authors used Microsoft technologies, with C# as the programming language and AForge.NET as the image processing library.

One limitation of the line-detection step should be noted. The conventional Hough Transform (HT) does not use pixel-connectivity and line-width information as important features for line extraction. So, while the human eye only recognizes a linear grouping of pixels as a line, the Hough Transform would assign greater weight to a broken, aligned sequence of pixels. In addition, the Hough Transform will assign a greater weight to a long, narrow sequence of single pixels that may be less visible to the human eye. To correct these problems, a modified Hough Transform is used for line detection. In the modified HT, each pair of pixels is assigned a weight based on the probability that the two pixels originate from a single line segment.
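The conventional Hough Transform that this line-detection discussion starts from can be sketched briefly. The following is a minimal Python illustration, not the authors' C#/AForge.NET implementation: the function names `hough_votes` and `strongest_line`, the 1-degree angular resolution and the rounding of r to whole pixels are all assumptions made for the example, and the pair weighting of the modified HT is not included.

```python
import math
from collections import Counter


def hough_votes(pixels, n_theta=180):
    """Accumulate votes in (r, theta) space for a list of dark pixels.

    Each pixel (x, y) votes once per sampled angle, in the bin given by
    the normal form of a line: r = x*cos(theta) + y*sin(theta).
    """
    votes = Counter()
    for x, y in pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # sample angles in [0, pi)
            r = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(r, t)] += 1
    return votes


def strongest_line(pixels, n_theta=180):
    """Return (r, theta in degrees, vote count) for the best-supported line."""
    votes = hough_votes(pixels, n_theta)
    (r, t), count = votes.most_common(1)[0]
    return r, 180.0 * t / n_theta, count


if __name__ == "__main__":
    # Hypothetical test data: a horizontal run of 200 pixels at y = 5.
    horizontal_bond = [(x, 5) for x in range(200)]
    print(strongest_line(horizontal_bond))
```

In this plain version a broken or very thin run of aligned pixels collects votes exactly like a solid bond, which is the weakness the modified HT is meant to address by weighting pixel pairs.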
1) Pre-processing: Chemical structure diagrams are drawn with different drawing-software settings, such as default bond lengths or character font sizes. Moreover, image size and format vary. It is therefore necessary to resize and de-noise the input image so that the chemical structure diagram within it has bond lengths and character sizes optimally adjusted.

2) Character line separation: This step disassembles connected components based on pixel connectivity. To detect characters, a character detection algorithm searches for objects with similar heights and areas. The MODI library built by Microsoft is suitable for this. Its major drawback is that it detects words (sets of characters together) rather than single characters, but in some circumstances this is useful because it makes it easy to detect the most text-populated areas. Most text components can be separated from the rest of the chemical structure using this method.

3) Line Detection: Most bonds in a chemical structure drawing are simple straight lines. Therefore, a robust line detection algorithm is the key component for extracting bond features from a chemical structure diagram. In digital image processing, the Hough Transform (HT) is a standard technique used for this purpose. It detects lines by mapping the image in the Cartesian space to the polar Hough space using the normal representation of a line in x-y space (Figure 2(a) and 2(b)):

r = x cos θ + y sin θ

Fig. 2. Hough Transformation for bond detection. (a) Cartesian image space, (b) polar Hough space, (c) example of HT applied to a chemical structure image, and (d) Hough space corresponding to (c).

Since a pixel corresponds to a sinusoidal curve in the Hough space, collinear pixels in the x-y space have intersecting sinusoidal curves. Therefore, all possible lines passing through every arbitrary pair of pixels in a chemical diagram image are identified by checking the intersection points of curves in the Hough space. Figures 2(c) and 2(d) show the detected line and the corresponding Hough space. The density of a point is shown by (r, θ).

4) Bond type recognition: In low resolution (fuzzy) images, the Hough Transform often fails to distinguish a double or triple bond from a single bond. With thickness-based bond correction, a single detected line can be interpreted as a double or triple bond by considering the thickness of the bond, as well as the pattern of dark-white transitions perpendicular to the line (Figure 3).

Fig. 3. Sequential steps for bond detection. (a) Original image, (b) detected corner points after removing character components, (c) detected normal and wedge bonds, and (d) pixels left before dashed bond detection.

5) Character recognition: All separated character components are sent to an OCR library (the MODI library) for optical character recognition. The character recognition algorithm employed is based on template matching of features such as holes at middle, upper and lower positions, pixel densities of sub-regions, and white-black transitions along a line. Currently the library is used without any customization, which leads to relatively high recognition error rates caused by the following image errors. Figure 4 shows common character recognition errors in chemical structure diagrams.

Fig. 4. Common character recognition errors. (a) Low resolution, (b) broken character, (c) glued to a graphic component, and (d) glued characters.

6) Topology construction and SMILES converter: Image processing errors are common in this type of system. These errors may prevent the topology graph from being created. In this situation the system generates an error message, and the SMILES converter also generates an error-prone string. To avoid this type of error, the error-prone string is sent to the structure recognizer.
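One way to picture this correction step is as a nearest-match lookup: the error-prone string is compared with SMILES strings already known to the system and the most similar one is returned. The sketch below is a minimal Python illustration under that assumption. The paper does not name its approximation algorithm, so the use of `difflib` similarity, the function name `closest_smiles`, the cutoff value and the sample strings are all hypothetical.

```python
import difflib


def closest_smiles(candidate, known_smiles, cutoff=0.6):
    """Return the known SMILES string most similar to `candidate`, or None.

    Similarity is difflib's sequence ratio (an assumption; the source does
    not say which approximation algorithm is used).
    """
    matches = difflib.get_close_matches(candidate, known_smiles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical database entries and a hypothetical garbled OCR result.
    database = ["CC(=O)C", "CCO", "CC(=O)O"]
    print(closest_smiles("CC(=O(C", database))
```

A character-level similarity like this knows nothing about chemistry, so a real system would presumably also check that the returned string is a valid structure.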
Inside the structure recognizer, an approximation algorithm is used to tally the string with the spell checker output. This method filters the SMILES string several times. The final output is the SMILES string most similar to the structure.

Fig. 5. Topology construction. (a) Detected bonds (lines) and symbols (rectangles), (b) created nodes (bold dots), and (c) final nodes and edges.

7) Spell checker: It is possible to find patterns between atoms such as C and H in a chemical structure. This clue is somewhat effective in identifying carbon compounds, and the same applies to other atoms such as O, Cl and Br. The identified patterns and previously found chemical structures can be stored in a chemical database. Using search algorithms, it is possible to retrieve similar SMILES strings as the output of the spell checker.

Fig. 6. Spell checker

B. Data manipulation approach used in the chemical structure transformation simulator

To do a particular transformation, the system goes through a few steps. They are as follows.

Fig. 8. Structure transformation simulator: architecture

1) Convert the user's input chemical structures into a machine-readable format: To use the structure transformation simulator, the user has to input two chemical structures: one as the main reactant, which initializes the starting point of a transformation, and another as the final outcome, which indicates the end point of that transformation.

Ex 1:
Fig. 9. Input chemical structure in 2D format
Fig. 10. Input chemical structure in textual format

For further data manipulation tasks, input structures are converted into a common linear format. In this research the authors used "SMILES", a standard machine-readable format used in chemical data manipulation.

Fig. 11. Input chemical structure in SMILES format

2) Identify the type of the main reactant: The type of the reactant depends on its functional groups, which classify Organic chemical compounds according to their reaction priority. Since the reactions of chemical compounds are based on this reaction priority, it is necessary to identify the type of the reactant. All functional groups can be identified uniquely, and one chemical structure may contain more than one functional group.

Ex 1: Input chemical structure:
Fig. 12. Input chemical structure in 2D format
Structure in SMILES format: C(=O)C
According to a function of the system, this structure is identified as a "Ketone", because the structure contains the pattern "(=O)C", which means it belongs to the functional group "Ketone".
Functional group: "one" - (=O)C

3) Generate the chemical structure transformation: After providing the structures, the user has to provide the reagents to react. The reagents should be given in the order in which the user expects to obtain the intermediate results. It is also possible to do user-interactive transformations by providing one reagent at a time. Once the required reagent or reagents are added, the system generates the transformation steps with appropriate explanations, as in Fig. 13.

Ex 1:
Fig. 13. Structure transformation process

This task uses a system knowledge base, a core component of the structure transformation simulator, which stores data in XML format; most of the data manipulation tasks are done using LINQ to XML. A sample of an XML document the system uses is shown in Fig. 14. The system reads the user-provided reagents one by one and extracts them from XML tags. For example, if the user's input reagent is LiAlH4 and the reactant functional group is a carboxylic acid, the system will replace the carbonyl functional group with a hydroxyl, as the reaction step should generate an alcohol.

Fig. 14. Sample XML file

V. CONCLUSION AND FUTURE WORK

In this paper the authors have introduced a tool that supports students as well as teachers and lecturers by addressing the complications of the Organic chemical structure transformation process within the A/L Chemistry syllabus of Sri Lanka. The main use of this tool is that, with the aid of predictions, structure transformations can be studied and understood without lab experiments, which may consume a lot of money and time and can even cause injury. Indeed, chemical structure recognition algorithms may be most akin to character and text recognition algorithms. Like words in a dictionary, a chemical structure database can serve as a training set of molecules that can be used to identify the most common chemical substructures present in all known chemical compounds. This method would improve eChem's structure readability.

As future work, the following requirements are recognized as likely for delivery with future releases.

• Extending the system to handle aromatic chemical structure transformations.
• Generating 3D animated output (currently, outputs are presented in 2D format).
• Promoting the system with more features to enhance self-assessment capabilities.
• Extending the structure readability to read hand-drawn structures.

Furthermore, this approach can be developed further to build more powerful tools, and AI concepts can be applied for industrial purposes.

REFERENCES

[1] Department of Examinations, Sri Lanka. (2006). GCE statistics. Available: http://www.doenets.lk. Last accessed 28th September 2011.
[2] Y. C. Alicia Tang, S. M. Zain, and N. A. Rahman. (2009). Design and development of a qualitative simulator for learning organic reactions. Available: http://www.naun.org/journals/computers/ijcomputers-126.pdf. Last accessed 1st April 2011.
[3] McDaniel JR, Balmuth JR, "Kekule: OCR - Optical Chemical (Structure) Recognition", J Chem Inf Comput Sci, no. 32, 373-378, 1992.
[4] Ibison P, Jacquot M, Kam F, Neville AG, Simpson RW, Tonnelier C, Venczel T, Johnson AP, "Chemical Literature Data Extraction: The CLiDE Project", J Chem Inf Comput Sci, no. 33, 338-344, 1993.
[5] Wikibooks. (2011). Organic Chemistry/Foundational concepts of organic chemistry/History of organic chemistry. Available: http://en.wikibooks.org/wiki/Organic_Chemistry/Foundational_concepts_of_organic_chemistry/History_of_organic_chemistry. Last accessed 15th October 2011.
[6] Peter Ertl. (2009). JME. Available: http://www.molinspiration.com/jme/index.html. Last accessed 1st September 2011.
[7] Daylight summer school. (5th June 2001). Reaction Smiles and Smirks. Available: http://www.daylight.com/meetings/summerschool01/course/basics/smirks.html. Last accessed 25th June 2011.
[8] Marc Zimmermann and Martin Hofmann-Apitius. (10 February 2007). Automated Extraction of Chemical Information from Chemical Structure Depictions. Available: http://www.touchbriefings.com/pdf/2911/apitius.pdf. Last accessed 10th July 2011.