E-Learning Based Chemical Information Extracting Tool (eChem)
L. Ratnayaka, P.S.U. De Silva, H.N.M. Wijesiri, A.M. Samaradiwakara, N. Ranpatabendi, and U.U.S.K. Rajapaksha
Abstract - Chemistry is considered one of the most difficult subjects to study. It is a compulsory subject in the GCE A/L science stream, where a pass is a necessity. To help students with their difficulties in Chemistry, an E-Learning tool for Chemistry is envisaged. This E-Learning tool for Advanced Level (A/L) Chemistry is the result of research carried out to identify the difficulties students face in understanding the basic theories of Organic Chemistry as related to problem solving. The tool supports student practice with Organic Chemistry transformations and structure drawing through different modes of learning. It serves as a support tool external to conventional classroom education. Students can use it to master their Organic Chemistry skills and practice many different kinds of questions once a lesson unit is completed. In addition, the system helps students find their weak areas in answering questions. The system assists not only students but also teachers, who can interact with students through the paper marking component, the decision making component and the evaluation component.
Keywords - E-learning, chemical structure transformation, G.C.E. A/L, Topology, SMILES, IUPAC naming
I. INTRODUCTION
The difficulties associated with learning Chemistry are well known. The difficulties students face in understanding Chemistry concepts and in answering questions based on the theory learnt have caused much concern. Therefore, it is not surprising to find students turning to tuition classes for help. The use of a learning tool for this purpose has received little attention.
The local GCE A/L examination is the biggest hurdle in the Sri Lankan education curriculum. To pursue higher studies at a university, everyone in the A/L science stream must secure passes in the G.C.E. (A/L) examination. The annual failure rate for Advanced Level Chemistry is very high, nearly 51% in recent years [1]. As such, support towards a better understanding of the subject will be welcomed by students and teachers alike. The use of new technologies is an important incentive for both teachers and students in achieving an adequate transmission of knowledge.
Even though the government's e-Sri Lanka concept has contributed to the local education sector, learning tools do not play a major role in the school curriculum. eChem, the learning tool presented here, was developed to suit both students and teachers anxious to be better guided.
The use of a learning tool will be a new experience for students learning Organic Chemistry content in addition to classroom education. The developed system facilitates mastery of skills related to Organic Chemistry and supports practice with many different kinds of questions. Further, the system helps teachers evaluate students effectively by using a digital paper marker, somewhat similar to the multiple question evaluators currently used in A/L paper marking.
This paper describes the implementation of an effective tool to suit a local requirement. Section III explains the system components, and Section IV explains the research carried out for the main components.
II. BACKGROUND
During the literature survey, the authors studied several existing e-learning tools for A/L Chemistry and their core features.
The survey revealed that most of the existing software tools do not fulfill local requirements, although a few do so partially. For example, QRIOM (Qualitative Reasoning in Organic Mechanism) [2], which is used to support learning and teaching of Organic Chemistry reactions, did not appear to contribute much towards local requirements. It only highlights the reaction mechanism, describing how a reaction takes place by showing what happens to valence electrons as bonds are made and broken. It also provides a qualitative reasoning model to predict the final outcome of a particular reaction and gives sufficient explanations.
Other similar tools that were studied are ChemHelp, Kekule [3] and CLiDE [4]. Table 1 compares these tools with the proposed system.
TABLE 1
COMPARISON WITH EXISTING SYSTEMS
Similar Tools for Foreign A/L Subjects | Our System
--------------------------------------------------------------
Do not have a transformation engine | Includes a chemical structure transformation engine
Provide a reaction description for a single reaction at a time | Provides advanced reaction descriptions for a set of chemical reactions (a chemical transformation) at the same time
Subject content is limited | Subject content is large, advanced and expandable
Do not allow users to try out different structure transformation questions and evaluate them | Allows users to select transformation-related questions and evaluate their knowledge by themselves
III. METHODOLOGY
This section elaborates the methods used to achieve the tasks.
A. Information Gathering
Requirements were gathered through interviews, questionnaires and meetings with several A/L Chemistry teachers to get a basic idea of the structure transformation process in Organic Chemistry and of existing teaching techniques. In addition, discussion sessions were conducted with A/L students to identify learning difficulties in Organic transformations and the practical issues that arise while preparing for transformation-based questions. Further details on currently available e-learning systems for Organic Chemistry were gathered from research papers and by trying out some Chemistry tools aligned with foreign A/L syllabuses. Information on suitable technologies for the system was also acquired [4], [5]. The information gathered contributed towards the system envisaged.
B. System Overview
The system contains five components: Structure Format Identifier, IUPAC name/SMILES format Generator, Decision Making Tool (Evaluator), Chemical Structure Name Recognizer and Chemical Structure Transformation Simulator. The components are described hereafter.
Fig. 1. System Diagram
1) Structure Format Identifier: This component identifies the format of the input. If the input is in image format, it is directed to the chemical structure/name recognizer. If the input is in 2D format drawn using the JME tool [6], it is directed to the SMILES format converter.
2) IUPAC name/SMILES format Generator: Users of the system can enter not only 2D graphical structures but also the IUPAC names of chemical structures. SMILES conversion [7] converts the entered chemical structures into a machine-readable, manageable format. Unless the system converts the structures given by the user, it cannot proceed with the chemical transformation process, because it cannot use the 2D graphical structure or the IUPAC name of a compound as it is for any useful computer-related processing task. It is essential to convert the chemical structure details into a common, machine-readable format. A separate algorithm is used to transform the IUPAC name into the SMILES format.
3) Decision Making Tool (Evaluator): This function is implemented as a helping tool for teachers/lecturers in preparing questions based on chemical structure transformations in Organic Chemistry. Teachers can set questions that ask for the line-up order (mixing order) of the reagents or for the intermediate resulting structures, restricting the transformation to a particular number of steps. In addition, this function can save the questions prepared with the tool for later reference.
4) Chemical Structure Name Recognizer: This component was designed to translate digital raster images of chemical structures into the standard chemical file formats used by the chemical structure transformation simulator of the system. It is used when the student needs to input a chemical structure in 2D format as the main reactant or as the final product for the structure transformation simulator, and its final outcome is the constructed topology.
5) Chemical Structure Transformation Simulator: This intelligent tool works as an automated chemical reactor using algorithms. Students can use this component as a self-learning tool to learn or practice organic chemical structure transformations. A user logged in to the system as a student can view the student's selection page to provide the input and output chemical structures for the transformation simulator. The input can be in the standard chemical structure naming format called IUPAC naming, in 2D format (considered as a linear-bond structure), or in regular format. In the next activity, the user provides the reagents that interact with the main input chemical structure and the intermediate resulting structures. The user can then view the results of each step in a preferred format, including a sufficient explanation of that particular reaction.
IV. RESEARCH FINDINGS/ RESULTS AND EVIDENCE
The main target of this research was to build an e-learning based supporting tool for A/L Chemistry students in Sri Lanka to master their knowledge of Organic chemical structure transformations. As the main goal has largely been achieved, students can already use the system as a self-learning tool to build structure transformation skills, to enhance reasoning skills for reactions and to become familiar with transformation questions targeting exams. In addition, the tool lets students learn IUPAC naming by providing chemical structures as input, and for teachers it eases decision making in the paper setting process. This tool meets all the requirements of users satisfactorily.
A. Translating digital raster images of chemical structures into standard chemical file formats
This task follows the steps listed below, most of which are somewhat similar to the techniques of Marc Zimmermann and Martin Hofmann-Apitius [8]. In this research the authors used Microsoft technologies, with C# as the programming language and AForge.NET as the image processing library.
1) Pre-processing: Chemical structure diagrams are drawn in drawing software with different settings, such as default bond lengths or character font sizes. Moreover, the image size and format are subject to variation. Thus it is necessary to resize and de-noise the input image so that the chemical structure diagram within it has optimally adjusted bond lengths and character sizes.
2) Character line separation: This step disassembles connected components based on pixel connectivity. To detect characters, a character detection algorithm searches for objects with similar heights and areas. The MODI library built by Microsoft is suitable for this. Its major drawback is that it detects words (sets of characters together), but in some circumstances this is useful because it makes the most text-populated area easy to detect. Most text components can be separated from the rest of the chemical structure using this method.
3) Line Detection: Most bonds in a chemical structure drawing are simple straight lines. Therefore, a robust line detection algorithm is the key component for extracting bond features from a chemical structure diagram. In digital image processing, the Hough Transform (HT) is a standard technique used for this purpose. It detects lines by mapping the image in the Cartesian space to the polar Hough space using the normal representation of a line in x-y space (Figure 2(a) and 2(b)):
$$ r = x\cos\theta + y\sin\theta $$
Fig. 2. Hough Transformation for bond detection
2. (a) Cartesian Image Space, 2. (b) Polar Hough Space, 2. (c) Example of HT applied to a chemical structure image, and 2. (d) Hough Space corresponding to 2. (c).
Since a pixel corresponds to a sinusoidal curve in the Hough space, collinear pixels in the x-y space have intersecting sinusoidal curves. Therefore, all possible lines passing through every arbitrary pair of pixels in a chemical diagram image are identified by checking the intersection points of curves in the Hough space. Figures 2(c) and 2(d) show the detected line and the corresponding Hough space. A point in the Hough space is denoted by (r, θ).
However, the conventional HT does not use pixel connectivity and line-width information as important features for line extraction. So, while the human eye only recognizes a linear grouping of pixels as a line, the Hough transform would assign greater weight to a broken, aligned sequence of pixels. In addition, the Hough transform will assign a greater weight to a long, narrow sequence of single pixels that may be less visible to the human eye. To correct these problems, a modified Hough Transform is used for line detection. In the modified HT, each pair of pixels is assigned a weight based on the probability that the two pixels originate from a single line segment.
4) Bond type recognition: In low resolution (fuzzy) images, the Hough Transform often fails to distinguish a double or triple bond from a single bond. With thickness-based bond correction, a single detected line can be interpreted as a double or triple bond by considering the thickness of the bond, as well as the pattern of dark-white transitions perpendicular to the line (Figure 3).
Fig. 3. Sequential steps for bond detection
3. (a) Original image, 3. (b) Detected corner points after removing character components, 3. (c) Detected normal and wedge bonds, and 3. (d) Pixels left before dashed bond detection
5) Character recognition: All separated character components are sent to an open-source library (MODI library) for optical character recognition. The character recognition algorithm employed is based on template matching of features such as holes at middle, upper and lower positions, pixel densities of sub-regions, and white-black transitions along a line. Currently the library is used without any customization, which leads to relatively high recognition error rates caused by image defects. Figure 4 shows common character recognition errors in a chemical structure diagram.
Fig. 4. Common character recognition errors
4. (a) Low resolution, 4. (b) Broken character, 4. (c) Glued to a graphic component, and 4. (d) Glued characters
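The line-detection step (step 3) can be illustrated with a minimal sketch of the conventional Hough Transform, using the normal representation r = x cos θ + y sin θ. This is not the authors' C#/AForge.NET implementation and it does not include the modified pair-weighting scheme; the function name `hough_lines` and its parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def hough_lines(pixels, theta_steps=180, r_step=1.0, min_votes=3):
    """Conventional Hough Transform over a list of dark-pixel (x, y) coordinates.

    Each pixel votes for every (r, theta) bin whose line passes through it,
    where r = x*cos(theta) + y*sin(theta). Bins with at least `min_votes`
    votes are returned as (r, theta_in_degrees, votes), strongest first.
    """
    votes = Counter()
    for x, y in pixels:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps  # theta sampled over [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(r / r_step), k)] += 1  # quantize r into a bin

    return [
        (r_bin * r_step, math.degrees(math.pi * k / theta_steps), count)
        for (r_bin, k), count in votes.most_common()
        if count >= min_votes
    ]


# Usage: ten collinear pixels on the horizontal line y = 5 (assumed toy input).
bond_pixels = [(x, 5) for x in range(10)]
strong_lines = hough_lines(bond_pixels, min_votes=10)
```

Because r and θ are quantized, neighbouring bins around the true line can collect the same number of votes, so a practical detector would typically add peak selection on top of this voting step.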
6) Topology construction and SMILES converter: Image processing errors are common in this type of system. These errors may prevent the topology graph from being created. In that situation the system generates an error message, and the SMILES converter generates an error-prone string. To handle this, the error-prone string is sent to the structure recognizer. Inside the structure recognizer, an approximation algorithm is used to tally the string with the spell checker output. This method filters the SMILES string several times. The final output is the SMILES string most similar to the structure.
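The paper does not specify the approximation algorithm used to tally an error-prone SMILES string against known structures. As a hedged illustration only, the sketch below uses the similarity ratio from Python's standard `difflib` module as a stand-in; the function name `closest_smiles` and the small list of known strings are assumptions, not part of eChem.

```python
from difflib import SequenceMatcher


def closest_smiles(candidate, known_smiles):
    """Return the known SMILES string most similar to an error-prone candidate.

    Similarity is difflib's ratio (2 * matches / total length), used here as
    an assumed stand-in for the unspecified approximation algorithm.
    """
    def similarity(known):
        return SequenceMatcher(None, candidate, known).ratio()

    return max(known_smiles, key=similarity)


# Usage: the OCR step misread the letter "O" as the digit "0" (assumed example).
known = ["CC(=O)C", "CCO", "CC(=O)O"]
best = closest_smiles("CC(=0)C", known)
```

A purely textual measure like this ignores chemistry, so two strings that look alike may describe quite different molecules; it only illustrates the idea of picking the most similar stored string.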
Fig. 5. Topology Construction
5. (a) Detected bonds (lines) and symbols (rectangles), 5. (b) created nodes (bold dots), and 5. (c) final nodes and edges
7) Spell checker: It is possible to find patterns between atoms like C and H in a chemical structure. This clue is somewhat effective in identifying carbon compounds, and it also applies to other atoms such as O, Cl and Br. The identified patterns and previously found chemical structures can be stored in a chemical database. Using search algorithms, it is possible to retrieve similar SMILES strings, which form the output of the spell checker.
Fig. 6. Spell checker
B. Data manipulation approach used in the chemical structure transformation simulator
To perform a particular transformation, the system goes through the following steps.
1) Convert the user's input chemical structures into a machine-readable format: To use the structure transformation simulator, the user has to input two chemical structures: one as the main reactant, which sets the starting point of a transformation, and another as the final outcome, which indicates the end point of that transformation.
Fig. 8. Structure transformation simulator architecture
Ex 1:
Fig. 9. Input chemical structure in 2D format
Fig. 10. Input chemical structure in textual format
For further data manipulation tasks, input structures are converted into a common linear format. In this research the authors used SMILES, a standard machine-readable format used in chemical data manipulation.
Fig. 11. Input chemical structure in SMILES format
2) Identify the type of the main reactant: The type of the reactant depends on its functional groups, which classify Organic chemical compounds according to their reaction priority. Since the reactions of chemical compounds are based on this reaction priority, it is necessary to identify the type of the reactant. Each functional group can be identified uniquely, and one chemical structure may contain more than one functional group.
Ex 1: Input chemical structure:
Fig. 12. Input chemical structure in 2D format
Structure in SMILES format: C (=O) C
The system identifies this structure as a “Ketone” because it contains the pattern “(=O) C”, which means it belongs to the functional group “Ketone”.
Functional group: “one”
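The pattern-based identification in the example above can be sketched as a simple substring search over the SMILES string. This is a minimal illustration that mirrors the textual rule described in the paper, not a chemically rigorous classifier (it would, for instance, mislabel an ester). The pattern table, its priority order and the function name `identify_functional_groups` are assumptions made for this sketch; only the ketone pattern “(=O) C” comes from the text.

```python
# Hypothetical pattern table, checked in an assumed priority order.
# Only the ketone pattern "(=O)C" is taken from the paper's example.
FUNCTIONAL_GROUP_PATTERNS = [
    ("(=O)O", "carboxylic acid"),
    ("(=O)C", "ketone"),
]


def identify_functional_groups(smiles):
    """Return the names of all functional groups whose pattern occurs in the string.

    Spaces are removed first, so "C (=O) C" and "C(=O)C" are treated alike.
    The result keeps the order of FUNCTIONAL_GROUP_PATTERNS.
    """
    compact = smiles.replace(" ", "")
    return [
        name
        for pattern, name in FUNCTIONAL_GROUP_PATTERNS
        if pattern in compact
    ]


# Usage with the structure from the example above.
groups = identify_functional_groups("C (=O) C")
```

Returning a list rather than a single name reflects the statement that one structure may contain more than one functional group; a real system would more likely match substructures on the topology graph than on raw text.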
3) Generate the chemical structure transformation: After providing the structures, the user has to provide the reagents to react. The reagents should be given in the order in which the user expects to get the intermediate results. It is also possible to perform user-interactive transformations by providing one reagent at a time. Once the required reagent or reagents are added, the system generates the transformation steps with appropriate explanations, as in Figure 13.
Ex 1:
Fig. 13. Structure transformation process
This task uses a system knowledge base, a core component of the structure transformation simulator, which stores data in XML format; most of the data manipulation tasks are done using LINQ to XML. A sample of an XML document used by the system is shown in Figure 14. The system reads the user-provided reagents one by one and looks them up in the XML tags. For example, if the reagent entered by the user is LiAlH4 and the reactant functional group is a carboxylic acid, the system will replace the Carbonyl functional group with a Hydroxyl, as the reaction step should generate an Alcohol.
Fig. 14. Sample XML file
V. CONCLUSION AND FUTURE WORK
In this paper the authors have introduced a tool that supports students as well as teachers/lecturers, developed after identifying the complications of the Organic chemical structure transformation process within the A/L Chemistry syllabus of Sri Lanka. The main use of this tool is to study and understand structure transformations with the aid of predictions, without lab experiments, which can consume a lot of money and time and may even cause injury. Indeed, chemical-structure recognition algorithms may be most akin to character and text recognition algorithms. Like words in a dictionary, a chemical structure database can serve as a training set of molecules that can be used to identify the most common chemical substructures present in all known chemical compounds. This method would optimize eChem's structure readability. As future work, the following enhancements are recognized as likely for delivery in future releases.
• Extending the system to handle aromatic chemical structure transformations.
• Generating 3D animated output (currently, outputs are presented in 2D format).
• Adding more features to enhance self-assessment capabilities.
• Extending the structure readability to read hand-drawn structures.
Furthermore, this approach can be developed further to build more powerful tools, and AI concepts can be applied for industrial purposes.
REFERENCES
[1] Department of Examinations, Sri Lanka. (2006). GCE statistics. Available: http://www.doenets.lk. Last accessed 28th September 2011.
[2] Y. C. Alicia Tang, S. M. Zain, and N. A. Rahman. (2009). Design and development of a qualitative simulator for learning organic reactions. Available: http://www.naun.org/journals/computers/ijcomputers-126.pdf. Last accessed 1st April 2011.
[3] J. R. McDaniel and J. R. Balmuth, "Kekule: OCR – Optical Chemical (Structure) Recognition", J Chem Inf Comput Sci, 32, 373-378, 1992.
[4] P. Ibison, M. Jacquot, F. Kam, A. G. Neville, R. W. Simpson, C. Tonnelier, T. Venczel, and A. P. Johnson, "Chemical Literature Data Extraction: The CLiDE Project", J Chem Inf Comput Sci, 33, 338-344, 1993.
[5] Wikibooks. (2011). Organic Chemistry/Foundational concepts of organic chemistry/History of organic chemistry. Available: http://en.wikibooks.org/wiki/Organic_Chemistry/Foundational_concepts_of_organic_chemistry/History_of_organic_chemistry. Last accessed 15th October 2011.
[6] Peter Ertl. (2009). JME. Available: http://www.molinspiration.com/jme/index.html. Last accessed 1st September 2011.
[7] Daylight summer school. (5th June 2001). Reaction SMILES and SMIRKS. Available: http://www.daylight.com/meetings/summerschool01/course/basics/smirks.html. Last accessed 25th June 2011.
[8] Marc Zimmermann and Martin Hofmann-Apitius. (10 February 2007). Automated Extraction of Chemical Information from Chemical Structure Depictions. Available: http://www.touchbriefings.com/pdf/2911/apitius.pdf. Last accessed 10th July 2011.