A Meta search engine for medical students
using the French Medical Virtual University
Nicolas Garcelon, Marc Cuggia, Stefan Darmoni, Pierre Le Beux
Laboratoire d’Informatique Médicale - Faculté de Médecine
Université de Rennes 1
35043 Rennes Cedex France
Abstract
The objective of this paper is to describe a meta search engine integrated into the
French Virtual Medical University project. This project federates teaching resources,
both existing and under development, from several Medical Schools in France. The aim is
to allow medical students to browse the teaching resources of the consortium and to look
for specific topics using their own language, a mixture of medical terminology and
everyday language, as in daily professional life. The project includes a Virtual Medical
University portal where students can enter a few specific words and the engine looks for
matching resource descriptions provided by several sites. A new interactive interface
gives easy access to the most appropriate resources, and new indexing and search engines
based on medical vocabularies and ontologies have been integrated. The first evaluation
results are better than those of the previous approach, which used MeSH terms for the
queries.
Keywords:
Web resources indexing; metadata; search engine; Virtual Medical University; user interface
1. Introduction
The French Virtual Medical University (UMVF) [1] is now a major project for Medical Schools
in France. It has obtained official status as a special consortium, a Public Interest Group
(Groupement d'Intérêt Public, GIP). The project federates teaching resources, both existing
and under development, from most Medical Schools in France (three quarters of all French
Medical Schools have now joined the consortium). Its objectives are not only to share
experience across the country but also to integrate resources using Information and
Communication Technologies, in support of new pedagogical approaches for medical students
and for continuing medical education. The project includes a virtual medical portal
(www.umvf.org).
In this context, the project reaches most medical students in the first and second cycles
of medical school. We must therefore offer them simple and effective ways of retrieving what
they are looking for [2]. In the first phase of the project, resources were described with
Dublin Core metadata and indexed with MeSH [3]. This was done in cooperation with the
CISMeF project, which is responsible for manually indexing most French medical resources,
including pedagogical ones [4].
We also tested an automatic French indexing engine specialized in medical semiology
(ADM index) and integrated it into the UMVF indexing process for pedagogical
resources [5].
A dedicated search engine was developed that covers the whole vocabulary of medical
semiology, which is usually poorly represented in MeSH.
The aim of the work presented here is to create a meta search engine. This engine must offer
an ergonomic search, and the first resources it retrieves must be relevant.
To build the more powerful meta-engine described in this paper, we merged access to
CISMeF, PubMed and our own subsystem.
2. Materials and Methods
2.1. Materials
This section presents the tools used in this work: the Doc'CISMeF search engine, the UMVF
workflow for indexing resources, the accented MeSH [6], the NOMINDEX indexing tool [7] and
the Cross-Language Medical Information Retrieval tool (CLMIR) [8], both of which were
converted into web services.
The Doc'CISMeF search engine searches the manually indexed CISMeF database for pedagogical
resources. It enables students to select courses among 96 medical specialties. A total of
2,684 resources are indexed; each course is indexed, on average, in 9 specialties (minimum 1,
maximum 42, median 7) with 11 MeSH keywords (minimum 1, maximum 253, median 8). An HTML
record containing an XML description is available online for each indexed course.
Specialties and keywords are given in French and English.
The accented MeSH was created from the unaccented French MeSH and contains
515,000 terms.
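The paper does not say how accents were restored. The sketch below only illustrates how an accented lexicon can still serve unaccented user input: it indexes every term under its accent-stripped, case-folded form, using Python's standard `unicodedata` module. The three sample terms and the function names are hypothetical stand-ins for the real 515,000-term resource.

```python
import unicodedata
from typing import Dict, List


def strip_accents(text: str) -> str:
    """Remove combining diacritics, e.g. 'Hépatite' -> 'Hepatite'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_accent_index(accented_terms: List[str]) -> Dict[str, List[str]]:
    """Map each unaccented, case-folded key to the accented term(s) it stands for."""
    index: Dict[str, List[str]] = {}
    for term in accented_terms:
        index.setdefault(strip_accents(term).casefold(), []).append(term)
    return index


# Hypothetical sample standing in for the accented MeSH.
ACCENT_INDEX = build_accent_index(["Hépatite", "Échographie", "Sémiologie"])


def restore_accents(term: str) -> List[str]:
    """Return the accented form(s) of an unaccented term, or [] if unknown."""
    return ACCENT_INDEX.get(strip_accents(term).casefold(), [])
```

A list is stored per key because two distinct accented terms can collapse to the same unaccented spelling.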
An indexing workflow has been created for the UMVF project. It allows teachers to index
their course materials directly through a web form, which automatically suggests accented
MeSH keywords derived from an analysis of the course description. This is done by the
NOMINDEX web service. Teachers can select up to 10 specialties for each course. The
workflow then produces an XML description following the SCORM standard [9].
This workflow is currently being validated and will then be released for routine use.
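As a rough illustration of the workflow's output step, the sketch below builds an XML course description with Python's `xml.etree.ElementTree` and enforces the 10-specialty limit stated above. The element names (`course`, `specialty`, `keyword`) are hypothetical placeholders: real SCORM metadata follows the IEEE Learning Object Metadata model, whose schema is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import List

MAX_SPECIALTIES = 10  # limit stated for the UMVF indexing form


def describe_course(title: str, specialties: List[str],
                    mesh_keywords: List[str]) -> str:
    """Return a simplified XML description of one course (hypothetical layout)."""
    if len(specialties) > MAX_SPECIALTIES:
        raise ValueError(f"at most {MAX_SPECIALTIES} specialties per course")
    root = ET.Element("course")
    ET.SubElement(root, "title").text = title
    specs = ET.SubElement(root, "specialties")
    for specialty in specialties:
        ET.SubElement(specs, "specialty").text = specialty
    # Keywords suggested by the indexing service, validated by the teacher
    keywords = ET.SubElement(root, "keywords", scheme="MeSH")
    for keyword in mesh_keywords:
        ET.SubElement(keywords, "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")
```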
NOMINDEX is a tool developed at the Medical Informatics Laboratory of Rennes (France) that
generates MeSH keywords from a given sentence or paragraph. It has been implemented as a web
service to make it easily portable, usable and maintainable, and this web service was then
combined with the accented MeSH.
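NOMINDEX's internal algorithm is not described here, so the following is only a generic stand-in: a greedy longest-match lookup that maps phrases in free text to descriptors from a small, hypothetical French lexicon. It shows the kind of input/output contract such a service exposes (a paragraph in, a list of MeSH keywords out), not how NOMINDEX actually works.

```python
import re
from typing import Dict, List

# Hypothetical mini-lexicon: lower-cased French surface form -> MeSH descriptor.
LEXICON: Dict[str, str] = {
    "insuffisance cardiaque": "Insuffisance cardiaque",
    "infarctus du myocarde": "Infarctus du myocarde",
    "douleur thoracique": "Douleur thoracique",
    "dyspnée": "Dyspnée",
}
MAX_WORDS = max(len(form.split()) for form in LEXICON)


def extract_keywords(text: str) -> List[str]:
    """Greedy longest-match lookup of lexicon entries in free text."""
    tokens = re.findall(r"\w+", text.lower())
    found: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at token i first
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in LEXICON:
                if LEXICON[candidate] not in found:
                    found.append(LEXICON[candidate])
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found
```

Preferring the longest match keeps a multi-word concept such as "infarctus du myocarde" from being split into shorter, less specific entries.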
CLMIR was also developed at the Medical Informatics Laboratory of Rennes (France). It
automatically translates a French query into English and retrieves the corresponding
documents from MEDLINE (PubMed). It is specialized in biomedical terms and was developed to
help students formulate queries in their native language. The translation is not perfect,
but it allows students to find a reasonable set of relevant articles.
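CLMIR's own translation module is described in [8] and is not reproduced here. As a minimal sketch under that caveat, the code below does dictionary-based phrase translation: it looks up entries of a hypothetical French-English lexicon (longest phrases first), keeps them in query order, and joins the English terms with `AND` to form a boolean query.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical bilingual lexicon (French phrase -> English search term).
FR_EN: Dict[str, str] = {
    "insuffisance cardiaque": "heart failure",
    "asthme": "asthma",
    "enfant": "child",
    "traitement": "therapy",
}


def translate_query(french_query: str) -> str:
    """Translate known French phrases and join them into an English boolean query."""
    text = french_query.lower()
    hits: List[Tuple[int, str]] = []  # (position in query, English term)
    for french in sorted(FR_EN, key=len, reverse=True):  # longer phrases first
        pattern = re.compile(r"\b" + re.escape(french) + r"\b")
        match = pattern.search(text)
        if match:
            hits.append((match.start(), FR_EN[french]))
            # Blank out the match with spaces of equal length: positions stay
            # valid and shorter entries cannot match inside this phrase.
            text = pattern.sub(lambda m: " " * len(m.group()), text)
    return " AND ".join(term for _, term in sorted(hits))
```

Words absent from the lexicon are silently dropped, which is one reason a purely dictionary-based translation is "not perfect".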
The meta search engine was developed in Perl, connected to the web services presented
above. We also used PHP, whose built-in string functions include the Levenshtein distance
used for the automatic correction of spelling errors.
2.2. Methods
We first identified the knowledge resources likely to interest students searching for
documents, and selected the document bases that were richest in documents while containing
the fewest irrelevant ("parasite") documents. We did not consider general-purpose search
engines such as Google© because they returned too much noise. We retained Doc'CISMeF, the
UMVF knowledge resource base and the MEDLINE database through PubMed.
We then created a user interface that queried the three selected document bases in real
time whenever a user submitted a query. However, pages then took too long to load for
interactive use. We therefore devised a different strategy for each search engine: the UMVF
and CISMeF bases are searched first, and PubMed afterwards.
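The staged strategy can be sketched with a Python generator: results from the fast local bases are produced immediately, and the slower PubMed call runs only when the caller asks for the next batch. The original engine was written in Perl; this is an illustration of the ordering only, and `local_search` and `pubmed_search` are hypothetical callables.

```python
from typing import Callable, Iterator, List, Tuple

Search = Callable[[str], List[str]]


def staged_search(query: str, local_search: Search,
                  pubmed_search: Search) -> Iterator[Tuple[str, List[str]]]:
    """Yield result batches source by source, fastest first."""
    # Stage 1: local copies of the UMVF and CISMeF bases (fast, no network)
    yield "local", local_search(query)
    # Stage 2: PubMed (slow, remote); only runs when the caller requests it
    yield "pubmed", pubmed_search(query)
```

A page can render the first batch as soon as it is available and fetch the second one afterwards, so the user no longer waits for the slowest source.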
For the UMVF and CISMeF course bases:
To reduce page loading time, we keep a local copy of these bases, refreshed monthly: every
month, for each specialty, we load the list of CISMeF course resources together with their
XML descriptions. We then reorganize the indexing of each resource to improve the relevance
of the results. For each document, we automatically select the two most important
specialties among those proposed by CISMeF.
This reindexing consists of three steps:
• Extract the specialties having the maximal weight in CISMeF
• Among those specialties, select those whose terms appear in the title of the
document and make them the major specialties of the document
• If no specialty was selected in step 2, the specialties having the maximal weight
become the major specialties of the document.
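The three steps above can be sketched as follows. Several details are assumptions not given in the text: CISMeF weights are modelled as integers per specialty, "terms present in the title" is approximated by a case-insensitive substring match of the specialty label, ties are ordered alphabetically, and the result is capped at two major specialties to reflect the "two most important specialties" mentioned earlier.

```python
from typing import Dict, List, Tuple


def reindex(title: str, weights: Dict[str, int],
            max_major: int = 2) -> Tuple[List[str], List[str]]:
    """Return (major, minor) specialties for one document."""
    if not weights:
        return [], []
    # Step 1: specialties carrying the maximal CISMeF weight
    top_weight = max(weights.values())
    candidates = sorted(s for s, w in weights.items() if w == top_weight)
    # Step 2: keep the candidates whose label appears in the title
    title_folded = title.casefold()
    major = [s for s in candidates if s.casefold() in title_folded]
    # Step 3: if none appears in the title, fall back to all candidates
    if not major:
        major = candidates
    major = major[:max_major]  # assumed cap: "two most important specialties"
    minor = sorted(s for s in weights if s not in major)
    return major, minor
```

For example, a course titled "Cours de cardiologie" with equal top weights for Cardiologie and Pneumologie keeps only Cardiologie as major, because only that label appears in the title.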
The same operation is done for the document base indexed by the UMVF workflow.
When searching by specialty, the results are displayed in two parts:
• The most relevant documents: the search specialty is among the major specialties
of the document
• The less relevant documents: the search specialty is not among the major specialties
of the document but is among its minor specialties
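The two-part display amounts to partitioning the matching documents on the major/minor lists. The sketch assumes each document is a dict with an `id` and the `major`/`minor` specialty lists produced by reindexing; this layout is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_specialty(docs: Sequence[Dict],
                       specialty: str) -> Tuple[List[str], List[str]]:
    """Split documents into 'most relevant' and 'less relevant' id lists.

    Documents indexed under neither list are left out.
    """
    most_relevant: List[str] = []
    less_relevant: List[str] = []
    for doc in docs:
        if specialty in doc["major"]:
            most_relevant.append(doc["id"])
        elif specialty in doc["minor"]:
            less_relevant.append(doc["id"])
    return most_relevant, less_relevant
```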
For PubMed
The meta search engine calls the CLMIR web service for PubMed to obtain an English
translation of the query, which is then submitted directly to PubMed. The meta-engine then
collects the XML records returned by PubMed and displays them.
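For readers who want to reproduce the PubMed step today, the sketch below uses NCBI's public E-utilities ESearch endpoint: it builds the request URL for an English query and extracts PubMed identifiers from an ESearch XML response. This is not the authors' Perl code; `retmax=25` is chosen to match the 25-result first page, and full records would then be retrieved with EFetch. No network call is made here.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(english_query: str, retmax: int = 25) -> str:
    """Build an ESearch request URL for PubMed."""
    params = {"db": "pubmed", "term": english_query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def parse_pmids(esearch_xml: str) -> List[str]:
    """Extract PubMed IDs from the <IdList> of an ESearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [element.text for element in root.findall("./IdList/Id")]
```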
The semantic ergonomics
We designed the meta search engine to make searching easier for students. The interface
is very simple: students can search by keywords and by specialty.
The engine automatically lists the MeSH keywords associated with a natural-language
query, using the NOMINDEX web service.
If no document matches the query, the engine calls the automatic spelling corrector, which
compares the query with the local keyword base on the basis of string proximity
(Levenshtein module).
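PHP provides `levenshtein()` as a built-in function; the sketch below shows the same edit distance in Python (dynamic programming, one row at a time) and uses it to suggest keywords from a local base. The threshold of two edits is an assumption, not a value from the paper. Note that Levenshtein distance counts character insertions, deletions and substitutions; it does not model pronunciation.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def suggest(word: str, keyword_base: Iterable[str],
            max_distance: int = 2) -> List[str]:
    """Keywords within max_distance edits of word, closest first."""
    scored = sorted(
        (levenshtein(word.casefold(), kw.casefold()), kw) for kw in keyword_base
    )
    return [kw for dist, kw in scored if dist <= max_distance]
```

For instance, the misspelling "pneumologi" is one edit away from "pneumologie" and would be offered as a correction.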
In addition, the meta-engine analyses the user logs every day to identify the last resource
each user visited before logging off [10,11]. We assume that this resource was of interest
to the user, so it is given a higher score in future similar queries.
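A minimal sketch of this feedback loop follows, under stated assumptions: the log format `(session_id, timestamp, query, resource_id)` is hypothetical, "similar query" is simplified to an identical query string, and the bonus of 10 points per session is an arbitrary illustrative weight.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (session_id, timestamp, query, resource_id)
LogRecord = Tuple[str, float, str, str]


def session_end_counts(log: Iterable[LogRecord]) -> Counter:
    """Count, per (query, resource), how often that resource ended a session."""
    last_seen: Dict[str, Tuple[float, str, str]] = {}
    for session, timestamp, query, resource in log:
        # Keep only the latest record of each session
        if session not in last_seen or timestamp >= last_seen[session][0]:
            last_seen[session] = (timestamp, query, resource)
    return Counter((query, resource) for _, query, resource in last_seen.values())


def rerank(query: str, results: List[Tuple[str, float]], boosts: Counter,
           bonus: float = 10.0) -> List[str]:
    """Add `bonus` per past session that ended on a resource for this query."""
    rescored = [(score + bonus * boosts[(query, res)], res) for res, score in results]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [res for _, res in rescored]
```

Running `session_end_counts` once a day over the new log entries matches the daily batch analysis described above; `rerank` is applied at query time.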
The meta-engine evaluation:
We first assessed the ability of the meta-engine to retrieve the documents indexed by CISMeF
and those returned by PubMed; CLMIR with PubMed was evaluated in a previous study [8]. The
assessment was based on specialties, to measure the relevance of the results, specifically
the ordering of documents by specialty in the meta-engine compared with CISMeF.
A survey conducted in 2002 on the attitudes of search engine users showed that only 23% of
users went beyond the second page of results [12]. Another pilot study found that users
looked past the fourth page of results less than 5% of the time [13]. Ranking position in
Web search results, especially on the first few pages, is therefore an important determinant
of the accessibility of information to users. This is why we limited our evaluation to the
first 25 results.
For five disciplines, we counted the relevant documents, the noise (irrelevant documents
returned) and the silence (relevant documents missed) in the results of the meta-engine and
of CISMeF. These results will allow us to validate the reindexing methods used in the UMVF
indexing workflow.
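The counts used in this evaluation can be computed as below for one specialty, given a ranked result list and a set of documents judged relevant. The paper does not say how the full set of relevant documents (needed for silence) was established, so `relevant_ids` is an assumed input; the cut-off of 25 matches the first results page.

```python
from typing import Dict, Iterable, List


def evaluate_top_k(ranked_ids: List[str], relevant_ids: Iterable[str],
                   k: int = 25) -> Dict[str, float]:
    """Relevant hits, noise, silence and precision within the first k results."""
    top = ranked_ids[:k]
    relevant = set(relevant_ids)
    hits = sum(1 for doc in top if doc in relevant)
    noise = len(top) - hits                 # irrelevant documents returned
    silence = len(relevant - set(top))      # relevant documents not returned
    precision = hits / len(top) if top else 0.0
    return {"relevant": hits, "noise": noise, "silence": silence,
            "precision": precision}
```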
3. Results
To evaluate the results, senior physicians put themselves in the role of a medical student
looking for a specific topic or course to revise for their exams. Five specialties were
selected: cardiology, foeto-embryology, pneumology, surgery and urology. A resource indexed
under, for example, both nephrology and urology was considered relevant for urology.
Overall, at least 60% of the first 25 references returned were judged relevant, as shown in
Figure 1.
Figure 1 - Meta-engine evaluation by discipline
Figure 2 shows that, on average, 77% of the first 25 documents found by the meta-engine
were relevant.
Figure 2 - Meta-engine evaluation
The reindexing method applied to the CISMeF documents increases the number of relevant
results on the first page for a given specialty and therefore appears successful. We expect
the same to hold for the indexing workflow designed for the UMVF project.
4. Discussion - Conclusion
The results obtained in terms of relevance, noise and silence with respect to specialties are
encouraging. New knowledge bases could be added, depending on the quality of their document
indexing: poorly indexed bases might require a complete reindexing process before
integration into our meta-engine. Furthermore, if the data are not well structured, the
meta-engine cannot analyse them properly. Clearly, the quality of the results depends on the
quality and adequacy of the indexing process [14].
The meta-engine can reorder documents with a precise objective: to show the most relevant
documents on the first results page (25 URLs). In this respect we are grateful to CISMeF,
whose manual indexing process is widely recognized as reliable and provides objective,
well-structured XML metadata.
We also aim to provide an ergonomic engine for students. Since most students have no
training in linguistics, the search form must be very simple and intuitive [15].
Moreover, a study of how consumers search for health information on the World Wide Web
showed that they need guidance to find relevant documents [16]. This is why we propose a
meta-engine: students can find the largest possible number of documents on a single site,
and all documents are approved by teachers (unlike Google results).
The next step is to have medical students evaluate the whole system in a real experimental
setting. Our first objective was to show that this approach is worthwhile before undertaking
a controlled evaluation using the three knowledge sources (CISMeF, PubMed and UMVF). Our
meta-engine could then be made available as a web service [17], interoperable with any search
engine with similar objectives.
5. Acknowledgments
We thank the CISMeF group for making their database available for this work.
6. References
[1] Le Beux P, Le Duff F, Fresnel A, Berland Y, Beuscart R, Burgun A, Brunetaud JM, Chatellier G, Darmoni S, Duvauferrier R, Fieschi M, Gillois P, Guille F, Kohler F, Pagonis D, Pouliquen B, Soula G, Weber J. The French Virtual Medical University. Stud Health Technol Inform. 2000;77:554-62.
[2] Escoffery C, Miner KR, Adame DD, Butler S, McCormick L, Mendell E. Internet use for health information among college students. J Am Coll Health. 2005 Jan-Feb;53:183-8.
[3] Mougin F, Cuggia M, Le Beux P. Development of an indexing search engine for the UMVF: proposal for an indexing method based on Dublin Core and XML. Stud Health Technol Inform. 2003;95:727-31.
[4] Douyere M, Soualmia LF, Neveol A, Rogozan A, Dahamna B, Leroy JP, Thirion B, Darmoni SJ. Enhancing the MeSH thesaurus to retrieve French online health resources in a quality-controlled gateway. Health Info Libr J. 2004 Dec;21(4):253-61.
[5] Pouliquen B, Le Duff F, Delamarre D, Cuggia M, Mougin F, Le Beux P. Medical pedagogical resources management. Stud Health Technol Inform. 2003;95:486-91.
[6] Mary V. Indexation de documents bio médicaux et systèmes terminologiques [PhD thesis]. Rennes, France: Laboratoire d'Informatique Médicale, Université de Rennes 1; 2004.
[7] Pouliquen B, Le Duff F, Delamarre D, Cuggia M, Mougin F, Le Beux P. Managing educational resource in medicine: system design and integration. Int J Med Inform. 2005 Mar;74(2-4):201-7.
[8] Tran TD, Garcelon N, Burgun A, Le Beux P. Experiments in cross-language medical information retrieval using a mixing translation module. Medinfo. 2004;2004:946-52.
[9] Shyu FM, Liang YF, Hsu WT, Luh JJ, Chen HS. A problem-based e-Learning prototype system for clinical medical education. Medinfo. 2004;11:983-7.
[10] Chen D, Orthner HF, Sell SM. Personalized online information search and visualization. BMC Med Inform Decis Mak. 2005;5:6.
[11] Crowell J, Zeng Q, Ngo L, Lacroix EM. A frequency-based technique to improve the spelling suggestion rank in medical queries. J Am Med Inform Assoc. 2004 May-Jun;11:179-85.
[12] Greenspan R. Search engine usage ranks high. URL: http://cyberatlas.internet.com/markets/advertising/article/0,,5941_1500821,00.html [accessed 2003 Jun 16].
[13] Richardson CR, Resnick PJ, Hansen DL, Derry HA, Rideout VJ. Does pornography-blocking software block access to health information on the Internet? JAMA. 2002 Dec 11;288(22):2887-94.
[14] Baud RH, Ruch P, Gaudinat A, Fabry P, Lovis C, Geissbuhler A. Coping with the variability of medical terms. Medinfo. 2004;2004:322-6.
[15] Wascat C, Beuscart-Zephir MC, Anceaux F, Alao O, Darmoni S. Evaluation of the Difficulties Faced by Users of a Search Engine of a Medical Web Site: The Example of Doc'CISMeF. Medinfo. 2004;2004(CD):1903.
[16] Eysenbach G, Kohler C. How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. BMJ. 2002 Mar 9;324(7337):573-7.
[17] Page RD. A Taxonomic Search Engine: federating taxonomic databases using web services. BMC Bioinformatics. 2005;6:48.
7. Address for correspondence
Nicolas Garcelon
Laboratoire Enseignement et Recherche Traitement Information Médicale
CHU Pontchaillou
2, rue Henri Le Guilloux
35033 RENNES
FRANCE
[email protected]