Xper2: training and example of
management system for description
and free access identification key
Hélène Fradin
Elise Kuntzelmann / [email protected]
Régine Vignes / [email protected]
EDIT National Museum of Natural History of Paris
Laboratoire Informatique et Systématique
UMR 5143 Paleobiodiversity (CNRS, MNHN, Paris 6)
Université Paris 6 – Pierre et Marie Curie FRANCE
Location (building):
Bâtiment de Géologie (MNHN)
43, rue Buffon
75005 Paris
Tel 01 40 79 80 61
Postal address:
MNHN
CP48
57 rue Cuvier
75231 Paris Cedex 05 - France
http://lis.snv.jussieu.fr
Abstract:
Xper2 is a platform dedicated to taxonomic descriptions and computer-aided identification. It includes
an editor for standardized taxonomic descriptions and several functions to identify specimens,
construct diagnoses, compare taxa, compute morphological dissimilarities, etc.
To read more: http://lis.snv.jussieu.fr/newlis/?q=en/about
This session presents Xper2 and good practices for using it.
I. Taxonomic identification
Taxonomic descriptions can serve several applications (Fig. 1). Here we focus on Computer-Assisted
Identification (C.A.I.), a discipline on which taxonomists will rely more and more in the future.
1. What is an identification key?
An identification key (also called a determination key or dichotomous key) is a practical tool used to
identify the taxon to which a given specimen belongs. In traditional keys (or paper keys), the user answers
questions about the specimen by selecting one of the predefined answers: it is an imposed step-by-step
approach. Each answer leads to a new question, and so on until the name of the sought taxon is obtained.
It is a step-by-step elimination of the taxa whose descriptions do not match the answers given by the user.
Such a discriminant route is a kind of graph: nodes represent the questions; branches represent the
different answers to a given question and together make up the « decision road » when the specimen is
determined successfully; and leaves, the terminal nodes of the graph, represent the taxa covered by the
key (Fig. 2).
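The graph structure of a traditional key can be sketched in a few lines of Python. This is only an illustration: the `Question` class, the `identify` function and the three-taxon key are hypothetical and are not taken from Xper2 or from any published key.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Question:
    """Internal node of the key: a question with one branch per predefined answer."""
    text: str
    branches: Dict[str, Union["Question", str]]  # answer -> next question, or taxon name (leaf)


# Hypothetical three-taxon key, invented for illustration only.
KEY = Question(
    "Type of leaves?",
    {
        "entire": "Taxon A",
        "compound": Question(
            "Colour of petals?",
            {"white": "Taxon B", "yellow": "Taxon C"},
        ),
    },
)


def identify(node, answers):
    """Follow the imposed path; return the taxon name, or None if the path is blocked."""
    while isinstance(node, Question):
        answer = answers.get(node.text)
        if answer not in node.branches:
            return None  # question unanswered or answer not foreseen: the key is blocked
        node = node.branches[answer]
    return node
```

Calling `identify(KEY, {"Type of leaves?": "compound", "Colour of petals?": "yellow"})` walks two branches and reaches a leaf, whereas a user who cannot answer the first question gets `None`, whatever else is known about the specimen.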
Computing has made it possible to implement algorithms that build such traditional keys, and also to
develop new identification methods, which we call Computer-Assisted Identification, or free access keys.
Computer-Assisted Identification systems run either locally, by installing the identification software
on the computer, or online through the internet.
2. Taxonomic identification on Internet
The internet gives access to a large number of taxonomic identification keys, but a high percentage of
these applications are classical keys adapted for the web (more or less dynamic HTML documents) that do
not use software for managing taxonomic descriptions (e.g. Fishbase [1]). These online resources are
very valuable for a wide audience, but they do not provide tools that could be included as components of
an internet taxonomy platform on which taxonomists could work collaboratively.
Other web sites offer an online atlas of a taxonomic group (e.g. Online atlas of Russian beetles [2]) but
do not offer tools that other taxonomists can use to create new applications.
Free access keys are also available for various taxonomic groups. See the DELTA website [3] for a list of
applications and the TDWG website on the SDD format [4] to learn more about the knowledge representation
standard. The table (Fig. 3) summarizes some of these numerous comparable systems. See also the EDIT
report entitled « List of identified and to be tested descriptive tools » [5] and the BdTracker
website [6], a collection of links to software and tools useful to taxonomists. The latter website is
developed and populated with the help of WP5.
3. Computer-Assisted Identification (C.A.I.)
As said previously, Computer-Assisted Identification offers a free access key: the user chooses the order
in which to answer the questions. The discriminant path is not preconceived but is constructed dynamically
from the user's choices. The advantage of such a system is that it adapts to various identification
contexts: in a traditional key, if the user is in doubt, or if the descriptor is not visible on the
specimen, progress through the graph is blocked and the determination is compromised (this can happen
especially with plants, where the shape of the reproductive and vegetative organs changes over the
seasons).
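A minimal sketch of the free access principle follows, assuming a knowledge base stored as plain Python dictionaries. The `TAXA` data and the `remaining_taxa` function are hypothetical; in particular, keeping a taxon that has no description for a descriptor is an assumption of this sketch, not a documented Xper2 rule.

```python
# Hypothetical knowledge base: taxon -> descriptor -> set of admissible states.
TAXA = {
    "Taxon A": {"Type of leaves": {"entire"}, "Colour of petals": {"white"}},
    "Taxon B": {"Type of leaves": {"compound"}, "Colour of petals": {"white"}},
    "Taxon C": {"Type of leaves": {"compound"}, "Colour of petals": {"yellow", "white"}},
}


def remaining_taxa(taxa, observations):
    """Keep the taxa compatible with every observation made so far.

    `observations` maps a descriptor to the set of states seen on the specimen.
    Descriptors the user skipped are simply absent and eliminate nothing.
    """
    kept = []
    for name, description in taxa.items():
        compatible = all(
            descriptor not in description        # assumption: undescribed taxon is kept
            or description[descriptor] & states  # at least one observed state matches
            for descriptor, states in observations.items()
        )
        if compatible:
            kept.append(name)
    return kept
```

Because the observations are an unordered mapping, the user can start with petal colour, skip leaf type entirely, or give several states for a descriptor when in doubt; none of these cases blocks the identification.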
C.A.I. can also help the user answer first the questions that will lead quickly to the sought taxon.
However, C.A.I. does not rely on the same degree of strategy as a traditional key. Traditional keys
follow a tactic based on the studied group and defined by the expert who designed the key. With
free access keys, the strategy is chosen by the user or only advised by the system, so the user can
select the safest and/or easiest descriptors first, according to his or her level of taxonomic skill.
Finally, to conclude on the advantages and disadvantages of these systems, it is important to
underline that free access keys do not necessarily guarantee a quick identification of a specimen; the
user can spend time going back over previous choices, whereas traditional keys guide the user step
by step. Moreover, this kind of interactive key can sometimes be safer for inexperienced users.
II. Xper2 - a free access taxonomic software
1. What is Xper2?
Xper2 (Ung et al. 2008, in prep.) is based on a previous system, Xper (Lebbe et al. 1988; CIPA group
1993; Lebbe & Vignes 1998) [7]. It is a taxonomic management system for the storage, editing and online
distribution of descriptions. It allows interactive identification of specimens and the creation of keys.
Although Xper2 is a powerful program for the professional taxonomist, it does not require any special
computer skills. It is also user-friendly for the novice naturalist who just wants to identify a specimen
with a ready-made application.
Xper2 is written in Java and runs on current operating systems (Windows, Mac OS X or Linux). Its
interface can be displayed in three languages: English, French and Spanish.
2. Structure of an Xper2 Knowledge Base (KB)
A Knowledge Base is structured into four main objects:
⁃ the described entities (e.g. taxa, specimens, phyto-associations);
⁃ the descriptors, i.e. the properties used to describe the taxa (e.g. « Type of leaves »,
« Colour of petals », etc.);
⁃ the descriptor states, or domain values, of each descriptor (e.g. « Entire leaves »,
« Compound leaves », etc.);
⁃ optionally, a list of groups that structure and enrich the content (e.g. « Flower », « Petals »,
« Fruit », etc.).
Each type of object may be documented with text and illustrated with images. This knowledge
representation is rich and flexible enough to represent complex descriptions and taxon polymorphism.
Other properties associated with the knowledge base itself include authors, external links, and
comments on the taxonomic or geographic limits and the context. It can also include a legal
information section on the copyright of the application, i.e. the knowledge base, and of the
illustrations and pictures used.
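The four kinds of objects can be pictured with a small in-memory model. The sketch below is hypothetical: the class and field names are invented for illustration and say nothing about how Xper2 or its .xpd files actually store the data. Images are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Descriptor:
    """A property used to describe entities, with its admissible states."""
    name: str
    states: List[str]
    groups: Set[str] = field(default_factory=set)  # optional groups, e.g. "Flower"
    documentation: str = ""                        # free text attached to the descriptor


@dataclass
class KnowledgeBase:
    """Hypothetical model of a knowledge base; not the .xpd file format."""
    title: str
    authors: List[str] = field(default_factory=list)
    descriptors: Dict[str, Descriptor] = field(default_factory=dict)
    # entity name -> descriptor name -> states ticked (several states = polymorphism)
    entities: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def add_descriptor(self, descriptor: Descriptor) -> None:
        self.descriptors[descriptor.name] = descriptor

    def describe(self, entity: str, descriptor: str, states: Set[str]) -> None:
        """Record the states of one descriptor for one entity, rejecting unknown states."""
        allowed = set(self.descriptors[descriptor].states)
        unknown = states - allowed
        if unknown:
            raise ValueError(f"unknown states for {descriptor!r}: {sorted(unknown)}")
        self.entities.setdefault(entity, {})[descriptor] = set(states)


kb = KnowledgeBase(title="Demo knowledge base")
kb.add_descriptor(Descriptor("Type of leaves", ["entire", "compound"], groups={"Leaf"}))
kb.describe("Taxon A", "Type of leaves", {"entire"})
kb.describe("Taxon B", "Type of leaves", {"entire", "compound"})  # polymorphic taxon
```

Storing a set of states per descriptor is what lets a single entity carry several states at once, which is how polymorphism is expressed in this sketch.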
III. Xper2 functionalities
1. Construction of a Knowledge Base with Xper2
The construction of a knowledge base with Xper2 can be divided into 5 main steps:
Step 1: Download and install Xper2
LIS website: lis.snv.jussieu.fr/apps/xper2 |-> Tab "Téléchargement" (Download)
A Java Runtime Environment is required to use Xper2.
Step 2: Edit a knowledge base. It is possible either to open an existing base or to create a new base.
Open an existing base: => File => New base: select the file in .xpd format then => Open
Create a new base: => File => New base: Name of the base
Nb: the name of the base (title) is different from the file name (e.g. Pinus knowledge base and Pinus.xpd)
Step 3: Descriptor editing:
Each descriptor can be linked to one or more states. Descriptors and descriptor states can be
documented with text (plain text or HTML) and images.
Dependency between descriptors: if some descriptors depend on others, creating a hierarchy of
descriptors, this information can be expressed as rules by defining « exception states ». For example,
the « Number of leaflets » depends on the « Type of leaves » (parent descriptor) and cannot be described
if the leaves are « entire » (exception character state).
Numerical descriptors: it is important to find the most relevant intervals in terms of taxa discrimination.
Groups: creating groups makes it possible to organize and structure the list of descriptors (often based
on anatomy, but also on criteria such as with/without microscope, field, ...). One descriptor can belong
to several groups.
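The dependency rule of Step 3 can be sketched as a lookup from a child descriptor to its parent and the parent's exception states. The `DEPENDENCIES` table and `is_applicable` function are hypothetical; treating a polymorphic or undescribed parent as "still applicable" is an assumption of this sketch.

```python
# Rule table: child descriptor -> (parent descriptor, exception states of the parent).
DEPENDENCIES = {"Number of leaflets": ("Type of leaves", {"entire"})}


def is_applicable(descriptor, description, dependencies):
    """Return False when the parent of `descriptor` only shows exception states.

    `description` maps a descriptor to the set of states recorded for one taxon.
    """
    rule = dependencies.get(descriptor)
    if rule is None:
        return True  # top-level descriptor: always applicable
    parent, exception_states = rule
    if not is_applicable(parent, description, dependencies):
        return False  # an inapplicable parent makes the whole sub-hierarchy inapplicable
    parent_states = description.get(parent, set())
    if not parent_states:
        return True  # assumption: parent not described yet, so nothing is ruled out
    return not parent_states <= exception_states
```

With this rule, a taxon whose leaves are only « entire » cannot receive a « Number of leaflets », while a taxon with both entire and compound leaves still can.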
Step 4: Taxa editing:
To describe a taxon, tick the corresponding states (one or several in case of polymorphism), or
« unknown description »; if nothing is ticked, it means that no state matches. A comment can be added to
each description unit to store additional information, bibliography, etc., to maintain the traceability
of the data.
Nb: it is possible to describe several taxa at the same time when their description is the same for one
descriptor, and it is easy to search for « unknown descriptions » in order to complete an existing
knowledge base.
Selecting a list of taxa also makes it possible to compare their descriptions.
Step 5: Checking and testing the knowledge base:
The menu « Check the base » checks the consistency of the descriptions and computes the
discrimination level of the taxa.
Printable forms (lists, complete taxon forms, matrix) and import/export functions can be very useful to
check or publish the knowledge base (CSV format for spreadsheets, HTML format).
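One plausible reading of a "discrimination level" check is to list the pairs of taxa that no descriptor can tell apart. The sketch below follows that reading only; the source does not describe how « Check the base » is implemented, and the data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical knowledge base: taxon -> descriptor -> set of admissible states.
TAXA = {
    "Taxon A": {"Type of leaves": {"entire"}, "Colour of petals": {"white"}},
    "Taxon B": {"Type of leaves": {"compound"}, "Colour of petals": {"white"}},
    "Taxon C": {"Type of leaves": {"compound"}, "Colour of petals": {"yellow", "white"}},
}


def separates(desc_a, desc_b, descriptor):
    """True when both taxa are described for `descriptor` and share no state."""
    a, b = desc_a.get(descriptor), desc_b.get(descriptor)
    return bool(a) and bool(b) and a.isdisjoint(b)


def undiscriminated_pairs(taxa):
    """Return the pairs of taxa that no descriptor separates with certainty."""
    pairs = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(taxa.items(), 2):
        descriptors = set(desc_a) | set(desc_b)
        if not any(separates(desc_a, desc_b, d) for d in descriptors):
            pairs.append((name_a, name_b))
    return pairs
```

In the toy data, Taxon B and Taxon C share the state « white » for petal colour and have the same leaf type, so the pair is reported: a white-flowered specimen could not be assigned to one of them with certainty.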
2. Identification process with Xper2
It is a free access key: the user chooses the descriptors, and their order, to describe the unknown
specimen (Fig. 4 and Fig. 5). The system can also offer advice by computing and sorting the
descriptors according to their discriminant power at each step of the identification process.
Illustrations and texts are available to guide the user and to prevent misunderstanding. Uncertainty is
handled by selecting several possible character states.
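To make the idea of sorting descriptors by discriminant power concrete, here is a minimal sketch that scores a descriptor by the fraction of taxon pairs it separates. This simple measure is an assumption chosen for illustration; the source does not specify which measure Xper2 computes, and all names below are hypothetical.

```python
from itertools import combinations

# Hypothetical remaining taxa: taxon -> descriptor -> set of admissible states.
TAXA = {
    "Taxon A": {"Type of leaves": {"entire"}, "Colour of petals": {"white"}},
    "Taxon B": {"Type of leaves": {"compound"}, "Colour of petals": {"white"}},
    "Taxon C": {"Type of leaves": {"compound"}, "Colour of petals": {"yellow", "white"}},
}


def discriminant_power(taxa, descriptor):
    """Fraction of taxon pairs whose state sets for `descriptor` are disjoint."""
    pairs = list(combinations(taxa.values(), 2))
    if not pairs:
        return 0.0
    separated = sum(
        1
        for a, b in pairs
        if a.get(descriptor) and b.get(descriptor)
        and a[descriptor].isdisjoint(b[descriptor])
    )
    return separated / len(pairs)


def rank_descriptors(taxa):
    """Sort the descriptors from most to least discriminant for the given taxa."""
    descriptors = sorted({d for description in taxa.values() for d in description})
    return sorted(descriptors, key=lambda d: discriminant_power(taxa, d), reverse=True)
```

Re-running `rank_descriptors` on the taxa that remain after each answer gives advice that changes as the identification progresses, which matches the "at each step" behaviour described above.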
The lists of remaining and eliminated taxa are updated at each step of the key. To check the result, a
complete form describing each taxon is available if required. In these forms the differences between the
specimen description and the description of the eliminated taxon are highlighted in red.
The same identification system is available online or locally. The online version offers additional
tools (see the following section) to rewrite descriptions in natural language, compute diagnoses, focus
on the most similar taxa, etc., and thus to obtain a more sophisticated form describing each taxon (see
an example on Pinus [8] or on Phlebotomine sandflies [9]).
3. Additional use of the knowledge base
Additional tools are available to analyse the knowledge base, in particular:
– to construct printable keys (for all or a subset of the knowledge base)
– to construct taxonomic diagnoses automatically
– to compute taxonomic dissimilarities
– to rewrite the descriptions automatically as readable text
These tools are not yet included in Xper2, and using them requires exporting the knowledge base to the
old Xper format. The possibilities of these tools will be demonstrated during the course. A tutorial (in
French only) is available on the LIS website [10].
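As an illustration of what a taxonomic dissimilarity can look like, the sketch below averages a Jaccard distance between state sets over the descriptors two taxa share. The choice of Jaccard distance is an assumption made for this example; the source does not say which index the Xper tools compute, and the data and function names are hypothetical.

```python
# Hypothetical knowledge base: taxon -> descriptor -> set of admissible states.
TAXA = {
    "Taxon A": {"Type of leaves": {"entire"}, "Colour of petals": {"white"}},
    "Taxon B": {"Type of leaves": {"compound"}, "Colour of petals": {"white"}},
    "Taxon C": {"Type of leaves": {"compound"}, "Colour of petals": {"yellow", "white"}},
}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def dissimilarity(desc_a, desc_b):
    """Mean Jaccard distance over the descriptors described for both taxa.

    Returns None when the two taxa share no described descriptor.
    """
    shared = [d for d in desc_a if d in desc_b]
    if not shared:
        return None
    return sum(jaccard_distance(desc_a[d], desc_b[d]) for d in shared) / len(shared)
```

On the toy data, Taxon B is closer to Taxon C than to Taxon A, because B and C differ only by one optional petal colour while A and B differ completely in leaf type.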
IV. References:
1. http://fishbase.org/identification/classlist.cfm
2. http://www.zin.ru/Animalia/coleoptera/eng/index.htm
3. http://delta-intkey.com
4. http://wiki.tdwg.org/twiki/bin/view/SDD/WebHome
5. http://wp5.e-taxonomy.eu/blog/2007/05/16/new-deliverable-d-547
6. http://www.bdtracker.net
7. http://lis.snv.jussieu.fr/apps/xper2/
8. http://lis.snv.jussieu.fr/cgi-bin/viewxper.cgi?base=pins
9. http://lis.snv.jussieu.fr/cgi-bin/viewxper.cgi?base=cipa_en
10. http://lis.snv.jussieu.fr/apps/xper/
Ung, V., Dubus, G., Zaragüeta-Bagils, R., Vignes-Lebbe, R. (2008, in preparation) Xper2: a powerful tool
for managing taxonomic descriptions in knowledge databases.
Vignes-Lebbe, R., (2004) Biodiversity information management. Advanced Geographic Information
Systems, from Encyclopedia of Life Support Systems (EOLSS), Developed under the Auspices of the
UNESCO, Eolss Publishers, Oxford ,UK, [http://www.eolss.net].
Dettai, A., Bailly, N., Vignes-Lebbe, R., Lecointre, G. (2004) Metacanthomorpha: Essay on a
Phylogeny-Oriented Database for Morphology — The Acanthomorph (Teleostei) Example. Syst. Biol. 53(5), 14–26
Gerard, D., Vignes-Lebbe, R., Dubois, A. (2006) Ziusudra, de la nomenclature à l’informatique : l’exemple
des Amphibiens. Alytes, 24(1-4) 117-132.
Cao, N., Zaragüeta Bagils, R., Vignes-Lebbe, R. (2007) Hierarchical representation of the hypotheses of
homology. Geodiversitas 29(1): 5-15.
J. LEBBE, S. NILSSON, J. PRAGLOWSKI, R. VIGNES et M. HIDEUX, 1988. The morphology of airborne
pollen grains and spores from northern Europe in relation to allergenic function : a microcomputer-aided
identification. Grana, 26 : 223-229.
CIPA group (Bermudez H., Dedet J.P., Falcao A.L., Feliciangeli D., Ferreira Rangel E., Ferro C., Galati
E.A.B., Gomez E.L., Herrero M.V., Hervas D., Lebbe J., Morales A., Ogusuku E., Perez E., Sherlock I.,
Torrez M., Vignes R. et Wolff M.), 1993. A programme for computer-aided identification of the
phlebotomine sandflies of the americas (CIPA), presentation and check-list of american species.
Memorias do instituto Oswaldo Cruz, 88 : 221-230.
J. LEBBE & R. VIGNES, 1998. Modelling taxonomic description for identification. In: Information
Technology, Plant Pathology and Biodiversity (P. Bridges, P. Jeffries, D.R. Morse & P.R. Scott eds.): 37-46.
Examples of knowledge bases built with Xper2:
http://lully.snv.jussieu.fr/xperbotanica/
http://lis.snv.jussieu.fr/apps/xper2/identification/
http://lis.snv.jussieu.fr/apps/xper/data/varanID/
V. Figures and Tables:
Figure 1: Taxonomic descriptions use
Figure 2: Course of a key
Delta IntKey: Free, Delta format, runs locally
Navikey: Free, Delta format, runs online
Diversity Description: Free, database format, runs locally
Frida (Dryades): Free, runs online
IKBS: Free, specific format
Xper2: Free, specific format, runs locally and online
Linnaeus: Commercial, specific format, runs locally
Lucid: Commercial, XML format, runs locally and online
ActKey: Free, MySQL database, runs online
SLIKS: Free, Delta format, runs online
Figure 3: Many comparable systems...
Figure 4: Edition mode of Xper2
Figure 5: Identification mode of Xper2