Modeling Just the Important and Relevant Concepts in Medicine for Medical Language Understanding: A Survey of the Issues

Anne-Marie Rassinoux 1, Randolph A. Miller 1, Robert H. Baud 2, Jean-Raoul Scherrer 2
1 Division of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
2 Medical Informatics Division, University Hospital of Geneva, Switzerland

Over the past two decades, two challenging, ongoing domains of medical informatics research have been the construction of models for medical concept representation and the inter-related task of understanding the deep meaning of medical free texts. These domains can take advantage of each other by exploiting the rich semantic content embedded in both concept models and medical texts. This review highlights how these two inter-related domains have evolved by focusing on a number of significant works in this area. The discussion examines one particular aspect: how to employ medical modeling for the purpose of medical language understanding. The understanding process analyzes and extracts the content of medical free texts, and stores the information in a deep semantic representation, useful for subsequent, elaborate semantic-driven information retrieval. It is now recognized in the medical informatics community that such understanding processes can be augmented through use of a domain-specific knowledge base that describes what can actually occur in a given domain. For this, a well-balanced representation schema should be developed, somewhere between a partial but accurate and a complete but complex semantic representation. These observations are illustrated using examples from two major independent efforts undertaken by the authors: the elaboration and subsequent adjustment of the RECIT multilingual analyzer to a solid model of medical concepts, and the recasting of a frame-based interlingua system originally developed to map equivalent concepts between controlled clinical vocabularies.

Introduction

Building models for medical concept representation and understanding the deep meaning of free medical texts represent ongoing challenges for research in the medical informatics community [1]. It is no accident that the related areas of Natural Language Processing (NLP) and Knowledge Representation (KR) have been "hot topics" during the previous and current international working conferences of the International Medical Informatics Association Working Group 6 (IMIA WG6) [2, 3]. The two disciplines can now realize the potential of a combined approach. This convergence is all the more relevant for the medical domain. First, a large amount of useful clinical information is still embedded in natural language free texts. Second, several different medical nomenclatures and thesauri have already emerged, fostered by the need to standardize the parameters of clinical practice and to organize the literature. Several experiments have been reported showing that advanced NLP tools can help in refining such large medical vocabularies [4, 5, 6, 7, 8, 9]. But even if these methods greatly assist the expert modeler, an important gap still remains before a model of the domain can be built from these huge corpora of linguistic knowledge. Existing models for medical concept representation present a significant amount of relevant information upon which natural language processors can be grounded semantically [10, 11, 12].
The authors recently published a review of potential existing medical knowledge sources that are candidates for exploitation within NLP tools [13]. The existence of a list of multiple useful sources reminds us that no standard medical terminology or common representation for medical information has emerged across clinical institutions. Nevertheless, several important joint efforts have been undertaken to find solutions to well-known drawbacks, such as the great degree of variability, redundancy and inconsistency emerging from all these sources [14]. Exemplary efforts include the Unified Medical Language System (UMLS) of the National Library of Medicine, as well as the initiative of the Canon Group. The UMLS is a long-term, vast project which aims at collecting and integrating electronic biomedical information from a variety of other controlled vocabularies [15, 16]. At a more formal level, the Canon Group's effort aims at building a merged representational model for clinical radiology, representing a consensus among Canon Group members, for use in exchanging data and applications [17].

This paper addresses the benefits of linking the semantic components of a medical text analyzer to a solid model of medical concepts. In order to expound upon the main advantages of such concept models for use by NLP tools, a brief review of existing medical knowledge sources, their evolution and interaction, as well as a view of the different modeling levels, are given. During this process, it is important to clarify the basic units which are manipulated at each stage and which support the bridge from free texts to a deep semantic representation of their embedded meaning. Significant current efforts in this area are also considered and compared, in order to gain a greater insight toward modeling just the important and relevant concepts in medicine for the purpose of medical language understanding.

From Texts to Models: Organizing the Medical Knowledge

This paper considers how to use models for medical concept representation to perform medical language processing. Nowadays, access to some kind of "semantic model" or "domain knowledge base", specified through a formal framework that describes what makes sense in a given domain, is a requisite for succeeding with natural language processing. In particular, Wehrli et al. [18] mention that "The connection between language and the 'real world', which is what a real semantic analysis should perform, is likely to remain out of reach as long as we do not know more precisely how to give a computational representation of the 'real world'." From a modeler's viewpoint, Rector argues [19] that "The automatic processing and analysis of medical texts... is dependent upon concept systems to perform the analysis and represent the structured information... Work on the processing of medical texts is showing that the analysis of a text depends primarily on a most important element: a model of medical concepts." But what is the current status in organizing the medical knowledge? Different sources of medical information currently exist and are more and more available in machine-readable form. However, the way such medical information is expressed can vary greatly, from unstructured formats to well-structured systems of representation. To be used effectively (by a computer as well as by human beings), information must be rapidly accessible, and therefore organized in an efficient way.

Medical Texts: A Natural and Rich Source of Medical Knowledge

Large corpora, such as medical reports written by physicians in their daily practice, or textbooks containing a huge corpus of descriptive medical narratives, represent a preeminent source of medical information expressed through natural language. Even if a textbook eliminates the specific medical jargon and local "oddities" (e.g., nonstandard abbreviations) encountered in medical reports, the large expressiveness of language (authorizing ambiguity and vagueness) significantly hampers its use by computerized medical applications, which require error-free and clinically pertinent input.

Controlled Medical Vocabularies: A Precise and Technical Source of Medical Knowledge

There is now a large range of available controlled medical terminologies or vocabularies [20, 21, 22, 23, 24, 25, 26]. All differ from one another based on the specific domains and scopes for which they have been built [14]. For example, MeSH [22] has evolved at the National Library of Medicine (NLM) as a controlled keyword vocabulary for indexing biomedical literature. Likewise, the QMR vocabulary [26] (which is a superset of the original INTERNIST-I vocabulary) was created to describe possible (reported) patient findings in diseases in general internal medicine. This vocabulary was derived from extensive manual literature review and serves the purpose of providing input for the QMR diagnostic program [27].

Unfortunately, a tradeoff must be made by the developers of controlled medical vocabularies (CMVs) that influences the ultimate utility of each CMV for NLP. The two counterbalancing features of a CMV are its breadth of scope (ability to easily incorporate a large number of entries from diverse topics in biomedicine) and its depth of representation (ability to represent concepts in a computationally meaningful manner). The force that drives the tradeoff is the amount of work that is humanly possible on a given project. Deeper representations are significantly more time consuming to build, probably by at least one order of magnitude. As a result, most of the large CMVs, even though specified through precise and technical expressions called terms (generally used to name codes), are expressed through language surface forms, which present well-known drawbacks. In particular, redundancy and inconsistency occur because of the lack of formal definitions suitable for automatic manipulation. There is no algorithmic mechanism in these systems to precisely define what a term is and to express how it differs from others.

Moreover, as these terms are in most cases noun phrases, their underlying interpretation is both language-dependent and context-dependent. First, these terms are understandable only according to the syntax of the specific language used to express them. Second, the interpretation of a term is often defined through its position in a hierarchy (the implicit link between a parent term and its child). Furthermore, the same term may appear in different positions in the hierarchy. Exploiting such large terminologies by computer, where terms only take their full meaning by considering their position inside the whole system, requires developers to make explicit, at least, all the various components of information currently embedded in such partitions.

Concept Models: A Structured and Tractable Source of Medical Knowledge

Due to the analytic complexities engendered by the above surface-form controlled medical vocabularies, computerized vocabulary mapping has become an important and active area of research in medical informatics. Methods employed have ranged from lexical matching to conceptual matching.

Lexical Matching Methods

Lexical matching methods identify similarities among CMVs by retrieving common "strings" (words or phrases, extended through the notion of synonyms and related terms) in both the source and the target vocabularies. This approach has been favored, based on the availability of the UMLS knowledge sources [28], which integrate nearly 30 authoritative medical terminologies [29, 30, 31, 32]. Managing the large variability inherent in natural language, and still present in biomedical terminologies, has also required a fourth component for the UMLS knowledge sources: the SPECIALIST lexicon [33].

Conceptual Matching Methods

Conceptual matching methods attempt to map different terms, not at the level of words and phrases, but at the level of "meaning", implying a deeper representation of the intricate concepts of medicine. Such a representation can be built directly over existing CMVs [34, 35]. But at this level, it is more crucial to develop formal systems which make a clear distinction between the concepts to be represented and the linguistic terms (or other mechanisms) used to refer to those concepts. That is why formal systems for representing the concepts underlying medical terminology have emerged [36, 37, 38, 39, 40, 41, 42, 43]. Among these, we describe three important efforts, as they will be discussed in more detail in this paper.
• In 1981-83 [38], and again in 1988-91, Miller, Masarie et al. developed and refined a frame-based interlingua [39], initially to capture the complexity of clinical findings in INTERNIST-I, and later to facilitate translation between CMVs. This system, supported in part by the UMLS project, was based on the assumption that clinically relevant statements about patients contain at least one identifiable central concept, and that this set of central concepts can serve as a focus for mapping between medical vocabularies. Generic finding frames were used to specify how a central concept may be expressed and also qualified by general modifiers. Each generic frame has a superstructure (including its concept name, its status descriptor, its potential site descriptor, its potential subcategory descriptor, its potential qualifiers, as well as "the methods of elicitation" descriptor), and finer details which are encapsulated in the form of "item lists" (also called qualifiers). Thus, the generic frames provide a template for describing the medical meaning of specific medical terms in a standardized manner. Over 750 generic frames were created for describing the medical meaning of a test set of 1,500 medical terms for general internal medicine identified from the Quick Medical Reference (QMR) lexicon [27], as well as portions of the HELP PTXT lexicon [24] and parts of the DXplain lexicon [25].
• Cimino et al. have constructed the Medical Entities Dictionary (MED) [40], a hybrid of terminology and knowledge, using a semantic network based on the Unified Medical Language System (UMLS) [28, 44], with a directed acyclic graph which defines a multiple inheritance hierarchy.
Each concept node in the MED graph can be viewed as a frame, and has links to nodes other than parent-child nodes through the semantic relationships. Every concept in the MED is a generic concept and as such should be regarded as a type or class. In order to support NLP and the ability to map one controlled vocabulary to another, a compositional modeling of lexically complex concepts is also maintained in the MED. This system, which is beginning to reach critical mass (it currently contains over 34,000 conceptual components), forms the heart of the medical representation in the Clinical Information System (CIS) of the Columbia-Presbyterian Medical Center (CPMC). Clinical applications retrieve patient data using MED concepts.
• Since 1992, Rector et al. have been developing, through the GALEN consortium, a fully compositional and generative system of medical concepts [41, 42]. These concepts are represented in the Common Reference, or CORE, model, and are expressed in a language-independent manner through the GALEN Representation and Integration Language (GRAIL) Kernel. One important feature of this model is that it attempts to restrict entries to valid combinations of concepts that form medically sensible expressions. In this regard, it is similar to the generic frame system [39], which directly specifies how findings can be constructed from concept definitions, and limits modifiers to those that both make sense and are not self-contradictory. An advantage of the notation used by the GRAIL Kernel is that it can be converted directly to that of conceptual graphs [45], and the set of criteria associated with a concept can be seen as a frame-like structure. The current version, which contains nearly 6,000 concepts, must nevertheless be extended in order to be useful in general clinical applications.

A Combined Solution

Examining the way these dual approaches (lexical and conceptual) manage medical information suggests their combination as an ideal solution. This conclusion is reinforced by the observation that medical language presents typical characteristics of a sublanguage (specific types of texts or reports have a writing style often designated as medical jargon), which implies that under certain circumstances the meaning of natural language sentences is closely connected to the contextual medical domain [46, 47]. Thus, NLP tools coupled with concept models should succeed in managing medical information from texts. The aforementioned models have already served as examples:
• The frame-based interlingua system developed by Miller, Masarie et al. [38, 39] was successfully used to map among the "pseudo" natural language embedded in QMR [27], HELP [24], and DXplain [25] terms.
• The MedLEE text processor (an acronym for Medical Language Extraction and Encoding system) developed by Friedman et al. [11, 48] maps chest x-ray and mammography reports into unique medical concepts defined in the MED [40, 49]. This analyzer provides three phases of processing, all of which are driven by different knowledge sources. The first phase, parsing, identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. Finally, the third phase, encoding, maps the terms into the controlled-vocabulary concepts that are maintained in the MED knowledge base, thus ensuring that the data can be used by subsequent automated applications.
The formalism of conceptual graphs is used to represent these concepts [50].
• Likewise, the RECIT multilingual analyzer (a French acronym for "Représentation du Contenu Informationnel des Textes médicaux") developed by Rassinoux et al. [51, 52] at the Geneva University Hospital improved its semantic validation and inference capabilities by grounding its semantic components directly in the GALEN model [41, 12]. This multilingual analyzer (operational for French, English and, to a minor extent, German) uses a two-phase process to deal with the specific features of medical language. The first phase, called "Proximity Processing", is a deterministic phase which combines the application of nonconventional syntactic procedures with the checking of semantic compatibilities in order to group neighboring words together. From this set of relevant fragments, the second phase builds a sound representation of the sentence meaning in the formalism of conceptual graphs [50]. Conceptual schemata are used to select the heading concept and to establish the links between it and the other concepts in the analyzed sentence.

From our experience, it appears that the success of this combined approach relates mostly to the specification of a well-balanced semantic representation. In order to be useful for NLP, a model must at least provide the relevant concepts and their relationships that naturally occur in medical texts. This "coverage" criterion can be extended toward existing nomenclatures and vocabularies in order to bridge with what is currently available.

Medical Modeling: Limiting Large Domains of Expertise

Due to the large volume and diversity of medical knowledge, the conceptual modeling task is naturally difficult and labor-intensive, but worthwhile to ensure an efficient use of such potential sources (expressed either through natural language or controlled vocabularies) by computerized applications. At the same time, the computational tractability of a knowledge base (i.e. being suitable for manipulation by a computer program) requires restrictions on the kinds of knowledge to be represented, as well as on the degree of detail. Both must have a manageable form and size. Finally, a requisite for sharing models and systems across institutions is a formal structure - sufficiently expressive to represent complex knowledge, yet simple enough for semi-automated manipulation, and adequate for the domain needs. These considerations point to the need to develop domain models, answering a well-defined goal, in order to yield concrete outcomes.

The problem again relates to the tradeoff between depth of representation and breadth of coverage. Because concept models are far more labor-intensive to build, it would be a serious mistake for a project with finite resources to attempt to build a general concept model for all of biomedicine, since it would be difficult to achieve closure. As a result, many existing concept models were constructed for a specific purpose that further defined and constrained their scopes. For example, the frame-based interlingua system [38, 39] was limited to representing concepts from the QMR lexicon of 4,500 possible patient findings in internal medicine. The MED [40] was limited to actual laboratory procedures (and their related findings) at one institution, Columbia-Presbyterian Medical Center.
Any open-ended attempt to represent, for example, the scope of concepts found in MeSH (basic biomedical science and clinical practice as described in the literature) in a "deep" model would probably never come to closure - due to lack of focus, lack of qualified experts to represent concepts in a consistent manner, and lack of financial resources.

Modeling for NLP Needs

The representational needs of NLP are different from, but overlap with, the needs of medical vocabulary system builders. Let us start with a concrete example: what general kinds of knowledge are useful to start speaking about the finding pleural effusion? For the modeling task, this question can be reformulated as: what kinds of knowledge, describing our understanding of the concept PleuralEffusion, must be represented in a model? First, a definition of the literal meaning of a pleural effusion can be given as "an effusion located in the pleural cavity". In the same way, an effusion can be defined as "an accumulation of pathologic fluid". These descriptions highlight the compositionality of this concept, which is built from a number of different, more elementary concepts. Therefore, this concept might be classified as being a kind of effusion. Medical domain and NLP modelers might also include descriptive features, such as the size (small, medium or large), the gross appearance (purulent, serous, ...) or the laterality (left, right) of an effusion, as well as the list of methods useful to elicit such a finding (such as a chest plain film or chest percussion). Finally, medical domain models may incorporate specific inferential knowledge, such as the relationships between findings and disease states (e.g., a blunted costophrenic angle is a radiological sign which provides non-specific evidence for a small pleural effusion; clinical manifestations of large pleural effusions include atelectasis, egophony, and dullness to percussion; and possible diseases causing a pleural effusion are congestive heart failure, nephrotic syndrome, pulmonary infarction, and rheumatoid arthritis). When such information is part of the model, specific reasoning processes can be triggered (e.g., for purposes of recommending diagnoses or therapies).

[Figure 1 - The Sphere of Medical Modeling: the pragmatic level (patient documents such as free texts and notes, and textbooks), the meta level (concept or terminology models), the intermediate level of instantiated information produced by NLP, and the domain knowledge level (inferential information), connected by links A through E.]

A global model of medical vocabulary usage and processing is presented in Figure 1. Two world views are combined: at one end, the "pragmatic level", formed by the set of all meaningful medical utterances ever made by qualified domain experts and practitioners, and at the other end, the "meta level" - a general method for representing the concepts of a domain in a "deep model". The pragmatic level represents the textual information available in the medical domain. The meta level represents structures, rules, and constraints that permit modelers to construct potential utterances in the domain. The meta level involves capturing the concepts of medicine and formalizing them into a concept or terminology model (link A in Figure 1). This level, by organizing the concepts and specifying the relevant relationships that occur between them, provides the semantics of what is medically reasonable to say in a particular domain (such as: a pleural effusion can be seen on a radiograph or diagnosed by percussion).
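To make the preceding PleuralEffusion discussion concrete, the following is a minimal, hypothetical frame-like sketch in Python of the kinds of knowledge just enumerated; the slot names and fillers are illustrative only and are not drawn from the actual QMR, MED, or GALEN content.

    # Hypothetical frame-like entry for the finding "pleural effusion";
    # slot names and fillers are illustrative only.
    PLEURAL_EFFUSION = {
        "name": "PleuralEffusion",
        # compositional (definitional) knowledge: "an effusion located in the pleural cavity"
        "is_a": "Effusion",
        "definition": [("Effusion", "hasLocation", "PleuralCavity")],
        # descriptive features (qualifiers) and their allowable values
        "qualifiers": {
            "Size": ["small", "medium", "large"],
            "GrossAppearance": ["purulent", "serous"],
            "Laterality": ["left", "right"],
        },
        # methods of elicitation
        "elicited_by": ["ChestPlainFilm", "ChestPercussion"],
        # inferential (domain) knowledge linking the finding to disease states
        "possible_causes": ["CongestiveHeartFailure", "NephroticSyndrome",
                            "PulmonaryInfarction", "RheumatoidArthritis"],
    }

In terms of Figure 1, only the first few slots belong to a meta-level concept model proper; the last slot already encroaches on the domain knowledge level discussed below.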
Ideally, the pragmatic level and the meta level would overlap completely. However, it is not possible to construct meta-level representational systems that are sufficiently constrained to permit only those sensible utterances that are observed at the pragmatic level. Similarly, the rules of the English language allow construction of grammatically correct, yet nonsensical utterances - a feature exploited by Lewis Carroll in the poem "Jabberwocky" [53]. Most meta-level models, even though they substantially limit the scope of allowable utterances and force them to be related to meaningful concepts, are sufficiently underconstrained to allow construction of jabberwocky-like expressions.

In between the two extremes of pragmatic and meta is the intermediate level of "instantiated concepts". Instantiated concepts are the specific models derived from the meta level to capture the actual content expressed at the pragmatic level. In fact, it is the goal of NLP tools to perform exactly this conversion (link B in Figure 1). The meta level can be directly exploited by NLP tools for checking the semantics of any natural language utterance against this set of sensible combinations of concepts (link C in Figure 1). For example, retrieving the finding pleural effusion from free texts requires the ability to cope with the variety of ways this finding can be expressed in natural language. In English, this finding can be formulated through the following expressions: pleural effusion, presence of serous fluid in the pleural cavity, hydrothorax, chylothorax, pleurisy with effusion... (the corresponding terms in French being épanchement pleural, présence de liquide séreux dans la cavité pleurale...).

An important set of relationships has not previously been well exploited in medical informatics. Figure 1 illustrates these relationships: the ability to use the intermediate (instantiated) level as the basic representational schema for a medical decision support system's knowledge base (link D), and the ability to add pragmatic constraints to the meta level through use of a medical knowledge base constructed in the instantiated format (link E). An additional level is thus highlighted - "the domain knowledge level" - which requires more sophisticated mechanisms, such as the introduction of probabilistic relationships or temporal progressions to reason about complex cases. This information, embedding domain expertise and reasoning capabilities, corresponds basically to what makes sense clinically. Due to its inherent complexity, the domain knowledge level is not yet fully formalized and thus is not directly available for use by other applications such as NLP systems.

This is also a reason why, in a traditional linguistic analysis comprising the following steps - morphologic, lexical, syntactic, semantic, pragmatic and reasoning [54] - the last reasoning step is almost non-existent in current NLP tools, as it needs a description which goes beyond a "pure" conceptual description as defined in the meta level. In particular, dealing with the implicitness and vagueness of natural language requires an "assertional" or "inferential" component that suggests the unstated concepts in order to achieve a full semantic representation of the implicit missing content of texts. The availability of the QMR knowledge base and lexicon to the authors [27] should allow future development of "intelligent" NLP tools that take advantage of this level. However, efforts to build domain-specific medical knowledge base systems, such as MYCIN [55], ILIAD [56], QMR [27] or DXplain [25], are usually independent of NLP projects.

Conceptual Modeling: Is There a Best Approach?

The rest of this paper concentrates on models for medical concept representation and their potential use by NLP tools. The frame-based interlingua [39], the MED [40], and the GALEN model [41] are examples of concept models which provide meta-level definitions, partially constrained by how to construct meaningful utterances in a domain. The way such models are built can greatly influence their size and degree of granularity, which are all the more important to ensure large coverage and a detailed representation in the analysis of medical texts.

There is no general methodology for developing a concept model. The quality of the modeling process is, on the one hand, greatly dependent on the depth of understanding of the concepts underlying the domain, and is thus best performed by a specialist in the covered domain. On the other hand, building a concept model is a formal process which requires skills in abstracting and structuring, and is thus best performed by experienced analysts. This last point ensures the tractability of such a framework, as the underlying logical process must be sufficiently practical and pragmatic for specific operations (such as general inferences) to be performed by computer applications. The specification of a computationally tractable formalism also implies some effort and precision in identifying the primitive and indivisible medical concepts and in distinguishing the relationships used to link these concepts. Several formalisms suitable for a conceptual representation have emerged from Artificial Intelligence, such as semantic networks [57], frame systems [58], conceptual graphs [50], or logical-statement languages [59]. These formalisms have proved to be reasonably equivalent.

Determining the extent of conceptual modeling for NLP can be considered as an iterative process, involving a combination of both "top-down" and "bottom-up" approaches [60]. The top-down approach is taken to set up the general characteristics of the domain and to organize them into a hierarchically-structured view. This facilitates navigation through the representation scheme, as well as assisting with the consistency and maintenance of the parts of the model. The highest levels of the GALEN schema [61] are a typical example of a top-down approach, which presents interesting features for its exploitation by NLP tools. The initial division at the highest level of the structure occurs between the "DomainCategory" and the "DomainAttribute", which correspond respectively to the notions of concepts and relationships. Another important division takes place under "DomainCategory", between the following categories: structures, substances and processes, and the category of modifiers. These subdivisions have had an important impact on the adjustment of the RECIT analyzer to the GALEN model [12].

The fact that medical language is, in essence, highly compositional and logically structured entails that its semantics is intuitively based on the definition of relationships between each pair of sensible concepts. A clear separation between the notions of concepts and relationships is nevertheless important, even if they can be considered to some extent as interchangeable [62].
As the concept form is usually more convenient for mapping from natural language, it is advantageous to express most of the semantic features through concepts and to keep the relationships as simple as possible. Basically, "content words" (i.e. words conveying a strong semantics) should map to concepts, and "function words", such as prepositions, conjunctions or modal auxiliaries (like "can" or "must" in English), should map to relationships. Moreover, adding some specific descriptors (such as a type or time modifier) is always possible for a concept, whereas relationships cannot be further specified. Another important distinction occurs between the meaningful concepts of medicine (e.g., "Abdominal Pain") and the modifiers (or qualifiers) which can characterize these concepts. Modifiers generally have a broader usage, not restricted to the medical domain (e.g., "Severity", "Chronicity"). The tradeoff between medical concepts and modifiers allows information to be weighed and fits the requirements of NLP systems quite well. Indeed, the aim of NLP systems is first to extract relevant medical concepts from texts and then to complete the representation with a specification of the different properties attached to those concepts. Such an a priori conceptual organization, as defined in the GALEN model, seems reasonable to describe most of the generic medical concepts. Nevertheless, its limited scope lacks the precision to be directly used in clinical applications. Current efforts of the GALEN consortium are to extensively cover the subdomain of surgical procedures at the level of their representation in a classification like ICD-9-CM (vol. 3).

Examining empirical data in order to refine and delimit the scope of modeling is known as the bottom-up approach. This approach aims at exploiting information currently handled within the considered domain. This approach was chosen to build the frame-based interlingua system [38, 39], and consisted of collecting all the relevant axes and terms that clinicians might use to describe any and all medical concepts embedded in QMR terms [27]. This led to the specification of two interconnected (semi-independent) levels of information: a rich and accurate set of generic medical concepts described through generic frames, and a large set of well-defined qualifiers that are applicable across a number of generic concepts. The qualifier description incorporates both a limited set of values and a measure of the distance between these different possible values (e.g. the qualifier "Severity" is defined as a progressive deviation with three allowable values: "mild", "moderate", and "severe"). This bottom-up approach, by reviewing what is currently expressed in the QMR terms, ensures the robustness of the representation, as the generic frames directly fit instances of concepts defined through QMR terms. Nevertheless, used alone, this approach cannot handle all possible linguistic variations well, as shown below. Even if the frame structure used to represent the generic medical concepts is convenient for expressing a first level of description (through slots and fillers), allowing the initial structure to be inverted according to some criteria, this representation is nevertheless not easy to maintain. Indeed, the frames per se do not specify any hierarchically-structured view of the primitive concepts which are useful to describe more complex medical information.
Determining whether a generic frame or a qualifier exists is difficult without knowledge of the entire contents of the frame system. No restriction is mandated on the choice of a given name to express a concept, so that redundancy and inconsistency might appear. For example, names such as motion, exercise, movement or moderate activity are used in the initial system to designate concepts which influence generic frames such as "Abdominal Pain", "Back Pain", or "Myalgia". This example highlights the extensive use of subtle words in a specified language which may easily be confused with the concepts that they name. In this example, all the specified terms can be considered, at first glance, as synonyms or lexical variants of the unique concept representing the notion of Movement. However, there are subtleties that make their meanings slightly different clinically, and these should be taken into account by adding appropriate modifiers to this unique concept. Moreover, this thorough and enumerative building method has led to a series of frame descriptions which make extensive use of linguistic names to specify generic medical concepts, such as "Abdominal Aortic Aneurysm By Imaging", as well as to designate the suitable qualifiers, such as "Type Of Aortic Aneurysm". The literal interpretation of the meaning of these medical concepts lies largely in their linguistic names rather than in the model itself. A recasting of this original frame system has been undertaken [63] to overcome these problems.

The two above examples emphasize the advantages of each approach - top-down and bottom-up - as well as their weaknesses when applied alone. The top-down approach to conceptual modeling is useful to set up the general architecture of the model, but applied alone, it lacks the practical feedback needed to become usable by clinical systems. Conversely, the bottom-up approach to conceptual modeling ensures the construction of an accurate model based on empirical data directly extracted from a given domain, but applied alone, it is often confounded by too many linguistic details. This suggests a combined approach. Both of the aforementioned examples have evolved toward a combined approach. In particular, the high-level structure of the GALEN Common Reference Model has evolved iteratively through experiences conducted with some coding systems [64] as well as practical experiments in building clinical systems within the GALEN project [65]. In the same way, the frame-based interlingua has been recast [63] to integrate a more uniform and formal description of the generic frames. First, the nature of the conceptual information has been refined, through the distinction between existential and quantitative frames, as well as the specification of features essential to the description of a generic frame (in particular, distinctions between general and local qualifiers). Second, formal definitions of complex generic frames have been introduced using the formalism of conceptual graphs [50]. Finally, attempts to build a multi-level hierarchy upon this frame system are under way.

The two above examples show the clear benefit of having both top-down and bottom-up approaches during the modeling process. They also emphasize the distinction between the concept model level and the precise language used to express these conceptual components.
This is an important feature toward the specification of a Medical Linguistic Knowledge Base [66], and thus entails clarifying the basic units sustaining concept models as well as those underlying the medical language.

From Words to Concepts: Identifying the Basic Units

The parallel development of several inter-connected disciplines linked to the field of medical informatics has shown a widespread use of notions such as "word", "noun phrase", "sentence", "term", "code", or "concept". But depending on the context of usage, the same notion covers different topics, and this interchangeable use unfortunately blurs the precise and technical meaning assigned to these notions in a specific discipline. In order to guarantee accurate and correct communication between experts and scientists in medical informatics, it is important to preserve the distinctions, as already emphasized by Tuttle et al. [67].

Preserving the Distinctions while Highlighting the Connections

Roughly speaking, after accepting Tuttle's more formal definitions, we can say that concepts are basic building blocks for modeling; that sentences, noun phrases and words are basic building blocks for natural language processing; and that codes and terms are basic building blocks for classification. But the reality is more complex, since there is no one-to-one correspondence between these different notions.

The Notion of "Concept"

A concept is a unit of thought which is the "pure fruit" of an abstraction effort by human beings trying to represent the units of meaning in a particular domain. In medicine, these mental constructs strongly reflect the domain expert's ability to extract medical entities from clinical reality. In order to refer to a particular concept, a unique identifier must be defined. Different formats, such as icons, numbers, or words, can be used to specify this unique identifier. In practice, most systems [40, 41, 63] choose a unique number as an internal identifier. But even if a unique number is a good means to refer to a specific concept in a non-ambiguous way, it is not very expressive. That is why these systems usually associate with the internal identifier an external "knowledge name" - also called "concept name" - which is used (instead of the numeric internal identifier) in the system interface to designate the concept. These concept names, expressed through words, are by convenience the most common way to display to the user a readable and straightforwardly understandable system of concepts, and as such, it is preferable to define, in a specific language, unique concept names with a well-accepted usage. Moreover, in order to distinguish concept names from other words, one should normalize these names, for example by adding a specific prefix before each concept name, or by starting all words belonging to a group of words naming a concept with a capital letter and then concatenating them into one block, or by naming the concepts in another language than the native language of the system's builder (in so far as this language is also known by all potential users). All these methods were combined in the RECIT analyzer [68] (i.e. "cl_" is used as a prefix to mark all the concepts: for example, cl_AbdominalPain or cl_Heart). But we can also think about concepts that have no "common" name but for which at least a clear definition exists. Numerous examples can be found in the GALEN model [42], where such concepts constitute an important part of the compositional process.
For example, the two abstract categories defined by "Culturing which actsOn 'BloodSample'" and "Culturing which actsOn 'UrineSample'" are mentioned in the GALEN model interface through the above definitions, and both are classified under the concept Culturing. These unnamed composite concepts are useful in the GALEN model for defining other composite concepts. For example, the composite concept BloodCulture is defined as: LaboratoryTest which hasSubprocess (Culturing which actsOn 'BloodSample').

The above examples also introduce the notion of "compositionality", which allows some compound concepts to be decomposed into more primitive concepts. Three new notions must then be considered:
• primitive (or basic) concepts: they are atomic semantic entities, in the sense that they do not need to be further subdivided in order to reflect their literal meaning (such as cl_Abdomen or cl_Excision).
• composite concepts: they can be expressed through interconnected primitive concepts (for example, the literal meaning of the concept cl_Cardiomegaly can be expressed as an enlargement of the heart).
• relationships: they are set up to symbolize the links amongst concepts, and thus act as "semantic glue" toward the specification of a complete semantic representation. Meaningful names (which can be prefixed by the characters "rel_") are also chosen to correctly exhibit the underlying semantics of each relationship (e.g. rel_hasLocation, rel_IsAssociatedWith).
Therefore, the definition of a composite concept can always be formalized through primitive concepts and the relationships between them.

Finally, an important feature which must be clearly distinguished from the concept name is the concept annotation. Indeed, in order to be able to extract concepts from textual sources, it is important to annotate concepts with all the relevant words (simple words or multi-word phrases) in a specific language that are used to refer to each concept. For example, the concept cl_Liver can be annotated in English by the words "liver", "hepatic", or even the prefix "hepato-". Notice that if the name given to a concept is unambiguous, with a well-accepted medical usage (which is the basic rule!), a first annotation can be more or less automatically performed by looking at the concept name.

The Notion of "Word"

Words, which are strings of characters without blanks, constitute the basic units of natural language. They are useful to express particular objects or acts (such as scissors or ablation) as well as entities with a more abstract meaning (such as pain or severity). They are also used to compose more complex syntactic structures such as noun phrases (i.e. groups of words centered around a word belonging to the grammatical category of a noun) or full sentences (which are constructed around a verb). These structures take their full meaning by considering the narrative context in which they occur (i.e. the surrounding information that clarifies how these structures must be interpreted). As highlighted before, words are also useful to designate concepts, and they constitute the basic elements for the concept annotation process. Synonyms are features defined at the word level [69]. By and large, the set of expressions that annotate a concept can be considered as a set of synonyms (reflecting an identity of meaning). But for specific applications, this notion can be extended to equivalent expressions (i.e. expressions conveying the same main idea while not being totally equal) [16].
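As a recapitulation of the preceding notions, the following Python fragment sketches, under assumed data structures (not the internals of RECIT or GALEN), how primitive concepts, a composite concept defined through relationships, and language-specific annotations might be recorded; the French annotations are merely illustrative of the multilingual case.

    # Illustrative data structures only; concept and relationship names follow
    # the naming conventions described above.
    PRIMITIVE_CONCEPTS = {"cl_Heart", "cl_Liver", "cl_Enlargement", "cl_Abdomen", "cl_Excision"}
    RELATIONSHIPS = {"rel_hasLocation", "rel_actsOn", "rel_IsAssociatedWith"}

    # A composite concept is formalized through primitive concepts and the
    # relationships linking them (cl_Cardiomegaly = an enlargement of the heart).
    COMPOSITE_CONCEPTS = {
        "cl_Cardiomegaly": [("cl_Enlargement", "rel_hasLocation", "cl_Heart")],
    }

    # Annotations attach the words or prefixes of a given language to a concept.
    ANNOTATIONS = {
        ("cl_Liver", "en"): ["liver", "hepatic", "hepato-"],
        ("cl_Liver", "fr"): ["foie", "hépatique"],
    }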
The Notions of "Code" and "Term"

In between concepts and words are codes and terms. A code is a unit of partition (i.e. a unit useful to define classifications or categorizations), generally expressed through a numeric expression, which has no intrinsic meaning in itself but rather encodes important contextual information through two complementary mechanisms. First, the position of a code in the partition (i.e. the classification in which it is found) gives important information about its meaning. Second, codes are also specified by terms (or any definition) used to label an element of the partition. Terms (also called vocabulary entries), whether naming particular codes or not, are units of technical language intended for reuse [67]. They represent typical phrases selected by domain experts and are usually specified through a formal and scientific (technical) language, the structure of which is mainly expressed through noun phrases. Moreover, a precise definition of terms through concepts is a useful way of clarifying their meaning [34, 35].

Common Confusions Among Concepts, Terms, and Words

The above descriptions emphasize the strong connection existing between these units, whose casual usage often results in confusion. A common confusion occurs between the notions of "concept" and "term". In this matter, we can use the example of the UMLS source, where some misunderstandings can occur. The concepts defined in the UMLS are described only through a unique alphanumeric identifier (CUI), with which no concept name is associated. Each concept, identified through its unique CUI, is then annotated by a set of terms or noun phrases, which could be interpreted as potential names for that concept, whereas they are not (unless a "preferred form" is explicitly specified). Moreover, all concepts in the Metathesaurus (and, by extension, the terms that annotate these concepts) are connected through the "isa" link to one or more semantic types (or type categories), such as Virus, Acquired Abnormality, or Disease or Syndrome, which are nothing more than other concepts specified through a generic name.

Another common confusion takes place between a concept (or, more precisely, its name) and the words (or groups of words) used to annotate this concept in the different considered languages. In fact, any annotation of a concept can potentially be chosen as the name used to refer to it (see the examples below). That is why choosing a specific naming convention can easily remove confusion with the corresponding words from different languages. Moreover, there is clearly a dissymmetry between the notions of words and concepts, as highlighted by the examples shown in Table 1. The first two examples show that the annotation of a primitive concept (i.e. a concept that is indivisible) can be done either through a single word or through a group of words (sometimes annotation by prefixes can also be considered, such as "cardio-" for the concept cl_Heart). The next two examples highlight that a composite concept can possibly be annotated by a single word if one exists in the specific language of the annotation. This also implies that, although the "word" constitutes the basic unit of any textual object (such as discharge summaries, reports, notes...), it does not always correspond to the notion of primitive (or basic) concept. Indeed, the latter can be inferior to the word (i.e. more than one primitive concept is embedded in a single word, for example, nephrectomy), or superior to the word
(i.e. one primitive concept needs more than one word to be expressed, for example, abdominal aorta). It is also important to notice that the specification of the annotation kidney excision is not mandatory for the composite concept cl_Nephrectomy in so far as a definition is given (for example, by using the formalism of conceptual graphs as defined by Sowa [50]), as follows: [Nephrectomy] is: [Excision] -> (actsOn) -> [Kidney]. Therefore, every primitive concept needs to be annotated (with the different words and synonyms which serve to express this concept, in the several treated languages), whereas composite concepts (for which a definition is given) may be annotated based on the availability of words in the respective language. Maintaining equivalent definitions can then greatly reduce the need for introducing a large range of lexical variants in the system.

Table 1 - Mapping between words and concepts
a word -> a primitive concept: kidney -> cl_Kidney; renal -> cl_Kidney
a group of words -> a primitive concept: abdominal aorta -> cl_AbdominalAorta; renal failure -> cl_RenalFailure
a word -> a composite concept: nephrectomy -> cl_Nephrectomy
a group of words -> a composite concept: kidney excision -> cl_Nephrectomy
a word -> several concepts: left (adjective) -> cl_Left, cl_LeftSided

Finally, the last example deals with one important problem of natural language: its ambiguity. The word left is ambiguous at both the syntactic and semantic levels. Indeed, it can be an adjective, a noun or the past form of the verb "to leave". As an adjective, it can take two different meanings. It can represent either the left-right selector, as in "pain in the left arm" (i.e. being an annotation of the concept cl_Left, which is a child of cl_LeftRightState), or the laterality position, as in "pain in the left side of the stomach" (i.e. being an annotation of the concept cl_LeftSided, which is a child of cl_LateralityPositionState). However, the adjective left can also occur in specific medical expressions such as "left heart". This expression is an example of "medical jargon", characteristic of the medical sublanguage, and as such it must be treated as a single unit (also referred to as an "idiomatic expression") by NLP tools. Indeed, the meaning of the whole expression cannot be deduced from the combination of the meanings of the words composing this expression, and moreover, such groups of words always occur contiguously in textual records. In this way, "abdominal aorta" and "renal failure" can also be considered as idiomatic expressions. Finally, in some cases, taking into account the syntactic information is sufficient to resolve the ambiguity. For example, the English word patient, considered as a noun, annotates the concept cl_Patient, and considered as an adjective, annotates the concept cl_Patience.

Identifying the Corresponding Knowledge Levels

The preceding section has highlighted two main basic units: "concept" and "word". Each of these units fits into a particular domain of knowledge, which is respectively the conceptual level and the lexical level.

The Conceptual Level

The conceptual level embeds at least three kinds of information, which constitute the required basis for a robust concept model:
• the set of concepts and their names, relevant for the treated domain,
• a typology or ontology organizing these concepts relative to each other,
• a set of semantic rules specifying how these concepts can be combined together in a manner that makes sense and is relevant.
All the concepts deemed relevant for a particular domain must be described at this level. The specification of a structure through which the concepts can be organized allows users to maintain a consistent view of all the relevant clinical entities and their associated attributes [70]. Such a hierarchy should reflect an appropriate level of generality and granularity, which may vary greatly with the degree of precision needed by each application. Generalization and specialization, conveyed along the "isa" link between nodes of the hierarchy, constitute the basic principles on which inference mechanisms can be implemented. The complexity of these mechanisms also depends on the nature of the hierarchy, which can be simple or multiple (i.e. allowing more than one parent node per child). Finally, an important part of the semantics of the domain under consideration is specified through a set of semantic rules, which allow relationships to be set up between each pair of sensible concepts. These compatibility rules are also useful to define composite concepts.

The Lexical Level

The lexical level is necessary to recognize words and phrases, and thus plays an important role in natural language processing. Indeed, acquiring the list of words used in a given language (usually referred to as the lexicon or dictionary) is the first step of any attempt to understand free texts. The lexical level is then the place to support morpho-syntactic information about words (being either simple words or multi-word phrases such as idiomatic expressions) as well as the notions of synonyms, lexical variants, abbreviations and acronyms. Three kinds of information are usually asserted at this level for each lexical entry:
• the lexical unit: it corresponds to the specification of the basic form of a word. Such a basic form is generally expressed through the masculine, singular and infinitive forms, depending on the word category and its applicability in a specific language. Recognizing the basic form from any morphological variant is usually considered a task belonging to the NLP side. Such a task is language-dependent and, even though analogies exist between languages, it has to be redesigned for each new language.
• the syntactic argument: this argument allows each lexical unit to be "grammatically" categorized, and may be rather complex depending on the specific language and the purpose of the application. It generally describes the grammatical category (preposition, noun, adjective, verb...) with some morpho-syntactic features as needed (i.e. number, gender, mode variations), as well as some usage information (for example, considering the usual position of an adjective relative to the qualified noun in the French language).
• the semantic argument: this argument aims at describing the "meaning" of each lexical unit, this "meaning" being exactly conveyed by what we have called a concept. It can then be considered as a pointer toward one concept (or more than one in case of semantic ambiguity), which is precisely defined at the conceptual level. These lexical units can also be viewed as annotations of the semantic argument, useful to ensure a large coverage when searching for instances of concepts in medical texts written in a specific language. The semantic argument is also the key element for defining dictionaries in a multilingual environment [66, 69] (see the sketch below).
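The following hypothetical lexicon fragment (not the actual RECIT dictionaries) illustrates these three kinds of information, reusing the "left" and "patient" examples given earlier.

    # Hypothetical English lexicon entries; each entry carries a lexical unit,
    # a syntactic argument, and a semantic argument pointing to concepts.
    LEXICON_EN = [
        {"unit": "kidney",  "syntax": {"category": "noun"},      "semantics": ["cl_Kidney"]},
        {"unit": "renal",   "syntax": {"category": "adjective"}, "semantics": ["cl_Kidney"]},
        # semantic ambiguity: one lexical unit may point to several concepts
        {"unit": "left",    "syntax": {"category": "adjective"}, "semantics": ["cl_Left", "cl_LeftSided"]},
        # syntactic information alone resolves some ambiguities
        {"unit": "patient", "syntax": {"category": "noun"},      "semantics": ["cl_Patient"]},
        {"unit": "patient", "syntax": {"category": "adjective"}, "semantics": ["cl_Patience"]},
    ]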
As defined above, the content of the lexical and conceptual levels must be broadened in order to fit with NLP and KR purposes.

Extending these Levels for NLP and KR Purposes

In addition to the local syntactic properties associated with individual lexical entries, syntactic rules can be added at the lexical level to deal with more complex structures such as sentences. These syntactic rules are language-dependent, and they clarify the valid syntactic structures (i.e. well-formed combinations of grammatical categories) which are permitted to support the expressions formulated in the treated language. The lexical level, augmented with the syntactic rules, allows NLP tools to precisely manage the syntactic information embedded in words, phrases, and sentences, as found in textual documents.

The semantic rules defined at the conceptual level are useful to define binary relationships between two concepts, which can be roughly categorized as thematic and attribute relationships. Moreover, these rules are only locally validated. Describing the roles that a concept plays in a particular situation requires taking into account more precise information about the clinical context where this event can occur. Such contexts are a good place to specify more complex information, such as causal or temporal information, as well as default values and basic common knowledge, useful to build a semantic representation that clarifies the implicit information not stated in the texts, although well known by people reading these texts. This requires modelers to incorporate contextual information at the conceptual level. For this, frame-based representation systems [71, 72, 73, 39] are suitable, as they provide a uniform environment for describing a network of associations between concepts representing a stereotyped situation.

Terms (and their associated codes) are critical to the collection of accurate and aggregate health care data and to linking patient records to decision support tools. Therefore, terms should be tied to the two previously mentioned levels. First, such terms can be smoothly integrated at the conceptual level if a conceptual definition clarifying their meaning is provided [34, 35, 64]. Second, the technical vocabulary used to express these terms must also be incorporated at the lexical level. Once the basic units and corresponding levels of specification have been determined, their use by NLP tools can be considered.

From Sentences to a Conceptual Representation: Holding the Important and Relevant Information

Characteristics of Existing NLP Systems

Developing analyzers that yield a conceptual representation of medical narratives has long been a considerable research topic in medical informatics. Several analytic techniques have emerged [74, 75, 76, 77, 78, 79, 48, 49, 68, 51, 52]. The common approach taken by these systems deals with transforming sentences, from the language words in which they are expressed, into the chosen conceptual representation, which is then used as a standardized format for further information access. Different kinds of knowledge are involved in the analysis process, and they can be clustered into two main categories. The first category is concerned with the morpho-syntactic knowledge related to the sentence structure. This knowledge is precisely defined in what we have called the lexical level. The second category deals with the semantic (or conceptual) knowledge related to the sentence meaning. This knowledge corresponds to what we have embedded in the conceptual level, and is usually acquired as part of the domain modeling task.
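Before turning to how existing systems integrate these two kinds of knowledge, the sketch below shows, under assumed data structures, how a local syntactic grouping (for instance an adjective plus a noun) might be validated against the sanctioned concept-relationship pairs of a concept model; the sanctioned triples listed are illustrative and not taken from any of the cited models.

    # Illustrative sanctioned triples: which pairs of concepts may sensibly be
    # linked, and through which relationship.
    SANCTIONED = {
        ("cl_Pain", "rel_hasLocation", "cl_Abdomen"),
        ("cl_Effusion", "rel_hasLocation", "cl_PleuralCavity"),
    }

    def find_semantic_links(head_concept, modifier_concept):
        """Return the relationships that sensibly link two concepts, if any."""
        return [rel for (c1, rel, c2) in SANCTIONED
                if c1 == head_concept and c2 == modifier_concept]

    # e.g., for the phrase "abdominal pain" (adjective + noun):
    # find_semantic_links("cl_Pain", "cl_Abdomen") -> ["rel_hasLocation"]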
From Sentences to a Conceptual Representation: Holding the Important and Relevant Information

Characteristics of Existing NLP Systems

Developing analyzers that yield a conceptual representation of medical narratives has long been a major research topic in medical informatics, and several analytic techniques have emerged [74, 75, 76, 77, 78, 79, 48, 49, 68, 51, 52]. The common approach taken by these systems is to transform sentences, from the language words in which they are expressed, into the chosen conceptual representation, which is then used as a standardized format for further information access. Different kinds of knowledge are involved in the analysis process, and they can be clustered into two main categories. The first is concerned with the morpho-syntactic knowledge related to the sentence structure; this knowledge is precisely defined in what we have called the lexical level. The second deals with the semantic (or conceptual) knowledge related to the sentence meaning; this knowledge corresponds to what we have embedded in the conceptual level, and is usually elaborated as part of the domain modeling task. Between these two categories of knowledge lies the integration process [80], which deals with the problem of using syntactic and semantic information one after the other, or together.
MENELAS, a medical language understanding project [78], is an example of a system that follows the standard division of natural language processing into morpho-syntactic, semantic and pragmatic analyses. In the Linguistic String Project (LSP) system [74, 75], which is more syntax-oriented, the semantic restrictions are precoded at the level of the grammar rules and thus must be entirely anticipated during the design of the system. In other systems, the weight of the semantic argument is amplified because of the importance conferred on a semantic-driven approach to the analysis of medical texts. In the MedLEE analyzer [48, 49], the structure of the source language is specified in a context-free semantic grammar, which defines the well-formed semantic structures of the domain and integrates only a few syntactic features. The METEXA ("MEdical TEXt Analysis") system [77] and the RECIT system [68, 51] both use local syntactic rules to trigger the checking of combinations of sensible concepts. That is to say, as soon as two syntactic constituents have been detected (such as an adjective plus a noun, or a noun plus a noun complement), a valid semantic interpretation (specified by a pair of conceptual entities linked by a relationship) is sought in the domain model, if there is any. The syntactic information is further relaxed in the second stage of the RECIT analyzer [51], which is devoted to building the conceptual representation of the whole analyzed sentence. In this way, ill-formed phrases (i.e. phrases constructed without "syntactic glue") are also treated insofar as they are sensible. This relaxation fits the particularities of medical language, which is both technical (using specific medical jargon) and written in a concise, direct, almost telegraphic style.
Some general remarks also apply to the above systems. Each of them uses a conceptual structure to store the meaning of natural language inputs. The formalism of conceptual graphs (CGs), as defined by J. F. Sowa [50], is nowadays the most popular formalism chosen as the language-independent representation for storing the results of medical text analysis [77, 78, 48, 52]. Several attractive features promote CGs, such as their ability to balance the constraints of expressive power and notational efficacy. In particular, CGs allow distinctive features to be expressed, and they support various kinds of operations. Moreover, their straightforward readability makes them easily understood.
Although many groups have been working on medical language processing, very few useful and practical systems exist at the present time. Indeed, the strong medical constraints of being error-free and accurate slow down overall development. This is why a common feature of most existing systems is the choice of radiology as the focus domain for testing and evaluating their analysis strategies [48, 77, 79]. The special appeal of X-ray reports is mainly due to their well-defined physical and conceptual structure, encompassing a delimited domain of clinical medicine that yields useful clinical information for decision support and research. Finally, the fact that on-line radiology reports are readily accessible from central patient databases in most hospitals enhances their potential use for NLP.
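As an illustration of the kind of language-independent structure such analyzers produce, the following sketch stores a small conceptual graph as Prolog terms. The encoding (the cg/3 predicate, the node identifiers, the relation names) is invented for the example and is not the notation used by any of the systems above.

:- use_module(library(lists)).

% A hypothetical conceptual graph for the phrase "fracture of the left femur":
% a list of typed concept nodes and a list of relation edges between them.
cg(sentence_1,
   [ concept(c1, cl_Fracture),
     concept(c2, cl_Femur),
     concept(c3, cl_Left) ],
   [ relation(rel_hasLocation,   c1, c2),
     relation(rel_hasLaterality, c2, c3) ]).

% A semantic-driven query over the stored graph: which finding is located at
% which anatomical site, independently of the wording of the source sentence?
finding_at_site(GraphId, FindingType, SiteType) :-
    cg(GraphId, Concepts, Relations),
    member(relation(rel_hasLocation, F, S), Relations),
    member(concept(F, FindingType), Concepts),
    member(concept(S, SiteType), Concepts).

A query such as ?- finding_at_site(sentence_1, Finding, Site). returns cl_Fracture and cl_Femur whether the original sentence read "fractured left femur" or "fracture of the left femur", which is precisely the interest of a standardized conceptual representation for later retrieval.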
Relying on Concept Models For NLP Needs: What are the Requirements?

As highlighted before, the underlying properties of medical language have oriented research in medical language processing toward semantic-driven approaches, which make use of a large body of semantic or conceptual knowledge [77, 48, 51]. Anticipating the amount of useful conceptual knowledge during the design phase of a semantic-driven analyzer, even when a narrow domain and a specific task are considered, remains unrealistic, given the sheer amount and expressiveness of the information to be managed. Moreover, this task requires skills that range beyond linguistic informatics, since it also involves conceptual modeling. A solution is to obtain this conceptual information directly from existing conceptual knowledge bases. RECIT [12, 68, 51, 52] is a typical example of an analyzer that grounds its semantic components directly in a model for medical concept representation developed apart from the analysis process, namely the GALEN model [41, 42]. MedLEE [11, 48, 49] is another example of an analyzer that takes advantage of the existence of the MED [40] to produce structures that are compatible with, but not directly built from, the findings as modeled in the MED. Even though these two systems differ in the way they rely on a concept model, they highlight some requirements for the success of a combined approach. These requirements cover abilities on both the NLP system side and the concept model side, as emphasized in the following sections.

Separate Processor From Knowledge Components

A first requirement, essential for the implementation of an analyzer relying on a concept model, is that the core engine of the processor be clearly separated from the knowledge components. This separation is a crucial implementation feature designed to cope with fluctuations in a concept model, which can evolve at any moment (by editing, removing or adding pieces of knowledge). It also ensures the independence of the processor from specific clinical applications, as redefining the corresponding domain-specific knowledge sources should be sufficient to switch to another clinical application. In MedLEE, this separation is clearly emphasized [48, 81]. The architecture of the RECIT system has benefited from the ambition, from the start, to develop an analyzer in a multilingual environment [68] (first applied to French, then adjusted to English and, to a lesser extent, German). In order to minimize the development effort needed to accommodate the RECIT system to another language, a modular structure has been implemented. This allows new rules to be inserted without disturbing the general computational mechanism implemented to select and apply them at the right time during the analysis process. Moreover, the declarative style in which this analyzer is written (using Prolog as the logic programming language [82]) ensures a large expressiveness for setting up and managing rules, as well as for designing a knowledge representation. Finally, European languages, although different from a syntactic viewpoint (even if some analogies exist, such as between French and English [83, 68]), allow the same concepts to be expressed. That is why a semantic-driven approach has been chosen, in order to take advantage of a single conceptual knowledge base, independent of any language and thus accessible by any version of the multilingual RECIT analyzer.
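The separation between a stable engine and replaceable knowledge components can be sketched as follows. The fragment illustrates the principle only; the predicates and the division into facts are invented and do not reproduce the actual architecture of MedLEE or RECIT.

% --- knowledge components (replaceable without touching the engine) ---
% Language-dependent syntactic knowledge:
syntactic_pattern(english, adjective_plus_noun).
syntactic_pattern(english, noun_plus_noun_complement).
syntactic_pattern(french,  noun_plus_adjective).

% Language-independent semantic knowledge, which a concept model could supply:
sanction(cl_Fracture, rel_hasLocation, cl_Bone).

% --- core engine (stable) ---
% A candidate pair of concepts detected under a given syntactic pattern is
% accepted only if the pattern exists for the language being analyzed and the
% concept model sanctions the semantic link.
accept(Language, Pattern, Concept1, Relation, Concept2) :-
    syntactic_pattern(Language, Pattern),
    sanction(Concept1, Relation, Concept2).

Switching the analyzer to another language, or updating the concept model, then amounts to replacing the corresponding facts while leaving accept/5 unchanged.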
As a result of this design, a Medical Linguistic Knowledge Base (MLKB) has been set up as a repository for all the declarative knowledge used during the analysis of medical texts [52, 66]. Several dividing lines structure this knowledge base, the main one separating the semantic part (domain-dependent but language-independent) from the syntactic part (domain-independent but language-dependent). The semantic part can typically be supplied by a concept model. The separation between processor and knowledge (and, furthermore, between the different kinds of knowledge) is nevertheless not absolute. Indeed, the way specific information is formulated can be strongly dependent on the language style used as well as on the nature of the information to be communicated. This requires advance specification of the precise types of knowledge that will be manipulated during the analysis process, as well as of their functions. Finally, it is important to bridge the gap between the way information is expressed in medical texts and the way it is represented at the conceptual level. The amount of attention paid to these points determines the depth of integration, that is to say, the effort required to adjust NLP tools to existing concept models and vice versa.

Exploiting Relevant Information From Concept Models: Determining the Depth of Integration

Most existing medical knowledge sources have been developed with objectives other than their exploitation by NLP tools. Nevertheless, these sources embed categories of linguistic knowledge (as outlined at the lexical and conceptual levels) that are often applicable to NLP needs, as reviewed by the authors [13]. In particular, the MED and GALEN models embed interesting, though different, features with respect to strict NLP needs. The MED provides a large vocabulary that encompasses the needs of ancillary clinical systems at the Columbia-Presbyterian Medical Center. Its semantic rules are not directly available, but they could be extracted from the frame concept nodes, which embed semantic links and attributes and describe significant contextual information. The GALEN model has a more restricted vocabulary coverage but presents a large set of directly available semantic rules expressed through the so-called "sensible statements". Contextual information is only partially present in this system and has to be extracted from the set of criteria associated with relevant concepts.
The experiments conducted with MedLEE [48] and RECIT [12] illustrate two different ways of integrating, into the analysis process, relevant information derived from a concept model, respectively the MED and the GALEN model. However, the MedLEE processor does not use the MED as a direct source of conceptual knowledge, but rather as a "reference model" used to specify the structure of the analysis output, which must map to findings as modeled in the MED. For this, additional knowledge sources have been elaborated separately from the model (such as a formal semantic grammar and a lexicon, a mapping knowledge base, and a synonym knowledge base), which act as bridges between the language of the texts and the unique concepts of the controlled vocabulary defined in the MED [48].
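The remark above, that pairwise semantic rules could be derived from frame-style concept descriptions of the MED kind, might be sketched as follows. The frame contents and predicate names are invented for illustration and are not extracted from the MED itself.

:- use_module(library(lists)).

% A hypothetical frame-style concept node embedding semantic links and attributes.
frame_node(cl_ChestPain,
           [ link(rel_hasLocation, cl_Chest),
             link(rel_hasSeverity, cl_Severity) ]).

% Each embedded link yields a binary semantic rule directly usable by an analyzer.
derived_rule(Concept, Relation, Filler) :-
    frame_node(Concept, Links),
    member(link(Relation, Filler), Links).

In a model such as GALEN, by contrast, equivalent rules are already available as explicit sensible statements and can be imported as such.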
The RECIT system, on the other hand, uses the GALEN model as a direct semantic source, providing both the set of concepts that can be combined to form the analysis output (expressed in the formalism of conceptual graphs) and the sanctioning rules used to check the pertinence of any medical language expression against the concept model. This integration process is presented below, highlighting the relevant pieces of conceptual information provided by such a concept model that are of direct use to the analysis process.
The first version of the RECIT system relied on a knowledge base built by the authors, but the limited size of that domain knowledge base greatly reduced the capacity of the analyzer. This is why the idea emerged of importing as much conceptual knowledge as possible from the GALEN model. The transfer has been facilitated by several factors on both the analyzer and the model sides.
First, the typology of GALEN, through its high-level structure [61], corresponds quite well to the main partition initially implemented in RECIT, which emphasized the distinctions between actors, medical events, qualifiers, values, and modalities. These subdivisions are taken into account during the analysis process, especially for triggering the relevant heading concepts around which conceptual graphs can be built. In order to retain the analysis strategies, an alignment with the GALEN typology has been performed by specifying pointers at the highest possible levels.
Second, a strong similarity was observed between the semantic part of the compatibility rules implemented in the RECIT system (used during the proximity processing phase to link neighboring words together) and the GALEN sensible statements. Both aim at describing a relationship between a pair of sensible concepts, as shown in the following example. The last three semantic arguments of the NLP rule:
compatibility_rule(#Number, Syntax, cl_Fracture, cl_Bone, 'LOC').
are equivalent to the GALEN sensible statement:
Fracture which hasLocation Bone.
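A minimal sketch of how such imported sensible statements could serve as sanctioning rules is given below. The encoding (facts for the statements, a small "isa" hierarchy and a subsumption test) is ours and invented for illustration; it is not the internal representation used by GALEN or RECIT.

% A GALEN-style sensible statement encoded as a fact.
sensible_statement(cl_Fracture, rel_hasLocation, cl_Bone).

% A fragment of the concept hierarchy.
isa(cl_Femur, cl_Bone).
isa(cl_Tibia, cl_Bone).

% Subsumption along the "isa" links.
subsumes(Concept, Concept).
subsumes(Ancestor, Concept) :-
    isa(Concept, Parent),
    subsumes(Ancestor, Parent).

% A candidate link between two concepts detected in a text is sanctioned if some
% sensible statement covers it, directly or through the hierarchy.
sanctioned(Concept1, Relation, Concept2) :-
    sensible_statement(General1, Relation, General2),
    subsumes(General1, Concept1),
    subsumes(General2, Concept2).

% ?- sanctioned(cl_Fracture, rel_hasLocation, cl_Femur).   % succeeds

Under this reading, the statement asserted on cl_Bone also sanctions the phrase "fractured femur", since cl_Femur is a descendant of cl_Bone.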
Such rules are quite general (i.e. not all bones are candidates for fracture), but they are adequate for analyzing sentences that are sensible per se, where sanctioning is needed essentially in the presence of ambiguities. Moreover, taking into account other kinds of fractures (such as fractures occurring in cartilage) will require additional statements to be specified.
Third, the compositionality property of the GALEN model is largely exploited in the RECIT system to replace composite concepts by their definitions (using the expansion operation defined in the conceptual graph formalism [50]), thus ensuring better results in future querying, as information is decomposed into its primitive components.
Finally, the last piece of conceptual knowledge used by the RECIT analyzer concerns the specification of conceptual schemata. These schemata are used in the second analysis phase to link the heading concept of the sentence with all the other concepts (highlighted during the proximity processing phase), in order to produce the CG representation expressing the sentence meaning. As seen before, such information still needs to be extracted from the GALEN model. Conceptual modeling as implemented in a system like the frame-based interlingua [39, 63] seems more appropriate for handling this kind of knowledge, as it explicitly describes the properties of concepts relative to a specific context.
The above experiment has shown the different kinds of knowledge that a concept model like GALEN can provide for NLP needs. However, the main challenge is to stress the distinction between information as it is formulated in medical texts and as it is expressed in concept models. This entails mediation between the expressiveness, permissiveness and implicitness of natural language on the one hand, and the generality, granularity and conciseness of the concept model on the other. Such a gap between the "language of the texts" and the "language of concepts" can be bridged by considering what linguistic information must be attached to the conceptual level in order to manage the analysis of medical texts.

Bridging the Gap Between Reality and Abstraction

Such syntactic attachments have been defined at different strategic points in the RECIT system. First, it is important to translate the model typology into the context of the analyzed texts. This is performed through the typology annotation, which allows concepts to be annotated with the words and expressions available in the different languages, together with their syntactic properties. These annotations result in the creation of the dictionary needed by NLP tools.
Second, applying the sensible statements to natural language expressions implies clarifying the syntactic structures that support the expression of the concepts and relationships in a specific language. For example, the sensible statement linking the concepts cl_Fracture and cl_Bone through the relationship rel_hasLocation can be instantiated by different English expressions such as "fractured femur" and "fracture of the femur", which are supported respectively by the syntactic structures "adjective plus noun" and "noun plus noun complement". Relying directly on the sensible statements as described in the GALEN model has permitted such syntactic constraints, initially specified for each compatibility rule (the second argument of the clause compatibility_rule), to be defined at the level of the relationships without losing information. This syntactic information is specified for each relationship at the highest possible level and can always be refined by defining a more restrictive conceptual context. For example, the relationship rel_hasLocation can be annotated with the two syntactic structures above when used in the restrictive context holding between the concepts cl_PathologicalCondition and cl_BodyStructure. These syntactic constraints, described as a syntactic annotation of relationships, are also easier to maintain, as the number of relationships is far smaller than the number of sensible statements.
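Such relationship-level syntactic annotations might be encoded along the following lines. The predicate, the names of the syntactic structures and the context notation are invented for the example; only the concept and relationship names come from the text above.

% Hypothetical syntactic annotations attached to a relationship rather than to
% each individual sensible statement, optionally restricted to a conceptual
% context (source type, target type).
relation_syntax(rel_hasLocation, adjective_plus_noun,
                context(cl_PathologicalCondition, cl_BodyStructure)).
relation_syntax(rel_hasLocation, noun_plus_noun_complement,
                context(cl_PathologicalCondition, cl_BodyStructure)).

% "fractured femur" and "fracture of the femur" are then two surface
% realizations of the same sensible statement, Fracture which hasLocation Bone.
acceptable_structure(Relation, Structure, Context) :-
    relation_syntax(Relation, Structure, Context).

Maintaining these annotations per relationship rather than per sensible statement keeps the linguistic layer small while leaving the concept model itself untouched.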
Finally, another problem encountered was that linguistic relationships do not always fit the conceptual relationships as specified in the GALEN model. Indeed, in order to link the expression "severe chest pain" during the proximity processing phase, RECIT needs to check the presence of a sensible statement specifying the relationship between the concepts cl_ChestPain and cl_Severity. But the granularity of the GALEN model furnishes two sensible statements:
Pain which hasSeverity Severity
Severity which hasState SeverityState
where the concept cl_ChestPain is a child of the concept cl_Pain. A combination of these two statements is necessary to deal with the implicitness of natural language, where the qualifier Severity is inherently embedded in its values. Such an operation can be performed automatically by exploiting the transitivity between the relationships hasFeature (an ancestor of hasSeverity) and hasState, producing the following statement, which applies directly to natural language input:
Pain which hasFeatureState SeverityState.
The specification of additional linguistic information on top of a "pure" concept model has thus proved to be a key solution, allowing a natural language processor to smoothly integrate conceptual information during the analysis process. Accordingly, any concept model specified through a modular and declarative structure, and providing at least the relevant inter-connected concepts as naturally found in medical texts, should be considered a potential conceptual source for NLP tools.

Conclusions

The experience of the authors' group in managing models for medical concept representation, first by adjusting the RECIT analyzer to the GALEN model [12], then by recasting the generic interlingua frame system initially developed by Miller, Masarie et al. [63], has reinforced our belief that a solid model of medical concepts must be developed and used to feed the semantic components of a medical language processor. Integrating into the analysis process, from a concept model, all the basic conceptual components (from which the conceptual representation will be built) as well as the sanctioning mechanism (used to set up an accurate representation) ensures a consistent follow-up of the analyzer as the concept model evolves. The major constraint is that the success of the analysis process then depends greatly on the accuracy and efficiency of the model. This implies that developers should focus on a well-delimited domain, answering a well-specified goal, in order to yield concrete outcomes. Moreover, as natural language is in essence highly permissive and generative, allowing ambiguity and vagueness as well as neologisms, it is all the more important to rely on a concept model that has the quality of being more restrictive while still preserving compositionality. Finally, linking the semantic components of a medical language processor to a concept model allows the combination of a usually top-down approach, which defines the general structure of concepts in a given domain, with a bottom-up analysis of medical language texts. This results in a knowledge-oriented representation which adds new functionality by moving from data to concepts.

Acknowledgments

This work is supported by grant number 8220-046502 from the "Fonds National Suisse de la Recherche Scientifique". Work on the generic frame schema was originally supported through NLM Contract N01-LM-6-3522.

References

1. Evans DA, Cimino JJ, Hersh WR, Huff SM, Bell DS, for the Canon Group. Toward a Medical-concept Representation Language. J Am Med Informatics Assoc 1994, 1: 207-217.
2. Scherrer J-R, Côté RA, Mandil SH (eds). Computerized Natural Medical Language Processing for Knowledge Representation. Proceedings of the IFIP-IMIA WG6 International Working Conference, Geneva, Switzerland, 12-15 September, 1988. Amsterdam: Elsevier Science Publishers B.V. (North-Holland), 1989.
3. McCray AT, Scherrer J-R, Safran C, Chute CG (eds). Special Issue on Concepts, Knowledge, and Language in Health-Care Information Systems (IMIA). Methods of Information in Medicine 1995, 34.
4. Evans DA, Ginther-Webster K, Hart M, Lefferts R, Monarch I. Automatic indexing using selective NLP and first-order thesauri. In: RIAO'91. Barcelona: Autonoma University of Barcelona, 1991: 624-644.
5. Bell DS, Pattison-Gordon E, Greenes RA. Experiments in Concepts Modeling for Radiographic Image Reports. J Am Med Informatics Assoc 1994, 1: 249-262.
6. Spackman KA, Hersh WR. Recognizing Noun Phrases in Medical Discharge Summaries: An Evaluation of Two Natural Language Parsers. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 155-158.
7. Hersh WR, Campbell EH, Evans DA, Brownlow ND. Empirical, Automated Vocabulary Discovery Using Large Text Corpora and Advanced Natural Language Processing Tools. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 159-163.
8. Evans DA, Brownlow ND, Hersh WR, Campbell EM. Automating Concept Identification in the Electronic Medical Record: An Experiment in Extracting Dosage Information. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 388-392.
9. Hahn U, Schnattinger K, Romacker M. Automatic Knowledge Acquisition from Medical Texts. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 383-387.
10. Baud RH, Lovis C, Alpay L, Rassinoux A-M, Scherrer J-R, Nowlan A, Rector A. Modelling for Natural Language Understanding. In: Safran C (ed). Proceedings of SCAMC 93. New York: McGraw-Hill, Inc., 1993: 289-293.
11. Friedman C, Cimino JJ, Johnson SB. A Schema for Representing Medical Language Applied to Clinical Radiology. J Am Med Informatics Assoc 1994, 1: 233-248.
12. Rassinoux A-M, Wagner JC, Lovis C, et al. Analysis of Medical Texts Based on a Sound Medical Model. In: Gardner RM (ed). Proceedings of SCAMC 95. Philadelphia: Hanley & Belfus, Inc., 1995: 27-31.
13. Baud RH, Rassinoux A-M, Lovis C, Wagner J, Griesser V, Michel P-A, Scherrer J-R. Knowledge Sources for Natural Language Processing. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 70-74.
14. Ingenerf J. Taxonomic Vocabularies in Medicine: The Intention of Usage Determines Different Established Structures. In: Greenes RA et al. (eds). Proceedings of MEDINFO 95. Alberta: HC&CC, 1995: 136-139.
15. McCray AT, Hole WT. The Scope and Structure of the First Version of the UMLS Semantic Network. In: Miller RA (ed). Proceedings of SCAMC 90. Los Alamitos: IEEE Computer Society Press, 1990: 126-130.
16. McCray AT, Nelson SJ. The Representation of Meaning in the UMLS. In: [3]: 193-201.
17. Friedman C, Huff SM, Hersh WR, Pattison-Gordon E, Cimino JJ. The Canon Group's Effort: Working Toward a Merged Model. J Am Med Informatics Assoc 1995, 2: 4-18.
18. Wehrli E, Clark R. Natural Language Processing, Lexicon and Semantics. In: [3]: 68-74.
19. Rector A. Compositional Models of Medical Concepts: Towards Re-usable Application-Independent Medical Terminologies. In: Barahona P, Christensen JP (eds). Knowledge and Decisions in Health Telematics. IOS Press, 1994: 109-114.
20. The International Classification of Diseases, 9th revision, Clinical Modification. 2nd ed. Vols. 1-3. U.S. Department of Health and Human Services, September 1980.
21. Rothwell DJ. SNOMED-Based Knowledge Representation. In: [3]: 209-213.
22. "Medical Subject Headings - Annotated Alphabetical List", National Library of Medicine, published annually.
23. Read J. The Read Clinical Classification. NHS Centre for Coding and Classification, Loughborough, UK, 1993.
24. Pryor TA, Gardner RM, Clayton PD, Warner HR. The HELP system. J Med Syst 1983, 7(2): 87-102.
25. Barnett GO, Cimino JJ, Hupp JA, Hoffer EP. DXplain: An Evolving Diagnostic Decision-Support System. JAMA 1987, 258: 67-74.
26. Masarie FE, Jr, Miller RA, Myers JD. INTERNIST-I Properties: Representing Common Sense and Good Medical Practice in a Computerized Medical Knowledge Base. Comput Biomed Res 1985, 18: 458-479.
27. Miller RA, Masarie FE, Jr. Use of the Quick Medical Reference (QMR) Program as a Tool for Medical Education. Meth Inform Med 1989, 28(4): 340-345.
28. Lindberg DAB, Humphreys BL, McCray AT. The Unified Medical Language System. Meth Inform Med 1993, 32: 281-291.
29. Sherertz DD, Tuttle MS, Blois MS, Erlbaum MS. Intervocabulary Mapping within the UMLS: The Role of Lexical Matching. In: Greenes RA (ed). Proceedings of SCAMC 88. Los Angeles: IEEE Computer Society, 1988: 201-206.
30. Huff SM, Warner HR. A comparison of Meta-1 and HELP terms: Implications for clinical data. In: Miller RA (ed). Proceedings of SCAMC 90. Los Angeles: IEEE Computer Society, 1990: 166-169.
31. Rocha RA, Huff SM. Using Digrams to Map Controlled Medical Vocabularies. In: Ozbolt JG (ed). Proceedings of SCAMC 94. Philadelphia: Hanley & Belfus, Inc., 1994: 172-176.
32. Miller RA, Gieszczykiewicz FM, Vries JK, Cooper GF. CHARTLINE: Providing bibliographic references relevant to patient charts using the UMLS Metathesaurus Knowledge Sources. In: Frisse ME (ed). Proceedings of SCAMC 92. New York: McGraw-Hill, 1992: 86-90.
33. McCray AT, Srinivasan S, Browne AC. Lexical Methods for Managing Variation in Biomedical Terminologies. In: Ozbolt JG (ed). Proceedings of SCAMC 94. Philadelphia: Hanley & Belfus, Inc., 1994: 235-239.
34. Campbell KE, Musen MA. Representation of Clinical Data Using SNOMED III and Conceptual Graphs. In: Frisse ME (ed). Proceedings of SCAMC 92. New York: McGraw-Hill, 1992: 354-358.
35. Joubert M, Miton F, Fieschi M, Robert J-J. A Conceptual Graphs Modeling of UMLS Components. In: Greenes RA et al. (eds). Proceedings of MEDINFO 95. Alberta: HC&CC, 1995: 90-94.
36. Evans DA. Final Report on the MedSORT-II Project: Developing and Managing Medical Thesauri. Technical Report. Pittsburgh, PA: Laboratory for Computational Linguistics, Carnegie Mellon University, 1987.
37. Evans DA. Pragmatically-Structured, Lexical-Semantic Knowledge Bases For Unified Medical Language Systems. In: Greenes RA (ed). Proceedings of SCAMC 88. Los Angeles: IEEE Computer Society Press, 1988: 169-173.
38. Miller RA. A Computer-based Patient Case Simulator. Clin Research 1984, 32: 651A.
39. Masarie FE, Miller RA, Bouhaddou O, Giuse NB, Warner HR. An Interlingua for Electronic Interchange of Medical Information: Using Frames to Map between Clinical Vocabularies. Comput Biomed Res 1991, 24(4): 379-400.
40. Cimino JJ, Clayton PD, Hripcsak G, Johnson SB. Knowledge-based Approaches to the Maintenance of a Large Controlled Medical Terminology. J Am Med Informatics Assoc 1994, 1: 35-50.
41. Rector AL, Nowlan WA, Glowinski A. Goals for Concept Representation in the GALEN project. In: Safran C (ed). Proceedings of SCAMC 93. New York: McGraw-Hill, Inc., 1993: 414-418.
42. Rector AL. Coordinating Taxonomies: Key to Re-usable Concept Representations. In: Barahona P, Stefanelli M, Wyatt J (eds). Proceedings of Artificial Intelligence in Medicine (AIME 95). Berlin: Springer, 1995: 17-28.
43. Rossi-Mori A, Bernauer J, Pakarinen V, et al. CEN/TC251/PT003 models for representation of terminologies and coding systems in medicine. Proceedings of the Seminar: Opportunities for European and US Cooperation in Standardization in Health Care Informatics, Geneva, Switzerland, September 1992.
44. Cimino JJ. Use of the Unified Medical Language System in Patient Care at the Columbia-Presbyterian Medical Center. In: [3]: 158-164.
45. Alpay L, Baud RH, Rassinoux A-M, Wagner J, Lovis C, Scherrer J-R. Interfacing Conceptual Graphs (CG) and the Galen Master Notation (MN) for medical knowledge representation and modelling. In: Andreassen S, Engelbrecht R, Wyatt J (eds). Proceedings of Artificial Intelligence in Medicine 1993 (AIME 93). Amsterdam: IOS Press, 1993: 337-347.
46. Grishman R, Kittredge R. Analysing Language in Restricted Domains: Sublanguage Description and Processing. Hillsdale, NJ: Lawrence Erlbaum Associates, 1986.
47. Hirschman L, Sager N. Automatic Information Formatting of a Medical Sublanguage. In: Kittredge R, Lehrberger J (eds). Sublanguage: Studies of Language in Restricted Semantic Domains. Berlin: Walter de Gruyter, 1982: 27-80.
48. Friedman C, Alderson PO, Austin JHM, Cimino JJ, Johnson SB. A General Natural-language Text Processor for Clinical Radiology. J Am Med Informatics Assoc 1994, 1: 161-174.
49. Friedman C, Cimino JJ, Johnson SB. A Conceptual Model for Clinical Radiology Reports. In: Safran C (ed). Proceedings of SCAMC 93. New York: McGraw-Hill, Inc., 1993: 829-833.
50. Sowa JF. Conceptual Structures: Information Processing in Mind and Machine. Reading, MA: Addison-Wesley Publishing Company, 1984.
51. Rassinoux A-M, Juge C, Michel P-A, Baud RH, Lemaitre D, Jean F-C, Degoulet P, Scherrer J-R. Analysis of Medical Jargon: The RECIT System. In: Barahona P, Stefanelli M, Wyatt J (eds). Proceedings of Artificial Intelligence in Medicine (AIME 95). Berlin: Springer, 1995: 42-52.
52. Baud RH, Rassinoux A-M, Wagner JC, Lovis C, Juge C, Alpay LL, Michel P-A, Degoulet P, Scherrer J-R. Representing Clinical Narratives Using Conceptual Graphs. In: [3]: 176-186.
53. Lewis Carroll. Jabberwocky. Further details on this poem can be found at the URL http://www.iit.edu/~beberg/jabberwocky.html. See also http://www.math.luc.edu/~vande/jabfrench.html or http://www.math.luc.edu/~vande/jabgerman.html.
54. Allen J. Natural Language Understanding. Menlo Park, CA: The Benjamin/Cummings Publishing Company, 1987.
55. Shortliffe EH, Davis R, Axline SG, Buchanan BG, Green CC, Cohen SN. Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Comput Biomed Res 1975, 8(4): 303-320.
56. Warner HR, Haug P, Bouhaddou O, Lincoln M, et al. ILIAD As An Expert Consultant to Teach Differential Diagnosis. In: Greenes RA (ed). Proceedings of SCAMC 88. Los Angeles: IEEE Computer Society, 1988: 371-376.
57. Quillian MR. Semantic memory. In: Minsky M (ed). Semantic information processing. Cambridge, MA: MIT Press, 1968: 227-270.
58. Minsky M. A framework for representing knowledge. In: Winston PH (ed). The psychology of computer vision. New York: McGraw-Hill, 1975: 211-277.
59. Brachman R, Schmolze J. An Overview of the KL-ONE Knowledge Representation System. Cognitive Science 1985, 9(2): 171-216.
60. Barr CE, Komorowski HJ, Pattison-Gordon E, Greenes RA. Conceptual Modeling for the Unified Medical Language System. In: Greenes RA (ed). Proceedings of SCAMC 88. Los Angeles: IEEE Computer Society Press, 1988: 148-151.
61. Rector AL, Rogers JE, Pole P. The GALEN High Level Ontology. In: Brender J, Christensen JP, Scherrer J-R, McNair P (eds). Proceedings of Medical Informatics Europe '96 (MIE 96). Amsterdam: IOS Press, 1996: 174-178.
62. Sowa JF. Toward the Expressive Power of Natural Language. In: Sowa JF (ed). Principles of Semantic Networks: Explorations in the Representation of Knowledge. San Mateo, CA: Morgan Kaufmann Publishers, 1991: 157-189.
63. Rassinoux A-M, Miller RA, Baud RH, Scherrer J-R. Modeling Principles for QMR Medical Findings. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 264-268.
64. Pole PM, Rector AL. Mapping the GALEN CORE Model to SNOMED-III: Initial Experiments. In: Cimino JJ (ed). Proceedings of the 1996 AMIA Annual Fall Symposium (Formerly SCAMC). Philadelphia: Hanley & Belfus, Inc., 1996: 100-104.
65. Wigertz O, Hripcsak G, Shasavar M, Bagenholm P, Ahlfeldt H, Gill H. Data-driven medical knowledge based systems based on Arden Syntax. In: Barahona P, Christensen JP (eds). Knowledge and Decisions in Health Telematics. IOS Press, 1994: 126-131.
66. Baud RH, Lovis C, Rassinoux A-M, Michel P-A, Alpay L, Wagner JC, Juge C, Scherrer J-R. Towards a Medical Linguistic Knowledge Base. In: Greenes RA et al. (eds). Proceedings of MEDINFO 95. Alberta: HC&CC, 1995: 13-17.
67. Tuttle MS, Campbell KE, Olson NE, et al. Concept, Code, Term and Word: Preserving the Distinctions. In: Gardner RM (ed). Proceedings of SCAMC 95. Philadelphia: Hanley & Belfus, Inc., 1995: 956.
68. Rassinoux A-M, Baud RH, Scherrer J-R. A Multilingual Analyser of Medical Texts. In: Tepfenhart WM, Dick JP, Sowa JF (eds). Proceedings of the Second International Conference on Conceptual Structures (ICCS 94). Berlin: Springer-Verlag, 1994: 84-96.
69. Baud RH, Lovis C, Rassinoux A-M, Scherrer J-R. Alternate Ways for Knowledge Collection, Indexing and Robust Language Retrieval. To appear in: Proceedings of the Fourth International Conference on Medical Concept Representation, Jacksonville, Florida, January 19-22, 1997.
70. Zweigenbaum P, Bachimont B, Bouaud J, Charlet J, Boisvieux J-F. Issues in the Structuring and Acquisition of an Ontology for Medical Language Understanding. In: [3]: 15-24.
71. Lytinen SL. Frame selection in parsing. In: American Association for Artificial Intelligence. Proceedings of the third national conference on artificial intelligence (AAAI 84). Los Altos, CA: William Kaufmann, 1984: 222-225.
72. Binot J-L, Ribbens D. Dual frames: a new tool for semantic parsing. In: American Association for Artificial Intelligence. Proceedings of the fifth national conference on artificial intelligence (AAAI 86). Los Altos, CA: Morgan Kaufmann Publishers, 1986: 579-583.
73. Rocha RA, Rocha BHSC, Huff SM. Automated Translation Between Medical Vocabularies Using a Frame-Based Interlingua. In: Safran C (ed). Proceedings of SCAMC 93. New York: McGraw-Hill, Inc., 1993: 690-694.
74. Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ. Natural Language Processing and the Representation of Clinical Data. J Am Med Informatics Assoc 1994, 1: 142-160.
75. Sager N, Lyman M, Nhàn NT, Tick LJ. Medical Language Processing: Applications to Patient Data Representation and Automatic Encoding. In: [3]: 140-146.
76. Berrut C, Cinquin P. Natural language understanding of medical reports. In: [2]: 129-137.
77. Schröder M. Knowledge-based Processing of Medical Language: A Language Engineering Approach. In: Ohlbach H-J (ed). Proceedings of the Sixteenth German Workshop on AI (GWAI 92). Berlin: Springer-Verlag, 1992: 221-234.
78. Zweigenbaum P, Consortium Menelas. MENELAS: an access system for medical records using natural language. Comput Meth Prog Biomed 1994, 45: 117-120.
79. Haug P, Koehler S, Lau LM, Wang P, Rocha R, Huff S. A Natural Language Understanding System Combining Syntactic and Semantic Techniques. In: Ozbolt JG (ed). Proceedings of SCAMC 94. Philadelphia: Hanley & Belfus, Inc., 1994: 247-251.
80. Lesmo L, Torasso P. Weighted Interaction of Syntax and Semantics in Natural Language Analysis. In: Joshi A (ed). Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI 85). Los Altos, CA: Morgan Kaufmann Publishers, 1985: 772-778.
81. Friedman C, Johnson SB, Forman B, Starren J. Architectural Requirements for a Multipurpose Natural Language Processor in the Clinical Environment. In: Gardner RM (ed). Proceedings of SCAMC 95. Philadelphia: Hanley & Belfus, Inc., 1995: 347-351.
82. Gazdar G, Mellish C. Natural Language Processing in PROLOG: An Introduction to Computational Linguistics. Wokingham, England: Addison-Wesley Publishing Company, 1989.
83. Nhan NT, Sager N, Lyman M, Tick LJ, Borst F, Su Y. A Medical Language Processor for Two Indo-European Languages. In: Kingsland LC III (ed). Proceedings of SCAMC 89. Washington: IEEE Computer Society Press, 1989: 554-558.