Chapter 4. Analysis of Linguistic Features Associated with Point of View for Generating Stylistically Appropriate Text

In: Computing Affect and Attitude in Text: Theory and Applications

Nancy L. Green
Dept. of Mathematical Sciences
University of North Carolina at Greensboro
Greensboro, NC 27402-6170 USA
Email: [email protected]

Abstract

We describe a qualitative analysis of a corpus of clinical genetics patient letters. In this genre, a single letter is intended to serve multiple functions and is designed for multiple audiences. The goal of the analysis was to identify stylistically related features for a natural language generation system. We found that, perhaps because of the multiple intended functions and audiences, within a single letter more than one writing style (set of realization choices) can be observed, and the sets of features are associated with different perspectives. Thus, an NLG system must take perspective into account to generate stylistically appropriate text in this application. The paper outlines the perspectives and the features associated with each that were identified in the corpus.

Keywords: clinical genetics, patient letters, style analysis, natural language generation, perspective, point of view.

1. Introduction

We are studying a corpus of clinical genetics patient letters written by genetic counselors to their clients. According to Baker et al. (2002), the typical patient letter, one to two pages in length, summarizes the counselor's meeting with the client. At a meeting the counselor may provide information on the client's case (e.g., test results, diagnosis of a genetic disorder, prediction of genetic risks), counseling to cope with the potential emotional effects of the information, as well as explanations of genetics concepts relevant to the client's case.
While the client is the addressee of the letter, intended secondary audiences include family members and (in case the client is the parent or guardian of a pediatric patient) staff members at the patient's school or daycare. In addition, the letter is intended to provide medical documentation for healthcare providers. These audiences differ in background (e.g., expert or layperson), in information needs (e.g., a description of patient symptoms to support a medical diagnosis or to provide information for caregivers), and in their emotional relationship to the patient (e.g., a parent or someone not personally involved with the patient).

Unlike most of the other papers in this volume, our study of the corpus is motivated by generation rather than interpretation. We wish to identify stylistically related features to guide linguistic realization and content selection in a natural language generation (NLG) system for genetic counselors. The system will generate the first draft of a patient letter using general information about clinical genetics and specific information about the patient's case.

Previous NLG research on stylistic variation has viewed style as a constant property within a document and as defining a genre (Hovy, 1990; DiMarco and Hirst, 1993). After informal review of letters in the corpus, we noted that, perhaps because of the multiple intended functions and audiences, within a single letter (and in some cases within a single sentence) more than one writing style can be observed. Our hypothesis is that each style (i.e., coherent set of realization choices) is associated with a different perspective assumed by the writer, e.g., a counseling perspective addressing the client’s emotional state or a medical perspective serving a documentation function.
For example, in sentence (2) below the writer uses the referring doctor's perspective in reporting the reason for the referral to the author's clinic. (The number in parentheses identifies the sentence; the letter's identifier, VCF, is given in parentheses at the end of the excerpt. In the corpus, capitalized words in brackets have been substituted for original text to maintain client confidentiality but convey the gist of the original text. In this domain, proband refers to the person who is the focus of a genetic study, i.e., the patient.)

(2) [DOCTOR] asked us to evaluate [PROBAND] to determine if [HIS/HER] delays in development and [SPECIFIC TYPES OF BIRTH DEFECT] were due to a recognizable genetic condition. (letter VCF)

When speaking from the referring doctor's perspective, the writer's description of the patient's symptoms is precise and uses words that may have negative connotations to the addressee (the patient's parent), e.g., a description of the specific types of birth defects and use of the term delays. In contrast, when the writer assumes the genetic counselor's perspective, the wording is designed to mitigate the possible negative effect of the information on the addressee. A key stylistic choice expressing the voice of the counselor in sentence (14) below is use of the value-free or nonstigmatizing phrase altered form instead of mutation (Baker et al., 2002).

(14) [PROBAND] could have inherited an altered form of a gene from both you and [HIS/HER] father that caused [HIS/HER] birth defects and learning problems. (letter VCF)

In summary, we claim that in addition to a representation of what must be said, our NLG system must take perspective into account in order to be able to generate stylistically appropriate text in this application. This paper justifies the claim by outlining a set of perspectives and some of the features potentially associated with each that we have identified by qualitative analysis of the corpus.

2. Perspectives in Corpus

Based upon a review of letters in the corpus and information on genetic counseling (e.g., Wilson, 2000), we have identified the following perspectives:

• author: the letter writer, i.e., a genetic counselor writing on behalf of a genetics clinic. This voice can be distinguished from the voices that we call genetic counselor and clinic. For example, in the author's voice, formulaic expressions are used (e.g., We hope this information is helpful), which are not used in parts of a letter representing those other perspectives.
• client: the person(s) who met with the counselor and who is (are) the principal addressee(s) of the letter, usually the patient or some member(s) of the patient's family. This perspective is taken to document discussion initiated by the client at the meeting (e.g., You expressed concern that …) as well as to enable the writer to include information for the medical record although it is already known to the client (e.g., As you know, [DOCTOR] first saw [PROBAND] at eight months…).
• referring doctor: the doctor who referred the patient to the clinic (e.g., [DOCTOR] asked us to evaluate …). This perspective is used to document the referring doctor’s findings and tentative diagnosis, with which the clinic need not agree.
• clinic: genetics clinic with which the genetic counselor is affiliated and that was visited by the client. This voice is used to document what was done to a patient (e.g., We obtained a blood sample …), or told to the client (e.g., We have recommended …) during the visit.
• genetic counselor: the genetic counselor who met with the client(s), who is also the letter writer.
This perspective is used in discussing patient-specific information such as the diagnosis or a family member’s inheritance risks in terms that the client can understand and that mitigate the potential negative effect of the information (e.g., It is important to remember that [PROBAND'S] problems could still be caused by genetic alteration…).
• education: basic background knowledge about human genetics. For example, this perspective is used to explain the role of genes in health and how genes are inherited (e.g., In autosomal dominant inheritance, only one altered gene is needed for the person to have the condition. This gene can come from either the mother or the father…).
• research: information from the clinical genetics research literature (e.g., Most children [with osteogenesis imperfecta] have fragile bones, blue sclera, ...).

Although originally developed for the automated analysis of narrative (Wiebe, 1994), and later applied to analysis of attitude in newspaper articles (Wilson and Wiebe, 2003), the model of psychological point of view (POV) provides a framework for our own study. That model defines a private-state relation whose components include an experiencer, an attitude, and the object of the private state. For example, in sentence (2, VCF), repeated below, the experiencer, identified as [DOCTOR], is the referring doctor, the attitude could be interpreted as "believes it likely that", and the object corresponds to what is expressed as the proband's delays in development and [SPECIFIC TYPES OF BIRTH DEFECT] were due to a recognizable genetic condition.

(2) [DOCTOR] asked us to evaluate [PROBAND] to determine if [HIS/HER] delays in development and [SPECIFIC TYPES OF BIRTH DEFECT] were due to a recognizable genetic condition.
(3) During your appointment on [DATE], we obtained a blood sample from [PROBAND].
(4a) In addition to the routine chromosome study,
(4b) in which a microscopic study of the 46 chromosomes is done,
(4c) a special analysis of the long arm of chromosome 22 (22q11)
(4d) by a technique called fluorescence in situ hybridization (FISH)
(4e) was done to test for Velocardiofacial syndrome (VCF).
(5) Individuals with VCF often have [SPECIFIC TYPES OF BIRTH DEFECT] and learning problems. (letter VCF)

This excerpt illustrates several other points. As noted by Wiebe (1994), experiencer and attitude need not be stated explicitly. In (3), the experiencer, signaled by we, is the clinic and the attitude could be interpreted as knowledge shared by experiencer and addressee. In (4a), the experiencer could be interpreted as the clinic again, although it was not explicitly signaled; in the terminology of Wiebe (1994), (4a) continues the experiencer of the current POV. However, we claim that the explanatory information provided in (4b) and (4d) is the voice of the genetic counselor and the attitude for those phrases could be interpreted as knowledge that the experiencer believes the addressee does not share with the experiencer. This change in attitude is associated with a shift in tense; the explanatory information in (4b) and (4d) is presented in the present tense while the rest of (2) through (4), a narration of the patient's referral, history and clinic visit, is presented in the past tense. Finally, the experiencer in (5) is the research perspective. This change in experiencer is marked also by a shift to the present tense.

3. Associated Features

Table 1 shows, for each perspective defined above, some associated features that we have identified by manual inspection of the corpus. The second column lists the typical forms used for referring to each type of experiencer.
Note that, according to the table, first-person plural pronoun forms such as we are used to refer to several categories of experiencer. The third column lists typical forms for referring to individuals other than the experiencer. For example, the education and research perspectives are characterized by reference to generic individuals instead of to members of the client's family. According to Baker et al. (2002), the strategy of conveying information about a patient indirectly by using general terms (e.g., Children with this condition tend to lose their hearing, instead of Nisha is likely to lose her hearing) can be used by the writer to mitigate the negative impact of the information on the client. The fourth column lists verb tenses characteristic of each perspective. The fifth column lists forms for conveying probability, and is discussed below. The last column lists other associated features, including characteristic open-class words and word patterns. For example, several perspectives can be distinguished on the basis of use of expert biomedical terminology in contrast to use of more layperson-oriented terminology, e.g., use of the geneticist's term allele instead of the layperson-oriented copy. In addition to this distinction, some perspectives can be characterized by use of value-free or nonstigmatizing language.
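
As a minimal sketch of how a generator might consult perspective during lexical choice, the following Python fragment encodes the two distinctions just described: expert versus layperson-oriented terminology, and value-free versus non-value-free wording. The names PerspectiveProfile, PROFILES, REGISTER_LEXICON, VALUE_FREE, and choose_term are hypothetical and do not describe the system under development; only the allele/copy and mutation/alteration pairs, and the properties assigned to the two profiles, are taken from our analysis of the corpus.

```python
from dataclasses import dataclass

# Hypothetical lexicon: concept -> surface term per register.
# Only the allele/copy pair comes from the text; the layout is an assumption.
REGISTER_LEXICON = {
    "allele": {"expert": "allele", "layperson": "copy"},
}

# Value-free substitution observed in the corpus (alteration instead of mutation).
VALUE_FREE = {"mutation": "alteration"}


@dataclass(frozen=True)
class PerspectiveProfile:
    """Assumed container for two stylistic features of a perspective."""
    register: str      # "expert" or "layperson" terminology
    value_free: bool   # whether nonstigmatizing wording is preferred


# Two illustrative profiles; a fuller model would cover every perspective.
PROFILES = {
    "referring doctor": PerspectiveProfile(register="expert", value_free=False),
    "genetic counselor": PerspectiveProfile(register="layperson", value_free=True),
}


def choose_term(concept: str, perspective: str) -> str:
    """Pick a surface term for a concept under the given perspective."""
    profile = PROFILES[perspective]
    # Fall back to the concept name when no register variant is listed.
    term = REGISTER_LEXICON.get(concept, {}).get(profile.register, concept)
    if profile.value_free:
        term = VALUE_FREE.get(term, term)
    return term


for perspective in PROFILES:
    print(perspective, "->", choose_term("allele", perspective),
          "/", choose_term("mutation", perspective))
```

Under these assumptions the referring doctor's profile yields allele and mutation, while the genetic counselor's profile yields copy and alteration. Keeping the profile separate from the lexicon is one possible design; it would let later processing steps check which perspective a phrase was realized under.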
Experiencer | Reference to experiencer | Reference to others | Tense | Probability | Other cues
------------|--------------------------|---------------------|-------|-------------|-----------
author | pronoun (1p-plural), self-reference to letter (e.g., this letter) | reference to family members by name or pronoun (2p, 3p) | present or past (time of clinic visit) | - | formulaic language (e.g., it was a pleasure), position near beginning and end of letter
client | reference to family members by name or pronoun (2p, 3p) | - | past (time of clinic visit) | - | client’s knowledge or questions (e.g., you asked whether, as you know)
referring doctor | doctor's name | reference to family members by name or pronoun (2p, 3p) | past (before clinic visit) | implicit (e.g., due to) | referral verbs (e.g., referred by), expert biomedical terminology, non-value-free words
genetics clinic | pronoun (1p-plural) | reference to family members by name or pronoun (2p, 3p) | past (time of clinic visit) | - | clinic’s actions (e.g., we gave you, we obtained), expert biomedical terminology
genetic counselor | pronoun (1p-plural) | reference to family members by name or pronoun (2p, 3p) | present or future | qualitative (e.g., could, it appears that), Mendelian ratio (e.g., a 50% chance) | emphasis (still, it is important), value-free words (e.g., alteration instead of mutation), layperson-oriented biomedical terminology
education | agentless passive (e.g., it is believed that) | reference to population (e.g., the parents, the mother) or universal (e.g., we, everyone) | habitual present or future | qualitative, Mendelian ratio | layperson-oriented biomedical terminology, called (e.g., a gene called GJB2)
research | agentless passive (e.g., has been reported) | reference to population (e.g., individuals) | habitual present | qualitative, quantitative | expert biomedical terminology

Table 1. Types of features characterizing each perspective.

In a previous study of this corpus (Green, 2003), we manually tagged both qualitative and quantitative indicators of probability. Examples of qualitative indicators are modal verbs (e.g., can, could), frequency adverbs (e.g., often), and quantifiers (e.g., many). Quantitative indicators are phrases containing numeric expressions (e.g., rates, odds, percentages), possibly with qualifiers (e.g., approximately 80%). That study determined that the ratio of probability cues to the number of sentences was high, which is not surprising given the inherent uncertainty in human genetics.

Column five of Table 1 shows the types of probability cues associated with each perspective. Qualitative cues are used in all perspectives characterized by explicit use of probability terms. The cues that we call Mendelian ratios, i.e., the idealized ratios of a Mendelian inheritance model (e.g., 0%, 25%, 50%, 75%, and 100%), are characteristic of the education perspective (in explanations of inheritance patterns) and of the genetic counselor perspective (in explaining inheritance patterns that occur in the client's family). Presence of a quantitative, non-Mendelian probability value (e.g., 6%) seems to be a good indicator of the research perspective, since the original source of information would have been from empirical studies published in the research literature.

4. Implications for Natural Language Generation and Automatic Recognition of Point of View

An NLG system for a domain such as this must take perspective into account in order to be able to generate stylistically appropriate text, regardless of whether perspective is considered in generating text from "first principles", or whether it is "compiled into" quasi-textual building blocks.
Otherwise, for example, information needed for medical documentation purposes might be realized in layperson-oriented terminology that is unsuitable for its intended function, or information intended for a parent might be realized in obscure-sounding medical terminology that fails to consider the emotional impact on the parent. Even when a generator uses precompiled "building blocks" (Hirst et al., 1997), if the generator is not informed of the perspective represented by each building block, then subsequent transformations such as text aggregation or referring expression construction could produce phrasing that mixes perspectives infelicitously.

In contrast to our work, most of the other projects described in this volume have goals related to automatic recognition of point of view in text. Despite the difference in motivation, our qualitative analysis can be seen as a possible step towards automatic recognition of point of view in clinical genetics-related documents. It seems likely one could build a classifier to predict perspective based on features like those that we have identified. The classifier might be used, for example, in a question-answering system with access to a heterogeneous collection of text, e.g., patient medical records and general patient education material on genetic disorders.

5. Acknowledgments

This work is supported by the National Science Foundation under CAREER Award No. 0132821.

6. Bibliography

Baker, D.L., Eash, T., Schuette, J.L., and Uhlmann, W.R. (2002) Guidelines for Writing Letters to Patients. Journal of Genetic Counseling, 11 (5), 399-418.
DiMarco, C. and Hirst, G. (1993) A Computational Theory of Goal-Directed Style in Syntax. Computational Linguistics, 19 (3), 451-500.
Green, N. (2003) Towards an Empirical Model of Argumentation in Medical Genetics. In Proceedings of IJCAI 2003 Workshop on Computational Models of Natural Argument (CMNA03). 39-44.
Hirst, G., DiMarco, C., Hovy, E., and Parsons, K. (1997) Authoring and Generating Health-education Documents that are Tailored to the Needs of the Individual Patient. In Proceedings of User Modeling 1997.
Hovy, E. (1990) Pragmatics and Natural Language Generation. Artificial Intelligence, 43, 153-197.
Wiebe, J. M. (1994) Tracking Point of View in Narrative. Computational Linguistics, 20 (2), 233-288.
Wilson, T. and Wiebe, J. (2003) Annotating Opinions in the World Press. In Proceedings of the 4th SIGDial Workshop.
Wilson, G.N. (2000) Clinical Genetics: A Short Course. Wiley-Liss.