Feature Based Gene Summary Extraction with Re-ranking

Samir Gupta
Computer and Information Sciences
University of Delaware
Newark, DE 19716 USA
[email protected]

Abstract

Due to the vast and growing body of biomedical literature, searching medical databases for information about genes is becoming cumbersome. Searching PubMed with a gene name as the query returns thousands of results, including irrelevant ones. The Gene Ontology (GO) and UniProtKB databases indicate relevant terms associated with a gene, but these are not enough for a quick understanding of the different properties of the gene. Moreover, these resources are manually written and curated, which is both labor-intensive and time-consuming. Automatically generating summaries for genes would help biologists get an overall picture of a gene quickly. In this paper we adapt generic feature-based extractive summarization techniques and augment them with biomedical domain-specific features. We also use the concept of “novelty” to reduce redundancy in the extracted summary. Our results show that including domain-specific features and “redundancy removal” improves the content of the summary for most of the evaluated genes.

1 Introduction

Biomedical databases like PubMed (McEntyre and Lipman, 2001) and BioMed Central (http://www.biomedcentral.com/) are expanding rapidly and contain millions of articles. Due to this vast amount of information, biologists spend a large amount of time searching and reading articles to find relevant information. One such “information need” of life scientists is gene-specific information. A quick overview of the different properties, functions and other aspects of a gene would be very useful. Efforts have been made to construct databases such as EntrezGene (Maglott et al., 2005), Gene Ontology (http://www.geneontology.org) and UniProtKB (http://www.uniprot.org/help/uniprotkb), which provide “important information” about a gene. But these databases are manually created and require curation and regular updates, which is labor-intensive. This motivates the development of an automatic gene summary extractor.

In this paper we describe an approach which expands on the generic features used in summary extraction by including domain-specific features. Feature-based summary extraction techniques were explored by Edmundson (1969) and Kupiec et al. (1995) for generic domains. We augment these features with domain-specific features such as the presence of the gene name and certain biological cue phrases. We also use a variant of Maximal Marginal Relevance (Carbonell and Goldstein, 1998) to reduce the “redundancy” in the final summary. We use different modules of eGIFT (Tudor et al., 2010), a gene information mining tool, to extract the set of abstracts relating to a gene, compute “descriptive words”, and extract gene name variations.

The major contributions of this paper are:

• Applying the generic features used by Edmundson (1969) to the biomedical domain.
• Augmenting the generic features with biomedical domain-specific features.
• Using terms provided by the Gene Ontology and UniProtKB databases to re-rank sentences based on “information novelty”.

2 Approach

In this section we discuss the details of the gene summarization system. The input to this system is a gene identifier, the same as the one used in the eGIFT system (Tudor et al., 2010). Given a gene identifier, we first extract a set of abstracts from Medline. The retrieval of relevant abstracts for a given gene is done using the eGRAB (Extractor of Gene-Relevant ABstracts) module of eGIFT. The eGRAB module considers all gene names, synonyms, and aliases to query the Medline database and return a set of abstracts for the given gene. Each sentence in the set of abstracts is scored based on a number of features.
A subset of these features, such as term frequency, sentence position, presence of title words and sentence length, are similar to the ones used in (Edmundson, 1969; Kupiec et al., 1995). In addition to these, we use features like the presence of the gene name and certain biological phrases to adapt the generic techniques to the biological domain. After several iterations on some test genes, we manually assigned weights to each of the features, which are used to compute a final score for a sentence. The top ranking sentences are then selected for inclusion in the summary.

We have also explored the notion of “information novelty” to reduce redundancy across the sentences to be selected. This approach is based on the Maximal Marginal Relevance (MMR) model of Carbonell and Goldstein (1998), but differs in how “novelty” is computed and used. Based on the MMR model we re-rank a subset of the sentences returned by the feature-based system. In the next two subsections we discuss the features and the re-ranking system in detail.

2.1 Computing Sentence Importance

The set of abstracts returned by the eGRAB module is preprocessed and segmented into sentences. A set of features is used to score the importance of each sentence. Based on the weighted score, the sentences are ranked and the top ranking sentences are included in the summary. The first four features are used by generic extractive summarizers. We have added two new features which are more specific to the biomedical domain. We name the system using only the first four features System-A. System-A helps us understand if and how well generic approaches adapt to a particular domain. We hope to see an improvement when the last two domain-specific features are added (System-B).

Sentence Position Feature

This feature encodes positional information about a sentence in an abstract. Sentence position can be one of the following: title, first, last and middle sentence.
As argued in the early work on extractive summarization by Edmundson (1969), first and last sentences are typically more important than other sentences. Thus higher scores are assigned to the first and last sentence positions than to the middle or title sentence positions.

Title Words Feature

This feature assigns a score between 0 and 1 to a sentence based on the presence of title words in the sentence. The title of the abstract is decomposed into words and the words are stemmed. These words are regarded as “descriptive words”, and each sentence is scored based on the frequency of occurrence of title words in it. The score is divided by the length of the sentence and then normalized.

Sentence Length Feature

Kupiec et al. (1995) used sentence length as one of the features for summarization. In their implementation the feature was true if the sentence length was above a certain threshold, thereby giving less importance to very short sentences. In our system, we use both a low and a high threshold to assign low scores to very short or very long sentences. We argue that very long sentences, which contain unnecessary information (noise) along with some relevant information, should also be given a low score. This helps us to select shorter sentences in which “noise” is minimal and which are thus more informative to the user. It also helps in the second phase, the re-ranking step, by allowing more relevant and novel sentences to be selected.

Frequency Based Feature

This feature assigns a score between 0 and 1 to the sentence, indicating the presence of “descriptive words” in the sentence. Most of the early work in the area of summarization used term frequency and its variations to identify the most descriptive words of a document. Term Frequency*Inverse Document Frequency (TF*IDF) has been used in the field of Information Retrieval (Salton and Buckley, 1988; Jones, 1972) as a measure for identifying “descriptive words” in a document.
We use eGIFT’s (Tudor et al., 2010) iTerm scores, a variant of TF*IDF weights, to extract “descriptive words” from a set of abstracts relating to a gene. eGIFT automatically computes and associates informative terms (iTerms) with a gene based on frequency information from the set of abstracts returned by the eGRAB module, which is called the About Set for the gene. It assigns scores to unigrams and bigrams, excluding stop-words, as well as to a set of biomedical terms matched in the text that we extracted from different knowledge bases, including EntrezGene, Gene Ontology, NCBI Taxonomy, UMLS, and MeSH. The terms are converted to base form for scoring purposes. Each term is assigned a score depending on its frequency in the About Set, contrasted with its frequency in the Background Set. The Background Set is the set of all abstracts in the biomedical database. For each term t, a score s(t) is assigned as follows:

$$ s(t) = \left( \frac{df_a(t)}{N_a} - \frac{df_b(t)}{N_b} \right) \cdot \ln\left( \frac{N_b}{df_b(t)} \right) $$

where $df_a(t)$ and $df_b(t)$ are the number of abstracts containing term t in the About Set for the gene and in the Background Set, respectively, and $N_a$ and $N_b$ are the total number of abstracts in these two sets. The difference between the normalized document frequencies, $\frac{df_a(t)}{N_a} - \frac{df_b(t)}{N_b}$, rewards terms occurring more frequently in the About Set, and $\ln\left(\frac{N_b}{df_b(t)}\right)$ penalizes terms that are very frequent in all documents. An important thing to note is that eGIFT considers document frequency as opposed to term frequency in a specific document. This is because iTerms are “descriptive terms” across a set of abstracts rather than a single document, which better reflects the relevance of a term to a gene. Given the score for each term, a set of top ranking informative terms (iTerms) is computed for the gene.

We score each sentence in the About Set of a gene by considering the occurrences of the iTerms and their scores. The final score is divided by the number of words in the sentence and normalized.
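The iTerm score and the sentence-level frequency feature described above can be illustrated with a minimal Python sketch. This is not eGIFT's implementation: the function names `iterm_score` and `frequency_feature` are hypothetical, the sentence score is assumed to be the sum of the iTerm scores of the matching tokens, and the final normalization step (whose exact form is not specified here) is left out.

```python
import math


def iterm_score(df_about, df_background, n_about, n_background):
    """Score a term as (df_a/N_a - df_b/N_b) * ln(N_b/df_b).

    df_about / df_background: number of abstracts containing the term in the
    About Set and the Background Set; n_about / n_background: set sizes.
    """
    if df_background <= 0 or n_about <= 0 or n_background <= 0:
        raise ValueError("counts must be positive")
    freq_difference = df_about / n_about - df_background / n_background
    rarity = math.log(n_background / df_background)
    return freq_difference * rarity


def frequency_feature(sentence_tokens, iterm_scores):
    """Length-normalized sum of iTerm scores for one tokenized sentence.

    Assumption: every occurrence of an iTerm contributes its score once;
    tokens that are not iTerms contribute nothing.
    """
    if not sentence_tokens:
        return 0.0
    total = sum(iterm_scores.get(token, 0.0) for token in sentence_tokens)
    return total / len(sentence_tokens)


# Hypothetical counts: a term found in 50 of 100 About Set abstracts
# and in 1,000 of 1,000,000 Background Set abstracts.
example_score = iterm_score(50, 1000, 100, 1_000_000)
example_feature = frequency_feature(["smad2", "binds", "dna"], {"smad2": 3.0})
```

A term that is equally frequent in both sets receives a score of zero under this formula, which is why generic vocabulary does not surface as an iTerm.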
Gene Feature

The abstracts returned by the eGRAB module are related to the gene whose summary is to be extracted. This feature indicates the presence of the gene name in the sentence. A sentence may or may not contain the gene name, and its presence can be used as an indicator of the sentence’s importance. This feature assigns a score of 1 to sentences which contain the gene name and 0 otherwise, which boosts the score of sentences containing the gene name. A gene in the biomedical literature is referred to by several names and abbreviations. For example, SMAD2 has variations such as Smad family member 2, smad-2, madr2, xsmad2, etc. eGIFT provides APIs which, given a gene identifier, return all the variations of the gene name. It uses official names of genes provided by Entrez Gene (Maglott et al., 2005), synonyms, and word sense disambiguation techniques to return the different variations.

Biological Cue Phrase Feature

This feature assigns a score between 0 and 1 depending upon the presence of certain phrases in the sentence. This approach is based on the observation that certain phrases in a document indicate sentence importance. Authors of technical documents follow certain writing styles, using certain phrases to indicate important relations between different entities in the text. These writing styles are domain dependent and require study of the documents to identify them. We argue that some phrases are stronger indicators of sentence importance than others, as they convey very strong relations between the entities in the text.

EntrezGene (Maglott et al., 2005) contains manually created summaries for some genes. We did a preliminary study of the human-written summaries from Entrez in order to understand what types of information are typically conveyed in a summary. We identified several aspects which are covered in almost every summary:

• ATTRIBUTE: The different properties/attributes associated with a gene.
• FAMILY: Gene family the gene belongs to.
• FUNCTION: The various biological functions or processes the gene is involved in.
• DOMAIN: The domains the gene contains.
• INTERACTION: The interaction of this gene with other genes or proteins.
• DISEASE: Diseases caused by this gene.

An aspect was found to span multiple sentences in some summaries, while in others several aspects were mentioned in a single sentence. For the purposes of this paper we explored the first three aspects. In the following paragraphs we examine these three aspects in some detail and discuss the biological phrases associated with each.

ATTRIBUTE: A gene typically has some well-known properties which need to be captured in a summary. These are typically isA relations between a gene and a noun phrase. For example, sentence fragments like “.. groucho proteins are transcriptional corepressors ..” and “.. groucho homolog tle-4 , a corepressor ..” both indicate that the gene groucho is a corepressor. Thus for this aspect we look for phrases like “is a”, appositives and relative clauses. The pattern should be immediately preceded by the gene in question for this feature to be considered.

FAMILY: Almost every gene belongs to a family of genes which share certain common characteristics. Including the family information helps biologists ascertain certain important attributes of the gene. For example, sentence fragments like “The Drosophila Groucho (Gro) protein is the defining member of a family of metazoan corepressors ..” and “Groucho (Gro) is the founding member of a family of transcriptional co-repressor..” indicate that groucho belongs to a family of genes which are corepressors. For this aspect we look for phrases like “belongs to” and “member of”. As with the patterns above, this pattern should be immediately preceded by the gene in question for this feature to be considered.

FUNCTION: Most of the sentences in the human-written summaries contain this aspect. It indicates the different biological processes and functions the gene is involved in, required for, etc. This aspect is often mentioned together with other aspects; for example, it typically follows an INTERACTION aspect. Identifying the different functions of a gene is very important, and sentences which mention such relations should be included in a summary. From the following sentence fragments we can easily determine that groucho is related to biological functions such as notch signaling, segmentation and neural development. Examples: “Groucho is a transcriptional repressor implicated in notch signaling..”, “.. Groucho .. involved in neural development and segmentation in drosophila”, “Groucho is required for Drosophila neurogenesis, segmentation..” and “that Gro/TLE proteins play a role in the repression of target genes”. We look for the highlighted phrases in the above sentences (such as “implicated in”, “involved in”, “required for” and “play a role in”) when assigning this bio-feature. The gene need not immediately precede the pattern for this aspect, but the further the gene is from the phrase, the lower the score.

Each sentence in the About Set for a gene is searched for the mentioned patterns. The sentence should also contain the gene name. The “lexical distance” between the gene mention and the pattern/phrase is considered while assigning the score for this feature. The distance should be small for the FAMILY and ATTRIBUTE aspects, and may be longer for the FUNCTION aspect. The scores for the individual bio-features in a sentence are added and the resulting scores normalized.

2.2 Re-Ranking based on Novelty

A gene summary should contain information that is as diverse as possible, thereby reducing redundancy, while maintaining maximal relevance to the gene. As the number of abstracts in the About Set for a gene is very large, sentences extracted based only on feature scores may contain a high amount of redundant information. Hence redundant sentences should not be selected when producing the final summary. The main intuition behind this method is based on Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998): a sentence which is “similar” to a sentence already selected should be penalized. A weighted combination of the “feature score” and the “novelty score” is used so that the selected sentences are both maximally diverse and maximally relevant to the gene. Algorithm 1 provides the pseudo-code for the re-ranking system.

Our re-ranking system takes as input the set of ranked sentences returned by the feature-based method discussed in Section 2.1. For every sentence a set of important terms is computed. These include GO terms and UniProtKB keywords. The Gene Ontology (GO) project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members. The ontology covers three domains: cellular component, molecular function and biological process. The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. UniProtKB gene entries are tagged with keywords relating to the gene.

Instead of computing and minimizing the similarity between two sentences as in MMR, we compute a “novel score” for each sentence. When a sentence is selected, its GO terms and UniProt keywords are added to the set selectedTerms. The novel score for a sentence is assigned based on the number of new GO terms and UniProt keywords that the sentence contains. The final score is a weighted combination of the feature score and the novel score, controlled by a user-tunable parameter λ. The sentence with the highest final score is added to the set of re-ranked sentences and deleted from the original ranking. Finally, its GO terms and UniProt keywords are added to the set selectedTerms. A λ value closer to 1 yields a relevance-based ranking, while a λ value closer to 0 yields a novelty-based ranking. When the initial ranked set of sentences is empty, the algorithm stops and yields a new ranking of the sentences.

3 Results

In this section we present the results of our evaluation. We used six genes for evaluation purposes. The EntrezGene summaries for these genes were used as the gold set. We measured the number of phrases in the extracted sentences which matched phrases in the gold summary. While matching phrases we also considered the relation between the phrase and the gene. A phrase in an extracted summary sentence was said to match if it matched a phrase in the gold set and had the same relation with the gene as in the gold set.

Table 1: Feature-Based Ranking: Summary Phrase Matches

Gene     System A   System B   Improvement
SMAD2    3          5          66.7%
VPS35    2          3          50.0%
BRI1     3          2          -33.3%
BAG3     0          3          NA
LTBP2    2          3          50.0%
KAT2A    2          3          50.0%

Algorithm 1: Novelty Based Re-Rank

Input: set of ranked sentences D; tuning parameter λ
Output: set of re-ranked sentences R

selectedTerms ← empty;
R ← empty;
while D is not empty do
    foreach sentence s in the set D do
        fScore_s ← feature score for s;
        extract GO terms for s;
        extract UniProt keywords for s;
        add extracted terms to currTerms_s;
        newTerms_s ← diff(currTerms_s, selectedTerms);
        nScore_s ← novelScore(newTerms_s);
        score_s ← λ * fScore_s + (1 − λ) * nScore_s;
    end
    determine sentence s' for which score_s' is maximal;
    delete s' from D;
    add s' to R;
    add newTerms_s' to selectedTerms;
end
return R;
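The re-ranking loop of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration rather than the system's actual code: the name `novelty_rerank` and the `(id, feature_score, terms)` tuple layout are hypothetical, the GO terms and UniProt keywords of each sentence are assumed to have been extracted beforehand, and the novel score is assumed to be the number of new terms divided by the largest such count among the remaining candidates, since the exact form of novelScore is not given.

```python
from typing import List, Set, Tuple

# (sentence id, feature score in [0, 1], GO terms and UniProt keywords found)
Sentence = Tuple[str, float, Set[str]]


def novelty_rerank(ranked: List[Sentence], lam: float = 0.5) -> List[str]:
    """Greedy re-ranking: score = lam * feature + (1 - lam) * novelty."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    remaining = list(ranked)
    selected_terms: Set[str] = set()
    reranked: List[str] = []
    while remaining:
        # Terms each candidate would add beyond what is already covered.
        new_terms = [terms - selected_terms for _, _, terms in remaining]
        max_new = max(len(terms) for terms in new_terms)
        best_idx, best_score = 0, float("-inf")
        for idx, ((_, f_score, _), new) in enumerate(zip(remaining, new_terms)):
            # Assumed novelty: share of new terms relative to the best candidate.
            n_score = len(new) / max_new if max_new else 0.0
            score = lam * f_score + (1.0 - lam) * n_score
            if score > best_score:  # ties keep the earlier (higher-ranked) one
                best_idx, best_score = idx, score
        sent_id, _, _ = remaining.pop(best_idx)
        reranked.append(sent_id)
        selected_terms |= new_terms[best_idx]
    return reranked


# Toy input with made-up term identifiers.
toy = [
    ("s1", 0.9, {"GO:a", "GO:b"}),
    ("s2", 0.8, {"GO:a", "GO:b"}),
    ("s3", 0.5, {"GO:c"}),
]
order_relevance = novelty_rerank(toy, lam=1.0)
order_novelty = novelty_rerank(toy, lam=0.0)
```

With λ = 1 the loop reproduces the feature-based order, while with λ = 0 a sentence that repeats already covered terms (here `s2`) is pushed below one that contributes a new term. The loop rescans all remaining sentences after each selection, so it is quadratic in the number of input sentences, which is one reason to re-rank only a top-ranked subset.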
As an example of phrase matching, consider the gene KAT2A, for which one gold summary sentence is: “KAT2A, or GCN5, is a histone acetyltransferase (HAT) that functions primarily as a transcriptional activator.” The re-ranking system with λ = 0 extracted the following sentence: “histone acetyltransferases ( hats ) such as gcn5 play a role in transcriptional activation .” The phrase “transcriptional activation” is marked as matched because it has the same relation with the gene, i.e. the same function.

Figure 1 shows the matching phrases for the gene SMAD2 in the summary extracted by the feature-based system. System A refers to the output generated using only generic features, while System B refers to the output generated by adding the bio-domain specific features. The matched phrases are shown as bold text. Figure 2 shows matching phrases in the summary extracted by the re-ranking system with λ = 0, 0.3 and 0.7. A λ value closer to 0 gives more importance to “information novelty”.

Table 1 shows the comparison between System A and System B with respect to the number of phrase matches each system achieved. The last column indicates the improvement of System B over System A, i.e. the improvement after adding bio-domain specific features. The results indicate that adding domain-specific features increases the number of phrase matches for five of the six genes (all except BRI1), thus improving the summary content.

Table 2 shows the number of matched phrases for the re-ranking system over different values of λ. The first column, with λ = 1, is the same as System B in Table 1. In the evaluation of the re-ranking system we used only the set of ranked sentences returned by System B. The results indicate that a λ value closer to 0 yields the most phrase matches for three of the six genes (SMAD2, BRI1 and LTBP2). For example, for the gene BRI1 the set of summary sentences “BRI1 ligand is brassinolide which binds at the extracellular domain. Binding results in phosphorylation of the kinase domain which activates the BRI1 protein leading to BR responses” is accurately captured by the sentence extracted by the re-ranking system (with λ = 0): “brassinosteroids ( brs ) bind to the extracellular domain of the receptor kinase bri1 to activate a signal transduction cascade that regulates nuclear gene expression and plant development.” A similar example occurs for the gene SMAD2 with the extracted sentence “activated tbetari phosphorylates smad2 , which then heterodimerizes with smad4 , translocates into the nucleus , and subsequently effects gene transcription .”, which closely captures a set of summary sentences (see Figure 1).

Table 2: Novelty-Based Re-ranking: Summary Phrase Matches

Gene     λ = 1   λ = 0.9   λ = 0.7   λ = 0.3   λ = 0   Max Improvement over System B
SMAD2    5       5         5         5         7       20.0%
VPS35    3       3         4         3         2       33.3%
BRI1     2       2         2         3         4       50.0%
BAG3     3       3         3         3         2       0.0%
LTBP2    3       2         3         4         4       33.3%
KAT2A    3       2         2         2         2       0.0%

4 Conclusion

We combine generic features for computing sentence importance with biomedical domain-specific features such as the presence of the gene name and biological cue phrases. We also use GO terms and UniProt keywords as a “novelty measure” to re-rank sentences and remove “information redundancy”. Our evaluation on six genes suggests that a system augmented with biomedical features and “redundancy removal” extracts more informative summaries.

One of the problems of these extractive approaches is the presence of noise in addition to relevant information in the extracted sentences. For example, consider an extracted summary sentence for SMAD2: “second , the role of smad 2 , an intracellular mediator of activin and tgf-beta , in oocyte maturation was investigated”. Only the highlighted fragment is relevant and there is no need to include the entire sentence. In the future, we hope that the biological relation patterns discussed in Section 2.1 will help us determine only the “relevant” portions of a sentence. These patterns will help us create an intermediate representation of the set of sentences, like “smad2 [isA] intracellular mediator OF(activin)”. Instead of just extracting representative sentences from the About Set, these relations will help us generate phrases and move toward abstractive summarization. We could combine different relations in a single sentence depending on certain causal links, such as an INTERACTION aspect followed by a FUNCTION aspect.

References

[Carbonell and Goldstein1998] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 335–336. ACM.

[Edmundson1969] H. P. Edmundson. 1969. New methods in automatic extracting. J. ACM, 16(2):264–285, April.

[Jones1972] Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21.

[Kupiec et al.1995] Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’95, pages 68–73, New York, NY, USA. ACM.

[Maglott et al.2005] Donna Maglott, Jim Ostell, Kim D. Pruitt, and Tatiana Tatusova. 2005. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research, 33(suppl 1):D54–D58.

[McEntyre and Lipman2001] Johanna McEntyre and David Lipman. 2001. PubMed: bridging the information gap. Canadian Medical Association Journal, 164(9):1317–1319.

[Radev et al.2002] Dragomir R. Radev, Eduard Hovy, and Kathleen McKeown. 2002. Introduction to the special issue on summarization. Comput. Linguist., 28(4):399–408, December.

[Salton and Buckley1988] Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523.

[Tudor et al.2010] Catalina O. Tudor, Carl J. Schmidt, and K. Vijay-Shanker. 2010. eGIFT: Mining gene information from the literature. BMC Bioinformatics, 11(1):418.
Gene : SMAD2 Entrez Summary: The protein encoded by this gene belongs to the SMAD, a family of proteins similar to the gene products of the Drosophila gene 'mothers against decapentaplegic' (Mad) and the C. elegans gene Sma. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways. This protein mediates the signal of the transforming growth factor (TGF)-beta, and thus regulates multiple cellular processes, such as cell proliferation, apoptosis, and differentiation. This protein is recruited to the TGF-beta receptors through its interaction with the SMAD anchor for receptor activation (SARA) protein. In response to TGF-beta signal, this protein is phosphorylated by the TGF-beta receptors. The phosphorylation induces the dissociation of this protein with SARA and the association with the family member SMAD4. The association with SMAD4 is important for the translocation of this protein into the nucleus, where it binds to target promoters and forms a transcription repressor complex with other cofactors. This protein can also be phosphorylated by activin type 1 receptor kinase, and mediates the signal from the activin. System A (without Bio-Features) System B(With Bio-Features) smad2 overexpression suppressed osteocalcin mrna expression in phosphorylation-dependent activation of the transcription factors ros17/2.8 cells . smad2 and smad3 plays an important role in tgfbeta-dependent signal transduction . tgfbeta signaling is initiated when the type i receptor phosphorylates the mad-related protein , smad2 , on c-terminal serine residues . we report that smad2 , a transcription factor activated by tgfbeta , mediates tgf-beta induction of enos in endothelial cells . mad-related genes on chromosome 18q21.1 are altered infrequently in escc . identification of smad2 , a human mad-related protein in the transforming growth factor beta signaling pathway . 
activation of transforming growth factor-beta ( tgf-beta ) receptors triggers phosphorylation of smad2 and smad3 . conclusions : the results suggest that mutation of smad2 does not play a key role in human stomach carcinogenesis . cells that lack smad2 may escape from tgf-beta-mediated growth inhibition and promote cancer progression . second , the role of smad 2 , an intracellular mediator of activin and tgf-beta , in oocyte maturation was investigated . phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent signal transduction . thus , heteromeric complex formation of smad2 with smad4 is required for nuclear translocation of smad4 . furthermore , we observed a strong correlation between sustained smad2 phosphorylation and resistance to tgf-beta1-mediated growth inhibition . evidence that smad2 is a tumor suppressor implicated in the control of cellular invasion . Figure 1: Feature-Based Ranked Summaries for SMAD2 for System A and B λ = 0.0 1. phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent signal transduction . 2. second , the role of smad 2 , an intracellular mediator of activin and tgf-beta , in oocyte maturation was investigated . 3. smad2 and smad3 are signalling proteins that are involved in mediating the transcriptional regulation of target genes downstream of transforming growth factor-beta and activin receptors . 4. activated tbetari phosphorylates smad2 , which then heterodimerizes with smad4 , translocates into the nucleus , and subsequently effects gene transcription . 5. identification of smad2 , a human mad-related protein in the transforming growth factor beta signaling pathway . 6. xmad2 , a recently identified tgf-beta signal transducer , forms a complex with the transcription factor in an activin-dependent fashion to generate an activated are-binding complex . 7. 
ligation of the t cell receptor complex results in phosphorylation of smad2 in t lymphocytes . λ = 0.3 1. phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent signal transduction . 2. second , the role of smad 2 , an intracellular mediator of activin and tgf-beta , in oocyte maturation was investigated . 3. smad2 and smad3 are signalling proteins that are involved in mediating the transcriptional regulation of target genes downstream of transforming growth factor-beta and activin receptors . 4. identification of smad2 , a human mad-related protein in the transforming growth factor beta signaling pathway . 5. thus , heteromeric complex formation of smad2 with smad4 is required for nuclear translocation of smad4 . 6. ubiquitination of smad2 is a consequence of its accumulation in the nucleus . 7. xmad2 , a recently identified tgf-beta signal transducer , forms a complex with the transcription factor in an activin-dependent fashion to generate an activated are-binding complex . λ = 0.7 1. phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent signal transduction . 2. second , the role of smad 2 , an intracellular mediator of activin and tgf-beta , in oocyte maturation was investigated . 3. identification of smad2 , a human mad-related protein in the transforming growth factor beta signaling pathway . 4. thus , heteromeric complex formation of smad2 with smad4 is required for nuclear translocation of smad4 . 5. we report that smad2 , a transcription factor activated by tgf-beta , mediates tgf-beta induction of enos in endothelial cells . 6. conclusions : the results suggest that mutation of smad2 does not play a key role in human stomach carcinogenesis . 7. evidence that smad2 is a tumor suppressor implicated in the control of cellular invasion . Figure 2: Re-Ranked Summaries for SMAD2 with λ = 0, 0.3, 0.7