A Comparative Study: Automated Vs. Human Analysis of Two English Plays

An MA Thesis
Submitted to the Institute of Language Studies and Translation
Faculty of Arts
Alexandria University

By: Mervat Mahmoud Ali Ahmed

Under the Supervision of

Prof. Zeinab M. Raafat
Professor of English Literature
Faculty of Arts
Alexandria University

Associate Prof. Sameh A. Ansary
Associate Prof. of Computational Linguistics
Department of Phonetics and Linguistics
Faculty of Arts
Alexandria University

Acknowledgements

Before all, I thank Allah for guiding me throughout my life, showing me the right path and giving me the strength to pursue my studies in a field so close to my heart.

I wish to express my deepest gratitude and sincere appreciation to Prof. Zeinab M. Raafat. She has been and will always be my mentor. It has been an honour to work with her and learn from her academic expertise and personal dedication.

I would also like to express my utmost thanks and appreciation to Prof. Sameh A. Ansary for his precious help, support and patience in supervision. His continuous guidance has been invaluable. I can never cease to learn from his expertise in the field of computational linguistics.

I am deeply indebted to Prof. Hassan A. Taman, who opened my eyes and mind to the boundless realm of Applied Linguistics. His sincere love for and dedication to his work were the main reason for pushing this research forward. I only wish he were still with us so that I could make him proud of this work. May God bless his soul.

I would like to thank my entire amazing family for their continuous love and support. I would like to thank Mohamed, my husband and soul mate, who has always been supportive and encouraging, putting up with my long working hours and mood swings. I could not have done it without him. I would also like to thank my mother, my brother Mohamed, my lovely sisters, Mai and Maha, and my aunt for being there for me at all times.
My thanks also go to all my friends and colleagues at work for their constant encouragement.

To my mother, who has been and will always remain my anchor, my sail and my guiding star. Her belief in me and her constant prayers were the main reasons for pushing me, as well as this work, forward “to see the light”.

Table of Contents

List of Abbreviations
List of Tables
List of Figures
Abstract
Introduction
Chapter 1 Theoretical Background: Computers and Text Analysis
  1.1 Introduction
  1.2 Argument for and against Computer-Aided Text Analysis
  1.3 Natural Language Processing and Computational Linguistics
    1.3.1 Computational Syntax and Semantics
    1.3.2 Computational Discourse
  1.4 Concordances
  1.5 Application Areas for Computational Linguistics
    1.5.1 Lexicography
  1.6 Overview of Computer-Aided Discourse Analysis
  1.7 Overview of Computer-Assisted Stylistic Analysis
  1.8 Comparing Human Analysis to Computer-Aided Analysis
  1.9 Corpus Linguistics
  1.10 CATA Software Selection
  1.11 T-LAB Selection
    1.11.1 Introduction to T-LAB
    1.11.2 T-LAB Pre-processing Steps
      1.11.2.1 Corpus Normalization and Disambiguation Operation
      1.11.2.2 Linguistic Dictionaries and Lemmatization
      1.11.2.3 Corpus Segmentation
      1.11.2.4 Multi-Word and Stop-Word detection
      1.11.2.5 Vocabulary building and Key-Terms selection
Chapter 2 Data Analysis I: “She Stoops to Conquer”
  2.1 Introduction
  2.2 Part One: Human Analysis
    2.2.1 Register
      2.2.1.1 Marlow with Kate
      2.2.1.2 Kate with Marlow
      2.2.1.3 Marlow with Mr. Hardcastle
      2.2.1.4 Mr. Hardcastle with Marlow
    2.2.2 Signs of Formality and Informality
    2.2.3 Dialect
    2.2.4 Repetition
    2.2.5 Slang
    2.2.6 Naming
    2.2.7 Figurative Language
    2.2.8 Archaic Language
  2.3 Part Two: Computer-Aided Analysis
    2.3.1 Introduction
    2.3.2 “She Stoops to Conquer” CATA Results that agree with Human Analysis
      2.3.2.1 Register
      2.3.2.2 Repetition
      2.3.2.3 Dialect and Slang
    2.3.3 “She Stoops to Conquer” CATA Additional Semantic Contribution
      2.3.3.1 Importance of Word Frequencies
        2.3.3.1.1 Fortune and Marriage theme
      2.3.3.2 Importance of Sequence Analysis Tool
        2.3.3.2.1 The structure “But_a”
      2.3.3.3 Importance of thematic and cluster analysis
        2.3.3.3.1 Parents-Children Relationship theme
Chapter 3 Data Analysis II: “The Caretaker”
  3.1 Introduction
  3.2 Part One: Human Analysis
    3.2.1 Repetition
    3.2.2 Rhythm
    3.2.3 Register
    3.2.4 Status Marked through Language
    3.2.5 Pause and Silence
    3.2.6 Turn-taking Technique
    3.2.7 Long Dialogues and Small Talk
    3.2.8 Question Forms
    3.2.9 Naming and Pronouns
    3.2.10 Stage Directions and Body Language
  3.3 Part Two: Computer-Aided Analysis
    3.3.1 Introduction
    3.3.2 “The Caretaker” CATA Results that agree with Human Analysis
      3.3.2.1 Repetition
      3.3.2.2 Register
      3.3.2.3 Turn-taking and Interrogatives
      3.3.2.4 Pause
      3.3.2.5 Silence
      3.3.2.6 Signs of Formality and Informality
    3.3.3 “The Caretaker” CATA Additional Semantic Contribution
      3.3.3.1 Importance of Comparison between Word Pairs
      3.3.3.2 Importance of List of Word Frequencies
      3.3.3.3 Importance of thematic and cluster analysis
Conclusion
Appendix [1]
Appendix [2]
References

List of Abbreviations

AI      Artificial Intelligence
CATA    Computer-Assisted Text Analysis
CALL    Computer Assisted Language Learning
CDA     Critical Discourse Analysis
CL      Computational Linguistics
CU      Context Units
HMM     Hidden Markov Models
HTML    Hyper Text Markup Language
IR      Information Retrieval
KWIC    Key Word in Context
LU      Lexical Units
MT      Machine Translation
NLP     Natural Language Processing
OED     Oxford English Dictionary
SL      Source Language
SLT     Spoken-Language Translation
TRP     Transition Relevance Place
UNLP    Universal Networking Language Project

List of Tables

Table (1): WordNet Semantic Relations
Table (2): Country dialect in “She Stoops to Conquer” – Part 1
Table (3): Country dialect in “She Stoops to Conquer” – Part 2
Table (4): T-LAB word association tool: “Marlow” and “Hardcastle” in “She Stoops to Conquer”
Table (5): T-LAB Concordance of the lemma “Fellow” in “She Stoops to Conquer”
Table (6): T-LAB word association tool: “Marlow” and “Madam” in “She Stoops to Conquer”
Table (7): T-LAB word association tool: “Marlow” and “Child” in “She Stoops to Conquer”
Table (8): T-LAB word association tool: “Tony” and “Ecod” in “She Stoops to Conquer”
Table (9): T-LAB Concordance of the lemma “Servant” in “She Stoops to Conquer”
Table (10): T-LAB Concordance of the lemma “Diggory” in “She Stoops to Conquer”
Table (11): T-LAB Modeling of emerging themes tool: “Fortune” theme Part 1
Table (12): T-LAB Modeling of emerging themes tool: “Fortune” theme Part 2
Table (13): T-LAB Modeling of emerging themes tool: “Fortune” theme Part 3
Table (14): Syntactic analysis of “but a” structure in “She Stoops to Conquer”
Table (15): T-LAB Concordance of the structure “But a” in “She Stoops to Conquer”
Table (16): T-LAB Modeling of emerging themes tool: “Age” theme in “She Stoops to Conquer”
Table (17): T-LAB thematic cluster analysis in “She Stoops to Conquer”
Table (18): Davies dialect in “The Caretaker”
Table (19): Mick’s slang in “The Caretaker”
Table (20): T-LAB Concordance of the lemma “Black” in “The Caretaker”
Table (21): T-LAB Modeling of emerging themes tool: “Jenkins” theme in “The Caretaker”
Table (22): T-LAB Modeling of emerging themes tool: “Sidcup” theme in “The Caretaker”
Table (23): T-LAB Concordance of the lemma “Davies” in “The Caretaker”
Table (24): T-LAB Modeling of emerging themes tool: “Mick” theme in “The Caretaker”
Table (25): T-LAB word association tool: “Davies” and “Mick” in “The Caretaker”
Table (26): T-LAB word association tool: “Mate” and “Davies” in “The Caretaker”
Table (27): T-LAB Concordance of the lemma “Boy” in “The Caretaker”
Table (28): T-LAB Modeling of emerging themes tool: “Call” theme in “The Caretaker”
Table (29): T-LAB word association tool: “Brother” and “Mick” in “The Caretaker”
Table (30): T-LAB word association tool: “Davies”, “Brother” and “Mick”
Table (31): T-LAB Modeling of emerging themes tool: “Bed” theme in “The Caretaker”
Table (32): T-LAB Modeling of emerging themes tool: “Good” theme in “The Caretaker”
Table (33): T-LAB thematic cluster analysis in “The Caretaker”

List of Figures

Figure (1): T-LAB automatic lemmatization
Figure (2): T-LAB word association tool: “Marlow” in “She Stoops to Conquer”
Figure (3): T-LAB sequence analysis tool with “Marlow” in “She Stoops to Conquer”
Figure (4): T-LAB sequence analysis tool with “Ecod” in “She Stoops to Conquer”
Figure (5): T-LAB tool for comparison between word pairs: “Tony” and “Fellow” in “She Stoops to Conquer”
Figure (6): T-LAB pie chart: Percentage of “Ecod” with “Tony” and “Fellows” in “She Stoops to Conquer”
Figure (7): T-LAB key contexts for thematic words tool in “She Stoops to Conquer”
Figure (8): T-LAB word association tool: “Fortune” in “She Stoops to Conquer”
Figure (9): T-LAB bar chart: Percentage of “happiness” with “love” and “fortune” in “She Stoops to Conquer”
Figure (10): T-LAB sequence analysis tool with “But_a” in “She Stoops to Conquer”
Figure (11): T-LAB word association tool: “Black” in “The Caretaker”
Figure (12): T-LAB sequence analysis tool with “Name” in “The Caretaker”
Figure (13): T-LAB sequence analysis tool with “Pause” in “The Caretaker”
Figure (14): T-LAB sequence analysis tool with “Silence” in “The Caretaker”
Figure (15): T-LAB word association tool: “Mate” in “The Caretaker”
Figure (16): T-LAB tool for comparison between word pairs: “Brother” and “Mick” in “The Caretaker”
Figure (17): T-LAB bar chart: Percentage of “Worry” with “Brother” and “Mick” in “The Caretaker”
Figure (18): T-LAB sequence analysis tool with “Bed” in “The Caretaker”

Abstract

For centuries, text analysis has depended on human analysts. We have relied entirely on human intuition in analyzing written and spoken discourse. However, this started to change some 50 years ago.
With the advent of technology and the spread of computers into many fields, linguists started to ask a new question: why don’t we use computers in text analysis? Since the 1960s, this new trend of using computers in text analysis has been taking hold. However, as is the case with all new ideas and methods, many linguists started attacking it. Many had doubts: Can computer software really assist in text analysis? If so, to what extent? Even if it helps with ordinary texts, can it help in a stylistic analysis of literary texts? In this thesis, I try to answer a vital question: can computer software assist in the stylistic analysis of literary texts today? To answer this question, a text analysis software package called T-LAB is used to assist in a stylistic analysis of two literary texts, and its output is then compared to a purely human stylistic analysis of the same texts carried out beforehand. The two literary texts used for analysis are English plays: the 18th-century “She Stoops to Conquer” by Oliver Goldsmith and the 20th-century “The Caretaker” by Harold Pinter.