Leximancer Tutorial 6 - Analysing Interview Data

Interview transcripts can be analysed as normal text, and if you group interviews into files and folders, you can use Folder Tagging (see Tutorial 4) to enhance your analysis. However, if your transcripts are suitably formatted, Leximancer can let you select, ignore, or compare all the utterances of each distinct speaker. The appropriate format applies to plain text or Word documents.

To allow Leximancer to identify the speaker of any text segment, the speaker must be identified in a certain way whenever a new speaker begins. The format requires dialog markers which: are at the start of a paragraph; use upper case first letters for each constituent term; are made up of a maximum of three terms; and end in a colon followed by a space. For example:

James: The vast rift in the crust was a confusing vista of velvet shadow, shattered granite, and radiant furnaces. Lightning flickered along the horizon, outlining the jagged towers of Ryleh in stark contrast. The giant winged beasts wheeled ever upwards on thermals driven by lurid magma pits.

Dr. Anna Jones: How did you find the entrance to the Gate of Ages?

Narrator: At this point, James fell to his knees, his face transfigured by a look of heart-sick desolation.

....

Given text data in this format, Leximancer can extract the dialog markers as tag classes and identify the speaker of every subsequent sentence until the next dialog marker. To enable this function, first go to the main Leximancer menu and enable:

Menu Complexity -> Advanced

When you have chosen your text data (by double-clicking on the File Selection node), open the settings for the Preprocess Text node (by double-clicking on that node). In the bottom left part of that screen, enable the Label Identification function:

Label Identification -> Dialogue

This will transform each dialog marker in the text into a tag, which is then inserted into each relevant sentence.
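Before importing a transcript, it can help to check that the dialog markers follow the format described above. The Python sketch below approximates the four marker rules with a regular expression and carries each speaker forward until the next marker. It is only an illustration under stated assumptions: the pattern is a reading of the rules in this tutorial, not Leximancer's own parser; paragraphs are assumed to be separated by blank lines; and the names MARKER and split_speakers are hypothetical.

```python
import re

# Assumed approximation of the dialog marker rules: one to three
# capitalised terms at the start of a paragraph, then a colon and a space.
# This is NOT Leximancer's own pattern, only an illustration.
MARKER = re.compile(r"^([A-Z][\w.'-]*(?: [A-Z][\w.'-]*){0,2}): ")


def split_speakers(text):
    """Return (speaker, utterance) pairs; paragraphs are split on blank lines."""
    turns = []
    speaker = None  # no speaker is known until the first marker appears
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = paragraph.strip()
        match = MARKER.match(paragraph)
        if match:
            # A new marker changes the current speaker and is stripped
            # from the utterance text.
            speaker = match.group(1)
            paragraph = paragraph[match.end():]
        turns.append((speaker, paragraph))
    return turns


sample = (
    "James: The vast rift in the crust was a confusing vista.\n\n"
    "Lightning flickered along the horizon.\n\n"
    "Dr. Anna Jones: How did you find the entrance to the Gate of Ages?\n\n"
    "the witness said: nothing at all."
)
for speaker, utterance in split_speakers(sample):
    print(speaker, "->", utterance)
```

In this sample, the second paragraph has no marker and so stays with James, and the last paragraph is not treated as a new speaker because its would-be marker starts in lower case. Note that any capitalised paragraph opening such as "Note: " would also match these rules, so it is worth scanning the detected speakers for surprises.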
Just above the Label Identification setting is another setting called Language Testing. If your data is quite colloquial and may not follow standard stop-word usage, set Language Testing to OFF. This filter is most useful for prose interspersed with non-textual material, such as web pages.

You should also consider whether the average length of each utterance in the text data is shorter than two sentences. If so, open the settings for the Automatic Concept Identification node and reduce the setting called Sentences per Context Block to 2 sentences, or even 1 sentence, as appropriate.

Now run the two nodes called Preprocess Text and Automatic Concept Identification (by right-clicking on the Automatic Concept Identification node and selecting Start). After those nodes have finished processing, you can open the Concept Editor node (by double-clicking on it) and see the extracted textual concept seeds and the extracted dialog tag classes. You can see that a tag class has been extracted for each dialog marker. You can now run the Thesaurus Learning node, by right-clicking on it and selecting Start.

After the Thesaurus Learning node has finished processing, you can select whose utterances you wish to analyse, whose you wish to leave out, and what items you want on the map. These settings are made in the second-last node, which is called Locate Concept Occurrences. The tabs called Required Classes and Kill Classes let you select the utterances you wish to analyse and those you wish to leave out. For example, if you wanted to leave out all the utterances of the Narrator from your analysis, you would move the tag class called TG_NARRATOR_TG into the Kill Classes list on the left (see below). This causes all utterances by the Narrator to be tagged only with this narrator tag. You should also leave the tag class TG_NARRATOR_TG out of the Entities list if you don't want it to appear on the map.
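The tag class names used in this tutorial (TG_NARRATOR_TG here, and TG_DR_ANNA_JONES_TG and TG_JAMES_TG in the Entities step) suggest a simple naming pattern. The Python sketch below reproduces that pattern so you can predict which tag class to look for in the Kill Classes or Entities lists. The rule is inferred from those three examples only, the function name marker_to_tag_class is hypothetical, and Leximancer may treat other punctuation (apostrophes or hyphens, for instance) differently.

```python
import re


def marker_to_tag_class(speaker):
    """Guess the tag class name for a dialog marker (assumed pattern).

    Inferred from the tutorial's examples only: upper-case the name,
    drop punctuation, join the terms with underscores, wrap in TG_ ... _TG.
    """
    terms = re.findall(r"[A-Za-z0-9]+", speaker)
    return "TG_" + "_".join(term.upper() for term in terms) + "_TG"


for name in ["Narrator", "James", "Dr. Anna Jones"]:
    print(name, "->", marker_to_tag_class(name))
```

If a name produced this way does not appear in the Concept Editor, trust the list shown there rather than this guess.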
Next, you should select which tag classes and concepts you want to appear on the map. You may want to inspect the relative ownership of the textual concepts between Dr. Anna Jones and James. To achieve this, select the Entities tab in the settings for the Locate Concept Occurrences node. By default, there are two items in this list (on the left): *CONCEPTS and *NAMES. These are wildcards which match all the word-based concepts and all the name-based concepts respectively. In the picture above, you can see (in the two lists on the right-hand side) all the word-like and name-like concepts which are available in the current thesaurus for this project.

Since you don't want the TG_NARRATOR_TG item to appear on the map, you need to remove the *NAMES wildcard from the Entities list. Instead, just insert the two desired tags into the Entities list: TG_DR_ANNA_JONES_TG and TG_JAMES_TG. You can also see several name-based concepts in the list, such as James and Gate_of_Ages. These occur because speakers mention these names. You need to decide whether you are interested in these variables for this analysis. I will leave them in for now. A look at the list of word-based concepts suggests that all of them are wanted on the map, so you can leave the *CONCEPTS wildcard to do its job.

The final Entities list then contains the *CONCEPTS wildcard, the two speaker tags, and the name-based concepts you decided to keep. Once these changes have been made, you can run the final two nodes to produce your map.