Utilising Leximancer to characterise selected journals based on abstracts from 2007
Marion Burford, University of New South Wales
Abstract
The development of text-mining software such as Leximancer provides the opportunity to
conduct contextualised interpretive content analysis. This paper describes an exercise in
content analysis, providing insight into the processes and outcomes of Leximancer 'automatic
discovery of latent concepts' analysis of sets of marketing journal abstracts. The process
involved characterising the abstracts of each journal through detection of the main emergent
concepts. This was followed by identification of the journal's concepts contextualised within
the group of abstracts as a whole. In this relatively simple exercise, Leximancer proved a
powerful and time efficient tool for iterative content analysis of textual data.
Introduction
There is an emerging use of text-mining software within content analysis methodology
(Leong, Ewing and Pitt, 2002; Lloyd and Mortimer, 2005; Pattinson, 2005). This paper seeks
to increase understanding of how qualitative interpretive approaches to analysis of text can
prove useful, particularly in interrogating large text-based data sets. The central question
guiding this paper is: How is a text-mining program, in this instance Leximancer, used
iteratively to characterise selected marketing journals through content analysis of sets of
abstracts from the year 2007?
A range of software analysis and mapping programs, including TextAnalyst, NVivo, TACT, NetMap, Copernic Summarizer and Grokker (Pattinson, 2005), is available for text-mining and content analysis. Leximancer in particular, developed at the University of Queensland, provides the facility to develop an understanding of data without prior 'coding' and allows for interactive visual mapping of resultant concepts and their inter-relationships. The resulting content analysis is both conceptual and relational in nature (http://www.leximancer.com/, accessed 10th June 2008).
In papers researching journals in the marketing discipline, methods have ranged from traditional or conventional quantitatively-based content analysis to interpretive qualitatively-based approaches (Ahuvia, 2001; Krippendorff, 2004). The focus of these papers has included issues such as influence measured through citation analysis (Baumgartner and Pieters, 2003), the usefulness of marketing theory (Randall, Miles and Randall, 2004), subject-specific investigations (Lloyd and Mortimer, 2005), methodological orientations (Hanson and Grimmer, 2007; Saunders and Lee, 2005; Svensson and Wood, 2007), company websites (Jalkala and Salminen, 2005) and publication advice for academics (Stewart, 2002; Bauerly, Johnson and Singh, 2006; Perry, Carson and Gilmore, 2003; Sawyer, Laran and Xu, 2008).
For the purpose of this study three journals have been selected: the Journal of Marketing
(JM), the European Journal of Marketing (EJM) and the Journal of Marketing Education
(JMEd). These journals were sampled to provide a range of discipline-related contexts. JM is
considered the most ‘influential’ of the discipline’s journals and is ranked first in the sub-area
of ‘marketing applications’ (Baumgartner and Pieters, 2003). EJM is the leading journal in
Europe (Svensson and Wood, 2007), ranked 17th in ‘level of influence’ and 12th in the
‘marketing applications’ sub-area (Baumgartner and Pieters, 2003). Both JM and EJM have
been evaluated in papers such as Hanson and Grimmer (2007) and Svensson and Wood
(2007). JMEd was included as it is the leading journal in the ‘marketing education’ sub-area
and ranked 24th overall for ‘level of influence’ (Baumgartner and Pieters, 2003).
Content analysis themes that emerge for each journal reflect its character as shaped by the
discipline it serves, and how its positioning has evolved and developed. Positioning is based
on differences in their corpora, in editorial advice to authors, in the 'conversation' undertaken to gain publication (Perry, Carson and Gilmore, 2003), the content of individual papers and
the context of the research reported. An abstract is a structural element that distils and
encapsulates the main thrust of each academic paper within the journal. What is required in an
abstract differs for each of the journals in the sample. JMEd specifies only a word limit for the
abstract (Journal of Marketing Education: Manuscript submission guidelines, 2008). JM
requires the abstract to substantively summarise the article (Journal of Marketing: Manuscript
submission and preparation guidelines, 2008). The EJM requires a structured abstract
describing the paper using at least four out of six listed elements (European Journal of
Marketing: Author guidelines, 2008). Abstracts generally are concept-dense. Viewed
collectively they summarise the main themes, providing a window to the nature of their
respective journals.
Content Analysis
Content analysis is a contested concept. Traditional content analysis relies on the use of, or
the development of, a taxonomy or coding lexicon that captures content in a systematic and
reproducible fashion, as demonstrated by Saunders and Lee (2005). This traditional view has
been extended to include interpretation (Ahuvia, 2001) and to acknowledge the context of the
'codes' used in counting methods (Lloyd and Mortimer, 2005). The interpretive approach
(Graneheim and Lundman, 2004) has less reliance on 'codes' as it draws on 'latent' content within the text. It relies on the researcher's development of connotative meaning, extending or reinterpreting that meaning from different perspectives.
Examples of content analysis methodology identified by Krippendorff (2004) include
discourse analysis, rhetorical analysis, ethnographic and conversational analysis. Other terms
used to identify particular methods are descriptive (Zimitat, 2006), reception based (Ahuvia, 2001) and interactive-hermeneutic (Krippendorff, 2004). Scott and Smith (2005)
state that content analysis is a systematic, replicable technique compressing many words of
text into fewer content categories based on explicit rules of coding. More recently developed
text analysis approaches have been linked with content analysis methodology as a means of
understanding content categories found in large databases that might otherwise be intractable
to traditional methods (Leong, Ewing and Pitt, 2002; Lloyd and Mortimer, 2005).
Validity and reliability are issues common to all methodology (Scott and Smith, 2005). In
traditional content analysis with 'manifest' coding of surface content, these issues are
addressed through the use of multiple trained coders and extensive code books (Krippendorff,
2004). Such coding is a laborious task, distancing the researcher-interpreter from the process.
Given the cost, possibility of coder error, and the effort required, the advantages of machine
coding become clear. Software helps to address validity and reliability issues, through its
ability to develop machine-learned dictionaries. The concept thesaurus captures the meaning,
allowing for different descriptors for the same concept or 'meaning unit'. Automating the
'manifest' coding process supports 'latent' content analysis by allowing the emphasis to be
placed on interpretation. 'Latent' content analysis is the 'analysis of the underlying meaning or
symbolism of a communication' (Scott and Smith, 2005, p.88). Such interpretation and
contextualization of the content is considered to be descriptively richer and has greater (face)
validity than traditional content analysis methods (Krippendorff, 2004).
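To make the idea of a machine-learned dictionary concrete, the following is a minimal sketch in Python of automated 'manifest' coding: a small thesaurus maps several surface descriptors onto one concept, so that different words are counted as the same 'meaning unit'. The thesaurus entries and function are invented for illustration and are not drawn from Leximancer's learned dictionaries or API.

```python
# A minimal sketch of automated 'manifest' coding: a thesaurus maps
# several surface descriptors onto one concept, so different words are
# counted as the same 'meaning unit'. The entries below are invented
# for illustration, not taken from Leximancer's learned dictionaries.
import re
from collections import Counter

thesaurus = {
    "consumer": {"consumer", "consumers", "shopper", "buyer"},
    "trust": {"trust", "trusting", "trustworthiness"},
}

def code_text(text, thesaurus):
    """Count concept occurrences by matching any descriptor of a concept."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        for concept, descriptors in thesaurus.items():
            if word in descriptors:
                counts[concept] += 1
    return counts

print(code_text("Buyer trust shapes how consumers behave.", thesaurus))
# Counter({'consumer': 2, 'trust': 1})
```

In Leximancer itself the thesaurus is learned from the text rather than hand-specified, which is what frees the researcher to concentrate on latent interpretation.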
Leximancer
The problems, highlighted above, in traditional content analysis and the increasing need to
handle larger amounts of data efficiently have led to the increasing use of text-mining
technology such as Leximancer to support interpretive content analysis. Leximancer can be
set to automate several of the content analysis functions such as the development of a
dictionary and thesaurus, the display of the visual conceptual structure, and the tracing of the
concepts or 'meaning units' back to their original text location (Scott and Smith, 2005;
Zimitat, 2006; Steel, 2007). Further, Leximancer allows the resultant concepts to be edited, adapted through user-defined concepts, applied to other data sets and explored for inter-relationships (Scott and Smith, 2005; Steel, 2007). A comparison of manual and automatic coding is reported by Grech, Horberry and Smith (2002), who examined the extent to which human error was a factor in maritime accidents. They found a close correspondence between hit rates for machine-coded and manually coded results, supporting the reliability and time efficiency of the automated process.
Leximancer-generated results can be interpreted through each concept's thesaurus, the spatial
map (or semantic patterns, Watson, Smith and Watter, 2005), the concept lists (absolute count
and relative frequency) and the co-occurrences of the concepts. The relationships are based on
an algorithm that evaluates the 'weighted sum of the number of times two concepts are found in the same "chunk"' (Davies et al., 2004, p.35). This enables identification of relative
importance (frequency of occurrence), links between concepts and differing levels of
abstraction through the control over the number of concepts found during analysis or revealed
through manipulation of the map.
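As an illustration of the co-occurrence principle described above, the sketch below splits a text into fixed-size context blocks of two sentences (mirroring the setting used in the Method section) and counts how often pairs of concepts fall in the same block. It is a simplified, unweighted version for illustration, not Leximancer's actual algorithm.

```python
# A simplified, unweighted sketch of concept co-occurrence: split the
# text into fixed-size context blocks (two sentences, mirroring the
# setting used in the Method section) and count how often two concepts
# appear in the same block. Illustrative only, not Leximancer's code.
from collections import Counter
from itertools import combinations

def cooccurrence(sentences, concepts, block_size=2):
    """Count concept pairs appearing in the same context block."""
    pairs = Counter()
    for i in range(0, len(sentences), block_size):
        block = " ".join(sentences[i:i + block_size]).lower()
        present = sorted(c for c in concepts if c in block)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

sentences = [
    "Trust underpins the customer relationship.",
    "Firms that invest in trust retain customers.",
    "Performance depends on firm strategy.",
]
print(cooccurrence(sentences, ["trust", "customer", "firm"]))
# Counter({('customer', 'firm'): 1, ('customer', 'trust'): 1, ('firm', 'trust'): 1})
```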
Method
The approach used was to download abstracts from ProQuest and clean them to remove
extraneous information, creating data sets for each journal within a folder. Automatic concept
analysis of each journal individually was undertaken before they were analysed together.
Abstract specific structural concepts, such as author, were systematically removed before the
concept learning stage. Each separate analysis was assigned a project name, and selected output was copied to Word documents to build an overview.
The unit of analysis is the journal. Abstracts for each paper published in 2007 were
downloaded and saved to three separate text files [JM, EJM, JMEd] within a folder. Each
abstract was tagged to identify the paper e.g. 'JEUM_2007_41_1AND2_Hanson_Grimmer:'.
Titles were kept within the text as they contain relevant content. As ':' is a tag identifier in Leximancer, colons within the text were converted to ','. All other bibliographic information was removed. All of
the structured subheadings found in European Journal of Marketing papers (37 of 75, approx
50%) were removed as they duplicated abstract text. Individual data sets contain differing
numbers of abstracts, thus the 'combined' output is dominated by EJM and JM.
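A minimal sketch of this preparation step is given below, assuming each record's bibliographic details are already available as separate fields. The function name and field layout are hypothetical; only the tag pattern and the colon-to-comma conversion follow the description above.

```python
# Hypothetical helper for the cleaning step described above: builds a
# tagged abstract line and appends it to the journal's text file.
def prepare_abstract(journal, year, volume, issue, authors, title, abstract):
    """Return one tagged, cleaned abstract for a journal's data set."""
    tag = f"{journal}_{year}_{volume}_{issue}_{'_'.join(authors)}:"
    # ':' is a tag identifier in Leximancer, so colons inside the title
    # or abstract body are converted to ',' to avoid spurious tags
    body = f"{title}. {abstract}".replace(":", ",")
    return f"{tag} {body}"

# Illustrative values echoing the example tag given above
line = prepare_abstract(
    "JEUM", 2007, 41, "1AND2", ["Hanson", "Grimmer"],
    "The Mix of Qualitative and Quantitative Research in Major Marketing Journals",
    "Purpose: to audit method choices over a decade.")  # invented abstract text

with open("EJM.txt", "a", encoding="utf-8") as f:
    f.write(line + "\n")
```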
Leximancer 2.25 (http://www.leximancer.com/) was used on the default settings except for
changes made to accommodate data in abstract form. 'Menu complexity' was set on advanced,
'Make folder and file tags' was changed to on, 'sentences per context block' reduced to 2,
'learning threshold' or resolution was expanded to 21 and 'word classification setting' was
reduced to 2.3. Concepts removed after running the 'Automatic concept identification' were
article, authors, findings, marketing (an overarching concept), paper and study. The merged
plurals were consumer/s, customer/s, effect/s, firm/s and relationship/s.
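Because replication of each project depends on diligent records of settings and concept edits (see the Reflection below), one simple way to keep such records is sketched here. The keys are descriptive labels taken from the paragraph above, not Leximancer's internal parameter names, and the project name is invented.

```python
# A record-keeping sketch: the non-default Leximancer 2.25 settings and
# concept edits reported above, stored as plain data so that each
# project iteration can be replicated later.
import json

project_log = {
    "project": "marketing_abstracts_2007",  # hypothetical project name
    "leximancer_version": "2.25",
    "settings": {
        "menu complexity": "advanced",
        "make folder and file tags": True,
        "sentences per context block": 2,
        "learning threshold": 21,
        "word classification setting": 2.3,
    },
    "concepts_removed": ["article", "authors", "findings",
                         "marketing", "paper", "study"],
    "plurals_merged": ["consumer/s", "customer/s", "effect/s",
                       "firm/s", "relationship/s"],
}

with open("project_log.json", "w", encoding="utf-8") as f:
    json.dump(project_log, f, indent=2)
```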
Results of Process
The following is a summary of the main concepts only, giving a brief insight into the
possibilities of this approach to content analysis. Leximancer-generated concept maps, detailed listings of concept co-occurrences, concept thesauri and examples of 'meaning units' within the text were among the data collected in each iteration's overview. Space
constraints preclude their inclusion in this paper though they are an integral part of the
interpretive process. A concept or label represents sets of words that are consistently found in
proximity in the text. Each is understood through iterative interrogation of all parts of the
Leximancer output. Given the same data and program settings, Leximancer will reproduce the same output, supporting the designers' claim of reliability in coding text data.
The following tables list the top five concepts generated from 'automatic concept learning'
(after editing). Table 1 reports the profiles of each journal analysed separately.
Table 1: Automatic Concept Learning – individual journals analysed separately
Top five concepts with absolute count (AC) and relative count % (RC)

                    EJM                    JM                     JMEd
Abstracts           74                     50                     25
AC                  112                    51                     55
Concepts (RC)       research (100%)        effects (100%)         students (100%)
                    relationship (88%)     customer (96%)         business (55%)
                    trust (71%)            consumers (94%)        learning (40%)
                    consumers (67%)        firm (92%)             education (24%)
                    literature (52%)       relationship (75%)     teaching (22%)
Total concepts      27                     19                     19
Table 2 reports the concept profiles that emerge when the three sets of abstracts are considered together. This allows for cross-journal influence on concept occurrence, e.g. JMEd reveals its sub-specialisation ('students' and 'undergraduate') when analysed alongside JM and EJM.
Table 2: Automatic Concept Learning – combined analysis (JM, EJM, JMEd)
Top five concepts, each journal as a tag 'concept', with absolute count (AC) and relative count % (RC)

                    TG_EJM_TG              TG_JM_TG               TG_JMEd_TG
Abstracts           74                     50                     25
AC                  430                    219                    88
Concepts (RC)       research (25%)         consumers (20%)        students (38%)
                    trust (20%)            expertise (18%)        undergraduate (30%)
                    consumers (16%)        stock (17%)            literature (7%)
                    partners (15%)         performance (16%)      data (7%)
                    model (13%)            partners (15%)         analysis (6%)
Total concepts      26                     26                     22
The identified concepts in Table 1 differ from Table 2 as the context or focus has changed.
'Consumers' in EJM and JM reported in Table 1 are not exactly equivalent concepts as they
have a thesaurus or word list that reflects the content within each individual journal. However, 'consumers' in Table 2 is a single concept with a thesaurus developed from the content of all three journals.
The following summary is provided as an example of profiling of the main themes
characterising each journal:
• The European Journal of Marketing published 75 papers in 2007. It had a strong emphasis on research. Trust and relationships, the consumer, and discussion of related literature were common. EJM closely parallels JM in terms of received methodology.
• The Journal of Marketing published 50 papers in 2007. In terms of focus, there was a stronger emphasis on the consumer and the customer, which emerged as two separate entities. There was also a focus on issues around the performance of a firm. JM has closer links to EJM than to JMEd.
• The Journal of Marketing Education published 25 papers in 2007. There was a noted emphasis on undergraduate students and learning, reflecting its sub-area specialisation. Methodologically, JMEd had the closest links to EJM, with the inclusion of more qualitative research papers.
Reflection
Leximancer proved to be a very powerful tool in identifying concepts that populate the three
selected sets of journal abstracts. The summary tables provided above represent only a small
amount of the data that was, or could be, generated. It was impossible in a paper such as this
to reflect the depth of understanding I developed through iterative interrogation of the
Leximancer map, the concepts, the thesaurus, the co-occurrences connecting related concepts
and links to the text data. Iterative analyses led to a greater understanding of the text as a
whole rather than just simple codes or categories. A series of analysis runs exploring and
questioning the text could be carried out quickly and systematically. The ability to see the
interrelationships of the revealed concepts aided in the interpretation of the texts. The
capability to explore was seductive, opening up a level of interaction with the data and aiding interpretation in ways that would be more difficult with traditional content analysis methods.
In terms of the ease of use of Leximancer, I was readily able to apply what I had learned in
two workshops and the on-line tutorials. The greatest difficulty encountered during the
analyses was keeping track of the various projects or iterations. Diligent records of the
project's Leximancer settings were made. Details of concept editing were also noted to enable
the replication of each project. Decisions about what to report from the analyses were guided by recent publications and theses utilising Leximancer.
Overall, Leximancer provided a versatile tool to undertake interpretive content analysis.
Identification of journal themes through Leximancer was uncomplicated. Each output could
be replicated and the coding of text was automated. The facility for iterative inquiry, the map
and the concept information supported my further interpretation and in-depth understanding
of the data. Interpretive content analysis is just one of many methodologies within the
interpretive paradigm supported by Leximancer.
References
Ahuvia, A., 2001. Traditional, Interpretive and Reception Based Content Analyses:
Improving the Ability of Content Analyses to Address Issues of Pragmatic and Theoretical
Concern. Social Indicators Research, 54 (2), 139-172.
Bauerly, R.J., Johnson, D.T., Singh, M., 2006. Readability and Writing Well. Marketing
Management Journal, 16 (1), 216-227.
Davies, I., Green, P., Rosemann, M., Gallo, S., 2004. Concept Modelling – What and Why in
Current Practice. ER 2004 23rd International Conference on Conceptual Modeling, Shanghai,
30-42.
European Journal of Marketing: Author guidelines
<http://info.emeraldinsight.com/products/journals/author_guidelines.htm?id=ejm> accessed
10th June 2008.
Graneheim, U.H., Lundman, B., 2004. Qualitative Content Analysis in Nursing Research: Concepts, Procedures and Measures to Achieve Trustworthiness. Nurse Education Today, 24, 105-112.
Grech, M.R., Horberry, T., Smith, A., 2002. Human Error in Maritime Operations: Analysis
of Accident Reports using the Leximancer Tool. Proceedings of the Human Factors and
Ergonomics Society 46th Annual Meeting.
Available at: http://www.leximancer.com/documents/HFES2002_MGRECH.pdf
Hanson, D., Grimmer, M., 2007. The Mix of Qualitative and Quantitative Research in Major
Marketing Journals, 1993-2002. European Journal of Marketing, 41 (1/2), 58-70.
Jalkala, A., Salminen, R.T., 2005. Customer References as Marketing Practice in Company
Web Sites - Content and Discourse Analysis. Frontiers of E-Business Research, Proceedings
of 5th eBRF Conference, Tampere, Finland, 165-180.
Journal of Marketing: Manuscript submission and preparation guidelines
<http://www.marketingjournals.org/jm/ms_prep.php> accessed 10th June 2008.
Journal of Marketing Education: Manuscript submission guidelines
<http://www.sagepub.com/journalsProdManSub.nav?prodId=Journal200864> accessed 10th
June 2008.
Krippendorff, K., 2004. Content Analysis: an Introduction to its Methodology (2nd ed.). Sage
Publications, Inc. Thousand Oaks, California.
Leong, E.K.F., Ewing, M.T., Pitt, L.F., 2002. Analysis of Web Positioning Strategy via Text
Mining Technologies. In: ANZMAC Conference Proceedings. Melbourne, 1311-1316.
Leximancer Home page: <http://www.leximancer.com/> accessed 10th June 2008.
Lloyd, S., Mortimer, K., 2005. The Contribution of Text Analysis to an Understanding of
Constructs in Academic Literature: The Case of Corporate Reputation. In: ANZMAC
Conference Proceedings. The University of Western Australia, 8-15.
Pattinson, H.M., 2005. Mapping Marketing Decision Space: An exploration of Emerging
Cognitive Mapping Tools for Analysis of Marketing Scenario and Decision-Making. In:
ANZMAC Conference Proceedings. The University of Western Australia, 23-32.
Perry, C., Carson, D., Gilmore A., 2003. Joining a Conversation. European Journal of
Marketing, 37 (5/6), 652-667.
Randall, J.E., Miles, M.P., Randall, C.H., 2004. An Attempt to make Marketing Theory
Useful: The Foundations of the Association of Marketing Theory and Practice and the Journal
of Marketing Theory and Practice. Journal of Marketing Theory and Practice, 12 (4), 2-8.
Saunders, J., Lee, N., 2005. Whither Research in Marketing? European Journal of Marketing, 39 (3/4), 245-260.
Sawyer, A., Laran, J., Xu, J., 2008. The Readability of Marketing Journals: Are Award-Winning Articles Better Written? Journal of Marketing, 72 (January), 108-117.
Scott, N., Smith, A.E., 2005. Use of Automated Content Analysis Techniques for Event Image
Assessment. Tourism Recreation Research, 30 (2), 87-91.
Steel, C.H., 2007. What do University Students Expect from Teachers using an LMS? In: ICT: Providing Choices for Learners and Learning. Proceedings ascilite Singapore 2007, 942-950.
Available at: http://www.ascilite.org.au/conferences/singapore07/procs/steel.pdf
Stewart, D.W., 2002. Getting Published: Reflections of an Old Editor. Journal of Marketing,
66 (October), 1-6.
Watson, M., Smith, A., Watter, S., 2005. Leximancer Concept Mapping of Patient Case Studies. Lecture Notes in Computer Science, 3683, 1232-1238.
Available at:
http://www.leximancer.com/cms/index.php?option=com_content&task=view&id=15
Zimitat, C., 2006. A Lexical Analysis of 1995, 2000 and 2005 ascilite Conference Papers.
Proceedings of the 23rd annual ascilite conference: Who’s learning? Whose technology?
Sydney, 947-951. Available at:
http://www.ascilite.org.au/images/papers/conferences/sydney06/proceeding/pdf_papers/