Lessons Collective Analysis Overview

Collective Analysis of RAF Lessons using QDA Miner
Steve Redmond
Head of Lessons Analysis
HQ Air, RAF High Wycombe
01494 49 6682
[email protected]
UNCLASSIFIED
UNCLASSIFIED © UK MOD
Crown Copyright, 2010
Contents
• Background
• Research
• Solution
• Future
• Demonstration
– The data in this presentation has been changed to maintain MoD integrity.
Please interject at any time
Background
Overview
• RAF
• HQ Air Command, High Wycombe
• Air Lessons Cell
• Small team
• Aim is to capture, learn, analyse and exploit the lessons from operations and exercises
• Improvement
What is a lesson?
Lessons database
• Defence Lessons Identified Management System (DLIMS)
• Classified data stored separately
• Two main functions:
– Process the analysis of individual lessons
– Enable basic search and collation of lessons for deeper analysis
The Lessons Process
• Capture
• Learn
• Analyse
• Exploit
Individual analysis
• Lessons are entered into DLIMS
• Subject matter experts (SMEs) are assigned to each lesson to validate it, consider it and recommend action
• Ultimately each lesson is closed, marking its individual analysis as complete
Deeper analysis
• Aim unclear; something like:
– Key points
– Aggregates
– Trends
• Only basic search functionality
• High volume of written information
• Requires significant time and effort
• Sensitivity analysis is difficult
• A feeling that there must be a better way
Research
Challenges
• New
• Text – acronyms, terminology, spelling, waffle, subjective, volume
• Tasking – informal, unclear
• Capability – simplistic, unscientific
• Resource – subject skills and experience
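The text challenges above (acronyms, terminology, spelling) are usually tackled by normalising text before any mining. The sketch below is illustrative only: the `normalise` function and the `ACRONYMS` and `SPELLINGS` tables are hypothetical, not part of DLIMS or the Provalis tools.

```python
import re

# Hypothetical lookup tables for illustration only; in practice these would come
# from a maintained glossary of Air and MOD acronyms and common misspellings.
ACRONYMS = {"a/c": "aircraft", "sme": "subject matter expert", "ops": "operations"}
SPELLINGS = {"maintainance": "maintenance", "recieved": "received"}


def normalise(text: str) -> list[str]:
    """Lower-case and tokenise text, correct known misspellings and expand acronyms."""
    # Keep "/" inside tokens so slash acronyms such as "A/C" survive tokenising.
    tokens = re.findall(r"[a-z0-9/]+", text.lower())
    result: list[str] = []
    for token in tokens:
        token = SPELLINGS.get(token, token)
        # An expansion can be several words, so split it back into tokens.
        result.extend(ACRONYMS.get(token, token).split())
    return result


print(normalise("A/C maintainance delayed OPS"))
# ['aircraft', 'maintenance', 'delayed', 'operations']
```

Keeping the tables as plain data means subject matter experts can extend them without touching the code.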
Start
• Early 2009
• Blank sheet of paper
• How do we do “deeper analysis”?
• Research into:
– Summarisers of content
– Mining of text
– Metadata analysis
– Other users (what do they do?)
Findings
• Summer 2009
• Identified key terms
• Established scientific techniques
• Established processes for text mining
• Shortlist of preferred tools
• Tested shortlist – online, by email and by phone
Text mining
Text mining applies automation and science to the analysis of large volumes of written data.
Text mining
Intelligent text mining requires a taxonomy and thesaurus covering the subject matter of interest. The more words and phrases that are recognised, the more intelligent the mining. A taxonomy is a classification of words and phrases by various criteria, such as hierarchy, similarity or type. The taxonomy is supported by a thesaurus or dictionary.
In the example on the left, the recognised words and phrases are coloured and the most common classifications are shown.
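A minimal sketch of thesaurus-driven classification, assuming a hypothetical `THESAURUS` that maps each recognised word or phrase to a single taxonomy classification (the real MOD Taxonomy is hierarchical and far larger). It counts the classifications found in a lesson, which is the kind of summary the example describes.

```python
import re
from collections import Counter

# Hypothetical thesaurus mapping recognised words and phrases to taxonomy
# classifications; a real one (e.g. built on the MOD Taxonomy) would be far larger.
THESAURUS = {
    "spare parts": "Logistics",
    "fuel": "Logistics",
    "radio": "Communications",
    "satcom": "Communications",
    "training": "Personnel",
}


def classify(text: str) -> Counter:
    """Count taxonomy classifications for every recognised term in the text."""
    text = text.lower()
    counts: Counter = Counter()
    # Try longer terms first so a phrase is matched before any word inside it.
    for term in sorted(THESAURUS, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[THESAURUS[term]] += hits
            # Blank out matched text so a shorter term cannot count it again.
            text = re.sub(pattern, " ", text)
    return counts


lesson = "Radio and SATCOM failed; fuel and spare parts short. Radio training needed."
print(classify(lesson).most_common())
# [('Communications', 3), ('Logistics', 2), ('Personnel', 1)]
```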
Techniques
• Word analysis – frequencies, patterns
• Key Word In Context (KWIC)
• Hierarchical analysis – higher links
• Cluster analysis – groups, data reduction
• Cross tabulation – control factors
• Correspondence analysis – word associations
• Heatmaps – uses colour, maintains fidelity
• Thematic analysis – categories and links
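Word analysis usually starts from two counts per word: how often it occurs in total, and how many lessons (cases) contain it. A minimal sketch, assuming simple lower-case word tokenising; `word_frequencies` and the sample lessons are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(lessons: list[str]) -> tuple[Counter, Counter]:
    """Return (total occurrences of each word, number of lessons containing it)."""
    totals: Counter = Counter()
    cases: Counter = Counter()
    for lesson in lessons:
        words = re.findall(r"[a-z']+", lesson.lower())
        totals.update(words)
        cases.update(set(words))  # each lesson counts at most once per word
    return totals, cases


lessons = [
    "Radio failed. Radio spares late.",
    "Spares arrived late.",
    "Training on the radio was good.",
]
totals, cases = word_frequencies(lessons)
print(totals["radio"], cases["radio"])  # 3 2
```

The case count guards against one long, repetitive lesson dominating the totals.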
Recommendation
• Collective Analysis
• Define – themes, trends, unusual effects, correlations, cause/effect, test hypotheses
• Routine – apply to all lessons
• Bespoke – respond to specific requests
• Manage – apply basic controls
• Software – purchase
• Science – apply
• Subject Matter Expertise (SME) – involve
Expected benefits
• Time – text mining speeds up the processing of lessons
• Cost – text mining and consultancy reduce the effort required by the analyst
• Quantity – text mining increases the capability to analyse more lessons
• Quality – scientific techniques make analysis more objective, impartial and auditable
Solution
Tools
• Purchase suite of tools from Provalis Research:
– QDA Miner
– WordStat
– SimStat
• Not networked
• Copy of RESTRICTED DLIMS database
• Soon to get copy of SECRET DLIMS database
• Accepts non-DLIMS sources
QDA Miner
• Enables filtering/searching, and then coding/annotating/retrieving/analysing of documents and images
– Projects
– Cases
– Variables
– Codes
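The four concepts above can be pictured as a simple data model. This is a hypothetical sketch showing how projects, cases, variables and codes relate, with filtering on variables and retrieval by code; it is not QDA Miner's file format or API, and the class names and example data are invented.

```python
from dataclasses import dataclass, field


@dataclass
class CodedSegment:
    code: str   # the code (theme) applied, e.g. "Logistics"
    start: int  # character offsets of the coded text within the document
    end: int


@dataclass
class Case:
    document: str  # the lesson text
    variables: dict[str, str] = field(default_factory=dict)  # e.g. {"Year": "2009"}
    segments: list[CodedSegment] = field(default_factory=list)

    def add_code(self, code: str, start: int, end: int) -> None:
        self.segments.append(CodedSegment(code, start, end))


@dataclass
class Project:
    cases: list[Case] = field(default_factory=list)

    def filter(self, **conditions: str) -> list[Case]:
        """Cases whose variables match every condition given."""
        return [case for case in self.cases
                if all(case.variables.get(k) == v for k, v in conditions.items())]

    def retrieve(self, code: str) -> list[str]:
        """Text of every segment tagged with `code`, across all cases."""
        return [case.document[s.start:s.end]
                for case in self.cases for s in case.segments if s.code == code]


project = Project()
first = Case("Spares arrived late.", {"Exercise": "A", "Year": "2009"})
first.add_code("Logistics", 0, 6)
second = Case("Radio failed.", {"Exercise": "B", "Year": "2009"})
second.add_code("Communications", 0, 5)
project.cases += [first, second]

print(len(project.filter(Year="2009")), project.retrieve("Logistics"))  # 2 ['Spares']
```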
QDA Miner - Cases
QDA Miner - Variables
QDA Miner - Codes
WordStat
• Enables text mining and content analysis of large amounts of unstructured information
– Dictionaries
– Frequencies
– Phrase finder
– Crosstab
Dictionaries
• Exclusion dictionary: words/phrases to ignore
• Inclusion dictionary: MOD Taxonomy
• Residual words: neither excluded nor included
• Hierarchies
• Synonyms
• Duplicates
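A minimal sketch of how exclusion and inclusion dictionaries partition a text: excluded words are dropped, included words are counted under their category (so synonyms roll up together), and everything else is left over as residual words for review. It also flags duplicates, i.e. words listed under more than one category. The dictionaries and function names are hypothetical and far smaller than anything usable; this is not WordStat's implementation.

```python
import re
from collections import Counter

# Hypothetical, tiny dictionaries for illustration only.
EXCLUSION = {"the", "a", "and", "was", "of", "on"}
# Inclusion dictionary: each category groups its synonyms under one entry.
INCLUSION = {
    "COMMUNICATIONS": {"radio", "satcom", "comms"},
    "LOGISTICS": {"spares", "fuel", "supply"},
}


def duplicates(inclusion: dict[str, set[str]]) -> set[str]:
    """Words listed under more than one category (they would be double counted)."""
    seen = Counter(word for words in inclusion.values() for word in words)
    return {word for word, n in seen.items() if n > 1}


def partition(text: str) -> tuple[Counter, Counter]:
    """Split word counts into included categories and residual (leftover) words."""
    lookup = {word: cat for cat, words in INCLUSION.items() for word in words}
    included: Counter = Counter()
    residual: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in EXCLUSION:
            continue  # ignored entirely
        if word in lookup:
            included[lookup[word]] += 1
        else:
            residual[word] += 1  # candidates for adding to a dictionary
    return included, residual


included, residual = partition("The radio and satcom failed; fuel supply was late.")
print(dict(included), dict(residual))
# {'COMMUNICATIONS': 2, 'LOGISTICS': 2} {'failed': 1, 'late': 1}
```

Reviewing the residual words is how the inclusion dictionary grows over time.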
MOD Taxonomy Level 1
MOD Taxonomy Level 2
MOD Taxonomy Level 4
Dictionary rules
Dictionary: Excluded words
Frequencies: Included words
Frequencies: Leftover words
Frequencies: Phrase Finder
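Phrase finding can be approximated by counting repeated word sequences (n-grams). A sketch under that assumption; `find_phrases`, its parameters and the sample lessons are illustrative, not WordStat's algorithm.

```python
import re
from collections import Counter


def find_phrases(lessons: list[str], n: int = 2, min_count: int = 2) -> list[tuple[str, int]]:
    """Return word n-grams occurring at least `min_count` times, most frequent first."""
    counts: Counter = Counter()
    for lesson in lessons:
        words = re.findall(r"[a-z]+", lesson.lower())
        # Build n-grams within one lesson so a phrase never spans two lessons.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


lessons = [
    "Spare parts were late.",
    "Late spare parts delayed the sortie.",
    "Spare parts shortage.",
]
print(find_phrases(lessons))  # [('spare parts', 3)]
```

Frequent phrases found this way are candidates for adding to the inclusion dictionary as single terms.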
Frequencies: Dictionary
Keyword in Context (KWIC)
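Keyword in Context lists every occurrence of a keyword with a few words either side, aligned so the keyword sits in one column. A minimal sketch assuming whitespace tokenising; `kwic` and its `width` parameter are hypothetical.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one line per occurrence of `keyword`, with `width` words either side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare without punctuation or case, but display the original word.
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so every keyword lines up.
            lines.append(f"{left:>30} [{word}] {right}")
    return lines


text = "The radio failed twice. A spare radio was sent but the radio still failed."
for line in kwic(text, "radio", width=2):
    print(line)
```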
Keyword Retrieval
Hierarchical clustering
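Hierarchical clustering of keywords groups words that tend to appear in the same lessons. The sketch below assumes Jaccard similarity on the sets of lessons each word appears in, average linkage, and a similarity threshold to stop merging; these are illustrative choices, not necessarily those WordStat uses, and the data is invented.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of lessons containing either word that contain both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_words(occurs: dict[str, set[int]], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering of words.

    `occurs` maps each word to the ids of the lessons it appears in. Merging
    stops when no two clusters are at least `threshold` similar.
    """
    clusters = [{word} for word in sorted(occurs)]

    def linkage(c1: set[str], c2: set[str]) -> float:
        scores = [jaccard(occurs[a], occurs[b]) for a in c1 for b in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


occurs = {"radio": {1, 2, 3}, "satcom": {1, 2}, "fuel": {4, 5}, "spares": {4, 5, 6}}
print(cluster_words(occurs, threshold=0.5))
```

Recording each merge and its similarity, instead of stopping at a threshold, would give the full tree that a dendrogram draws.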
Concept mapping
Proximity plot
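A proximity plot ranks other words by how closely they are associated with one chosen word. A minimal sketch of the underlying ranking, assuming Jaccard similarity on lesson co-occurrence; `proximity` and the sample data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of lessons containing either word that contain both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def proximity(occurs: dict[str, set[int]], target: str) -> list[tuple[str, float]]:
    """Rank every other word by its co-occurrence similarity with `target`."""
    ref = occurs[target]
    scores = [(word, jaccard(ref, ids)) for word, ids in occurs.items() if word != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)


occurs = {"radio": {1, 2, 3}, "satcom": {1, 2}, "fuel": {4, 5}, "spares": {3, 4, 5}}
for word, score in proximity(occurs, "radio"):
    print(f"{word:<8}{score:.2f}")
```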
Cross tabulation
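Cross tabulation counts keywords against a control variable such as year or operation. A minimal sketch, assuming each lesson arrives as a hypothetical (variable value, text) pair; `crosstab` and the data are invented for illustration.

```python
import re
from collections import Counter, defaultdict


def crosstab(lessons: list[tuple[str, str]], keywords: set[str]) -> dict[str, Counter]:
    """Count keyword occurrences for each value of a control variable.

    `lessons` is a list of (variable value, lesson text) pairs, e.g. ("2009", "...").
    """
    table: defaultdict = defaultdict(Counter)
    for group, text in lessons:
        row = table[group]  # create the row even if no keyword occurs
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in keywords:
                row[word] += 1
    return dict(table)


table = crosstab(
    [("2008", "Radio failed."), ("2008", "Radio and fuel issues."),
     ("2009", "Fuel late."), ("2009", "Fuel short, radio fine.")],
    {"radio", "fuel"},
)
for keyword in ("fuel", "radio"):
    print(keyword, [table[year][keyword] for year in sorted(table)])
# fuel [1, 2]
# radio [2, 1]
```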
Correspondence map
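A correspondence map is usually drawn from correspondence analysis of a keyword-by-variable contingency table: the table is converted to standardised residuals from independence and decomposed with a singular value decomposition, giving row and column coordinates in the same low-dimensional space. A sketch using NumPy, assuming a table of raw counts; the function name and data are hypothetical, and this is not WordStat's own code.

```python
import numpy as np


def correspondence_analysis(counts: np.ndarray, dims: int = 2):
    """Classical correspondence analysis of a contingency table via SVD.

    Returns principal row coordinates, principal column coordinates and the
    principal inertias of the first `dims` dimensions.
    """
    P = counts / counts.sum()  # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    # Standardised residuals: departure from independence, scaled by the masses.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :dims] * sv[:dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :dims] * sv[:dims]) / np.sqrt(c)[:, None]
    return rows, cols, sv[:dims] ** 2


# Rows: three keywords; columns: three exercises (invented counts).
counts = np.array([
    [20.0, 5.0, 3.0],
    [4.0, 18.0, 6.0],
    [2.0, 4.0, 15.0],
])
rows, cols, inertia = correspondence_analysis(counts)
print(rows.round(3), cols.round(3), inertia.round(4), sep="\n")
```

Plotting `rows` and `cols` on the same axes gives the map: keywords sit near the variable categories they are most associated with. `dims` should not exceed one less than the smaller table dimension.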
Future
Schedule
• Publish Air themes by Nov
• Paper Design of Collective Analysis by Nov
• Build Air dictionary (80%) by Dec
• Analyse non-DLIMS lessons by Feb
• Offer generic capability in text analysis by Mar
Demonstration
• QDA Miner
• WordStat