IMS_ExpertGroupMeetingJune2014_3.1.UCSurveys_20140630

IMS 2.1.3
Proof of Concept for Data Capture
using Metadata
Bryan Fitzpatrick
Rapanea Consulting Limited
June 2014
Data Capture using Metadata
• Aim is to demonstrate designing and running a survey
questionnaire based entirely on metadata
• Aim is to use DDI metadata
• design the questions
• organise the questions into a questionnaire
• present the questionnaire
• capture and save the responses
• all based entirely on the metadata
Using DDI Metadata for Questionnaires
• DDI has metadata for Questions
• a simple question goes in a Question Item
– What is your age in years?
• a complex question goes in a Multiple Question Item
– Did you do paid work last week?
» Full Time or Part Time?
» How many hours?
o A Multiple Question Item can contain Question Items or other Multiple Question Items
Using DDI Metadata for Questionnaires
• Questions can link to one or more Concepts
• to indicate what the question is seeking to cover
o Age, Sex, Country, Income, Occupation, ...
o perhaps to qualify what is being covered
– eg Non-farm income, Tertiary qualifications
Using DDI Metadata for Questionnaires
• Questions have:
• Name
– just a multi-lingual name, not used in questionnaires
• Text
– the question that is asked
– can be conditional, multi-lingual, formatted
» can even have mixed language
• Question Intent
– some elaboration about what is being sought
» multi-lingual, formatted
• POC just uses simple unformatted multi-lingual Text
Using DDI Metadata for Questionnaires
• Questions have Response Domains
• what sort of answer is expected or valid
o Numeric domain
– can specify integer or decimal, valid formats and ranges, etc
o Text domain
– can specify format, length
o Category Domain
– valid list of multi-lingual values
» not really very much use
o Code Domain
– valid list of multi-lingual values with codes
» a classification
Using DDI Metadata for Questionnaires
• Questions have Response Domains
• what sort of answer is expected or valid
o Date-Time Domain
– can specify formats
o Geographic
– eg coordinates, other units
o Structured Mixed Response Domain
– a combination of all of the above
• all domain types can have labels and descriptions
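A minimal sketch of how two of these Response Domains could constrain an answer. The class names, fields and the age range are illustrative assumptions, not the DDI schema and not the POC's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NumericDomain:
    """Hypothetical stand-in for a Numeric domain (integer/decimal plus range)."""
    integer_only: bool = True
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def validate(self, raw: str) -> bool:
        # Reject anything that does not parse as the expected numeric type.
        try:
            value = int(raw) if self.integer_only else float(raw)
        except ValueError:
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


@dataclass
class CodeDomain:
    """Hypothetical stand-in for a Code domain: codes with multi-lingual labels."""
    labels: Dict[str, Dict[str, str]]  # code -> {language: label}

    def validate(self, raw: str) -> bool:
        # Only the code is a valid answer; the label is presentation text.
        return raw in self.labels


# "What is your age in years?" with an assumed plausible range.
age_domain = NumericDomain(integer_only=True, minimum=0, maximum=120)

# A two-value classification with English and French labels (assumed values).
sex_domain = CodeDomain(labels={
    "1": {"en": "Male", "fr": "Homme"},
    "2": {"en": "Female", "fr": "Femme"},
})
```

Validating against the code rather than the label is what keeps the captured data independent of the language the question was asked in.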
Using DDI Metadata for Questionnaires
• Questions do not go directly into a questionnaire
o DDI calls a questionnaire an Instrument
• questions constitute a library available for use
o a “Question Bank”
• questions are selected and assembled into an
Instrument
• the assembling of questions is done with Control
Constructs
• an Instrument identifies a single Control Construct that
builds the questionnaire
Control Constructs
• Control Constructs are the critical component in building
a questionnaire
• they select the questions
• they control the flow of the questions
– branching and looping
• they insert non-question text
– “Now I want to ask you about other people in the household”
• they can compute values
• they link to Interviewer Instructions
o structured DDI Interviewer Instructions
o unstructured external interviewer instructions material
Control Constructs
• Several types of Control Constructs
• Question Construct
– selects a Question Item or Multiple Question Item
• Sequence
– selects a sequence of other control constructs of any type
• If-Then-Else
– defines an If condition with optional ElseIf clauses (multiple)
and optional Else clause
» each condition selects a single Control Construct to
include
Control Constructs
• Several types of Control Constructs
• Loop, Repeat-Until, Repeat-While
– eg to loop over people in a household
• Statement Item
– inserts non-question multi-lingual text (conditional,
formatted)
• Computation Item
– a calculation in some language that is assigned to a Variable
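The way Control Constructs select questions and control flow can be sketched as a small tree walker. This is an illustration under assumptions: the class names mirror the construct types above, but the fields, the `run` function and the question names (`PaidWork`, `FullOrPart`, `Hours`) are hypothetical and are not the DDI schema or the POC's code. Loops and Computation Items are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass
class QuestionConstruct:
    question: str  # name of a question held in the question bank


@dataclass
class StatementItem:
    text: str  # non-question text to present


@dataclass
class Sequence:
    members: List["Construct"]


@dataclass
class IfThenElse:
    # Each branch pairs a condition on the answers so far with ONE construct.
    branches: List[Tuple[Callable[[Dict[str, str]], bool], "Construct"]]
    otherwise: Optional["Construct"] = None


Construct = Union[QuestionConstruct, StatementItem, Sequence, IfThenElse]


def run(construct, ask, answers, shown):
    """Walk the construct tree, asking questions and recording the answers."""
    if isinstance(construct, QuestionConstruct):
        answers[construct.question] = ask(construct.question)
    elif isinstance(construct, StatementItem):
        shown.append(construct.text)
    elif isinstance(construct, Sequence):
        for member in construct.members:
            run(member, ask, answers, shown)
    elif isinstance(construct, IfThenElse):
        for condition, branch in construct.branches:
            if condition(answers):
                run(branch, ask, answers, shown)
                return
        if construct.otherwise is not None:
            run(construct.otherwise, ask, answers, shown)


# The paid-work example, expressed here as a flow for illustration only.
paid_work = Sequence([
    StatementItem("Now I want to ask you about paid work."),  # assumed text
    QuestionConstruct("PaidWork"),
    IfThenElse(branches=[(
        lambda a: a["PaidWork"] == "Yes",
        Sequence([QuestionConstruct("FullOrPart"), QuestionConstruct("Hours")]),
    )]),
])

# A scripted respondent stands in for the user interface.
scripted = {"PaidWork": "Yes", "FullOrPart": "Part Time", "Hours": "20"}
answers: Dict[str, str] = {}
shown: List[str] = []
run(paid_work, scripted.get, answers, shown)
```

Because an Instrument points at a single construct, passing the top-level Sequence to `run` is enough to drive the whole questionnaire.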
Instrument
• Identifies a single Control Construct to assemble the
questionnaire
o probably a Sequence construct
• Instruments can have a Type
o a single value taken from some Controlled Vocabulary
– a user-managed list of valid values
» eg, Paper, Internet, CATI, ...
• Instruments can have multiple Software specifications
o basically just identifying “software” used with instrument
– not a great deal of use
Instrument
• Instruments do not have any place for useful layout metadata
• just the type of the layout
• a fairly serious limitation
• We need quite a lot of information to do the layout
• how to represent lists
o tick boxes, list boxes, combo boxes, radio buttons
• how to show flow logic
• which questions to show at once, which to separate
• can the respondent backtrack
• We need additional Layout Metadata
• I have designed some
Interviewer Instructions
• A formal DDI metadata type
• Organised, structured instructions
• formatted multi-lingual text
o may be conditional
• May link to external, non-DDI material
• eg, PDF, Word documents
• Not used in this Proof of Concept
Classifications
• DDI holds Classifications as linked Code Schemes and
Category Schemes
• a Category Scheme is a list of Categories
o flat list of multi-lingual names and descriptions
o eg, Country names, Occupation names, etc
• a Code Scheme selects Categories from Category
Schemes, assigns a Code (not multi-lingual), and may
specify a hierarchy
o a Code Scheme may select Categories from multiple Category Schemes
o multiple Code Schemes may select the same Categories
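The split between Categories (multi-lingual names) and Codes (language-neutral values) can be sketched as follows. The class and field names are assumptions for illustration, not DDI element names. The two Code Schemes use ISO 3166-1 alpha-2 and numeric codes to show several schemes selecting the same Categories.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Category:
    id: str                # opaque identifier, the only language-neutral handle
    names: Dict[str, str]  # language -> name


@dataclass
class Code:
    value: str                    # the code itself, not multi-lingual
    category_id: str              # the Category this code selects
    parent: Optional[str] = None  # parent code value when there is a hierarchy


@dataclass
class CodeScheme:
    codes: List[Code]

    def label(self, value: str, categories: Dict[str, Category],
              language: str, fallback: str = "en") -> str:
        """Return the category name for a code in the requested language."""
        code = next(c for c in self.codes if c.value == value)
        names = categories[code.category_id].names
        return names.get(language, names[fallback])


# Categories pooled from one or more Category Schemes, keyed by id.
categories = {
    "cat-fr": Category("cat-fr", {"en": "France", "fr": "France", "de": "Frankreich"}),
    "cat-de": Category("cat-de", {"en": "Germany", "fr": "Allemagne", "de": "Deutschland"}),
}

# Two Code Schemes selecting the same Categories with different codes.
alpha2 = CodeScheme([Code("FR", "cat-fr"), Code("DE", "cat-de")])
numeric = CodeScheme([Code("250", "cat-fr"), Code("276", "cat-de")])
```

For example, `alpha2.label("DE", categories, "fr")` and `numeric.label("276", categories, "fr")` resolve to the same Category name, so a stored code can be presented in any available language.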
Code Schemes and Category Schemes
• Used for
• Classifications
– a Classification is a Code Scheme
• Controlled Vocabularies
– lists of standardised terms
» defined by DDI, an organisation, a local area
Code Schemes and Category Schemes
• Used for
• Response Domains for Questions
– Code Domains and Category Domains
– Category domains are not much use in a multi-lingual
environment
» Categories have different names in different languages
• with no unique handle except a meaningless Id
• Representations for Variables
– Code Representation
Variables
• A Variable is a container that will hold a data value
• has a Name and Description (both multi-lingual)
• can be linked to a single Concept
– to indicate what the data represents
• can be linked to multiple Questions
– to indicate where the data might come from
• can have a Representation
– Code, Date/Time, Numeric, Text
» with constraints on values
• can identify a Response Unit and an Analysis Unit
– a population that it can apply to
Logical Record
• A Logical record consists of a sequence of
Variables
o groups data values for a purpose
– data from a questionnaire goes into one or more
Logical Records
o Logical Records can be linked
– eg, Households and Persons
o Logical Records are independent of any storage or stored format
Record Layouts and Physical Structures
• Map a Logical record to a physical record and an actual
stored file format
• Can support a very wide range of structures and storage
formats
• CSV, Binary file, XML, database
• multiple record types, linkages of many kinds
• POC does not actually use this
• Simple CSV file maps directly from Logical record
Physical Instance
• Holds information about actual data sets produced
• links to Physical Structures, Record Layouts, and
Logical records
• provides a central management of data from a
collection
• POC uses Physical Instance to manage data
o POC 2.3.3 builds on this POC to show how to use SDMX and DDI metadata together
– produces tables from SDMX DSD using data collected with DDI
» uses the Physical Instance information to find the
datasets
What does the POC do?
• Collects survey data based entirely on metadata
• builds or imports all the metadata
• assembles a survey instrument (questionnaire)
• presents the questionnaire in a Windows Form
• collects data into Logical Records
• saves the data in CSV files
• POC 2.3.3 builds on this POC
• duplicates Concepts and Classifications to SDMX
o tightly-coupled set of metadata
• uses SDMX DSD to produce tables from the collected
data
How does the POC do This?
• Basically the POC system is a metadata creator/editor
• Build, import Concepts
o build in UI, import from CSV, SDMX V2.0
• Import Classifications
o import CSV, SDMX V2.0
o did not implement build in UI
• Construct Questions in UI
o multi-lingual text with links to Concepts
o POC almost supports Multiple Question Items
– by the end of the week
How does the POC do This?
• Build Variables
o links to Concepts and Questions
• Build Control Constructs
o Question constructs
» link to a question
o Sequence constructs
» define a sequence of other Control Constructs
o If-Then-Else constructs
» allows conditional questionnaire flow
o POC does not support Loop, Repeat-Until, Repeat-While, Computation
Item, and Statement Item constructs
How does the POC do This?
• Define Instruments
o link to a single Control Construct
– probably to a Sequence construct
o has an instrument type
– Windows Form, Web, Paper, CATI, ..
– POC only supports Windows Form
• Define Logical Records
o collection of Variables
• Map Logical record to Instrument
o map Variables to Questions
– uses Concept links and question links if present
– allows user override in UI
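One way the Variable-to-Question mapping could work is sketched below. The precedence used here (user override, then an explicit question link, then an unambiguous shared Concept) is an assumption for illustration; the slides do not say how the POC resolves conflicts. All names are hypothetical.

```python
from typing import Dict, List, Optional


def map_variables(variables: Dict[str, dict],
                  questions: Dict[str, List[str]],
                  overrides: Optional[Dict[str, str]] = None) -> Dict[str, Optional[str]]:
    """Map each Variable name to a Question name, or None if unresolved.

    variables: {variable: {"concept": str or None, "questions": [question, ...]}}
    questions: {question: [concept, ...]} for the questions in the Instrument
    overrides: {variable: question} chosen by the user
    """
    overrides = overrides or {}
    mapping: Dict[str, Optional[str]] = {}
    for name, info in variables.items():
        # 1. A user override always wins (assumed precedence).
        if name in overrides:
            mapping[name] = overrides[name]
            continue
        # 2. An explicit question link, if that question is in the Instrument.
        linked = [q for q in info.get("questions", []) if q in questions]
        if linked:
            mapping[name] = linked[0]
            continue
        # 3. A shared Concept, but only when exactly one question matches.
        concept = info.get("concept")
        matches = [q for q, concepts in questions.items() if concept in concepts]
        mapping[name] = matches[0] if len(matches) == 1 else None
    return mapping


variables = {
    "AgeYears": {"concept": "Age", "questions": []},
    "SexCode": {"concept": "Sex", "questions": ["Q_Sex"]},
    "HomeCountry": {"concept": "Country", "questions": []},
}
questions = {
    "Q_Age": ["Age"],
    "Q_Sex": ["Sex"],
    "Q_Birth": ["Country"],
    "Q_Residence": ["Country"],
}

automatic = map_variables(variables, questions)
resolved = map_variables(variables, questions, {"HomeCountry": "Q_Residence"})
```

In this sketch `HomeCountry` stays unmapped until the user chooses between the two Country questions, which is the case a UI override is needed for.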
How does the POC do This?
• Present the Instrument
o Render the instrument in a Windows Form
– using some layout metadata I made up
o Execute the Control Constructs to select questions and manage question flow
o present questions in list box, combo box, radio buttons, text box
– depending on response Domain and some Layout Metadata
» not DDI metadata, my design
o capture the responses into a Logical record
– based on the Logical Record – Instrument mapping
o present questions in language of choice
» limited choice
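Choosing a control from the Response Domain and Layout Metadata could look something like the sketch below. The Layout Metadata used in the POC is not described in these slides, so the `layout_hint` parameter and the size thresholds are invented for illustration.

```python
from typing import Optional


def choose_widget(domain_kind: str, option_count: int = 0,
                  layout_hint: Optional[str] = None) -> str:
    """Pick a control for a question.

    domain_kind:  "code", "numeric", "text", ... (the Response Domain type)
    option_count: number of codes when the domain is a Code domain
    layout_hint:  explicit choice from (hypothetical) Layout Metadata
    """
    # Explicit layout metadata overrides any default.
    if layout_hint is not None:
        return layout_hint
    if domain_kind == "code":
        # Assumed thresholds: short lists are shown in full, long ones collapse.
        if option_count <= 4:
            return "radio buttons"
        if option_count <= 15:
            return "list box"
        return "combo box"
    # Numeric, text and other domains fall back to free entry.
    return "text box"
```

With these assumed defaults a two-value Sex question would get radio buttons and a long Country list a combo box, and Layout Metadata would only need to record the exceptions.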
How does the POC do This?
• Retrieve Logical Record set
o run interview multiple times to get a set of logical records
o results displayed on screen in tabular form
o mapping as defined in Response Domains and Logical record to Instrument map
• Save the Logical records in data file
o CSV file
o no actual Record Layout and Physical Structure metadata
– simplest Record Layout and Physical Structure metadata is
almost empty anyway
» it is just saving the Logical Records with Variables
separated by commas
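Saving Logical Records as comma-separated Variables can be sketched with Python's standard `csv` module. The header row, the Variable names and the sample values are assumptions; the slides only say that Variables are written separated by commas.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def write_logical_records(variables: List[str],
                          records: Iterable[Dict[str, object]],
                          stream: TextIO) -> None:
    """Write one CSV line per Logical Record, in Variable order."""
    writer = csv.DictWriter(
        stream,
        fieldnames=variables,
        restval="",             # Variables with no answer (skipped questions)
        extrasaction="ignore",  # answers not mapped to a Variable are dropped
        lineterminator="\n",
    )
    writer.writeheader()  # assumed: a header row naming the Variables
    for record in records:
        writer.writerow(record)


# Hypothetical Variables for the first part of the ICT questionnaire.
VARIABLES = ["Country", "Sex", "Age", "LastComputerUse", "ComputerFrequency"]

# Two interviews; the second respondent skipped the frequency question.
RECORDS = [
    {"Country": "FR", "Sex": "2", "Age": 34, "LastComputerUse": "1",
     "ComputerFrequency": "3"},
    {"Country": "DE", "Sex": "1", "Age": 71, "LastComputerUse": "4"},
]

output = io.StringIO()
write_logical_records(VARIABLES, RECORDS, output)
csv_text = output.getvalue()
```

A question skipped by an If-Then-Else construct simply leaves its Variable empty, so every record keeps the same number of fields.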
How does the POC do This?
• Save details of CSV data sets in Physical Instance
metadata
• so the data can be found for subsequent operations
o like producing tables in POC 2.3.3
Metadata is in a DDI Instance file
• DDI Instance file
o Group
– Concepts
– Code Schemes and Category Schemes
– Questions
– Control Constructs
– Instruments
– Variables
– Logical Records
o Study Unit
– Physical Instance
• Layout Metadata file
o Layout Metadata
DDI is fairly complex
• But so is designing a Questionnaire!
• Constructing a questionnaire involves constructing a lot of
metadata
• but DDI is fairly logical
• you need to think about what you want in the
questionnaire and how you want it to flow
o but you need to do this if you are designing a questionnaire manually
• once you design a questionnaire you get re-use advantages
o easy to modify, add languages
o easy to adapt to some other purposes
o easy to reuse useful questions and constructs
o easy to capture the data
Let us look at my test questionnaire
• Simple questions about internet access
• based on Eurostat ICT survey
• includes two If-Then-Else constructs to manage flow
• we will look at the structure first
o a good way to plan your own questionnaire
o a good way to see how the DDI metadata works
ICT question flow
1 Country
2 Sex
3 Age
A1 Do you have access to a computer at home?
A2 Do you have access to the internet at home?
B1 When did you last use a computer?
If (within last 3 months)
    B2 How often on average?
C1 When did you last use the internet?
If (within last 3 months)
    C2 How often on average?
ICT question flow
Seq C 1
    QC 1      1 Country
    QC 2      2 Sex
    QC 3      3 Age
    QC 4      A1 Do you have access to a computer at home?
    QC 5      A2 Do you have access to the internet at home?
    QC 6      B1 When did you last use a computer?
    If C 1    If (within last 3 months)
        QC 7      B2 How often on average?
    QC 8      C1 When did you last use the internet?
    If C 2    If (within last 3 months)
        QC 9      C2 How often on average?
QC – Question Construct
If C – If-Then-Else Construct
Seq C – Sequence Construct
Let us have a look at the POC
• Demo
What does the POC show?
• It is realistic to use the DDI metadata to design and
present a survey questionnaire
• The POC depended entirely on the metadata
o absolutely no knowledge about the survey built into the system
• Designing a survey questionnaire in DDI is fairly
complex
• So is designing one manually
• There are some clear advantages
• modification and reuse are easy
• multi-lingual presentation is easy
What does the POC show?
• Really need some form of Layout Metadata
• DDI Instrument and Control Construct metadata gives
no guidance on layout
• POC case was very simple
o but still needed some layout information
o realistic survey questionnaire needs considerable layout information
• needs more thought to design this metadata
What does the POC show?
• POC used a Windows Form questionnaire
• no practical use for a production survey
• but other questionnaire formats should be easy
o hope to have paper example (in Word) by end of week
o script for a Web Form questionnaire is straight-forward
– but Web Form system still needs to process Control
Constructs
o script for CATI is easy
o script for Blaise should be easy
• easy to have questionnaire available in multiple
formats
Thank you
• Questions?
Bryan Fitzpatrick
Rapanea Consulting Limited
[email protected]
Ph +44-7789-886536