Investigating the Effects of Trend Lines with the Microworld GridRail

Kristina Mach
Degree project, 30 credits (Examensarbete 30 hp)
June 2017
TVE 17 006 juni
Department of Engineering Sciences (Institutionen för teknikvetenskaper)
Abstract
In 1995 a research collaboration was launched between the Department of
Information Technology at Uppsala University and the Swedish Transport
Administration (Trafikverket). Together they built a software prototype, STEG,
for controlling trains and planning new routes. The aim of the recent research
has been to understand why STEG is appreciated, how to improve it, and to
collect information about how to build future systems. As an initial
explanation of the generally positive reception of STEG, three hypotheses
were constructed. This thesis is part of a systematic variation and investigation
of one of these hypotheses. The other two will be investigated in future
experiments. In order to test the hypothesis, a microworld was developed: a simplified game version of a complex system. A few experiments have previously been
conducted, and they did not support the first hypothesis that the trend lines have
a positive effect, as measured by how quickly the participants finished playing GridRail.
More experiments are needed in order to look for explanations. The purpose of this
study was to see how the game play was affected by the absence or presence of trend
lines.
A between-subjects study was conducted in which 32 participants used the microworld
GridRail, a game in which the goal is to drive 6 trains to opposite stations without
collisions as fast as possible. The results do not show any significant difference
in how fast the participants finished each trial between the conditions with and without
trend lines. However, comments in verbalisations and notes on how many trains
each participant drove at the same time indicate that the lines do affect
how participants play, even though the finish time is not affected.
Supervisor: Anton Axelsson
Subject reader: Anders Jansson
Examiner: Nóra Masszi
Contents
1. Introduction
1.1 Previous experiments
1.2 Questions to be answered
1.3 Delimitations
2. Background
2.1 Early research
2.2 GridRail
2.3 Recent evaluation experiments with GridRail
2.4 Future experiments
3. Theory
3.1 Dynamic Decision Making
3.2 Microworld
3.3 Think Aloud
3.3.1 Pre-processing
3.3.2 Basic assumptions for the Think Aloud
3.3.3 Problems with Think Aloud
4. Methods
4.1 Participants
4.2 Design
4.3 Pilot study
4.4 Material
4.5 Procedure
5. Results
5.1 Learning curves
5.2 Effects of trend lines
5.2.1 Boxplot - Mean time per trial
5.2.2 Boxplot - Mean time before each trial
5.2.3 ANOVA for Mean time
5.3 Analyses of collisions
5.3.1 Division by lines
5.3.2 Division by verbalisation
5.3.3 Participants without trend line condition
5.3.4 Participants with trend line condition
5.3.5 ANOVA for collisions
5.3.6 ANOVA for collisions in Order 2
5.4 Effects of verbalisations
5.4.1 Mean time based on verbalisation
5.4.2 Variable interaction: Verbal and Order
5.5 Analyses of verbalisations
5.5.1 Summation from table 11
5.5.2 Breaking down numbers from notes
5.5.3 Notes from experiment by trend line condition and verbalisation order
5.5.4 Verbalisation transcript
6. Discussion
6.1 Analysis of results
6.2 Discussion of methods used
6.3 Conclusion
6.4 Future research
6.4.1 GridRail
6.4.2 Experiment
7. References
1. Introduction
In 1995 a research collaboration was launched between the Department of Information Technology at Uppsala University and the Swedish Transport Administration (Trafikverket). Together with Trafikverket, Uppsala University
built a software prototype, STEG, for controlling trains and planning new routes.
The prototype has since been used in Norrköping and Boden with mainly
positive results. The aim of the recent research has been to understand why
STEG is appreciated, how to improve it, and to collect information about
how to build future systems.
As an initial explanation of the generally positive reception of
STEG, the following three general hypotheses were identified: (1) the visualized trend lines in the interface support the train traffic controller when
he/she is making predictions; (2) the spatial layout in the interface makes it
easier for the controller to map this analog representation to the real geographic area and thus facilitates his/her understanding of the actions going
on in the domain; or (3) the ability to re-plan with the help of direct interactive
feedback supports an efficient and cognitively less demanding way of working
with the interface. This thesis is part of a systematic variation and investigation of the first of these hypotheses. The other two will be investigated in
future experiments.
In order to test the hypothesis, a microworld was developed - a simplified
game version of a complex system. A few experiments have previously been conducted, and they did not support the first hypothesis that the trend lines have a positive effect, as measured by how quickly the participants finished playing
GridRail.
However, more experiments are needed in order to look for explanations,
using different methods or approaches. In this thesis we explore the
impact of the trend lines based on what the participants say during the game
and how they play, rather than on the outcome measured in time.
The data will be gathered by using a Think Aloud method, by observing
and taking notes, and by recording the screen and voice activity.
1.1 Previous experiments
The latest research on STEG aimed to understand why the train traffic controllers found the system helpful and what researchers should consider
when building similar systems in the future. STEG is a complex system in
which the traffic controller can see the ongoing process, plan and execute new
routes, and see a prediction of future traffic. In order to test which components
of this system are beneficial, the system has been divided into subcomponents, one of
which is the trend lines. The trend lines have been evaluated in earlier research based on the hypothesis that they were part of why STEG
was perceived as a good system; the outcome was that they did not help the
user to finish a train-driving task more quickly.
It is, however, problematic to conclude that the trend lines are not helpful in
a large and complex system based on a single variable, time. Multiple experiments are needed to provide more data on which a conclusion can be based.
For example, we do not know whether participants with trend lines actually noticed
and used the lines, and we do not know whether the participants steered and planned
the routes differently based on the trend lines. Further investigations that approach
the same experiment from new angles are therefore of interest.
1.2 Questions to be answered
Building on earlier research, two more specific research questions have been
formulated to give a broader basis for investigations into decision-making.
Hypothesis: The presence of trend lines in a simple dynamic system will
accelerate learning and improve performance.
• Are the trend lines in GridRail helping the users?
• Can the participants' descriptions of how they solved the task give input on
whether the trend lines are helpful?
1.3 Delimitations
By isolating parts of STEG and testing them in the microworld GridRail,
where the idea is to drive 6 trains to an opposite station, the results cannot
be directly interpreted as showing how train controllers perceive the complex system STEG, even though they can indicate how people
react to trend lines.
The participants were novices who had never used the system STEG or the microworld GridRail.
By only using novice test persons, a better comparison can be made between
participants, as well as a comparison of improvement across games over time.
This selection of participants is limiting in the sense that the
real controllers are not novice users; the results will therefore not have
an impact on how to improve STEG today. That would require a test of
STEG in its natural environment with the traffic controllers as participants.
The experiment will not help us to understand whether the trend lines in STEG are
helpful in the complex system when a controller has been learning it over a
longer time.
2. Background
2.1 Early research
In 1995 a research collaboration was launched between the Department of Information Technology at Uppsala University and the Swedish Transport Administration (Trafikverket). The aim of this project was to gather knowledge
about the traffic control systems and how to develop a future interface for a
more effective operational planning system, while ensuring high safety and a good
work environment. Extensive demands on the train network capacity, together
with higher speeds and higher demands on punctuality, called for and
still call for better planning tools. Research activities included description and analysis of the control tasks, design of new control principles, user
interface design, decision support systems, work organization and workplace
design (Andersson et al., 2015).
Previous research resulted in a body of knowledge about the interaction between
humans in various roles and various technical support programs, and provided a basic
analysis of train control operation. Some highlights of the insights from the
collected information are listed below:
• Technical systems must be built according to the professional individual's needs
and domain-specific knowledge.
• In future developments, a focal point should be the professional environment
and the cognitive aspects.
• Increasing the situation awareness of future controllers is of high importance.
• A real-time plan should be developed which can be adjusted and is available for all
agents to see, so that all agents feel "in the loop".
• The real-time plan should execute automatically without changing the operator's planned routes.
• The planning interface should contain all information needed, and changes
to the train traffic should be made in the same interface (Andersson et al.,
2015).
Real-time control systems for railways have to be dynamic, safe and precise,
which results in a complex system. The traffic controllers are trained
to handle large amounts of information, and they can overview, interpret and
use an almost unlimited amount of information in real time. Therefore, the interface
was not built to be simple and downscaled but instead to be more efficient and
usable (Andersson et al., 2013).
The complexity is defined by the properties of the information given
by the process that is being controlled (Andersson et al., 2013). The research
group found that interfaces fail if the developer misunderstands the complexity and therefore does not provide the skilled operators with enough tools to
work with (Andersson et al., 2013). Humans need a regular environment
to be able to predict outcomes, and they also have to learn these
regularities by practicing them. When this is achieved, expertise and intuitive
skills are developed. In other words, train controllers have learned to handle intense rush hours and to plan new routes in near real time as new information
flows in.
Consequently, the focus was set on understanding the work process involving tracking
diagrams and panels, and how controllers solve conflicts when they occur (Andersson et al., 2013). A model was introduced to steer the research in the right direction for systems supporting human control of complex, dynamic processes.
The model is called GMOC, an acronym for Goals, Models, Observability and
Controllability (Andersson et al., 2015). These four areas are seen as necessary to take into account when building systems that let human operators achieve
control.
1. Goals
The goal is the basic specification of what the operator will, must, should,
or wants to achieve - a specification of the objectives of the control process.
2. Models
The model includes an understanding of the user's work, fellow employees and the control systems; in other words, their whole work environment.
3. Observability
Observability refers to the ability of the system to produce information
that can be used and interpreted by the operator. It can be visual observation of information, for example through an interface, signal lights, or other digital
instruments.
4. Controllability
Controllability means the ability of the system to provide means for the
operator to gain control over the process. It refers to different ways of
interaction, for example keyboards, buttons, another steering mechanism or voice control (Tschirner, 2015).
From the information collected over the years and from the analysis of
complexity, the research group developed prototypes for the real-time interface.
After evaluation experiments in laboratory environments, the prototypes were
developed into a full-scale system named STEG. It was later implemented
in Norrköping (see figure 1) and Boden, two of the control centres in Sweden.
The implementations in the two control centres resulted in improved support
for the controllers and simplified the re-planning (Andersson et al., 2015).
Before STEG, the controllers used a paper-based time-distance graph for re-planning routes (see figure 2) and for remembering what needs to be done at what
point in time. The paper-based planning placed high demands on the controller's
memory, which resulted in a heavy cognitive workload: during re-planning, the controller drew new timetable lines on a paper sheet and used it
to remember what had to be done at which point in time. At the same time, new information
could appear on the screens or by phone, and because the new paper-based
plan was not integrated into the system, there was a big risk that the different
automatic functions in the system would work against the new plan. The controller
therefore had to stay alert across different systems and remember
the planned changes. With STEG, the planning is done in the software, with
results showing immediately relative to other trains, and the plan is integrated into the
automatic system (see figure 3) (Andersson et al., 2013).
Figure 1. Control center in Norrköping
Figure 2. Paper-based time-distance graph
Figure 3. STEG
2.2 GridRail
After the implementation of STEG in Norrköping and Boden, the research continued, but instead of focusing on the need, the question became why the traffic
controllers actually liked the system. The process of focusing on usability
during all stages of development and onwards, during the system's whole
life cycle, is called user-centered system design (UCSD) (Gulliksen et al., 2003), a process that provides valuable information when
building future systems.
The hypotheses for why the controllers liked STEG came down to the
following:
• Trend lines visualizing the trains' future positions support the train
traffic controller
• The geographic representation in the software system makes it easier for the
traffic controllers to plan their routes
• The ability to re-plan in the interface makes it more usable than paper-based
planning
The ongoing research set out to explore these hypotheses and started with
the first one mentioned above: that the trend lines improved performance.
In order to test just the trend lines, they were isolated from the whole system. A useful tool when testing a complex and dynamic system is to build
a microworld (see sections 3.1 and 3.2) - a simplified model of the real and
complex system STEG.
2.3 Recent evaluation experiments with GridRail
Two experiments have been carried out with the microworld GridRail, which is programmed to
be a simple simulation of the complex system STEG (more about microworlds in sections 3.1 and 3.2). In 2016, Experiment 1 was carried out by Sercan
Caglarca to explore whether the presence of trend lines in a simple dynamic system
accelerates learning and improves performance. This was based on the hypothesis that these trend lines were part of why STEG was perceived as a
good system. The trend lines visualize the history, the present situation and
the future position of the planned traffic (Caglarca, 2015).
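As a rough illustration of what the "future position" part of a trend line conveys, the sketch below projects a train's positions by linear extrapolation from its current position and speed. This is not how GridRail or STEG compute their trend lines; the function name `trend_line`, its parameters and the constant-speed assumption are hypothetical and chosen only for illustration.

```python
def trend_line(position, speed, horizon, step=1.0):
    """Return (time, predicted_position) pairs for one train.

    Hypothetical helper: assumes the train keeps a constant speed,
    which the real systems are not claimed to do.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    points = []
    n_steps = int(horizon / step)
    for i in range(n_steps + 1):
        t = i * step
        # Linear extrapolation: current position plus distance covered after t
        points.append((t, position + speed * t))
    return points


# Example: a train at position 2.0 moving 0.5 units per second,
# projected 4 seconds ahead in 1-second steps.
for t, x in trend_line(position=2.0, speed=0.5, horizon=4.0):
    print(f"t={t:.0f}s  x={x:.1f}")
```

Plotting such points for every train in one diagram would give the kind of prognosis view in which crossing lines hint at a possible conflict.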
Additionally, the experiment examined whether the introduction of a time
target, where the participants were given a finish time to aim at,
would have an effect on the behaviour of the participants. The experiment was
divided into two blocks with 20 attempts in each block. Different instructions
were given depending on the game version - with or without trend lines (Caglarca,
2015).
The first hypothesis was falsified. It took the group with trend lines more
time to complete the task in the first 20 attempts than the group without trend
lines. In the last 20 attempts, the participants without trend lines almost stopped improving their performance, while the participants with trend lines kept improving slightly, so that the performance levels of the two test groups converged (Caglarca,
2015).
For the conditions with and without a target, results showed that if the participants did not have to finish the game within a specific time, there was
no significant difference in performance between the groups with and without
trend lines. However, if the participants were told to aim at 32 seconds for each
round, the participants with trend lines performed significantly worse. Furthermore, a post-hoc comparison showed that the mean time spent in a trial was
higher for participants with trend lines and a time target than for participants without
trend lines and with a time target. The trend lines thus did not help the participants
to solve the puzzle when a target was introduced. Instead, they were associated with
a higher mean time spent on a trial (Caglarca, 2015).
The conclusion was that the presence of trend lines did not improve
performance and slowed down learning among the novice users over 40 trials.
Experimental evidence also confirmed that when trend lines are visible to the
participants, the introduction of a target imposed a heavy cognitive load, which
led to the perception of a more difficult task (Caglarca, 2015).
Caglarca suggested slowing down the trains in order to decrease the difficulty and to see whether the results would then differ. He also
suggested exploring the relation between performance and long periods of practice, where the participants could get two or three days of practice (Caglarca,
2015).
Another recent experiment with GridRail was also carried out by Sercan Caglarca
and later summarised by Axelsson; it built on the previous experiment
described above. This time the experiment focused on how the
performance and learning of novice users are affected by the presence of trend
lines in an interface of a dynamic system, and whether the introduction of a target actually affects user behaviour. The last question was how the
perceived difficulty is affected by the presence of trend lines and a target (Axelsson, 2016).
Participants used GridRail as in Experiment 1, except that a session contained 3 blocks of 20 trials instead of 2, and the participants
returned the next day for a second session of 3 blocks of 20 trials and a month
later for a third session of 1 block of 20 trials (Axelsson, 2016).
The results showed that mean performance times were higher in the condition with both trend lines and a target time compared to the conditions with only trend
lines or only a target time. There was also a significant interaction in perceived
task difficulty when the participants had both the trend lines and
a time target (Axelsson, 2016).
2.4 Future experiments
The cognitive workload was shown to be higher when participants were exposed
to trend lines in the recent experiments. This can be explained by the fact,
mentioned earlier, that humans need to learn regularities by practicing them
in order to achieve expertise or intuitive skills. By having the participants
come back another day, an attempt to increase their expertise was made in the
most recent experiment. However, the participants might not have understood
that the trend lines were something they should actually learn to use, and it is
possible that they did not reflect upon them. The instructions for the participants with trend lines were extended with the following sentence: "In the
top of the game you will see lines representing a prognosis of the trains future
horizontal positions". In a two-page instruction manual, this might have been
overlooked.
One of the most basic rules when it comes to writing a text or designing a
website is that the more important something is, the more prominent it should
be. A larger heading, bolder text or even a distinctive colour are examples of
how a reader can identify and understand what is important in the text. Another important aspect is that people often scan a text and do not always read
it word by word; this might be different if a participant is in a situation where he
or she gets an instruction manual instead of browsing a website. However, we
tend to think that our behaviour is more sensible than it really is (Krug, 2006).
To see whether the participants perform better when actively learning and
noticing the trend lines, the participants will receive instructions in which the
importance of the trend lines is explicitly described. Furthermore, to understand whether the participants actually use the lines when they appear on the
screen, a Think Aloud (TA) study will be conducted, which can add insight
into the intentions and strategy of the participants.
During the analysis, a comparison will be made between what the participants said in the TA recordings and what was seen during the recordings. This might
help to shed light on previous experiments, where the analysis was made with
an interview as its foundation, and also to understand more about the perceived
cognitive difficulty and the actual difficulty.
3. Theory
3.1 Dynamic Decision Making
Dynamic decision making means that the choices made are based on a series of earlier
decisions. In turn, the decisions depend on a changing world, which forces
the choices to be made in real time and without a long period
of deliberation (Brehmer, 1993, 211). A decision is thus based on earlier decisions, and a series of decisions is needed to reach a goal. In other words,
the most recent choice made will constrain the decisions that follow, and
the state of the problem will depend on the decision maker's actions in real
time (Edwards, 1963).
Dynamic decision making can be studied in the field, where a real system is in use, such as a train traffic control centre. However, the data collection
is time consuming, and the data gathering itself can be dangerous because
it requires some level of intervention in the usual work process, whose effects on the real-time planning of trains cannot be foreseen (Logie, 2011).
When studying dynamic decision making in an isolated part of a program,
using a microworld is helpful (Brehmer, 1992). The term microworld was introduced by Seymour Papert and refers to a "subset of reality or a constructed
reality so as to allow a human learner to exercise particular powerful ideas or
intellectual skills" (Papert, 1980). The idea of microworlds thus emerged from a
learning perspective. Nowadays, however, microworlds are used in the evaluation
of decision making and in usability studies. A microworld can be described
as a programmed simulation of an environment which enables a dynamic process where the user manipulates and explores the program.
3.2 Microworld
The use of microworlds fits the purpose of studying decision making in a complex environment like train traffic control. With a microworld study, the inclusion of dynamic dimensions, stress and real-time decision making is feasible,
because the main purpose of a microworld is to simulate dynamic
decision making, so by nature it has to have a level of complexity. Microworlds are designed so that the feedback does not give a one hundred percent clear response
- the user has to search, interpret and then decide what the next step should be.
Also, the feedback is not given immediately, which characterises a complex system
where the user has to use a feedforward strategy (Brehmer, 1992).
The progress of the simulation is therefore not to be controlled by the time it
takes for the user to process and think, instead it should be an ongoing process.
Four conditions should be remembered when building a microworld. Namely,
there has to be a goal with using the system, the user must understand in which
state the system is at and the system itself has to be interactive so the user is
able to change the state, with various actions. Furthermore, the microworld
must be a model of a system or at least that the controller should behave as
if he or she have some kind of familiarity with the system and its controls
(Brehmer, 1992).
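The requirement that the simulation keeps running regardless of how long the user thinks can be sketched as a fixed-tick loop. The following is a minimal illustration only, not GridRail's implementation: the single train, the function `run_microworld` and the `actions_by_tick` mapping are hypothetical names introduced for this sketch.

```python
def run_microworld(goal, actions_by_tick, max_ticks=100):
    """Advance a one-train world tick by tick, whether or not the user acts.

    goal: distance the train has to cover (the goal of using the system).
    actions_by_tick: hypothetical user input, mapping a tick to a new speed.
    Returns the number of ticks needed to reach the goal, or None.
    """
    position, speed = 0.0, 0.0
    for tick in range(max_ticks):
        # The user can change the state, but the clock never waits for them.
        speed = actions_by_tick.get(tick, speed)
        position += speed
        if position >= goal:
            return tick + 1
    return None


# The user starts the train at tick 2 and speeds it up at tick 5.
ticks_needed = run_microworld(goal=10.0, actions_by_tick={2: 1.0, 5: 2.0})
# A user who never acts never reaches the goal within the time limit.
ticks_idle = run_microworld(goal=10.0, actions_by_tick={})
```

The two ticks the user spends before acting count towards the result, which is the point of an ongoing process: thinking time is part of the task.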
3.3 Think Aloud
Different methods can be chosen for an evaluation experiment. One of the
most used methods is the Think Aloud (TA) method. TA testing will not
prove whether one interface is better than another but instead provides input
on why the interface is or is not preferred. When we are using an interface,
anything can make us stop and think, even if we do not reflect upon it, such as
colouring or clever names. In general, people do not like to be disturbed by
having to think about the interface when they want to complete a task (Krug, 2006). By
using a TA we want to capture those kinds of sudden and unexpected thoughts
(Krug, 2006).
There are two types of TA: Retrospective and Concurrent. In the latter the
participants are asked to talk aloud while completing a task or a set of tasks;
this type enables an understanding of what participants think about the interface and how the process of using the trend lines plays out. In the Retrospective approach, the participant instead does the task in silence. Only afterwards can the participant comment
on their experience of the interface, which can provide additional insight into
the intentions and strategy of the participant. To summarise: the Retrospective
type is focused on why the participant used the interface in a specific way and
the Concurrent type is focused on how the participant uses the interface (Hanington and Martin, 180). The purpose here is to extract information on how
the students use GridRail and what they actually think of and use as a tool when
solving the task.
Furthermore, a good reason for choosing a TA rather than only an interview after the participants have used GridRail is that humans are good at rationalizing. This can be expressed in making up convincing reasons for their
behaviour after the event, presumably making use of theories about what is
appropriate. If parts of the task are done unconsciously, they are not
available for report later on. The person reporting may say what he genuinely
thought he did, but this may not be what he actually did (Bainbridge, 1999).
In order for the TA experiment to give results, the observed performance
must first be translated into data and then analysed. In order to use different
analysis tools, the data must be divided into soft and hard data. Hard data is
defined as data in the form of numbers or graphs and can therefore be used in
various statistical analyses. Soft data, on the other hand, can be
interpretations and opinions. In this thesis we are more interested in hard data,
and in order to turn a Think Aloud into hard data the recordings should be
translated into a transcript (Ericsson and Simon, 1993).
3.3.1 Pre-processing
Words that are repetitions and sounds of stress are eliminated in the transcription. This step is called pre-processing. When the transcription is done, the
next step is to encode the text by translating it into the terminology of the theoretical model used. The terminology is determined beforehand, and the
translation is then done by a human who judges each segment independently of
the surrounding segments (Ericsson and Simon, 1993).
During the translation each phrase can be analysed by content analysis.
In this stage sentences are categorised by type and then the number of
phrases in each category is counted. The categories of phrase types can be
based on the referents of content words in the phrases, their syntax or the implied cognitive processes (Bainbridge, 1999).
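The counting step of a content analysis can be sketched in a few lines. This is only an illustration: in the method described above the coding is done by a human judge, so the keyword matching below is a stand-in, and the category names and keywords in `CATEGORY_KEYWORDS` are assumptions made for the example, not the coding scheme of this thesis.

```python
from collections import Counter

# Hypothetical coding scheme: categories and keywords are illustrative only.
CATEGORY_KEYWORDS = {
    "trend_lines": ("line", "intersect"),
    "stress": ("stress", "tired"),
    "strategy": ("strategy", "tactic", "plan"),
}


def categorise(phrase):
    """Return the first category whose keyword occurs in the phrase."""
    text = phrase.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def count_categories(phrases):
    """Code each phrase independently of its neighbours and count per category."""
    return Counter(categorise(phrase) for phrase in phrases)


phrases = [
    "I'm just thinking of the lines and where they intersect",
    "This gets so stressful if you have all the trains",
    "A little bit better but still bad strategy",
    "This was quicker!",
]
counts = count_categories(phrases)
```

Each phrase is coded without looking at the surrounding phrases, which mirrors the requirement that segments are judged independently.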
3.3.2 Basic assumptions for the Think Aloud
• Verbal behaviour is a recordable behaviour
• The cognitive processes that generate verbalisations are a subset of the cognitive processes that generate any kind of recordable response or behaviour
• The participant's behaviour can be viewed as a search through a problem
space, accumulating knowledge about the problem situation as he or she
goes on.
• Each step in the search involves the application of an operator to knowledge
held by the participant. Application of the operator brings new knowledge,
moving the subject to a new point in the problem space.
• The verbalisations of the subject correspond to some part of the information
he or she is currently holding and usually to information that has recently
been acquired.
• The information consists primarily of knowledge required as inputs to the
operators, new knowledge produced by operators and symbols representing
active goals and sub goals (Ericsson and Simon, 1993).
3.3.3 Problems with Think Aloud
The participant may not report what is obvious to him. He might collect unmentioned information while reporting other activities. Also, the participants
might use beginner's methods or do things in sequence rather than doing
several things at the same time, because it is easier to describe (Bainbridge,
1999). Furthermore, no matter if the thoughts are verbal or non-verbal, when
someone is reporting, he can choose what to make public. And because participants want to help, they often say what they think the experimenter wants to
hear instead of what they actually think. Just because something is not mentioned in the protocol does not prove that the operator does not know it. For
this reason, it is good to follow up the Think Aloud with an interview (Bainbridge, 1999).
4. Methods
Every piece of research based on usability is part of the ongoing project of
understanding users of complex systems and how to build better systems (Kuniavsky, 2012). The future users may be train controllers or strangers who
have never used complex systems, but regardless of the user, the more experiments we conduct, the bigger our knowledge bank will be.
4.1 Participants
For better comparison between experiments, the differences in the participants' demographics should be small. Earlier experiments were run with
participants from Uppsala University's technical programs and PhD students
with an average age around 25 (Sercan, 2015, 45). Table 1 shows the requirements. The preference for participants who have taken a course in HCI is based
on the choice of method. A TA study is only successful if the participants do
not forget to talk aloud while conducting the experiment. People who are familiar with these types of studies might feel more comfortable during a TA
and might understand that they should think aloud rather than explain what
they do.
Table 1. Demographics for recruiting
Demographics                          Preference
Ages                                  20-35
Gender                                AX
University education                  Ongoing
Preference (not compulsory)           People who have taken at least one HCI course
Experience with GridRail              None
Experience with computer products     High; at least two technical courses
Targeting single or multiple group    Single group
Undesirable characteristics           Experience with GridRail
4.2 Design
The intention of the experiment is to understand whether or not the trend lines
help the users to plan the train routes in the most efficient way. Earlier
research (see part 2.1) tracked the time it took for each participant to complete
each trial and compared trial times between the two groups, with and without
lines. The results showed that the lines did not make the participants finish
the trials faster than the participants without lines. To further investigate whether
the lines help to decrease the cognitive workload, these experiments will not
focus only on the time itself. Instead, the participants' expressions and reactions
during the game play, and how they plan and drive the trains, will be included.
The expressions and reactions are most easily collected via a Think Aloud (see
section 3.3), because the thoughts are produced spontaneously during the game
and collecting them afterwards, for example via a survey, would not give the
precise content. To collect data about how the trains are driven, the number
of collisions will be counted. Participants who play the game with the
intersection lines may subconsciously play the game so
that the trains do not collide, because the idea of the intersection lines is to
show where the trains intersect and thereby help avoid collisions. A conceivable
scenario is that the participants without lines might speed up the trains to
finish the game quicker, without considering that the trains should not collide. Therefore, collisions will be counted, and the instructions for
both groups will state explicitly that the goal is to finish the game as
quickly as possible without collisions. This also increases the chance that the
complexity is high for both groups.
During the experiments only one evaluator will be present in the room
together with the participant. More evaluators could affect the participant, and
we want to examine how the participant solves the game and behaves as if he
or she were in his or her own home. In other words, having only one evaluator could make the participants more relaxed. In order to compare the outcomes
between the different participants, a controlled setting, or at least similar, isolated rooms, is needed. This ensures that outside influences do not affect the
experiments differently. Therefore, one room will be booked for all the experiments and the setting will look similar for each participant (see figures 6 and
7). There will be two blocks of 30 min each, distributed over two different
occasions (see tables 2 and 3). During the second block the TA session and
a structured interview will be conducted. Half of the participants will have
explicit information about how to use trend lines.
Table 2. Independent variables
                  Lines             No lines
TA in Block 1     8 participants    8 participants
TA in Block 2     8 participants    8 participants
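A balanced allocation of 32 participants over the four cells of Table 2 can be sketched as follows. The thesis does not state how participants were allocated, so the random shuffling, the function `assign_conditions` and the fixed seed are assumptions made purely for illustration.

```python
import random
from collections import Counter
from itertools import product

LINE_CONDITIONS = ("Lines", "No lines")
TA_BLOCKS = ("TA in Block 1", "TA in Block 2")


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to the four cells in equal numbers."""
    cells = list(product(LINE_CONDITIONS, TA_BLOCKS))
    if len(participant_ids) % len(cells) != 0:
        raise ValueError("participants cannot be split evenly over the cells")
    per_cell = len(participant_ids) // len(cells)
    # One slot per participant, with every cell repeated equally often.
    slots = [cell for cell in cells for _ in range(per_cell)]
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


assignment = assign_conditions(list(range(1, 33)))
cell_sizes = Counter(assignment.values())
```

With 32 participant IDs every cell receives 8 participants, matching the layout of Table 2.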
Table 3. Dependent variables
No.    Collected data
1      Time per trial
2      Number of collisions per trial
3      Think Aloud transcript
4.3 Pilot study
In order to test whether the experiment setup was feasible, a pilot study was conducted with the same game construction that was used in earlier research
(see part 1.2), but with an increased number of trials, divided into two blocks
with 20 trials per block and 40 trials in total. The participants, both with and
without lines, remarked that the trains were moving slowly. The slow trains
could be an obstacle to detecting whether or not the lines decrease the cognitive workload. The idea of using microworlds (as seen in part 3.2)
is to evaluate a complex system. Therefore, we increased the speeds of the
trains in proportion to each other, to be sure to challenge the participants so that
the cognitive ability of the groups could be compared later.
4.4 Material
For this experiment a MacBook Pro (Retina, 13-inch, Late 2012) was used
together with a complementary mouse. A computer was necessary in order to
use the microworld GridRail.
GridRail is software for testing how participants react or change their
behaviour based on trend lines. Figures 4 and 5 below show the interface of
GridRail.
Figure 4. GridRail interface
Figure 5. GridRail interface
4.5 Procedure
All the participants were given a time slot of 2 hours for the experiment, which took place
in the same room (see figures 6 and 7). In order to have comparable results, the
input from the surroundings should be similar. The evaluator is one outside factor
that influences the behaviour of the participant. To decrease this influence,
the evaluator followed a manuscript. The only conversation that was
ad hoc was the answers to the participants' questions, which naturally
differed in each situation. See table 4 for the manuscript. The
part in the middle was said before the block with verbalisation. For half of the
participants this was done after block 1, and for the other half the text was read
in one sequence.
Based on the participant's line condition, an instruction for GridRail was
read after the first manuscript (see appendices 1 and 2).
Before the start of GridRail, a survey was conducted on the screen to determine demographic segmentation and technology use. This gives the possibility
to exclude participants in order to make the group more homogeneous. After filling in
the question sheet, the first trial began.
In order to see the screen but still be out of sight, the evaluator sat behind
the participant, taking notes. When 20 trials were over, a text saying that
the participant had a 5 min break was shown in GridRail. The participant was
then told that he or she could go to the toilet or just stand up, and that they
could start when they were ready.
When the participants came back from the break, they were either told
that they did not have to verbalise in this block or they were read the verbalisation instructions (see table 4).
Table 4. Manuscript for the experiment
Welcome
This experiment will be held in English - I hope it’s ok with you.
Please take a seat in front of the computer.
Follow the text while I read the instructions.
With your permission, we’re going to record what you do on the
computer screen and what you have to say. The screen recording will
be used only to help us in the experiment because I don’t have to
take as many notes. If you would, I’m going to ask you to sign
something for us. It simply says that we have your permission to
record you.
(Say this before TA-block)
This session we want to hear exactly what you do,
so please talk aloud while you think; don't worry that you're going
to say anything wrong. As we go along, I'm going to ask you to talk
out loud, to tell me what's going through your mind. This will help
us.
If you have questions, just ask. I will not be able to answer them
right away, since we’re interested in how people do when they don’t
have someone sitting next to them, but I will try to answer any
questions you still have when we’re done.
Do you have any questions before we begin?
To demonstrate how to talk during a Talk Aloud I will talk aloud
while counting the windows in my mother's home.
Now it is your turn, please talk aloud
while mentioning one country in each continent.
Good, now let's begin!
Figure 6. Settings for the experiment
Figure 7. Settings for the experiment
5. Results
5.1 Learning curves
Figure 8 contains four graphs. The y-axis shows the time measured in
seconds and the x-axis shows each trial. The first picture in each row represents the 20 trials in which the participants verbalised; the second picture in
the same row represents the block in which there was no verbalisation.
Figure 8. Time - Interaction between all the variables
In other words, in the first row the trials go from 1-20 and then 21-40, and in
the second row 21-40 and 1-20. The red lines represent participants with the
trend line condition and the blue lines represent participants without the trend
line condition.
5.2 Effects of trend lines
This section includes figures and statistical analyses based on the research
question of whether trend lines in GridRail help the users. Results are
divided based on the mean time it took for the participants to finish each trial
and the mean time the participants thought before each trial.
5.2.1 Boxplot - Mean time per trial
The boxplot in figure 9 shows the mean time, divided between participants
that played with and without lines, to visualize the difference. The statistical
difference is shown in table 6, and table 5 shows the boxplot in numbers.
Figure 9. Time - Interaction between all the variables
Table 5. Box plot (mean time per trial in seconds).
Variable    No Lines 1-20    No Lines 21-40    Lines 1-20    Lines 21-40
Mean        119.700          87.917            110.485       89.721
Minimum     78.604           74.154            87.986        73.182
Median      101.858          87.116            101.378       91.387
Maximum     303.540          128.806           208.230       103.933
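The four summary values reported for each box plot (mean, minimum, median and maximum) can be computed with Python's standard `statistics` module. The sketch below uses made-up times, since the raw per-participant data are not part of this text; `boxplot_summary` and `example_times` are names introduced only for this example.

```python
from statistics import mean, median


def boxplot_summary(times):
    """Return the summary values listed for each condition in the box plot table."""
    return {
        "mean": mean(times),
        "minimum": min(times),
        "median": median(times),
        "maximum": max(times),
    }


# Made-up mean times per trial in seconds, not data from the experiment.
example_times = [80.0, 95.5, 101.0, 120.5, 150.0]
summary = boxplot_summary(example_times)
```

A single slow value raises the mean and the maximum while leaving the median almost unchanged, which is why reporting the median next to the mean is informative for skewed completion times.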
5.2.2 Boxplot - Mean time before each trial
The boxplot in figure 10 shows how long each participant waited until starting
the next trial - this is assumed to be a time where the participants think and
plan their next trial.
Figure 10. Planning time before each trial
5.2.3 ANOVA for Mean time
Table 6 shows the result of the multi-way analysis of variance (ANOVA). The ANOVA
was conducted with performance time as the dependent variable, and the
factors were Block (whether the participant started with verbalisation or not), Verbal,
Trial and Lines (whether the participant had trend lines or not). The decision criterion of
5 percent is used for the analysis.
With a multi-way ANOVA for the mean time we
can see the variation within and between variables. This helps us
understand if and how the intersection lines affected the mean time outcome
of the trials. With several variables involved, it is important to separate the
impact of the lines from the other independent variables.
Table 6. ANOVA for mean time.
Effect                      SS        Degr. of Freedom    MS        F         p
(1)Lines                    0         1                   0         0.000     0.998052
(2)Block                    23709     1                   23709     2.902     0.099523
Lines*Block                 4004      1                   4004      0.490     0.489642
Error                       228727    28                  8169
(3)Verbal                   24608     1                   24608     7.310     0.011525
Verbal*Lines                2549      1                   2549      0.757     0.391560
Verbal*Block                164518    1                   164518    48.873    0.000000
Verbal*Lines*Block          1167      1                   1167      0.347     0.560636
Error                       94254     28                  3366
(4)Trial                    305227    19                  16065     16.608    0.000000
Trial*Lines                 28297     19                  1489      1.540     0.067102
Trial*Block                 24766     19                  1303      1.348     0.147693
Trial*Lines*Block           35625     19                  1875      1.938     0.010044
Error                       514594    532                 967
Verbal*Trial                30158     19                  1587      1.882     0.013361
Verbal*Trial*Lines          17920     19                  943       1.118     0.327615
Verbal*Trial*Block          198045    19                  10423     12.360    0.000000
Verbal*Trial*Lines*Block    43679     19                  2299      2.726     0.000122
Error                       448637    532                 843
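The MS and F columns of Table 6 follow from the SS and degrees-of-freedom columns: a mean square is SS divided by its degrees of freedom, and F is the mean square of an effect divided by the mean square of its error term. The sketch below recomputes three F values from the table. The pairing of each effect with the Error row that follows it in the table is an assumption, although it does reproduce the reported values; the function names are introduced only for this example.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """MS = SS / df."""
    return sum_of_squares / degrees_of_freedom


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = MS of the effect divided by MS of its error term."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_error, df_error)


# SS and df values copied from Table 6; error pairing assumed as described.
f_block = f_ratio(23709, 1, 228727, 28)      # reported F: 2.902
f_verbal = f_ratio(24608, 1, 94254, 28)      # reported F: 7.310
f_trial = f_ratio(305227, 19, 514594, 532)   # reported F: 16.608
```

Turning an F value into a p-value additionally requires the F distribution with the matching degrees of freedom, which is not part of the Python standard library and is therefore left out of this sketch.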
Explanation of variables
Verbalisation: Mixes both blocks but groups the trials based on whether there was verbalisation during those trials or not.
Block: Mixes trials with and without verbalisation but groups the trials
depending on whether the participants start or end with verbalisation.
Trial: There are 20 trials per block and hence the trials go from 1-20.
Lines: This condition separates the trials which have trend lines present
from those trials without trend lines.
Results from table 6
Main effect
The results show no significant main effect for the variables Lines or Block (p=0.998, p=0.100).
However, there is a significant main effect for the variables Verbal
and Trial (p=0.012, p<0.001).
Two-interaction effect
There are no significant interaction effects between Lines and Block (p=0.490), Verbal and Lines (p=0.392), Trial and Lines (p=0.067) or between Trial and
Block (p=0.148).
Interaction effects are found between the following variables: Verbal and
Block (p<0.001) and Verbal and Trial (p=0.013).
Three-interaction effect
There is no significant effect between Verbal, Lines and Block (p=0.561) or
between Verbal, Trial and Lines (p=0.328).
However, effects are seen between Trial, Lines and Block (p=0.010) and
between Verbal, Trial and Block (p<0.001).
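Applying the 5 percent decision criterion to the main effects and two-way interactions amounts to comparing each p-value in Table 6 with 0.05. The following short sketch does this for the values copied from the table; the names `ALPHA`, `p_values` and `significant` are introduced only for this example.

```python
ALPHA = 0.05  # decision criterion of 5 percent

# p-values copied from Table 6 (main effects and two-way interactions).
p_values = {
    "Lines": 0.998052,
    "Block": 0.099523,
    "Verbal": 0.011525,
    "Trial": 0.000000,
    "Lines*Block": 0.489642,
    "Verbal*Lines": 0.391560,
    "Verbal*Block": 0.000000,
    "Trial*Lines": 0.067102,
    "Trial*Block": 0.147693,
    "Verbal*Trial": 0.013361,
}

# Effects whose p-value falls below the criterion, in alphabetical order.
significant = sorted(effect for effect, p in p_values.items() if p < ALPHA)
```

The value 0.000000 is how the table prints p-values smaller than its six-decimal resolution, not an exact zero.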
5.3 Analyses of collisions
In figure 11 we can see a difference in collisions between the line conditions over the trials. Figures 12, 13 and 14
are different ways of
representing the mean collisions per trial with respect to whether the participants had
trend lines or not. Figure 12 separates the trials with respect to
verbalisation and order; there we can see that the participants who verbalised
in block 2 have a lower collision rate when playing GridRail with trend lines.
In figures 12 and 14 we can see that the participants with lines started
and ended with a lower mean time, except in block 1 when participants start
with verbalisation.
The multi-way analysis of variance (ANOVA) for collisions gives the
variation within and between variables. The decision criterion of 5
percent is used for the analysis.
The ANOVA for collisions was done to see whether the impact of
trend lines on collisions during GridRail is significant. There is no significant
effect when looking at the trials overall.
However, we conducted another ANOVA, see table 8, because of the visual
difference in collisions (visualized in figure 12), and the results show a
statistically significant impact of lines on collisions when playing in block 2.
5.3.1 Division by lines
Figure 11. Collision Mean per trial
5.3.2 Division by verbalisation
Figures with Order 1 show the first trials (1-20) and Order 2 shows the last trials (21-40). The results are then divided into two columns, where Verb 1 shows
data from participants who verbalised in the first block and the other column,
Verb 2, shows data from participants who verbalised in block 2.
Figure 12. Collisions - Interaction between all the variables
5.3.3 Participants without trend line condition
Figure 13. Collisions - Trials without trend lines
5.3.4 Participants with trend line condition
Figure 14. Collisions - Trials with trend lines
5.3.5 ANOVA for collisions
Table 7. ANOVA for collisions.
Effect                   Degr. of Freedom    MS        F        p
Lines                    1                   153.32    1.920    0.177
Block                    1                   3.94      0.049    0.826
Lines*Block              1                   197.66    2.476    0.127
Verbal                   1                   21.788    1.826    0.187
Line*Verbal              1                   5.126     0.430    0.518
Order*Verbal             1                   0.413     0.035    0.854
Line*Order*Verbal        1                   9.976     0.836    0.368
Trial                    19                  3.162     1.007    0.451
Line*Trial               19                  2.153     0.686    0.835
Order*Trial              19                  1.943     0.619    0.893
Line*Order*Trial         19                  1.951     0.621    0.891
Verb*Trial               19                  3.247     1.065    0.384
Line*Verb*Trial          19                  2.868     0.941    0.532
Order*Verb*Trial         19                  2.879     0.945    0.527
Line*Order*Verb*Trial    19                  2.573     0.844    0.654
5.3.6 ANOVA for collisions in Order 2
Table 8. ANOVA for collisions.
Summary          1 Block    2 Block    Total
Without Lines
  Count          20         20         40
  Sum            51.125     63.125     114.25
  Average        2.556      3.156      2.856
  Variance       0.398      0.422      0.492
With Lines
  Count          20         20         40
  Sum            27.625     27.5       55.125
  Average        1.381      1.375      1.378
  Variance       0.132      0.169      0.147
Total
  Count          40         40
  Sum            78.75      90.625
  Average        1.968      2.265
  Variance       0.612      1.101

ANOVA
Source of Variation    SS        df    MS        F          p-value      F-crit
Sample                 43.697    1     43.697    155.634    4.489E-20    3.966
Columns                1.762     1     1.762     6.278      1.436E-2     3.966
Interaction            1.837     1     1.837     6.545      1.250E-2     3.966
Within                 21.338    76    0.280
Total                  68.635    79
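The Sample, Columns and Interaction sums of squares in Table 8 can be recomputed from the four cell sums in the summary part of the table. The sketch below assumes that "Sample" corresponds to the lines condition (rows) and "Columns" to the block, which is consistent with the reported sums; the function `two_factor_ss` and its argument names are introduced only for this example.

```python
# Cell sums copied from Table 8; each cell holds 20 values.
N_PER_CELL = 20
cell_sums = {
    ("Without Lines", "Block 1"): 51.125,
    ("Without Lines", "Block 2"): 63.125,
    ("With Lines", "Block 1"): 27.625,
    ("With Lines", "Block 2"): 27.5,
}


def two_factor_ss(sums, n):
    """Return (SS rows, SS columns, SS interaction) for a balanced design."""
    rows = sorted({row for row, _ in sums})
    cols = sorted({col for _, col in sums})
    grand_mean = sum(sums.values()) / (n * len(sums))
    row_n = n * len(cols)  # observations per row level
    col_n = n * len(rows)  # observations per column level
    ss_rows = sum(
        row_n * (sum(sums[(r, c)] for c in cols) / row_n - grand_mean) ** 2
        for r in rows
    )
    ss_cols = sum(
        col_n * (sum(sums[(r, c)] for r in rows) / col_n - grand_mean) ** 2
        for c in cols
    )
    # Variation between the cell means, of which the interaction is the
    # part not explained by the two main effects.
    ss_cells = sum(n * (s / n - grand_mean) ** 2 for s in sums.values())
    return ss_rows, ss_cols, ss_cells - ss_rows - ss_cols


ss_lines, ss_block, ss_interaction = two_factor_ss(cell_sums, N_PER_CELL)
```

Dividing each of these by its single degree of freedom and by the Within mean square (0.280 in the table) gives the F column.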
5.4 Effects of verbalisations
Based on the research question of whether participants can describe how they solved
the task and give input on whether the trend lines are helpful, the following results
are shown. The figures show how verbalisation affects the mean
time and what participants said during the verbalisation.
5.4.1 Mean time based on verbalisation
In figure 15 we can see the spread of mean time per trial based on the variable
Verbal. The variable is divided into two groups: trials which are played during verbalisation (1) and trials without verbalisation (2). This visualisation,
together with the ANOVA in table 6, shows how verbalisation affects the mean time.
What can be read from figure 15 is that the trials in which the participants
verbalised have both a higher maximum and a lower minimum than the trials
in which there was no verbalisation. Overall, the trials without verbalisation
had a lower mean time.
Figure 15. Mean time grouped by verbalisation condition
5.4.2 Variable interaction: Verbal and Order
In figure 16 we can see that participants finish the game faster in block 2 regardless of when they verbalise, but the variation within the same group is
bigger when the group is verbalising.
The mean time for all trials in block 2 is similar regardless of whether one starts
or ends with verbalisation. The fastest time in each group is similar for
the trials in block 1; however, the group that is verbalising in the second block
reaches a faster time.
Order 1 & Verbal 1: Block 1 trials for participants who verbalised in block 1.
Order 2 & Verbal 1: Block 2 trials for participants who verbalised in block 1.
Order 1 & Verbal 2: Block 1 trials for participants who verbalised in block 2.
Order 2 & Verbal 2: Block 2 trials for participants who verbalised in block 2.
Figure 16 shows the interaction between the order of the verbalisation, the
verbalisation and how it affects the mean time per trial.
Figure 16. Interaction between Verbalisation and Order
5.5 Analyses of verbalisations
The notes taken during the experiment are divided by both verbalisation and
trend lines and give information about the number of trains used at the same
time. This indicates a difference in how GridRail is played
depending on trend line condition.
Table 9, which is obtained from the notes, indicates a difference in how participants play GridRail based on how many trains they use at the same time.
Reading the verbalisation in table 11, we can see that participants express
that they feel stressed in all four conditions.
5.5.1 Summation from table 11
Comments about the trend lines:
• Now I’m going to think more about to look at the collision lines, they are
hard to look for but very helpful
• Now I’m totally looking at the lines so much more than previously.
• I’m thinking that I should use the lines more but I’m stressed because I don’t
want to collide and get a better time
• I’m just thinking of the lines and where they intersect
• You see directly when the trains are going to collide and then you also try to
buy some time
• It's hard now to align (the trend lines) and when I have more trains it get
problems
Positive comments from participants with lines:
• And the nice thing with this strategy is that I save the fast train last
• It's nice when you don't have to think about all the things at the same time
• You get happy when you see the time because you can finish the game very
quickly.
• I’m getting closer to my best time so it makes me happy.
• This feels better timewise.
• That feels quite good timewise.
• Now I’m going to think more about to look at the collision lines, they are
hard to look for but very helpful.
Positive comments from participants without lines:
• That is awesome.
• Oh no that's amazing. Now it's quite easy because I can move them all
straight forward.
• I think it was quite good how I did. Why not, it worked this time.
• This was quicker!
• That was good, I think I'm improving. I stopped thinking
• This was pretty good
5.5.2 Breaking down numbers from notes
The data in table 9 is taken from table 10.
Table 9. Numbers from Notes.
Condition        Driving multiple trains    Plans for a long time
No lines VB 1    3                          3
No lines VB 2    5                          5
Lines VB 1       7                          2
Lines VB 2       5                          2
5.5.3 Notes from experiment by trend line condition and verbalisation order
Table 10. Notes.
Notes
No lines VB 1
Has difficulties driving the trains and focuses on rules
Driving with multiple trains but not succeeding in steering without colliding
First just one train and then, almost in block 2, the participant changes to multiple trains
Thinks a lot before every trial but still holds on to one tactic
One train at a time, does not run smoothly
Thinks a lot before every trial and reads instructions frequently
Many trains but collides a lot, changes tactic to driving 1-2 trains with lower speed
Drives slowly but still collides a lot
No lines VB 2
Focuses mostly on speeds rather than lines
Uses one strategy, mostly drives with two trains
Thinks a lot before every trial, has different strategies
Multiple trains. Plans for a long while and tries to measure actual distance on the screen
Many trains at the same time. Thinks a lot before. Does different tests within one trial
Many trains. Tries to calculate distances and speed. Thinks a lot before each trial
Many trains but gets stuck in trying to get trains into one intersection. Many collisions
Starts with more than 3 trains, thinks a lot. Draws a plan on paper.
Lines VB 1
Uses multiple trains
New tactic in block 2
First one train at a time, understands the lines in the middle of block 2
New strategies and drives multiple trains at the same time
Many trains at the same time. Thinks a lot before each trial.
First just one train at a time and then multiple trains, tries to calculate speed
Many trains at the same time (sometimes more than 4), new strategies
Starts with 1-2 trains and then drives 3 trains at the same time in block 2. New strategies
Lines VB 2
Testing a lot without a special tactic
Uses more than 2 trains at the same time, changing tactics a lot
Many trains at the same time and new strategies
More than 2 trains at the same time, slowly, and does not speed up after aligning the lines
Many trains at the same time. Hardly any collisions. Forgets one train many times.
Drives few (mostly 1-2) trains at the same time, thinks a lot before each trial
Many trains. Thinks a lot before each trial. Tries to calculate distance on paper
Drives only 1-2 trains at the same time. Confuses direction of train
5.5.4 Verbalisation transcript
Table 11. Verbalisation transcript.
ID  Verbalisation
Lines and Verbalisation in block 1
2  Why is it stopping. I keep forgetting.
2  Whoops that’s going to crash. That’s not good.
5  Not fast enough for the controls. It’s hard to get it right.
5  I could definitely done that one better.
5  It’s hard to control, I want to go with multiple trains and it’s hard to
5  control all the trains at the same time
5  This did not work out at all. I don’t learn from my mistakes.
5  You should not be tired when you do this.
5  I want to make a better time but when I try it I keep on crashing.
12  It’s hard to focus the game pulls you in and it’s hard to multitask
12  We are not really made for it. It’s all about moving multiple trains at the same time.
12  I totally see this being used in education.
12  Where their job consists of making the train move without any collision happen
14  I guess I failed.
14  I’m thinking to try have them moving at the same time and then just pass each other.
14  I had problems.
18  The average time is sinking but I’m trying to get it
18  I’m starting to get tired. A little bit better but still bad strategy
18  I’m trying the "meeting in two places" strategy but it does not get any better.
22  I’m trying to take in consideration the length of the road and the speed of the train
22  I don’t know how to do this because I don’t know the exact distance. It’s mostly my intuition
25  I’m thinking that my goal is to just click the trains once and not have to stop them at all.
25  No I’m going to think more about to look more at the collisions lines
25  they are hard to look for but very helpful.
Lines and Verbalisation in block 2
4  And the nice thing with this strategy is that I save the fast train last
4  It’s nice when you don’t have to think about all the things at the same time
4  It’s hard now to align. When I have more trains it get problems,
4  I’m trying to not do too many stuff simultaneously
4  because you forget easily if two trains collide
4  so I’m trying to do as easy as you want for yourself and at the same time.
4  It gets harder when I have the fastest and if..
4  I only have two trains to think about it’s a low risk.
4  You see directly when the trains are going to collide and..
4  then you also try to buy some time
7  Okay it’s bad
7  Okay I should not have done that. That did not work..
7  changing direction. Still a long waiting.
7  Maybe this was too much
9  It’s so hard to control them all if you move many but you
9  still want to use many to get a lower time
9  This gets so stressful if you have all the trains.
9  This was too much clicking.
9  It’s always the same speed even though I think that I am succeeding.
13  Sometimes it’s easy to forget that you need an empty spot..
13  in the parking so you can fit another train.
13  But once you’ve solved that it’s pretty okay.
13  Sometimes when I think that there were passengers in the trains
13  I think they would get sick because I speed it up and then slow it down.
13  It’s so weird because it’s only a game but it’s stressful
13  to see where the trains will can see collide.
13  You get happy when you see the time because you can finish the game very quickly.
13  I’m getting closer to my best time so it makes me happy.
13  I want to train many trains simultaneously but if they are too close and
13  something happens then you don’t have the time to stop them.
13  And they will collide with each other even though they travel at the same direction.
13  It’s like every time I get a pretty fast time I get too excited
13  and then I overthink so I forget one and make stupid mistakes.
19  That feels quite good timewise.
19  I don’t know how to speed up and get as many trains at the same
19  time as possible. This is going to be bad
19  This feels better timewise.
19  I’m too stressed, I’m doing stupid mistakes when I’m stressed.
19  I’m getting annoyed with myself
23  but it might be a bad choice.
23  It’s not the worse decision I made
23  I always end up with the slowest one
23  so what I want to go for is the average speed meets the slowest one to meet
23  but it is really hard to make that really happen as smooth as possible,
23  Didn’t make much of a difference
28  Now I’m only thinking move faster, go faster and I’m already thinking about my next trial.
32  Now I’m totally looking at the lines so much more than previously.
32  I’m thinking of my time constantly because I want to be better.
32  I’m thinking that I should use the lines more
32  but I’m stressed because I don’t want to collide and get a better time
32  I’m just thinking of the lines and where they intersect,
32  even though my time is not the best I like it more
32  because they give me more control over the situation and I like that.
32  I was too fast in my mind.
32  I’m going to go with more trains and see what happens.
32  It’s easier if only two trains are going against each other
32  I know that I have to change strategy but I don’t know how.
32  Everything was going so good and then I forgot everything.
No Lines and Verbalisation in block 1
1  that is awesome.
1  Wow it did not go, it sucks, I understand.
1  This one was a tough one
1  I don’t know how to play this. Oh it worked, let’s do this stupid tactic again. Will it crash?
1  Oh no that’s amazing. Now it’s quite easy because I can move them all straight forward.
1  It was not easy but it was not too difficult but I would say more difficult than easy.
1  I think it was quite good how I did. Why not, it worked this time. How is that possible?
1  I am doing too many things at the same time.
1  That was stupid.
1  What am I doing.
1  I tried a new way to do it but I just got lost. I think it’s easier to do it as I did before
6  It’s difficult.
6  This was quicker.
6  Okay, this one was a bit quicker.
6  Bad. This is messy.
6  This is worse
6  It’s getting quite stressful. Okay I forgot about the white one.
6  Not too bad.
6  This is not good timing Okay. It was also really quick even if it was not good
6  No now it’s messy,
6  I was trying to increase the train but it did not do much for the timing.
10  I’ll try this again because I did some errors
10  I’m thinking if I can improve it somehow. Because now I just do trial and error.
10  I don’t know if this is the best strategy but sometimes..
10  you just have to pick something and optimise it
10  It’s easy to solve but hard to master.
15  Okay so this did not work out too well.
15  This did not go too well. You need to have a sense for multitasking for this to work.
15  It still takes over a minute.
15  The hardest part is that you want all trains to move simultaneously
15  because you want the fastest time but then it’s hard to control all the trains.
15  I’m feeling a bit frustrated. It feels as if it should not be that hard.
15  Okay so that seems as a good way of doing this.
15  It’s hard to avoid collision.
20  I think I have messed it up already.
21  This is not fun
21  It’s not hard to play but it’s hard to understand the rules
27  Oh no they would collide.
27  The red and the white train can meet if you make them to max speed
27  but the hard part is not make them crash and also turn on the other trains.
31  This game is quite hard I think.
31  Before I controlled the speed of all trains and
31  it’s a little bit confusing so if I control one and the other are in between.
31  It did not work out at all. I am actually a bit confused.
No Lines and Verbalisation in block 2
3  I thought wrong. I’m not sure what to do, maybe put the grey in the position
3  no then they would crash.
3  That was a failed attempt.
3  No that won’t work
3  Maybe I should try something else.
3  And now we have a crash. I think I know what to do now, I just need to do it precise.
8  So I just tried to do three trains at the same time but I failed.
8  I’m trying to use many trains at the same time but it does not work, let’s try it again.
8  So now I tried to do three trains at the same time but failed. I’m trying this new game
11  It was very strange that it did not really do what I thought it will do.
16  It’s really fun, It was not as easy as I thought
16  That was good, I think I’m improving. I stopped thinking
16  I just keep getting worse.
16  This was pretty good
16  I’m feeling that I’m too smart but I can’t play this
17  I don’t really know the logic behind this strategy
17  but trial and error has shown that this strategy is effective.
17  I’m starting to think that I should do something else but
17  I don’t know how to do it more efficient.
17  I tried to make the crossing with all the trains at the same time
17  but it’s really hard and apparently I also forgot one train
17  I think that if I do this properly my time will be better than my record.
17  I don’t have any better ideas unfortunately.
17  I feel kind of stuck and that I don’t do progress anymore.
17  Maybe I could do a quicker execution.
30  No, that was bad really bad.
30  This was a bad idea they collide and let’s try something else
6. Discussion
6.1 Analysis of results
To answer our research questions, whether the trend lines in GridRail help the
users and whether the participants’ descriptions of how they solved the task give
input on whether the trend lines are helpful, we go through the results below.
In section 5.1 the learning curves show that there is no significant difference in
how the participants’ mean time decreases between the conditions with and
without trend lines.
Looking at the boxplot in section 5.2.1 and the analysis of variance for
mean time per trial, the main effect of trend lines is not significant in the
mixed ANOVA, but the variable is included in a significant three-way interaction
between trials, lines and the block in which the participants verbalised.
We can see a difference in the mean time each participant took to think
before starting a new trial (see the boxplot in section 5.2.2). Participants with
trend lines did not take as long pauses between the trials. There may be several
reasons for this. Section 3.1 describes the chain of decisions that have to be made
in order to reach a goal, and notes that the cognitive workload is lower after
repeating the same task multiple times because it is connected to short-term
memory. One idea is that participants with trend lines stick to one or a few
strategies, repeating them, and therefore do not have to go back to the process of
making new decisions and re-learning. Additionally, because the mean time of the
trials does not differ in the same way as the planning time, and the participants
with trend lines do not re-plan their actions to the same extent, it could indicate
that they are satisfied with their strategy: they align the trend lines so that no
collisions will occur and then wait for the trial to finish.
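The planning-time comparison described above can be sketched as follows. This is a minimal illustration, assuming that each trial is logged with the time the previous trial ended and the time the first train was selected. The record layout, the function name and all numbers are hypothetical and are not the thesis data.

```python
from statistics import mean

# Hypothetical log records: (participant_id, has_trend_lines,
# previous_trial_end_s, first_train_selected_s). All values are made up.
TRIAL_LOG = [
    (2, True, 100.0, 104.0),
    (2, True, 180.0, 183.0),
    (1, False, 95.0, 103.0),
    (1, False, 190.0, 196.0),
]


def mean_planning_time(log, has_trend_lines):
    """Mean pause (seconds) between the end of one trial and the first
    train selection of the next trial, for one condition."""
    pauses = [start - prev_end
              for _pid, lines, prev_end, start in log
              if lines == has_trend_lines]
    return mean(pauses)


with_lines = mean_planning_time(TRIAL_LOG, True)
without_lines = mean_planning_time(TRIAL_LOG, False)
print(f"with lines: {with_lines:.1f} s, without lines: {without_lines:.1f} s")
```

With real data one would probably average per participant first and then compare the two groups, so that participants with many trials do not dominate the group mean.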
On the subject of different playing strategies, figures 12 and 14 indicate a
difference in the number of collision incidents per trial between participants
with and without trend lines. The mixed ANOVA over all trials and conditions
showed no significant results. However, a two-way ANOVA was constructed for
trials where verbalisation was done in block 2; it showed a statistically
significant effect of trend lines for participants who verbalised in block 2.
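To make the kind of test behind such a result concrete, the sketch below computes the F statistic of a one-way ANOVA from scratch. It is a deliberate simplification and not the two-way or mixed ANOVA used in this study: it has a single factor (trend lines or no trend lines) and ignores the trial factor. The function name and the collision counts are made up for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up mean collisions per trial for participants verbalising in block 2.
with_lines = [1.0, 2.0, 3.0]
without_lines = [2.0, 3.0, 4.0]
f_stat, df1, df2 = one_way_f([with_lines, without_lines])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The F value is then compared against the F distribution with the given degrees of freedom to obtain a p-value; a statistics package would normally do both steps.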
One could argue that keeping an eye on the trend lines requires the same
mental resources as actually planning and driving the trains, and that the mean
time results support this (more about cognitive workload can be read in section
3.1). One could also argue that these experiments have given more legitimacy
to the earlier experiments, in which the hypothesis about trend lines being helpful
was falsified, adding transparency and depth for further discussion.
Yet, as mentioned above, there is a difference in how long each participant
waits and thinks before each trial. Participants with trend lines seem to be more
satisfied with their option, while participants without lines might not remember
entirely how they arranged the route when they do not see the intersection
lines throughout the trial, which makes them try new ideas.
Another conceivable scenario is that the participants without lines are
exposed to a higher cognitive workload, resulting in greater cognitive fatigue and
longer resting and planning periods between trials (read more about cognitive
fatigue in section 3.1).
The trend lines seem to make the participants play the game differently,
even though the resulting mean time is similar. A difference is also observed
in the number of trains driven at the same time. Participants with trend
lines use multiple trains at the same time more often, as seen in table 9.
So, to answer whether the trend lines in GridRail are helping the users,
we can look at what the participants said. As seen in subsection 5.5, in the
few cases where the participants mention the trend lines, the majority of the
opinions are positive (5 out of 6). We can also see that the trend lines are
helpful when it comes to avoiding collisions, as mentioned above.
Based on the statistical analysis in table 6, we found that verbalisation has
a significant main effect on how well participants perform. In figure 15 we
saw that the participants who verbalised in the first 20 trials had a higher
mean time than participants who did not verbalise in the first 20 trials.
However, the improvement between block 1 and block 2 was significantly
higher for the group who did not verbalise in block 2 and hence, in the last
20 trials, the mean time evened out; both groups ended with almost the same
mean time.
The participants’ descriptions of how they solved the task have given some
input on whether the trend lines are helpful. However, the descriptions have
not given more insight into the previous experiments made on GridRail;
comments about trend lines were few.
6.2 Discussion of methods used
One of the goals of the experiment was to gain new insight and shed light on
previous experiments. However, these experiments have instead raised new
questions about trend lines.
The instructions given to the participants told them to drive the trains to the
opposite stations, without collision, as fast as possible. Looking at the time,
the mean time data showed that the lines did not have any significant effect.
However, when looking at the videos of the trials we can see a difference in
behaviour. We tried to highlight this with data on the number of collisions,
how long it takes for a participant to plan before each trial and how many trains
the participants drive at the same time; data that hint at a difference in
behaviour but do not statistically prove the differences. Both the comments
from the participants and the notes taken during the trials showed that some
participants with lines did not use them in the beginning of the game. After
noticing or understanding the trend lines, the participants expressed positive
emotions.
Excluding trend lines based on mean time per trial, before conducting an
experiment based on the behavioural differences, would therefore not give the
entire picture. Further investigations of the differences in the actual planning
of routes are needed before we conclude that trend lines do not help the users.
This is because the aim of driving the trains, for both the controller and the
participant using the microworld, is to smoothly plan and execute new routes
with multiple trains. Measuring the time should therefore be just one of many
factors.
6.3 Conclusion
The purpose of this study was to see how the game play was affected by the
absence or presence of trend lines. A between-subjects study was conducted;
32 participants used an already developed microworld called GridRail, a
game with the purpose of driving 6 trains to opposite stations without collisions
as fast as possible.
There is no significant difference in how fast the participants finished each
trial between the conditions with and without trend lines. However, comments
in the verbalisations and notes on how many trains each participant drives at the
same time indicate that the lines do affect how participants play, even
though the finish time is not affected.
6.4 Future research
6.4.1 GridRail
GridRail should be made easier, without any exceptions to the rules of play. For
example, it should be possible to enter an intersection even if there is a paused
train on it, so that the train does not stop until it actually bumps into the other
train.
Furthermore, the trend lines should also show when trains from the same
station will collide with each other (currently the lines only show when a train
will bump into a train from the opposite station), and the circles that indicate
how close two trains can be without colliding should be visible all the time
(currently only the circle of the clicked/marked train is shown).
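Covering trains from the same station amounts to checking every pair of trains, not only pairs driving in opposite directions. The sketch below shows the underlying arithmetic under simplifying assumptions: a single one-dimensional track, constant speeds and a straight-line projection. The function name, units and numbers are hypothetical; how GridRail computes its trend lines is not described here.

```python
def meeting_point(pos_a, speed_a, pos_b, speed_b):
    """Return (time, position) where the projected lines of two trains on
    the same track cross, or None if they never cross in the future."""
    relative_speed = speed_a - speed_b
    if relative_speed == 0:
        return None  # same speed: the gap between the trains never closes
    t = (pos_b - pos_a) / relative_speed
    if t < 0:
        return None  # the lines crossed in the past
    return t, pos_a + speed_a * t


# Opposite directions: train A heads right, train B heads left.
opposite = meeting_point(0.0, 2.0, 100.0, -3.0)
# Same direction: a fast train catches up with a slower train ahead of it.
same_direction = meeting_point(0.0, 4.0, 30.0, 1.0)
print(opposite, same_direction)
```

The same formula handles both cases, which suggests that adding same-station predictions is mainly a question of which train pairs are drawn, not of a different calculation.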
These changes would make it easier for the players to focus on the actual
task of driving the trains to opposite stations without collision and planning
their routes. The different rules and exceptions make the game difficult for
the players; they tend to focus on what is allowed and on why the trains
sometimes fit into the intersection and sometimes do not. Because the lines
do not show when a train will collide with a train from the same station,
and participants with trend lines have more trains moving at the same time,
this could explain why participants with lines have almost the same
collision rate.
Retrieving the number of collisions directly from the database would give more
reliable results than counting collisions in the trial videos, where human error
can give wrong numbers.
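As an illustration of what such a retrieval could look like, the sketch below counts collisions per participant and trial with a single query. It assumes that the game writes one row per collision to a table; the table name collision_events, its columns and the inserted rows are hypothetical, since the actual GridRail database schema is not described here. An in-memory SQLite database stands in for the real one.

```python
import sqlite3

# In-memory stand-in for the game database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collision_events ("
    "participant_id INTEGER, trial INTEGER, timestamp_s REAL)"
)
conn.executemany(
    "INSERT INTO collision_events VALUES (?, ?, ?)",
    [(5, 1, 12.4), (5, 1, 30.1), (5, 2, 8.7), (12, 1, 15.0)],
)


def collisions_per_trial(connection):
    """Return {(participant_id, trial): number_of_collisions}."""
    rows = connection.execute(
        "SELECT participant_id, trial, COUNT(*) "
        "FROM collision_events "
        "GROUP BY participant_id, trial "
        "ORDER BY participant_id, trial"
    )
    return {(pid, trial): count for pid, trial, count in rows}


counts = collisions_per_trial(conn)
print(counts)
```

Trials without collisions produce no rows in this layout, so they would have to be filled in as zero from a list of all trials before computing means.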
6.4.2 Experiment
Another useful change for future experiments is to verbalise during the whole
session, not just one block. Participants who started to verbalise during the
first block continued subconsciously during the second block, which could
have caused a longer mean time in the second block for that group. This change
is also motivated by the significant impact of verbalisation that we obtained.
Another useful change would be to have a collision counter visible after
each trial, so that participants both with and without lines aim to drive the
trains smoothly while driving multiple trains.
Observations of how often players change tactics could be useful for
understanding the difference in cognitive workload and for investigating further
whether cognitive fatigue is more common among participants without trend lines.
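One way to quantify such observations is to code the tactic used in each trial and count how often the code differs from the previous trial. The sketch below assumes that an observer has already assigned one tactic label per trial; the labels and the example sequence are hypothetical.

```python
def count_tactic_changes(tactics):
    """Count how often the coded tactic differs from the previous trial."""
    return sum(1 for prev, cur in zip(tactics, tactics[1:]) if prev != cur)


# Hypothetical coding of one participant's first six trials.
coded_trials = [
    "one-by-one", "one-by-one", "pairs", "pairs", "all-at-once", "pairs",
]
changes = count_tactic_changes(coded_trials)
print(changes)
```

The number of changes per participant could then be compared between the conditions with and without trend lines.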
7. References
Andersson, A.W., Jansson, A., Sandblad, B., Tschirner, S. Recognizing
Complexity: Visualization for Skilled Professionals in Complex Work
Situations. Uppsala University, Department of Information
Technology. 2013.
Andersson, A.W., Sandblad, B., Tschirner, S., Jansson, A. Framtida
tågtrafikstyrning: Sammanfattande forskningsrapport, Slutrapport från
FOT-projektet. Uppsala University, Department of Information Technology. 2015.
Axelsson, A. Experiment write-up. Division of Visual Information and
Interaction, Uppsala University. 2016.
Bainbridge. Verbal reports as evidence of the process operator’s knowledge.
International Journal of Human-Computer Studies, 51(2). 1999.
Brehmer, B. Dynamic decision making: human control of complex systems.
Acta Psychol. 81. 1992.
Brehmer, B., Dörner, D. Experiments with computer-simulated microworlds:
Escaping both the narrow straits of the laboratory and the deep blue sea of
the field study. Comput. Hum. Behav. 9. 1993.
Caglarca, S. Investigating the effects of trends in an interface to a
dynamic system. Diploma thesis, Uppsala University, Department of
Information Technology. 2015.
Edwards, W. Dynamic decision theory and probabilistic information
processing. Human Factors, 4. 1963.
Ericsson, K.A., Simon, H.A. Protocol analysis, revised edition.
MIT Press. 1993.
Gulliksen, J., Göransson, B., Boivie, I., Blomkvist, S., Persson, J., Cajander, A.
Key principles for user-centred systems design. Behaviour & Information
Technology. November-December 2013.
Hanington, B., Martin, B. Universal Methods of Design: 100 Ways to Research
Complex Problems, Develop Innovative Ideas, and Design Effective
Solutions. Rockport Publishers. 2012.
Krug, S. Don’t Make Me Think: A Common Sense Approach to Web Usability.
New Riders Publishing, Berkeley, California, 2nd edition. 2006.
Kuniavsky, M. Observing the User Experience, volume 2. Elsevier Inc.
Rockport Publishers. 2012.
Logie, R.H. The functional organization and capacity limits of working
memory. Current Directions in Psychological Science, 20. 2011.
Papert, S. Computer-based microworlds as incubators for powerful ideas. The
computer in the school: Tutor, tool, tutee. New York: Teachers College
Press. 1980.
Sandblad, B., Andersson, A.W., Kauppi, A., Isaksson-Lutteman, G. Human-Computer
Interaction, Dept of Information Technology, Uppsala University, Sweden.
Tschirner, S. The GMOC model - supporting development of systems for human
control. Diploma thesis, Uppsala University, Department of Information
Technology. 2015.
Appendix 1
Instructions
Goal: Drive trains from one station to the other without collision
Time: Approximately 60 minutes
Break: 5 min between sessions
Welcome!
You will play a game with the aim of driving trains to an opposite station without collision
you will get 20 trials per session.
Line example
A tool to you can use to avoid collisions with the trains is available in the form of a line which
gives a prediction of where the trains will meet.
The lines of the three trains (purple, white and grey) show that the trains will intersect at a
point where there is only one train track = collision.
Here, the lines of the three trains (purple and grey) show that the trains will intersect at a
point where there exists a meeting point = no collision.
Interface and control
Beneath the train rails you will find a control panel for selection of train and speed. The
circles represent each train correlating to the train’s colour and the speed goes from
backwards (drag bar to the left), standstill (middle) and forward (drag bar to the right).
Trials
Goal: Move the three greyscale trains to the left station and the coloured trains to the right
station.
Total amount of trials: 40 trials in total divided into two sessions of 20 trials.
Trial time: The timer starts once you select the first train. The time count will give you a hint
of which solution is most optimum. It is shown after the trial.
After one trial: You will be given a possibility to start a new game by clicking on the start
button.
If you have any questions, please ask the test leader now.
Good luck!
Appendix 2
Instructions
Goal: Drive trains from one station to the other without collision
Time: Approximately 60 minutes
Break: 5 min between sessions
Welcome!
You will play a game with the aim of driving trains to an opposite station without collision
you will get 20 trials per session.
Collisions
To avoid collisions, you have to be aware of the intersections.
A point where there is only one train track = collision.
A point where there exists a meeting point = no collision.
Interface and control
Beneath the train rails you will find a control panel for selection of train and speed. The
circles represent each train correlating to the train’s colour and the speed goes from
backwards (drag bar to the left), standstill (middle) and forward (drag bar to the right).
Please note: You cannot click directly on a train to select it.
Trials
Goal: Move the three greyscale trains to the left station and the coloured trains to the right
station.
Total amount of trials: 40 trials in total divided into two sessions of 20 trials.
Trial time: The timer starts once you select the first train. The time count will give you a hint
of which solution is most optimum. It is shown after the trial.
After one trial: You will be given a possibility to start a new game by clicking on the start
button.
If you have any questions, please ask the test leader now.
Good luck!