Rolf Molich - DialogDesign

Take a web-site
Take N professional usability teams
Let each team usability test the web-site
Are the results similar?
Practical Results from
Large-Scale
Web Usability Testing
Rolf Molich
DialogDesign
Download Test Reports and Slides
http://www.dialogdesign.dk/cue2.htm
Slides in Microsoft PowerPoint 97 and Adobe Acrobat format
This slide will reappear at the end of this presentation
How It All Started...
A recent survey shows that

80% of all Danish drivers think
that their driving skills
are above average.
How about usability testers?
How It All Started...

Too much emphasis on one-way
mirrors and scan converters

Little knowledge of REAL
usability testing procedures; mainly beautified descriptions

”Who checks the checker?”
”Who Checks the Checker?”

When did YOU last have an
objective check of your
usability testing skills?

Who would you trust as an
evaluator of your usability
testing skills?
Comparative Evaluations
Test  End     Test object                 Student teams  Professional teams
1     Oct 97  9 Danish web-sites          50             0
2     Oct 98  9 Danish web-sites          50             0
3     Dec 97  CUE-1: Win Calendar Progr.  0              4
4     Dec 98  CUE-2: www.hotmail.com      2              7
All results point in the same direction.
Student Tests

Introductory course in Human-Computer Interaction at
the Technical University of Copenhagen

Two courses, 120 students per course

Fifty teams of one to three students

2 x 9 Danish web-sites tested by four to nine teams
with at least four test participants

Quality of Usability tests and reports is acceptable
considering that most teams used 20-50 hours
www.bokus.com - Bookstore
Buttons in lower right corner:

Empty shopping basket

Change order

Continue shopping

Go on with your purchase
Conclusions
-

Inhuman treatment of users on
many e-commerce web-sites

On-site searching seldom
works. Users are better off
without on-site searching

Many web-sites focus on the
company, not the user
Conclusions
+

Nice layout and graphics

Good response time

Give correct results
Problem Example
User task:

You want to take your business to BG Bank.
Make an appointment with the bank

Hard to find in menu structure

Users entered ”appointment” as keyword for
Search
How to Improve Search

Provide human error messages
(constructive)

Recommend index, site-map

Special handling of frequent keywords

Show user search keywords in context
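
The four recommendations above can be sketched in a few lines of Python. This is a minimal illustration only: the page texts, the URLs and the names PAGES, FREQUENT, in_context and search are hypothetical and do not come from any of the tested web-sites.

```python
import re

# Hypothetical site content and hand-picked targets for frequent keywords.
PAGES = {
    "/contact": "Call us or book a meeting with an adviser at your local branch.",
    "/loans": "Compare our loans and calculate your monthly payment.",
}
FREQUENT = {"appointment": "/contact"}


def in_context(text, keyword, width=20):
    """Return the keyword with up to `width` characters on either side."""
    match = re.search(re.escape(keyword), text, re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    return text[start:match.end() + width]


def search(keyword):
    """Return (hits, message); message is set only when nothing was found."""
    key = keyword.strip().lower()
    if key in FREQUENT:  # special handling of frequent keywords
        return [(FREQUENT[key], "Suggested page for this keyword")], None
    hits = []
    for url, text in PAGES.items():
        snippet = in_context(text, key)  # show the keyword in context
        if snippet is not None:
            hits.append((url, snippet))
    if hits:
        return hits, None
    # Constructive message that recommends the index and site map.
    message = (f'No pages mention "{keyword}". '
               "Try a different word, or use the index or site map.")
    return [], message
```

With these assumed data, a search for "appointment" is routed to the contact page even though the word never appears in its text.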
CUE-1
Comparative Usability Evaluation 1

Four professional teams usability
tested the same Windows calendar
program

Two US teams (Sun, Rockwell),
one English (NPL) and
one Irish (HFRG, Univ. Cork)

Results published in a panel and a
paper at UPA98

Main conclusions similar to CUE-2
CUE-2
Comparative Usability Evaluation 2

Nine teams have usability tested
the same web-site
– Seven professional teams
– Two student teams

Four European, five US teams

Test web-site: www.hotmail.com
Purposes of CUE-2

Investigate the reproducibility of
usability test results

Survey the state of the art within
professional usability testing of
web-sites.
Non-Purposes of CUE-2

To pick a winner

To make a profit
Usability Test Procedure

Web-site address (www.hotmail.com)
disclosed at start of three-week test period.

Client scenario
(written by Erika Kindlund and Meeta Arcuri)

Access to client through intermediary

Three weeks to carry out test using
standard approach

Deliver anonymized usability test report
(Erika Kindlund)
Problems Found
                          CUE-1      CUE-2
Total number of problems  141        300
Found by seven teams      -          1
         six teams        -          1
         five teams       -          4
         four teams       1          4
         three teams      1          15
         two teams        11         49
Found only by one team    128 (91%)  226 (75%)
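
The totals and percentages on this slide can be checked from the counts. A minimal Python sketch, assuming the four CUE-1 counts belong to four, three, two and one team respectively (CUE-1 had four teams); the variable and function names are illustrative:

```python
# Number of problems found by exactly k teams, copied from the slide.
cue1 = {4: 1, 3: 1, 2: 11, 1: 128}
cue2 = {7: 1, 6: 1, 5: 4, 4: 4, 3: 15, 2: 49, 1: 226}


def total_problems(found_by):
    """Total number of distinct problems across all overlap levels."""
    return sum(found_by.values())


def share_unique(found_by):
    """Percentage of problems reported by only one team, rounded."""
    return round(100 * found_by[1] / total_problems(found_by))


for name, data in (("CUE-1", cue1), ("CUE-2", cue2)):
    print(name, total_problems(data), f"{share_unique(data)}%")
```

The counts add up to the stated totals of 141 and 300, which supports reading the columns this way.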






CUE-2 Credits

Barbara Karyukina, SGI (USA)

Klaus Kaasgaard & Ann D. Thomsen, KMD (Denmark)

Lars Schmidt and others, Networkers (Denmark)

Meghan Ede and others, Sun Microsystems, Inc., (USA)

Wilma van Oel, P5 (The Netherlands)

Meeta Arcuri, Hotmail, Microsoft Corp. (USA) (Customer)

Rolf Molich, DialogDesign (Denmark)
(Coordinator)
CUE-2 Credits

Joseph Seeley, NovaNET Learning Inc. (USA)

Kent Norman, University of Maryland (USA)

Torben Norgaard Rasmussen and others,
Technical University of Denmark

Marji Schumann and others,
Southern Polytechnic State University (USA)
Resources
Team                        A    B    C   D     E    F   G    H   J
Person hours used for test  136  123  84  (16)  130  50  107  45  218
# Usability professionals   2    1    1   1     3    1   1    3   6
Number of tests             7    6    6   50    9    5   11   4   6
Usability Test Reports
Team            A   B   C   D  E   F   G   H   J
# Pages         16  36  10  5  36  19  18  11  22
Exec summary    Y   Y   N   N  N   Y   N   Y   Y
# Screen shots  10  0   8   0  1   2   1   2   0
Severity scale  2   3   2   1  2   1   1   3   4
Usability Results
Team                 A   B    C   D   E   F   G   H   J
# Positive findings  0   8    4   7   24  25  14  4   6
# Problems           26  150  17  10  58  75  30  18  20
% Exclusive          42  24   10  57  51  33  56  60  71
Usability Results
Team                         A    B    C   D   E    F   G    H   J
# Problems                   26   150  17  10  58   75  30   18  20
% Core problems (100%=26)    38   73   35  8   58   54  50   27  31
Person hours used for test   136  123  84  NA  130  50  107  45  218
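
To read this table, it can help to convert the core percentages back to counts and to set the number of reported problems against the hours spent. A minimal Python sketch using only the figures above; the function names are illustrative, and problems per person hour is a derived figure shown for illustration, not a metric reported by the study:

```python
# Per-team figures copied from the slide; None marks team D,
# whose person hours are listed as NA.
teams = "ABCDEFGHJ"
problems = [26, 150, 17, 10, 58, 75, 30, 18, 20]
core_pct = [38, 73, 35, 8, 58, 54, 50, 27, 31]
hours = [136, 123, 84, None, 130, 50, 107, 45, 218]

CORE_TOTAL = 26  # the slide states 100% = 26 core problems


def core_found(pct):
    """Approximate number of core problems behind a percentage."""
    return round(pct * CORE_TOTAL / 100)


def problems_per_hour(n_problems, n_hours):
    """Reported problems per person hour, or None if hours are unknown."""
    if n_hours is None:
        return None
    return round(n_problems / n_hours, 2)


for team, n, pct, h in zip(teams, problems, core_pct, hours):
    print(team, core_found(pct), problems_per_hour(n, h))
```

The ratio says nothing about the severity of the problems a team reported.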
Results from All Four Studies

There are overwhelmingly many usability
problems.

Many of them are serious.

Limited overlap between team findings.
Conclusions

In most cases, no form of cost-effective
testing will find all or most of the problems or even most of the serious ones

Claims like
”Method x finds at least 80% of all serious
usability problems”
are not in accordance with the results of
this study
Problems Found in CUE-2
Total number of different usability problems found  300
Found by seven teams                                1
         six teams                                  1
         five teams                                 4
         four teams                                 4
         three teams                                15
         two teams                                  49
Found only by one team                              226
Problem Found by Seven Teams
During the registration process
Hotmail users are asked to
provide a password hint
question. The corresponding text
box must be filled.
Most users did not understand
the meaning of the password
hint question. Some entered
their Hotmail password in the
Hint Question text box.
Clever but unusual mechanisms
like the password hint question
must be explained carefully to
users.
Language Related Problems
Examples of language related problems that
were found by European teams

Send Mail: Term "Compose" difficult to
understand. Use "Create new message" or
"Write Mail" (5/9)

Create new account: "State/Province"
textbox is required but does not make
sense in many countries (2/9)
Language Related False Problems
Some language related problems
suggested by a US team were
not confirmed by European test teams

Change "last name" to "family name"

Meaning of "U.S. Residents only" and
"Non-U.S. Residents Only" is unclear
Advice for a Usable Usability Report

List problems with severity, #users

Provide short executive summary

Keep it short

Distinguish clearly between
– Personal opinions,
– Expert opinions,
– User opinions,
– User findings
Some State-of-the-Art Boundaries

No power user test,
although four teams also recruited power users

Few tests that require complicated setup.
Examples: Attachments; boundary testing, e.g.
large number of e-mails in in-box

Teams completed their usability tests within
schedule, but they hesitated to compare their
results to those from the other teams
Conclusions

The total number of usability problems for
each tested web-site is huge,
much larger than you can hope to find in
one series of usability tests

Usability testing techniques can be
improved

We need more awareness of the
usability of usability work
Download Test Reports and Slides
http://www.dialogdesign.dk/cue2.htm
Slides in Microsoft PowerPoint 97 format
CUE-2 Panel:
Tuesday at 4.30 p.m.