Concurrency Checking with CHESS:
Learning from Experience
Tom Ball, Sebastian Burckhardt, Chris Dern,
Madan Musuvathi, Shaz Qadeer
Outline
• What is CHESS?
– a testing tool, plus
– a test methodology (concurrency unit tests)
– a platform for research and teaching
• CHESS design decisions
• Learnings from the CHESS user forum and champions
What is CHESS?
• CHESS is a user-mode scheduler
• Controls all scheduling nondeterminism
– “Hijacks” scheduling control from the OS
• Guarantees:
– Every run takes a different thread schedule
– The schedule of every run can be reproduced
Concurrency Unit Tests
“Generally, in our test environment,
we want to test what we call
scenarios. A scenario might be a
specific feature or API usage. In my
case I am trying to test the scenario
of a user canceling a command
execution on a different thread.”
Steve Hale, Microsoft
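A Concurrency Unit Test Pattern: Fork-Join
using System.Diagnostics;
using System.Threading;

void ForkJoinTest() {
    // Fork: each thread runs one part of the scenario (S1, S2 are placeholders)
    var t1 = new Thread(() => { /* S1 */ });
    var t2 = new Thread(() => { /* S2 */ });
    t1.Start(); t2.Start();
    // Join: wait for both threads, then check the scenario's postcondition
    t1.Join(); t2.Join();
    Debug.Assert(...);
}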
Concurrency Unit Tests
• Small scope hypothesis
– For most bugs, there exists a short-running
scenario with only a few threads that can find it
• Unit tests provide
– Better coverage of schedules
– Easier debugging, regression, etc.
CHESS as Research/Teaching Platform
http://research.microsoft.com/chess/
• Source code release
– chesstool.codeplex.com
• Courseware with CHESS
– Practical Parallel and Concurrent Programming
– coming this fall!
• Preemption bounding [PLDI07]
– speeds up the search for bugs
– simple counterexamples
• Store buffer simulation [CAV08]
• Preemption sealing [TACAS10]
– orthogonal to preemption bounding
– where (not) to search for bugs
• Best-first search [PPoPP10]
• Automatic linearizability checking [PLDI10]
• Fair stateless exploration [PLDI08]
– scales to large programs
• Architecture [OSDI08]
• More features
– Data race detection
– Tasks and SyncVars
– API wrappers
– Partial order reduction
– More monitors…
CHESS Design Decisions
• Stateless state space exploration
• No change to underlying scheduler
• Ability to enumerate all/only feasible schedules
• Schedule points = synchronization points, and use race detection to make up the difference
• Serialize concurrent behavior
• Suite of search/reduction strategies
– preemption bounding, sealing
– best-first search
• Monitor API to easily add new checking capability
Stateless model checking [Verisoft]
• Given a program with an acyclic state space
• Systematically enumerate all paths
• Don’t capture program states
– Not necessary for termination
– Precisely capturing states is hard and expensive
• At the cost of potentially revisiting states
– Partial-order reduction alleviates redundant exploration
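A minimal sketch of stateless exploration, written in Python so it runs standalone and assuming, for brevity, that each thread is a fixed list of atomic steps. The depth-first search keeps only the current schedule prefix and re-executes the program from its initial state for every complete schedule, instead of capturing program states. Because enabledness never depends on state in this toy, it replays only at the leaves; a real stateless checker replays prefixes as it goes. The example is the classic store-buffering litmus test:

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]
Step = Callable[[State], None]
Outcome = Tuple[int, ...]


def explore(init: Callable[[], State], threads: List[List[Step]],
            observe: Callable[[State], Outcome]) -> Tuple[int, Set[Outcome]]:
    """Enumerate every interleaving depth-first without storing program states."""
    outcomes: Set[Outcome] = set()
    count = 0

    def dfs(schedule: List[int]) -> None:
        nonlocal count
        taken = [schedule.count(t) for t in range(len(threads))]
        enabled = [t for t in range(len(threads)) if taken[t] < len(threads[t])]
        if not enabled:
            # Complete schedule: re-execute from scratch instead of restoring a snapshot.
            state = init()
            pcs = [0] * len(threads)
            for t in schedule:
                threads[t][pcs[t]](state)
                pcs[t] += 1
            count += 1
            outcomes.add(observe(state))
            return
        for t in enabled:
            dfs(schedule + [t])

    dfs([])
    return count, outcomes


# Store-buffering litmus test: T0 writes x, then reads y; T1 writes y, then reads x.
t0: List[Step] = [lambda s: s.update(x=1), lambda s: s.update(r0=s["y"])]
t1: List[Step] = [lambda s: s.update(y=1), lambda s: s.update(r1=s["x"])]
count, outcomes = explore(lambda: {"x": 0, "y": 0, "r0": 0, "r1": 0},
                          [t0, t1], lambda s: (s["r0"], s["r1"]))
print(count, sorted(outcomes))  # 6 [(0, 1), (1, 0), (1, 1)]
```

Under sequential consistency the outcome (0, 0) never appears; observing it requires relaxed behavior such as store buffering, which the deck lists separately as store buffer simulation.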
CHESS architecture
[Figure: CHESS architecture, showing Unmanaged Program / Win32 Wrappers / Windows, Managed Program / .NET Wrappers / CLR, the CHESS Scheduler, and the CHESS Exploration Engine]
• Capture scheduling nondeterminism
• Drive the program along an interleaving of choice
Running Example
Thread 1
Lock (l);
bal += x;
Unlock(l);
Thread 2
Lock (l);
t = bal;
Unlock(l);
Lock (l);
bal = t - y;
Unlock(l);
Introduce Schedule() points
Thread 1
Schedule();
Lock (l);
bal += x;
Schedule();
Unlock(l);
Thread 2
Schedule();
Lock (l);
t = bal;
Schedule();
Unlock(l);
Schedule();
Lock (l);
bal = t - y;
Schedule();
Unlock(l);
Instrument calls to the
CHESS scheduler
Each call is a potential
preemption point
First-cut solution: Random sleeps
Thread 1
Sleep(rand());
Lock (l);
bal += x;
Sleep(rand());
Unlock(l);
Thread 2
Sleep(rand());
Lock (l);
t = bal;
Sleep(rand());
Unlock(l);
Sleep(rand());
Lock (l);
bal = t - y;
Sleep(rand());
Unlock(l);
Introduce random sleep at
schedule points
Does not introduce new
behaviors
Sleep models a possible
preemption at each location
Sleeping for a finite amount
guarantees starvation-freedom
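Real sleeps perturb timing; the Python sketch below abstracts that effect as a random choice of the next thread at each Schedule() point of the running example. Each critical section is treated as one atomic step, and the initial values bal = 100, x = 10, y = 5 are illustrative assumptions. Every run is a legal schedule, but the buggy interleaving shows up only with some probability, and a failing run can be reproduced only if its random choices were recorded.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, int]


# The running example, cut at its Schedule() points. Each critical section
# (Lock ... Unlock) is one atomic step, since no other thread can enter it.
def deposit(s: State) -> None:    # Thread 1: Lock(l); bal += x; Unlock(l);
    s["bal"] += s["x"]


def read_bal(s: State) -> None:   # Thread 2: Lock(l); t = bal; Unlock(l);
    s["t"] = s["bal"]


def withdraw(s: State) -> None:   # Thread 2: Lock(l); bal = t - y; Unlock(l);
    s["bal"] = s["t"] - s["y"]


THREADS: List[List[Callable[[State], None]]] = [[deposit], [read_bal, withdraw]]


def random_run(rng: random.Random) -> int:
    """One test run: at every schedule point a random 'sleep' picks who runs next."""
    s: State = {"bal": 100, "x": 10, "y": 5, "t": 0}
    pcs = [0] * len(THREADS)
    while True:
        enabled = [i for i in range(len(THREADS)) if pcs[i] < len(THREADS[i])]
        if not enabled:
            return s["bal"]
        i = rng.choice(enabled)
        THREADS[i][pcs[i]](s)
        pcs[i] += 1


rng = random.Random(0)
results = [random_run(rng) for _ in range(200)]
print(sorted(set(results)))  # 105 is correct; 95 means the deposit was lost
```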
Improvement 1:
Capture the “happens-before” graph
Thread 1
Schedule();
Lock (l);
bal += x;
Schedule();
Unlock(l);
Thread 2
Schedule();
Lock (l);
t = bal;
Schedule();
Unlock(l);
Schedule();
Lock (l);
bal = t - y;
Schedule();
Unlock(l);
(Slide animation: a Sleep(5) delay is inserted after bal += x in Thread 1, and alternatively after bal = t - y in Thread 2.)
Delays that result in the
same “happens-before”
graph are equivalent
Avoid exploring equivalent
interleavings
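One way to make “same happens-before graph” concrete, sketched in Python under the assumption that a trace is a list of (thread, read/write, variable) events. Program order is fixed by the program, so two traces have the same happens-before graph exactly when every pair of conflicting events (same variable, different threads, at least one write) occurs in the same order. A checker could then skip any schedule whose signature it has already seen.

```python
from typing import Dict, FrozenSet, List, Tuple

Event = Tuple[int, str, str]   # (thread id, "r" or "w", variable)
EventId = Tuple[int, int]      # (thread id, index within that thread)


def hb_signature(trace: List[Event]) -> FrozenSet[Tuple[EventId, EventId]]:
    """Order of every pair of conflicting events from different threads."""
    seen: Dict[int, int] = {}
    labeled: List[Tuple[EventId, str, str]] = []
    for tid, kind, var in trace:
        idx = seen.get(tid, 0)
        seen[tid] = idx + 1
        labeled.append(((tid, idx), kind, var))
    edges = set()
    for i, (a, ka, va) in enumerate(labeled):
        for b, kb, vb in labeled[i + 1:]:
            if a[0] != b[0] and va == vb and "w" in (ka, kb):
                edges.add((a, b))  # a happens before b
    return frozenset(edges)


t1 = [(0, "w", "x"), (1, "w", "y"), (0, "r", "y")]
t2 = [(1, "w", "y"), (0, "w", "x"), (0, "r", "y")]  # independent writes swapped
t3 = [(0, "w", "x"), (0, "r", "y"), (1, "w", "y")]  # conflicting pair reordered
print(hb_signature(t1) == hb_signature(t2))  # True: equivalent, explore only one
print(hb_signature(t1) == hb_signature(t3))  # False: genuinely different
```

Lock operations could be folded in by treating each acquire and release as a write to the lock variable; that is an assumption of this sketch, not a description of CHESS internals.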
Improvement 2:
Understand synchronization semantics
Thread 1
Schedule();
Lock (l);
bal += x;
Schedule();
Unlock(l);
Thread 2
Schedule();
Lock (l);
t = bal;
Schedule();
Unlock(l);
Schedule();
Lock (l);
bal = t - y;
Schedule();
Unlock(l);
Avoid exploring delays that
are impossible
Identify when threads can
make progress
CHESS maintains a run
queue and a wait queue
Mimics OS scheduler state
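A Python sketch of the run queue/wait queue idea applied to the running example with explicit lock operations; the op encoding and initial values are assumptions for illustration. A thread whose next operation acquires a held lock is left out of the enabled set, so the search never tries a delay that the real lock makes impossible. The search stays stateless: lock ownership is rebuilt by replaying the schedule prefix.

```python
from typing import Dict, List, Set, Tuple

State = Dict[str, int]
Op = Tuple[str, object]  # ("acq", lock), ("rel", lock) or ("do", step function)

T1: List[Op] = [("acq", "l"), ("do", lambda s: s.update(bal=s["bal"] + s["x"])), ("rel", "l")]
T2: List[Op] = [("acq", "l"), ("do", lambda s: s.update(t=s["bal"])), ("rel", "l"),
                ("acq", "l"), ("do", lambda s: s.update(bal=s["t"] - s["y"])), ("rel", "l")]


def enabled(threads: List[List[Op]], pcs: List[int], owner: Dict[str, int]) -> List[int]:
    """The run queue: threads whose next op can execute right now."""
    ready = []
    for i, ops in enumerate(threads):
        if pcs[i] == len(ops):
            continue                    # finished
        kind, arg = ops[pcs[i]]
        if kind == "acq" and arg in owner:
            continue                    # blocked on a held lock: wait queue
        ready.append(i)
    return ready


def explore(threads: List[List[Op]]) -> Tuple[int, Set[int]]:
    count, finals = 0, set()

    def dfs(schedule: List[int]) -> None:
        nonlocal count
        # Stateless: replay the prefix from the initial state to rebuild lock ownership.
        s: State = {"bal": 100, "x": 10, "y": 5, "t": 0}
        pcs: List[int] = [0] * len(threads)
        owner: Dict[str, int] = {}
        for i in schedule:
            kind, arg = threads[i][pcs[i]]
            if kind == "acq":
                owner[arg] = i
            elif kind == "rel":
                del owner[arg]
            else:
                arg(s)
            pcs[i] += 1
        ready = enabled(threads, pcs, owner)
        if not ready:
            assert all(pcs[i] == len(t) for i, t in enumerate(threads)), "deadlock"
            count += 1
            finals.add(s["bal"])
            return
        for i in ready:
            dfs(schedule + [i])

    dfs([])
    return count, finals


count, finals = explore([T1, T2])
print(count, sorted(finals))  # 3 [95, 105]
```

Although there is a schedule point before every operation, only 3 complete schedules are feasible, and one of them yields the lost-deposit balance 95.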
Emulate execution on a uniprocessor
Thread 2
Schedule();
Lock (l);
t = bal;
Schedule();
Unlock(l);
Thread 1
Schedule();
Lock (l);
bal += x;
Schedule();
Unlock(l);
Thread 2
Schedule();
Lock (l);
bal = t - y;
Schedule();
Unlock(l);
Enable only one thread at a time
Linearizes a partial order into a total order
Controls the order of data races
CHESS modes: speed vs coverage
Fast mode
– Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
– Finds many bugs in practice
Data-race mode
– Repeat: find data races, then introduce schedule points before racing memory accesses
– Captures all sequentially consistent (SC) executions
Capture all sources of nondeterminism?
No.
Scheduling nondeterminism? Yes
Timing nondeterminism? Yes
Controls when and in what order the timers fire
Nondeterministic system calls? Mostly
CHESS uses precise abstractions for many system calls
Input nondeterminism? No
Rely on users to provide inputs
Program inputs, files read, packets received,…
Good tradeoff in the short term
But can’t find race conditions in error-handling code
CHESS wrappers
Translate Win32/.NET synchronization operations into CHESS scheduler abstractions
Tasks: schedulable entities
Threads, threadpool work items, asynchronous callbacks, timer functions
SyncVars: resources used by tasks
Generate happens-before edges during execution
Executable specification for complex APIs
Most time consuming and error-prone part of CHESS
Enables CHESS to handle multiple platforms
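The Python sketch below shows the general shape of such a wrapper for a single lock. `SyncVar` and `LockWrapper` are hypothetical names, not the CHESS API: the wrapper tells the scheduler whether a task may proceed (so a blocked task is never run) and records a happens-before edge from each Unlock to the next Lock on the same variable.

```python
from typing import List, Optional, Tuple


class SyncVar:
    """Hypothetical stand-in for a SyncVar: a resource that tasks synchronize on."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_release: Optional[str] = None


class LockWrapper:
    """Hypothetical wrapper translating Lock/Unlock calls into SyncVar operations."""

    def __init__(self, name: str, hb_edges: List[Tuple[str, str]]) -> None:
        self.var = SyncVar(name)
        self.owner: Optional[str] = None
        self.hb_edges = hb_edges

    def can_acquire(self) -> bool:
        # Queried by the scheduler: a task blocked here stays off the run queue.
        return self.owner is None

    def lock(self, task: str, event: str) -> None:
        assert self.can_acquire(), "scheduler must not run a task blocked on this lock"
        if self.var.last_release is not None:
            self.hb_edges.append((self.var.last_release, event))  # Unlock -> Lock
        self.owner = task

    def unlock(self, task: str, event: str) -> None:
        assert self.owner == task, "only the owner may unlock"
        self.owner = None
        self.var.last_release = event


edges: List[Tuple[str, str]] = []
lock_l = LockWrapper("l", edges)
lock_l.lock("T1", "T1:Lock")
print(lock_l.can_acquire())   # False: T2's Lock(l) is not schedulable yet
lock_l.unlock("T1", "T1:Unlock")
lock_l.lock("T2", "T2:Lock")
print(edges)                  # [('T1:Unlock', 'T2:Lock')]
```

A real wrapper layer has to encode the semantics of every Win32/.NET primitive this way, which is consistent with the slide's remark that the wrappers are the most time-consuming part.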
Learning from Experience:
User forum, Champions
http://msdn.microsoft.com/en-us/devlabs/cc950526.aspx
http://social.msdn.microsoft.com/Forums/en-US/chess/threads/
“CHESS Doesn’t Scale”
Hmm… we just ran CHESS on the Singularity
operating system (and found bugs in the
bootup/shutdown sequence)
What they usually mean:
“CHESS isn’t very effective on a long-running test”
“There are a lot of possible schedules!”
Time for enumerative model checking = (time to execute one test) × (number of schedules)
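To see why the number of schedules dominates, this Python sketch counts interleavings of n threads with k atomic steps each. Unbounded, the count is (nk)!/(k!)^n; with zero preemptions it is n!, and for a fixed preemption bound c and fixed n it grows only polynomially in k. Here a preemption is taken to mean a context switch away from a thread that still has steps left, and blocking is ignored for simplicity.

```python
from math import factorial
from typing import List


def count_schedules(n: int, k: int, bound: int = -1) -> int:
    """Count schedules of n threads with k atomic steps each.

    A preemption is a switch away from a thread that still has steps left;
    bound < 0 means no preemption bound.
    """
    def dfs(left: List[int], current: int, preemptions: int) -> int:
        if not any(left):
            return 1
        total = 0
        for t in range(n):
            if left[t] == 0:
                continue
            p = preemptions
            if current not in (-1, t) and left[current] > 0:
                p += 1                       # switching away from a runnable thread
            if 0 <= bound < p:
                continue                     # would exceed the preemption bound
            left[t] -= 1
            total += dfs(left, t, p)
            left[t] += 1
        return total

    return dfs([k] * n, -1, 0)


print(count_schedules(2, 2, 0), count_schedules(2, 2, 1), count_schedules(2, 2))  # 2 4 6
print(count_schedules(3, 2) == factorial(6) // factorial(2) ** 3)                  # True
```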
Find lots of bugs with 2 preemptions
Program           Lines of code   Bugs
Work Stealing Q   4K              4
CDS               6K              1
CCR               9K              3
ConcRT            16K             4
Dryad             18K             7
APE               19K             4
STM               20K             2
TPL               24K             9
PLINQ             24K             1
Singularity       175K            2
Total                             37
“CHESS Isn’t Push Button”
“The more I look at CHESS the more
I realize that I could use some
general guidance on how to author
test code that will actually help
CHESS reveal concurrency bugs.”
Daniel Stolt
Challenge -> Opportunity:
New “Push button” concurrency tools
Cuzz [ASPLOS 2010]: Concurrency Fuzzing
Attach to any running executable
Find concurrency bugs faster through smart fuzzing
Lineup [PLDI 2010]: Automatic Linearizability Checking
Generate “thread-safety” tests for a class automatically
Use sequential behavior as oracle for concurrent behavior
CHESS underneath
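The oracle idea, sketched in Python for a FIFO queue: run the test's operations serially in every order that respects each thread's own order, collect the results, and accept a concurrent run only if its results appear in that set. This is a simplified reading of the slide, not Lineup's code: `serial_results` is a hypothetical helper, and the sketch ignores real-time ordering between non-overlapping operations, which a full linearizability check also has to respect.

```python
from collections import deque
from typing import Deque, List, Optional, Set, Tuple

Op = Tuple[str, Optional[int]]                    # ("enq", value) or ("deq", None)
Results = Tuple[Tuple[Optional[int], ...], ...]   # per-thread results, in program order


def serial_results(threads: List[List[Op]]) -> Set[Results]:
    """Results of every serial run of the test on a FIFO queue.

    A serial run executes whole operations one at a time, in any order that
    keeps each thread's own operations in program order.
    """
    found: Set[Results] = set()

    def dfs(pcs: List[int], queue: Deque[int], out: List[List[Optional[int]]]) -> None:
        if all(pcs[i] == len(ops) for i, ops in enumerate(threads)):
            found.add(tuple(tuple(r) for r in out))
            return
        for i, ops in enumerate(threads):
            if pcs[i] == len(ops):
                continue
            name, arg = ops[pcs[i]]
            q = deque(queue)                 # copy so sibling orders are unaffected
            res: Optional[int] = None        # enq returns nothing
            if name == "enq":
                q.append(arg)
            else:
                res = q.popleft() if q else None   # None models "queue empty"
            new_out = [list(r) for r in out]
            new_out[i].append(res)
            dfs(pcs[:i] + [pcs[i] + 1] + pcs[i + 1:], q, new_out)

    dfs([0] * len(threads), deque(), [[] for _ in threads])
    return found


# Thread A enqueues 1 then 2; thread B dequeues once, concurrently with A.
lineup_test = [[("enq", 1), ("enq", 2)], [("deq", None)]]
oracle = serial_results(lineup_test)
print(((None, None), (1,)) in oracle)  # True: some serial order explains it
print(((None, None), (2,)) in oracle)  # False: not linearizable
```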
“CHESS Doesn’t Find This Bug”
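using System.Diagnostics;
using System.Threading;

void ForkJoinTest() {
    int x = 0;
    // x = x + 1 is an unsynchronized read-modify-write: the two threads race on x
    var t1 = new Thread(() => { x = x + 1; });
    var t2 = new Thread(() => { x = x + 1; });
    t1.Start(); t2.Start();
    t1.Join(); t2.Join();
    Debug.Assert(x == 2);
}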
RTFM is not helpful
Instead, generate helpful warning messages
“Warning: running CHESS without race detection can miss bugs”
Or, turn race detection on for a few executions.
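Why race detection matters for this test, as a Python sketch (the step encoding is an assumption): in fast mode the only schedule points are at synchronization operations such as Start and Join, so each thread's x = x + 1 runs as one atomic step. Once race detection adds schedule points at the racing accesses, the read and the write become separate steps and the lost update appears. The enumerator copies state at each branch for brevity; it is not meant to be stateless.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, int]
Step = Callable[[State], None]


def final_values(threads: List[List[Step]]) -> Set[int]:
    """Final x over every interleaving of the threads' atomic steps."""
    outcomes: Set[int] = set()

    def dfs(pcs: List[int], s: State) -> None:
        if all(pcs[i] == len(t) for i, t in enumerate(threads)):
            outcomes.add(s["x"])
            return
        for i, t in enumerate(threads):
            if pcs[i] < len(t):
                nxt = dict(s)                # copy, so sibling branches start from s
                t[pcs[i]](nxt)
                dfs(pcs[:i] + [pcs[i] + 1] + pcs[i + 1:], nxt)

    dfs([0] * len(threads), {"x": 0})
    return outcomes


def incr(s: State) -> None:
    s["x"] = s["x"] + 1                      # fast mode: no schedule point inside


def read(tid: int) -> Step:
    def step(s: State) -> None:
        s[f"tmp{tid}"] = s["x"]              # racing read gets its own schedule point
    return step


def write(tid: int) -> Step:
    def step(s: State) -> None:
        s["x"] = s[f"tmp{tid}"] + 1          # racing write gets its own schedule point
    return step


fast_mode = [[incr], [incr]]
race_mode = [[read(0), write(0)], [read(1), write(1)]]
print(sorted(final_values(fast_mode)))  # [2]: Debug.Assert(x==2) always passes
print(sorted(final_values(race_mode)))  # [1, 2]: the lost update is found
```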
“CHESS Can’t Avoid Finding Bugs”
“Solution is working and
found two bugs with CHESS.
To get the second bug, I
had to fix first bug first”
“That liveness bug is such a
minor performance
problem that I won’t fix it.”
Playing CHESS with George
Sealed methods             Asserts  Timeouts  Livelocks  Deadlocks  Leaks  Pass
(none)                     5        3         40         0          0      5
+TryDequeue                6        5         0          1          1      40
+WaitForTask               5        5         0          2          1      40
+Reg.Recv., +PostInternal  5        5         0          0          0      43
“CHESS is Confusing Me”
The Nondeterminism Saga: static data, lazily initialized
If replay of p.E fails, yielding p.F, then try again and see if p.F replays
[Figure: schedule tree in which prefix p branches to E and F]
Report lost coverage
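The replay-and-retry procedure above, as a toy Python model rather than CHESS's implementation. `LazyStaticProgram` is hypothetical: its trace is just the schedule prefix plus the branch taken, and its static data stays initialized across runs in the same process, which is exactly what makes the first run (p.E) impossible to replay.

```python
from typing import List, Tuple


class LazyStaticProgram:
    """Hypothetical program with lazily initialized static data.

    The first execution in a process takes branch "E" (runs the initializer);
    every later execution takes branch "F" (data already initialized).
    """

    def __init__(self) -> None:
        self.initialized = False

    def run(self, prefix: List[str]) -> List[str]:
        trace = list(prefix)
        if not self.initialized:
            self.initialized = True
            trace.append("E")
        else:
            trace.append("F")
        return trace


def replay(program: LazyStaticProgram, expected: List[str]) -> Tuple[str, List[str]]:
    """Replay `expected`; if it diverges, check whether the divergent schedule replays."""
    actual = program.run(expected[:-1])
    if actual == expected:
        return "replayed", actual
    again = program.run(actual[:-1])
    if again == actual:
        # p.F is reproducible, so exploration can continue, but p.E is unreachable.
        return "lost coverage of " + ".".join(expected), actual
    return "nondeterminism outside the control of CHESS", actual


p = LazyStaticProgram()
first = p.run(["p"])       # ['p', 'E']
print(replay(p, first))    # ('lost coverage of p.E', ['p', 'F'])
```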
Nondeterminism Junkie:
Too much information
“Why does this test pass
instead of say ‘Detected
nondeterminism outside
the control of CHESS’?”
“Is this good behavior for
CHESS to return three
different results for the
same code?”
“CHESS Time Isn’t Real Time”:
It’s a feature, not a bug.
“The call to WaitOne(60000, false) immediately
returns false, which isn’t correct. If I use
WaitOne() or WaitOne(Timeout.Infinite, false)
instead of WaitOne(60000, false), the
WaitHandle waits till the Event is set, returns
true and everything goes fine. But waiting
without a timeout isn't an option in my case.”
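A simplified Python model of why this happens; it is an assumption about how a model checker can treat timeouts, not CHESS's implementation. Time is logical, so a finite timeout may expire at any schedule point regardless of its length, while an infinite wait simply keeps the waiter off the run queue until the event is set. Enumerating a waiter and a setter shows that both results are reachable with a finite timeout.

```python
from typing import Optional, Set

INFINITE: Optional[int] = None   # stands in for Timeout.Infinite


def wait_outcomes(timeout_ms: Optional[int]) -> Set[bool]:
    """Possible return values of WaitOne(timeout_ms) across all schedules.

    Two tasks: a waiter calling WaitOne(timeout_ms) and a setter calling Set().
    Time is logical, so a finite timeout may expire at any schedule point,
    however large timeout_ms is.
    """
    outcomes: Set[bool] = set()

    def dfs(event_set: bool, waiter_done: bool, setter_done: bool) -> None:
        if not setter_done:
            dfs(True, waiter_done, True)             # schedule the setter next
        if not waiter_done:
            if event_set:
                outcomes.add(True)                   # WaitOne returns true
                dfs(event_set, True, setter_done)
            elif timeout_ms is not INFINITE:
                outcomes.add(False)                  # timeout fires now: returns false
                dfs(event_set, True, setter_done)
            # else: the waiter is blocked (wait queue) until the setter runs

    dfs(False, False, False)
    return outcomes


print(sorted(wait_outcomes(60000)))     # [False, True]
print(sorted(wait_outcomes(INFINITE)))  # [True]
```

In other words, code that passes a finite timeout must be prepared for the timeout path, and under this model that path is exercised immediately rather than after 60 real seconds.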
The expected: “I can’t play CHESS on”
x64
Multi-process programs
Message passing, distributed systems
The Boost library
.NET without the CLR Profiler
Java
Unix
…
Learning from Experience:
Forums, Champions
Chris Dern, Steve Hale,
Ram Natarajan, Roy Tan
“Congratulations CHESS team!!!!! I have proven
outside of CHESS that the issue it is finding in our
product on the 106th thread schedule looks like a valid
product bug!!
I wrote a quick application to launch my CHESS test
outside of CHESS and by freezing/thawing threads I
was able to reproduce the issue independently. This is
incredibly exciting!!! Many thanks for your patience,
perseverance, and CHESS bug fixes as I’ve struggled to
understand CHESS.”
Steve Hale, Microsoft, 2/12/2009
PLINQ
Parallel.For
TaskScheduler
Task
BlockingCollection
ConcurrentBag
ConcurrentDictionary
Barrier
SemaphoreSlim
ManualResetEventSlim
“As the true value of a test is in its ability to find bugs,
let’s take a look at how our CHESS tests did. Over the
development cycle to date, the CHESS test found seven
bugs, and was used to reproduce another seven for a
total of 14, out of the 276 high priority bugs over the
same time. While only 14 bugs against 276 appear
sadly anemic, it’s important to dig a bit deeper. If we
address each of the issues raised, would we find more
bugs?”
Chris Dern, PFX_CHESS_Review_Final.docx
“Early on the adoption of CHESS, we made a fatal
mistake. Perhaps it was wishful thinking on our part, or
perhaps we believed too much in the marketing hype
and didn’t read the fine print. We believed early on that
CHESS was a turnkey solution capable of using existing
tests and test approaches and ‘finding the bugs’.”
C. Dern
“The schedule for any product group is always under
attack. Over the life cycle of a product, features are in
constant flux, with managers always balancing risk and
reward. In the face of this pressure, any untried tool,
methodology, or approach faces an uphill battle.”
C. Dern
“For tool developers, it’s important that once you
engage with a customer you help find then drive to
some level of success. Finding a single bug is a priceless
commodity when arguing to continue the time
investment in a specific tool. Take small bites, set
modest goals and drive to success. Perfect is the enemy
of good, or at least good enough right now.”
C. Dern
Dern’s DO’s and DON’Ts
DO NOT expect that CHESS will ‘magically’ find
your bugs. CHESS is a tool, mainly focused on
enumerating schedules for a given bound. While it can
find specific types of concurrency bugs, e.g. deadlocks,
for ‘free’, the value and benefit of CHESS comes with
deliberate tests.
DO develop an
understanding of what
properties, invariants, and
behaviors your test is testing
DO run your tests. While this
may seem a silly tip, it’s
important to remember that
CHESS enables the familiar write,
run, refactor test experience for
concurrent tests, which we enjoy
with sequential tests today.
DO NOT add artificial spinning/busy work in the
test. CHESS will explore all schedules for your specified
bound. Adding busy work, like you may find in a ‘stress’
test to increase coverage, only increases the test
runtime when under CHESS.
AVOID blindly converting an existing ‘stress’ style
unit test into a CHESS test. The size, scale, and
assertions that one tends to find in those types of tests
make for a weak CHESS test at best, or an unusable
CHESS test at worst.
Stepping Back from the Fray: High-level
Learnings
Proper expectation setting
Good methodology
Good default behavior
Good warnings and messages
Minimize cognitive dissonance
Cultivate champions
Listen to them and learn!
Three CHESS Learnings
1. If you want
deterministic scheduling
with ability to explore all
schedules
without changing the
underlying scheduler
Then it’s hard to achieve
high API coverage
robustness
Action: we need observable
and controllable schedulers!
2. Concurrency unit testing
can be effective, but
requires careful planning and
scoping
3. Search/reduction strategies
are absolutely essential
Uplifting Message and
Blatant Advertisement for LineUp Talk
“Partnerships and Collaborations
The success of the LineUp work is a perfect example of
[the benefits of] an open dialog between the teams
along with continual experimentation by both sides.
Combining innovations from both research and product
testing group, we create[d] a complete solution to one
area of concurrency testing.”
C. Dern