Programming with Concurrency (Part 2):
Multithreaded Programming with
Shared Memory
Joe Duffy
FUN405
Program Manager, CLR Team
Microsoft Corporation
Agenda
Shared Memory
Lock Implementation Trivia
Memory Models
GUIs and COM
Wrap-Up
Shared Memory Basics
Concurrent workers can share to communicate
Objects in the heap
Raw memory in the address space
System-wide kernel objects and memory mapped I/O
[Diagram: several processes, each with threads 1..n, sharing objects in a process-shared heap, system-wide resources such as a named mutex, and memory-mapped I/O at fixed addresses (e.g. 0x74F00AB3, 0x081F3726)]
With sharing comes responsibility
Dealing with broken invariants, avoiding corruption
Shared Memory Basics
Concept recap
Invariants are assumed conditions in your code
When invariants are broken, locking can ensure:
Serialization: Things happen one after the other
object myLock = new object();
void Foo() {
lock (myLock) {
// munge the data structure (not happy)
// but leave it in a happy state
}
}
Atomicity: Either it happens fully, or the effects are not
visible at all
object myLock = new object();
void Foo() {
lock (myLock) {
try {
// munge the data structure
// and leave it in a happy state (unless an exception occurs)
} catch {
if (NotConsistent)
// erase any partial munges
}
}
}
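Made concrete, the catch-and-repair pattern above might look like this; the Account type and its entries list are hypothetical, invented purely for illustration:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: the catch block erases the partial munge, so
// anyone else who later takes the lock sees either the whole pair of
// entries or neither of them.
class Account {
    private readonly object myLock = new object();
    private readonly List<int> entries = new List<int>();

    public void PostPair(int credit, int debit) {
        lock (myLock) {
            entries.Add(credit);          // first half of the munge
            try {
                if (debit >= 0)
                    throw new ArgumentException("debit must be negative");
                entries.Add(debit);       // second half
            } catch {
                entries.RemoveAt(entries.Count - 1); // erase the partial munge
                throw;
            }
        }
    }

    public int EntryCount {
        get { lock (myLock) return entries.Count; }
    }
}
```

A failed PostPair leaves the list exactly as it was, which is the atomicity property the slide describes.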
Sharing Memory
Shared state in our code
WinFX code modifies statics and internal CLR
state in a thread-safe manner
Designed to tolerate concurrency
Avoids corrupting shared state
Suggests to hosts when to rip the AppDomain instead
of aborting a single thread
Instances are not thread-safe
If you share an instance, the runtime doesn’t know it’s shared
You are responsible for ensuring thread-safety
There are very few exceptions, e.g. Thread
Recommended guidance for reusable libraries
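As a sketch of that guidance: framework collection instances such as Queue&lt;int&gt; are not thread-safe, so the caller serializes access with its own private lock (the GuardedQueue wrapper here is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative sketch: the Queue<int> instance is not thread-safe, so
// every access from every thread goes through one private lock object.
class GuardedQueue {
    private readonly object myLock = new object();
    private readonly Queue<int> queue = new Queue<int>();

    public int FillFromTwoThreads(int perThread) {
        ThreadStart work = delegate {
            for (int i = 0; i < perThread; i++)
                lock (myLock) queue.Enqueue(i);   // serialize all access
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        lock (myLock) return queue.Count;
    }
}
```

Without the lock, concurrent Enqueue calls could corrupt the queue's internal state; with it, all 2 × perThread items arrive intact.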
Locking Challenges
Heisenbugs
General challenges
Deadlocks
Priority inversion
Lock convoys
[Diagram: priority inversion (a low-priority thread holds lock a while a high-priority thread waits on it); deadlock (two threads acquire locks a and b in opposite orders); lock convoys (threads pile up behind a contended lock)]
Accidental deadlocks can be caused by locking on:
Objects that bleed across AppDomains, e.g. System.Type
Can also lead to orphaned monitors due to AppDomain death
State publicly accessible from libraries
Scalability challenges
Granularity
Too coarse can lead to decreased throughput
Too fine incurs perf overhead of lots of little locks
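The deadlock above arises from inconsistent lock ordering; a minimal sketch of the standard defense, a single global acquisition order (all names here are hypothetical):

```csharp
using System;
using System.Threading;

// Illustrative sketch: two threads both need locks a and b. If one took
// a-then-b and the other b-then-a, each could end up holding one lock
// while waiting forever for the other: deadlock. The fix shown is a
// single global ordering: every thread takes a before b.
class OrderedLocks {
    private readonly object a = new object();
    private readonly object b = new object();
    private int transfers;

    public int Run(int perThread) {
        ThreadStart work = delegate {
            for (int i = 0; i < perThread; i++)
                lock (a)        // always a first...
                    lock (b)    // ...then b: no cycle is possible
                        transfers++;
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return transfers;
    }
}
```

Reversing the nesting in just one of the two threads would reintroduce the deadlock risk.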
Atomicity Challenges
Asynchronous exceptions
Goal: Those who lock never see inconsistencies
How? Patch up broken invariants upon failure
Rock solid atomicity is actually quite hard
Async exceptions can happen nearly anywhere, e.g.
sophisticated hosts inject ThreadAborts
Suspend during CERs, catch/finally, .cctors, native code
Don’t panic!
If you’re aborted while under a lock, you can assume the AD is being unloaded
Finally blocks and finalizers are often good enough
If you’ve mutated process- or system-wide state, you need stronger measures (e.g. CERs)
Running Code In Parallel
How to run code in parallel (on the CLR)?
You have many options, in order of preference:
1. Parallel worker APIs
   Async APIs specific to some types
   ThreadPool.QueueUserWorkItem(…), or BackgroundWorker (for UIs)
2. Explicit threading (e.g. Thread..ctor, .Start)
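A minimal sketch of the preferred option, queueing work to the thread pool; the PoolDemo class and its event-based rendezvous are illustrative, not a prescribed pattern:

```csharp
using System;
using System.Threading;

// Illustrative sketch: hand work to the thread pool rather than creating
// a thread, and rendezvous with the caller via a ManualResetEvent.
class PoolDemo {
    public static int RunOnPool(int input) {
        int result = 0;
        ManualResetEvent done = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem(delegate(object state) {
            result = (int)state * 2;   // the "parallel" work
            done.Set();                // signal completion
        }, input);
        done.WaitOne();                // rendezvous with the worker
        return result;
    }
}
```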
TP.QUWI and (usually) Async APIs follow the APM
Rendezvous occurs with one of:
1. Callback delegate
2. IAsyncResult.IsCompleted
3. IAsyncResult.AsyncWaitHandle, or
4. Just EndXxx (automatically blocks if !IsCompleted)
EndXxx always necessary to release resources
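A hedged sketch of the fourth rendezvous style (just call EndXxx), using the real FileStream.BeginRead/EndRead pair; the ApmDemo wrapper and file contents are invented:

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative sketch: kick off the read, optionally do other work,
// then call EndRead, which blocks until the operation completes and
// releases its resources.
class ApmDemo {
    public static string ReadViaApm(string path) {
        File.WriteAllText(path, "hello, APM");
        using (FileStream fs = new FileStream(path, FileMode.Open,
                   FileAccess.Read, FileShare.Read, 4096, true /* async */)) {
            byte[] buffer = new byte[64];
            IAsyncResult ar = fs.BeginRead(buffer, 0, buffer.Length, null, null);
            // ...the caller could do useful work here...
            int bytes = fs.EndRead(ar);   // blocks if !ar.IsCompleted
            return Encoding.ASCII.GetString(buffer, 0, bytes);
        }
    }
}
```

Skipping EndRead here would leak the operation's resources, which is why EndXxx is mandatory even when you rendezvous another way.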
Writing Your Own Lock (?)
Want a spin-lock?
Easy enough to implement yourself…
class SpinLock {
    private int state;
    public void Enter() {
        while (Interlocked.CompareExchange(ref state, 1, 0) != 0) ;
    }
    public void Exit() {
        state = 0;
    }
}
…or perhaps not
Hand Written Spin Lock
Writing Your Own Lock (?!)
Not so fast!
Summary:
99% of the audience shouldn’t need to!
Extremely easy to get wrong, we write them for you
Original attempt robs forward progress
Can hold the bus
Starves other hardware threads
And besides… It’s silly to spin on a single proc
CLR doesn’t know it’s a lock unless you tell it
Begin/EndCriticalRegion tells the host that aborting a
single thread could lead to instability (e.g. deadlocks)
And we didn’t even discuss reentrancy and affinity
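A sketch of how such a lock might be made friendlier along the lines above: back off instead of holding the bus, yield outright on a single processor, and notify the host via Begin/EndCriticalRegion. This is still illustrative, not production code (the backoff thresholds are arbitrary assumptions):

```csharp
using System;
using System.Threading;

// Illustrative sketch of a friendlier spin lock: it backs off instead of
// hammering the bus, yields on a single processor where spinning cannot
// help, and brackets ownership with Begin/EndCriticalRegion so a host
// knows an abort here could destabilize more than one thread.
class FriendlySpinLock {
    private int state; // 0 = free, 1 = held

    public void Enter() {
        Thread.BeginCriticalRegion();
        int spins = 0;
        while (Interlocked.CompareExchange(ref state, 1, 0) != 0) {
            if (Environment.ProcessorCount == 1 || ++spins > 100)
                Thread.Sleep(0);       // yield the timeslice
            else
                Thread.SpinWait(20);   // brief pause between probes
        }
    }

    public void Exit() {
        Interlocked.Exchange(ref state, 0); // release with a fence
        Thread.EndCriticalRegion();
    }
}

class SpinLockDemo {
    public static int Count(int perThread) {
        FriendlySpinLock slock = new FriendlySpinLock();
        int counter = 0;
        ThreadStart work = delegate {
            for (int i = 0; i < perThread; i++) {
                slock.Enter();
                counter++;
                slock.Exit();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return counter;
    }
}
```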
Constructor Race Condition
Can inst refer to an uninitialized Foo?
class Foo {
    static Foo inst;
    static object syncLock = new object();
    string state;
    bool initialized;

    private Foo() {
        state = "I'm happy";
        initialized = true;
    }

    public static Foo Instance {
        get {
            if (inst == null)
                lock (syncLock) {
                    if (inst == null)
                        inst = new Foo();
                }
            return inst;
        }
    }
}
// Two threads concurrently:
Foo i = Foo.Instance;
Might look something like this (pseudo-jitted code):
Foo tmp = GCAlloc(typeof(Foo));
tmp->state = "I'm happy";
tmp->initialized = 1;
inst = tmp;
But what if it turned into this?
inst = GCAlloc(typeof(Foo));
inst->initialized = 1;
inst->state = "I'm happy";
Thread 2 could see a non-null inst, yet:
(1) initialized == 0, or
(2) initialized == 1, but state == null
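Under the ECMA model discussed later in this deck, one portable fix is to declare the shared reference volatile, so the publishing store has release semantics and readers get acquire semantics. A sketch (SafeFoo is a renamed copy of Foo, invented here):

```csharp
using System;

// Illustrative sketch: with inst volatile, no thread can observe a
// non-null reference whose fields have not yet been written.
class SafeFoo {
    static volatile SafeFoo inst;
    static readonly object syncLock = new object();

    public readonly string state;
    public readonly bool initialized;

    private SafeFoo() {
        state = "I'm happy";
        initialized = true;
    }

    public static SafeFoo Instance {
        get {
            if (inst == null)
                lock (syncLock)
                    if (inst == null)
                        inst = new SafeFoo();
            return inst;
        }
    }
}
```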
Read/Write Reordering
Compilers (JIT) and processors want to execute reads and/or writes out of order, e.g.

// source code
static int x, y;
void Foo() {
    y = 1;
    x = 1;
    y = 2;
    // …
}

// can become
static int x, y;
void Foo() {
    y = 2; // swap and delete one write
    x = 1;
    // …
}
We say the write of x passed the 2nd write to y
Code motion: JIT optimizations
Out of order execution: CPU pipelining, predictive execution
Cache coherency: Hardware threads use several memories
Writes by one processor can move later (in time) due to buffering
Reads can occur earlier due to locality and cache lines
Not legal to impact sequential execution, but can be visible to
concurrent programs
Memory models define which observed orderings
are permissible
Memory Models
Controlling reordering
Load acquire: won't move after future instrs (later ops can't move before it)
Store release: earlier instrs won't move after it
Fence: No instructions will “pass” the fence in
either direction
A.k.a. barrier
[Diagram: IA-64 flavors: an ordinary load/store may move in either direction; ld.acq x keeps later instructions after it; st.rel x keeps earlier instructions before it; mf (memory fence) stops motion in both directions]
Memory Models
On the CLR
Strongest model is sequential/program order
Seldom implemented (x86 comes close); the CLR's model is a bit weaker
Reordering is for performance; limiting that limits the processor’s
ability to effectively execute code
Lock acquisitions and releases are fences
Makes code using locking simple[r]
Lock-free is costly and difficult – just avoid it!
Notice this didn’t solve the ctor race, however
ECMA specification
Volatile loads have acquire semantics
Volatile stores have release semantics
v2.0 implements a much stronger model
All stores have release semantics
Summary: 2.0 prevents the ctor race we saw earlier
But on strong-model machines, it won't cause problems on 1.x either
Memory Models
Why volatile and Thread.MemoryBarrier()?
Reorderings are still possible:
1. Non-acquire loads can still pass each other
2. A st.rel followed by a ld.acq can still swap
Volatile can fix #1, Thread.MemoryBarrier() can fix #2
Q: For example, can a > b?

static int a;
static int b;

// Thread #1
while (true) {
    int x = a;
    int y = b;
    Debug.Assert(y >= x);
}

// Thread #2
while (true) {
    b++;
    a++;
}
A: Yes, unless a (or b) are marked volatile
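The same quiz with the fix applied: both fields volatile, so the writer's stores (b then a) have release semantics, the reader's loads (a then b) have acquire semantics, and a sampled a can never exceed the b sampled after it. The OrderingDemo wrapper and iteration count are illustrative:

```csharp
using System;
using System.Threading;

// Illustrative sketch: with volatile fields, the writer's store of b
// cannot move after its store of a, and the reader's load of b cannot
// move before its load of a, so the y >= x invariant always holds.
class OrderingDemo {
    static volatile int a;
    static volatile int b;

    public static bool Run(int iterations) {
        bool ok = true;
        Thread writer = new Thread(delegate() {
            for (int i = 0; i < iterations; i++) { b++; a++; }
        });
        Thread reader = new Thread(delegate() {
            for (int i = 0; i < iterations; i++) {
                int x = a;    // volatile (acquire) load, first
                int y = b;    // volatile (acquire) load, second
                if (y < x) ok = false;
            }
        });
        writer.Start(); reader.Start();
        writer.Join(); reader.Join();
        return ok;
    }
}
```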
GUIs and Messages
[Diagram: (1) a thread calls Post- or SendMessage(), placing a message on the GUI message queue; (2) the affinitized GUI thread pumps the queue; (3) the message pump dispatches the message to the WndProc; (4) the message is executed; (5) done]
COM Threading Model
Making concurrency simple
[Diagram: a process containing a Single-Threaded Apartment (STA) with one affinitized thread and a message queue, a Multi-Threaded Apartment (MTA) with threads 1..n, and a Neutral Apartment (NA) containing only components]
n STAs per process
1 affinitized thread for its lifetime
0 or 1 MTA per process
n threads
1 NA per process
No threads, just components
GUIs, COM and Messaging
Pumping and reentrancy
COM uses a GUI thread for STAs
Each STA thread has a queue and a pump
Method calls on an STA COM proxy turn into a PostMessage; the caller then pumps while waiting for a reply
The STA must pump to dispatch the call, then PostMessage the "return" to the caller
Dispatched calls are stacked onto the STA’s
existing call stack
Called reentrancy
Thread-wide state can be implicitly shared
If the pump isn’t running, the queue isn’t
draining… deadlocks, “(Not Responding)”, etc.
GUIs and Messaging
CLR interoperability
Good news! The CLR does a lot for you
Cross apartment transitions and marshaling
Pumping the STA whenever you do a managed block
Places your threads into an MTA by default
You can override the default apartment choice
STAThread- or MTAThreadAttribute applied to the entry point
Thread.SetApartmentState for explicit threads
But Visual Studio sticks an STAThreadAttribute on
many projects
Some project types require STA, e.g. GUIs (Windows Forms and Windows Presentation Foundation)
Using the wrong type can cause COM interop headaches
Fun Pumping and Reentrancy
Parlor Trick
Finalization
Concurrency ‘gotchas’
Finalizer accesses your components from a
different MTA thread
CLR objects assuming thread affinity could be
surprised
STA components require finalizer to transition
If the STA’s thread isn’t pumping, the finalizer isn’t finalizing
On a server with lots of STA components, not a GoodThing™
Resurrection dangers
Somebody in a finalization queue can republish your
pointer to the world
And then you can be finalized and called concurrently
Can lead to subtle, difficult to find bugs
Moral: Don’t do it (1) to yourself and (2) to others
Summary
The platform strives to make concurrency tractable
We continue to make it easier over time
Locking makes it easier, attempting to be clever comes with a tax
Hardcore architecture and implementation details are fun, provide
insight and appreciation, but not necessary to do your day job
The future is a fun place to be: Remember Jan’s graph?
TRY IT OUT!!! (and don’t block your UI thread)
[Graph (Jan's), log scale, 1975 to 2015: transistors/die keep growing >30%/yr (10,000 in 1975, 100 M in 2003, 5 B projected), while CPU clock frequency growth slows to <10%/yr (1 MHz in 1975, 3 GHz in 2003, <10 GHz projected)]
Other Talks
On the DVD (if you missed it)
FUN302: Programming with Concurrency (Part 1)
DAT301: High Performance Cluster Computing
Concurrency futures
FUN323: Fri 8:30 a.m.
MSR: Future Possibilities in Concurrency
TLN309: Fri 10:30 a.m.
C++: Future Directions in Language Innovation
Other related talks
FUN308: Wed 1:45 p.m.
Developing Rock Solid Reliable Apps
FUN412: Thu 10:00 a.m.
Five Things Every Win32 Developer Should Know
TLN306: Wed 1:45 p.m.
The .NET Language Integrated Query Framework
Q&A/Resources
My blog:
http://www.bluebytesoftware.com/blog/
Chris Brumme’s blog:
http://blogs.msdn.com/cbrumme/
Herb Sutter’s blog:
http://www.pluralsight.com/blogs/hsutter/
Other CLR Team Blogs:
http://blogs.msdn.com/shawnfa/archive/2005/02/08/369384.aspx
.NET Framework 2.0
Joe Duffy, ETA Q4 2005, ISBN: 0764571354
Patterns for Parallel Programming
Timothy G. Mattson, et al, ISBN: 0321228111
Concurrent Programming in Java™
Doug Lea, ISBN: 0201310090
© 2005 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.