Everything you always wanted to know about threading...
...but were afraid to ask
Elliot H. Omiya (EHO)
Principal SDE
Windows Developer Experience
Agenda
• A short history of threading in Windows
• Threading in the Windows Runtime
• Threading and UI programming
• Async
Short History of Threading in Windows
• Anyone remember _beginthread(ex)? Going back to VC++ 1.0 and
the early-to-mid '90s (16-bit real mode Windows)
• Eventually we sorted out CreateThread versus _beginthread issues
and interactions with the CRT.
• Inevitably this led to an explosion of threads in Windows programs.
• Everyone started hand-rolling their own “thread pools” until we
shipped ThreadPool in Windows NT.
• Dedicated threads (i.e. CreateThread) are still popular but harder for
applications to manage.
ThreadPools
• Simple: queue, “n” threads, and an algorithm to dequeue work and
run it.
• Not simple: the actual implementation
• Job 1: queue (actually, this part is relatively simple)
• If you have priorities, one queue per priority level is a fine implementation
• Job 2: threads
• First decision: how many threads?
• One per core?
Worker threads for thread pool
• One per core is fine unless you have:
• IO
• Threads that can block on event/semaphore
• A “busy” thread (waiting on IO or event) can do no more work
• Result: you need a pool of threads greater than the # of cores
because every worker thread can potentially block
Work types
• Turns out you need “work types” (in the internal implementation)
• TP_WORK
• TP_WAIT
• TP_IO
• TP_TIMER
• etc.
• Basically these are “execution triggers” – how they start executing. You
could also explicitly set up work items that you know do IO or will wait on an
event/semaphore.
• So the API just speaks of work items, but there is additional complexity in
the actual implementation.
Thread explosion
• There are several classic programming problems that can lead to
thread explosion.
• Simple example: web server servicing web requests:
• “n” worker threads servicing incoming requests
• One worker thread to read actual html file and associated resources from disk
(into cache)
• Simple programming error can lead to disastrous results.
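The defect can be sketched portably (all names are hypothetical): work item A blocks on a result produced by work item B, but B is queued behind A. With a single worker thread, A occupies the only thread and B never runs, so the pool's only cure is to inject more threads:

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Returns the "page" once served. Calling this with workers == 1 would
// hang forever: item A holds the only worker while waiting for item B,
// which sits behind it in the queue. That is the thread-explosion trap.
std::string ServeRequest(unsigned workers) {
    std::queue<std::function<void()>> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::promise<std::string> page;        // the "web page read" event
    std::future<std::string> pageReady = page.get_future();
    std::string served;

    // Item A: service the request -- blocks until the page is read.
    q.push([&] { served = pageReady.get(); });
    // Item B: read the page and signal the event (queued BEHIND A!).
    q.push([&] { page.set_value("<html>...</html>"); });

    auto worker = [&] {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return done || !q.empty(); });
                if (q.empty()) return;
                work = std::move(q.front());
                q.pop();
            }
            work();
        }
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);

    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_all();
    for (auto& t : pool) t.join();
    return served;
}
```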
[Diagram sequence: a work item queue feeding a set of worker threads, plus an event ("web page read")]
• Every worker thread is blocked on the event.
• But the worker thread that signals the event is still in the queue! (Bad code!)
• So you have to create another thread in the pool...
• But you could have a large number of requests for the same web page...
• Result: thread explosion
Thread explosion
• Turns out there are many variations of this problem.
• Another example: freezing threads for a GC operation
• We call this programming defect “popular internal dependency”
• Solution: delay creating the extra thread for a short (or long) while (yes, a heuristic)
• The algorithm assumes the resource dependency will resolve (i.e. it does
not cover the programming error we previously described)
• This alleviates the thread explosion while the dependency resolves
Fairness
• Recap: “n” priority work queues and a system that does IO and has
events/semaphores
• OS has a highly efficient signaling mechanism: IO completion ports –
kernel knows when IO completes. Can also be used for events.
• So kernel knows immediately when a work thread that was blocked is
ready to run (ReadyThread).
• It can release work to the user mode side of the threadpool when it is
idle.
Fairness
• Windows 7: TP_WORK always trumped TP_IO and TP_WAIT
• So lots of TP_WORK could starve work items that unblocked (i.e.
unfair)
• A fairer algorithm blends TP_WORK, TP_IO, and TP_WAIT.
• Servicing IRPs (I/O request packets) is a kernel-mode concept, so the
"worker thread factory" must be in kernel mode.
[Diagram sequence: a user-mode work item queue and a kernel-mode work item queue feeding a kernel-mode Worker Thread Factory that handles Work, IO, and Wait items]
• IRPs complete in kernel mode...
• ...and the corresponding work items are transported to user mode and run.
Fairness
• Fairness could only be implemented with kernel mode managing
work items, IO, timers, and event waiters.
• Bonus: when IRPs complete rapidly, we can batch them up and do a single
transport for multiple work items (fewer ring transitions)
• Very difficult to balance throughput, resource management, and
fairness in user mode only.
• This is a C++ conference, why do you care?
Your unique value add is the diverse types of
work you do in ISO C++
• On Windows, think of how your library functions can benefit from a
highly optimized threadpool.
• On desktop, you can access CreateThread and the latest TP APIs
• On WinRT, you only have the threadpool.
• This is where you throw tomatoes at me.
I need dedicated threads I can control...
• We hear this a lot. 
• Not every workload is amenable to (hopefully short) chunks of work
items.
• “I have long running work”.
• Examples:
• Populate off-screen parts of game board (viewport scrolling optimization)
• Lazy layout (optimize start-up performance)
• (insert your favorite scenario here, you all have one or three)
Time-sliced
• The Windows Threadpool does have a dedicated thread model: time-sliced.
• Trade-off of latency versus throughput (and resource consumption)
• Using work items is efficient: many work items processed by
relatively few threads (batched mode).
• Using time-sliced has lower latency but burns an expensive resource:
a thread (quantum mode).
• Plug: this is a summary of Pedro Teixeira's excellent talk on the
threadpool on MSDN. Many more details there.
WinRT threadpool
• Windows::System::Threading::ThreadPool has support for:
• Run a single work item – RunAsync(WorkItemHandler^)
• Run a single work item based on a timer – CreateTimer(TimerElapsedHandler^, delay)
• Run a periodic work item based on a timer – CreatePeriodicTimer(TimerElapsedHandler^, period)

auto workItem = ref new WorkItemHandler([&](IAsyncAction^ action)
{
    // ... background work
});
IAsyncAction^ threadPoolWorkItem = Windows::System::Threading::ThreadPool::RunAsync(workItem);
• Eventually, background work is reflected in UI, result(s) need to run on the correct UI
thread: use CoreWindow Dispatcher
Agile Objects (are your friend)
In WinRT objects are objects
• And threads are threads.
• Forgive me while I digress into COM for a bit (you will forgive me).
• How many people in this audience know what a COM apartment is?
(If you do, how do you like dealing with apartments?)
• Better question: how many people want to know what a COM
apartment is?
Traditional view of multi-threading
• Objects can be accessed from any thread
• The hard part is dealing with multi-thread contention, resource
management, locking, deadlock prevention, etc.
• The hard part is hard enough.
• You already deal with “context” on a thread no matter what platform
you run on: at a minimum, you deal with the context that is “UI”.
• But, you can set up your code so that most objects can run on any
thread.
• And you want multi-threading to be no harder than this.
Agile Objects
• WinRT objects are agile by default.
• This bit of “magic” (1) allows WinRT objects to just be “multi-threaded objects” (as
you have traditionally known them).
• They can run anywhere (hence the term “agile”) and you have the job of dealing
with multi-threaded resource contention. Period.
• This is the technique that allows you to ignore that there is a thing called an
apartment.
• There are also UI-affine objects in WinRT but you are used to dealing with these on
whatever platform(s) you code for.
(1) Actually not magic: it uses the FTM (free-threaded marshaler). Lousy name, but it
saves you from knowing anything about apartments.
(2) Deep dive on this in Martyn Lovell's talk at //build 2013, and our Channel 9 video.
UI and ASTA
UI Threads
• All mainstream systems bind a single thread to UI operations.
• Implicitly or explicitly a UI thread has a “context” into which it is
bound.
• In the 90’s we created the notion of “apartment” which to this date
almost no one understands. But just think of it as a context.
• A very popular "other OS" says: "It is strongly recommended not to
update UI controls etc. from a background thread (e.g. a timer, comms
etc.). This can be the cause of crashes which are sometimes very hard to
identify." In other words, don't perform UI operations out of context.
UI Contexts
• In this world of many threads, we’re still stuck with a single thread
and UI.
• It gets worse: turns out rather than multi-threaded races and
deadlocks, UI threads can suffer from “re-entrancy”.
• Re-entrancy: I’m in the middle of doing thing “A”, why am I all of a
sudden doing thing “B”?
• Re-entrancy is the cause of a large number of crashes and deadlocks.
• UI has two models: re-entrancy == surprises, non-re-entrancy ==
potential deadlocks.
How to safely update UI 101
• Option 1: Background threads call directly into UI threads. Only safe
if you are REALLY careful. (See slide on re-entrancy).
• Option 2: Background threads post notifications that UI threads
process “when they are ready”. (Dispatcher model).
• Can UI threads call directly into objects running on background
threads? Only safe if you are somewhat careful.
• Oh, and don’t take too long to update UI.
• What is the definition of "too long"? In the last few years it was all about
frame rate. But now it means "be immediately responsive to input".
Pretend you have the chance to define a
new platform... 
• What choices do you make?
• One of the first questions: Does your UI model allow re-entrancy?
• Next: do you allow arbitrary threading?
• How much synchronous UI update do you allow in your platform?
(i.e. how much risk of “spinning donuts” do you want?)
We ended up calling that new platform:
WinRT (“Windows Runtime”)
• Choice #1: Non-re-entrancy (“ASTA”)
• Choice #2: ThreadPool API’s but not arbitrary threading
• Choice #3: No spinning donuts: async across the API surface
• Choice #4: No arbitrary window creation (and you don’t have to
know what an HWND is)
• We still have single-threaded UI frameworks: HTML/CSS and XAML
• And there are plenty of things you can do synchronously in these frameworks,
e.g. toggle button state or update textbox/listbox/etc. contents.
Tying it together
• Turns out that you have to tie together the following elements for the
UI and programming model to work:
• Window creation and behavior
• Window event (message) processing
• Dispatcher processing (remember those UI notifications we need?)
• Call processing (incoming and outgoing)
• Input
• And you have to prioritize these with respect to each other.
The entity that ties these together is ASTA
• ASTA == “UI context” (forget that one of the A’s is “apartment”)
• One thread per window, windows are created by contract activations (e.g.
click on a tile or share something from an app).
• The main thread of a WinRT application runs in a multi-threaded context;
it is not a UI thread.
• There is, of course, still a main UI window with its own UI thread.
• ASTA’s are non re-entrant. When an outgoing call is in progress, an
incoming call blocks.
• This requires careful planning, but there are no surprises. (Re-entrancy is
nearly always a surprise).
Async (no spinning donuts)
• There has been a concerted effort to make UI responsive, this is key
to the “fast and fluid” platform promise.
• Every API was (and is) reviewed. Any synchronous call that takes
longer than the low 10’s of milliseconds to execute must be async.
• Every async API conforms to a uniform pattern.
• The language projections (C++, C#/VB.Net, JavaScript) all have built-in async support.
WinRT Async 101
• All WinRT async operations are “hot start”, as soon as they are
produced, they are running.
• You can “attach” to a running async operation by supplying a
completion handler (put_Completed).
• When the operation completes, it fires a completion callback. The
type of each callback is the result type of the async operation.
• Of course, when you supply the completion callback, it may fire
completion immediately, i.e. the operation completed already.
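The hot-start pattern above, including the "handler attached after completion fires immediately" case, can be sketched in portable C++ (all names are hypothetical; the real pattern is documented in WRL's async.h):

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Sketch of a hot-start async operation: it starts running as soon as it
// is created, and a completion handler may be attached at any time. If the
// operation already finished, attaching the handler fires it immediately.
class AsyncOp {
public:
    explicit AsyncOp(std::function<int()> work)
        : worker_([this, work] { Finish(work()); }) {}   // hot start
    ~AsyncOp() { worker_.join(); }

    void put_Completed(std::function<void(int)> handler) {
        std::unique_lock<std::mutex> lock(m_);
        if (completed_) {
            lock.unlock();
            handler(result_);       // operation completed already
        } else {
            handler_ = std::move(handler);
        }
    }
private:
    void Finish(int result) {
        std::function<void(int)> handler;
        {
            std::lock_guard<std::mutex> lock(m_);
            completed_ = true;
            result_ = result;
            handler = std::move(handler_);
        }
        if (handler) handler(result);   // fire outside the lock
    }
    std::mutex m_;
    bool completed_ = false;
    int result_ = 0;
    std::function<void(int)> handler_;
    std::thread worker_;   // declared last: starts after other members
};
```

The lock is the whole point: without it, "attach handler" and "complete" can race and the completion is silently lost.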
WinRT Async 101
• When an async operation completes, its status is reported:
• Complete 
• Canceled (user request to cancel was honored)
• Error 
• WinRT async operations are a one-way state machine:
AsyncMethod => Running => Terminal State {Completed|Error|Canceled}
• Processing always occurs on background thread (usually a TP work
item)
• Pattern is completely documented in code: see WRL’s async.h
Async and Agile
• Async Operation objects are a good example of agile objects
• Mostly designed to be called from UI threads
• But do all of their work on background (TP) threads
• As a result, async operation objects are agile (and therefore directly
callable from UI threads)
Async support (C++ / PPL)
• Language projections are key to productivity in producing WinRT
apps.
• “Natural and familiar” is a key design point: make the experience
natural and familiar for the language that is being used.
• For C++, the natural means of consuming async is based on the
Concurrency Runtime (PPL).
• The model is continuation-based (.then() + lambda/functor/function)
with support for exception handling and cancellation.
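PPL's create_task(...).then(...) is Windows-specific, but the continuation idea it builds on can be sketched portably in a few lines of C++14 (the `then` helper below is illustrative, not PPL's implementation): `then` launches the continuation once the previous stage's result is ready.

```cpp
#include <future>
#include <utility>

// Minimal continuation: chain a callable onto a std::future. The returned
// future completes with the continuation's result. (Sketch only: PPL also
// handles exceptions, cancellation, and context capture.)
template <typename T, typename F>
auto then(std::future<T> prev, F continuation) {
    return std::async(std::launch::async,
        [prev = std::move(prev), continuation]() mutable {
            return continuation(prev.get());   // wait, then continue
        });
}
```

Usage looks like the .then() style the slide describes:

```cpp
auto t = then(std::async(std::launch::async, [] { return 21; }),
              [](int n) { return n * 2; });    // t.get() == 42
```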
Async and contexts
• Consuming async operations is often done for the benefit of UI
• create_task takes note of the originating context, i.e. what kind of
thread made the original call. (use_default).
• create_task will return the result of an async operation (if any) to that
originating context by default.
• C++ developers of course have control over this (e.g. you may want to
“de-bounce” transitions back to the UI thread until the end of a series of
continuations).
• We’ll talk about CoreWindow’s Dispatcher method soon.
CoreWindow Dispatcher
• One of the things that the ProcessEvents loop on a CoreWindow
schedules is dispatcher work items.
• CoreDispatcher::RunAsync() (reached via CoreWindow::Dispatcher) schedules
a work item to run on the UI thread associated with the CoreWindow.
• Since the CoreWindow processing loop processes multiple types of
items, a rich priority scheme is supported.
CoreWindow Dispatcher Priority
• Normal priority (default) means: “run dispatcher callbacks in FIFO
order, cooperative with input (and window management events)”.
(Everyone runs cooperatively).
• Low priority means: Dispatcher callbacks run when there is no input
pending. (Input beats dispatcher, i.e. app is responsive to input).
• Idle priority means: Dispatcher callbacks run when there is nothing else
in any queue. (everything beats dispatcher).
• High priority means: dispatcher callbacks run ahead of everything else.
(dispatcher beats everything). Docs say: “don’t use this” (more on this
in a second).
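The priority rules above can be modeled as a simple "who runs next" decision. This is a hypothetical sketch, not the real CoreWindow event loop; in particular, "Normal is cooperative with input" is modeled here as taking input first when both are pending, which is an assumption:

```cpp
#include <deque>
#include <string>

// Hypothetical model of the scheduling rules:
// High beats everything; Normal cooperates with input; Low runs only
// when no input is pending; Idle runs when nothing else is anywhere.
struct PendingWork {
    std::deque<std::string> input;                    // input/window events
    std::deque<std::string> high, normal, low, idle;  // dispatcher queues

    std::string PickNext() {
        if (!high.empty())   return Pop(high);    // dispatcher beats everything
        if (!input.empty())  return Pop(input);   // sketch: input before Normal
        if (!normal.empty()) return Pop(normal);
        if (!low.empty())    return Pop(low);     // input already drained here
        if (!idle.empty())   return Pop(idle);    // nothing else anywhere
        return "";
    }
private:
    static std::string Pop(std::deque<std::string>& q) {
        std::string s = std::move(q.front());
        q.pop_front();
        return s;
    }
};
```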
Rendering UI
• This is not your father’s message loop. Everything used to happen on
the UI thread: message processing, paint, computation, animation,
alpha blending, etc.
• As applications became more graphically intensive, everything
became a slave to frame rate (60 fps == 16.667ms “window”). Get
everything done by deadline. Very difficult model to get right.
• WinRT render threads are separate from ASTA UI threads.
Composition is separate from render.
• Goodness: relieves pressure on the (precious) UI thread
Responsive UI
• The "paint beat" is tied to the frame rate. The frameworks (WinJS
and XAML) take care of this.
• Most UI operations are “tweaks” to layout (change text in a text box,
scroll, get image ready to render, etc.). These occur directly on UI
thread.
• Initial layout and large changes to layout (e.g. navigation) are critical
and time-sensitive. You have to beat the next vsync.
• If you can’t beat the next vsync (common) then you can create a
transition animation while the next layout is being prepared.
(independent animations do not run on the UI thread)
Responsive UI
• What does this have to do with threading?
• CoreDispatcher priority High serves a couple of different
"system" purposes:
• In WinJS – High priority is used for layout changes. Think: layout beats
everything
• In CoreApplication, the “app object” in every WinRT application, the suspend
notification is delivered on UI threads via the Dispatcher (for apps, responding to
Suspend is highly time critical)
• If you choose to use High priority, remember that it is an extremely
sharp knife: highly effective and dangerous when used improperly.
Tying it all back together
• UI threads are created by the system and managed by CoreWindow
processing loop.
• Background threads run via WinRT ThreadPool or language projection
components (sitting on WinRT ThreadPool).
• CoreWindow Dispatcher schedules work on UI threads, with a rich
priority scheme.
• Responsive and useful app UI means balancing:
• CPU utilization on background threads
• UI thread processing that is responsive to input and never blocks (including never
running long workloads)
Which means...
• Responsive programs are sequences of async processing followed by a
return to UI thread either by:
• Direct context capture (return to point of origination); or
• Explicit call to CoreWindow Dispatcher
Async Investments
• Microsoft continues to invest heavily and innovate in the async space.
C# introduced the await keyword, dramatically simplifying async
programming.
• Await allows async consumption code to read in a more logical flow.
• It looks like synchronous code, but the block of code following an await
statement executes “later”.
• Think you don’t need/want this? Go to Herb Sutter’s //build 2013 talk,
and fast forward to about 50 minutes in.
• And then make sure you attend Deon’s talk later today where we tell
you Everything You Ever Wanted To Know about C++ await.
In Summary...
• Probably not everything you always wanted to know 
• The valuable assets you have in your C++ code come to light in UI
• And the threading and coding rules for UI are different than they were
5-10 years ago (more input responsiveness, more async, etc.)
• Optimize background processing in units of work and think carefully
about the relative priorities of that work. (And don’t reinvent the
threadpool!)
• You do really want await (don’t miss Deon’s talk).