
Scheduling
Operating Systems
CS 550
Spring 2017
Kenneth Chiu
Scheduling
• Low-level mechanisms of running processes
– Limited direct execution
– Context switching
• High-level policies – scheduling policies
• Scheduling appears in many disciplines (e.g., assembly
lines for efficiency)
• How to develop a framework for studying scheduling
policy?
– Key assumptions
– Key metrics
– Basic approaches
Workload assumptions
• Workload – processes/jobs running in the system
– Critical to building policies: the more you know
about the workload, the more fine-tuned your policy can be
– Unrealistic, initially, but will be relaxed later
1. Each job runs for the same amount of time.
2. All jobs arrive at the same time.
3. All jobs only use CPU (i.e., they perform no I/O).
4. The run-time of each job is known – the scheduler knows
everything!
Scheduling metrics
• To compare different scheduling policies
• Metrics – used to measure something
– Turnaround time: the time at which the job
completes minus the time at which the job arrived
in the system (a performance metric; see the formula below)
– Fairness
– There is often a tradeoff between performance and fairness
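In symbols, the turnaround metric above is simply:

    T_turnaround = T_completion − T_arrival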
First In First Out (FIFO)
• A.k.a. First come, first served (FCFS)
[Figure: timeline with A at 0–10, B at 10–20, C at 20–30 (time axis 0–120 sec).]
• Each job runs 10 sec.
• Average turnaround time = ?
= (10+20+30)/3 = 20
First In First Out (FIFO)
• What kind of workload could you construct to make FIFO
perform poorly?
[Figure: A runs 0–100, B runs 100–110, C runs 110–120; all three arrive near time 0, but A needs 100 sec while B and C need only 10 sec each.]
• Average turnaround time = ?
= (100+110+120)/3 = 110
• Convoy effect (think of a supermarket checkout line)
– a number of relatively short potential consumers of a resource
get queued behind a heavyweight resource consumer
How can you solve this problem?
Shortest Job First (SJF)
• Average turnaround time = ?
= (10+20+120)/3 = 50
[Figure: B runs 0–10, C runs 10–20, A runs 20–120.]
• Assuming all jobs arrive at the same time, SJF
is optimal for average turnaround time
Shortest Job First (SJF)
• SJF where A arrives at 0, and B, C arrive at 10
[Figure: A runs 0–100 (SJF is non-preemptive, so B and C must wait even though they arrive at 10); B runs 100–110, C runs 110–120.]
• How would you do better?
• Preemption
[Figure: with preemption, A runs 0–10; when B and C arrive at 10, A is preempted; B runs 10–20, C runs 20–30, then A resumes 30–120.]
Shortest Time-to-Completion First (STCF)
• Shortest Time-to-Completion First (STCF) or
Preemptive Shortest Job First (PSJF): any time a new
job enters the system, the scheduler determines which
of the remaining jobs (including the new one) has the
least time left, and schedules that one.
[Figure: A runs 0–10; B and C arrive at 10; B runs 10–20, C runs 20–30; A resumes 30–120.]
• Average turnaround time = ?
= ((120-0)+(20-10)+(30-10))/3 = 50
A new metric: response time
• Interactive applications
• Response time = time from when the job
arrives in a system to the first time it is
scheduled
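In symbols:

    T_response = T_firstrun − T_arrival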
• Example:
[Figure: the STCF schedule again: A 0–10, B 10–20, C 20–30, A 30–120; B and C arrive at 10.]
• Average response time = ?
= (0+0+10)/3 ≈ 3.3 sec.
A new metric: response time
• STCF and related disciplines are not
particularly good for response time
– If three jobs arrive at the same time, for example,
the third job has to wait for the previous two jobs
to run in their entirety before being scheduled just
once.
– While great for turnaround time, STCF is quite bad
for response time and interactivity.
• How can we build a scheduler that is sensitive
to response time?
Round Robin (RR)
• Instead of running jobs to completion, RR runs a job
for a time slice (scheduling quantum) and switches to
the next job in the ready queue; repeatedly does so
until jobs are finished
• Time slicing – the length of the time slice must be a multiple of the
timer-interrupt period
– SJF average response time = (0+5+10)/3 = 5
– RR (1-sec slice) average response time = (0+1+2)/3 = 1
[Figure: SJF runs A 0–5, B 5–10, C 10–15; RR with a 1-sec time slice cycles A,B,C,A,B,C,… over 0–15.]
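A toy C sketch of the comparison above (the three 5-sec jobs match the figure; the quantum knob and all names are illustrative, not from the slides):

#include <stdio.h>

/* Toy RR simulation: three jobs of 5 sec each arrive at time 0.
 * With quantum = 1 this reproduces the RR average response time
 * of 1 from the figure above. */
int main(void) {
    int remaining[3] = {5, 5, 5};     /* seconds of CPU left per job */
    int first_run[3] = {-1, -1, -1};  /* time each job is first scheduled */
    int quantum = 1, t = 0, done = 0;

    while (done < 3) {
        for (int j = 0; j < 3; j++) {          /* cycle through the jobs */
            if (remaining[j] == 0) continue;
            if (first_run[j] < 0) first_run[j] = t;
            int run = remaining[j] < quantum ? remaining[j] : quantum;
            t += run;                          /* job j runs for one slice */
            remaining[j] -= run;
            if (remaining[j] == 0) done++;
        }
    }
    /* all jobs arrived at time 0, so response time = first_run */
    printf("average response time = %.1f\n",
           (first_run[0] + first_run[1] + first_run[2]) / 3.0);
    return 0;
}

Setting quantum = 5 makes each job run to completion in turn, reproducing the SJF-style average response time of 5.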
Round Robin (RR)
• What is the effect of time-slice length on response time?
– The shorter the time slice is, the better the performance of RR under
the response-time metric.
– Can we make the time slice as small as possible?
– Where do the overheads come from?
• Context switch (saving/restoring registers), cache/TLB flushed …
– Design trade-off: make the time slice long enough to amortize the
cost of switching, but not so long that the system is no longer
responsive.
• Amortization
– Used in systems when there is a fixed cost to some operation
– By incurring that cost less often, the total cost to the system is
reduced.
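For example (illustrative numbers): if a context switch costs 1 ms, a 10-ms time slice spends about 10% of the CPU on switching overhead, while a 100-ms slice amortizes the same cost down to about 1%.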
Round Robin (RR)
• RR is one of the worst policies if turnaround time is
the metric.
• More generally, any policy (such as RR) that is fair,
i.e., that evenly divides the CPU among active
processes on a small time scale, will perform poorly
on metrics such as turnaround time.
• Common tradeoff: performance (turnaround time)
vs. fairness (response time)
Incorporating I/O
• Process is blocked waiting for I/O completion
• When I/O completes, an interrupt is raised, and OS
runs and moves the process that issued the I/O from
blocked state back to the ready state
[Figure: two schedules of A (five 10-ms CPU bursts, each followed by a 10-ms disk I/O) and B (50 ms of pure CPU). Without overlap, the CPU sits idle during each of A's I/Os and B only starts after A finishes; with overlap, B runs on the CPU during each of A's I/Os, keeping both the CPU and the disk busy.]
• Treating each of A's CPU bursts as an independent short job lets the
scheduler (e.g., STCF) overlap one process's I/O with another process's
CPU use.
What we discussed so far
• We introduced the basic ideas behind scheduling
and developed two families of approaches.
– The first runs the shortest job remaining and thus
optimizes turnaround time.
– The second alternates between all jobs and thus
optimizes response time.
• Incorporating I/O.
• One fundamental problem remains: the inability
of the OS to see into the future.
MULTI-LEVEL FEEDBACK QUEUE
(MLFQ)
• Problems to be addressed:
– Optimize turnaround time (by running shorter
jobs first).
– Minimize response time for interactive users.
• Do we know anything (running time) about
the jobs?
– How to schedule without perfect knowledge?
– Can the scheduler learn? How?
• Learn from the past to predict the future.
Basic Setup
• List of queues, each
assigned a different
priority level.
• Jobs that are ready are
on a queue.
• First two basic rules for
MLFQ:
– Rule 1: Higher priority
queues have strict
priority.
• If Priority(A) > Priority(B),
A runs (B doesn’t).
– Rule 2: For jobs in the same queue, RR is used.
• If Priority(A) = Priority(B), A & B run in RR
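A minimal C sketch of Rules 1 and 2 (the queue layout, names, and demo jobs are illustrative assumptions, not code from the course):

#include <stdio.h>

#define NLEVELS 3                 /* number of priority levels (hypothetical) */

struct job { const char *name; struct job *next; };

/* queues[0] is the highest priority; each queue is a singly linked FIFO */
static struct job *queues[NLEVELS];

static void enqueue(int lvl, struct job *j) {
    j->next = NULL;
    struct job **p = &queues[lvl];
    while (*p) p = &(*p)->next;   /* walk to the tail */
    *p = j;
}

static struct job *dequeue(int lvl) {
    struct job *j = queues[lvl];
    if (j) queues[lvl] = j->next;
    return j;
}

/* Rules 1 and 2: run one slice of the frontmost job in the
 * highest-priority non-empty queue, then put it at the back (RR). */
static void schedule_once(void) {
    for (int lvl = 0; lvl < NLEVELS; lvl++) {
        struct job *j = dequeue(lvl);
        if (j) {
            printf("running %s at level %d\n", j->name, lvl);
            enqueue(lvl, j);      /* back of the same queue: round robin */
            return;
        }
    }
    printf("idle\n");
}

int main(void) {
    struct job a = {"A", 0}, b = {"B", 0}, c = {"C", 0};
    enqueue(0, &a); enqueue(0, &b);  /* A, B at top priority */
    enqueue(1, &c);                  /* C one level down */
    for (int t = 0; t < 6; t++) schedule_once();
    return 0;
}

Running it prints A and B alternating (Rule 2) while C at the lower level never runs (Rule 1) – exactly the starvation risk discussed later.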
Feedback
• MLFQ varies the priority of a job based on its observed
behavior.
– If a job relinquishes the CPU while waiting for input from the
keyboard, keep its priority high.
– If a job uses the CPU intensively for a long time, reduce its
priority.
– Use history of job to predict its future behavior.
• Workload: a mix of
– Interactive jobs that are short-running (and may frequently
relinquish CPU), and
– Some longer-running “CPU-bound” jobs that need a lot of
CPU time but where response time isn’t important
Changing Priority
• Rules on changing job priority
– Rule 3: When a job enters the system, it is placed
at the highest priority (the topmost queue).
– Rule 4a: If a job uses up an entire time slice while
running, its priority is reduced (i.e., it moves down
one queue).
– Rule 4b: If a job gives up CPU before the time slice
is up, it stays at the same priority level.
• Example 1: one long-running job – it uses full slices, so it steps
down queue by queue and settles at the bottom priority.
• Example 2: one long-running job
and one newly arriving short job:
– The scheduler first assumes the
new arrival is a short job.
– If it actually is a short job, it will run quickly (at high
priority) and complete; if it is not a short job, it will
slowly move down the queues, and thus soon prove
itself to be a long-running job.
– Approximate SJF?
• In this manner, MLFQ approximates SJF.
How about I/O?
• Rule 4b: If a job gives up the CPU
before the time slice is up, it stays
at the same priority level.
• An interactive job that frequently blocks for I/O before its slice
expires therefore keeps its high priority.
So Far…
• MLFQ rules thus far
– Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t).
– Rule 2: If Priority(A) = Priority(B), A & B run in RR.
– Rule 3: When a job enters the system, it is placed at
the highest priority (the topmost queue).
– Rule 4: If a job uses up an entire time slice while
running, its priority is reduced (i.e., it moves down
one queue); If a job gives up CPU before the time
slice is up, it stays at the same priority level.
• Any potential problems?
Problems
• Problems with the current version of MLFQ
– Starvation
• Too many interactive jobs, and thus long-running jobs
will never receive any CPU time.
– Gaming the scheduler
• Doing something sneaky to trick the scheduler into
giving you more than your fair share of the resource.
– What if a CPU-bound job turns I/O-bound?
• There is no mechanism to move the job back up to queues
with higher priorities!
Boosting priority
• Rule 5: After some time period S, move all the jobs in the
system to the topmost queue.
• What problems does this rule solve?
– Starvation
– When a CPU-bound job turns I/O-bound.
CPU time accounting
• The scheduler keeps track of how much of a time slice a job
used at a given level; once a job has used its allotment, it is
demoted to the next priority queue.
– Old Rule 4 (4a/4b): If a job uses up an entire time slice while
running, its priority is reduced (i.e., it moves down one queue); if
a job gives up the CPU before the time slice is up, it stays at the
same priority level.
– New Rule 4: Once a job uses up its time allotment at a given level
(regardless of how many times it has given up the CPU), its priority
is reduced (i.e., it moves down one queue).
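A small C sketch of the difference, assuming the scheduler charges the running job on every timer tick (names and the allotment value are hypothetical):

#define ALLOTMENT_TICKS 10   /* per-level time allotment (hypothetical) */
#define NLEVELS 8

struct job {
    int level;        /* current queue; 0 = highest priority */
    int used_ticks;   /* CPU consumed at this level, across yields */
};

/* Called on every timer tick while j is running.  Because used_ticks
 * is NOT reset when the job voluntarily yields, a job can no longer
 * game the scheduler by sleeping just before each slice expires. */
void charge_tick(struct job *j) {
    if (++j->used_ticks >= ALLOTMENT_TICKS && j->level < NLEVELS - 1) {
        j->level++;          /* demote one queue */
        j->used_ticks = 0;   /* fresh allotment at the new level */
    }
}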
Tuning MLFQ
• How to parameterize MLFQ ?
– how many queues
– how big should time slice be per queue
– how often should priority be boosted
• No easy answers and need experience with workloads
– varying time-slice length across different queues (e.g., high-priority
queues are usually given short time slices, and low-priority queues
get longer ones)
– Some schedulers reserve the highest priority levels for operating
system work.
– Some systems also allow user advice to help set priorities
(e.g., with the command-line utility nice you can
increase or decrease the priority of a job somewhat, and thus
increase or decrease its chances of running at any given time).
Summary
• Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t)
• Rule 2: If Priority(A) = Priority(B), A & B run in RR
• Rule 3: When a job enters the system, it is placed at the highest priority
(the topmost queue)
• Rule 4: Once a job uses up its time allotment at a given level (regardless of
how many times it has given up the CPU), its priority is reduced (i.e., it
moves down one queue)
• Rule 5: After some time period S, move all the jobs in the system to the
topmost queue
• Instead of demanding a priori knowledge of a job, MLFQ observes the
execution of a job and prioritizes it accordingly
• It manages to achieve the best of both worlds: it can deliver excellent
overall performance (similar to SJF/STCF) for interactive jobs, and is fair
and makes progress for long-running CPU-intensive workloads
PROPORTIONAL SHARE
Proportional-share scheduler
• MLFQ – two goals: optimizing turnaround time &
minimizing response time
• A different type of scheduler – the proportional-share
scheduler (a.k.a. fair-share scheduler).
– Instead of optimizing for turnaround or response
time, a scheduler might instead try to guarantee that
each job obtain a certain percentage of CPU time.
– Example: lottery scheduling
– Basic idea: the scheduler holds a lottery to determine
which process should get to run next; processes that
should run more often are given more chances
to win the lottery.
Lottery scheduling: tickets
• Fundamental concept – tickets
– Used to represent the share of a resource that a
process should receive.
– The percent of tickets that a process has represents its
share of the system resource in question.
– Example:
• Two processes and 100 tickets in the system: process A has 75
tickets and process B has 25 tickets, so A should receive
75% of the CPU and B should receive 25%
• Assume A holds tickets 0–74 and B holds tickets 75–99
Principle behind lottery scheduling:
randomness
• What are the advantages of using randomness?
– Avoids strange corner-case behaviors that a more
traditional algorithm may have
• LRU algorithm: performs poorly for some cyclic-sequential
workloads
– Lightweight
• A traditional fair-share algorithm needs to perform per-process
accounting to track how much CPU each process has
received
– Fast: the decision only takes as long as generating a
random number (though the faster the generator, the
less random it tends to be)
Ticket currency
• Lottery scheduling provides a number of
mechanisms to manipulate tickets.
• Ticket currency: allows a user with a set of
tickets to allocate tickets among their own
jobs, using their own "currency".
Ticket currency
• Why is the ticket currency mechanism desirable?
– Consider the case where in a multi-user system, a user
manages multiple processes
– Want to let her favor some threads over others
without impacting the threads of other users
– Will let her create new tickets but will debase the
individual values of all the tickets she owns
– Her tickets will be expressed in a new currency that
will have a variable exchange rate with the base (or
global) currency
Ticket currency
• Example 1:
– User A currently manages three processes, with 10 tickets total:
• Process A has 5 tickets
• Process B has 3 tickets
• Process C has 2 tickets
– User A creates 5 extra tickets and assigns them to a new
process D
– User A now has 15 tickets
– These 15 tickets represent 15 units of a new currency
whose exchange rate with the base currency is 10/15
– The total value of A's tickets, expressed in the base currency,
is still equal to 10
Ticket currency
• Example 2:
– Users A and B have each been given 100 tickets.
– User A is running two jobs, A1 and A2, and he
gives them each 500 tickets (out of 1000 total) in
User A’s own currency.
– User B is running only 1 job and gives it 10 tickets
(out of 10 total).
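Converting to the base currency: A1 and A2 each hold 500 of user A's 1,000 local tickets, i.e., half of A's 100 base tickets, so 50 base tickets each; B's job holds all 10 of B's local tickets, i.e., all 100 of B's base tickets.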
Ticket transfer
• With transfers, a process can temporarily hand
off its tickets to another process.
• In what scenario is this mechanism useful?
– In a client/server setting, a client process sends a
message to a server asking it to do some work on the
client’s behalf.
– To speed up the work, the client can pass the tickets
to the server and thus try to maximize the
performance of the server while the server is handling
the client’s request.
– When finished, the server then transfers the tickets
back to the client.
Ticket inflation
• Lets processes create new tickets
– Like printing their own money
– Counterpart is ticket deflation
• Normally disallowed except among mutually
trusting clients
– Lets them adjust their priorities dynamically
without explicit communication
Implementation
• The most significant advantage of lottery
scheduling is the simplicity of its implementation. All you need is:
– A good random number generator to pick the winning
ticket.
– A list to track the processes of the system (and the total
number of tickets).
• Example: three processes A, B, and C
• Suppose the winning lottery number is 300 (see the sketch below)
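A self-contained C sketch of the draw; the ticket counts (100, 50, 250) are borrowed from the stride-scheduling slides later in this deck, and 300 is the winning number above:

#include <stdio.h>

struct node { const char *name; int tickets; struct node *next; };

struct node *draw(struct node *head, int winner) {
    int counter = 0;
    for (struct node *cur = head; cur; cur = cur->next) {
        counter += cur->tickets;       /* walk the list, accumulating tickets */
        if (counter > winner)
            return cur;                /* ticket `winner` falls in cur's range */
    }
    return NULL;                       /* unreachable if winner < total tickets */
}

int main(void) {
    struct node c = {"C", 250, NULL}, b = {"B", 50, &c}, a = {"A", 100, &b};
    /* total = 400; in a real scheduler: winner = random() % total */
    struct node *w = draw(&a, 300);
    printf("winner: %s\n", w->name);
    return 0;
}

The counter reaches 100 after A and 150 after B; only C's 250 tickets push it past 300, so C wins.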
Lottery scheduling dynamics
• A brief study of the completion time of two
jobs, A and B, competing against one another,
each with the same number of tickets (100)
and same run time (R).
• Unfairness metric U
– U = A's completion time / B's completion time
– If R=10 ms, A completes at time 10 ms, and B at
20 ms, then U = 10 ms/20 ms = 0.5
– A perfectly fair scheduler would achieve U = 1.
Lottery scheduling dynamics
• Varying R from 1 to 1000 over thirty trials, we have:
[Figure: unfairness U versus job length R.]
• The longer the job run time is, the more fair the outcome is.
• Why?
Stride scheduling
• Lottery scheduling relies on randomness
to achieve fairness, so it is not deterministic!
– it occasionally will not deliver the exact right
proportions, especially over short time scales.
• Stride scheduling – a deterministic fair-share
scheduler.
Stride scheduling
• Each job in the system has a stride, which is inversely
proportional to the number of tickets it has.
– Three jobs: A, B, and C
– Tickets: 100, 50, 250
– Strides: 100, 200, 40 (stride = 10,000 / tickets)
– Every time a process runs, the scheduler increments a
counter for it (called its pass value) by its stride to track its
global progress.
– The pass values are used to determine which process to
schedule next (see the sketch below).
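A minimal C sketch under the ticket assignment above (the number of slices simulated is arbitrary):

#include <stdio.h>

/* Stride scheduling: each slice, run the job with the smallest
 * pass value, then advance its pass by its stride. */
struct job { const char *name; int stride; int pass; };

int main(void) {
    struct job jobs[3] = {
        {"A", 10000/100, 0},   /* 100 tickets -> stride 100 */
        {"B", 10000/50,  0},   /* 50 tickets  -> stride 200 */
        {"C", 10000/250, 0},   /* 250 tickets -> stride 40  */
    };
    for (int t = 0; t < 8; t++) {
        struct job *min = &jobs[0];
        for (int i = 1; i < 3; i++)        /* pick the lowest pass value */
            if (jobs[i].pass < min->pass) min = &jobs[i];
        printf("run %s (pass %d)\n", min->name, min->pass);
        min->pass += min->stride;          /* charge it for the slice */
    }
    return 0;
}

Over these 8 slices A runs twice, B once, and C five times, matching the 100 : 50 : 250 ticket ratio and the trace below.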
Stride scheduling
– Three jobs: A, B, and C
– Tickets: 100, 50, 250
– Within a fixed period, A runs twice, B runs once, and C runs 5 times.
Stride scheduling
• Given the precision of stride scheduling, why
use lottery scheduling at all?
– Lottery scheduling does not need to maintain
any global state, which makes it easy for the
scheduler to cope with processes that come and go.
– Think about what you would do when processes
arrive and leave under stride scheduling: what pass
value should a newcomer get?
MULTIPROCESSOR SCHEDULING
Multiprocessor scheduling
• So far, our discussion has focused on single-processor scheduling.
• How can we extend those ideas to work on
multiple CPUs?
• What new problems must we overcome?
Background: multiprocessor architecture
• The fundamental difference between single-CPU
hardware and multi-CPU hardware is …
the use of hardware caches
An example
• A program running on CPU 1 reads a data item (with
value D) at address A; the item is not in CPU 1's cache,
so it is fetched from memory.
• The program then modifies the value at address A, only
updating its cache with the new value D' (depending on
the cache write-back policy, the new value may not be
written back to memory immediately).
• The OS decides to stop running the program and moves it to
CPU 2.
• The program then re-reads the value at address A;
the data is not in CPU 2's cache, so it reads from memory.
• Thus the system fetches the value from main memory,
and gets the old value D instead of the correct value D'.
Cache coherence
• The previous problem is known as the “cache coherence”
problem.
• There is a vast research literature on this topic.
• The basic solution is provided by hardware: by
monitoring memory accesses, hardware can ensure
that the “right thing” happens and that the view of a
single shared memory is preserved.
Cache affinity
• It is often advantageous to run a process on
the same CPU it last ran on. Why? (Much of its
state is already in that CPU's caches and TLB.)
• A multiprocessor scheduler should consider
cache affinity when making its scheduling
decisions, preferring to keep a process on the
same CPU if at all possible.
Single-Queue Multiprocessor
Scheduling (SQMS)
• The most basic approach is to simply reuse the
basic framework for single processor scheduling,
by putting all jobs that need to be scheduled into
a single queue.
• Advantage: simplicity
– easy to adapt the existing single-processor policies to
work on more than one CPU.
• Disadvantage: locking is needed around the shared
queue and scheduler state
– causes performance degradation as the number of CPUs grows.
[Figure: SQMS example: with one shared queue, each job tends to bounce from CPU to CPU across time slices, losing cache affinity.]
Multi-Queue Multiprocessor
Scheduling (MQMS)
• To overcome the problem of SQMS, we can opt for
multiple queues, one per CPU.
• In MQMS, each queue will likely follow a particular
scheduling policy, such as round robin.
• When a job enters the system, it is placed on exactly
one scheduling queue, according to some heuristic
(e.g., random, or picking one with fewer jobs than
others).
• Then it is scheduled essentially independently, thus
avoiding the problems of information sharing and
synchronization found in the single-queue approach.
Multi-Queue Multiprocessor
Scheduling (MQMS)
• More scalable than SQMS, but …
Multi-Queue Multiprocessor
Scheduling (MQMS)
• A fundamental problem in MQMS: load
imbalance.
Multi-Queue Multiprocessor
Scheduling (MQMS)
• How to solve the load imbalance problem?
– Process migration
– Basic approach: work stealing – a queue that is low on jobs
occasionally peeks at another (fuller) queue and "steals" one or
more jobs from it (see the sketch below)
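An illustrative C sketch of work stealing (no real kernel does exactly this, and the necessary locking is omitted):

#include <stdlib.h>

#define NCPU 4

struct job;                                         /* opaque job descriptor */
struct queue { struct job *jobs[64]; int n; };
static struct queue runq[NCPU];                     /* one run queue per CPU */

static struct job *next_job(int cpu) {
    if (runq[cpu].n > 0)
        return runq[cpu].jobs[--runq[cpu].n];       /* local work first */
    int victim = rand() % NCPU;                     /* pick a random peer */
    if (victim != cpu && runq[victim].n > 1)        /* leave it at least one */
        return runq[victim].jobs[--runq[victim].n]; /* migrate one job here */
    return NULL;                                    /* stay idle this round */
}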
xv6 scheduling
• One global queue across all CPUs
• Local scheduling algorithm: RR
• scheduler() in proc.c
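For reference, the heart of scheduler() in the x86 version of xv6 looks roughly like this (abridged; identifiers vary a bit across xv6 releases):

void scheduler(void) {
  struct proc *p;
  for(;;){
    sti();                          // enable interrupts on this CPU
    acquire(&ptable.lock);          // one global lock, one global table
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;
      proc = p;                     // this CPU now runs p
      switchuvm(p);                 // switch to p's address space
      p->state = RUNNING;
      swtch(&cpu->scheduler, p->context);  // context switch into p
      switchkvm();                  // back: restore kernel address space
      proc = 0;
    }
    release(&ptable.lock);
  }
}

Scanning the whole table on every pass, running each RUNNABLE process in turn, is what makes the local policy RR.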
Linux scheduling overview
• O(1) scheduler
– Multiple queues
– priority-based (similar to MLFQ)
• Completely Fair Scheduler (CFS)
– Multiple queues
– deterministic proportional-share approach (like stride
scheduling)
• BF Scheduler (BFS)
– Single queue
– proportional-share, based on a more complicated scheme
known as Earliest Eligible Virtual Deadline First (EEVDF)
Linux scheduler implementations
• Linux 2.4: global queue, O(N)
– Simple
– Poor performance on multiprocessor/core
– Poor performance when the number of tasks N is large
• Linux 2.5 and early versions of Linux 2.6: O(1)
scheduler, per-CPU run queue
– Solves performance problems in the old scheduler
– Complex, error prone logic to boost interactivity
– No guarantee of fairness
• Linux 2.6: completely fair scheduler (CFS)
– Fair
– Naturally boosts interactivity