Introduction to Parallel Algorithms: Cilk+ Dynamic Multithreading

- Also known as the fork-join model.
- Shared memory, multicore.
- Cormen et al., 3rd edition, Chapter 27.

Nested Parallelism

- Spawn a subroutine, carry on with other work.
- Similar to fork in POSIX.
- The multithreaded model is based on Cilk+, available at
  svn://gcc.gnu.org/svn/gcc/branches/cilkplus
- The programmer specifies possible parallelism; the runtime system
  takes care of mapping it to OS threads.
- Cilk+ contains several more features than our model, e.g. parallel
  vector and array operations.
- Similar primitives are available in java.util.concurrent.

Parallel Loop

- Iterations of a for loop can execute in parallel.
- Like OpenMP.

Writing Parallel (Pseudo)code

Keywords:
- parallel: run the loop iterations (potentially) concurrently.
- spawn: run the procedure (potentially) concurrently.
- sync: wait for all spawned children to complete.

Serialization:
- Removing the keywords serializes the code.
- Serialized (correct) parallel code is correct serial code.
- Adding parallel keywords to correct serial code might make it
  incorrect: a missing sync, or loop iterations that are not
  independent.

Fibonacci Example

    function Fib(n)
        if n ≤ 1 then
            return n
        else
            x = spawn Fib(n − 1)
            y = Fib(n − 2)
            sync
            return x + y
        end if
    end function

- Code in Java, Clojure and Racket available from
  http://www.cs.unb.ca/~bremner/teaching/cs3383/examples

Computation DAG

- Strand: a sequence of instructions containing no parallel, spawn,
  return from spawn, or sync.
- Nodes are strands; down edges are spawns, up edges are returns from
  spawns, and horizontal edges are sequential execution.
- Figure clrs27_2 in text.

Work and Speedup

- T1: work, the sequential running time.
- Tp: running time on p processors.
- Work Law: Tp ≥ T1/p, so speedup := T1/Tp ≤ p.

Span

- T∞: span, the running time given unlimited processors.
- Critical path: the longest path in the DAG; the span is the weighted
  length of the critical path, and is a lower bound on the running time.
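The Fib pseudocode above maps directly onto the fork-join primitives in java.util.concurrent mentioned earlier: fork() plays the role of spawn, and join() plays the role of sync. A minimal sketch (the class names `Fib` and `Main` are mine, not from the course examples):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork-join Fibonacci: fork() corresponds to spawn, join() to sync.
class Fib extends RecursiveTask<Long> {
    private final int n;

    Fib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n <= 1) return (long) n;
        Fib left = new Fib(n - 1);
        left.fork();                           // spawn Fib(n - 1)
        long right = new Fib(n - 2).compute(); // carry on with Fib(n - 2)
        return left.join() + right;            // sync, then combine
    }
}

public class Main {
    public static void main(String[] args) {
        long result = ForkJoinPool.commonPool().invoke(new Fib(10));
        System.out.println("Fib(10) = " + result); // prints Fib(10) = 55
    }
}
```

As in the pseudocode, only the first recursive call is spawned; the second runs in the current thread, so a sequential execution of this program performs the same computation as the serialized code.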
Span and Parallelism

- Since processors may idle, Tp ≥ T∞.  (1)
- Best possible speedup: parallelism := T1/T∞ ≥ T1/Tp = speedup.

Example

Assume strands are unit cost. Then for the DAG of Figure clrs27_2 in
text:
- T1 = 17
- T∞ = 8
- Parallelism = 2.125 for this input size.

Composing Span and Work

For computations A and B run in series (A then B) or in parallel (A ∥ B):

- series: T∞(A + B) = T∞(A) + T∞(B)
- parallel: T∞(A ∥ B) = max(T∞(A), T∞(B))
- series or parallel: T1 = T1(A) + T1(B)

Work of Parallel Fibonacci

Write T(n) for T1 on input n. Then

    T(n) = T(n − 1) + T(n − 2) + Θ(1).

Let φ ≈ 1.62 be the positive solution to φ² = φ + 1. We can show by
induction that T(n) ∈ Θ(φⁿ). Assume the inductive hypothesis
T(m) ≤ aφᵐ − b for m < n. Substituting the inductive hypothesis:

    T(n) ≤ a(φⁿ⁻¹ + φⁿ⁻²) − 2b + Θ(1)        (IH)
         = a((φ + 1)/φ²)φⁿ − b + (Θ(1) − b)
         ≤ a((φ + 1)/φ²)φⁿ − b                (choose b large enough)
         = aφⁿ − b.

(The matching Ω() bound is left as an exercise.)

Span and Parallelism of Fibonacci

    T∞(n) = max(T∞(n − 1), T∞(n − 2)) + Θ(1)
          = T∞(n − 1) + Θ(1).

Transforming to a sum, we get T∞(n) ∈ Θ(n). Hence

    parallelism = T1(n)/T∞(n) = Θ(φⁿ/n).

So this is an inefficient way to compute Fibonacci numbers, but a very
parallel one.
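The two recurrences above can be evaluated numerically to see the asymptotics concretely: charging one unit per call for the Θ(1) term, work grows like φⁿ while span grows linearly, so the parallelism T1/T∞ grows without bound. A small sketch (the class name `WorkSpan` and the unit-cost convention are mine):

```java
// Numerically evaluate the work and span recurrences for parallel Fib,
// charging one unit of cost per call for the Theta(1) term.
public class WorkSpan {
    // T1(n) = T1(n-1) + T1(n-2) + 1
    static long work(int n) {
        if (n <= 1) return 1;
        return work(n - 1) + work(n - 2) + 1;
    }

    // Tinf(n) = max(Tinf(n-1), Tinf(n-2)) + 1 = Tinf(n-1) + 1
    static long span(int n) {
        if (n <= 1) return 1;
        return Math.max(span(n - 1), span(n - 2)) + 1;
    }

    public static void main(String[] args) {
        // Work is exponential in n, span is linear, so the ratio diverges.
        for (int n = 5; n <= 25; n += 5) {
            System.out.printf("n=%2d  work=%7d  span=%2d  parallelism=%.1f%n",
                    n, work(n), span(n), (double) work(n) / span(n));
        }
    }
}
```

With this unit-cost convention, span(n) = n for n ≥ 1, matching the Θ(n) bound derived above.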