レポート(1): 分岐予測の実装と評価 計算機アーキテクチャ特論 (Advanced Computer Architectures) gshare分岐予測を実装し,その予測ミス率を測定せよ.また,bimodal 分岐予測との予測精度(20本のベンチマークのミス率の算術平均)の比 較を示せ. 6.分岐予測(3) ハードウェア量を 2KB, 4KB, 8KB, 16KB, 32KB, 64KBとしてグラフを描け. 予測ミス率が低い(性能が高い)と高得点. 1月 最初の講義の開始時にレポートを提出 1 ハードウェア量を 2KB, 4KB, 8KB, 16KB, 32KB, 64KBとしてグラフを描け. パーセプトロン分岐予測 を工夫して実装し,予測ミス率を測定せよ. 講義ホームページから bpkit03.tgz をダウンロード コードの説明(コードは少ないほどベター),工夫した点 ハードウェア量の計算方法を明示 ミス率のグラフ(表ではなく,見やすいグラフにしてください) 考察と感想 Gshare (TR-DEC 1993) レポート(1):分岐予測の実装と評価 トレースデータ,命令アドレスと分岐結果の系列 /***** BPKit 0.5 trace file *****/ //trace_name________: CBP1-IT1 //total_branches____: 4184792 //total_instructions: 29499987 004058fb 0 00405910 0 while(!gzeof(gzfp)){ 0040591c 0 gzgets(gzfp, buf, BUFSIZE); sscanf(buf, "%x %d", &pc, &taken); 00405925 0 0040592e 0 bp_predict(pc, NULL, &pred); /* prediction */ 0040593a 0 bp_regist(pc, taken, NULL); /* update storage */ 00405944 0 if(pred==taken) hit++; else miss++; 0040594b 0 } 0040492d 1 0040494f 0 グローバル分岐履歴と分岐アドレスとの排他的論理和によりパターン履歴表 へのインデックスを作成 パターン履歴表は2ビット飽和型カウンタの配列で,選択された2ビットカウンタの 値により分岐方向を予測(bimodalと同じ) 分岐結果を用いて,予測に利用したカウンタを更新 010101000 (シフトレジスタ) Program Counter 1 w0 w1 w2 ... Untaken Taken Untaken Strongly Untaken (00) 2 bit Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 分岐アドレスによりパーセプトロンを選択 グローバル分岐履歴を用いて予測 分岐ミス,|y| < θ の時にトレーニング y の値がある閾値より高い場合に成立と予測する. x2 Weakly Untaken (01) Prediction Weakly Taken (10) 4 パーセプトロン分岐予測器 (HPCA 2001) w は符号付きの整数で表現 x1 Taken Untaken Untaken 3 x1からxnまでのnビットの分岐履歴を入力とする. y を計算する. Strongly Taken (11) Taken n パーセプロロンモデル Taken Pattern History Table (PHT) 2n entry XOR パーセプトロン Branch History Register (BHR) m n … Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 2 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 8KBのハードウェア量で,34ビットの分岐履歴 Program Counter xn Branch History (x) wn … y y Compute y Prediction Perceptron Model Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 Selected Perceptron Table of Perceptrons (w) 5 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 6 1 Partial Pattern Matching (CBP 2004) パターンマッチング 予測: 0 or 1? 分岐履歴 Table 3 Table 4 pc pc h[0:9] Table 2 Table 1 Table 0 pc h[0:19] pc h[0:39] pc h[0:79] 12 0 1 0 3b m ctr 最長一致パターン 3b ctr hash hash 10 8 8 bit tag u hash 3b ctr 8 1 Prediction hash 8 8 bit tag u 3b ctr 1 hash 8 8 bit tag u 3b ctr 1 8 8 bit tag u 8 =? 1 hash 10 8 =? 1 hash 10 8 =? 0 1 0 hash 10 =? 1 1 1 1 1 過去の履歴との最長一致のパターンを見つける. 1 それぞれの最長一致パターンの次に出現する0と1の回数を 数え,多い方を予測値とする. 1 CBP2004プレゼンテーションスライド 7 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 prediction 0/1 8 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 分岐予測のまとめ 重要な分岐予測の根底にあるアイデアとその構成を解説 Filter BTB, de-alias PACT 1996 hash function, hybrid Bimodal ISCA 1981 Agree ISCA 1997 Gskew, E-gskew ISCA 1997 PPM CBP 2004 2Bc-gskew TR-INRIA 1999 Prophet-Critic Hybrid Branch Prediction hybrid Two-level adaptive MICRO 1991 Gshare TR-DEC 1993 branch history, hash function Bimode YAGS MICRO 1997 MICRO 1998 de-alias Perceptron Fast-Neural HPCA 2001 MICRO 2003 Ayose Falcon, UPC, Jared Stark, Intel, Alex Ramirez, UPC, Konrad Lai, Intel, Mateo Valero, UPC ISCA-31 pp. 250-261 (2004) table element Variable Length Path ASPLOS 1998 history length optimization ISCA (International Symposium on Computer Architecture) MICRO (International Symposium on Microarchitecture) PACT (International Conference on Parallel Architectures and Compilation Techniques) ASPLOS (International Conference on Architectural Support for Programming Languages and Operating Systems) Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 9 Introduction 研究の目的と概要 今までにない,斬新なアイデアの分岐予測器を提案 分岐予測器の予測精度の向上,プロセッサの性能向上を目指す. 予測ミス率を大幅に低減する. 10 Conventional predictors are analogous to a taxi with just one driver. 未来の情報を利用できないか? He gets the passenger to the destination using knowledge of the roads acquired from previous trips; i. e., using history information stored in the predictor’s memory structures. Gccの場合には,予測ミス率を 3.11% から 1.23%に削減 When he reaches an intersection, he uses this knowledge to decide which way to turn. Intel Pentium 4 プロセッサをベースとして,uPC (Uops Per Cycle) で5.2%の性能向上 (Perceptron分岐予測の場合). The driver accesses this knowledge in the context of his current location. Program Counter Modern branch predictors access it in the context of the current location (the program counter) plus a history of the most recent decisions that led to the current location. Branch predictor Branch History Register (BHR) Taken or Untaken 従来の分岐予測器の枠組み Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 11 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 12 2 Introduction Prophet/critic hybrid の構造 Prophet/critic hybrids are analogous to a taxi with two drivers: the front-seat and the back-seat. Prophet: 予言者 Critic: 批判家,鑑定家 The front-seat driver has the same role as the driver in the singledriver taxi. This role is called the prophet. The back-seat driver has the role of critic. She watches the turns the prophet makes at intersections. She doesn’t say anything unless she thinks he’s made a wrong turn. When she thinks he’s made a wrong turn, she waits until he’s made a few more turns to be certain they are lost. (Sometimes the prophet makes turns that initially look questionable, but, after he makes a few more turns, in hindsight appear to be correct.) Only when she’s certain does she point out the mistake. To recover, they backtrack to the intersection where she believes the wrong-turn was made and try a different direction. Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 13 Prophetは,分岐命令Dの方 向を,・・・XYZABCの履歴を 用いて予測する. Criticは,分岐命令Zの方向 を,・・・XYZABCDの情報を 用いて鑑定する. Criticは,Zの予測を鑑定する が,それには,Zの予測時刻 より後の(Zを基準として未来 の)情報を利用できる. ある予測を前提に処理を進 めたところ矛盾が生じる. 前提がおかしいと判断. Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 14 Decoupled front-end architecture, ISCA-1999 Prophet/critic hybrid の予測の例 基本ブロック 0, 1, 4, 7 が 正しい制御の流れとする. 分岐 A の予測を考える. Branch History: W, X, Y, Z Branch Future: A, C, G and J Prophet Critic 過去の履歴において, W, X, Y, Z | A=T, C=T, G=T, J=T の場合には,分岐予測ミス が多いことを記録し,これ を用いて鑑定する. Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 15 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 Decoupled front-end architecture を利用した Prophet/critic hybrid の実装 評価環境 16 現実的な,サイクル当たり6命令を処理する高性能マイクロプロセッサ. Prophet, Critic それぞれの予測機構には,従来の分岐予測機構の任意 の組み合わせを利用できる. Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 17 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 18 3 Future bit の重要性, 何回先の予測結果まで利用すればよいか? 予測精度の比較 (16KBのハードウェア資源) Future bits = 0 の構成は従 来のハイブリッド分岐予測器 Future bits = 1 でも,予測 精度が向上する. 平均では,Future bits の数 を増やすことで予測精度が 向上する. ただし,大きくしすぎることで 精度が低下するベンチマー クが存在する. 19 20 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 Agree/Disagree ? 予測精度の比較 (32KBのハードウェア資源) 単体では perceptron がベスト. 更に,提案 手法により 26.8%の予 測ミスを削減 21 The critic’s critiques of the prophet’s predictions determine the effectiveness of the prophet/critic algorithm. We classify the final critique of a branch according to the prophet’s prediction (i. e., correct or incorrect) and the critic’s critique (i. e., agree or disagree). Ideally, the critic disagrees when the prophet mispredicts (incorrect disagree). When the critic agrees with the prophet (correct agree if the prophet didn’t mispredict; incorrect agree if it did), the final prediction does not change, so there is neither gain nor harm. However in both these cases, the use of the critic was needless since there was no change in the final prediction; and for incorrect agree, an opportunity was lost to correct a mispredict. The worst case—and the case that should be minimized—is when the critic disagrees with a prophet’s correct prediction (correct disagree). 22 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 プロセッサの性能向上の比較 Agree/Disagree ? Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 更に,提案 手法により 25.4%の予 測ミスを削減 Future bits は8ビット程度と する. Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 単体では perceptron がベスト. 23 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 Perceptron がベ スト. Perceptron, 12 future bitの場合 には, 従来手法 と比較して, 5.2%の性能向 上 24 4 Prophet-Critic Branch Prediction まとめとコメント 今までにない,斬新なアイデアの分岐予測の提案と評価. 予測ミス率を大幅に低減する. Completion Time Multiple Branch Prediction for Enhancing Trace Cache Performance Gccの場合には,予測ミス率を 3.11% から 1.23%に削減する. Intel Pentium 4 プロセッサをベースとして,uPC (Uops Per Cycle) で 5.2%の性能向上 (Perceptron分岐予測の場合). Ryan Rakvic et al. コメント アイデアとして面白い. ハードウェアの規模と複雑さも問題にならないレベル. アイデアは分岐予測以外の分野でも利用できるのではないか. Carnegie Mellon University ISCA-2000 pp.47-58 25 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 26 複数分岐予測 (Multiple Branch Predictor) Tree-Based Multiple Branch Predictor (TMP) これまでの分岐予測は,1サイクルに1つの分岐方向を予測 ハードウェア資源の増加にともなって,大規模なプロセッサが実現 可能 Tree-based Pattern History Table (Tee-PHT) B-3 Taken B-2 Tree TTNT Not Taken フェッチする命令数を増やすためには,1サイクルに複数の分岐方 向を予測することが必要 B-1 Not Taken 複数分岐予測 (Multiple Branch Predictor) B0 TNN B1 Tree(i) B0 B0 B1 Global history + PC Predicted path B2 B2 B3 B3 B4 B4 27 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 Tree-based Pattern History Table Updating of 2-bit bimodal tree-node Two-bit bimodal branch predictor Not taken Tree-based Pattern History Table (Tee-PHT) 00 10 tag Predicted Path Tree N 01 T Taken Not taken Not taken 11 Taken N 00 Taken 11 T 10 01 Taken 11 00 10 T T Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 T 11 N N T 01 10 T N T N 01 T 10 00 11 Not taken N N T Old predicted path: NTNT New predicted path: TNTT Recently completed path: TNNN 01 11 NTNT 28 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 11 29 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 10 01 11 30 5 Tree-PHT with second level PHT (Node-PHT) for tree-node prediction Node Pattern History Table(Node-PHT) Tree-based Pattern History Table (Tee-PHT) tag Predicted Path アナウンス 01...1 n bits of local history Tree N 講義スライド,講義スケジュール www.arch.cs.titech.ac.jp 2-bit bimodal T 01...1 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 • global(g) • per-branch(p) • shared(s) 31 Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 32 6
© Copyright 2024 Paperzz