計算機アーキテクチャ特論 (Advanced Computer Architectures)

レポート(1): 分岐予測の実装と評価
„
計算機アーキテクチャ特論
(Advanced Computer Architectures)
gshare分岐予測を実装し,その予測ミス率を測定せよ.また,bimodal
分岐予測との予測精度(20本のベンチマークのミス率の算術平均)の比
較を示せ.
„
„
„
„
6.分岐予測(3)
„
ハードウェア量を 2KB, 4KB, 8KB, 16KB, 32KB, 64KBとしてグラフを描け.
予測ミス率が低い(性能が高い)と高得点.
1月 最初の講義の開始時にレポートを提出
„
„
„
„
„
1
ハードウェア量を 2KB, 4KB, 8KB, 16KB, 32KB, 64KBとしてグラフを描け.
パーセプトロン分岐予測 を工夫して実装し,予測ミス率を測定せよ.
講義ホームページから bpkit03.tgz をダウンロード
コードの説明(コードは少ないほどベター),工夫した点
ハードウェア量の計算方法を明示
ミス率のグラフ(表ではなく,見やすいグラフにしてください)
考察と感想
Gshare (TR-DEC 1993)
レポート(1):分岐予測の実装と評価
„
トレースデータ,命令アドレスと分岐結果の系列
„
/***** BPKit 0.5 trace file *****/
//trace_name________: CBP1-IT1
//total_branches____: 4184792
//total_instructions: 29499987
004058fb 0
00405910 0
while(!gzeof(gzfp)){
0040591c 0
gzgets(gzfp, buf, BUFSIZE);
sscanf(buf, "%x %d", &pc, &taken);
00405925 0
0040592e 0
bp_predict(pc, NULL, &pred); /* prediction */
0040593a 0
bp_regist(pc, taken, NULL); /* update storage */
00405944 0
if(pred==taken) hit++; else miss++;
0040594b 0
}
0040492d 1
0040494f 0
グローバル分岐履歴と分岐アドレスとの排他的論理和によりパターン履歴表
へのインデックスを作成
„
„
パターン履歴表は2ビット飽和型カウンタの配列で,選択された2ビットカウンタの
値により分岐方向を予測(bimodalと同じ)
分岐結果を用いて,予測に利用したカウンタを更新
010101000 (シフトレジスタ)
Program
Counter
„
„
1
w0
w1
w2
...
Untaken
Taken
Untaken
Strongly
Untaken (00)
2 bit
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
„
分岐アドレスによりパーセプトロンを選択
„
グローバル分岐履歴を用いて予測
„
分岐ミス,|y| < θ の時にトレーニング
„
y の値がある閾値より高い場合に成立と予測する.
x2
Weakly
Untaken (01)
Prediction
Weakly
Taken (10)
4
パーセプトロン分岐予測器 (HPCA 2001)
w は符号付きの整数で表現
x1
Taken
Untaken
Untaken
3
x1からxnまでのnビットの分岐履歴を入力とする.
y を計算する.
„
Strongly
Taken (11)
Taken
n
パーセプロロンモデル
„
Taken
Pattern History Table (PHT)
2n entry
XOR
パーセプトロン
„
Branch History
Register (BHR)
m
n
…
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
2
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
8KBのハードウェア量で,34ビットの分岐履歴
Program Counter
xn
Branch History (x)
wn
…
y
y Compute y
Prediction
Perceptron Model
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
Selected
Perceptron
Table of Perceptrons (w)
5
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
6
1
Partial Pattern Matching (CBP 2004)
パターンマッチング
予測: 0 or 1?
分岐履歴
Table 3
Table 4
pc
pc h[0:9]
Table 2
Table 1
Table 0
pc h[0:19]
pc h[0:39]
pc h[0:79]
12
0
1
0
3b m
ctr
最長一致パターン
3b
ctr
hash
hash
10
8
8 bit
tag
u
hash
3b
ctr
8
1
Prediction
hash
8
8 bit
tag
u
3b
ctr
1
hash
8
8 bit
tag
u
3b
ctr
1
8
8 bit
tag
u
8
=?
1
hash
10
8
=?
1
hash
10
8
=?
0
1
0
hash
10
=?
1
1
1
1
1
„
„
過去の履歴との最長一致のパターンを見つける.
1
それぞれの最長一致パターンの次に出現する0と1の回数を
数え,多い方を予測値とする.
1
CBP2004プレゼンテーションスライド
7
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
prediction 0/1
8
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
分岐予測のまとめ
„
重要な分岐予測の根底にあるアイデアとその構成を解説
Filter
BTB, de-alias PACT 1996
hash function, hybrid
Bimodal
ISCA 1981
Agree
ISCA 1997
Gskew, E-gskew
ISCA 1997
PPM
CBP 2004
2Bc-gskew
TR-INRIA 1999
Prophet-Critic Hybrid Branch Prediction
hybrid
Two-level adaptive
MICRO 1991
Gshare
TR-DEC 1993
branch history, hash function
Bimode
YAGS
MICRO 1997 MICRO 1998
de-alias
Perceptron Fast-Neural
HPCA 2001 MICRO 2003
Ayose Falcon, UPC, Jared Stark, Intel, Alex Ramirez,
UPC, Konrad Lai, Intel, Mateo Valero, UPC
ISCA-31 pp. 250-261 (2004)
table element
Variable Length Path
ASPLOS 1998
history length optimization
ISCA (International Symposium on Computer Architecture)
MICRO (International Symposium on Microarchitecture)
PACT (International Conference on Parallel Architectures and Compilation Techniques)
ASPLOS (International Conference on Architectural Support for Programming Languages and
Operating Systems)
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
9
Introduction
研究の目的と概要
„
今までにない,斬新なアイデアの分岐予測器を提案
„
分岐予測器の予測精度の向上,プロセッサの性能向上を目指す.
„
予測ミス率を大幅に低減する.
„
„
„
10
Conventional predictors are analogous to a taxi with just one driver.
未来の情報を利用できないか?
He gets the passenger to the destination using knowledge of the
roads acquired from previous trips; i. e., using history information
stored in the predictor’s memory structures.
Gccの場合には,予測ミス率を 3.11% から 1.23%に削減
When he reaches an intersection, he uses this knowledge to decide
which way to turn.
Intel Pentium 4 プロセッサをベースとして,uPC (Uops Per Cycle)
で5.2%の性能向上 (Perceptron分岐予測の場合).
The driver accesses this knowledge in the context of his current
location.
Program Counter
Modern branch predictors access it in the context of the current
location (the program counter) plus a history of the most recent
decisions that led to the current location.
Branch
predictor
Branch History
Register (BHR)
Taken or
Untaken
従来の分岐予測器の枠組み
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
11
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
12
2
Introduction
Prophet/critic hybrid の構造
Prophet/critic hybrids are analogous to a taxi with two drivers: the
front-seat and the back-seat.
„
Prophet: 予言者
„
Critic: 批判家,鑑定家
The front-seat driver has the same role as the driver in the singledriver taxi. This role is called the prophet.
„
The back-seat driver has the role of critic. She watches the turns the
prophet makes at intersections. She doesn’t say anything unless she
thinks he’s made a wrong turn. When she thinks he’s made a wrong
turn, she waits until he’s made a few more turns to be certain they
are lost. (Sometimes the prophet makes turns that initially look
questionable, but, after he makes a few more turns, in hindsight
appear to be correct.) Only when she’s certain does she point out
the mistake.
To recover, they backtrack to the intersection where she believes
the wrong-turn was made and try a different direction.
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
13
„
„
„
„
„
Prophetは,分岐命令Dの方
向を,・・・XYZABCの履歴を
用いて予測する.
Criticは,分岐命令Zの方向
を,・・・XYZABCDの情報を
用いて鑑定する.
Criticは,Zの予測を鑑定する
が,それには,Zの予測時刻
より後の(Zを基準として未来
の)情報を利用できる.
ある予測を前提に処理を進
めたところ矛盾が生じる.
前提がおかしいと判断.
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
14
Decoupled front-end architecture, ISCA-1999
Prophet/critic hybrid の予測の例
„
„
基本ブロック 0, 1, 4, 7 が
正しい制御の流れとする.
分岐 A の予測を考える.
Branch History:
W, X, Y, Z
Branch Future:
A, C, G and J
Prophet
Critic
„
過去の履歴において,
W, X, Y, Z | A=T, C=T,
G=T, J=T
の場合には,分岐予測ミス
が多いことを記録し,これ
を用いて鑑定する.
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
15
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
Decoupled front-end architecture を利用した
Prophet/critic hybrid の実装
評価環境
„
„
16
現実的な,サイクル当たり6命令を処理する高性能マイクロプロセッサ.
Prophet, Critic それぞれの予測機構には,従来の分岐予測機構の任意
の組み合わせを利用できる.
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
17
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
18
3
Future bit の重要性,
何回先の予測結果まで利用すればよいか?
„
„
„
„
„
予測精度の比較 (16KBのハードウェア資源)
Future bits = 0 の構成は従
来のハイブリッド分岐予測器
„
Future bits = 1 でも,予測
精度が向上する.
„
平均では,Future bits の数
を増やすことで予測精度が
向上する.
ただし,大きくしすぎることで
精度が低下するベンチマー
クが存在する.
19
„
„
20
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
Agree/Disagree ?
予測精度の比較 (32KBのハードウェア資源)
単体では
perceptron
がベスト.
„
„
更に,提案
手法により
26.8%の予
測ミスを削減
„
„
21
The critic’s critiques of the prophet’s predictions determine the
effectiveness of the prophet/critic algorithm.
We classify the final critique of a branch according to the prophet’s
prediction (i. e., correct or incorrect) and the critic’s critique (i. e.,
agree or disagree).
Ideally, the critic disagrees when the prophet mispredicts (incorrect
disagree). When the critic agrees with the prophet (correct agree if
the prophet didn’t mispredict; incorrect agree if it did), the final
prediction does not change, so there is neither gain nor harm.
However in both these cases, the use of the critic was needless
since there was no change in the final prediction; and for incorrect
agree, an opportunity was lost to correct a mispredict. The worst
case—and the case that should be minimized—is when the critic
disagrees with a prophet’s correct prediction (correct disagree).
22
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
プロセッサの性能向上の比較
Agree/Disagree ?
„
„
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
更に,提案
手法により
25.4%の予
測ミスを削減
Future bits は8ビット程度と
する.
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
単体では
perceptron
がベスト.
23
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
Perceptron がベ
スト.
Perceptron, 12
future bitの場合
には, 従来手法
と比較して,
5.2%の性能向
上
24
4
Prophet-Critic Branch Prediction まとめとコメント
„
今までにない,斬新なアイデアの分岐予測の提案と評価.
„
予測ミス率を大幅に低減する.
„
Completion Time Multiple Branch Prediction
for Enhancing Trace Cache Performance
Gccの場合には,予測ミス率を 3.11% から 1.23%に削減する.
Intel Pentium 4 プロセッサをベースとして,uPC (Uops Per Cycle) で
5.2%の性能向上 (Perceptron分岐予測の場合).
„
Ryan Rakvic et al.
コメント
„
„
アイデアとして面白い.
„
ハードウェアの規模と複雑さも問題にならないレベル.
„
アイデアは分岐予測以外の分野でも利用できるのではないか.
Carnegie Mellon University
ISCA-2000 pp.47-58
25
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
26
複数分岐予測 (Multiple Branch Predictor)
„
„
Tree-Based Multiple Branch Predictor (TMP)
これまでの分岐予測は,1サイクルに1つの分岐方向を予測
ハードウェア資源の増加にともなって,大規模なプロセッサが実現
可能
Tree-based Pattern History Table (Tee-PHT)
B-3
Taken
B-2
Tree
TTNT
Not Taken
„
フェッチする命令数を増やすためには,1サイクルに複数の分岐方
向を予測することが必要
„
B-1
Not Taken
複数分岐予測 (Multiple Branch Predictor)
B0
TNN
B1
Tree(i)
B0
B0
B1
Global history + PC
Predicted path
B2
B2
B3
B3
B4
B4
27
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
Tree-based Pattern History Table
Updating of 2-bit bimodal tree-node
Two-bit bimodal branch predictor
Not taken
Tree-based
Pattern History Table (Tee-PHT)
00
10
tag Predicted Path
Tree
N
01
T
Taken
Not taken
Not taken
11
Taken
N
00
Taken
11 T
10
01
Taken
11
00
10
T
T
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
T
11
N
N
T
01
10
T
N
T
N
01
T
10
00
11
Not taken
N
N
T
Old predicted path: NTNT
New predicted path: TNTT
Recently completed path: TNNN
01
11
NTNT
28
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
11
29
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
10
01
11
30
5
Tree-PHT with second level PHT (Node-PHT) for
tree-node prediction
Node Pattern History Table(Node-PHT)
Tree-based
Pattern History Table (Tee-PHT)
tag Predicted Path
アナウンス
01...1 n bits of local history
Tree
N
講義スライド,講義スケジュール
„ www.arch.cs.titech.ac.jp
2-bit
bimodal
T
01...1
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
„
• global(g)
• per-branch(p)
• shared(s)
31
Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005
32
6