VLSI Project: Neural Networks Based Branch Prediction
Alexander Zlotnik, Marcel Apfelbaum
Supervised by: Michael Behar, Spring 2005

Introduction
• Branch prediction has always been a "hot" topic: about 20% of all instructions are branches.
• A correct prediction makes execution faster; a misprediction carries a high cost.
• Classic predictors are based on 2-bit saturating-counter state machines with four states: 00 strongly not taken (SNT), 01 weakly not taken (WNT), 10 weakly taken (WT), and 11 strongly taken (ST). A taken branch moves the counter toward ST; a not-taken branch moves it toward SNT.

Introduction (cont.)
• Modern predictors are two-level: they combine 2-bit counters with branch history (local/global).
• Known problems:
  • Memory size grows exponentially with history length.
  • Too long a history can cause errors.
• Recent studies explore branch prediction using neural networks.

Project Objective
• Develop a mechanism for branch prediction.
• Explore the practicality and applicability of such a mechanism, and measure its success rates.
• Use a known neural-network technique: the perceptron.
• Compare against "old" predictors and analyze the results.

Project Requirements
• Develop for the SimpleScalar platform, which simulates out-of-order (OOOE) processors.
• Run the developed predictor on accepted benchmarks.
• C language.
• No hardware-component equivalence needed; software implementation only.

Background and Theory
[Figure: the perceptron. The output is a weighted sum of the history bits plus a bias: y_out = w_0 + Σ w_i·x_i for i = 1..n.]

Background and Theory (cont.)
Perceptron training. Let θ be the training threshold, t = 1 if the branch was taken and -1 otherwise, and x the history vector (x_0 = 1 for the bias):

if (sign(y_out) != t) or (|y_out| <= θ) then
    for i := 0 to n do
        w_i := w_i + t*x_i
    end for
end if
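A minimal C sketch of this output computation and training rule, for concreteness. It illustrates the technique rather than reproducing the project's actual code; the names (N_HIST, THETA, perceptron_t) and the threshold value are assumptions, and history bits are encoded as +1/-1:

    #include <stdlib.h>

    #define N_HIST 15              /* global history length (example value) */
    #define THETA  40              /* training threshold (example value) */

    typedef struct {
        int w[N_HIST + 1];         /* w[0] is the bias weight */
    } perceptron_t;

    /* y_out = w0 + sum of wi*xi, with history bits xi in {-1, +1} */
    static int perceptron_output(const perceptron_t *p, const int *hist)
    {
        int i, y = p->w[0];        /* the bias input x0 is always 1 */
        for (i = 1; i <= N_HIST; i++)
            y += p->w[i] * hist[i - 1];
        return y;
    }

    /* Training rule from the slide above: update the weights when the
       prediction was wrong or |y_out| did not exceed the threshold. */
    static void perceptron_train(perceptron_t *p, const int *hist, int y, int taken)
    {
        int i, t = taken ? 1 : -1;
        if (((y >= 0) ? 1 : -1) != t || abs(y) <= THETA) {
            p->w[0] += t;          /* x0 == 1, so the bias moves by t */
            for (i = 1; i <= N_HIST; i++)
                p->w[i] += t * hist[i - 1];
        }
    }

In hardware the weights would saturate at the bit width given on the "Hardware budget" slide below; the sketch omits saturation for brevity.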
Development Stages
1. Studying the background.
2. Learning the SimpleScalar platform.
3. Coding a "dummy" predictor and using it to make sure we understand how branch prediction is handled in the SimpleScalar platform.
4. Coding the perceptron predictor itself.
5. Coding a perceptron behavior revealer.
6. Benchmarking (smart environment).
7. A special study of our suggestion regarding perceptron predictor performance.

Principles
• Branch prediction needs a learning methodology; a neural network provides one, based on inputs and outputs (pattern recognition).
• As the history grows, the data structures of our predictor grow only linearly.
• We use a perceptron to learn correlations between particular branch outcomes in the global history and the behavior of the current branch. These correlations are represented by the weights: the larger the weight, the stronger the correlation, and the more that particular branch in the history contributes to the prediction of the current branch.
• The input to the bias weight is always 1, so instead of learning a correlation with a previous branch outcome, the bias weight learns the bias of the branch, independent of the history.

Design and Implementation

Hardware budget
• History length: a longer history means fewer perceptrons for a fixed budget.
• Threshold: the threshold θ is a parameter of the perceptron training algorithm, used to decide whether the predictor needs more training.
• Representation of weights: weights are signed integers; number of bits = 1 + floor(log2(θ)).

Algorithm
Fetch stage:
1. The branch address is hashed to produce an index i ∈ 0..n-1 into the table of perceptrons.
2. The i-th perceptron is fetched from the table into a vector register of weights, P.
3. The value of y is computed as the dot product of P and the global history register.
4. The branch is predicted not taken when y is negative, or taken otherwise.

Algorithm (cont.)
Execution stage:
1. Once the actual outcome of the branch becomes known, the training algorithm uses this outcome and the value of y to update the weights in P (training).
2. P is written back to the i-th entry in the table.
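A sketch of how the two stages fit around the kernel shown earlier, reusing perceptron_output and perceptron_train. The table size, the PC hash, and the global-history update are illustrative assumptions, not the project's actual SimpleScalar code:

    #define NUM_PERCEPTRONS 1024   /* table size n (example value) */

    static perceptron_t table[NUM_PERCEPTRONS];
    static int ghr[N_HIST];        /* global history register, +1/-1 entries */

    /* Fetch stage: hash the branch PC to an index, compute y, and
       predict taken if and only if y is non-negative. */
    int predict_branch(unsigned long pc, int *y_out, int *index_out)
    {
        int i = (int)((pc >> 2) % NUM_PERCEPTRONS); /* step 1: hash */
        int y = perceptron_output(&table[i], ghr);  /* steps 2-3: fetch P, dot product */
        *y_out = y;
        *index_out = i;
        return y >= 0;                              /* step 4 */
    }

    /* Execution stage: train on the resolved outcome, then shift it into
       the global history as +1 (taken) or -1 (not taken). */
    void update_branch(int index, int y, int taken)
    {
        int i;
        perceptron_train(&table[index], ghr, y, taken); /* step 1 */
        for (i = N_HIST - 1; i > 0; i--)
            ghr[i] = ghr[i - 1];
        ghr[0] = taken ? 1 : -1;
    }

The write-back of P (step 2 of the execution stage) is implicit here because the table entry is trained in place.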
Simulation Results
• Across all configurations, the perceptron-based predictor outperformed GSHARE.
• Simulations were done over the VPR, Perl, and Parser benchmarks from ss_spec2k.

Simulation Results (cont.)
[Charts: prediction rate on VPR for GSHARE (x-axis: GHR size and memory, from 8/512 to 17/262144) and for the neural predictor (x-axis: GHR/perceptrons and memory, from 15/64/5760 to 15/2048/184320). GSHARE climbs from roughly 0.93 to 0.98 as its table grows, while the neural predictor stays between roughly 0.985 and 0.988 across its configurations.]

Simulation Results (cont.)
[Charts: instructions per cycle (IPC) on VPR for the same configurations. GSHARE tops out near 1.88 IPC, while the neural predictor reaches about 1.94.]

Simulation Results (cont.)
[Chart: perceptron prediction rate by GHR size (10 to 30) for tables of 64, 256, 1024, and 2048 perceptrons; prediction rate ranges from about 0.98 to 0.992.]

Special Problems
• Software simulation of hardware: utilizing the existing data structures of SimpleScalar.
• Compiling self-written programs for SimpleScalar: after several weeks of hard work we decided to use accepted benchmarks instead.

Summary
• We implemented a different branch prediction mechanism and obtained exciting results.
• A hardware implementation of the mechanism is hard, but possible.
• A longer history helps the perceptron make better predictions.