Instruction and Data Address Trace Compression

Aleksandar Milenković
(collaborative work with Milena Milenković and Martin Burtscher)
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
Web: http://www.ece.uah.edu/~milenka
     http://www.ece.uah.edu/~lacasa

Outline
  Program Execution Traces
  Trace Compression
  Trace Compression in Hardware
    Stream caches and predictors for instruction address trace compression
    Data address stride caches for data address trace compression
  Results
  Conclusions

Program Execution Traces
  Streams of recorded events
    Basic block traces
    Address traces
    Instruction words
    Operands
  Trace uses
    Computer architects: evaluation of new architectures
    Computer analysts: workload characterization
    Software developers: program tuning, optimization, and debugging

Instruction and Data Address Traces: An Example

  for(i=0; i<100; i++) {
    c[i] = s*a[i] + b[i];
    sum = sum + c[i];
  }

  Dinero+ Execution Trace

  Instruction                               Type  Address     Data Address
  @ 0x020001f4: mov r1,r12, lsl #2          2     0x020001f4
  @ 0x020001f8: ldr r2,[r4, r1]             0     0x020001f8  0xbfffbe24
  @ 0x020001fc: ldr r3,[r14, r1]            0     0x020001fc  0xbfffbc94
  @ 0x02000200: mla r0,r2,r8,r3             2     0x02000200
  @ 0x02000204: add r12,r12,#1 (1 >>> 0)    2     0x02000204
  @ 0x02000208: cmp r12,#99 (99 >>> 0)      2     0x02000208
  @ 0x0200020c: add r6,r6,r0                2     0x0200020c
  @ 0x02000210: str r0,[r5, r1]             1     0x02000210  0xbfffbb04
  @ 0x02000214: ble 0x20001f4               2     0x02000214

Trace Issues
  Trace issues: capture, compression, processing
  Traces tend to be very large
    In terabytes for a minute of program execution
    Expensive to store, transfer, and use
  Effective reduction techniques:
    Lossless
    High compression ratio
    Fast decompression

Trace Compression
  General-purpose compression algorithms
    Ziv-Lempel (gzip)
    Burrows-Wheeler transformation (bzip2)
    Sequitur
  Trace-specific compression techniques
    Tuned to exploit redundancy in traces
    Better compression, faster, can be further combined with general-purpose compression algorithms

Trace-Specific Compression Techniques
  Lossless compression of instruction traces and of instruction + data traces
  Replacing an execution sequence with its identifier
    Acyclic path (WPP [Larus 1999], Time Stamped WPP [Zhang and Gupta 2001])
    N-tuple [Milenkovic, Milenkovic and Kulick 2003]
    Instruction (PDI [Johnson, Ha and Zaidi 2001])
  Control flow graph + trace of transitions: QPT [Larus 1993]
  Graph with number of repetitions in nodes [Hamou-Lhadj and Lethbridge 2002]
  Offset: Mache [Samples 1989], LBTC [Luo and John 2004]
  Offset + repetitions: PDATS [Johnson, Ha and Zaidi 2001]
  Link data addresses to dynamic basic block: [Pleszkun 1994], SBC [Milenkovic and Milenkovic, 2003]
  Link data addresses to loop: [Elnozahy 1999], SIGMA [DeRose, et al. 2002]
  Regenerate addresses: abstract execution [Eggers, et al. 1990], [Larus 1993]
  Value predictor: VPC [Burtscher and Jeeradit 2003], TCGEN [Burtscher and Sam 2005]

Why Trace Compression in Hardware?
  Problem #1: capturing program traces
    In software: trap after each instruction or taken branch
      E.g., IBM's Performance Inspector
      Slowdown > 100 times
    Multiple cores on a single chip + more detailed information needed (e.g., time stamps of events)
  Problem #2: debugging is far from fun
    Stop execution on breakpoints, examine the state
    Time-consuming, difficult, may miss a critical state leading to erroneous behavior
    Stopping the CPU may perturb the sequence of events, making your bugs disappear
  => Need an unobtrusive real-time tracing mechanism

Trace Compression in Hardware
  Goals
    Small on-chip area and small number of pins
    Real-time compression (never stall the processor)
    Achieve a good compression ratio
  Solution
    A set of compression algorithms targeting on-the-fly compression of instruction and data address traces

Exploiting Streams and Strides
  Instruction address trace compression
    Limited number and strong temporal locality of instruction streams
    => Replace an instruction stream with its identifier
  Data address trace compression
    Spatial and temporal locality of data addresses
    => Recognize regular strides

  CINT          #Streams  Max.L  Dyn.SL
  164.gzip      1437      229    13.6
  176.gcc       30162     315    11.4
  181.mcf       1181      88     7.4
  186.crafty    5347      191    13.3
  197.parser    6116      189    10.0
  252.eon       4389      169    13.7
  253.perlbmk   11542     868    11.8
  254.gap       3530      284    11.1
  255.vortex    8254      126    11.0
  300.twolf     4902      185    14.4

  CFP           #Streams  Max.L  Dyn.SL
  168.wupwise   1912      229    27.4
  171.swim      1839      707    130.8
  172.mgrid     1725      1944   420.8
  173.applu     1752      3162   462.4
  177.mesa      1938      550    18.15
  178.galgel    4153      264    21.8
  179.art       976       561    9.0
  183.equake    1355      623    27.7
  188.ammp      1810      422    38.5
  189.lucas     1414      427    113.3
  191.fma3d     5007      1158   34.3
  200.sixtrack  6515      580    170.5
  301.apsi      2989      894    50.7

Trace Compressor: System Overview
  [Figure: the system under test (processor core and memory) supplies the program counter, data address, and task switch signals to the trace compressor. Inside the compressor, the stream cache (SC) produces the SCIT and SCMT traces, and the data address stride cache (DASC), fed through a data address buffer, produces the DT and DMT traces. A predictor and byte repetition FSMs post-process the traces, and a trace output controller sends them over the trace port to an external trace unit for storing/processing (PC or intelligent drive).]
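
The idea of replacing an instruction stream with its identifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware design from the talk: it assumes fixed 4-byte instructions (as in the ARM example), FIFO replacement, index 0 reserved for misses, and the mapping function S.SA<5+n:6> xor S.L<n-1:0> that the deck lists among its findings. The names detect_streams and sc_compress are hypothetical.

```python
INSTR_BYTES = 4  # assumption: fixed-size 4-byte instructions, as in the ARM example


def detect_streams(pcs):
    """Split a PC trace into (start_address, length) stream descriptors.

    A stream ends whenever the next PC is not the previous PC + 4.
    """
    streams = []
    start, length, prev = None, 0, None
    for pc in pcs:
        if prev is not None and pc - prev == INSTR_BYTES:
            length += 1
        else:
            if start is not None:
                streams.append((start, length))
            start, length = pc, 1
        prev = pc
    if start is not None:
        streams.append((start, length))
    return streams


def sc_compress(streams, nsets=16, nways=1):
    """Return (SCIT, SCMT) for a list of (SA, SL) stream descriptors.

    nsets is assumed to be a power of two. Index 0, i.e. (set 0, way 0),
    is reserved to signal a miss and is never allocated.
    """
    mask = nsets - 1
    cache = [[None] * nways for _ in range(nsets)]
    victim = [1] + [0] * (nsets - 1)  # FIFO pointers (assumed policy)
    scit, scmt = [], []
    for sa, sl in streams:
        iset = ((sa >> 6) & mask) ^ (sl & mask)  # S.SA<5+n:6> xor S.L<n-1:0>
        ways = cache[iset]
        if (sa, sl) in ways:
            # Hit: emit the entry index (set and way combined).
            scit.append(iset * nways + ways.index((sa, sl)))
            continue
        # Miss: emit reserved index 0 plus the full stream descriptor.
        scit.append(0)
        scmt.append((sa, sl))
        first_way = 1 if iset == 0 else 0  # skip the reserved entry
        if first_way < nways:
            iway = victim[iset]
            ways[iway] = (sa, sl)
            victim[iset] = iway + 1 if iway + 1 < nways else first_way
    return scit, scmt


# The loop from the example trace: 9 instructions, 100 iterations.
pcs = [0x020001F4 + 4 * k for k in range(9)] * 100
scit, scmt = sc_compress(detect_streams(pcs))
print(len(scit), [hex(i) for i in scit[:3]], scmt)
```

With the assumed 16-set, direct-mapped configuration, the first lookup of the loop stream (0x020001f4, 9) misses and the remaining 99 lookups hit index 0x0E, so 900 instruction addresses shrink to 100 one-byte indices plus a single stream descriptor.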
Stream Detector + Stream Cache
  [Figure: the stream detector compares the PC with the previous PC (PPC); when PC - PPC != 4, the finished stream (S.SA, S.L) enters the instruction stream buffer. The stream cache (SC) has NSET sets and NWAY ways; each entry holds a stream starting address (SA) and a stream length (SL), and the set index is iSet = F(S.SA, S.SL). Index '00...0' is reserved. Example: stream (0x020001f4, 0x09) misses in iteration 0, so 0x00 goes to the Stream Cache Index Trace (SCIT) and (SA, SL) goes to the Stream Cache Miss Trace (SCMT); iterations 1 through 99 hit and emit index 0x0E.]

SC Itrace Compression
  // Compress instruction stream
  Get the next instruction stream record (S.SA, S.SL) from the instruction stream buffer;
  Look up the stream cache with iSet = F(S.SA, S.SL);
  if (hit)
    Emit the index (iSet concatenated with iWay) to SCIT;
  else {
    Emit reserved value 0 to SCIT;
    Emit stream descriptor (S.SA, S.SL) to SCMT;
    Select an entry (iWay) in set iSet to be replaced;
    Update stream cache entry:
      SC[iSet][iWay].Valid = 1;
      SC[iSet][iWay].SA = S.SA;
      SC[iSet][iWay].SL = S.SL;
  }
  Update stream cache replacement indicators;

  Design decisions
    Instruction stream buffer size: must not stall the processor (e.g., on consecutive very short instruction streams)
    Stream cache: size, associativity, replacement policy, mapping function

SC Itrace Compression: An Analytical Model
  \[ \mathrm{CR(SC.I)} = \frac{\mathrm{Size(Dinero.I)}}{\mathrm{Size(SCIT)} + \mathrm{Size(SCMT)}} \]
  \[ \mathrm{Size(Dinero.I)} = N \cdot 4 \ \text{bytes} \]
  \[ \mathrm{Size(SCIT)} = \frac{N}{\mathrm{SL.Dyn}} \cdot \frac{\log_2(N_{SET} \cdot N_{WAYS})}{8} \ \text{bytes} \]
  \[ \mathrm{Size(SCMT)} = \frac{N}{\mathrm{SL.Dyn}} \cdot \bigl(1 - \mathrm{SC.Hit}(N_{SET}, N_{WAYS})\bigr) \cdot 5 \ \text{bytes} \]
  \[ \mathrm{CR(SC.I)} = \frac{4 \cdot \mathrm{SL.Dyn}}{\frac{1}{8}\log_2(N_{SET} \cdot N_{WAYS}) + 5 \cdot \bigl(1 - \mathrm{SC.Hit}(N_{SET}, N_{WAYS})\bigr)} \]
  \[ \lim_{\mathrm{SC.Hit} \to 1} \mathrm{CR(SC.I)} = \frac{32 \cdot \mathrm{SL.Dyn}}{\log_2(N_{SET} \cdot N_{WAYS})} \]

  Limit of CR(SC.I) as SC.Hit approaches 1:
    NSET x NWAYS = 256: 4 x SL.Dyn
    NSET x NWAYS = 128: 4.57 x SL.Dyn
    NSET x NWAYS = 64: 5.33 x SL.Dyn

  Legend
    CR(SC.I): compression ratio
    N: number of instructions
    SL.Dyn: average stream length (dynamic)
    SC.Hit(NSET, NWAYS): SC hit rate
  Assumptions
    Stream length < 256 (1 byte for SL)
    4 bytes for the stream starting address

2nd Level Itrace Compression
  Size(SCIT) >> Size(SCMT)
    HitRate = 98%, 8-bit index => Size(SCIT) = 10*Size(SCMT)
  Redundancy in SCIT
    Temporal and spatial locality of instruction streams
  Reduce the SCIT trace
    Global predictor
    N-tuple compression using a Tuple History Table
    N-tuple compression using the SCIT History Buffer

Global Predictor Structure
  [Figure: the incoming index next.sid from the SCIT trace is shifted into a history buffer; function F of the history buffer yields pindex, which selects one of the predictor entries 0 through MaxP-1. The selected entry is compared with next.sid; a match emits '1' and a mismatch emits '0' to the SCIT PRED hit/miss trace, and mispredicted indices go to the SCIT PRED Miss Trace.]

SCIT Compression
  // Predict SCIT index
  Get the incoming index, next.sid, from the SCIT trace;
  Calculate the SCIT predictor index from the indices in the history buffer:
    pindex = F(indices in the History Buffer);
  Look up the SCIT predictor with pindex;
  if (SCIT.Predictor[pindex] == next.sid)
    Emit('1') to SCIT PRED trace;
  else {
    Emit('0') to SCIT PRED trace;
    Emit next.sid to SCIT PRED Miss trace;
    SCIT.Predictor[pindex] = next.sid;
  }
  Shift next.sid into the History Buffer;

  Design decisions
    Length of the history buffer
    Global predictor: size, mapping function

Redundancy in SCIT Pred Trace
  High predictor hit rates, and therefore long runs of 0xFF bytes, are expected in the predictor hit trace
  Use a simple FSM to exploit byte repetitions
  [Figure: byte repetition FSM; each incoming PRED hit trace byte is compared with Prev.BYTE and counted in CNT, producing the SCIT PRED Header and the SCIT PRED Repetition Trace]

  // Detect byte repetitions in SCIT pred
  Get next SCIT Pred byte, Next.BYTE;
  if (Next.BYTE == Prev.BYTE) CNT++;
  else {
    if (CNT == 0) {
      Emit Prev.BYTE to SCIT.REP.Trace;
      Emit '0' to SCIT Header;
    } else {
      Emit (Prev.BYTE, CNT) pair to SCIT.REP.Trace;
      Emit '1' to SCIT Header;
    }
    Prev.BYTE = Next.BYTE;
    CNT = 0;
  }

Data Address Trace Compression
  More challenging task
    Data addresses rarely stay constant during program execution
    However, they often have a regular stride
  => Use a Data Address Stride Cache (DASC) to exploit locality of memory referencing instructions and regularity in data address strides

Data Address Stride Cache (DASC)
  Tagless structure
  Indexed by the PC of the corresponding instruction: index = G(PC)
  Entry fields
    LDA: last data address
    Stride
  [Figure: DASC with entries 0 through N-1, each holding LDA and Stride. The incoming DA minus the entry's LDA is compared with the entry's Stride to produce Stride.Hit. Example for PC 0x020001f8 with data addresses 0xbfffbe24, 0xbfffbe20, 0xbfffbe1c: DT (data trace) receives 0, 0, 1 and the Data Miss Trace (DMT) receives 0xbfffbe24 and 0xbfffbe20.]

DASC Compression
  // Compress data address stream
  Get the next pair (PC, DA) from the data buffers;
  Look up the data address stride cache: iSet = G(PC);
  cStride = DA - DASC[iSet].LDA;
  if (cStride == DASC[iSet].Stride) {
    Emit('1') to DT;   // 1-bit info
  } else {
    Emit('0') to DT;
    Emit DA to DMT;
    DASC[iSet].Stride = lsb(cStride);
  }
  DASC[iSet].LDA = DA;

  Design decisions
    Number of entries
    Index function G
    Stride length
    Data address buffer depth

DASC Dtrace Compression: An Analytical Model
  \[ \mathrm{CR(SC.D)} = \frac{\mathrm{Size(Dinero.D)}}{\mathrm{Size(DT)} + \mathrm{Size(DMT)}} \]
  \[ \mathrm{Size(Dinero.D)} = N_{memref} \cdot 4 \ \text{B} \]
  \[ \mathrm{Size(DT)} + \mathrm{Size(DMT)} = N_{memref} \cdot \bigl[(1 - \mathrm{DASC.Hit}) \cdot 4 + 0.125\bigr] \ \text{B} \]
  \[ \mathrm{CR(SC.D)} = \frac{1}{1.03125 - \mathrm{DASC.Hit}} \]
  \[ \lim_{\mathrm{DASC.Hit} \to 1} \mathrm{CR(SC.D)} = \frac{1}{0.03125} = 32 \]

  Legend
    CR(SC.D): compression ratio
    Nmemref: number of memory referencing instructions
    DASC.Hit: DASC hit rate
  Assumptions
    4 bytes per data address

Redundancy in DT Trace
  High hit rates, and therefore long runs of 0xFF bytes, are expected in the DT trace
  Use a simple FSM to exploit byte repetitions
  [Figure: byte repetition FSM; each incoming DT byte is compared with Prev.DT and counted in CNT, producing the Data Header (DH) and the Data Repetition Trace (DRT)]

  // Detect data repetitions
  Get next DT byte;
  if (DT == Prev.DT) CNT++;
  else {
    if (CNT == 0) {
      Emit Prev.DT to DRT;
      Emit '0' to DH;
    } else {
      Emit (Prev.DT, CNT) pair to DRT;
      Emit '1' to DH;
    }
    Prev.DT = DT;
    CNT = 0;
  }

Experimental Evaluation
  Goals
    Assess the effectiveness of the proposed algorithms
    Explore the feasibility of the proposed hardware implementations
    Determine optimal size and organization of HW structures
  Workload
    16 MiBench benchmarks
    ARM architecture

  Benchmark      IC             NUS   maxSL  SL.Dyn
  cjpeg          104,607,812    1636  239    10.89
  djpeg          23,391,628     1324  206    21.81
  lame           1,285,111,635  3410  252    27.81
  tiff2bw        143,254,646    1058  43     12.79
  tiff2rgba      151,691,275    1146  75     27.54
  tiffmedian     541,260,067    1431  75     22.22
  tiffdither     832,951,018    1831  51     12.57
  mad            286,974,899    1659  1055   20.09
  sha            140,885,982    495   62     15.15
  bf_e           544,053,846    413   300    5.85
  rijndael_e     319,977,971    542   254    18.94
  ghostscript    708,090,638    6900  187    8.70
  rsynth         824,942,227    1323  180    15.77
  stringsearch   3,675,745      439   62     5.61
  adpcm_c        732,513,651    347   71     54.63
  gsm_d          1,299,270,245  845   401    11.07

  Legend
    IC: instruction count
    NUS: number of unique instruction streams
    maxSL: maximum stream length
    SL.Dyn: average stream length (dynamic)

Findings about SC Size/Organization
  Good compression ratio
    Outperforms fast GZIP
    High stream cache hit rates for all applications (> 98%)
    Smaller SCs work well too
  Replacement policy
    Pseudo-LRU vs. FIFO
  Associativity
    4-way is a reasonable choice
    8-way and 16-way desirable
  Mapping function
    S.SA<5+n:6> xor S.L<n-1:0>, n = log2(NSET)

  CR(SC.I)
  Ways \ Entries  8     16    32    64    128   256
  1               16.3  21.1  23.9  27.5  29.0  28.0
  2               17.6  22.1  28.0  36.9  47.6  47.8
  4               17.0  27.8  34.4  44.1  54.1  53.6
  8               15.8  26.6  34.0  47.1  57.4  54.2

  [Figure: CR = f(Complexity) for a 4-way SC; x-axis: #SC entries (0 to 300), y-axis: CR/MaxCR (0 to 1.2)]

Findings about Global Predictor
  The number of predictor entries should not exceed the number of entries in the SC
  Longer histories and larger predictors give only marginal improvements for all applications except ghostscript, blowfish, and stringsearch
  History length = 1
    Index GPRED using the previous SCIT index

  CR(SC+GP.I)
  SC entries \ Pred. entries  P32     P64     P128    P256
  8x4                         47.64
  16x4                        72.17   81.19
  32x4                        91.91   113.22  145.79
  64x4                        100.32  115.09  150.54  207.64

Putting It All Together (SC+GPRED+BREP): Itrace Compression

  CR by SC,GPRED configuration
  Benchmark      8x8,64   16x8,128  32x8,256  64x4,256
  cjpeg          277.1    315.0     316.7     263.7
  djpeg          492.3    539.4     443.3     287.1
  lame           250.6    255.2     238.6     214.0
  tiff2bw        1493.0   3062.2    1111.5    351.5
  tiff2rgba      1834.0   3592.0    3713.1    517.6
  tiffmedian     1601.2   1827.4    1229.4    649.4
  tiffdither     154.3    184.8     120.9     54.8
  mad            253.4    257.2     230.4     221.0
  sha            322.3    322.4     339.6     348.5
  bf_e           92.6     92.6      100.2     100.2
  rijndael_e     285.6    290.1     298.6     142.1
  ghostscript    119.4    123.6     106.4     30.4
  rsynth         211.5    246.0     152.8     97.0
  stringsearch   74.9     114.0     78.5      21.8
  adpcm_c        29972.5  28663.9   27457.8   27456.6
  gsm_d          376.0    401.2     292.3     234.9
  TOTAL          237.8    254.4     209.0     113.2

  DEF. I.GZ
    109.6 71.8 60.5 114.1 121.3 152.8 91.1 73.5 211.4 170.4 143.8 100.6 46.7 82.1 233.1 85.4 87.5
  FAST I.GZ
    54.5 39.8 128.5 83.9 20.3 92.3 46.4 37.8 54.4 41.0 12.6 39.7 30.6 32.3 107.3 59.2 47.2

  BEST I.GZ  BEST I.BZ2  DEF. GZGZ
  265.7      124.5       342.0
  232.5      73.7        202.0
  174.2      87.6        333.9
  615.2      114.4       376.8
  122.0      529.6       1292.7
  155.5      472.9       1017.5
  147.1      99.8        170.9
  206.2      94.3        78.5
  221.8      656.5       4112.1
  182.3      352.0       4065.9
  150.6      141.8       2392.9
  434.5      111.2       212.5
  191.2      48.0        143.2
  132.8      100.6       202.5
  233.6      1862.6      12764.7
  507.1      87.2        165.6
  321.6      112.9       172.0

Findings about DASC
  Stride size
    1 byte is optimal
    A 2-byte stride improves compression by 10%
  DASC with 1K entries is an optimal choice
  Tagged (multi-way) DASC further improves the overall compression ratio
    Increased complexity
  [Figure: CR = f(Complexity); x-axis: # DASC entries (0 to 5000), y-axis: CR (0 to 7)]

DASC Compression Ratio

  Benchmark      DASC    DASC    DASC    DASC    DASC    DASC    DEF.   FAST   BEST
                 32      64      128     256     512     1024    D.GZ   D.GZ   D.GZ   D.BZ2   D.GZGZ
  cjpeg          3.35    4.60    5.14    5.77    6.54    7.11    5.98   4.50   6.11   18.20   9.57
  djpeg          2.81    3.57    4.28    4.96    5.22    5.29    4.22   3.78   4.22   8.62    4.92
  lame           1.20    1.52    2.81    3.82    4.49    4.88    6.56   4.01   6.63   8.80    8.60
  tiff2bw        76.31   78.04   84.28   105.04  128.84  134.23  2.14   2.55   2.10   14.28   3.07
  tiff2rgba      5.98    79.81   91.24   107.49  127.05  139.57  2.10   2.79   2.09   4.06    4.03
  tiffmedian     8.64    8.70    8.74    8.81    8.87    8.89    4.40   4.37   4.53   11.16   6.03
  tiffdither     2.61    6.08    7.21    8.69    9.65    10.06   4.51   4.41   4.51   7.87    6.77
  mad            1.30    1.59    1.96    2.07    2.35    2.64    4.08   3.60   4.22   13.47   6.97
  sha            6.58    7.94    9.38    10.79   11.36   11.36   44.91  8.36   45.61  172.71  591.69
  bf_e           1.58    1.95    2.38    2.61    2.75    2.91    7.58   4.86   7.83   16.35   9.08
  rijndael_e     1.10    1.10    1.10    1.13    1.29    2.06    4.24   3.22   4.27   7.31    4.49
  ghostscript    1.07    1.19    1.56    2.19    2.93    5.27    27.21  18.58  27.46  47.42   40.83
  rsynth         1.22    1.36    1.76    3.81    8.30    32.43   24.44  21.46  25.27  57.40   43.88
  stringsearch   1.80    2.04    2.70    4.13    4.44    5.16    11.12  8.57   11.23  15.03   11.47
  adpcm_c        3.13    3.13    3.13    3.13    3.13    3.13    6.57   3.64   7.15   12.27   11.42
  gsm_d          2.67    4.48    11.30   13.60   14.81   16.78   21.60  18.05  23.29  63.53   33.15
  TOTAL          1.66    2.04    2.80    3.77    4.67    6.12    6.78   5.51   6.90   13.29   9.70

Hardware Complexity Estimation
  CPU model
    In-order, Xscale like
  SC and DASC timings
    Vary SC and DASC parameters
    SC: hit latency = 1 clock, miss latency = 2 clocks
    DASC: hit latency = 2 clocks, miss latency = 2 clocks
  To avoid any stalls
    Instruction stream input buffer: MIN = 2 entries
    Data address input buffer: MIN = 8 entries
    Results are relatively independent of SC and DASC organization

  Component                        Entries  Complexity  Bytes
  Instruction stream buffer        2        2x5         10
  Stream detector                  2        2x4         8
  Stream cache                     64x4     256x5       1280
  Global predictor                 256      256 + 1(h)  257
  Data address buffer              8        8x8         64
  Data address stride cache        1024     1024x5      5120
  Byte repetition state machines   -        4           4

Trace Port Bandwidth Analysis
  [Figure: trace port bandwidth in bits/instr. versus instructions executed (millions) for CJPEG and MAD. The instruction trace panels compare SC, SC+PRED, and SC+PRED+BREP; the data trace panels compare TDASC and TDASC+BREP.]
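
To relate the DASC compression ratios reported above to the analytical model CR(SC.D) = 1/(1.03125 - DASC.Hit), the DASC steps can be sketched in Python. This is a minimal sketch under stated assumptions rather than the evaluated hardware: the index function G(PC) = (PC >> 2) mod entries and the signed interpretation of the 1-byte stride are assumptions, and the names lsb_signed, dasc_compress, measured_cr, and model_cr are hypothetical.

```python
def lsb_signed(value, bits=8):
    """Low `bits` bits of value, read as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dasc_compress(pairs, entries=1024, stride_bits=8):
    """Return (DT, DMT) for an iterable of (PC, DA) pairs."""
    lda = [0] * entries      # last data address per entry
    stride = [0] * entries   # last stride per entry
    dt, dmt = [], []
    for pc, da in pairs:
        i = (pc >> 2) % entries          # assumed index function G(PC)
        c_stride = da - lda[i]
        if c_stride == stride[i]:
            dt.append(1)                 # stride hit: one bit only
        else:
            dt.append(0)
            dmt.append(da)               # miss: full data address
            stride[i] = lsb_signed(c_stride, stride_bits)
        lda[i] = da                      # always remember the last address
    return dt, dmt


def measured_cr(dt, dmt):
    """Uncompressed size (4 B per address) over DT bits plus DMT bytes."""
    return 4 * len(dt) / (len(dt) / 8 + 4 * len(dmt))


def model_cr(hit_rate):
    """Analytical model: 4 / (4 * (1 - hit) + 0.125)."""
    return 4 / (4 * (1 - hit_rate) + 0.125)


# One load at PC 0x020001f8 walking an array downwards in 4-byte steps.
pairs = [(0x020001F8, 0xBFFFBE24 - 4 * k) for k in range(100)]
dt, dmt = dasc_compress(pairs)
print(dt[:3], [hex(a) for a in dmt], measured_cr(dt, dmt), model_cr(0.98))
```

For this single-load trace the first two accesses miss and the remaining 98 hit, which reproduces the 0, 0, 1 pattern from the DASC slide; the measured ratio agrees with the model evaluated at a 98% hit rate, and the model saturates at 32 when every access hits.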
Conclusions
  A set of algorithms and hardware structures for instruction and data address trace compression
    Stream caches + global predictor + byte repetition FSM for instruction traces
    Data address stride cache + byte repetition FSM for data traces
  Benefits
    Enabling real-time trace compression with high compression ratio
    Low complexity (small structures, small number of external pins)
  Analytical & simulation analysis focusing on compression ratio and optimal sizing/organization of the structures, as well as real-time trace port bandwidth requirements

Laboratory for Advanced Computer Architectures and Systems at Alabama: Research Overview

Aleksandar Milenković
The LaCASA Laboratory
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
Web: http://www.ece.uah.edu/~milenka
     http://www.ece.uah.edu/~lacasa

Secure Processors
  Computer security is critical
    Software & physical attacks
    [Figure: overlapping excerpts of vulnerability advisories labeled Yesterday, Today, and Tomorrow, describing buffer, heap-based, integer, and format string overflows in products such as messaging clients, image decoders, Bluetooth software, and media players that allow remote attackers to execute arbitrary code]
  Sign & Verify for guaranteed integrity and confidentiality of code
    [Figure: secure installation turns the original code into signed code (generate program keys Key1, Key2, Key3; encrypt each I-Block; calculate its signature); program loading in secure mode decrypts the program keys, which are stored encrypted with the CPU key; secure execution fetches each encrypted I-Block and its signature, decrypts the block, recalculates the signature, and checks for a signature match, yielding trusted code]
  Improvements
    PMAC (Parallel MACs) for reduced cryptographic latency
    A variation of the one-time-pad for code encryption
    Instruction Verification Buffer for conditional execution before verification
  http://www.ece.uah.edu/~lacasa/research.htm#secure_processors

Microbenchmarks for Architectural Analysis
  Small programs for uncovering architectural parameters (usually not publicly disclosed) of modern processors
  Benefits
    Architecture-aware compiler optimization
    Processor design evaluation and verification
    Testing
    Competitive analysis
  Challenge
    Relatively simple, so their behavior can be understood
  [Figure: microbenchmarks combined with performance counters (branch-related events) uncover BTB size, BTB organization, and BTB indexing, and the outcome predictor's local and global history]
  Results
    Microbenchmarks for BTB analysis
    Experimental flow for outcome predictor
    Tested on P6 and NetBurst (Northwood core)
    Dothan (Pentium M) predictor
  http://www.ece.uah.edu/~lacasa/bp_mbs/bp_microbench.htm

TinyHMS
  [Figure: concept, prototype, and software organization. A personal server (PS, a PDA) provides the user interface and WWAN/WLAN communication; a network coordinator (Telos) with a USB/CF interface handles messaging, TimeSync, flash storage, and the wireless transceiver; ActiS sensor nodes (Tmote sky) perform data acquisition, filtering/pre-processing, signal processing, buffering, messaging, and time synchronization.]
  http://www.ece.uah.edu/~lacasa/research.htm#tinyHMS

TinyHMS
  [Figure: sample signals over the interval 105 to 107 on the x-axis, showing the motion sensor (TS2) channels accX, accY, and accZ and the ECG sensor (TS1) signal, together with a timing diagram of frames i-1 and i with slots NC, TS1, TS2, TS3, beacon messages, heart beat and step events, and a heart beat event message with timestamp]