Winning with Column Store in SQL Server 2016 – A Customer Story Rick Heiges, Data Platform MVP Principal Solutions Architect, DB Best Technologies, LLC Thank you to our sponsors! Thank You to Our Venue SQLPASS Chapter Volunteers for SQL Saturday #560 Our Speakers 3 THANK YOU!!! For Giving Up Your Saturday For Not Giving Up on that One Problem For Wanting to Continue to Learn For Not Wanting to Stagnate in Your Career For Going that Extra Mile to #SQLSat560! For NOT Going all those extra miles to Durban Denmark 4 BIO Rick Heiges Data Platform MVP Since 2007 Principal Solutions Architect – DB Best Technologies, LLC [email protected] PASS Involvement Member since 2001 Board of Directors 2003-2011 Initiated Summit Ambassador Program Initiated 24 Hours of PASS Frequent Speaker at Summit and SQL Saturday Events BLOG: http://sqlblog.com/blogs/rick_heiges @heigesr2 linkedin.com/in/rickheiges www.dbbest.com Follow us on Twitter @dbbest_tech Like us on Facebook facebook.com/DBBestTech SQL Server Migration Assistant (SSMA) • • • • Tool Developed by DB Best Microsoft acquired SSMA in 2005 DB Best Maintains / Updates SSMA SSMA for Oracle Extension Pack Support Data Migration Oracle Providers to enable Server to Server connectivity Provides user-defined functions that Emulate Oracle system functions which helps to decrease manual effort for conversion which may be substantial 7 Business Scenario • Manufacturing Plants Around the World • Hundreds of Measurements taken at various points on Assembly line for each unit • Data Collected and Stored in a Centralized Data Warehouse on a constant basis • Plant and Regional Managers Generate Reports on DW on hourly, daily, weekly, bi-weekly, monthly, quarterly, and adhoc basis • Often references data just collected 8 Business Scenario Graphic Plant Plant Plant ETL DW Plant Reports Plant Manager Plant 9 Initial Situation • Customer was running into Performance Issues in timely reporting and analysis on the DW • Customer started testing Oracle’s In-Memory Feature (Partition pinned in Memory) • Early testing was not impressive. Enabled, but not being used at this time • Customer decided to look at SQL Server 2016 for solution and started to look at Migrating DW 10 Customer First Steps • Contacted Microsoft – Got CAT invovled • Created a small VM (4 vCPUs and 8GB RAM) for initial setup / testing • Customer Initially used SSMA to convert Schema • Created Data Loads for SSIS to get some functional testing done • Identified Queries that exhibited Poor Performance • Used “normal” indexes initially • 4 week Pilot to prove power of Column Store in SQL 2016 11 System Environment Notes • Initial VM was 2 HT Cores (4 vCPU) with 8GB RAM Windows Server 2012 R2 SQL Server 2016 CTP 2.3 Initial Schema Conversion and Reality Check • Upgraded VM to 4HT Cores (8 vCPU) with 32GB RAM for Discovery / Initial Testing • Upgraded VM to 16 vCPU with 256GB RAM for Final Testing • Dev / Test Environment is on 2 nodes of a 4 node Oracle RAC with 256GB RAM per node and 79 cores allocated 12 Migration Challenges • Customer used various Partitioning strategies Referential Range-List • Does not always map well to Range Partitioning in SQL Server • Data Type Changes – Parameter Data Types • SSMA Converted Code has goal of functionality (not performance) 13 Dealing with Partitioning Differences • Keep ahead of the end of the window – Split Empty partitions in advance • Schedule Jobs to monitor to automate • Partitioning Key must be part of any Unique Index key • RANGE Partitioning only Usually, Partitioning on Dates is best practice With Column Store indexes in mind, Don’t partition to small 14 Best Practices for Migrating to SQL Server • Audit Report • Indexes • Features 15 Data Loading Audit Report • Row Count – simple to implement, but does not interrogate validity of data from Source to Target other than Row Count • Consider running a similar query against Source and Target systems on Business Critical tables that may have dollar amounts associated with them. • Sum of a column would be bare minimum. Consider also Min, Max, and Average for higher confidence 16 CI: Clustered Index (Rowstore) • Not Required, but…. Is a default and best practice Is generally more performant than a Heap Is generally a unique, primary key value Is Required for Partitioning Think of it as a Page Number in a book …. Unless There is a Reason Not to Have One • Fill Factor Page Splits 17 NCI: Non-Clustered Index (Rowstore) • Often used to aid retrieval of data without knowing the PK • Often used for FK and common columns used in searches • Use Included Columns for Covering Queries with an NCI • Similar to an Index in the back of a book. Look in the index to find the page to turn to. 18 Columnstore Refresher how is it different? Data stored as columns Data stored as rows C1 C 2 C 3 C 4 C 5 Benefits: • Improved compression: Data from same domain compress better … • Reduced I/O: Fetch only columns needed • Improved Performance: More data fits in memory Key Terms / Concepts • Columnstore / Rowstore – Physical Organization • Row Group – Rows Compressed into Columnstore format usually in sets of the Max (1,048,576) • Column Segment – Column of Data within Row Group • Deltastore – Clustered Index that stores data in rows until a threshold(Min 102,400) is met to move to a columnstore • Tuple Mover – Background Process converts rows in the deltastore to the columnstore • Batch Execution – Query Processing Method that operates on multiple rows together (aka vector-based execution) 20 Deltastore Basics • Bulk Load Data goes directly into Columnstore based on large volumes of data • Single and smaller loads of data first go to Deltastore • Tuple Mover runs in the background and moves rows from rowstore to columnstore Row Group 21 NCCI: Non-Clustered Columnstore Index • First Introduced in SQL 2012 Made Table Read-Only in 2012/2014 Only 1 NCCI allowed in 2012/2014 Use Partition “Switch” to add data • Now updateable in SQL 2016 Multiple NCCI allowed • Complements “Row Store” 22 CCI: Clustered Columstore Index • Introduced in SQL 2014 Converts Rowstore into Columnstore No other indexes allowed in 2014 • Incredible Compression Often up to 90% Domain of Data is similar repeatable data • Can have NCI on CCI in 2016 Allows for better SEEK performance • Certain Restrictions apply Unique, FK, Data Types 23 Index Compatibility Index Matrix Heap CI CCI NCI Yes Yes Yes NCCI Yes Yes Why? Summary • Table is physically organized as a Heap, Cluster Index (Row) or CCI (Column) • NCI and NCCI are secondary indexes that can dramatically affect performance Initial Changes to SQL 2016 Database / Server • Trace Flags 1117/1118/2371 no longer needed • TempDB Number of Files / Growth Rate / Location • Customer Database Files / Filegroups / Growth • SQL Memory Settings • Examine Waitstats • Quiet Baseline 25 Changes on SQL 2016 Side • Parameter Types changed in .NET/ App to match SQL data type changes • Clustered Indexes and PKs reviewed • Selected Tables were chosen for CCI • Selected Tables were chosen to have NCCI added • Some Additional NCI added to Selected Tables • Selected Tables were also Partitioned Date is preferred Arbitrary Boundary Point similar to a time period 26 Traditional DW Design Dim Table A Dim Table B Dim Table C Fact Table Dim Table D Dim Table E 27 • Almost all queries hit Fact Table • Approx 450 columns (400+ of Measurement Data • 100 GB Fact Table uncompressed • 100s of millions of rows Fact Table • Initial Load of 2015 Data (on 4vCPU/8GB VM) Created Partitions – Date Based (2 weeks) 100+ GB in size • Converted to CCI • Rebuild CCI – Reorganize – Compressed to 8GB • Early Investigative Querying from DBA Team indicated that we were optimistic about Testing • Loaded Additional Years of Data (2014, 2013) 2014 and 2013 data volumes were smaller due to not as many measurements being captured 28 Related Tables • Smaller Dim Tables Clustered Index Additional Non-Clustered Indexes • Large Dim Tables Clustered Index / Partitioned Several with Non-Clustered Index Several with Non-Clustered Column Store Index Based on query usage 29 Testing • Used Consistent Dates - Returned same number of rows across both systems • Queries chosen were based on problematic queries currently used in Production • Queries were very complex with criteria focusing on specific date ranges, plants, and measurement types (some with multiple CASE clauses) • Simple Queries were not officially tested, but were anecdotally tested by the DBA team without issues 30 The Contestants Oracle 12c 2 Nodes of a 4 Node RAC 40 cores of 79 allocated 256GB per node (512GB) Dedicated Attached Storage No ETL No clearing of cache No Perf Data Key Tables are the same volume except for Dim Table A: 3 Billion SQL Server 2016 CTP 2.3 VM: 16 vCPU/256GB 240GB allocated to SQL Storage on shared SAN ETL Job running loading data Clear Caches before every set of queries Basic Perfmon Data Collection Key Tables are the same volume except for Dim Table A: 600 Million (Due to allocated space) SQL 2016 Test Results 600 500 400 Query A 300 Query B Query C 200 100 0 Oracle SQL 1 Day Oracle SQL 1Week Oracle SQL 2 Week Oracle SQL 1Month Oracle SQL 3Month Sum of Time (Sec) Row Labels Query A Query B Query C Grand Total 1 Day Oracle 85 20 23 128 SQL 7 1 2 10 1Week Oracle 102 37 73 212 SQL 11 1 9 21 2 Week Oracle 68 561 49 678 SQL 8 1 3 12 1Month Oracle 70 571 72 713 SQL 6 1 5 12 3Month Oracle 169 55 24 248 SQL 7 4 28 39 Grand Total 533 1252 288 2073 Perfmon Initial Notes – We could be even faster! • CPU at Approx. 20% • Serious IO Latency on both Read and Write F: .010 / .041 G: .010 / .037 H: .010 / .038 L: Read Spike .068 @ start of testing D: write .020 • No Memory Pressure 33 Future Considerations • Customer to perform PoC with SQLCAT • Some Data Architecture Changes coming to the Fact Table • Further Index Tuning and Decisions • Further Code and Query Tuning 34 SQL 2016 Architecture Option: Use AlwaysOn Availability Groups Applica tion Liste ner Sync Secondary Primary Data Data Async Secondary • AlwaysOn AG Primary Replica • Sync to Replica in same DC to provide HA • Async to Replica in DR DC for DR • May also use Replicas for Read Only Queries • May also have additional Replicas closer to plants to reduce network latency Data 35 Summary • SQL on a smaller VM beat Oracle on Big Iron • SQL Server 2016 with Columnstore Indexes performs better than the Oracle System • Efforts were made to simulate consistent ETL flow as in current real world conditions • Further Query Tuning and more Robust IO subsystem will solidify SQL Server Dominant Performance 36 Thank You! • • • • Rick Heiges Data Platform MVP Follow Me on Twitter: @heigesr2 [email protected] 37 Questions? Thank You for Attending
© Copyright 2025 Paperzz