Issues related to 4GB partitioned tables

During research undertaken on the partitioned table capability of Ingres, it has been noted that many problems are a result of the architectural design. A reworking of the partitioning capabilities of Ingres needs to be undertaken to address these problems.

4GB+ table - research

Research into 4GB+ tables was done on usl3sd01 with SAN-attached storage (three LUNs, each of 2TB of disk). The Ingres installation is Ingres 9.3 (build SVN 932). The hardware is given in Appendix 1. The use of a 4GB+ table (more than 4 billion tuples) was deemed necessary to ensure that wrap-around errors where integers are used could be detected. Following several successful creations of tables with 4B+ tuples it was found that there were areas of greater concern than simply the capability to load 4B+ tuples. The results of the study are included in this paper.

Datawarehouse - research

My working title for an Ingres warehouse is IngresWDB. I have a 300 DB star configuration that has views against a federated table (effectively equivalent to spreading a LIST partitioned table over 300 servers). This configuration will be used for functionality testing. For practical testing of data load and retrieval performance, a 20 DB constellation has been created on the same hardware. A 3 DB constellation has been used to test the scripts used to build the above database warehouse examples.

1GB table - research

- Creating a 1 billion row table took approx 12 seconds for each 500,000 tuples added.
  o Total run time of 6h 32m.
  o Spikes were seen when the table extended. As each partition was extended, the load time for 500K tuples was approx 28 seconds.
  The CREATE TABLE statement is given in Appendix 2.
- MODIFY of the 1 billion row table to BTREE ran to completion.
  o Total run time of 4h 15m.
  o Required "SET SESSION WITH ON_LOGFULL = COMMIT", even though the transaction log file was 16 gigabytes. What was causing the very large volume of writes to the transaction log file has yet to be identified.
  The MODIFY TABLE statement is given in Appendix 3.
- Creating a composite key index on two i4's on the 1 billion row table ran to completion.
  o Total runtime of 3h 08m.
  o The resulting index contains 4.3 million 8K pages.
  The CREATE INDEX statement is given in Appendix 4.
- Optimizing the table ran to completion.
  o Total runtime of 13m 19s.
  optimizedb -zr5000 -zu5000 -zh big4 -rcollected_data -asampleperiod -adcid
- copydb
  o copy.out ran to completion. Required "SET SESSION WITH ON_LOGFULL = COMMIT" to be added to the script. Time to unload: 3h.
  o copy.in: 94 gigabytes of disk and more than 1 billion log writes were required to load the table. Time to load: 4h 30m.

Changes required to COPYDB

- Changes are required in the generated copy.out and copy.in files to include set session with on_logfull = notify if any partitioned tables are being processed (a sketch of such an edited script follows this list).
- Bulk load is not supported against partitioned tables; this needs to be addressed.
- Features to export partition sets and create the load schema, where a partition set is a range from 1 to n partitions of the partitioned table in sequence, with any partition as the starting point.
- To unload and load a partitioned table in parallel.
- To unload a partitioned table into individual files.
  o Unloading the partitions in parallel would speed up the extract of data from a table, as would a parallel load capability. The limitation is that the same partitioning scheme is used for unload and load.
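As a minimal, purely illustrative sketch of the first change above (the exact COPY column and format list generated by copydb will differ; the table name is taken from Appendix 2, and the data file name is an assumption), the edit amounts to prepending the session setting to the generated copy.in before the first COPY statement:

    /* Hypothetical fragment of a copydb-generated copy.in script, with the
    ** proposed session setting added so that loading a large partitioned
    ** table does not abort when the transaction log file fills. */
    set session with on_logfull = notify
    \p\g
    copy collected_data (
        dcid         = c0tab,
        sampletime   = c0tab,
        sampleperiod = c0tab,
        qualifier    = c0tab,
        samples      = c0tab,
        avgval       = c0tab,
        minval       = c0tab,
        maxval       = c0tab,
        stddev       = c0nl
    ) from 'collected_data.dat'
    \p\g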
The XFERDB syntax would need to be enhanced to add the required SQL to facilitate the parallel load and unload of table data into individual physical files. The naming of the files for each partition would need to be considered to ensure that the partition data was correctly handled irrespective of the number of partitions.

Changes required to USERMOD

- To allow specific partitions or a range of partitions to be acted upon independently.
  o A rule to automatically restructure a range partition, or a list partition with multiple entries, into new partitions when a global or table partition limit is reached. E.g. if 6m tuples are added to a table every day, when the number of tuples in any partition reached 25m that partition would be split (if possible) or an entry made in the error log to notify the DBA.
- A means of negating the automatic re-partitioning action is required, as the DBA may be controlling the table and be aware of the potential issues that may arise. The automatic feature is required for lights-out installations.

Changes required to MODIFY and ALTER

- To DROP partitions from a partition scheme (Sharding).
- To CREATE a new table from one or more consecutive partitions by refactoring of the partitioned table and changes to the iirelation table.
- To add partitions to the beginning or end of a partitioned table (Growing).
- To split a partition (Re-Partitioning).
- To aggregate partitions (Re-Partitioning).
- MODIFY to TRUNCATE any partition sequence, leaving the partitions in place.
- MODIFY or ALTER changes to enable partition-level indexing.
- MODIFY or ALTER changes to apply updates to global indexes when changes to partitioning move keyed values between partitions, or to otherwise update the TID information, without needing to drop and re-create the global index. (TBD)

(A hypothetical syntax sketch for these partition maintenance operations is given after the registered-table discussion below.)

Changes required to partitioning criteria

Various difficulties have been experienced when loading partitioned tables. Some ideas and questions are provided below for consideration.

- Session defaults: loading tuples into a partitioned table fills the TX log file. Should a session be able to apply the set session with on_logfull = notify setting automatically prior to starting the load of tuples into a partitioned table?
- Enable partitioned tables to be bulk loaded under all circumstances. This could be resolved by loading data into single partitions in parallel, with all tuples in a given load file that fail to match the partition scheme being loaded into the default partition or placed in an exception file. To facilitate this, the unload (copy.out) would also need to be able to unload the table into individual load files.
- Being able to treat partitions as if they were ordinary tables for the purposes of loading and unloading via copy.in/copy.out scripts would benefit all implementations of partitioned tables.

[BOLKE01 - expanded above. Original comment: "The above is missing something."]

A partitioned table is made up of tables that are treated as special tables belonging to a Master table. These tables share the Master table schema (columns are only held in the iiattribute table for the Master table); however, in every other way they are the physical tables that the DBMS server acts upon, since the Master table has no physical presence. Treating partitions as tables that can participate in registered table visualization would address some areas of performance by reducing locking. Only SQL statement requests made against the Master table would require high-level locks to be taken on the Master table.
A registered table has a similarity to a Master table in that it has no physical presence of its own. An Ingres STAR registered table has a link to one or more tables on a different database that are either local or in remote installations. If partitions of a partitioned table could be registered as if they were a set of coherent tables, created by either the REGISTER TABLE statement or the CREATE TABLE statement with a new 'register' option, then the distinction between a partitioned table and an Ingres STAR registered table would be removed. In the former case all participating tables would need to be of the same DDL scheme. In the latter case the underlying tables would be created by the CREATE TABLE statement (as per the current standard) with a syntax update to enable the partitions to be created as individual tables under the umbrella of the Master table.

o Example (proposed syntax):

    CREATE TABLE ptn (
        id      INTEGER,
        samples INTEGER,
        svalue  FLOAT
    )
    WITH DUPLICATES, NOJOURNALING, PAGE_SIZE = 16384
    \p\g
    MODIFY ptn TO HEAP WITH
        ALLOCATION = 2000,
        EXTEND = 2048,
        PAGE_SIZE = 16384,
        register partition = ((
            list on samples
                partition p01 values (1),
                partition p02 values (2),
                partition p03 values (3),
                partition p04 values (4),
                partition p05 values (5),
                partition p53 values (default))
            SUBPARTITION (HASH ON id 35 partition with location = (iidatabase)))
    \p\g
    REGISTER TABLE ptn_history  FOR (ptn.p01, ptn.p02); \p\g
    REGISTER TABLE ptn_research FOR (ptn.p03, ptn.p04); \p\g
    REGISTER TABLE ptn_current  FOR (ptn.p04, ptn.p05); \p\g
    REGISTER TABLE ptn_invalid  FOR ptn.p53;            \p\g  /* brackets not required for a single table */

In this example ptn.p04 and ptn.p05 would exist as real tables that can be acted upon by MODIFY and ALTER statements, e.g.

    ALTER TABLE ptn.p01 ADD CONSTRAINT UNIQUE INDEX ptn_idx_01 (id, samples);

However, to use DML statements (SELECT, UPDATE or INSERT) against the data, the Master table (ptn) or a registered table (e.g. ptn_current) would need to be used. Working with the ptn table would work for both of the following statements; against ptn_current, however, an INSERT would only succeed if the samples value fell within the registered partitions (4 or 5), since the partitioning scheme would still be in effect:

    INSERT INTO ptn values (2, 5, 3.57);
    INSERT INTO ptn_current values (2, 4, 59.0);

[BOLKE01 - added examples. Original comment: "The above does not make sense, how about examples?"]

Addressability of the individual partitions as independent tables would resolve many issues, and enabling views over restricted parts of a single highly partitioned table would bring benefits all round. Un-latching the control/master table from the partitions would bring a big improvement in flexibility, and the capability to manage (register) a set of related tables (same schema) as a partitioned table (similar to a view) would bring its own benefits.

[BOLKE01 - see the examples above; this section is now a bit of a tautology of the one above. Original comment: "The above does not make sense, how about examples?"]

A re-write of the partitioned table logic is required to address restrictions on indexing and table modification (sharding, splitting, combining and adding new partitions), all of which are necessary when dealing with massive tables.

[BOLKE01 - yep, especially after the new additions. Have you not already said this in a different form within this document?]
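To make the scope of that rewrite concrete, the following is a purely hypothetical syntax sketch of the partition maintenance operations requested under "Changes required to MODIFY and ALTER". None of these clauses (DROP/ADD/SPLIT PARTITION, partition-scoped TRUNCATED, or the partition-level index) exists in current Ingres SQL; the clause names and values are illustrative only, applied to the collected_data table of Appendix 2:

    /* Hypothetical maintenance statements; the syntax below does not exist
    ** today and only illustrates the requested capabilities. */

    /* Sharding: drop the oldest partition. */
    MODIFY collected_data TO DROP PARTITION p01
    \p\g
    /* Growing: add a new partition ahead of the default partition. */
    MODIFY collected_data TO ADD PARTITION p54 VALUES (54)
    \p\g
    /* Re-Partitioning: carve a new value out of the default partition. */
    MODIFY collected_data TO SPLIT PARTITION p53 INTO
        (partition p55 values (55), partition p53 values (default))
    \p\g
    /* Truncate a partition sequence, leaving the partitions in place. */
    MODIFY collected_data PARTITION (p01, p02) TO TRUNCATED
    \p\g
    /* Partition-level (local) index on a single partition. */
    CREATE INDEX cd_p05_idx ON collected_data PARTITION p05 (dcid, sampletime)
    \p\g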
- The ability to register a table as a partition, or a subset of consecutive partitions, of the main table (either in STARDB or in a standard DB) would bring benefits to the partitioning of a table across multiple installations and, in reverse, to the combining of multiple tables as a distributed partitioned table, facilitating the development of the IngresWDB solution.
- Individual partitions should have their own table structure; at present all partitions must have the same structure and index columns, which may not be an optimal choice.
- A multi-part partition key is required within iirelation to identify partitioned tables, disconnecting the overloading of partition information within reltidx.
  o A three-part TID is a necessary requirement (reltid, reltidp - a new column, reltidx), which together with the relnparts column forms a consistently addressable referencing model. It should be added to the next release even if the functionality has not fully crossed over from the two-part scheme (overloading of the reltidx index column).

        reltid  reltidx  reltidp  relnparts  Type  Description
        N       0        0        0          T     A base table
        N       n        0        0          I     An index on a base table
        N       0        n        m          P     A partitioned table
        N       n        n        m          G     A partitioned index on a partitioned table (Global index)
        n       n        0        0          L     An index on a single partition of a partitioned table (Local index)

    The value 'N' is the integer reltid of a Base table or Master partition; the value 'n' is the integer reltid of an index or partition; the value 'm' is the integer partition number.
    Note: for a non-partitioned table relnparts is 0, and for the base table of a partitioned table it holds the maximum partition number.
    Note: 'N' and 'n' should be increased to big integers (integer8).

  o With the current reltid and overlaid reltidx structure there is no ability to implement the local index or the partitioned global index. The locking implementation that is currently in place is prohibitively costly when multiple partitions are being updated, as a lock is taken at the Master table level by each updating statement, which forces synchronous activity.

Indexing

    Index Type    Description
    Local Index   An index that is against a single partition.
    Global Index  An index across the whole partitioned table. This index can be either partitioned or non-partitioned. If the index is partitioned then it should have the same partitioning properties as the partitioning of the base table.

Current DB Capability (reltid, reltidx, relnparts):
    183, -2147483464, 0
    183, -2147483463, 1    Partitioned Table
    183, -2147483462, 2
    183, 184, 0            Index
Index: all partitions can only be included in a single index, using single or multiple locations. A maximum of 8.2m pages are possible in an index, limiting flexibility.

New DB Capability (1) (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1         Partitioned Table
    183, 0, 186, 2
    183, 187, 188, 0
    183, 187, 189, 1       Partitioned Index
    183, 187, 190, 2
Partitioned Index: same scheme as the table partitioning, ensuring no coincidental locking is required. Single or multiple locations may be used, though this is not required.

New DB Capability (2) (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1         Partitioned Table
    183, 0, 186, 2
    184, 187, 0, 0         Local Index
    185, 188, 0, 0         Local Index
    186, 189, 0, 0         Local Index
Local Index: each partition can be indexed separately as if it were a Base table. No locking of the Master table or any other partition is required. Single or multiple locations may be used.
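Assuming the proposed reltidp column were added to iirelation (it does not exist in the current catalog; the join conditions below simply follow the proposed four-column scheme above), a DBA could enumerate the partitions of a table and any local indexes on them directly. A minimal sketch:

    /* Hypothetical query against the proposed catalog layout: list the
    ** partitions of collected_data and any local indexes defined on them.
    ** reltidp is a proposed column; this only shows how the scheme would
    ** be navigated, not a query that runs against current Ingres. */
    SELECT p.relid     AS partition_name,
           p.relnparts AS partition_no,
           i.relid     AS local_index_name
    FROM iirelation m
    JOIN iirelation p
         ON  p.reltid  = m.reltid
         AND p.reltidp <> 0                /* type P rows: the partitions */
    LEFT JOIN iirelation i
         ON  i.reltid  = p.reltidp         /* type L rows: local indexes  */
         AND i.reltidx <> 0
    WHERE m.relid   = 'collected_data'
      AND m.reltidx = 0
      AND m.reltidp = 0                    /* the master/base row itself  */
    \p\g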
Sharding and Re-Partitioning

    Action           Description
    Sharding         Removal of the oldest partitions of a table. A special case of Re-Partitioning, i.e. removal of the lowest range of a Range partitioned table or the first set of a List partitioned table.
    Re-Partitioning  Modification of all or some of the parts of a table. Enables partitions of a Range partitioned table to be subdivided and an additional set to be added to List partitioned tables.
    Growing          Adding a new partition to the head of a partitioned table. A special case of Re-Partitioning, i.e. repartitioning the default or last partition to have a range following the last defined value of a Range partitioned table, or a new set of values for a List partitioned table.

Sharding Capability

Before sharding (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1    Partitioned Table
    183, 0, 186, 2
    183, 0, 187, 3

After sharding (reltid, reltidx, reltidp, relnparts):
    183, 0, 185, 0
    183, 0, 186, 1    Partitioned Table
    183, 0, 187, 2

Sharding removes the first identified partition using reltid, reltidp and relnparts. Global indexes are not permitted during sharding; any local indexes for the partition being removed are dropped. The sequencing of the partitions is related to the relnparts value: when the table's first partition is removed, the remaining partitions are re-sequenced. Local indexes on the re-sequenced partitions are unaffected.

Growing Capability

Before growing (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1    Partitioned Table
    183, 0, 186, 2

After growing (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1    Partitioned Table
    183, 0, 186, 2
    183, 0, 187, 3

Growing: adding a new partition requires that the default partition is made the highest-numbered partition (highest relnparts value) and that the new partition is inserted behind it. Global indexes are not permitted during growing; local indexes are unaffected. Once the new partition is inserted, all values in the default partition that match the new partition's criteria are inserted into the new partition and deleted from the default partition (a sketch of this row movement follows the Re-Partitioning discussion below). Local indexes on the default partition are reorganised. Creation of local indexes on the new partition is the responsibility of the user.

Re-Partitioning Capability

Before re-partitioning (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1    Partitioned Table
    183, 0, 186, 2
    183, 0, 187, 3

After re-partitioning (reltid, reltidx, reltidp, relnparts):
    183, 0, 184, 0
    183, 0, 185, 1    Partitioned Table
    183, 0, 188, 2
    183, 0, 186, 3
    183, 0, 187, 4

Re-Partitioning: the new range or list set for an inserted partition must completely cover the original partitioning scheme for the affected partition. Global indexes are not permitted during re-partitioning. The range and list set criteria are validated prior to initiating the re-partitioning action. Once the new partition is inserted, all matching values from the original partition are inserted into the new partition and deleted from the original partition; the default partition is also checked for matching values, which are relocated to the new partition. Local indexes on the affected partition are dropped and, once the partitions are repopulated, re-created on both the original and new partitions.
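As a minimal sketch of the row movement described for Growing (assuming, as proposed in the registered-table discussion, that individual partitions such as collected_data.p53 and a newly added collected_data.p54 could be addressed directly, which is not possible in current Ingres), the internal operation would amount to:

    /* Hypothetical internal steps for Growing the list-partitioned table of
    ** Appendix 2: a new partition p54 for sampletime = 54 has just been
    ** inserted ahead of the default partition p53, and matching rows are
    ** moved out of the default. table.partition addressing is proposed
    ** syntax only. */
    INSERT INTO collected_data.p54
    SELECT * FROM collected_data.p53
    WHERE  sampletime = 54
    \p\g
    DELETE FROM collected_data.p53
    WHERE  sampletime = 54
    \p\g
    /* Local indexes on p53 are then reorganised; any local index on p54
    ** would have to be created by the user. */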
APPENDIX 1 – HARDWARE and OPERATING SYSTEM

OPERATING SYSTEM: uname -a
Linux usl3sd01.ingres.prv 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

FILESYSTEM / SAN
    Filesystem               Size  Used  Avail  Use%  Mounted on
    /dev/cciss/c0d0p1        62G   58G   1.6G   98%   /
    tmpfs                    2.0G  0     2.0G   0%    /dev/shm
    /dev/mapper/vg00-vol01   2.0T  1.4T  487G   75%   /vol01
    /dev/mapper/vg00-vol02   2.0T  5.6G  1.9T   1%    /vol02
    /dev/mapper/vg00-vol03   2.0T  404G  1.5T   22%   /vol03

CPU INFO
4 CPUs (0-3). Details of CPU 0 are given here:
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 15
    model           : 4
    model name      : Intel(R) Xeon(TM) CPU 3.80GHz
    stepping        : 3
    cpu MHz         : 2800.000
    cache size      : 2048 KB
    physical id     : 0
    siblings        : 2
    core id         : 0
    cpu cores       : 1
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 5
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est tm2 cid cx16 xtpr
    bogomips        : 7603.80
    clflush size    : 64
    cache_alignment : 128
    address sizes   : 36 bits physical, 48 bits virtual
    power management:

APPENDIX 2 – CREATE TABLE

\continue
\sql
set autocommit on
\p\g
DROP SEQUENCE big1
\p\g
CREATE SEQUENCE big1 AS INTEGER8 START WITH 1 CACHE 500000 NOCYCLE
\p\g
DROP TABLE collected_data
\p\g
CREATE TABLE collected_data (
    dcid         INTEGER8 NOT NULL NOT DEFAULT,
    sampletime   INTEGER NOT NULL,
    sampleperiod INTEGER NOT NULL,
    qualifier    INTEGER NOT NULL,
    samples      INTEGER,
    avgval       F4,
    minval       F4,
    maxval       F4,
    stddev       F4
)
WITH DUPLICATES, NOJOURNALING, PAGE_SIZE = 16384
\p\g
MODIFY collected_data TO HEAP WITH
    ALLOCATION = 200000,
    EXTEND = 51200,
    PAGE_SIZE = 16384,
    partition = ((
        list on sampletime
            partition p01 values (1),  partition p02 values (2),  partition p03 values (3),
            partition p04 values (4),  partition p05 values (5),  partition p06 values (6),
            partition p07 values (7),  partition p08 values (8),  partition p09 values (9),
            partition p10 values (10), partition p11 values (11), partition p12 values (12),
            partition p13 values (13), partition p14 values (14), partition p15 values (15),
            partition p16 values (16), partition p17 values (17), partition p18 values (18),
            partition p19 values (19), partition p20 values (20), partition p21 values (21),
            partition p22 values (22), partition p23 values (23), partition p24 values (24),
            partition p25 values (25), partition p26 values (26), partition p27 values (27),
            partition p28 values (28), partition p29 values (29), partition p30 values (30),
            partition p31 values (31), partition p32 values (32), partition p33 values (33),
            partition p34 values (34), partition p35 values (35), partition p36 values (36),
            partition p37 values (37), partition p38 values (38), partition p39 values (39),
            partition p40 values (40), partition p41 values (41), partition p42 values (42),
            partition p43 values (43), partition p44 values (44), partition p45 values (45),
            partition p46 values (46), partition p47 values (47), partition p48 values (48),
            partition p49 values (49), partition p50 values (50), partition p51 values (51),
            partition p52 values (52), partition p53 values (default))
        subpartition (hash on dcid /* 100 */ 35 partition with location = (loc1)))
\p\g

APPENDIX 3 – MODIFY TABLE

MODIFY collected_data TO BTREE WITH
    ALLOCATION = 200000,
    EXTEND = 51200,
    PAGE_SIZE = 16384,
    partition = ((
        list on sampletime
            partition p01 values (1),  partition p02 values (2),  partition p03 values (3),
            partition p04 values (4),  partition p05 values (5),  partition p06 values (6),
            partition p07 values (7),  partition p08 values (8),  partition p09 values (9),
            partition p10 values (10), partition p11 values (11), partition p12 values (12),
            partition p13 values (13), partition p14 values (14), partition p15 values (15),
            partition p16 values (16), partition p17 values (17), partition p18 values (18),
            partition p19 values (19), partition p20 values (20), partition p21 values (21),
            partition p22 values (22), partition p23 values (23), partition p24 values (24),
            partition p25 values (25), partition p26 values (26), partition p27 values (27),
            partition p28 values (28), partition p29 values (29), partition p30 values (30),
            partition p31 values (31), partition p32 values (32), partition p33 values (33),
            partition p34 values (34), partition p35 values (35), partition p36 values (36),
            partition p37 values (37), partition p38 values (38), partition p39 values (39),
            partition p40 values (40), partition p41 values (41), partition p42 values (42),
            partition p43 values (43), partition p44 values (44), partition p45 values (45),
            partition p46 values (46), partition p47 values (47), partition p48 values (48),
            partition p49 values (49), partition p50 values (50), partition p51 values (51),
            partition p52 values (52), partition p53 values (default))
        subpartition (hash on dcid /* 100 */ 9 partition with location = (loc1)))
\p\g