Redesign of ILIAS Repository Tree for Performance

Lucerne, 11 December 2009

Frankenstrasse 9, Postfach 2969, CH-6002 Luzern
T +41 41 228 42 42, F +41 41 228 42 43
www.hslu.ch

Table of changes

Version  Date        Status  Changes and comments  Edited by
1.0      2010-02-12  Final                         W. Randelshofer
0.9      2009-12-11  Draft                         W. Randelshofer

Table of contents

1. Vision
2. State of this project
2.1. Current state
2.2. Completed steps
2.3. Next Steps
2.4. Funding
3. Requirements analysis
4. Proposed changes
4.1. Change of data structures
4.2. Change of storage engines
4.3. Change of code which accesses table "tree"
4.4. Use of transactions instead of table locks
5. Appendix A: Overview of tree data structures
5.1. Adjacency List data structure
5.2. Nested Sets data structure
5.3. Materialized Path data structure
5.4. Adjacency List + Nested Sets data structure
5.5. Adjacency List + Materialized Path data structure
6. Appendix B: Operations on the Adjacency List + Materialized Path Tree data structure
6.1. Example repository structure
6.2. Implementation of the repository structure
6.3. Example tree operations
6.4. Implementation of the tree operations
7. Appendix C: Operations on the Adjacency List + Nested Sets Tree data structure
7.1. Example repository structure
7.2. Implementation of the data structure
7.3. Example tree operations
7.4. Implementation of the tree operations
8. Bibliography

1. Vision

We want to improve the performance and responsiveness of ILIAS when many users are logged in simultaneously, and when the repository of ILIAS contains a large number of objects.

Specifically,

- we want a repository data structure in ILIAS which is responsive, even if the repository contains more than 500,000 objects.
- we want support for concurrent write operations in ILIAS, so that more than 200 users can work simultaneously without blocking each other.

2. State of this project

2.1. Current state

This project is currently in a pilot phase. A pilot implementation based on ILIAS 3.10 has been in use at the Lucerne University of Applied Sciences and Arts (HSLU) since May 2009.

2.2. Completed steps

In autumn 2008 the Lucerne University of Applied Sciences and Arts (HSLU) experienced performance issues with ILIAS after upgrading from version 3.7 to 3.10. A first analysis at that time suggested that the data structure of the ILIAS repository could be a cause of this problem, but changing it was considered too risky.

HSLU made a number of performance improvements in several SQL statements of ILIAS, in the hope that they would sufficiently improve the situation. (These changes were integrated into the official ILIAS code base in spring 2009.) Also, a number of database tables were migrated from the MySQL MyISAM storage engine to the InnoDB storage engine.

Since the achieved improvements were not satisfactory, a requirements analysis for changing the repository structure of ILIAS started in early 2009 at HSLU. The changes in the source code of ILIAS were made in the HSLU code branch for ILIAS 3.10 in early 2009.
All proposed changes have been implemented by HSLU and have been in use at HSLU since spring 2009. Performance tests were made in May and June 2009 at HSLU. The results were presented at the joint ILIASuisse and Baden-Württemberg Community meeting on July 2, 2009.

2.3. Next Steps

The proposed changes need to be reviewed for inclusion in the official ILIAS code base. After successful review, the changes can be incorporated into the ILIAS code base. If the ILIAS core team implements the changes, it needs funding by the ILIAS open source community.

At the current time (February 2010), a restructuring project is taking effect at HSLU, and it is not yet decided who will be in charge of ILIAS tasks. HSLU therefore cannot promise to fund any refactoring or integration into ILIAS 4.x. If it turns out that support is possible, HSLU will be glad to help. As the code is ready in the HSLU 3.10 branch, the inclusion could take place, for example, in spring 2010 for the ILIAS 4.1 release.

2.4. Funding

This project is funded by the Lucerne University of Applied Sciences and Arts (HSLU).

3. Requirements analysis

A performance analysis of ILIAS 3.10 made in winter 2008 suggested that the data structure of the ILIAS repository and the MySQL database engine could be a cause of the performance issues experienced by HSLU.

The following components were identified as performance critical:

- The repository of ILIAS 3.10 is represented by a data structure named Adjacency List + Nested Sets Tree. (See Appendix A for a discussion of advantages and disadvantages of this data structure.)
- ILIAS 3.10 uses the MyISAM storage engine to store its database tables. (See [2] for a discussion of MySQL storage engines.)

Based on the literature [1] and [2], the following hypotheses were made:

Hypothesis 1: Operations on the repository are not always responsive for the following reasons:

1. The Nested Sets data structure frequently needs reorganization. On average half of all rows need to be updated during reorganization. On large repositories, this may take several seconds.
2. Operations which act on a subtree of the repository tree perform slowly because no indices are defined over the Nested Sets data structure. On large repositories, this may take longer than a second.

Hypothesis 2: ILIAS does not support concurrent read and write operations for the following reasons:

1. The database tables use the MySQL MyISAM storage engine, which does not support concurrent write operations.
2. The repository tree is implemented using a redundant Adjacency List + Nested Sets data structure. The Nested Sets data structure does not support concurrent write operations.

4. Proposed changes

4.1. Change of data structures

Change the data structure used by table tree from Adjacency List + Nested Sets to Adjacency List + Materialized Path:

    ALTER TABLE `tree`
      DROP COLUMN `lft`,
      DROP COLUMN `rgt`,
      DROP COLUMN `depth`,
      ADD COLUMN `path` varchar(255) character set ascii NOT NULL,
      ADD KEY `path_index` (`path`(255));

This yields the following create table statement:

    CREATE TABLE `tree` (
      `tree` int(10) NOT NULL default '0',
      `child` int(10) unsigned NOT NULL default '0',
      `parent` int(10) unsigned default NULL,
      `path` varchar(255) character set ascii NOT NULL,
      KEY `child` (`child`),
      KEY `parent` (`parent`),
      KEY `jmp_tree` (`tree`),
      KEY `path_index` (`path`(255))
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Note the use of the US-ASCII character set for the path field, and the use of a prefix length of 255 characters for the path_index key.

The path field holds a path to a tree node. The path consists of child ids encoded as decimal digit characters, delimited by the character '.'. The '~' character is used to specify upper bounds in range queries. See section 5.3 for a discussion of the path field.
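The statements above add the path column but do not show how existing rows could receive their path values. The following is a minimal sketch of one way to derive them from the child/parent columns of the Adjacency List. It is written in Python for illustration only (the ILIAS update script is PHP), the function name materialize_paths is hypothetical, and it assumes, as in the example of section 6.2, that the root node has parent 0.

```python
def materialize_paths(parent_of):
    """Return {child: path} for an adjacency list {child: parent}.

    Assumption: a parent id of 0 marks the root of the tree.
    """
    paths = {}
    for node in parent_of:
        # Walk upwards until we reach the root or a node whose path is known.
        chain = []
        cur = node
        while cur != 0 and cur not in paths:
            chain.append(cur)
            cur = parent_of[cur]
        prefix = paths[cur] if cur != 0 else ""
        # Walk back down, extending the path by one id per level.
        for n in reversed(chain):
            prefix = f"{prefix}.{n}" if prefix else str(n)
            paths[n] = prefix
    return paths


# A branch of the example repository used in Appendix B.
parent_of = {12: 10, 10: 5, 5: 4, 4: 3, 3: 2, 2: 1, 1: 0, 16: 1}
paths = materialize_paths(parent_of)
print(paths[12])  # 1.2.3.4.5.10.12
print(paths[16])  # 1.16
```

A real migration would also have to check that every generated path fits into the 255 characters of the path column.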
This change affects the code in ilias3/setup/sql/dbupdate_02.php

PENDING: The above table definition is specific to MySQL. It is not clear whether we can use the same character set, prefix length key and encoding with Oracle.

4.2. Change of storage engines

Replace the MyISAM storage engine by the InnoDB storage engine in the database tables "tree" and "object_reference":

    ALTER TABLE `tree` ENGINE='InnoDB';
    ALTER TABLE `object_reference` ENGINE='InnoDB';

This change affects the code in ilias3/setup/sql/dbupdate_02.php

PENDING: The above table definitions are specific to MySQL. It is not clear whether we can use the same definitions with Oracle. What we need are tables which support transactions with row-level locking.

4.3. Change of code which accesses table "tree"

In ILIAS 3.10.x, all code which accesses tables with an Adjacency List + Nested Sets data structure is in class.ilTree.php. Section 5.4 lists all tables which are accessed by class.ilTree.php.

Since we propose to only change the data structure in table "tree", we propose creating two new files:

    ilias3/Services/tree/classes/class.ilALNSTree.php
        Contains the existing code for the Adjacency List + Nested Sets data structure.
    ilias3/Services/tree/classes/class.ilALMPTree.php
        Contains the new code for the Adjacency List + Materialized Path data structure.

The code in class.ilTree.php is changed into a Proxy which calls either ilALNSTree or ilALMPTree depending on the table being accessed.

Alternatively, if-statements in class.ilTree.php can be used to invoke different SQL statements depending on the table being accessed. This approach was used in the pilot at HSLU.

Chapter 6 gives details about the SQL statements for accessing the Adjacency List + Materialized Path data structure.

PENDING: Decide whether ilTree shall be turned into a Proxy which calls two new class files, or whether ilTree shall use if-statements.

4.4. Use of transactions instead of table locks

Rewrite the SQL statements which lock table tree while it is being updated, and use transactions instead.

This rewrite affects the code in ilias3/Services/tree/classes/class.ilTree.php (if if-statements are used) or ilias3/Services/tree/classes/class.ilALMPTree.php (if a new class is used).

PENDING: If an ILIAS installation uses the MyISAM engine instead of the InnoDB engine for table "tree" and for table "object_reference", the queries which change data in these tables must use explicit table locking instead of transactions. If table locking is not used, the database will quickly get corrupted due to concurrent write accesses.

5. Appendix A: Overview of tree data structures

This appendix provides an overview of tree data structures and their usage in ILIAS.

5.1. Adjacency List data structure

Description

The Adjacency List data structure represents a tree structure as a child to parent relationship. The following fields are used to identify a tree node and the relationship to its parent node:

    child   Uniquely identifies a tree node.
    parent  Holds the id of the parent node.

Usage

This data structure is used by the following tables in ILIAS:

    ctrl_calls      The call structure.
    ctrl_structure  The code control structure.

5.2. Nested Sets data structure

Description

The Nested Sets data structure uses enclosure (containment) to show parenthood. The following fields are used to identify the boundaries of an enclosure:

    lft  Holds the left (lower) boundary of the nested set enclosed by the node.
    rgt  Holds the right (upper) boundary of the nested set enclosed by the node.

Usage

This data structure is not used on its own in ILIAS. It is described here only for completeness.

5.3. Materialized Path data structure

Description

The Materialized Path data structure represents a tree structure by a path description from the root of the tree down to a specific node. The following fields are used to identify a tree node and its path:

    child  Uniquely identifies a tree node.
    path   Holds the path to a node.

Contents of the path field

Encoding

The path consists of child ids separated by a special character, the separator character. To allow for range queries, the separator character must be smaller than the characters used for encoding the child ids (or more precisely: must precede them in the collation sequence). We also need a character for defining the upper bound of a range query. This character must succeed the characters used for encoding the child ids in the collation sequence.

If the path uses the US-ASCII collation rule, we can use

- the digits 0-9 and the characters a-z for encoding the child ids
- the '.' (full stop) for separating the child ids
- the '~' (tilde) character for range queries

With 0-9 and a-z we can use decimal encoding, hex encoding or a base-36 encoding for the child ids. For simplicity and for robustness reasons we suggest using decimal encoding for now. In future versions of ILIAS, one of the other two encodings can be used.

Field length

The path is stored in a VARCHAR field. In MySQL, an ASCII VARCHAR field can have up to 65,535 characters. If ids with 11 decimal digits are chosen, then this limits the maximal depth of a tree to roughly 5,400 levels (each level takes 11 digits plus one separator character).

In MySQL the maximal length of VARCHAR fields depends on the character encoding used. If we chose UTF-8 instead of ASCII, we could only store up to 21,845 characters. The effective maximum length of a VARCHAR in MySQL is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

For access efficiency, the path needs to be indexed. In MySQL, a prefix index over the first 255 characters of the path can be used.
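The ordering requirement on the separator and the upper-bound character can be checked with a few lines of code. The sketch below uses Python string comparison, which orders ASCII strings by character code and therefore stands in for a US-ASCII collation; the helper name subtree_range is hypothetical.

```python
def subtree_range(path):
    """Return the (lower, upper) bounds of the range query for a subtree."""
    return path, path + ".~"


# The separator precedes all id characters, the tilde succeeds them.
assert ord(".") < ord("0") <= ord("9") < ord("a") <= ord("z") < ord("~")

paths = ["1.2", "1.2.3", "1.2.10", "1.20", "1.2.3.4", "1.3"]
lo, hi = subtree_range("1.2")
in_range = sorted(p for p in paths if lo <= p <= hi)
print(in_range)  # ['1.2', '1.2.10', '1.2.3', '1.2.3.4']
```

Note that "1.20" is excluded although it starts with the characters "1.2": its fourth character '0' sorts after the separator '.', so the string lies above the upper bound "1.2.~". This is the reason why the separator must precede the id characters in the collation sequence.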
PENDING: The above description is specific to MySQL. It is not clear whether we can use the same character set, prefix length key and encoding with Oracle.

Usage

This data structure is not used on its own in ILIAS. It is described here only for completeness.

5.4. Adjacency List + Nested Sets data structure

Description

The Adjacency List + Nested Sets data structure uses both these data structures to provide efficient access to tree items. In addition, a depth field is used to improve the performance of operations which retrieve the path to a tree node.

The following fields are used for this data structure:

    child   Uniquely identifies a tree node.
    parent  Holds the id of the parent node.
    depth   Holds the depth of the node in the tree.
    lft     Holds the left (lower) boundary of the nested set enclosed by the node.
    rgt     Holds the right (upper) boundary of the nested set enclosed by the node.

Usage

This data structure is used by the following tables in ILIAS:

    bookmark_tree   Bookmark tree on the personal desktop.
    cp_tree         SCORM 2004 course item tree.
    frm_posts_tree  Posts in the discussion thread of a forum.
    lm_tree         Chapters and pages tree in a learning module.
    mail_tree       Mail folder tree.
    mep_tree        MediaPool tree.
    search_tree     Search results tree.
    scorm_tree      SCORM 1.2 course item tree.
    tree            The repository tree.
    xml_tree        ?

All code which accesses these tables is in class.ilTree.php.

5.5. Adjacency List + Materialized Path data structure

Description

The Adjacency List + Materialized Path data structure uses both these data structures to provide efficient access to tree items.

The following fields are used for this data structure:

    child   Uniquely identifies a tree node.
    parent  Holds the id of the parent node.
    depth   Holds the depth of a node in the tree.
    path    Holds the path to a node.

Usage

We propose to use this data structure for table "tree" in ILIAS.

6. Appendix B: Operations on the Adjacency List + Materialized Path Tree data structure

This appendix provides details of the proposed Adjacency List + Materialized Path Tree data structure.

6.1. Example repository structure

The following example repository with a total of 100,000 objects is used as an example in this appendix and in appendix C:

    ILIAS Root
      School of Business Category
        Bachelor Category
          Autumn 2009 Category
            English A09 Course
              Role Folder
              Lecture Notes Folder
                Grammar 101.pdf File
              Qualification Notes Folder
              English A09.01 Group
                Role Folder
                File Exchange Folder
                  Role Folder
                  Joe's Workbook.doc File
                  Mary's Workbook.doc File
      unknown Category
        99,985 unknown objects

6.2. Implementation of the repository structure

The table below shows an implementation of the example tree structure using the proposed Adjacency List + Materialized Path data structure:

    tree  child  parent  depth  path                 Object
    1     1      0       1      1                    ILIAS Root
    1     2      1       2      1.2                  School of Business Category
    1     3      2       3      1.2.3                Bachelor Category
    1     4      3       4      1.2.3.4              Autumn 2009 Category
    1     5      4       5      1.2.3.4.5            English A09 Course
    1     6      5       6      1.2.3.4.5.6          Role Folder
    1     7      5       6      1.2.3.4.5.7          Lecture Notes Folder
    1     8      7       7      1.2.3.4.5.7.8        Grammar 101.pdf File
    1     9      5       6      1.2.3.4.5.9          Qualification Notes Folder
    1     10     5       6      1.2.3.4.5.10         English A09.01 Group
    1     11     10      7      1.2.3.4.5.10.11      Role Folder
    1     12     10      7      1.2.3.4.5.10.12      File Exchange Folder
    1     13     12      8      1.2.3.4.5.10.12.13   Role Folder
    1     14     12      8      1.2.3.4.5.10.12.14   Joe's Workbook.doc File
    1     15     12      8      1.2.3.4.5.10.12.15   Mary's Workbook.doc File
    1     16     1       2      1.16                 Unknown Category
    …     …      …       …      …                    99,985 Objects

6.3. Example tree operations

The following tree operations are used as examples in this appendix and in appendix C:

    getRoot():int
        Gets the reference id of the object located at the root of the specified repository tree.
    getChildren($node:int):int[]
        Takes the reference id of a node, and returns the reference ids of all its children.
    getParent($node:int):int
        Takes the reference id of a node, and returns the reference id of its parent node.
    getPath($node:int):int[]
        Takes the reference id of a node, and returns the reference ids of its parents, ordered by depth.
    getSubtree($node:int):int[]
        Takes the reference id of a node, and returns the reference ids of all nodes in the subtree, including the node itself.
    getDepth($node:int):int
        Takes the reference id of a node, and returns the depth of the node.
    isDescendantOf($node1:int, $node2:int):boolean
        Returns true, if node1 is contained in the subtree of node2.
    insertInto($node1:int, $node2:int):void
        Inserts node1 as a child of node2.
    delete($node:int):void
        Deletes the subtree starting at the specified node from the tree.
    moveTo($node1:int, $node2:int):void
        Removes the subtree starting at node1 from its parent and adds it as a child to node2.

6.4. Implementation of the tree operations

The following paragraphs describe the implementation of tree operations using the proposed Adjacency List + Materialized Path data structure. For each operation an analysis is given. The analysis shows that all operations on an Adjacency List + Materialized Path data structure perform equally well or better than those for an Adjacency List + Nested Sets data structure.

6.4.1. getRoot():int

Algorithm

    SELECT child FROM tree WHERE parent = 0 AND tree = 1;

In general no database access is needed, because the root node has the well known id 1. In case the id is not well known, the above statement can be used. This statement uses the Adjacency List data structure.

Example

    mysql> SELECT child FROM tree WHERE parent=0 AND tree = 1;
    +-------+
    | child |
    +-------+
    |     1 |
    +-------+
    1 row in set (0.00 sec)

The result is 1.
Analysis

    mysql> EXPLAIN SELECT child FROM tree WHERE parent=0 AND tree = 1;
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    | id | select_type | table | type | possible_keys   | key    | key_len | ref   | rows | Extra       |
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    |  1 | SIMPLE      | tree  | ref  | parent,jmp_tree | parent | 5       | const |    1 | Using where |
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    1 row in set (0.00 sec)

The explain statement shows that the database can use the key "parent" to select the row, and that it only has to visit a single table row to find the desired result.

6.4.2. getChildren($node:int):int[]

Algorithm

    SELECT child FROM tree WHERE parent = $node AND tree = 1;

The Adjacency List data structure is used to retrieve the children of a node. The clause "AND tree=1" is needed because we only want children which are in the same tree as the node, assuming that the node is in tree 1.
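The statement above can be tried out with the following sketch. It uses Python with an in-memory SQLite database as a stand-in for MySQL, so it only illustrates the query and not the pilot code. The function name get_children, the index name parent_idx and the extra node 99 in tree 2 are hypothetical; the ORDER BY clause is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (tree INTEGER, child INTEGER, parent INTEGER)")
conn.execute("CREATE INDEX parent_idx ON tree (parent)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?)",
    [
        # (tree, child, parent) for a part of the example repository
        (1, 5, 4), (1, 6, 5), (1, 7, 5), (1, 8, 7),
        (1, 9, 5), (1, 10, 5), (1, 11, 10),
        # Hypothetical node of another tree which shares the parent id 5.
        (2, 99, 5),
    ],
)


def get_children(conn, node, tree=1):
    """Return the ids of the direct children of node within the given tree."""
    rows = conn.execute(
        "SELECT child FROM tree WHERE parent = ? AND tree = ? ORDER BY child",
        (node, tree),
    )
    return [row[0] for row in rows]


print(get_children(conn, 5))  # [6, 7, 9, 10]
```

Without the restriction on the tree column, node 99 would be returned as well.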
Alternative algorithm

If the tree of the node is not known, the following statement can be used:

    SELECT child FROM tree
    WHERE parent=$node AND tree = (SELECT tree FROM tree WHERE child=$node);

Example

Getting the children of 5 "English A09 Course":

    mysql> SELECT child FROM tree WHERE parent = 5 AND tree = 1;
    +-------+
    | child |
    +-------+
    |     6 |
    |     7 |
    |     9 |
    |    10 |
    +-------+
    4 rows in set (0.00 sec)

The result is {6, 7, 9, 10} => {"Role Folder", "Lecture Notes Folder", "Qualification Notes Folder", "English A09.01 Group"}.

Analysis

    mysql> EXPLAIN SELECT child FROM tree WHERE parent = 5 AND tree = 1;
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    | id | select_type | table | type | possible_keys   | key    | key_len | ref   | rows | Extra       |
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    |  1 | SIMPLE      | tree  | ref  | parent,jmp_tree | parent | 5       | const |   10 | Using where |
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    1 row in set (0.00 sec)

The explain statement shows that the database can use the key "parent" to select the rows, and that it only has to visit rows which are part of the result set.
Analysis of the alternative algorithm

    mysql> EXPLAIN SELECT child FROM tree WHERE parent=17 AND tree = (SELECT tree FROM tree WHERE child=17);
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    | id | select_type | table | type | possible_keys   | key    | key_len | ref   | rows | Extra       |
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    |  1 | PRIMARY     | tree  | ref  | parent,jmp_tree | parent | 5       | const |   18 | Using where |
    |  2 | SUBQUERY    | tree  | ref  | child           | child  | 4       |       |    1 |             |
    +----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
    2 rows in set (0.00 sec)

The explain statement shows that the database can use the key "parent" to select the rows, and that it only has to visit rows which are part of the result set.

6.4.3. getParent($node:int):int

Algorithm

    SELECT parent FROM tree WHERE child=$node;

The Adjacency List data structure is used to retrieve the parent of a node.

Example

Getting the parent of "English A09 Course":

    mysql> SELECT parent FROM tree WHERE child=5;
    +--------+
    | parent |
    +--------+
    |      4 |
    +--------+
    1 row in set (0.00 sec)

The result is {4} => {"Autumn 2009 Category"}.

Analysis

    mysql> EXPLAIN SELECT parent FROM tree WHERE child=5;
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    | id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    |  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    1 row in set (0.00 sec)

The explain statement shows that the database can use the key "child" to select the row, and that it only has to visit 1 row.

6.4.4. getPath($node:int):int[]

Algorithm

    SELECT path FROM tree WHERE child=$node;
    $nodes = explode('.', $path);

The Materialized Path data structure is used to retrieve the path to a node. PHP is then used to split up the path into an array.

Example

Getting the path to "English A09.01 Group":

    mysql> SELECT path FROM tree WHERE child=10;
    +--------------+
    | path         |
    +--------------+
    | 1.2.3.4.5.10 |
    +--------------+
    1 row in set (0.00 sec)

The result is {1, 2, 3, 4, 5, 10} => {"ILIAS Root", "School of Business Category", "Bachelor Category", "Autumn 2009 Category", "English A09 Course", "English A09.01 Group"}.

Analysis

    mysql> EXPLAIN SELECT path FROM tree WHERE child=10;
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    | id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    |  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    1 row in set (0.00 sec)

The explain statement shows that the database can use the key "child" to select the row, and that it only has to visit 1 row.

6.4.5. getSubtree($node:int):int[]

Algorithm

    SELECT tree, path AS from_range FROM tree WHERE child=$node;
    $to_range = $from_range.'.~';
    SELECT child FROM tree
    WHERE path BETWEEN $from_range AND $to_range AND tree=$tree;

Subtrees are retrieved using the Materialized Path data structure. First we select the tree and the path as from_range. Using PHP we construct to_range by appending '.~' to from_range. Then we retrieve the subtree.

Alternative algorithm

    SELECT t2.child
    FROM tree AS t1
    JOIN tree AS t2
      ON t2.path BETWEEN t1.path AND CONCAT(t1.path, '.~') AND t2.tree=t1.tree
    WHERE t1.child=$node;

Unfortunately, MySQL does not perform an efficient query with this statement.
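The two-step algorithm can be sketched as follows. The sketch uses Python with an in-memory SQLite database as a stand-in for MySQL; SQLite compares text byte by byte by default, which matches the US-ASCII collation assumed for the path field. The function name get_subtree and the index name path_idx are hypothetical, and the ORDER BY clause is added only for a stable output.

```python
import sqlite3

# (tree, child, path) for the example repository of section 6.2
ROWS = [
    (1, 1, "1"), (1, 2, "1.2"), (1, 3, "1.2.3"), (1, 4, "1.2.3.4"),
    (1, 5, "1.2.3.4.5"), (1, 6, "1.2.3.4.5.6"), (1, 7, "1.2.3.4.5.7"),
    (1, 8, "1.2.3.4.5.7.8"), (1, 9, "1.2.3.4.5.9"), (1, 10, "1.2.3.4.5.10"),
    (1, 11, "1.2.3.4.5.10.11"), (1, 12, "1.2.3.4.5.10.12"),
    (1, 13, "1.2.3.4.5.10.12.13"), (1, 14, "1.2.3.4.5.10.12.14"),
    (1, 15, "1.2.3.4.5.10.12.15"), (1, 16, "1.16"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (tree INTEGER, child INTEGER, path TEXT)")
conn.execute("CREATE INDEX path_idx ON tree (path)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", ROWS)


def get_subtree(conn, node):
    """Return the ids of node and all its descendants."""
    row = conn.execute(
        "SELECT tree, path FROM tree WHERE child = ?", (node,)
    ).fetchone()
    if row is None:
        return []
    tree, from_range = row
    to_range = from_range + ".~"  # upper bound of the range query
    rows = conn.execute(
        "SELECT child FROM tree WHERE path BETWEEN ? AND ? AND tree = ? "
        "ORDER BY child",
        (from_range, to_range, tree),
    )
    return [r[0] for r in rows]


print(get_subtree(conn, 12))  # [12, 13, 14, 15]
```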
Example

Getting the subtree of 12 "File Exchange Folder":

    mysql> SELECT tree, path AS from_range FROM tree WHERE child=12;
    +------+-----------------+
    | tree | from_range      |
    +------+-----------------+
    |    1 | 1.2.3.4.5.10.12 |
    +------+-----------------+
    1 row in set (0.00 sec)

    mysql> SELECT child FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1;
    +-------+
    | child |
    +-------+
    |    12 |
    |    13 |
    |    14 |
    |    15 |
    +-------+
    4 rows in set (0.00 sec)

The result is {12, 13, 14, 15} => {"File Exchange Folder", "Role Folder", "Joe's Workbook.doc File", "Mary's Workbook.doc File"}.

Analysis

    mysql> EXPLAIN SELECT tree, path AS from_range FROM tree WHERE child=12;
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    | id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    |  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    1 row in set (0.00 sec)

The explain statement of the first query shows that the database can use the key "child" to select the row, and that it only has to visit 1 row.
    mysql> EXPLAIN SELECT child FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1;
    +----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
    | id | select_type | table | type  | possible_keys       | key        | key_len | ref  | rows | Extra       |
    +----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
    |  1 | SIMPLE      | tree  | range | jmp_tree,path_index | path_index | 257     | NULL |    4 | Using where |
    +----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
    1 row in set (0.00 sec)

The explain statement of the second query shows that the database can use the key "path_index" to select the rows, and that it only has to visit as many rows as are in the subtree.

Analysis of the alternative algorithm

    mysql> EXPLAIN SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t2.path BETWEEN t1.path AND CONCAT(t1.path, '.~') AND t2.tree=t1.tree WHERE t1.child=12;
    +----+-------------+-------+------+---------------------------+----------+---------+--------------------+-------+-------------+
    | id | select_type | table | type | possible_keys             | key      | key_len | ref                | rows  | Extra       |
    +----+-------------+-------+------+---------------------------+----------+---------+--------------------+-------+-------------+
    |  1 | SIMPLE      | t1    | ref  | child,jmp_tree,path_index | child    | 4       | const              |     1 |             |
    |  1 | SIMPLE      | t2    | ref  | jmp_tree,path_index       | jmp_tree | 4       | ilias_hslu.t1.tree | 34561 | Using where |
    +----+-------------+-------+------+---------------------------+----------+---------+--------------------+-------+-------------+
    2 rows in set (0.00 sec)

The analysis of the alternative query shows that MySQL only uses the index jmp_tree for table t2, which is not efficient: it has to visit 34561 rows instead of the 4 rows of the subtree.

6.4.6. getDepth($node:int):int

Algorithm

    SELECT depth FROM tree WHERE child = $node;

Getting the depth of a node is straightforward using the depth field.
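Since the path holds one id per level, the value of the depth field can also be derived from the path. The following sketch (Python, for illustration only; the function name depth_from_path is hypothetical) checks this relationship against some rows of the example in section 6.2. Such a check could serve as a consistency test for the redundant depth field.

```python
def depth_from_path(path):
    """Return the depth of a node: one level per id in its path."""
    return path.count(".") + 1


# (child, depth, path) taken from the example table in section 6.2
EXAMPLE = [
    (1, 1, "1"),
    (2, 2, "1.2"),
    (10, 6, "1.2.3.4.5.10"),
    (13, 8, "1.2.3.4.5.10.12.13"),
    (16, 2, "1.16"),
]

for child, depth, path in EXAMPLE:
    assert depth_from_path(path) == depth

print(depth_from_path("1.2.3.4.5.10"))  # 6
```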
Example

    mysql> SELECT depth FROM tree WHERE child=10;
    +-------+
    | depth |
    +-------+
    |     6 |
    +-------+
    1 row in set (0.00 sec)

Analysis

    mysql> EXPLAIN SELECT depth FROM tree WHERE child=10;
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    | id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    |  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
    1 row in set (0.00 sec)

The explain statement shows that the database can use the key "child" to select the row, and that it only has to visit 1 row.

6.4.7. isDescendantOf($node1:int, $node2:int):boolean

Algorithm

    SELECT path FROM tree WHERE child=$node1;
    return in_array($node2, explode('.', $path));

The Materialized Path data structure is used to determine whether one node is a descendant of another node. PHP is then used to search for node2 in the path.

Example

Determine whether 10 "English A09.01 Group" is a descendant of 5 "English A09 Course":

    mysql> SELECT path FROM tree WHERE child=10;
    +--------------+
    | path         |
    +--------------+
    | 1.2.3.4.5.10 |
    +--------------+
    1 row in set (0.00 sec)

The result is true, because the path contains the id 5, which is the id of "English A09 Course".
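The membership test on the split path can be sketched as follows (Python, for illustration only; the PATHS dictionary stands in for the database lookup of the path of node1, and the function name is_descendant_of is hypothetical). The path has to be split into ids before testing, because a plain substring search would give wrong answers.

```python
# Paths of some nodes of the example repository (section 6.2),
# standing in for "SELECT path FROM tree WHERE child=$node1".
PATHS = {
    5: "1.2.3.4.5",
    10: "1.2.3.4.5.10",
    16: "1.16",
}


def is_descendant_of(node1, node2):
    """Return True if node2 occurs in the path of node1."""
    return str(node2) in PATHS[node1].split(".")


print(is_descendant_of(10, 5))   # True
print(is_descendant_of(5, 10))   # False
print(is_descendant_of(16, 6))   # False, although "1.16" contains the digit 6
```

As with the in_array() call in the algorithm above, the test also returns true if node1 and node2 are the same node, because the path of a node ends with its own id.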
Analysis mysql> EXPLAIN SELECT path FROM tree WHERE child=5; +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | 1 | SIMPLE | tree | ref | child | child | 4 | const | 1 | | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ 1 row in set (0.00 sec) The explain statement shows that the database can use the key "child" to select the row, and that it only has to visit 1 row. Lucerne, 11 December 2009 Page 24/38 6.4.8. insertInto(node1:int, node2:int):void Algorithm SELECT tree, depth, path FROM tree WHERE child=$node2; if ($depth == $max_depth) { // max depth reached, can not insert } INSERT INTO tree (tree, child, parent, depth, path) VALUES ($tree, $node1, $node2, $depth+1, CONCAT(CONCAT($path, '.'), $node1)) First we check whether we can insert a new node without exceeding the maximal path length. Then we insert the new node. Example Insert a new node with id 17 inside 12 "File Exchange Folder". 
mysql> SELECT tree, depth, path FROM tree WHERE child=12; +------+-------+-----------------+ | tree | depth | path | +------+-------+-----------------+ | 1 | 7 | 1.2.3.4.5.10.12 | +------+-------+-----------------+ 1 row in set (0.00 sec) mysql> INSERT INTO tree (tree, child, parent, depth, path) VALUES (1, 17, 12, 8, '1.2.3.4.5.10.12.17'); Analysis mysql> EXPLAIN SELECT tree, depth, path FROM tree WHERE child=12; +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | 1 | SIMPLE | tree | ref | child | child | 4 | const | 1 | | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ 1 row in set (0.00 sec) The data needed from node2 can be efficiently retrieved using key "child". Lucerne, 11 December 2009 Page 25/38 6.4.9. delete($node:int):void Algorithm START TRANSACTION; SELECT tree, path AS from_range FROM tree WHERE child=$node FOR UPDATE; $to_range = $from_range.'.~'; DELETE FROM tree WHERE path BETWEEN $from_range AND $to_range AND tree=$tree; COMMIT; Subtrees are deleted using the Materialized Path data structure. First we select the tree and the path as from_range. Using PHP we construct to_range by appending '.~' to from_range. Then we delete the subtree. 
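The range construction can be tried out on a small data set. The following sketch uses Python's sqlite3 module with an in-memory SQLite database instead of MySQL, so it has no FOR UPDATE clause and says nothing about MySQL index usage; it only illustrates which rows fall between from_range and to_range. The function name delete_subtree and the extra row with id 120 are assumptions added for illustration.

```python
import sqlite3

# (tree, child, parent, depth, path) for part of the example repository
rows = [
    (1, 10, 5, 6, "1.2.3.4.5.10"),
    (1, 11, 10, 7, "1.2.3.4.5.10.11"),
    (1, 12, 10, 7, "1.2.3.4.5.10.12"),
    (1, 13, 12, 8, "1.2.3.4.5.10.12.13"),
    (1, 14, 12, 8, "1.2.3.4.5.10.12.14"),
    (1, 15, 12, 8, "1.2.3.4.5.10.12.15"),
    (1, 120, 10, 7, "1.2.3.4.5.10.120"),  # hypothetical sibling whose id starts with "12"
]


def delete_subtree(conn, node):
    """Delete node and all of its descendants; return the number of deleted rows."""
    with conn:  # commits on success, rolls back on an exception
        tree, from_range = conn.execute(
            "SELECT tree, path FROM tree WHERE child = ?", (node,)
        ).fetchone()
        to_range = from_range + ".~"  # '~' sorts after '.' and after all digits
        cur = conn.execute(
            "DELETE FROM tree WHERE path BETWEEN ? AND ? AND tree = ?",
            (from_range, to_range, tree),
        )
        return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (tree INTEGER, child INTEGER PRIMARY KEY, "
    "parent INTEGER, depth INTEGER, path TEXT)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

deleted = delete_subtree(conn, 12)
remaining = [r[0] for r in conn.execute("SELECT child FROM tree ORDER BY child")]
print(deleted, remaining)  # 4 [10, 11, 120]
```

The row with path '1.2.3.4.5.10.120' survives because the character '0' sorts after '.', which places that path above the upper bound '1.2.3.4.5.10.12.~'.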
Example Deleting the subtree of 12 "File Exchange Folder": mysql> START TRANSACTION; mysql> SELECT tree, path AS from_range FROM tree WHERE child=12 FOR UPDATE; +------+-----------------+ | tree | from_range | +------+-----------------+ | 1 | 1.2.3.4.5.10.12 | +------+-----------------+ 1 row in set (0.00 sec) mysql> DELETE FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1; Query OK, 4 rows affected (0.00 sec) mysql> COMMIT; Analysis mysql> EXPLAIN SELECT tree, path AS from_range FROM tree WHERE child=12; +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | 1 | SIMPLE | tree | ref | child | child | 4 | const | 1 | | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ 1 row in set (0.00 sec) The explain statement of the first query shows that the database can use the key "child" to select the row, and that it only has to visit 1 row. mysql> EXPLAIN SELECT child FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1; +----+-------------+-------+-------+---------------------+------------+---------+------+------+------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------------+------------+---------+------+------+------------+ | 1 | SIMPLE | tree | range | jmp_tree,path_index | path_index | 257 | NULL | 4 | Using where | +----+-------------+-------+-------+---------------------+------------+---------+------+------+------------+ 1 row in set (0.00 sec) The explain statement of the second query shows that the database can use the key "path_index" to select the rows and that it has only to visit as many rows as are in the subtree. 
6.4.10. moveTo($node1:int, $node2:int):void

Algorithm
START TRANSACTION;
SELECT tree, child, parent, depth, path FROM tree
  WHERE child IN ($node1, $node2) LOCK IN SHARE MODE;
…assign result set to $node1 and $node2…
if ($node2.depth > $node1.depth) {
  // Check whether we are within maximal path depth
  $to_path = $node1.path.'.~';
  SELECT MAX(depth) AS subtree_depth FROM tree
    WHERE path BETWEEN $node1.path AND $to_path AND tree = $node1.tree FOR UPDATE;
  if ($subtree_depth - $node1.depth + $node2.depth + 1 > $max_depth) {
    // max depth exceeded - cannot move
  }
}
// MID() counts positions from 1, strrpos() counts from 0
$split_pos = strrpos($node1.path, '.') + 1;
$to_path = $node1.path.'.~';
UPDATE tree SET
  parent = CASE WHEN parent = $node1.parent THEN $node2 ELSE parent END,
  path = CONCAT($node2.path, MID(path, $split_pos)),
  depth = depth + $node2.depth - $node1.depth + 1
  WHERE path BETWEEN $node1.path AND $to_path AND tree = $node1.tree;
COMMIT;

Subtrees are moved using the Materialized Path data structure. First we select tree, child, parent, depth and path of node1 and node2. If node2 has a greater depth than node1, we determine the greatest depth in the subtree of node1 and check whether the move stays within the maximal depth. Then we move the nodes: the part of each path up to the last '.' in the path of node1 is replaced by the path of node2, and the depth of each moved node is shifted by the same amount.
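The effect of the UPDATE statement on parent, path and depth can be sketched in memory. This Python sketch is an illustration only: the function name move_subtree and the dictionary layout are assumptions, the depth check and locking are left out, and a prefix test stands in for the BETWEEN range of the SQL statement. It also assumes that node1 is not the root node, so that its path contains at least one '.'.

```python
def move_subtree(rows, node1, node2):
    """Return copies of rows with the subtree of node1 moved below node2.

    rows is a list of dicts with the keys child, parent, depth and path.
    """
    by_id = {r["child"]: r for r in rows}
    src, dst = by_id[node1], by_id[node2]
    split_pos = src["path"].rfind(".")            # 0-based index of the last '.'
    depth_shift = dst["depth"] - src["depth"] + 1
    prefix = src["path"] + "."
    result = []
    for r in rows:
        in_subtree = r["path"] == src["path"] or r["path"].startswith(prefix)
        if not in_subtree:
            result.append(dict(r))
            continue
        result.append({
            "child": r["child"],
            # Only node1 itself has node1's old parent; it is re-parented to node2.
            "parent": node2 if r["parent"] == src["parent"] else r["parent"],
            "depth": r["depth"] + depth_shift,
            "path": dst["path"] + r["path"][split_pos:],
        })
    return result


rows = [
    {"child": 7, "parent": 5, "depth": 6, "path": "1.2.3.4.5.7"},
    {"child": 8, "parent": 7, "depth": 7, "path": "1.2.3.4.5.7.8"},
    {"child": 12, "parent": 10, "depth": 7, "path": "1.2.3.4.5.10.12"},
]
for r in move_subtree(rows, 7, 12):
    print(r["child"], r["parent"], r["depth"], r["path"])
```

For the rows above, node 7 ends up with parent 12, depth 8 and path '1.2.3.4.5.10.12.7', and node 8 keeps parent 7 but gets depth 9 and path '1.2.3.4.5.10.12.7.8'.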
Example Moving the subtree of 7 "Lecture Notes Folder" to 12 "File Exchange Folder": mysql> SELECT tree, child, parent, depth, path FROM tree WHERE child IN (7, 12) LOCK IN SHARE MODE; +-------+-------+--------+-------+-----------------+ | tree | child | parent | depth | path | +-------+-------+--------+-------+-----------------+ | 1 | 7 | 5 | 6 | 1.2.3.4.5.7 | | 1 | 12 | 10 | 7 | 1.2.3.4.5.10.12 | +-------+-------+--------+-------+-----------------+ 2 rows in set (0.00 sec) mysql> SELECT MAX(depth) AS max_depth FROM tree FORCE KEY path_index WHERE path BETWEEN '1.2.3.4.5.7' AND '1.2.3.4.5.7.~' AND tree = 1 FOR UPDATE ; +-----------+ | max_depth | +-----------+ | 8 | +-----------+ 1 row in set (0.00 sec) UPDATE tree SET parent = CASE WHEN parent = 5 THEN 12 ELSE parent END, path = CONCAT('1.2.3.4.5.10.12',MID(path, 10)), Lucerne, 11 December 2009 Page 27/38 depth = depth + 7 - 6 + 1 WHERE path BETWEEN '1.2.3.4.5.7' AND '1.2.3.4.5.7.~' AND tree = 1; Query OK, 2 rows affected (0.00 sec) Rows matched: 2 Changed: 2 Warnings: 0 Analysis mysql> EXPLAIN SELECT tree, child, parent, depth, path FROM tree WHERE child IN (7, 12) LOCK IN SHARE MODE; +----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+ | 1 | SIMPLE | tree | range | child | child | 4 | NULL | 2 | Using where | +----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+ 1 row in set (0.00 sec) The explain statement of the first query shows that the database can use the key "child" to select the row, and that it only has to visit the 2 rows that we need. 
mysql> EXPLAIN SELECT MAX(depth) FROM tree WHERE path BETWEEN '1.17.118519' AND '1.17.118519.~';
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | tree  | range | path_index    | path_index | 257     | NULL |   69 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)

The explain statement of the second query shows that the database can use the key "path_index" to select the rows and that it only has to visit as many rows as are in the subtree.

7. Appendix C: Operations on the Adjacency List + Nested Sets Tree data structure

This appendix provides details of the existing Adjacency List + Nested Sets Tree data structure.

7.1. Example repository structure

See Appendix B.

7.2. Implementation of the data structure

The table below shows an implementation of the example repository using the Adjacency List + Nested Sets data structure as currently implemented in ILIAS 3.10:

tree  child  parent  depth  lft  rgt
   1      1       0      1    1  100106  ILIAS Root
   1      2       1      2    2     703  School of Business Category
   1      3       2      3    3     604  Bachelor Category
   1      4       3      4    4     505  Autumn 2009 Category
   1      5       4      5    5     406  English A09 Course
   1      6       5      6    6       7  Role Folder
   1      7       5      6    8     109  Lecture Notes Folder
   1      8       6      7   10      11  Grammar 101.pdf File
   1      9       5      6  110     111  Qualification Notes Folder
   1     10       5      6  112     313  English A09.01 Group
   1     11      10      7  113     114  Role Folder
   1     12      10      7  115     122  File Exchange Folder
   1     13      12      8  116     117  Role Folder
   1     14      12      8  118     119  Joe's Workbook.doc
   1     15      12      8  120     121  Mary's Workbook.doc File
   1     16       1      2  704  100005  Unknown Category
   …      …       …      …    …       …  99,985 Objects

7.3. Example tree operations

See Appendix B.

7.4. Implementation of the tree operations

The following paragraphs describe the implementation of tree operations in ILIAS 3.10 using the Adjacency List + Nested Sets data structure. For each operation an example and an analysis are given.

7.4.1. getRoot():int

The code is identical to getRoot() in section 6.4.1.

7.4.2. getChildren($node:int):int[]

The code is identical to getChildren() in section 6.4.2.

7.4.3. getParent($node:int):int

The code is identical to getParent() in section 6.4.3.

7.4.4. getPath($node:int):int[]

Algorithm
SELECT tree, parent AS node_parent, depth AS node_depth FROM tree WHERE child = $node;
if ($node_depth == 1) {
  return {$node};
} elseif ($node_depth == 2) {
  return {$node_parent, $node};
} elseif ($node_depth == 3) {
  return {1, $node_parent, $node};
} elseif ($node_depth <= 63) {
  SELECT d2.parent AS d2_node, d3.parent AS d3_node, …, dn.parent AS dn_node
    FROM tree AS dn
    JOIN tree AS dn-1 ON dn-1.child = dn.parent
    …
    JOIN tree AS d3 ON d3.child = d4.parent
    JOIN tree AS d2 ON d2.child = d3.parent
    WHERE dn.child = $node_parent
      AND dn.tree = $tree AND … AND d3.tree = $tree AND d2.tree = $tree;
  return {1, $d2_node, $d3_node, …, $dn_node, $node_parent, $node};
} else {
  SELECT t2.child FROM tree AS t1
    JOIN tree AS t2 ON t1.lft BETWEEN t2.lft AND t2.rgt
    WHERE t1.child = $node AND t1.tree = $tree AND t2.tree = $tree
    ORDER BY t2.depth;
  return result set;
}

First, the depth of the node in the tree is determined. The id of the parent node is retrieved in the same statement. This saves an additional SQL statement if the node is located at a depth of 3 or less.
If the depth is 1, ILIAS returns the path {node}.
If the depth is 2, ILIAS returns the path {parent, node}.
If the depth is 3, and the root node has the well-known id 1, ILIAS returns the path {1, parent, node}.
If the depth is less than or equal to 63, a self-join over the Adjacency List data structure is used. The self-join becomes more complex the deeper the node is located in the tree.
For example for depth 6, ILIAS uses the following statement (assuming that the node at depth 1 is well known): ILIAS only needs to retrieve 3 path elements here, since it is known that 1 is the reference id of the root node, and since it can reuse the value of node_parent that it retrieved from the first SELECT statement of this algorithm. If the depth is greater than 63, nested joins can not be used because MySQL 4.1 limits the number of joins to 61 tables. See http://dev.mysql.com/doc/refman/4.1/en/joins-limits.html ILIAS reverts to the Nested Sets Tree data structure, when the path exceeds 63 levels. Lucerne, 11 December 2009 Page 31/38 Example 1 This example uses the Adjacency List data structure. Getting the path to 10 "English A09.01 Group": mysql> SELECT parent AS node_parent, depth AS node_depth FROM tree WHERE child=10; +-------------+------------+ | node_parent | node_depth | +-------------+------------+ | 5 | 6 | +-------------+------------+ 1 row in set (0.00 sec) mysql> SELECT d2.parent AS d2_node, d3.parent AS d3_node, d4.parent AS d4_node FROM tree AS d4 JOIN tree AS d3 ON d3.child = d4.parent JOIN tree AS d2 ON d2.child = d3.parent WHERE d4.child = 5 AND d4.tree = 1 AND d3.tree = 1 AND d2.tree = 1; +---------+---------+---------+ | d2_node | d3_node | d4_node | +---------+---------+---------+ | 2 | 3 | 4 | +---------+---------+---------+ 1 row in set (0.00 sec) The result is {1, 2, 3, 4, 5, 10} -> {"ILIAS Root", "School of Business Category", "Bachelor Category", "Autumn 2009 Category", "English A09 Course", "English A09.01 Group"}. 
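How the self-join grows with the depth of the node can be sketched with a small generator. This Python function is an illustration, not ILIAS code: the name build_path_query is an assumption, and the statement uses '?' placeholders where the example uses literal values. The aliases follow the example above, where dN.parent yields the ancestor at depth N and the deepest alias is the row of the node's parent.

```python
def build_path_query(node_depth: int) -> str:
    """Build the ancestor self-join for a node located at node_depth (4 or deeper)."""
    if node_depth < 4:
        raise ValueError("depths 1 to 3 are answered without a self-join")
    top = node_depth - 2  # alias dN of the row that holds the node's parent
    columns = ", ".join(f"d{i}.parent AS d{i}_node" for i in range(2, top + 1))
    parts = [f"SELECT {columns}", f"FROM tree AS d{top}"]
    # One additional join per level between the parent and depth 2.
    for i in range(top - 1, 1, -1):
        parts.append(f"JOIN tree AS d{i} ON d{i}.child = d{i + 1}.parent")
    filters = [f"d{top}.child = ?"]
    filters += [f"d{i}.tree = ?" for i in range(top, 1, -1)]
    parts.append("WHERE " + " AND ".join(filters))
    return " ".join(parts)


print(build_path_query(6))
print(build_path_query(63).count("JOIN"))  # 59
```

For depth 6 the function produces the three-table statement of Example 1. At depth 63 it needs 59 joins over 60 table references, which is why the join count becomes a concern for deep trees.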
Analysis 1
mysql> EXPLAIN SELECT parent AS node_parent, depth AS node_depth FROM tree WHERE child=10;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT d2.parent AS d2_node, d3.parent AS d3_node, d4.parent AS d4_node
    FROM tree AS d4
    JOIN tree AS d3 ON d3.child = d4.parent
    JOIN tree AS d2 ON d2.child = d3.parent
    WHERE d4.child = 5 AND d4.tree = 1 AND d3.tree = 1 AND d2.tree = 1;
+----+-------------+-------+------+-----------------------+-------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys         | key   | key_len | ref                  | rows | Extra       |
+----+-------------+-------+------+-----------------------+-------+---------+----------------------+------+-------------+
|  1 | SIMPLE      | d4    | ref  | child,parent,jmp_tree | child | 4       | const                |    1 | Using where |
|  1 | SIMPLE      | d3    | ref  | child,parent,jmp_tree | child | 4       | ilias_hslu.d4.parent |    1 | Using where |
|  1 | SIMPLE      | d2    | ref  | child,jmp_tree        | child | 4       | ilias_hslu.d3.parent |    1 | Using where |
+----+-------------+-------+------+-----------------------+-------+---------+----------------------+------+-------------+
3 rows in set (0.00 sec)

Each joined table is accessed through the key "child" and only 1 row has to be visited per table. The main limitation of getPath() using the Adjacency List data structure is that we need one self-join for every level in the hierarchy, so performance degrades with each additional level as the join grows in complexity.

Example 2
This example uses the Nested Sets data structure.
Getting the path to 10 "English A09.01 Group":
mysql> SELECT t2.child FROM tree AS t1
    JOIN tree AS t2 ON t1.lft BETWEEN t2.lft AND t2.rgt
    WHERE t1.child=10 AND t1.tree=1 AND t2.tree=1
    ORDER BY t2.depth;
+-------+
| child |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
|    10 |
+-------+
6 rows in set (0.00 sec)

The result is {1, 2, 3, 4, 5, 10} -> {"ILIAS Root", "School of Business Category", "Bachelor Category", "Autumn 2009 Category", "English A09 Course", "English A09.01 Group"}.

Analysis 2
The problem with this approach is that all rows of the tree table need to be inspected, leading to poor performance on repositories which have more than a few thousand objects.

mysql> EXPLAIN SELECT t2.child FROM tree AS t1
    JOIN tree AS t2 ON t1.lft BETWEEN t2.lft AND t2.rgt
    WHERE t1.child=10 AND t1.tree=1 AND t2.tree=1
    ORDER BY t2.depth;
+----+-------------+-------+------+----------------+----------+---------+-------+--------+-----------------------------+
| id | select_type | table | type | possible_keys  | key      | key_len | ref   | rows   | Extra                       |
+----+-------------+-------+------+----------------+----------+---------+-------+--------+-----------------------------+
|  1 | SIMPLE      | t2    | ref  | jmp_tree       | jmp_tree | 4       | const | 100000 | Using where; Using filesort |
|  1 | SIMPLE      | t1    | ref  | child,jmp_tree | child    | 4       | const |      1 | Using where                 |
+----+-------------+-------+------+----------------+----------+---------+-------+--------+-----------------------------+
2 rows in set (0.00 sec)

7.4.5. getSubtree($node:int):int[]

Algorithm
SELECT t2.child FROM tree AS t1
  JOIN tree AS t2 ON t2.lft BETWEEN t1.lft AND t1.rgt AND t1.tree=t2.tree
  WHERE t1.child=$node;

Subtrees are retrieved using the Nested Sets data structure.
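The interval test behind this query can be sketched in memory. The following Python sketch is an illustration only; the function name get_subtree is an assumption, and the lft and rgt values are taken from the table in section 7.2. A node belongs to the subtree if its lft value lies between the lft and rgt values of the subtree root.

```python
# (child, lft, rgt) for part of the example repository in section 7.2
rows = [
    (10, 112, 313), (11, 113, 114), (12, 115, 122),
    (13, 116, 117), (14, 118, 119), (15, 120, 121),
    (16, 704, 100005),
]


def get_subtree(rows, node):
    """Return the ids of node and all of its descendants."""
    lft, rgt = next((l, r) for c, l, r in rows if c == node)
    # Every row is compared against the interval, like the scan over tree=1.
    return [c for c, l, _ in rows if lft <= l <= rgt]


print(get_subtree(rows, 12))  # [12, 13, 14, 15]
```

The list comprehension visits every row, which mirrors the cost of the SQL statement when no index on lft can be used.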
Example
Getting the subtree of 12 "File Exchange Folder":
mysql> SELECT t2.child FROM tree AS t1
    JOIN tree AS t2 ON t2.lft BETWEEN t1.lft AND t1.rgt AND t1.tree=t2.tree
    WHERE t1.child=12;
+-------+
| child |
+-------+
|    12 |
|    13 |
|    14 |
|    15 |
+-------+
4 rows in set (1.04 sec)

Analysis
mysql> EXPLAIN SELECT t2.child FROM tree AS t1
    JOIN tree AS t2 ON t2.lft BETWEEN t1.lft AND t1.rgt AND t1.tree=t2.tree
    WHERE t1.child=12;
+----+-------------+-------+------+----------------+----------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys  | key      | key_len | ref                | rows   | Extra       |
+----+-------------+-------+------+----------------+----------+---------+--------------------+--------+-------------+
|  1 | SIMPLE      | t1    | ref  | child,jmp_tree | child    | 4       | const              |      1 |             |
|  1 | SIMPLE      | t2    | ref  | jmp_tree       | jmp_tree | 4       | ilias_hslu.t1.tree | 100000 | Using where |
+----+-------------+-------+------+----------------+----------+---------+--------------------+--------+-------------+
2 rows in set (0.00 sec)

The problem with this approach is that all rows of the tree table need to be inspected, leading to poor performance on repositories which have more than a few thousand objects.

7.4.6. getDepth($node:int):int

The code is identical to getDepth() in section 6.4.6.

7.4.7. isDescendantOf($node1:int, $node2:int):boolean

Algorithm
if ($node2 == 1) {
  return true; // all nodes are descendants of the root node
}
if ($node1 == 1) {
  return false; // the root node is not a descendant of any other node except itself
}
SELECT parent AS node_parent, depth AS node_depth FROM tree WHERE child IN ($node1, $node2);
if ($node1.parent == $node2) {
  return true; // node2 is the parent of node1
}
if ($node1.depth <= $node2.depth) {
  return false; // node1 is not deeper in the tree than node2
}

If the id of the root node is the well-known value 1, and node2 = 1, we return true.
If the id of the root node is the well-known value 1, and node1 = 1, we return false.
In all other cases, we first retrieve the parent and the depth of node1 and node2:
If node1_parent is node2, we return true.
If node2_parent is node1, we return false.
If node1_depth is less than or equal to node2_depth, we return false, because a descendant is always located deeper in the tree than its ancestor.
In all other cases, we have to get the path from the node up to the depth of the ancestor. This is similar to the getPath() operation. For example, if node1_depth is 10 and node2_depth is 4, we can perform the following self-join:

SELECT d4.parent AS d4_node
  FROM tree AS d8
  JOIN tree AS d7 ON d7.child = d8.parent
  JOIN tree AS d6 ON d6.child = d7.parent
  JOIN tree AS d5 ON d5.child = d6.parent
  JOIN tree AS d4 ON d4.child = d5.parent
  WHERE d8.child = $node1_parent;

If d4_node = node2, we return true. Otherwise we return false.

Analysis
The main limitation of such an approach is that we need one self-join for every level in the hierarchy, so performance degrades with each additional level as the join grows in complexity.

7.4.8. insertInto($node1:int, $node2:int):void

Algorithm
LOCK TABLES tree WRITE;
SELECT depth AS parent_depth, lft AS parent_lft, rgt AS parent_rgt FROM tree WHERE child=$node2;
SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=$node2;
if ($parent_rgt - $used_rgt < 2) {
  UPDATE tree SET
    lft = CASE WHEN lft > $used_rgt THEN lft + 102 ELSE lft END,
    rgt = CASE WHEN rgt > $used_rgt THEN rgt + 102 ELSE rgt END
    WHERE tree=1;
}
INSERT INTO tree (tree, child, parent, lft, rgt, depth)
  VALUES (1, $node1, $node2, $used_rgt+1, $used_rgt+2, $parent_depth+1);
UNLOCK TABLES;

First we lock table tree. Then we determine if there is enough space available to insert a new child into node2. If parent_rgt - used_rgt is smaller than 2, space must be created by reorganizing the tree structure. To reduce the need for reorganizations, space for 51 nodes is created (each node occupies two numbers, one for lft and one for rgt, hence the shift by 102). The new node is inserted.
Finally we unlock the table. Example Insert a new node with id 17 inside "File Exchange Folder". mysql> LOCK TABLES tree WRITE; Query OK, 0 rows affected (0.00 sec) mysql> SELECT depth AS parent_depth, lft AS parent_lft, rgt AS parent_rgt FROM tree WHERE child=12; +--------------+------------+------------+ | parent_depth | parent_lft | parent_rgt | +--------------+------------+------------+ | 7 | 115 | 122 | +--------------+------------+------------+ 1 row in set (0.00 sec) mysql> SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=19; +----------+ | used_rgt | +----------+ | 121 | +----------+ 1 row in set (0.00 sec) mysql> UPDATE tree SET lft=CASE WHEN lft>121 THEN lft+102 ELSE lft END, rgt=CASE WHEN rgt>121 THEN rgt+102 ELSE rgt END WHERE tree = 1; Query OK, 99992 rows affected (2.05 sec) Rows matched: 100000 Changed: 99992 Warnings: 0 mysql> INSERT INTO tree (tree, child, parent, lft, rgt, depth) VALUES (1, 17, 12, 122, 123, 8); mysql> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) Analysis mysql> EXPLAIN SELECT depth AS parent_depth, lft AS parent_lft, rgt AS parent_rgt FROM tree WHERE child=12; +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | 1 | SIMPLE | tree | ref | child | child | 4 | const | 1 | | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ 1 row in set (0.00 sec) Lucerne, 11 December 2009 Page 36/38 mysql> EXPLAIN SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=19; +----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | 1 | SIMPLE | tree | ref | 
parent | parent | 5 | const | 3 | Using where | +----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ 1 row in set (0.00 sec) In the first select statement, the row can be efficiently retrieved using the index "child". The second select shows that all children of node2 have to be visited to retrieve the used_rgt value. The update statement which reorganizes the tree is very inefficient, since all rows of the tree must be inspected. And - in case of this example - almost all rows had to be changed. 7.4.9. delete($node:int):void Algorithm LOCK TABLES tree WRITE; SELECT tree, parent, lft, rgt FROM tree WHERE child=$node; DELETE FROM tree WHERE lft BETWEEN $lft AND $rgt; SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=$parent; $diff = $rgt - $used_rgt; if ($diff > 100) { UPDATE tree SET lft=CASE WHEN lft > $used_rgt THEN lft - $diff + 100 ELSE lft END, rgt=CASE WHEN rgt > $used_rgt THEN rgt - $diff + 100 ELSE rgt END WHERE tree = 1; } UNLOCK TABLES; First we lock table tree. Then we retrieve the node data. Then we delete the subtree starting at the node. Next we determine the size of the gap in the parent node. If more than 100 nodes fit into the gap, we reduce the gap to leave space for 50 nodes only. Finally we unlock the table. 
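The renumbering step of the delete operation can be sketched in memory. This Python sketch is an illustration under assumptions: the function name close_gap, the keep parameter and the row values in the usage example are hypothetical, and it only mirrors the arithmetic of the UPDATE statement above (values greater than used_rgt are shifted left by diff - 100).

```python
def close_gap(rows, deleted_rgt, used_rgt, keep=100):
    """Shrink the gap left behind by a deleted subtree.

    rows is a list of (child, lft, rgt) tuples after the DELETE statement.
    """
    diff = deleted_rgt - used_rgt
    if diff <= keep:
        return list(rows)  # gap is small enough, no renumbering needed
    shift = diff - keep
    return [
        (c, l - shift if l > used_rgt else l, r - shift if r > used_rgt else r)
        for c, l, r in rows
    ]


# Hypothetical rows: a large subtree ending at rgt=1000 was deleted from node 10.
rows = [(10, 112, 1001), (11, 113, 114), (16, 1002, 1003)]
print(close_gap(rows, deleted_rgt=1000, used_rgt=114))
# [(10, 112, 215), (11, 113, 114), (16, 216, 217)]
```

After the shift, the numbers 115 to 214 are free inside node 10, which corresponds to space for 50 nodes. With the values of the example in this section (rgt 122, used_rgt 114) the difference is only 8, so the rows are returned unchanged.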
Example Deleting the subtree of 12 "File Exchange Folder": mysql> LOCK TABLES tree WRITE; Query OK, 0 rows affected (0.00 sec) mysql> SELECT tree, parent, lft, rgt FROM tree WHERE child=12; +------+--------+---------+---------+ | tree | parent | lft | rgt | +------+--------+---------+---------+ | 1 | 10 | 115 | 122 | +------+--------+---------+---------+ 1 row in set (0.00 sec) mysql> DELETE FROM tree WHERE lft BETWEEN 115 AND 122; Query OK, 4 rows affected (0.23 sec) mysql> SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=10; Lucerne, 11 December 2009 Page 37/38 +----------+ | used_rgt | +----------+ | 114 | +----------+ 1 row in set (0.00 sec) $diff = $rgt - $used_rgt; mysql> UPDATE tree SET lft=CASE WHEN lft > 114 THEN lft - 8 + 100 ELSE lft END, rgt=CASE WHEN rgt > 114 THEN rgt - 8 + 100 ELSE rgt END WHERE tree = 1; (The update is not done in this example, because the gap is smaller than 100). mysql> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) Analysis mysql> EXPLAIN SELECT tree, parent, lft, rgt FROM tree WHERE child=12; +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | 1 | SIMPLE | tree | ref | child | child | 4 | const | 1 | | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=10; +----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | 1 | SIMPLE | tree | ref | parent | parent | 5 | const | 1 | Using where | 
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM tree WHERE lft BETWEEN 115 AND 122 AND tree=1; +----+-------------+-------+------+---------------+----------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+----------+---------+-------+--------+-------------+ | 1 | SIMPLE | tree | ref | jmp_tree | jmp_tree | 4 | const | 100000 | Using where | +----+-------------+-------+------+---------------+----------+---------+-------+--------+-------------+ 1 row in set (0.00 sec) The update statement which reorganizes the tree is very inefficient, since all rows with tree=1 must be inspected. 7.4.10. moveTo($node1:int, $node2:int):void Algorithm LOCK TABLES tree WRITE; SELECT tree, parent, depth, lft, rgt FROM tree WHERE child IN ($node1, $node2); $spread_diff = $node1.rgt - $node1.lft + 1; // Create a gap at node2 UPDATE tree SET lft = CASE WHEN lft > $node2.rgt THEN lft + $spread_diff) ELSE lft END, rgt = CASE WHEN rgt >= $target_rgt THEN rgt + $spread_diff ELSE rgt END WHERE tree = $node2.tree Lucerne, 11 December 2009 Page 38/38 if ($node1.lft > $node2.rgt) { $where_offset = $spread_diff; $move_diff = $node2.rgt - $node1.lft - $spread_diff; } else { $where_offset = 0; $move_diff = $node2.rgt - $node1.lft; } $depth_diff = $target_depth - $source_depth + 1; // Move the node1 subtree to node2 UPDATE tree SET parent = CASE WHEN parent = $node1.parent THEN $node2 ELSE parent END, rgt = rgt + $move_diff, lft = lft + $move_diff, depth = depth + $depth_diff, tree = $node2.tree WHERE lft >= $node1.lft + $where_offset AND rgt <= $node1.rgt + $where_offset AND tree = $node1.tree; // close the gap which we created at node1 UPDATE tree SET lft = CASE WHEN lft >= $node1.lft + $where_offset THEN lft - $spread_diff ELSE lft END, rgt = CASE WHEN rgt >= 
$node1.rgt + $where_offset THEN rgt - $spread_diff ELSE rgt END WHERE tree = $node1.tree;
UNLOCK TABLES;

First we lock table tree. Then we retrieve the data of node1 and node2. Next we create a gap in node2. We can now move the subtree of node1 into node2. We close the gap that we created in the parent of node1. Finally we unlock the table.

Analysis
This algorithm is very inefficient, because each of the three update statements performs a full table scan.

8. Bibliography

[1] Celko, J. (1999). SQL for Smarties: Advanced SQL Programming, Second Edition. The Morgan Kaufmann Series in Data Management Systems.
[2] Zawodny, J., Balling, D. (2004). High Performance MySQL. O'Reilly Media.