Operations on the Adjacency List + Materialized Path Tree data

Redesign of ILIAS Repository Tree
for Performance
Lucerne, 11 December 2009
Page 1/38
Frankenstrasse 9, Postfach 2969, CH-6002 Luzern
T +41 41 228 42 42, F +41 41 228 42 43
www.hslu.ch
Table of changes
Version  Date        Status  Changes and comments  Edited by
1.0      2010-02-12  Final                         W. Randelshofer
0.9      2009-12-11  Draft                         W. Randelshofer
Table of contents
1. Vision
2. State of this project
2.1. Current state
2.2. Completed steps
2.3. Next Steps
2.4. Funding
3. Requirements analysis
4. Proposed changes
4.1. Change of data structures
4.2. Change of storage engines
4.3. Change of code which accesses table "tree"
4.4. Use of transactions instead of table locks
5. Appendix A: Overview of tree data structures
5.1. Adjacency List data structure
5.2. Nested Sets data structure
5.3. Materialized Path data structure
5.4. Adjacency List + Nested Sets data structure
5.5. Adjacency List + Materialized Path data structure
6. Appendix B: Operations on the Adjacency List + Materialized Path Tree data structure
6.1. Example repository structure
6.2. Implementation of the repository structure
6.3. Example tree operations
6.4. Implementation of the tree operations
7. Appendix C: Operations on the Adjacency List + Nested Sets Tree data structure
7.1. Example repository structure
7.2. Implementation of the data structure
7.3. Example tree operations
7.4. Implementation of the tree operations
8. Bibliography
1. Vision
We want to improve the performance and responsiveness of ILIAS when many users are logged in
simultaneously, and when the repository of ILIAS contains a large number of objects.
Specifically,
• we want a repository data structure in ILIAS which is responsive, even if the repository
contains more than 500,000 objects.
• we want support for concurrent write operations in ILIAS, so that more than 200 users can
work simultaneously without blocking each other.
2. State of this project
2.1. Current state
This project is currently in a pilot phase.
A pilot implementation based on ILIAS 3.10 has been in use at the Lucerne University of Applied
Sciences and Arts (HSLU) since May 2009.
2.2. Completed steps
In autumn 2008 the Lucerne University of Applied Sciences and Arts (HSLU) experienced
performance issues with ILIAS, after upgrading from version 3.7 to 3.10.
A first analysis at that time suggested that the data structure of the ILIAS repository could be a
cause of this problem, but changing it was considered too risky.
HSLU made a number of performance improvements in several SQL statements of ILIAS, in the
hope that they would sufficiently improve the situation. (These changes have been integrated into
the official ILIAS code base in spring 2009). Also, a number of database tables have been migrated
from the MySQL MyISAM storage engine to the InnoDB storage engine.
Since the achieved improvements were not satisfactory, a requirements analysis for changing the
repository structure of ILIAS started in early 2009 at HSLU.
The changes in the source code of ILIAS have been made in the HSLU code branch for ILIAS 3.10
in early 2009.
All proposed changes have been implemented by HSLU and have been in use at HSLU since spring 2009.
Performance tests have been made in May and June 2009 at HSLU. The results have been presented
at the joint ILIASuisse and Baden-Württemberg Community meeting on July 2, 2009.
2.3. Next Steps
The proposed changes need to be reviewed for inclusion in the official ILIAS code base.
After a successful review, the changes can be incorporated into the ILIAS code base. If the ILIAS
core team implements the changes, it needs funding from the ILIAS open source community. At the
current time (February 2010), a restructuring project is taking place at HSLU, and it has not yet
been decided who will be in charge of ILIAS tasks. Therefore, HSLU cannot promise to fund any
refactoring or integration into ILIAS 4.x. But if it turns out that support is possible,
HSLU will be glad to help.
As the code is ready in the HSLU 3.10 branch, the inclusion could take place, for example, in
spring 2010 for the ILIAS 4.1 release.
2.4. Funding
This project is funded by the Lucerne University of Applied Sciences and Arts (HSLU).
3. Requirements analysis
A performance analysis of ILIAS 3.10 made in winter 2008 suggested that the data structure of the
ILIAS repository, and the MySQL database engine could be a cause for the performance issues
experienced by HSLU.
The following components were identified as performance critical:
• The repository of ILIAS 3.10 is represented by a data structure named Adjacency List +
Nested Sets Tree. (See Appendix A for a discussion of advantages and disadvantages of
this data structure.)
• ILIAS 3.10 uses the MyISAM storage engine to store its database tables. (See [2] for a
discussion of MySQL storage engines.)
Based on the literature [1] and [2], the following hypotheses were made:
Hypothesis 1: Operations on the repository are not always responsive for the following reasons:
1. The Nested Sets data structure frequently needs reorganization. On average half of all rows
need to be updated during reorganization. On large repositories, this may take several
seconds.
2. Operations which act on a subtree of the repository tree perform slowly because no indices
are defined over the Nested Sets data structure. On large repositories, this may take longer
than a second.
Hypothesis 2: ILIAS does not support concurrent read and write operations for the following
reasons:
1. The database tables use the MySQL MyISAM storage engine, which does not support
concurrent write operations.
2. The repository tree is implemented using a redundant Adjacency List + Nested Sets data
structure. The Nested Sets data structure does not support concurrent write operations.
4. Proposed changes
4.1. Change of data structures
Change the data structure used by table tree from Adjacency List + Nested Sets to Adjacency
List + Materialized Path:
ALTER TABLE `tree`
DROP COLUMN `lft`,
DROP COLUMN `rgt`,
DROP COLUMN `depth`,
ADD COLUMN `path` varchar(255) character set ascii NOT NULL,
ADD KEY `path_index` (`path`(255));
This yields the following create table statement:
CREATE TABLE `tree` (
`tree` int(10) NOT NULL default '0',
`child` int(10) unsigned NOT NULL default '0',
`parent` int(10) unsigned default NULL,
`path` varchar(255) character set ascii NOT NULL,
KEY `child` (`child`),
KEY `parent` (`parent`),
KEY `jmp_tree` (`tree`),
KEY `path_index` (`path`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Note the use of the US-ASCII character set for the path field, and the use of a prefix length of 255
characters for the path_index key.
The path field holds a path to a tree node. The path consists of child IDs encoded as decimal digit
characters, delimited by the character '.'. The '~' character is used to specify upper bounds in range
queries.
See section 5.3 for a discussion of the path field.
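The encoding described above can be illustrated with a small sketch. It is written in Python purely for illustration; the function names encode_path and subtree_upper_bound are hypothetical and do not exist in ILIAS.

```python
def encode_path(node_ids):
    """Join child ids (root first) into a materialized path such as '1.2.3'."""
    return ".".join(str(node_id) for node_id in node_ids)


def subtree_upper_bound(path):
    """Return the upper bound for a range query over the subtree below `path`."""
    return path + ".~"


path = encode_path([1, 2, 3, 4, 5])
print(path)                       # 1.2.3.4.5
print(subtree_upper_bound(path))  # 1.2.3.4.5.~
```

The decimal encoding keeps the path readable; the '~' suffix is only ever used as a query bound and is never stored in the path field.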
This change affects the code in
• ilias3/setup/sql/dbupdate_02.php
PENDING: The above table definition is specific to MySQL. It is not clear whether we can use
the same character set, prefix length key and encoding with Oracle.
4.2. Change of storage engines
Replace the MyISAM storage engine by the InnoDB storage engine in the database tables "tree"
and "object_reference":
ALTER TABLE `tree` ENGINE='InnoDB';
ALTER TABLE `object_reference` ENGINE='InnoDB';
This change affects the code in
• ilias3/setup/sql/dbupdate_02.php
PENDING: The above table definitions are specific to MySQL. It is not clear whether we can use
the same definitions with Oracle. What we need are tables which support transactions with
row-level locking.
4.3. Change of code which accesses table "tree"
In ILIAS 3.10.x, all code which accesses tables with an Adjacency List+Nested Sets data structure is
in class.ilTree.php.
Section 5.4 lists all tables which are accessed by class.ilTree.php.
Since we propose to only change the data structure in table "tree", we propose creating two new
files:
• ilias3/Services/tree/classes/class.ilALNSTree.php
Contains the existing code for the Adjacency List+Nested Sets data structure.
• ilias3/Services/tree/classes/class.ilALMPTree.php
Contains the new code for the Adjacency List+Materialized Path data structure.
The code in class.ilTree.php is changed into a Proxy which calls either ilALNSTree or ilALMPTree
depending on the table being accessed.
Alternatively, if-statements in class.ilTree.php can be used to invoke different SQL-statements
depending on the table being accessed. This approach was used in the pilot at HSLU.
Chapter 6 gives details about the SQL statements for accessing the Adjacency List+Materialized
Path data structure.
PENDING: Decide whether ilTree shall be turned into a Proxy which calls two new class files, or
whether ilTree shall use if-statements.
4.4. Use of transactions instead of table locks
Rewrite the SQL statements which block table tree while it is being updated, and use transactions
instead.
This rewrite affects the code in
• ilias3/Services/tree/classes/class.ilTree.php (if if-statements are used)
or
• ilias3/Services/tree/classes/class.ilALMPTree.php (if a new class for
PENDING: If an ILIAS installation uses the MyISAM engine instead of the InnoDB engine for
table "tree" and for table "object_reference", the queries which change data in these tables must use
explicit table locking instead of transactions. If table locking is not used, the database will quickly
get corrupted due to concurrent write accesses.
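The idea of replacing a table lock by a transaction can be sketched as follows. This is a minimal illustration in Python, with the standard sqlite3 module standing in for MySQL/InnoDB; the reduced table layout and the function name insert_node are assumptions made for the example and are not ILIAS code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (tree INTEGER, child INTEGER, parent INTEGER, path TEXT)")
conn.execute("INSERT INTO tree VALUES (1, 1, 0, '1')")
conn.commit()


def insert_node(conn, tree, child, parent):
    """Insert `child` below `parent` inside one transaction, without a table lock."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT path FROM tree WHERE child = ? AND tree = ?", (parent, tree)
        ).fetchone()
        if row is None:
            raise ValueError("parent node not found")
        conn.execute(
            "INSERT INTO tree (tree, child, parent, path) VALUES (?, ?, ?, ?)",
            (tree, child, parent, f"{row[0]}.{child}"),
        )


insert_node(conn, 1, 2, 1)
print(conn.execute("SELECT child, path FROM tree ORDER BY child").fetchall())
# [(1, '1'), (2, '1.2')]
```

With a transactional engine, a failed insert leaves the table unchanged, and other writers are not blocked for the whole duration of the operation as they would be with an explicit table lock.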
5. Appendix A: Overview of tree data structures
This appendix provides an overview of tree data structures and their usage in ILIAS.
5.1. Adjacency List data structure
Description
The Adjacency List data structure represents a tree structure as a child to parent relationship.
The following fields are used to identify a tree node and the relationship to its parent node:
child
Uniquely identifies a tree node.
parent
Holds the id of the parent node.
Usage
This data structure is used by the following tables in ILIAS:
ctrl_calls
The call structure.
ctrl_structure
The code control structure.
5.2. Nested Sets data structure
Description
The Nested Sets data structure uses enclosure (containment) to show parenthood.
The following fields are used to identify the boundaries of an enclosure:
lft
Holds the left (lower) boundary of the nested set enclosed by the node.
rgt
Holds the right (upper) boundary of the nested set enclosed by the node.
Usage
This data structure is not used in ILIAS. It is described here only for completeness.
5.3. Materialized Path data structure
Description
The Materialized Path data structure represents a tree structure by a path description from the root
of the tree down to a specific node.
The following fields are used to identify a tree node and its path:
child
Uniquely identifies a tree node.
path
Holds the path to a node.
Contents of the path field
Encoding
The path consists of child ids separated by a special character, the separator character.
To allow for range queries, the separator character must be smaller than the characters used for
encoding the child ids (or more precisely: must precede them in the collation sequence). We also
need a character for defining the upper bound of a range query. This character must follow the
characters used for encoding the child ids in the collation sequence.
If the path uses the US-ASCII collation rule, we can use
• the digits 0-9 and the characters a-z for encoding the child ids,
• the '.' (full stop) for separating the child ids,
• the '~' (tilde) character for range queries.
With 0-9 and a-z we can either use decimal encoding, hex encoding or a base-36 encoding for the
child id's. For simplicity and for robustness reasons we suggest using decimal encoding for now. In
future versions of ILIAS, one of the other two encodings can be used.
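The ordering requirement can be checked with a short sketch. It is written in Python, whose string comparison orders ASCII text by code point, which is assumed here to match a binary US-ASCII collation; the names collation_ok and in_subtree are hypothetical.

```python
SEPARATOR = "."
UPPER_BOUND = "~"
ID_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz"


def collation_ok():
    """True if the separator sorts before, and '~' after, every id character."""
    return all(ord(SEPARATOR) < ord(c) < ord(UPPER_BOUND) for c in ID_CHARS)


def in_subtree(path, root_path):
    """Range test equivalent to: path BETWEEN root_path AND root_path + '.~'."""
    return root_path <= path <= root_path + SEPARATOR + UPPER_BOUND


print(collation_ok())              # True
print(in_subtree("1.2.3", "1.2"))  # True
print(in_subtree("1.20", "1.2"))   # False
```

The last call shows why the separator must sort before the id characters: the sibling path "1.20" shares the prefix "1.2" but falls outside the range, because '0' sorts after '.'.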
Field length
The path is stored in a VARCHAR field. In MySQL, an ASCII VARCHAR field can have up to
65,535 characters. If id's with 11 decimal digits are chosen, then this limits the maximal depth of a
tree to 5,200 levels.
In MySQL the maximal length of VARCHAR fields depends on the character encoding used. If we
chose UTF-8 instead of ASCII, we could only store up to 21,845 characters. The effective
maximum length of a VARCHAR in MySQL is subject to the maximum row size (65,535 bytes,
which is shared among all columns) and the character set used.
For access efficiency, the path needs to be indexed. In MySQL, a prefix index over the first 255
characters of the path can be used.
PENDING: The above description is specific to MySQL. It is not clear whether we can use the
same character set, prefix length key and encoding with Oracle.
Usage
This data structure is not used in ILIAS. It is described here only for completeness.
5.4. Adjacency List + Nested Sets data structure
Description
The Adjacency List + Nested Sets data structure uses both these data structures to provide efficient
access to tree items. In addition, a depth field is used to improve the performance of operations
which retrieve the path to a tree node.
The following fields are used for this data structure:
child
Uniquely identifies a tree node.
parent
Holds the id of the parent node.
depth
Holds the depth of the node in the tree.
lft
Holds the left (lower) boundary of the nested set enclosed by the node.
rgt
Holds the right (upper) boundary of the nested set enclosed by the node.
Usage
This data structure is used by the following tables in ILIAS:
bookmark_tree
Bookmark tree on the personal desktop.
cp_tree
SCORM 2004 course item tree.
frm_posts_tree
Posts in the discussion thread of a forum.
lm_tree
Chapters and pages tree in a learning module.
mail_tree
Mail folder tree.
mep_tree
MediaPool tree.
search_tree
Search results tree.
scorm_tree
SCORM 1.2 course item tree.
tree
The repository tree.
xml_tree
?
All code which accesses these tables is in class.ilTree.php.
5.5. Adjacency List + Materialized Path data structure
Description
The Adjacency List + Materialized Path data structure uses both these data structures to provide
efficient access to tree items.
The following fields are used for this data structure:
child
Uniquely identifies a tree node.
parent
Holds the id of the parent node.
depth
Holds the depth of a node in the tree.
path
Holds the path to a node.
Usage
We propose to use this data structure for table "tree" in ILIAS.
6. Appendix B: Operations on the Adjacency List + Materialized Path Tree data structure
This appendix provides details of the proposed Adjacency List + Materialized Path Tree data
structure.
6.1. Example repository structure
The following example repository with a total of 100'000 objects is used as an example in this
appendix and in appendix C:
ILIAS Root
  School of Business Category
    Bachelor Category
      Autumn 2009 Category
        English A09 Course
          Role Folder
          Lecture Notes Folder
            Grammar 101.pdf File
          Qualification Notes Folder
          English A09.01 Group
            Role Folder
            File Exchange Folder
              Role Folder
              Joe's Workbook.doc
              Mary's Workbook.doc File
  unknown Category
    99'985 unknown objects
6.2. Implementation of the repository structure
The table below shows an implementation of the example tree structure using the proposed
Adjacency List + Materialized Path data structure:
tree  child  parent  depth  path
1     1      0       1      1                   ILIAS Root
1     2      1       2      1.2                 School of Business Category
1     3      2       3      1.2.3               Bachelor Category
1     4      3       4      1.2.3.4             Autumn 2009 Category
1     5      4       5      1.2.3.4.5           English A09 Course
1     6      5       6      1.2.3.4.5.6         Role Folder
1     7      5       6      1.2.3.4.5.7         Lecture Notes Folder
1     8      7       7      1.2.3.4.5.7.8       Grammar 101.pdf File
1     9      5       6      1.2.3.4.5.9         Qualification Notes Folder
1     10     5       6      1.2.3.4.5.10        English A09.01 Group
1     11     10      7      1.2.3.4.5.10.11     Role Folder
1     12     10      7      1.2.3.4.5.10.12     File Exchange Folder
1     13     12      8      1.2.3.4.5.10.12.13  Role Folder
1     14     12      8      1.2.3.4.5.10.12.14  Joe's Workbook.doc
1     15     12      8      1.2.3.4.5.10.12.15  Mary's Workbook.doc File
1     16     1       2      1.16                Unknown Category
…     …      …       …      …                   99,985 Objects
6.3. Example tree operations
The following tree operations are used as examples in this appendix and in appendix C:
getRoot():int
Gets the reference id of the object located at the root of the specified
repository tree.
getChildren($node:int):int[]
Takes the reference id of a node, and returns the reference ids of all
its children.
getParent($node:int):int
Takes the reference id of a node, and returns the reference id of its parent
node.
getPath($node:int):int[]
Takes the reference id of a node, and returns the reference ids of its
parents, ordered by depth.
getSubtree($node:int):int[]
Takes the reference id of a node, and returns the reference ids of all
nodes in the subtree, including the node itself.
getDepth($node:int):int
Takes the reference id of a node, and returns the depth of the node.
isDescendantOf($node1:int, $node2:int):boolean
Returns true if node1 is contained in the subtree of node2.
insertInto($node1:int, $node2:int):void
Inserts node1 as a child of node2.
delete($node:int):void
Deletes the subtree starting at the specified node from the tree.
moveTo($node1:int, $node2:int):void
Removes the subtree starting at node1 from its parent and adds it as a
child to node2.
6.4. Implementation of the tree operations
The following paragraphs describe the implementation of tree operations using the proposed
Adjacency List + Materialized Path data structure.
For each operation an analysis is given. The analysis shows that all operations on an Adjacency List
+ Materialized Path data structure perform as well as or better than those on an Adjacency List +
Nested Sets data structure.
6.4.1. getRoot():int
Algorithm
SELECT child FROM tree WHERE parent = 0 AND tree = 1;
In general no database access is needed, because the root node has the well known id 1.
In case the id is not well known, the above statement can be used. This statement uses the
Adjacency List data structure.
Example
mysql> SELECT child FROM tree WHERE parent=0 AND tree = 1;
+-------+
| child |
+-------+
|     1 |
+-------+
1 row in set (0.00 sec)
The result is 1.
Analysis
mysql> EXPLAIN SELECT child FROM tree WHERE parent=0 AND tree = 1;
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys   | key    | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | tree  | ref  | parent,jmp_tree | parent | 5       | const |    1 | Using where |
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
1 row in set (0.00 sec)
The explain statement shows that the database can use the key "parent" to select the row, and that it
only has to visit a single table row to find the desired result.
6.4.2. getChildren($node:int):int[]
Algorithm
SELECT child FROM tree WHERE parent = $node AND tree = 1;
The Adjacency List data structure is used to retrieve the children of a node.
The clause "AND tree=1" is needed, because we only want children, which are in the same tree as
the node - assuming that the node is in tree 1.
Alternative algorithm
If the tree of the node is not known, the following statement can be used:
SELECT child FROM tree WHERE parent=$node AND tree = (SELECT tree FROM tree WHERE child=$node);
Example
Getting the children of 5 "English A09 Course":
mysql> SELECT child FROM tree WHERE parent = 5 AND tree = 1;
+-------+
| child |
+-------+
|     6 |
|     7 |
|     9 |
|    10 |
+-------+
4 rows in set (0.00 sec)
The result is {6, 7, 9, 10} => {"Role Folder", "Lecture Notes Folder", "Qualification Notes Folder",
"English A09.01 Group"}.
Analysis
mysql> EXPLAIN SELECT child FROM tree WHERE parent = 5 AND tree = 1;
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys   | key    | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | tree  | ref  | parent,jmp_tree | parent | 5       | const |   10 | Using where |
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
1 row in set (0.00 sec)
The explain statement shows that the database can use the key "parent" to select the rows, and that
it only has to visit rows which are part of the result set.
Analysis of the alternative algorithm
mysql> EXPLAIN SELECT child FROM tree WHERE parent=17 AND tree = (SELECT tree FROM tree WHERE child=17);
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys   | key    | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
|  1 | PRIMARY     | tree  | ref  | parent,jmp_tree | parent | 5       | const |   18 | Using where |
|  2 | SUBQUERY    | tree  | ref  | child           | child  | 4       |       |    1 |             |
+----+-------------+-------+------+-----------------+--------+---------+-------+------+-------------+
2 rows in set (0.00 sec)
The explain statement shows that the database can use the key "parent" to select the rows, and that
it only has to visit rows which are part of the result set.
6.4.3. getParent($node:int):int
Algorithm
SELECT parent FROM tree WHERE child=$node;
The Adjacency List data structure is used to retrieve the parent of a node.
Example
Getting the parent of "English A09 Course":
mysql> SELECT parent FROM tree WHERE child=5;
+--------+
| parent |
+--------+
|      4 |
+--------+
1 row in set (0.00 sec)
The result is {4} => {"Autumn 2009 Category"}.
Analysis
mysql> EXPLAIN SELECT parent FROM tree WHERE child=5;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The explain statement shows that the database can use the key "child" to select the row, and that it
only has to visit 1 row.
6.4.4. getPath($node:int):int[]
Algorithm
SELECT path FROM tree WHERE child=$node;
$nodes = explode('.', $path);
The Materialized Path data structure is used to retrieve the path to a node.
PHP is then used to split up the path into an array.
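The two steps (select the path, then split it) can be sketched in a runnable form. The sketch uses Python with the standard sqlite3 module in place of PHP and MySQL; the reduced table and the function name get_path are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (tree INTEGER, child INTEGER, parent INTEGER, path TEXT)")
conn.executemany(
    "INSERT INTO tree VALUES (1, ?, ?, ?)",
    [(5, 4, "1.2.3.4.5"), (10, 5, "1.2.3.4.5.10")],
)


def get_path(conn, node):
    """Return the ids on the path from the root down to `node`, root first."""
    row = conn.execute("SELECT path FROM tree WHERE child = ?", (node,)).fetchone()
    if row is None:
        return []
    return [int(part) for part in row[0].split(".")]


print(get_path(conn, 10))  # [1, 2, 3, 4, 5, 10]
```

Only a single row is read; all ancestors are recovered from the path string without further queries.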
Example
Getting the path to "English A09.01 Group":
mysql> SELECT path FROM tree WHERE child=10;
+--------------+
| path         |
+--------------+
| 1.2.3.4.5.10 |
+--------------+
1 row in set (0.00 sec)
The result is {1, 2, 3, 4, 5, 10} => {"ILIAS Root", "School of Business Category", "Bachelor
Category", "Autumn 2009 Category", "English A09 Course", "English A09.01 Group"}.
Analysis
mysql> EXPLAIN SELECT path FROM tree WHERE child=10;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The explain statement shows that the database can use the key "child" to select the row, and that it
only has to visit 1 row.
6.4.5. getSubtree($node:int):int[]
Algorithm
SELECT tree, path AS from_range FROM tree WHERE child=$node;
$to_range = $from_range.'.~';
SELECT child FROM tree WHERE path BETWEEN $from_range AND $to_range AND tree=$tree;
Subtrees are retrieved using the Materialized Path data structure.
First we select the tree and the path as from_range.
Using PHP we construct to_range by appending '.~' to from_range.
Then we retrieve the subtree.
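The three steps can be tried out with the following sketch. It uses Python with the standard sqlite3 module standing in for PHP and MySQL, and assumes that SQLite's default binary text comparison orders these ASCII paths like the US-ASCII collation. The function name get_subtree and the node with id 120 are hypothetical; node 120 is added only to show that a sibling sharing the prefix "12" is not selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (tree INTEGER, child INTEGER, path TEXT)")
conn.execute("CREATE INDEX path_index ON tree (path)")
conn.executemany(
    "INSERT INTO tree (tree, child, path) VALUES (1, ?, ?)",
    [
        (10, "1.2.3.4.5.10"),
        (12, "1.2.3.4.5.10.12"),
        (13, "1.2.3.4.5.10.12.13"),
        (14, "1.2.3.4.5.10.12.14"),
        (15, "1.2.3.4.5.10.12.15"),
        (120, "1.2.3.4.5.10.120"),  # hypothetical sibling of node 12
        (16, "1.16"),
    ],
)


def get_subtree(conn, node):
    """Return the ids of `node` and all its descendants, ordered by path."""
    row = conn.execute("SELECT tree, path FROM tree WHERE child = ?", (node,)).fetchone()
    if row is None:
        return []
    tree, from_range = row
    to_range = from_range + ".~"  # upper bound of the range query
    rows = conn.execute(
        "SELECT child FROM tree WHERE path BETWEEN ? AND ? AND tree = ? ORDER BY path",
        (from_range, to_range, tree),
    )
    return [child for (child,) in rows]


print(get_subtree(conn, 12))  # [12, 13, 14, 15]
```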
Alternative algorithm
SELECT t2.child
FROM tree AS t1
JOIN tree AS t2 ON t2.path BETWEEN t1.path AND CONCAT(t1.path, '.~') AND t2.tree=t1.tree
WHERE t1.child=$node;
Unfortunately, MySQL does not perform an efficient query with this statement.
Example
Getting the subtree of 12 "File Exchange Folder":
mysql> SELECT tree, path AS from_range FROM tree WHERE child=12;
+------+-----------------+
| tree | from_range      |
+------+-----------------+
|    1 | 1.2.3.4.5.10.12 |
+------+-----------------+
1 row in set (0.00 sec)
mysql> SELECT child FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1;
+-------+
| child |
+-------+
|    12 |
|    13 |
|    14 |
|    15 |
+-------+
4 rows in set (0.00 sec)
The result is {12, 13, 14, 15} => {"File Exchange Folder", "Role Folder", "Joe's Workbook.doc",
"Mary's Workbook.doc File"}.
Analysis
mysql> EXPLAIN SELECT tree, path AS from_range FROM tree WHERE child=12;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The explain statement of the first query shows that the database can use the key "child" to select the
row, and that it only has to visit 1 row.
mysql> EXPLAIN SELECT child FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1;
+----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys       | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | tree  | range | jmp_tree,path_index | path_index | 257     | NULL |    4 | Using where |
+----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)
The explain statement of the second query shows that the database can use the key "path_index" to
select the rows, and that it only has to visit as many rows as are in the subtree.
Analysis of alternative algorithm
mysql> EXPLAIN SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t2.path BETWEEN t1.path AND
CONCAT(t1.path, '.~') AND t2.tree=t1.tree WHERE t1.child=12;
+----+-------------+-------+------+---------------------------+----------+---------+--------------------+-------+-------------+
| id | select_type | table | type | possible_keys             | key      | key_len | ref                | rows  | Extra       |
+----+-------------+-------+------+---------------------------+----------+---------+--------------------+-------+-------------+
|  1 | SIMPLE      | t1    | ref  | child,jmp_tree,path_index | child    | 4       | const              |     1 |             |
|  1 | SIMPLE      | t2    | ref  | jmp_tree,path_index       | jmp_tree | 4       | ilias_hslu.t1.tree | 34561 | Using where |
+----+-------------+-------+------+---------------------------+----------+---------+--------------------+-------+-------------+
2 rows in set (0.00 sec)
The analysis of the alternative query shows that MySQL only uses the index jmp_tree, which is not
efficient.
6.4.6. getDepth($node:int):int
Algorithm
SELECT depth FROM tree WHERE child = $node;
Getting the depth of a node is straightforward using the depth field.
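In the example table of section 6.2, the depth of every node equals the number of ids in its path (the root has depth 1 and path "1"). Assuming this convention holds, the stored depth can be cross-checked against the path. The following Python sketch illustrates this; the function name depth_from_path is hypothetical.

```python
def depth_from_path(path):
    """Derive the depth of a node from its materialized path (root has depth 1)."""
    return path.count(".") + 1


print(depth_from_path("1"))             # 1
print(depth_from_path("1.2.3.4.5.10"))  # 6
```

Such a check could be used to verify the consistency of the two redundant fields; reading the depth field directly remains the simpler operation.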
Example
mysql> SELECT depth FROM tree WHERE child=10;
+-------+
| depth |
+-------+
|     6 |
+-------+
1 row in set (0.00 sec)
Analysis
mysql> EXPLAIN SELECT depth FROM tree WHERE child=10;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The explain statement shows that the database can use the key "child" to select the row, and that it
only has to visit 1 row.
6.4.7. isDescendantOf($node1:int, $node2:int):boolean
Algorithm
SELECT path FROM tree WHERE child=$node1;
return in_array($node2, explode('.',$path));
The Materialized Path data structure is used to determine whether one node is a descendant of
another node.
PHP is then used to search for node2 in the path.
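The search in the path can be sketched as follows, here in Python instead of PHP. The function name is_descendant_of is hypothetical, and the path of node1 is passed in directly rather than selected from the database.

```python
def is_descendant_of(path_of_node1, node2):
    """True if the id `node2` occurs as a whole element in the path of node1."""
    return str(node2) in path_of_node1.split(".")


path = "1.2.3.4.5.10"  # path of node 10 "English A09.01 Group"
print(is_descendant_of(path, 5))   # True
print(is_descendant_of(path, 7))   # False
print(is_descendant_of(path, 0))   # False
```

The path is split into whole ids before searching. A plain substring search would be wrong: "0" occurs inside "10", although node 0 is not on the path. Note that, like the in_array variant above, the test also returns true when node1 and node2 are the same node, because the path includes the node itself.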
Example
Determine whether 10 "English A09.01 Group" is a descendant of 5 "English A09 Course":
mysql> SELECT path FROM tree WHERE child=10;
+--------------+
| path         |
+--------------+
| 1.2.3.4.5.10 |
+--------------+
1 row in set (0.00 sec)
The result is true, because the path contains the id 5, which is the id of "English A09 Course".
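The path test can be sketched in Python as follows. This is a minimal illustration and not ILIAS code: the function name is_descendant_of and passing the path as a string argument are assumptions made for this sketch. Like the in_array() check in the algorithm, it also reports a node as a descendant of itself, because the id of a node is the last element of its own path.

```python
def is_descendant_of(path: str, ancestor_id: int) -> bool:
    """Return True if ancestor_id occurs in the dot-separated path.

    `path` is the materialized path of the candidate descendant,
    e.g. '1.2.3.4.5.10' for node 10 of the example repository.
    """
    # Compare whole path elements, not substrings: '5' must not match '15'.
    return str(ancestor_id) in path.split(".")


path_of_node_10 = "1.2.3.4.5.10"
print(is_descendant_of(path_of_node_10, 5))  # node 5 lies on the path
print(is_descendant_of(path_of_node_10, 7))  # node 7 does not
```

Splitting the path before comparing is the important detail: a plain substring search would wrongly match id 5 inside a path element such as 15.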
Analysis
mysql> EXPLAIN SELECT path FROM tree WHERE child=5;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The explain statement shows that the database can use the key "child" to select the row, and that it
only has to visit 1 row.
6.4.8. insertInto($node1:int, $node2:int):void
Algorithm
SELECT tree, depth, path FROM tree WHERE child=$node2;
if ($depth == $max_depth) {
// max depth reached, can not insert
}
INSERT INTO tree (tree, child, parent, depth, path)
VALUES ($tree, $node1, $node2, $depth+1, CONCAT(CONCAT($path, '.'), $node1))
First we check whether we can insert a new node without exceeding the maximal depth, which also bounds the length of the path.
Then we insert the new node.
Example
Insert a new node with id 17 inside 12 "File Exchange Folder".
mysql> SELECT tree, depth, path FROM tree WHERE child=12;
+------+-------+-----------------+
| tree | depth | path            |
+------+-------+-----------------+
|    1 |     7 | 1.2.3.4.5.10.12 |
+------+-------+-----------------+
1 row in set (0.00 sec)
mysql> INSERT INTO tree (tree, child, parent, depth, path)
VALUES (1, 17, 12, 8, '1.2.3.4.5.10.12.17');
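How the values of the new row follow from the parent row can be sketched in Python. This is an illustration under assumptions: the TreeRow type, the function new_child_row and the limit MAX_DEPTH = 50 are hypothetical and do not come from ILIAS.

```python
from typing import NamedTuple


class TreeRow(NamedTuple):
    tree: int
    child: int
    parent: int
    depth: int
    path: str


MAX_DEPTH = 50  # hypothetical limit; the real value depends on the size of the path column


def new_child_row(parent_row: TreeRow, node_id: int, max_depth: int = MAX_DEPTH) -> TreeRow:
    """Build the row for node_id inserted directly below parent_row."""
    if parent_row.depth >= max_depth:
        raise ValueError("maximal depth reached, cannot insert")
    return TreeRow(
        tree=parent_row.tree,
        child=node_id,
        parent=parent_row.child,
        depth=parent_row.depth + 1,
        path=f"{parent_row.path}.{node_id}",  # same result as the nested CONCAT()
    )


folder_12 = TreeRow(tree=1, child=12, parent=10, depth=7, path="1.2.3.4.5.10.12")
print(new_child_row(folder_12, 17))
```

Only the parent row is read, so the cost of an insert does not depend on the size of the tree.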
Analysis
mysql> EXPLAIN SELECT tree, depth, path FROM tree WHERE child=12;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The data needed from node2 can be efficiently retrieved using key "child".
6.4.9. delete($node:int):void
Algorithm
START TRANSACTION;
SELECT tree, path AS from_range FROM tree WHERE child=$node FOR UPDATE;
$to_range = $from_range.'.~';
DELETE FROM tree WHERE path BETWEEN $from_range AND $to_range AND tree=$tree;
COMMIT;
Subtrees are deleted using the Materialized Path data structure.
First we select the tree and the path as from_range.
Using PHP we construct to_range by appending '.~' to from_range.
Then we delete the subtree.
Example
Deleting the subtree of 12 "File Exchange Folder":
mysql> START TRANSACTION;
mysql> SELECT tree, path AS from_range FROM tree WHERE child=12 FOR UPDATE;
+------+-----------------+
| tree | from_range      |
+------+-----------------+
|    1 | 1.2.3.4.5.10.12 |
+------+-----------------+
1 row in set (0.00 sec)
mysql> DELETE FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND tree=1;
Query OK, 4 rows affected (0.00 sec)
mysql> COMMIT;
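Why the range from_range to from_range.'.~' selects exactly the subtree can be checked with a small Python sketch. It assumes a byte-wise comparison of the ASCII paths, which is what Python's string comparison does; whether MySQL compares the same way depends on the collation of the path column. The function name in_subtree_range and the path ending in 120 are hypothetical.

```python
def in_subtree_range(path: str, from_range: str) -> bool:
    """Mimic `path BETWEEN from_range AND from_range.'.~'`.

    The separator '.' sorts below all digits and '~' sorts above them,
    so every path that continues with '.<id>' falls inside the range.
    """
    to_range = from_range + ".~"
    return from_range <= path <= to_range


paths = [
    "1.2.3.4.5.10",        # parent of the subtree root
    "1.2.3.4.5.10.11",     # sibling of the subtree root
    "1.2.3.4.5.10.12",     # subtree root
    "1.2.3.4.5.10.12.13",
    "1.2.3.4.5.10.12.14",
    "1.2.3.4.5.10.12.15",
    "1.2.3.4.5.10.120",    # hypothetical sibling whose id merely starts with "12"
]
subtree = [p for p in paths if in_subtree_range(p, "1.2.3.4.5.10.12")]
print(subtree)
```

The last path shows why the upper bound contains the separator: without the '.' before '~', the sibling with id 120 would fall into the range as well.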
Analysis
mysql> EXPLAIN SELECT tree, path AS from_range FROM tree WHERE child=12;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
The explain statement of the first query shows that the database can use the key "child" to select the
row, and that it only has to visit 1 row.
mysql> EXPLAIN SELECT child FROM tree WHERE path BETWEEN '1.2.3.4.5.10.12' AND '1.2.3.4.5.10.12.~' AND
tree=1;
+----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys       | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | tree  | range | jmp_tree,path_index | path_index | 257     | NULL |    4 | Using where |
+----+-------------+-------+-------+---------------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)
The explain statement of the second query shows that the database can use the key "path_index" to
select the rows and that it has only to visit as many rows as are in the subtree.
6.4.10. moveTo($node1:int, $node2:int):void
Algorithm
START TRANSACTION;
SELECT tree, child, parent, depth, path FROM tree WHERE child IN ($node1, $node2) LOCK IN SHARE MODE;
…assign result set to $node1 and $node2…
if ($node2.depth >= $node1.depth) {
// The subtree moves deeper: check whether we stay within the maximal path depth
$to_path = $node1.path.'.~';
SELECT MAX(depth) AS max_depth FROM tree
WHERE path BETWEEN $node1.path AND $to_path
AND tree = $node1.tree
FOR UPDATE;
…assign max_depth to $subtree_depth…
if ($subtree_depth - $node1.depth + $node2.depth + 1 > $max_depth) {
// max depth exceeded - can not move
}
}
$split_pos = strrpos($node1.path, '.') + 1; // MID() counts from 1, strrpos() from 0
$to_path = $node1.path.'.~';
UPDATE tree
SET parent = CASE WHEN parent = $node1.parent THEN $node2
ELSE parent END,
path = CONCAT($node2.path, MID(path, $split_pos)),
depth = depth + $node2.depth - $node1.depth + 1
WHERE path BETWEEN $node1.path AND $to_path
AND tree = $node1.tree;
COMMIT;
Subtrees are moved using the Materialized Path data structure.
First we select tree, child, parent, depth and path of node1 and node2.
If node2 is at least as deep as node1, the move pushes the subtree deeper, so we check whether the
deepest node of the subtree of node1 stays within the maximal depth.
$split_pos is the position of the last '.' in the path of node1, so that MID(path, $split_pos) yields
the part of each path that is kept.
Then we move the nodes.
Example
Moving the subtree of 7 "Lecture Notes Folder" to 12 "File Exchange Folder":
mysql> SELECT tree, child, parent, depth, path FROM tree WHERE child IN (7, 12) LOCK IN SHARE MODE;
+-------+-------+--------+-------+-----------------+
| tree  | child | parent | depth | path            |
+-------+-------+--------+-------+-----------------+
|     1 |     7 |      5 |     6 | 1.2.3.4.5.7     |
|     1 |    12 |     10 |     7 | 1.2.3.4.5.10.12 |
+-------+-------+--------+-------+-----------------+
2 rows in set (0.00 sec)
mysql> SELECT MAX(depth) AS max_depth FROM tree FORCE KEY (path_index) WHERE path BETWEEN '1.2.3.4.5.7' AND
'1.2.3.4.5.7.~' AND tree = 1 FOR UPDATE;
+-----------+
| max_depth |
+-----------+
|         8 |
+-----------+
1 row in set (0.00 sec)
UPDATE tree
SET parent = CASE WHEN parent = 5 THEN 12 ELSE parent END,
path = CONCAT('1.2.3.4.5.10.12',MID(path, 10)),
depth = depth + 7 - 6 + 1
WHERE path BETWEEN '1.2.3.4.5.7' AND '1.2.3.4.5.7.~'
AND tree = 1;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0
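The effect of the UPDATE on a single row can be reproduced in Python. This is a sketch under assumptions: the function move_row and its parameter names are hypothetical. In the example, MID(path, 10) keeps the part of each path that starts at the last '.' of the path of node 7; the Python slice below does the same with a 0-based index.

```python
from typing import Tuple


def move_row(path: str, depth: int,
             node1_path: str, node1_depth: int,
             node2_path: str, node2_depth: int) -> Tuple[str, int]:
    """Return the new (path, depth) of one row of the moved subtree."""
    # 0-based index of the last '.' in the path of node1. SQL MID() is
    # 1-based, so the SQL statement needs this index + 1.
    split_index = node1_path.rindex(".")
    new_path = node2_path + path[split_index:]
    new_depth = depth + node2_depth - node1_depth + 1
    return new_path, new_depth


node1 = ("1.2.3.4.5.7", 6)       # 7 "Lecture Notes Folder"
node2 = ("1.2.3.4.5.10.12", 7)   # 12 "File Exchange Folder"
for path, depth in [("1.2.3.4.5.7", 6), ("1.2.3.4.5.7.8", 7)]:
    print(move_row(path, depth, *node1, *node2))
```

Every row of the subtree keeps its tail relative to node 7 and receives the path of node 12 as its new prefix.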
Analysis
mysql> EXPLAIN SELECT tree, child, parent, depth, path FROM tree WHERE child IN (7, 12) LOCK IN SHARE MODE;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
|  1 | SIMPLE      | tree  | range | child         | child | 4       | NULL |    2 | Using where |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
1 row in set (0.00 sec)
The explain statement of the first query shows that the database can use the key "child" to select the
rows, and that it only has to visit the 2 rows that we need.
mysql> EXPLAIN SELECT MAX(depth) FROM tree WHERE path BETWEEN '1.17.118519' AND '1.17.118519.~' ;
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | tree  | range | path_index    | path_index | 257     | NULL |   69 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)
The explain statement of the second query shows that the database can use the key "path_index" to
select the rows and that it has only to visit as many rows as are in the subtree.
7. Appendix C: Operations on the Adjacency List + Nested Sets Tree data structure
This appendix provides details of the existing Adjacency List + Nested Sets Tree data structure.
7.1. Example repository structure
See Appendix B.
7.2. Implementation of the data structure
The table below shows an implementation of the example repository using the Adjacency List +
Nested Sets data structure as currently implemented in ILIAS 3.10:
tree  child  parent  depth  lft  rgt
   1      1       0      1    1  100106  ILIAS Root
   1      2       1      2    2     703  School of Business Category
   1      3       2      3    3     604  Bachelor Category
   1      4       3      4    4     505  Autumn 2009 Category
   1      5       4      5    5     406  English A09 Course
   1      6       5      6    6       7  Role Folder
   1      7       5      6    8     109  Lecture Notes Folder
   1      8       7      7   10      11  Grammar 101.pdf File
   1      9       5      6  110     111  Qualification Notes Folder
   1     10       5      6  112     313  English A09.01 Group
   1     11      10      7  113     114  Role Folder
   1     12      10      7  115     122  File Exchange Folder
   1     13      12      8  116     117  Role Folder
   1     14      12      8  118     119  Joe's Workbook.doc
   1     15      12      8  120     121  Mary's Workbook.doc File
   1     16       1      2  704  100005  Unknown Category
   …      …       …      …    …       …  99,985 Objects
7.3. Example tree operations
See Appendix B.
7.4. Implementation of the tree operations
The following paragraphs describe the implementation of tree operations in ILIAS 3.10 using the
Adjacency List + Nested Sets data structure.
For each operation, an example and an analysis are given.
7.4.1. getRoot():int
The code is identical to getRoot() in section 6.4.1.
7.4.2. getChildren($node:int):int[]
The code is identical to getChildren() in section 6.4.2.
7.4.3. getParent($node:int):int
The code is identical to getParent() in section 6.4.3.
7.4.4. getPath($node:int):int[]
Algorithm
SELECT tree, parent AS node_parent, depth AS node_depth
FROM tree
WHERE child = $node;
if ($node_depth == 1) {
return {$node};
} elseif ($node_depth == 2) {
return {$node_parent, $node};
} elseif ($node_depth == 3) {
return {1, $node_parent, $node};
} elseif ($node_depth <= 63) {
SELECT d2.parent AS d2_node, d3.parent AS d3_node, …, dn.parent AS dn_node
FROM tree AS dn
JOIN tree AS dn-1 ON dn-1.child = dn.parent
…
JOIN tree AS d3 ON d3.child = d4.parent
JOIN tree AS d2 ON d2.child = d3.parent
WHERE dn.child = $node_parent
AND dn.tree=$tree AND dn-1.tree=$tree AND … AND d2.tree=$tree;
return {1, $d2_node, $d3_node, …, $dn_node, $node_parent, $node};
} else {
SELECT t2.child
FROM tree AS t1
JOIN tree AS t2 ON t1.lft BETWEEN t2.lft AND t2.rgt
WHERE t1.child=$node
AND t1.tree=$tree AND t2.tree=$tree
ORDER BY t2.depth;
return result set;
}
First, the depth of the node in the tree is determined. The id of the parent node is retrieved in the
same statement, which saves an additional SQL statement if the node is located at a depth of 3 or less.
If the depth is 1, ILIAS returns the path {node}.
If the depth is 2, ILIAS returns the path {parent, node}.
If the depth is 3, and if the root node has the well known id 1, ILIAS returns the path {1,
parent, node}.
If the depth is less than or equal to 63, a self-join over the Adjacency List data structure is used.
The self-join becomes more complex the deeper the node is located in the tree. For example, for depth 6
ILIAS uses the statement shown in Example 1 below (assuming that the id of the node at depth 1 is well known).
ILIAS only needs to retrieve 3 path elements there, since it is known that 1 is the reference id of the
root node, and since it can reuse the value of node_parent that it retrieved from the first SELECT
statement of this algorithm.
If the depth is greater than 63, the self-join can not be used because MySQL 4.1 limits the number
of tables in a join to 61. See http://dev.mysql.com/doc/refman/4.1/en/joins-limits.html
In this case ILIAS reverts to the Nested Sets Tree data structure.
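How the self-join grows with the depth of the node can be sketched by generating the statement in Python. This is an illustration, not the ILIAS implementation: the function build_path_query is hypothetical, named placeholders replace the PHP variables, and an in-memory SQLite database stands in for MySQL so that the generated statement can be executed.

```python
import sqlite3


def build_path_query(depth: int) -> str:
    """Build the self-join for a node at the given depth (4 <= depth <= 63).

    The row with alias d<k> has its parent at depth k, so d2.parent ... dn.parent
    are the path elements at depths 2 ... n, with n = depth - 2.
    """
    n = depth - 2
    columns = ", ".join(f"d{k}.parent AS d{k}_node" for k in range(2, n + 1))
    joins = "".join(
        f" JOIN tree AS d{k} ON d{k}.child = d{k + 1}.parent"
        for k in range(n - 1, 1, -1)
    )
    tree_filter = " AND ".join(f"d{k}.tree = :tree" for k in range(n, 1, -1))
    return (f"SELECT {columns} FROM tree AS d{n}{joins} "
            f"WHERE d{n}.child = :node_parent AND {tree_filter}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (tree INTEGER, child INTEGER, parent INTEGER, depth INTEGER)")
conn.executemany(
    "INSERT INTO tree VALUES (1, ?, ?, ?)",  # (child, parent, depth) of the example path
    [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4), (5, 4, 5), (10, 5, 6)],
)

node = 10
node_parent, node_depth = conn.execute(
    "SELECT parent, depth FROM tree WHERE child = ?", (node,)
).fetchone()
ancestors = conn.execute(
    build_path_query(node_depth), {"node_parent": node_parent, "tree": 1}
).fetchone()
path = [1, *ancestors, node_parent, node]
print(build_path_query(node_depth))
print(path)
```

Each additional level adds one JOIN and one tree condition, which is why the statement has to be generated per depth and why the join limit of the database caps the depth that can be handled this way.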
Example 1
This example uses the Adjacency List data structure.
Getting the path to 10 "English A09.01 Group":
mysql> SELECT parent AS node_parent, depth AS node_depth FROM tree WHERE child=10;
+-------------+------------+
| node_parent | node_depth |
+-------------+------------+
|           5 |          6 |
+-------------+------------+
1 row in set (0.00 sec)
mysql> SELECT d2.parent AS d2_node, d3.parent AS d3_node, d4.parent AS d4_node FROM tree AS d4 JOIN tree AS
d3 ON d3.child = d4.parent JOIN tree AS d2 ON d2.child = d3.parent WHERE d4.child = 5 AND d4.tree = 1 AND
d3.tree = 1 AND d2.tree = 1;
+---------+---------+---------+
| d2_node | d3_node | d4_node |
+---------+---------+---------+
|       2 |       3 |       4 |
+---------+---------+---------+
1 row in set (0.00 sec)
The result is {1, 2, 3, 4, 5, 10} -> {"ILIAS Root", "School of Business Category", "Bachelor
Category", "Autumn 2009 Category", "English A09 Course", "English A09.01 Group"}.
Analysis 1
mysql> EXPLAIN SELECT parent AS node_parent, depth AS node_depth FROM tree WHERE child=10;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT d2.parent AS d2_node, d3.parent AS d3_node, d4.parent AS d4_node FROM tree AS d4 JOIN
tree AS d3 ON d3.child = d4.parent JOIN tree AS d2 ON d2.child = d3.parent WHERE d4.child = 5 AND d4.tree =
1 AND d3.tree = 1 AND d2.tree = 1;
+----+-------------+-------+------+-----------------------+-------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys         | key   | key_len | ref                  | rows | Extra       |
+----+-------------+-------+------+-----------------------+-------+---------+----------------------+------+-------------+
|  1 | SIMPLE      | d4    | ref  | child,parent,jmp_tree | child | 4       | const                |    1 | Using where |
|  1 | SIMPLE      | d3    | ref  | child,parent,jmp_tree | child | 4       | ilias_hslu.d4.parent |    1 | Using where |
|  1 | SIMPLE      | d2    | ref  | child,jmp_tree        | child | 4       | ilias_hslu.d3.parent |    1 | Using where |
+----+-------------+-------+------+-----------------------+-------+---------+----------------------+------+-------------+
3 rows in set (0.00 sec)
The main limitation of getPath() using the Adjacency List data structure is that we need one
self-join for every level in the hierarchy, and performance will naturally degrade with each level
added as the joining grows in complexity.
Example 2
This example uses the nested sets data structure.
Getting the path to 10 "English A09.01 Group":
mysql> SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t1.lft BETWEEN t2.lft AND t2.rgt WHERE
t1.child=10 AND t1.tree=1 AND t2.tree=1 ORDER BY t2.depth;
+--------+
| child  |
+--------+
|      1 |
|      2 |
|      3 |
|      4 |
|      5 |
|     10 |
+--------+
6 rows in set (0.00 sec)
The result is {1, 2, 3, 4, 5, 10} -> {"ILIAS Root", "School of Business Category", "Bachelor
Category", "Autumn 2009 Category", "English A09 Course", "English A09.01 Group"}.
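The containment test behind this query can be reproduced in Python with some of the rows from section 7.2. This is a sketch for illustration: the dictionary NODES and the function get_path are assumptions, and only part of the example repository is included.

```python
# child -> (depth, lft, rgt), taken from the table in section 7.2
NODES = {
    1: (1, 1, 100106), 2: (2, 2, 703), 3: (3, 3, 604), 4: (4, 4, 505),
    5: (5, 5, 406), 6: (6, 6, 7), 7: (6, 8, 109), 9: (6, 110, 111),
    10: (6, 112, 313), 11: (7, 113, 114), 12: (7, 115, 122),
    16: (2, 704, 100005),
}


def get_path(node: int) -> list:
    """Return the ids on the path from the root to node, ordered by depth."""
    node_lft = NODES[node][1]
    # A row lies on the path if its lft..rgt interval encloses the lft of node.
    enclosing = [
        (depth, child)
        for child, (depth, lft, rgt) in NODES.items()
        if lft <= node_lft <= rgt
    ]
    return [child for _, child in sorted(enclosing)]


print(get_path(10))
```

The list comprehension tests every row, just like the join condition does when no index can narrow down the candidates.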
Analysis 2
The problem with this approach is that all rows of the tree table need to be inspected, leading to
poor performance on repositories which have more than a few thousand objects.
mysql> EXPLAIN SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t1.lft BETWEEN t2.lft AND t2.rgt WHERE
t1.child=10 AND t1.tree=1 AND t2.tree=1 ORDER BY t2.depth;
+----+-------------+-------+------+----------------+----------+---------+-------+--------+-----------------------------+
| id | select_type | table | type | possible_keys  | key      | key_len | ref   | rows   | Extra                       |
+----+-------------+-------+------+----------------+----------+---------+-------+--------+-----------------------------+
|  1 | SIMPLE      | t2    | ref  | jmp_tree       | jmp_tree | 4       | const | 100000 | Using where; Using filesort |
|  1 | SIMPLE      | t1    | ref  | child,jmp_tree | child    | 4       | const |      1 | Using where                 |
+----+-------------+-------+------+----------------+----------+---------+-------+--------+-----------------------------+
2 rows in set (0.00 sec)
7.4.5. getSubtree($node:int):int[]
Algorithm
SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t2.lft BETWEEN t1.lft AND t1.rgt AND
t1.tree=t2.tree WHERE t1.child=$node;
Subtrees are retrieved using the Nested Sets data structure.
Example
Getting the subtree of 12 "File Exchange Folder":
mysql> SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t2.lft BETWEEN t1.lft AND t1.rgt AND
t1.tree=t2.tree WHERE t1.child=12;
+-------+
| child |
+-------+
|    12 |
|    13 |
|    14 |
|    15 |
+-------+
4 rows in set (1.04 sec)
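The subtree test is the mirror image of the path test: a row belongs to the subtree if its lft value lies inside the interval of the subtree root. A minimal Python sketch, using part of the rows from section 7.2 (the names ROWS and get_subtree are assumptions):

```python
# (child, lft, rgt) for part of the example repository in section 7.2
ROWS = [
    (5, 5, 406), (10, 112, 313), (11, 113, 114), (12, 115, 122),
    (13, 116, 117), (14, 118, 119), (15, 120, 121), (16, 704, 100005),
]


def get_subtree(node: int) -> list:
    """Return the ids of node and all of its descendants."""
    root_lft, root_rgt = next((lft, rgt) for child, lft, rgt in ROWS if child == node)
    # Every row is compared against the interval of the subtree root.
    return [child for child, lft, rgt in ROWS if root_lft <= lft <= root_rgt]


print(get_subtree(12))
```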
Analysis
mysql> EXPLAIN SELECT t2.child FROM tree AS t1 JOIN tree AS t2 ON t2.lft BETWEEN t1.lft AND t1.rgt AND
t1.tree=t2.tree WHERE t1.child=12;
+----+-------------+-------+------+----------------+----------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys  | key      | key_len | ref                | rows   | Extra       |
+----+-------------+-------+------+----------------+----------+---------+--------------------+--------+-------------+
|  1 | SIMPLE      | t1    | ref  | child,jmp_tree | child    | 4       | const              |      1 |             |
|  1 | SIMPLE      | t2    | ref  | jmp_tree       | jmp_tree | 4       | ilias_hslu.t1.tree | 100000 | Using where |
+----+-------------+-------+------+----------------+----------+---------+--------------------+--------+-------------+
2 rows in set (0.00 sec)
The problem with this approach is that all rows of the tree table need to be inspected, leading to
poor performance on repositories which have more than a few thousand objects.
7.4.6. getDepth($node:int):int
The code is identical to getDepth() in section 6.4.6.
7.4.7. isDescendantOf($node1:int, $node2:int):boolean
Algorithm
if ($node2 == 1) {
return true; // all nodes are descendants of the root node
}
if ($node1 == 1) {
return false; // the root node is not a descendant of any other node
}
SELECT child, parent, depth FROM tree WHERE child IN ($node1, $node2);
…assign result set to $node1 and $node2…
if ($node1.parent == $node2) {
return true; // node2 is the parent of node1
}
if ($node2.parent == $node1) {
return false; // node1 is the parent of node2
}
if ($node1.depth <= $node2.depth) {
return false; // node1 is not deeper in the tree than node2
}
If the id of the root node is the well known value 1, and node2 = 1, we return true.
If the id of the root node is the well known value 1, and node1 = 1, we return false.
In all other cases, we first retrieve the parent and the depth of node1 and node2:
If node1_parent is node2, we return true.
If node2_parent is node1, we return false.
If node1_depth is less than or equal to node2_depth, we return false, because a descendant is always
located deeper in the tree than its ancestor.
In all other cases, we have to get the path from node1 up to the depth of node2. This is
similar to the getPath() operation.
For example, if node1_depth is 10 and node2_depth is 4, we can perform the following self-join:
SELECT d4.parent AS d4_node
FROM tree AS d8
JOIN tree AS d7 ON d7.child = d8.parent
JOIN tree AS d6 ON d6.child = d7.parent
JOIN tree AS d5 ON d5.child = d6.parent
JOIN tree AS d4 ON d4.child = d5.parent
WHERE d8.child = $node1_parent;
If d4_node = node2, we return true.
Otherwise we return false.
Analysis
The main limitation of such an approach is that we need one self-join for every level in the
hierarchy, and performance will naturally degrade with each level added as the joining grows in
complexity.
7.4.8. insertInto($node1:int, $node2:int):void
Algorithm
LOCK TABLES tree WRITE;
SELECT depth AS parent_depth, lft AS parent_lft, rgt AS parent_rgt FROM tree WHERE child=$node2;
SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=$node2;
if ($parent_rgt - $used_rgt < 2) {
UPDATE tree
SET lft = CASE WHEN lft > $used_rgt THEN lft + 102 ELSE lft END,
rgt = CASE WHEN rgt > $used_rgt THEN rgt + 102 ELSE rgt END
WHERE tree=1;
}
INSERT INTO tree (tree, child, parent, lft, rgt, depth)
VALUES (1, $node1, $node2, $used_rgt+1, $used_rgt+2, $parent_depth+1);
UNLOCK TABLES;
First we lock table tree.
Then we determine if there is enough space available to insert a new child into node2.
If parent_rgt - used_rgt is smaller than 2, space must be created by reorganizing the tree structure.
To reduce the need for reorganizations, space for 51 nodes (102 lft/rgt values) is created.
The new node is inserted.
Finally we unlock the table.
Example
Insert a new node with id 17 inside 12 "File Exchange Folder".
mysql> LOCK TABLES tree WRITE;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT depth AS parent_depth, lft AS parent_lft, rgt AS parent_rgt FROM tree WHERE child=12;
+--------------+------------+------------+
| parent_depth | parent_lft | parent_rgt |
+--------------+------------+------------+
|            7 |        115 |        122 |
+--------------+------------+------------+
1 row in set (0.00 sec)
mysql> SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=12;
+----------+
| used_rgt |
+----------+
|      121 |
+----------+
1 row in set (0.00 sec)
mysql> UPDATE tree SET lft=CASE WHEN lft>121 THEN lft+102 ELSE lft END, rgt=CASE WHEN rgt>121 THEN rgt+102
ELSE rgt END WHERE tree = 1;
Query OK, 99992 rows affected (2.05 sec)
Rows matched: 100000 Changed: 99992 Warnings: 0
mysql> INSERT INTO tree (tree, child, parent, lft, rgt, depth)
VALUES (1, 17, 12, 122, 123, 8);
mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)
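The gap handling of this example can be replayed in Python on a few rows. This is a sketch under assumptions: the in-memory dictionary, the function insert_into and the fallback to the parent's lft value for a parent without children are not taken from ILIAS.

```python
GAP = 102  # room for 51 nodes, as in the algorithm above


def insert_into(rows: dict, node1: int, node2: int) -> None:
    """Insert node1 as the last child of node2; rows maps child -> row dict."""
    parent = rows[node2]
    child_rgts = [r["rgt"] for r in rows.values() if r["parent"] == node2]
    # Assumption: a parent without children starts filling right after its lft.
    used_rgt = max(child_rgts) if child_rgts else parent["lft"]
    if parent["rgt"] - used_rgt < 2:
        # Not enough room: shift everything to the right of used_rgt.
        for r in rows.values():
            if r["lft"] > used_rgt:
                r["lft"] += GAP
            if r["rgt"] > used_rgt:
                r["rgt"] += GAP
    rows[node1] = {"parent": node2, "depth": parent["depth"] + 1,
                   "lft": used_rgt + 1, "rgt": used_rgt + 2}


rows = {
    10: {"parent": 5, "depth": 6, "lft": 112, "rgt": 313},
    12: {"parent": 10, "depth": 7, "lft": 115, "rgt": 122},
    13: {"parent": 12, "depth": 8, "lft": 116, "rgt": 117},
    14: {"parent": 12, "depth": 8, "lft": 118, "rgt": 119},
    15: {"parent": 12, "depth": 8, "lft": 120, "rgt": 121},
}
insert_into(rows, 17, 12)
print(rows[17], rows[12], rows[10])
```

In the sketch only the ancestors 12 and 10 change, because only five rows are present. In the full table every row to the right of position 121 is shifted, which explains the 99992 changed rows in the example.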
Analysis
mysql> EXPLAIN SELECT depth AS parent_depth, lft AS parent_lft, rgt AS parent_rgt FROM tree WHERE child=12;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=12;
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | tree  | ref  | parent        | parent | 5       | const |    3 | Using where |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
1 row in set (0.00 sec)
In the first select statement, the row can be efficiently retrieved using the index "child".
The second select shows that all children of node2 have to be visited to retrieve the used_rgt value.
The update statement which reorganizes the tree is very inefficient, since all rows of the tree must
be inspected. And - in case of this example - almost all rows had to be changed.
7.4.9. delete($node:int):void
Algorithm
LOCK TABLES tree WRITE;
SELECT tree, parent, lft, rgt FROM tree WHERE child=$node;
DELETE FROM tree WHERE lft BETWEEN $lft AND $rgt;
SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=$parent;
$diff = $rgt - $used_rgt;
if ($diff > 100) {
UPDATE tree
SET
lft=CASE WHEN lft > $used_rgt THEN lft - $diff + 100 ELSE lft END,
rgt=CASE WHEN rgt > $used_rgt THEN rgt - $diff + 100 ELSE rgt END
WHERE tree = 1;
}
UNLOCK TABLES;
First we lock table tree.
Then we retrieve the node data.
Then we delete the subtree starting at the node.
Next we determine the size of the gap in the parent node.
If the gap is larger than 100 (more than 50 nodes fit into it, since each node needs one lft and one rgt value), we reduce the gap to 100, which leaves space for 50 nodes only.
Finally we unlock the table.
Example
Deleting the subtree of 12 "File Exchange Folder":
mysql> LOCK TABLES tree WRITE;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT tree, parent, lft, rgt FROM tree WHERE child=12;
+------+--------+---------+---------+
| tree | parent | lft     | rgt     |
+------+--------+---------+---------+
|    1 |     10 |     115 |     122 |
+------+--------+---------+---------+
1 row in set (0.00 sec)
mysql> DELETE FROM tree WHERE lft BETWEEN 115 AND 122;
Query OK, 4 rows affected (0.23 sec)
mysql> SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=10;
+----------+
| used_rgt |
+----------+
|      114 |
+----------+
1 row in set (0.00 sec)
$diff = $rgt - $used_rgt;
mysql> UPDATE tree
SET
lft=CASE WHEN lft > 114 THEN lft - 8 + 100 ELSE lft END,
rgt=CASE WHEN rgt > 114 THEN rgt - 8 + 100 ELSE rgt END
WHERE tree = 1;
(The update is not done in this example, because the gap is smaller than 100).
mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)
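The decision whether to reorganize after a delete is plain arithmetic and can be sketched in Python. The function name gap_shift_after_delete and its parameters are hypothetical; the threshold of 100 is the one used in the algorithm above.

```python
def gap_shift_after_delete(rgt: int, used_rgt: int, max_gap: int = 100) -> int:
    """Return by how much lft/rgt values right of used_rgt are shifted left.

    rgt is the rgt value of the deleted node, used_rgt the largest rgt value
    among the remaining children of its parent. 0 means no reorganization.
    """
    diff = rgt - used_rgt
    # `lft - diff + max_gap` in the UPDATE is a shift to the left by diff - max_gap.
    return diff - max_gap if diff > max_gap else 0


print(gap_shift_after_delete(rgt=122, used_rgt=114))  # the example above: diff is 8
print(gap_shift_after_delete(rgt=400, used_rgt=114))  # a hypothetical larger gap
```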
Analysis
mysql> EXPLAIN SELECT tree, parent, lft, rgt FROM tree WHERE child=12;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | tree  | ref  | child         | child | 4       | const |    1 |       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT MAX(rgt) AS used_rgt FROM tree WHERE parent=10;
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | tree  | ref  | parent        | parent | 5       | const |    1 | Using where |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM tree WHERE lft BETWEEN 115 AND 122 AND tree=1;
+----+-------------+-------+------+---------------+----------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref   | rows   | Extra       |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+-------------+
|  1 | SIMPLE      | tree  | ref  | jmp_tree      | jmp_tree | 4       | const | 100000 | Using where |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
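The last explain statement shows that selecting the rows of the subtree by their lft values can only
use the key "jmp_tree", so that all rows with tree=1 must be inspected.
The update statement which reorganizes the tree is very inefficient for the same reason, since all
rows with tree=1 must be inspected.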
7.4.10. moveTo($node1:int, $node2:int):void
Algorithm
LOCK TABLES tree WRITE;
SELECT tree, child, parent, depth, lft, rgt FROM tree WHERE child IN ($node1, $node2);
…assign result set to $node1 and $node2…
$spread_diff = $node1.rgt - $node1.lft + 1;
// Create a gap at node2
UPDATE tree
SET lft = CASE
WHEN lft > $node2.rgt THEN lft + $spread_diff
ELSE lft
END,
rgt = CASE
WHEN rgt >= $node2.rgt THEN rgt + $spread_diff
ELSE rgt
END
WHERE tree = $node2.tree;
if ($node1.lft > $node2.rgt) {
// node1 lies to the right of the gap and has been shifted as well
$where_offset = $spread_diff;
$move_diff = $node2.rgt - $node1.lft - $spread_diff;
} else {
$where_offset = 0;
$move_diff = $node2.rgt - $node1.lft;
}
$depth_diff = $node2.depth - $node1.depth + 1;
// Move the node1 subtree to node2
UPDATE tree
SET parent = CASE
WHEN parent = $node1.parent THEN $node2
ELSE parent
END,
rgt = rgt + $move_diff,
lft = lft + $move_diff,
depth = depth + $depth_diff,
tree = $node2.tree
WHERE lft >= $node1.lft + $where_offset
AND rgt <= $node1.rgt + $where_offset
AND tree = $node1.tree;
// close the gap which the subtree left behind at its old position
UPDATE tree
SET lft = CASE
WHEN lft >= $node1.lft + $where_offset THEN lft - $spread_diff
ELSE lft
END,
rgt = CASE
WHEN rgt >= $node1.rgt + $where_offset THEN rgt - $spread_diff
ELSE rgt
END
WHERE tree = $node1.tree;
UNLOCK TABLES;
First we lock table tree.
Then we retrieve the data of node1 and 2.
Next we create a gap in node 2.
We can now move the subtree of node 1 into node 2.
We close the gap that we created in the parent of node 1.
Finally we unlock the table.
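The three update steps can be replayed in Python on a small in-memory tree. This is a sketch under assumptions: the five-node tree returned by example_rows and the function move_to are hypothetical, the tree column is omitted because only one tree is present, and the lft/rgt values are dense instead of containing the gaps that ILIAS keeps.

```python
def example_rows() -> dict:
    """Hypothetical tree: root 1 with children A (2, child 3) and B (4, child 5)."""
    return {
        1: {"parent": 0, "depth": 1, "lft": 1, "rgt": 10},
        2: {"parent": 1, "depth": 2, "lft": 2, "rgt": 5},
        3: {"parent": 2, "depth": 3, "lft": 3, "rgt": 4},
        4: {"parent": 1, "depth": 2, "lft": 6, "rgt": 9},
        5: {"parent": 4, "depth": 3, "lft": 7, "rgt": 8},
    }


def move_to(rows: dict, node1: int, node2: int) -> None:
    """Move the subtree of node1 below node2; rows maps child -> row dict."""
    n1, n2 = dict(rows[node1]), dict(rows[node2])  # values before any update
    spread = n1["rgt"] - n1["lft"] + 1

    # Step 1: create a gap at the right edge of node2.
    for r in rows.values():
        if r["lft"] > n2["rgt"]:
            r["lft"] += spread
        if r["rgt"] >= n2["rgt"]:
            r["rgt"] += spread

    # If node1 lies to the right of the gap, step 1 has shifted it as well.
    if n1["lft"] > n2["rgt"]:
        offset, move = spread, n2["rgt"] - n1["lft"] - spread
    else:
        offset, move = 0, n2["rgt"] - n1["lft"]
    depth_diff = n2["depth"] - n1["depth"] + 1

    # Step 2: move the subtree of node1 into the gap.
    for r in rows.values():
        if r["lft"] >= n1["lft"] + offset and r["rgt"] <= n1["rgt"] + offset:
            if r["parent"] == n1["parent"]:
                r["parent"] = node2
            r["lft"] += move
            r["rgt"] += move
            r["depth"] += depth_diff

    # Step 3: close the gap at the old position of node1.
    for r in rows.values():
        if r["lft"] >= n1["lft"] + offset:
            r["lft"] -= spread
        if r["rgt"] >= n1["rgt"] + offset:
            r["rgt"] -= spread


rows = example_rows()
move_to(rows, 2, 4)  # move A (with child 3) below B
for child, row in sorted(rows.items()):
    print(child, row)
```

Each of the three loops runs over all rows, which corresponds to the three full scans of the update statements.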
Analysis
This algorithm is very inefficient, because each of the three update statements in this algorithm
performs a full table scan.
8. Bibliography
[1] Celko, J. (1999). SQL for Smarties: Advanced SQL Programming, Second Edition. The Morgan Kaufmann Series in Data Management Systems.
[2] Zawodny, J., Balling, D. (2004). High Performance MySQL. O'Reilly Media.