Database-driven tree structures with XML and XSLT
by Pascal Opitz on June 13 2005, 20:56
This article deals with the display of tree-structures that are driven by a database. There are actually a few approaches to transform a 2-dimensional structure into a tree, and it seems odd that most are unknown to many developers.
The most obvious approach is using the parent-ID as a back-reference for recursion like in this example. But then what happens if the tree-structure gets a bit bigger? How about 5 childnodes and a depth of 5 levels? Well, suddenly you end up with 52 database requests and your application becomes incredibly slow... That's why we're just about to bin the idea of using the parent-ID!
Database structure
To do this we have to get familiar with a thing called preordered tree traversal. If you've not come across it before, take a moment to read Storing Hierarchical Data in a Database. Essentially, the idea is to store left and right values for each node and then to figure out whether there is an increase or decrease in the depth of the tree from the difference in the left value between consecutive elements.
+--------------------+-----+-----+ | name | lft | rgt | +--------------------+-----+-----+ | page 1 | 1 | 10 | | page 1.1 | 2 | 3 | | page 1.2 | 4 | 9 | | page 1.2.1 | 5 | 6 | | page 1.2.2 | 7 | 8 | +--------------------+-----+-----+
This is far more efficient than recursively requesting on the ID of the parent element because now we just need two SELECTS:
- The first to get the left and right values of the node that is the root node of the tree to describe.
- The second one to get all dependencies.
Keep in mind that LEFT and RIGHT are SQL keywords, so label your database fields accordingly. I use 'lft' and 'rgt'.
XML tree structure
Now we are about to transform the 2-dimensional structured data in the database into an XML format. The beauty of XML for the purpose of displaying trees is the ability to have nested structures. Also read my previous article about XML as intermediate application layer to find out what else XML has to offer.
Our structure is going to look something like this:
<page id="1" depth="1"> <name><![CDATA[page 1]]></name> <page id="2" depth="2"> <name><![CDATA[page 1.1]]></name> </page> <page id="3" depth="2"> <name><![CDATA[page 1.2]]></name> <page id="4" depth="3"> <name><![CDATA[page 1.2.1]]></name> </page> <page id="5" depth="3"> <name><![CDATA[page 1.2.2]]></name> </page> </page> <page id="6" depth="2"> <name><![CDATA[page 1.3]]></name> </page> </page>
In case you are wondering why I am not writing the page name into an attribute: An attribute cannot contain CDATA or entities. Since I want to keep the structure as flexible as possible I keep the name in a CDATA field that lets us store unencoded markup.
Creating the XML with PHP
In order to create the XML structure we will walk through the result set and trigger the opening and closing tags by comparing the left and right values.
Bear in mind that this function is supposed to be a method of a class that already has the basic toolkits for database handling available. Again, read my previous article to find out more.
function getPageTreeXML($ID = false) { // get left and right values of root-node $q = ($ID) ? 'SELECT lft, rgt FROM pages WHERE ID = ' . $ID : 'SELECT * FROM pages ORDER BY lft ASC LIMIT 1'; $res = $this->DB->query($q); $depth = 0; $ol = mysql_result($res, 0, 'lft'); $or = mysql_result($res, 0, 'rgt'); // get tree branch $q = ' SELECT * FROM pages WHERE lft >= ' . $ol . ' AND rgt <= ' . $or . ' ORDER BY lft ASC'; $res = $this->DB->query($q); // open the pagetree tag $xmlStr = ''; $xmlStr .= '<pagetree>'; while($arr = mysql_fetch_array($res)) { // store old depth $old_depth = $depth; // trigger new depth values if($ol == ($arr['lft'] - 1)) $depth++; if($or < ($arr['rgt'] - 2)) $depth -= (($arr['lft'] - $or) - 1); // if depth doesn't increase close page tag if($old_depth != 0 && $old_depth >= $depth) $xmlStr .= str_repeat('</page>', ($old_depth - $depth) + 1); // write xml data $xmlStr .= '<page id="' . $arr['ID'] . '" depth="' . $depth . '">' . '<name><![CDATA[' . $this->mysqlDecodeText($arr['name']) . ']]></name>'; // store the left and right values for next iteration $ol = $arr["lft"]; $or = $arr["rgt"]; } // close all open tags if($depth > 0) $xmlStr .= str_repeat('</page>', $depth + 1); // close pagetree tag $xmlStr .= '</pagetree>'; return $xmlStr; }
Transforming the XML with XSLT
Now that the XML is created we can easily create any kind of HTML out of it by using XSLT. The following example is a quickly mocked up nested structure to create an indented display of the page-names.
You can create nested XHTML elements easily by using a recursive structure in XSLT. In my example, you can see that the “page” template is calling itself in the for-each loop.
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/pagetree"> <html> <head> <title>pagetree</title> <style type="text/css"> .tree div { padding: 10px 0 10px 10px;} </style> </head> <body> <div class="tree"> <xsl:apply-templates /> </div> </body> </html> </xsl:template> <xsl:template match="page" name="page"> <div> <xsl:value-of select="name" /> <xsl:for-each select="page"> <xsl:call-template name="page" /> </xsl:for-each> </div> </xsl:template> </xsl:stylesheet>
Conclusion
This is a poster child application for seperating your functionality into an XML-rendering layer. I find this method a handy way to generate menus for the front-end and the back-end of an application using the same XML-rendering for both sides. Also, by using XSLT, we have plenty of styling possibilities.
But the best advantage for me in this 3-step process is the first one - the pre-ordered tree traversal. It speeds up your application dramatically and makes it possible to render even large trees with maximum pace.
Comments
In the parent-child approach you only add, and you only read out the parent-id.
But you could on the other hand prevent this by forming some sort of message queue in a background process, or simply repeat the command until it’s successful. One might be quite some work, the other one seems a bit of a dirty solution (although, how do YOU make sure your data is actually written?), but all I want to say is that there’s always a way to use the desired approach, as much as there’s always a way to misuse it.
by Matthias on June 13 2005, 21:08 #
If things are corrupted you could rebuild the left and right values by using parentID, in a cron script for example …
by Pascal Opitz on June 14 2005, 10:56 #
Let’s say I use this for a navigation, how do I get ONLY the direct children for my page, like a 1st level subnav?
Right now I’m thinking I should combine the parent_id approach with the preordered tree traversal, but I didn’t see this question as a drawback anywhere, so I wonder if I missed that somehow…
by Matthias on August 8 2005, 13:43 #
Like that you can, with the XML beeing a whole tree, reuse it for all kind of navigations, you need on your page.
Take a close look in Find your node: Advanced XPATH commands again and you’ll see what I mean.
by Pascal Opitz on August 9 2005, 10:56 #
I’m not saying preorder algorithms are bad but they’re not amazingly better as this article suggests.
by Ben on November 29 2005, 11:05 #
While this is definetely true for big RDBMS, and possibly even for MySQL 4.1 upwards, they are sadly not always available. Without Subselects, I wonder if you could give an example sql for the problem stated above.
Actually, I’d be happy to see any recursive SQL in here, I don’t think that everybody reading this article is aware of this possibility.
by Matthias on January 12 2006, 04:05 #
SortLevel = depth of node, basically
ChildCount = self-explanatory
by cody caughlan on January 13 2006, 12:13 #
This is a very good idea indeed. Even though when you do the traversal at least for the depth level it’s kind of obsolete, since depth can be triggered with just a counter.
@Ben:
I’d be curious to see an example of this as well. In fact I wonder how the performance of a recursive statement would be like. Subselects for example will make the query execution slower, and I guess the same goes for any recursive statement.
Has anyone got any benchmark or something on hand?
by Pascal Opitz on January 18 2006, 05:40 #
If you order by alphabet, you loose the structure of the tree anyway, right? So therefore you don’t need to sort by lft anymore I guess …
The update is by no means a recursion, instead it is just a couple of simple SQL statements, that should even be fast with big amounts of data, pretty similar to the one below:
UPDATE table SET lft = (lft + offset), rgt = (rgt + offset) WHERE lft > startpoint AND rgt < endpoint;
Hope that makes sense to you?
by Pascal Opitz on March 8 2006, 08:43 #
How do you order the results alphabetically if you used order by lft?
How do you move nodes with children to an other node without recreating all the left and right numbers which could be exhausting with 100 000 entries or more? (recursive function)
Geza
by Geza on March 8 2006, 08:29 #
I tried getting my 2000 member list to show up in the above example. I was looking in your previous article that uses adodb but can’t find anything to get your function getPageTreeXML to work. Or did I miss out anything. Appreciate if you could tell me which module to use for the above function. Thanks.
by Sebbie on July 12 2006, 06:39 #
For the example above I used one of my own classes:
by Pascal Opitz on July 12 2006, 18:04 #
Sebbie
by Sebbie on August 14 2006, 05:42 #
It’s not critical for the app though.
by Pascal Opitz on August 15 2006, 10:47 #
by Thomas Hansen on January 18 2007, 12:42 #
by Ernani Cecon Jr. on January 25 2007, 11:21 #
by Ernani Cecon Jr. on January 24 2007, 11:13 #
Quick shot: How about you store the node-data itself in a separate table and join node-guid against data-guid via a join table?
In addition you add a user-id to the tree table.
This would give you the opportunity to store trees for each user and join them against several data nodes without redundancy.
Does that make sense?
by Pascal Opitz on January 24 2007, 11:46 #