Database-driven tree structures with XML and XSLT

by Pascal Opitz on June 13 2005, 20:56

This article deals with the display of tree-structures that are driven by a database. There are actually a few approaches to transform a 2-dimensional structure into a tree, and it seems odd that most are unknown to many developers.

The most obvious approach is using the parent-ID as a back-reference for recursion like in this example. But then what happens if the tree-structure gets a bit bigger? How about 5 childnodes and a depth of 5 levels? Well, suddenly you end up with 5² database requests and your application becomes incredibly slow... That's why we're just about to bin the idea of using the parent-ID!

Database structure

To do this we have to get familiar with a thing called preordered tree traversal. If you've not come across it before, take a moment to read Storing Hierarchical Data in a Database. Essentially, the idea is to store left and right values for each node and then to figure out whether there is an increase or decrease in the depth of the tree from the difference in the left value between consecutive elements.


+--------------------+-----+-----+
| name               | lft | rgt |
+--------------------+-----+-----+
| page 1             |   1 |  10 |
| page 1.1           |   2 |   3 |
| page 1.2           |   4 |   9 |
| page 1.2.1         |   5 |   6 |
| page 1.2.2         |   7 |   8 |
+--------------------+-----+-----+

This is far more efficient than recursively requesting on the ID of the parent element because now we just need two SELECTS:

The first to get the left and right values of the node that is the root node of the tree to describe.
The second one to get all dependencies.

Keep in mind that LEFT and RIGHT are SQL keywords, so label your database fields accordingly. I use 'lft' and 'rgt'.

XML tree structure

Now we are about to transform the 2-dimensional structured data in the database into an XML format. The beauty of XML for the purpose of displaying trees is the ability to have nested structures. Also read my previous article about XML as intermediate application layer to find out what else XML has to offer.

Our structure is going to look something like this:


<page id="1" depth="1">
  <name><![CDATA[page 1]]></name>

  <page id="2" depth="2">
    <name><![CDATA[page 1.1]]></name>
  </page>
  <page id="3" depth="2">
    <name><![CDATA[page 1.2]]></name>

    <page id="4" depth="3">
      <name><![CDATA[page 1.2.1]]></name>
    </page>
    <page id="5" depth="3">
      <name><![CDATA[page 1.2.2]]></name>
    </page>

  </page>
  <page id="6" depth="2">
    <name><![CDATA[page 1.3]]></name>
  </page>
</page>

In case you are wondering why I am not writing the page name into an attribute: An attribute cannot contain CDATA or entities. Since I want to keep the structure as flexible as possible I keep the name in a CDATA field that lets us store unencoded markup.

Creating the XML with PHP

In order to create the XML structure we will walk through the result set and trigger the opening and closing tags by comparing the left and right values.

Bear in mind that this function is supposed to be a method of a class that already has the basic toolkits for database handling available. Again, read my previous article to find out more.


function getPageTreeXML($ID = false)
{
  // get left and right values of root-node
  $q = ($ID) ? 'SELECT lft, rgt FROM pages WHERE ID = '
      . $ID : 'SELECT * FROM pages ORDER BY lft ASC LIMIT 1';

  $res = $this->DB->query($q);

  $depth = 0;
  $ol = mysql_result($res, 0, 'lft');
  $or = mysql_result($res, 0, 'rgt');

  // get tree branch
  $q = '
    SELECT * FROM pages WHERE
      lft >= ' . $ol . '
    AND
      rgt <= ' . $or . '
    ORDER BY lft ASC';

  $res = $this->DB->query($q);

  // open the pagetree tag
  $xmlStr = '';
  $xmlStr .= '<pagetree>';

  while($arr = mysql_fetch_array($res))
  {
    // store old depth
    $old_depth = $depth;

    // trigger new depth values
    if($ol == ($arr['lft'] - 1))
      $depth++;
    if($or < ($arr['rgt'] - 2))
      $depth -= (($arr['lft'] - $or) - 1);

    // if depth doesn't increase close page tag
    if($old_depth != 0 && $old_depth >= $depth)
      $xmlStr .= str_repeat('</page>', ($old_depth - $depth) + 1);

    // write xml data
    $xmlStr .= '<page id="' . $arr['ID'] . '" depth="' .
            $depth . '">' . '<name><![CDATA[' .
            $this->mysqlDecodeText($arr['name']) . ']]></name>';

    // store the left and right values for next iteration
    $ol = $arr["lft"];
    $or = $arr["rgt"];
  }

  // close all open tags
  if($depth > 0)
    $xmlStr .= str_repeat('</page>', $depth + 1);

  // close pagetree tag
  $xmlStr .= '</pagetree>';
  return $xmlStr;
}

Transforming the XML with XSLT

Now that the XML is created we can easily create any kind of HTML out of it by using XSLT. The following example is a quickly mocked up nested structure to create an indented display of the page-names.

You can create nested XHTML elements easily by using a recursive structure in XSLT. In my example, you can see that the “page” template is calling itself in the for-each loop.


<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/pagetree">
    <html>
      <head>
        <title>pagetree</title>
        <style type="text/css">
          .tree div { padding: 10px 0 10px 10px;}
        </style>
      </head>
      <body>
        <div class="tree">
          <xsl:apply-templates />
        </div>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="page" name="page">
    <div>
      <xsl:value-of select="name" />
      <xsl:for-each select="page">
        <xsl:call-template name="page" />
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>

Conclusion

This is a poster child application for seperating your functionality into an XML-rendering layer. I find this method a handy way to generate menus for the front-end and the back-end of an application using the same XML-rendering for both sides. Also, by using XSLT, we have plenty of styling possibilities.

But the best advantage for me in this 3-step process is the first one - the pre-ordered tree traversal. It speeds up your application dramatically and makes it possible to render even large trees with maximum pace.

Comments

Great article, Pascal. I have read somewhere that you might run into trouble if you modify the tree very often, as you have to lock the whole tree when updating it.
In the parent-child approach you only add, and you only read out the parent-id.
But you could on the other hand prevent this by forming some sort of message queue in a background process, or simply repeat the command until it’s successful. One might be quite some work, the other one seems a bit of a dirty solution (although, how do YOU make sure your data is actually written?), but all I want to say is that there’s always a way to use the desired approach, as much as there’s always a way to misuse it.

by Matthias on June 13 2005, 21:08 #
If thing get really bad my fix would probably be to store the parentID as well but still do the tree-thing by left and right …

If things are corrupted you could rebuild the left and right values by using parentID, in a cron script for example …

by Pascal Opitz on June 14 2005, 10:56 #
I just ran into an interesting thought about this:

Let’s say I use this for a navigation, how do I get ONLY the direct children for my page, like a 1st level subnav?
Right now I’m thinking I should combine the parent_id approach with the preordered tree traversal, but I didn’t see this question as a drawback anywhere, so I wonder if I missed that somehow…

by Matthias on August 8 2005, 13:43 #
I’d say just do that in the XSL, with an xpath command. For exapmple the depth value of the node that I’ve put into an attribute helps to do so, but there’s other ways.

Like that you can, with the XML beeing a whole tree, reuse it for all kind of navigations, you need on your page.

Take a close look in Find your node: Advanced XPATH commands again and you’ll see what I mean.

by Pascal Opitz on August 9 2005, 10:56 #
The primary argument against using a parent-id relationship to define the hierarchy seems to be the number of database requests required. However, it is possible to write SQL, with either recursion or loops, to return the structure in one command. You could also pass in a starting node id to only return child nodes.

I’m not saying preorder algorithms are bad but they’re not amazingly better as this article suggests.

by Ben on November 29 2005, 11:05 #
Ben,
While this is definetely true for big RDBMS, and possibly even for MySQL 4.1 upwards, they are sadly not always available. Without Subselects, I wonder if you could give an example sql for the problem stated above.
Actually, I’d be happy to see any recursive SQL in here, I don’t think that everybody reading this article is aware of this possibility.

by Matthias on January 12 2006, 04:05 #
We use the same tree skeleton as a model but added “SortLevel” and “ChildCount” to each row. At insertion/update time we calculate these, which is cheaper than having to do them on-the-fly (more SQL requests). This seems to offer better performance for us, but your mileage may very.

SortLevel = depth of node, basically
ChildCount = self-explanatory

by cody caughlan on January 13 2006, 12:13 #
@cody:
This is a very good idea indeed. Even though when you do the traversal at least for the depth level it’s kind of obsolete, since depth can be triggered with just a counter.

@Ben:
I’d be curious to see an example of this as well. In fact I wonder how the performance of a recursive statement would be like. Subselects for example will make the query execution slower, and I guess the same goes for any recursive statement.
Has anyone got any benchmark or something on hand?

by Pascal Opitz on January 18 2006, 05:40 #
Geza, I don’t see where the problem is?
If you order by alphabet, you loose the structure of the tree anyway, right? So therefore you don’t need to sort by lft anymore I guess …

The update is by no means a recursion, instead it is just a couple of simple SQL statements, that should even be fast with big amounts of data, pretty similar to the one below:

UPDATE table SET lft = (lft + offset), rgt = (rgt + offset) WHERE lft > startpoint AND rgt < endpoint;

Hope that makes sense to you?

by Pascal Opitz on March 8 2006, 08:43 #
There is also a couple of other problems with left and right value approach (Modified Preorder Tree Traversal).

How do you order the results alphabetically if you used order by lft?

How do you move nodes with children to an other node without recreating all the left and right numbers which could be exhausting with 100 000 entries or more? (recursive function)

Geza

by Geza on March 8 2006, 08:29 #
HI Pascal,
I tried getting my 2000 member list to show up in the above example. I was looking in your previous article that uses adodb but can’t find anything to get your function getPageTreeXML to work. Or did I miss out anything. Appreciate if you could tell me which module to use for the above function. Thanks.

by Sebbie on July 12 2006, 06:39 #

Sebbie, in fact the above example doesn’t use ADODB. It uses some kind of not specified DB class. The article you’ve been lookin into doesn’t “use” ADODB either. Instead it mentions ADODB as possibility of abstraction.

For the example above I used one of my own classes:


<
/**
* general database connection class
*/
class DB {
  /**
  * Basic contructor
  */
  function DB() {
       $this->host = "myhost";
       $this->db = "mydb";
       $this->user = "myuser";
       $this->pass = 'mypassword';
  }

  /**
  * Opens connection
  */
  function open() {
       $this->connection = mysql_connect($this->host, $this->user, $this->pass);
       $this->dbselect = mysql_select_db($this->db);
       register_shutdown_function(array(&$this, 'close'));
   }

  /**
  * Executes query, returns resultset
  *
  * @param  string  String containing the SQL-Statement
  * @access  public
  * @return  ressource
  */
   function query($query) {
     $result = mysql_query($query, $this->connection) or die(mysql_error());
       return $result;
   }

  /**
  * closes connection
  */
   function close() {
       mysql_close($this->connection);
   }
}
?>

by Pascal Opitz on July 12 2006, 18:04 #

Hi Pascal, what does mysqlDecodeText do ? Could you email me what it is ? Even better, if you have a complete working example in a zip file that would be nice. Thanks.

Sebbie

by Sebbie on August 14 2006, 05:42 #
It’s just a wrapper for undoing whatever you do before you put it in the database … like add slashes and so on.

It’s not critical for the app though.

by Pascal Opitz on August 15 2006, 10:47 #
You should also check up my blog entry about “Genetic Trees” which has another approach called Genetic Trees or “Materialized Path” that makes subselection of childne and grandchildrens etc an O(1) operation!!

by Thomas Hansen on January 18 2007, 12:42 #
(Reply for item #17): Yes, that’s a solution. However, when I make changes in a node (including a new child, for example), I’ll have to search for all trees that has the modified node, and update them. It seems to be expensive, but we must think of what is more likely: MODIFY a node or SELECT a node. If SELECT’s are more frequent, then I think your suggestion will be the best solution.

by Ernani Cecon Jr. on January 25 2007, 11:21 #
Hi Pascal, congratulations for the article. And what if I increase the complexity a little bit? Let’s assume that, in your example, we would like to store different menu trees in the table, one for administrators and another for ordinary users. Both trees should share some nodes (both should see page 1.2.1, for example), and wouldn’t like to have repeated records for those nodes in the table. How could we do this using the preordered tree traversal method?

by Ernani Cecon Jr. on January 24 2007, 11:13 #
Hi Ernani,

Quick shot: How about you store the node-data itself in a separate table and join node-guid against data-guid via a join table?
In addition you add a user-id to the tree table.

This would give you the opportunity to store trees for each user and join them against several data nodes without redundancy.
Does that make sense?

by Pascal Opitz on January 24 2007, 11:46 #

Content with Style?

Content with Style was a London based blog, focused on web technology and related topics. We wrote posts between 2005 and 2011. This is the archived content from that time.

Database-driven tree structures with XML and XSLT

Database structure

XML tree structure

Creating the XML with PHP

Transforming the XML with XSLT

Conclusion

Comments

Content with Style?

Browse by Date

2011

2010

2009

2008

2007

2006

2005