Content with Style

Find your node: Advanced XPATH commands

by Pascal Opitz on May 21 2005, 11:49

XSLT and XPATH?

All XSLT does is apply code templates to XML nodes. To do this, you need to find the right node, and XPATH offers an advanced toolkit for doing that within an XSL file.

Go dynamic

XPATH and XSLT offer dynamic features that are more reminiscent of a scripting language than of a stylesheet language. Keep in mind, though, that all you can do is transform the data you've got. On a more abstract level it is still just styling the content, except that this time variables and control structures let you choose what to display.

Locations in XPATH

Let's have a look at a simple, nested XML structure:

<root>
  <node>
    <subnode />
    <subnode />
    <subnode />
  </node>
  <node>
    <subnode>
      <subsubnode />
    </subnode>
  </node>
</root>

In order to navigate between the nodes you'll need to use Location Path Expressions. The basic ones work pretty much the same way as paths in an operating-system environment. /root/node/subnode, for example, would be the absolute path to all subnode elements in the example above.
On top of that, XPATH offers different types of axes like ancestor, parent or child. Even attributes can be selected with the attribute axis, and you can use * as a wildcard.
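To try the path expressions outside of an XSL file, here is a minimal sketch in Python. It assumes the standard library module xml.etree.ElementTree, which only understands a subset of XPATH: the abbreviated syntax works, the named axes such as ancestor:: do not. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from above.
XML = """
<root>
  <node>
    <subnode />
    <subnode />
    <subnode />
  </node>
  <node>
    <subnode>
      <subsubnode />
    </subnode>
  </node>
</root>
"""

root = ET.fromstring(XML)

# ElementTree paths are relative to the element they are called on, so the
# absolute XPATH /root/node/subnode becomes node/subnode seen from <root>.
subnodes = root.findall("node/subnode")
print(len(subnodes))  # 4

# * is a wildcard for any element name.
grandchildren = root.findall("*/*")
print(len(grandchildren))  # 4

# .. steps up to the parent element (the abbreviated parent axis).
parents = root.findall("node/subnode/subsubnode/..")
print([e.tag for e in parents])  # ['subnode']
```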

A full reference can be found at w3schools.

Node-tests

On top of these axes you can test nodes against expressions built from operators, which most of you will know from scripting or programming languages.
There are arithmetic operators like + and -, but also relational ones like =, > and !=. Keep in mind that an XSL file has to be well-formed XML, so < must be escaped as &lt; inside an attribute; > is usually written as &gt; as well.
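The escaping rule is easy to check with any XML parser. The following sketch assumes Python's standard xml.etree.ElementTree module; the helper name read_test is hypothetical. It shows that the parser hands over the unescaped expression, and that a raw < in an attribute value is rejected.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def read_test(fragment):
    """Parse an xsl:if fragment and return its unescaped test expression."""
    return ET.fromstring(fragment).get("test")


escaped = f'<xsl:if xmlns:xsl="{XSL_NS}" test="position() &lt; 4" />'
raw = f'<xsl:if xmlns:xsl="{XSL_NS}" test="position() < 4" />'

# The parser turns &lt; back into < before the expression is evaluated.
print(read_test(escaped))  # position() < 4

# A raw < inside an attribute value is not well-formed XML.
try:
    read_test(raw)
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False
print(raw_is_well_formed)  # False
```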

Functions in XPATH

For those who have already used basic XSLT functionality, the functions won't be new either. The function current(), for example, which XSLT adds to the XPATH function library, returns the node that is currently being processed, for instance inside a loop.
But there are also string functions like contains() or substring(). Those can be used to manipulate the data that you output or use for locating nodes.

Again, find a full list of XPATH functions at w3schools.

Combine them!

Now you can combine the location path, the axes and the node tests to get the node you really want. The syntax of each step is axisname::nodetest[predicate].
I'll fit everything into one expression and you'll immediately see what I mean:

current()/child::*[attribute::type='classic']

This expression selects all child elements of the current node that have the attribute type with the value 'classic'.
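As a rough illustration, the abbreviated form of that step, *[@type='classic'], can be tried in Python with the standard xml.etree.ElementTree module. The sample data below is hypothetical (element names and attribute values are made up), and the variable current merely stands in for the node that current() would return in a stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for illustration.
XML = """
<collection>
  <album type="classic"><title>First</title></album>
  <album type="modern"><title type="classic">Second</title></album>
  <single type="classic"><title>Third</title></single>
</collection>
"""

current = ET.fromstring(XML)  # stands in for the node returned by current()

# Abbreviated form of child::*[attribute::type='classic'].
matches = current.findall("*[@type='classic']")
result = [(e.tag, e.findtext("title")) for e in matches]
print(result)  # [('album', 'First'), ('single', 'Third')]
```

The title element with type="classic" is not selected, because it is a grandchild of the current node and the child axis only looks one level down.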

An example please!

I know, that was pretty abstract, so let's move straight on to a practical example. We will use XPATH expressions to display a node-set. Remember that with server-side techniques or client-side scripting the selection could also be driven by a variable.
Let's have a look at the piece of XML that will be transformed.

<?xml version="1.0" ?>
<company_list>
  <company country="uk">
    <name>Company 1</name>
    <sales>3200900</sales>
    <employees>250</employees>
  </company>
  <company country="usa">
    <name>4th capitalist</name>
    <sales>102310000</sales>
    <employees>3050</employees>
  </company>

....

  <company country="uk">
    <name>UK stores</name>
    <sales>12300000</sales>
    <employees>3301</employees>
  </company>
  <company country="uk">
    <name>THEUSTOYSTORES</name>
    <sales>22200000</sales>
    <employees>18639</employees>
  </company>
</company_list>

What we're going to do now is find the top 3 in the UK by the ratio of employees to sales.
Let's have a look at the XSL file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/company_list">
  <html>
    <body style="font-family: Arial">
      <h1>Top 3 stores in the uk regarding employees per sales</h1>
      <ul>
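        <xsl:for-each select="company[@country = 'uk']">
          <!-- data-type="number" makes the sort numeric; the default is a text sort -->
          <xsl:sort select="sales div employees" data-type="number"/>
          <xsl:if test="position() &lt; 4">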
            <li>
              Company: <xsl:value-of select="name" />
              <br />
              Employees: <xsl:value-of select="employees" />
              <br />
              Sales: <xsl:value-of select="sales" />
            </li>
          </xsl:if>
        </xsl:for-each>
      </ul>
    </body>
  </html>
  </xsl:template>
</xsl:stylesheet>
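As a cross-check outside XSLT, the same selection can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under some assumptions: it uses just the four records shown above (the elided ones are left out), the helper name sales_per_employee is made up, and the sort key is treated as a number.

```python
import xml.etree.ElementTree as ET

# Only the four records shown in the article; the elided ones are left out.
XML = """<company_list>
  <company country="uk">
    <name>Company 1</name><sales>3200900</sales><employees>250</employees>
  </company>
  <company country="usa">
    <name>4th capitalist</name><sales>102310000</sales><employees>3050</employees>
  </company>
  <company country="uk">
    <name>UK stores</name><sales>12300000</sales><employees>3301</employees>
  </company>
  <company country="uk">
    <name>THEUSTOYSTORES</name><sales>22200000</sales><employees>18639</employees>
  </company>
</company_list>"""

root = ET.fromstring(XML)


def sales_per_employee(company):
    """Numeric equivalent of the sort key 'sales div employees'."""
    return float(company.findtext("sales")) / float(company.findtext("employees"))


# Same filter as company[@country = 'uk'] in the stylesheet.
uk = root.findall("company[@country='uk']")

# Ascending by sales per employee, i.e. most employees per sales first,
# then keep the first three like position() < 4 does.
top3 = sorted(uk, key=sales_per_employee)[:3]
names = [c.findtext("name") for c in top3]
print(names)  # ['THEUSTOYSTORES', 'UK stores', 'Company 1']

# For contrast: comparing the same keys as strings gives a different order.
text_order = [c.findtext("name")
              for c in sorted(uk, key=lambda c: str(sales_per_employee(c)))]
print(text_order)  # ['THEUSTOYSTORES', 'Company 1', 'UK stores']
```

The second result hints at why the sort key's data type matters: xsl:sort compares values as text unless data-type="number" is set.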

Download the files here:
company_list.xml
uk_top3.xsl

Conclusion

That wasn't so bad, but now imagine what you could do with dynamic variables. With the MSXML toolkit or Sarissa, which both provide the method selectSingleNode(), these expressions are a piece of cake.
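To give an idea of what an expression built from a variable looks like, here is a small sketch in Python. It is not MSXML or Sarissa code: find() from the standard xml.etree.ElementTree module is only a rough stand-in for selectSingleNode(), and the function name first_company is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened version of the company list from the example above.
XML = """<company_list>
  <company country="uk"><name>Company 1</name></company>
  <company country="usa"><name>4th capitalist</name></company>
  <company country="uk"><name>UK stores</name></company>
</company_list>"""

root = ET.fromstring(XML)


def first_company(root, country):
    """Return the name of the first company in a country, or None."""
    # The country code is put into the expression at run time.
    # Only do this with trusted values: a quote in the input breaks the path.
    node = root.find(f"company[@country='{country}']")
    return None if node is None else node.findtext("name")


print(first_company(root, "usa"))  # 4th capitalist
print(first_company(root, "uk"))   # Company 1
print(first_company(root, "de"))   # None
```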

Comments

  • A little while ago I was asked to look at some XSLT for someone who had done some changes to a template they didn’t originally create, but who now couldn’t figure out what the hell was going on. You know, the usual stuff.

    by Rakshi on September 22 2006, 04:15 #

  • Hey there.
    Try to use count() and text():

    
    //product/categories[count(cat[text() = '2']) > 0 and count(cat[text() = '3']) > 0]/../@name
    


    You can create that in a loop as well :)

    by Pascal Opitz on January 31 2007, 07:38 #

  • I’m having an xpath issue trying to work out how to select nodes where two child nodes are present.

    My XML format is along these lines:

    <categories>
       <cat id="1" name="Type A" />
       <cat id="2" name="Type B" />
       <cat id="3" name="Type 1" />
       <cat id="4" name="Type 2" />
       <cat id="5" name="Type 3" />
    </categories>

    <products>
       <product id="1" name="Product 1">
         <categories>
           <cat>1</cat>
           <cat>3</cat>
         </categories>
       </product>
       <product id="2" name="Product 2">
         <categories>
           <cat>2</cat>
           <cat>3</cat>
           <cat>4</cat>
         </categories>
       </product>
    </products>

    Because of the nature of the products they may belong to one or more categories, and I need to be able to filter by one or more categories, so I’m trying to do a query along the lines of:

    “product/categories[cat=3 and cat=2]/../@name”

    constructing the cat=X part on the fly

    I did consider comma-separating my cat ids in the product and using contains(), but then I’d get the “22 contains 2” type of problem

    Any pointers very gratefully received

    by tim harwood on January 31 2007, 07:06 #