XPath: not(=) <> !=

I’ve been doing some XPath in the last couple of days (with Jaxen) and it’s a powerful tool for extracting information from XHTML. On the downside, the code you end up writing is vulnerable to slight variations in the source documents. That aside, I fell into a couple of traps that wasted my time; I hope someone benefits from my mistakes.

One thing that got me is that x != 'y' is not equivalent to not(x = 'y'). I pored over my XPath statement wondering why I wasn’t getting any results when I remembered that XPath comparisons operate on node-sets, not on individual elements - a realisation that made me feel somewhat foolish. x != 'y' means “there exists at least one node in x whose value does not equal 'y'” whereas not(x = 'y') means “there are no nodes in x whose value exactly equals 'y'”. A subtle distinction, and enough to give you wildly different results. The same applies to XPath used inside XSLT.

For example:

The expression ‘./td[child::text() != 'Author']’ would select both td nodes. However, ‘./td[not(child::text() = 'Author')]’ would select only the second td. This is because there is a text node containing whitespace immediately prior to the div element (this is dependent on your parser settings - whitespace may not be preserved).
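Here is a minimal sketch of the difference. To keep it runnable without extra dependencies it uses the JDK's built-in javax.xml.xpath rather than Jaxen; the two operators behave the same way either way. The markup, the class name NotEqualsDemo and the count helper are all made up for illustration: a row whose first td holds a whitespace text node, a div, and then the text 'Author'.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NotEqualsDemo {
    // Hypothetical row. The first td has two text nodes: the whitespace
    // before the div, and "Author" after it. The second td has one.
    static final String XML =
        "<tr><td>\n  <div>label</div>Author</td><td>Jane Doe</td></tr>";

    // Counts the nodes the expression selects, relative to the tr element.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node row = doc.getDocumentElement();
        NodeList hits =
            (NodeList) xpath.evaluate(expression, row, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // True for any td with at least one text node that is not 'Author';
        // the whitespace node makes the first td match as well. Prints 2.
        System.out.println(count("./td[child::text() != 'Author']"));

        // True only for a td with no text node equal to 'Author'. Prints 1.
        System.out.println(count("./td[not(child::text() = 'Author')]"));
    }
}
```

The default DocumentBuilder keeps the whitespace text node here; a parser configured to drop it would make the first expression select only the second td as well.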

The other thing that came as something of a surprise was what happened when I evaluated an expression against a sub-tree of my document. The sub-tree is still a part of the overall document, so an expression starting with “/” is still evaluated from the document root. To stay inside the sub-tree, the expression needs to be relative to the current node (“.”).
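A small sketch of that trap, again using the JDK's javax.xml.xpath instead of Jaxen so it runs as-is. The table markup, the class name RelativePathDemo and the count helper are invented for the example; the context node is the second row of the table.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativePathDemo {
    // Hypothetical document: two rows, three td elements in total.
    static final String XML =
        "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td></tr></table>";

    // Evaluates the expression with the second tr as the context node
    // and returns how many nodes it selects.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node secondRow =
            (Node) xpath.evaluate("/table/tr[2]", doc, XPathConstants.NODE);
        NodeList hits = (NodeList) xpath.evaluate(
            expression, secondRow, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Absolute: starts at the document root, ignoring the context node.
        System.out.println(count("//td"));  // 3

        // Relative: starts at the context node, so only its own td is found.
        System.out.println(count(".//td")); // 1
    }
}
```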

I don’t claim that any of this is rocket science, but hopefully someone finds it useful.

No Responses to “XPath: not(=) <> !=”

  1. Brian McKendrick Says:

    I would recommend using the XPath engine included in the latest JAXP stack - also part of the Java 1.5 standard libs. I just got done ripping out Jaxen from a bunch of applications for our organization. The major argument is that it’s best to go with standard libs unless there is a real issue with the impl (like GregorianCalendar, it’s a POS) or there is something the 3rd-party lib can offer that isn’t available in the standard dist. Couple that with the fact that Jaxen has been in beta for close to ?4? years now, and I decided it was time to exorcise that dependency. The Jaxen API is a little less cumbersome than the XPath support in JAXP, but it’s just a minor annoyance.

    I decided to wrap the XPath functionality in a very simple wrapper class so if I ever need to switch out XPath engines again, I wouldn’t have to dig through every app’s code again.

    A little off-topic, but I thought I would share what little wisdom I have …