| From | Sent On | Attachments |
|---|---|---|
| ryan...@bloglines.com | Nov 15, 2004 7:22 am | |
| peter royal | Nov 15, 2004 10:17 am | |
| Baz | Nov 15, 2004 11:08 am | |
| ryan...@bloglines.com | Nov 15, 2004 1:27 pm | |
| Christian Nentwich | Nov 15, 2004 2:08 pm | |
| Baz | Nov 16, 2004 2:46 am | |
| peter royal | Nov 16, 2004 6:17 am | |
| ryan...@bloglines.com | Nov 16, 2004 7:23 am | |
| Baz | Nov 16, 2004 9:08 am | |
| Subject: | Re: [jaxen-user] Java XPath Engine Comparison | |
|---|---|---|
| From: | Baz (bria...@gmail.com) | |
| Date: | Nov 16, 2004 2:46:32 am | |
| List: | org.codehaus.jaxen.user | |
On (1): I've had a look at the OASIS tests. I think the jaxen tests are actually closer to being something that would work cross-platform. It ought to be possible to generate tests identical to the applicable OASIS ones from jaxen-like ones, but I don't think it's entirely automatable the other way around?
Yes, I've read a bunch of those papers. It actually turns out only a handful are directly applicable, because of the jaxen usage model: ie we support multiple object models via navigators, and the api doesn't allow for pre-indexing a document.
eg a problem with optimising with navigators is knowing whether you can cache partial results. To cache you need to know if you're at the same node: the DOM doesn't support equals() so Jaxen uses == for node tests, and there are bugs in jira from people who wanted support for OMs that don't support that either! Although, I reckon this problem could and should be solved by adding an 'equals(Node blah)' to the navigators.
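To make the caching point concrete, here is a minimal sketch of what an identity hook could look like. Everything in it is an assumption for illustration: `NodeIdentity`, `sameNode`, `nodeHash`, `ResultCache` and the toy `Wrapper` object model are hypothetical names, not part of Jaxen's Navigator interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NodeCacheSketch {
    /** Hypothetical hook; NOT part of Jaxen's Navigator interface. */
    interface NodeIdentity {
        boolean sameNode(Object a, Object b);
        int nodeHash(Object node);
    }

    /** Map key that delegates equality and hashing to the object model. */
    static final class Key {
        private final Object node;
        private final NodeIdentity identity;

        Key(Object node, NodeIdentity identity) {
            this.node = node;
            this.identity = identity;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && identity.sameNode(node, ((Key) other).node);
        }

        @Override
        public int hashCode() {
            return identity.nodeHash(node);
        }
    }

    /** Cache of partial results, keyed by whatever the object model calls "the same node". */
    static final class ResultCache {
        private final Map<Key, List<String>> results = new HashMap<>();
        private final NodeIdentity identity;

        ResultCache(NodeIdentity identity) {
            this.identity = identity;
        }

        void put(Object node, List<String> value) {
            results.put(new Key(node, identity), value);
        }

        List<String> get(Object node) {
            return results.get(new Key(node, identity));
        }
    }

    /** Toy object model that hands out a fresh wrapper object on every access. */
    static final class Wrapper {
        final String path;

        Wrapper(String path) {
            this.path = path;
        }
    }

    /** Identity for the toy model: two wrappers denote the same node if their paths match. */
    static final NodeIdentity BY_PATH = new NodeIdentity() {
        public boolean sameNode(Object a, Object b) {
            return ((Wrapper) a).path.equals(((Wrapper) b).path);
        }

        public int nodeHash(Object node) {
            return ((Wrapper) node).path.hashCode();
        }
    };
}
```

With an object model like `Wrapper`, a cache keyed on == would miss whenever the same node is reached through a second wrapper object, while a cache that asks the object model for identity would hit. A hash method is included alongside the equality test only so that a HashMap can be used.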
The pre-indexing thing comes up a lot in the papers (I notice there's a bunch of VLDB papers there, that tends to be their focus). It's important for in-database trees, but is it a win for smaller in-memory documents? I reckon in-database trees are pretty much out of scope for jaxen, especially once vendors start to build xpath2/xquery support into the DB.
There's a lot of good stuff there though. However jaxen's not even trivially optimised right now: it pulls back all nodes when in a boolean context, variables & functions are bound late rather than early, and so on. Here's how I think the road ahead could realistically look (but be warned I have no say in this!)
- 1.1 release. (Now) Bugfixes. Navigator.parseXPath() deprecated.
- 1.2 release. Lazy list optimisation, no API changes required.
- 2.0 release. Break apis to allow for better performing usage: JAXP 1.3 compliance. Navigator.equals(node) added. Navigator.parseXPath() removed.
- 2.1 release. Rewrite of the xpath engine.

The paper I'd thought fitted jaxen best was this one: http://www.dbai.tuwien.ac.at/staff/koch/download/icde2003.pdf
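As a rough illustration of the boolean-context point and the lazy list idea, the sketch below compares collecting every match with stopping at the first one. It is not Jaxen code: the `Node` class, the `visited` counter and the method names are made up for the example, and the traversal is a plain descendant-or-self walk in document order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

class LazyBooleanSketch {
    /** Toy tree node; stands in for whatever the object model provides. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Counts how many nodes a traversal touched (for demonstration only). */
    static int visited = 0;

    /** Pushes children in reverse so the first child is popped first. */
    private static void pushChildren(Deque<Node> stack, Node n) {
        for (int i = n.children.size() - 1; i >= 0; i--) {
            stack.push(n.children.get(i));
        }
    }

    /** Eager: walks the whole tree and collects every match. */
    static List<Node> selectAll(Node root, Predicate<Node> test) {
        List<Node> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            visited++;
            if (test.test(n)) out.add(n);
            pushChildren(stack, n);
        }
        return out;
    }

    /** Lazy: same document-order walk, but only advanced when the caller asks. */
    static Iterator<Node> selectLazily(Node root, Predicate<Node> test) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        return new Iterator<Node>() {
            private Node pending;

            public boolean hasNext() {
                while (pending == null && !stack.isEmpty()) {
                    Node n = stack.pop();
                    visited++;
                    pushChildren(stack, n);
                    if (test.test(n)) pending = n;
                }
                return pending != null;
            }

            public Node next() {
                if (!hasNext()) throw new NoSuchElementException();
                Node n = pending;
                pending = null;
                return n;
            }
        };
    }

    /** Boolean context, eager style: build the full list, then test for emptiness. */
    static boolean existsEager(Node root, Predicate<Node> test) {
        return !selectAll(root, test).isEmpty();
    }

    /** Boolean context, lazy style: stop at the first match. */
    static boolean existsLazy(Node root, Predicate<Node> test) {
        return selectLazily(root, test).hasNext();
    }
}
```

Both versions give the same boolean answer; the lazy one just stops walking once a first match turns up, which is the saving a lazy list would buy without changing the calling API.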
On 15 Nov 2004 21:27:22 -0000, ryan...@bloglines.com <ryan...@bloglines.com> wrote:
+ Current binary releases for all libraries were used.
+ DOM was used since it was supported across all libraries.
Questions for you guys:
1) What are your thoughts on a cross engine compliance and performance harness? Oasis has something like this but it's tied to XSLT. Each library has some sort of test suite. It could even be crafted in such a way that the tests were cross platform ( C#, Ruby, ... ) and the java bits were stand alone.
2) There seems to be a wealth of academic work on optimizations for xpath. Has this body of work been given consideration as it relates to Jaxen?
http://db.cs.utwente.nl/Publications/PaperStore/db-utwente-0000003587.pdf
http://db.uwaterloo.ca/~david/cs848/gottlob-koch-pichler-sigmodrecord.pdf
http://idealliance.org/papers/dx_xmle04/papers/02-03-02/02-03-02.pdf
http://wam.inrialpes.fr/publications/2004/GenevesRose.pdf
http://wam.inrialpes.fr/publications/2004/tphols2004.pdf
http://wam.inrialpes.fr/publications/2004/VionDuryGenevesDocEng2004.pdf
http://www.adrem.ua.ac.be/~hidders/pubs/dbpl2003-avoidsort.pdf
http://www.adrem.ua.ac.be/biborb/bibs/ADReM/papers/DBPL-2003-cr.pdf
http://www.adrem.ua.ac.be/biborb/bibs/ADReM/papers/fernandez03techrep.pdf
http://www.adrem.ua.ac.be/biborb/bibs/ADReM/papers/plan-x2004.pdf
http://www.comp.nus.edu.sg/~chancy/vldbj02.pdf
http://www.cs.ust.hk/vldb2002/VLDB2002-proceedings/papers/S04P02.pdf
http://www.cs.washington.edu/homes/suciu/paper-sigmod2003.pdf
http://www.cs.washington.edu/homes/suciu/paper-xviz.pdf
http://www.csd.uch.gr/~hy561/Papers/XPath-Natix-wise02.pdf
http://www.dbai.tuwien.ac.at/staff/koch/download/icde2003.pdf
http://www.inf.uni-konstanz.de/~grust/files/xpath-accel.pdf
http://www.lfcs.inf.ed.ac.uk/research/database/publications/vldb04_taming.pdf
http://www.pms.ifi.lmu.de/publikationen/PMS-FB/PMS-FB-2001-16.pdf
http://www.pms.ifi.lmu.de/publikationen/PMS-FB/PMS-FB-2002-4.pdf
http://xmltk.sourceforge.net/cikm03.pdf
http://zmo.cwru.edu/433/BLAS.pdf
http://zmo.cwru.edu/433/grust.pdf
--- us...@jaxen.codehaus.org wrote:
well that patch (jaxen 31) should be "considered harmful". Once I dug into it I found there were nasty problems with what I had done, whenever you really need all the elements. There's a bunch of stuff that held me back from doing something better...
- I experimented with trying to use jaxen under a tiger-a-like api[1]; the saxpath and navigator bits are fine, but I couldn't square the rest of it up without breaking things. (BTW did I mention that Navigator.getXPath() is evil? it prevents reuse of navigators with a different xpath api...)
- there's better xpath algorithms that could be used instead of just making jaxen fail-fast; but these would pretty much require api changes.
- the automated builds for the release never got going, so I lost a bit of motivation...
- and of course there's lotsa non-jaxen stuff to do...
... however when I last looked, I think the only way to get my patch to work would be to go back to jaxen as it is now and replace all the Lists with implementations that are lazy. Which seemed like a pile of work.
I'm curious about a couple of the results:
substring('12345', -42, 1 div 0) (jaxen fails)
Is this the release version of jaxen? I think the one in cvs does this
correctly (I seem to remember fixing it). Ho hum, I'll have to check.
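For reference, the XPath 1.0 recommendation lists this exact expression as returning "12345": a character is kept when its 1-based position is at least round(start) and less than round(start) + round(length). The sketch below spells that rule out in plain Java. It is not Jaxen's implementation, the class and method names are made up, and it counts Java chars rather than Unicode characters.

```java
class XPathSubstringSketch {
    /** XPath 1.0 round(): nearest integer, ties go up; NaN and infinities pass through. */
    public static double xpathRound(double x) {
        return Math.floor(x + 0.5);
    }

    /** substring(s, start, length) following the position rule in the XPath 1.0 spec. */
    public static String substring(String s, double start, double length) {
        double first = xpathRound(start);
        double end = first + xpathRound(length); // may be infinite or NaN
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            double pos = i + 1; // XPath positions are 1-based
            // Any comparison against NaN is false, so NaN bounds select nothing.
            if (pos >= first && pos < end) {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }
}
```

Here `1 div 0` corresponds to `Double.POSITIVE_INFINITY`, so `substring("12345", -42, Double.POSITIVE_INFINITY)` keeps every character.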
Also, there's no mention of what object model was used with jaxen? I
notice there are preceding:: tests in there which I think jaxen only
does right on the DOM navigator?
-Baz
[1] I'm not sure how much I like that api - you can't register functions in the default namespace, and it's missing ones like current(). But it's a standard - a blessing and a curse.
On Mon, 15 Nov 2004 13:17:24 -0500, peter royal <pete...@pobox.com> wrote:
On Nov 15, 2004, at 10:22 AM, ryan...@bloglines.com wrote:
I've recently posted some comparison findings between available Java based XPath engines. Jaxen is included.
38913ACE763A57A646F07BEF0C13CE52.txt
Thanks for doing the comparison Ryan.. Incentive for us(me) to get off my lazy ass and apply Brian's patch!
-pete





