Thread: 9 messages in org.codehaus.jaxen.user

From                    Sent on
ryan...@bloglines.com   Nov 15, 2004 7:22 am
peter royal             Nov 15, 2004 10:17 am
Baz                     Nov 15, 2004 11:08 am
ryan...@bloglines.com   Nov 15, 2004 1:27 pm
Christian Nentwich      Nov 15, 2004 2:08 pm
Baz                     Nov 16, 2004 2:46 am
peter royal             Nov 16, 2004 6:17 am
ryan...@bloglines.com   Nov 16, 2004 7:23 am
Baz                     Nov 16, 2004 9:08 am
Subject:Re: [jaxen-user] Java XPath Engine Comparison
From:Baz (bria@gmail.com)
Date:Nov 16, 2004 2:46:32 am
List:org.codehaus.jaxen.user

On (1): I've had a look at the OASIS tests. I think the jaxen tests are actually closer to being something that would work cross-platform. It ought to be possible to generate tests identical to the applicable OASIS ones from jaxen-like ones, but I don't think it's entirely automatable the other way around?

Yes, I've read a bunch of those papers. It turns out only a handful are directly applicable, because of the jaxen usage model: i.e. we support multiple object models via navigators, and the API doesn't allow for pre-indexing a document.

E.g. one problem with optimising through navigators is knowing whether you can cache partial results. To cache, you need to know whether you're at the same node: the DOM doesn't support equals(), so Jaxen uses == for node tests, and there are bugs in JIRA from people who wanted support for OMs that don't support that either! Still, I reckon this problem could and should be solved by adding an 'equals(Node blah)' to the navigators.
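To make the caching problem concrete, here is a minimal sketch of a partial-result cache whose keys compare nodes through a pluggable identity strategy instead of `equals()`. `NodeIdentity`, `ReferenceIdentity` and `PartialResultCache` are hypothetical names for illustration, not part of Jaxen's Navigator API; the default strategy falls back to `==`, the test Jaxen relies on today.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical hook, roughly what an equals(Node) on the navigator could offer.
// Not part of Jaxen's actual Navigator interface.
interface NodeIdentity {
    boolean sameNode(Object a, Object b);
    int nodeHash(Object node);
}

// Default strategy: reference identity, i.e. the == test.
final class ReferenceIdentity implements NodeIdentity {
    public boolean sameNode(Object a, Object b) { return a == b; }
    public int nodeHash(Object node) { return System.identityHashCode(node); }
}

// Caches the result of evaluating a sub-expression per context node.
final class PartialResultCache {
    private final NodeIdentity identity;
    private final Map<Key, List<Object>> results = new HashMap<>();
    private int evaluations = 0;

    PartialResultCache(NodeIdentity identity) { this.identity = identity; }

    // Returns the cached result for this context node, evaluating only on a miss.
    List<Object> get(Object contextNode, Function<Object, List<Object>> evaluate) {
        return results.computeIfAbsent(new Key(contextNode), k -> {
            evaluations++;
            return evaluate.apply(contextNode);
        });
    }

    int evaluations() { return evaluations; }

    // Map key whose equality and hash are delegated to the NodeIdentity strategy.
    private final class Key {
        private final Object node;

        Key(Object node) { this.node = node; }

        @Override public boolean equals(Object o) {
            return o instanceof Key && identity.sameNode(node, ((Key) o).node);
        }

        @Override public int hashCode() { return identity.nodeHash(node); }
    }
}
```

An object model that hands back a fresh wrapper for the same underlying node on every navigation step would plug in its own `NodeIdentity` comparing the wrapped objects; with plain `==` such a model would never get a cache hit.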

The pre-indexing thing comes up a lot in the papers (I notice there are a bunch of VLDB papers there, and that tends to be their focus). It's important for in-database trees, but is it a win for smaller in-memory documents? I reckon in-database trees are pretty much out of scope for jaxen, especially once vendors start to build XPath 2.0/XQuery support into the DB.
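For comparison, the kind of pre-indexing those papers assume can be as simple as numbering every node in preorder and postorder during one traversal, after which an ancestor/descendant test is two integer comparisons (the pre/post numbering idea associated with the xpath-accel.pdf paper in the list below). This is a minimal sketch over a hypothetical `IndexedNode` class; it needs to own the whole document before any query runs, which is exactly what the navigator API doesn't allow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory node with slots for its preorder/postorder ranks.
final class IndexedNode {
    final String name;
    final List<IndexedNode> children;
    int pre = -1;
    int post = -1;

    IndexedNode(String name, IndexedNode... children) {
        this.name = name;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
}

final class PrePostIndex {
    private int nextPre = 0;
    private int nextPost = 0;

    // One depth-first pass: a node gets its pre rank before its children
    // and its post rank after all of them.
    void index(IndexedNode node) {
        node.pre = nextPre++;
        for (IndexedNode child : node.children) {
            index(child);
        }
        node.post = nextPost++;
    }

    // v is a descendant of u iff v starts after u and finishes before u.
    static boolean isDescendant(IndexedNode v, IndexedNode u) {
        return u.pre < v.pre && v.post < u.post;
    }
}
```

The cost is up front: every node must be visited once, and the ranks go stale if the document is modified.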

There's a lot of good stuff there, though. However, jaxen isn't even trivially optimised right now: it pulls back all nodes when in a boolean context, variables and functions are bound late rather than early, and so on. Here's how I think the road ahead could realistically look (but be warned, I have no say in this!):

- 1.1 release (now). Bugfixes. Navigator.parseXPath() deprecated.
- 1.2 release. Lazy list optimisation; no API changes required.
- 2.0 release. Break APIs to allow for better-performing usage: JAXP 1.3 compliance. Navigator.equals(node) added. Navigator.parseXPath() removed.
- 2.1 release. Rewrite of the XPath engine. The paper I thought fitted jaxen best was this one: http://www.dbai.tuwien.ac.at/staff/koch/download/icde2003.pdf
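To illustrate the "bound late rather than early" point: a minimal sketch, using hypothetical `FunctionRegistry`, `LateBoundCall` and `EarlyBoundCall` classes (not Jaxen's function-context API), of resolving a function name on every evaluation versus once when the expression is compiled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical function registry standing in for a function context.
final class FunctionRegistry {
    private final Map<String, DoubleUnaryOperator> functions = new HashMap<>();
    private int lookups = 0;

    void register(String name, DoubleUnaryOperator f) { functions.put(name, f); }

    // Resolves a function by name, counting how often resolution happens.
    DoubleUnaryOperator resolve(String name) {
        lookups++;
        DoubleUnaryOperator f = functions.get(name);
        if (f == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return f;
    }

    int lookups() { return lookups; }
}

// Late binding: the name is looked up again on every evaluation.
final class LateBoundCall {
    private final FunctionRegistry registry;
    private final String name;

    LateBoundCall(FunctionRegistry registry, String name) {
        this.registry = registry;
        this.name = name;
    }

    double evaluate(double arg) { return registry.resolve(name).applyAsDouble(arg); }
}

// Early binding: the name is resolved once, when the expression is compiled.
final class EarlyBoundCall {
    private final DoubleUnaryOperator target;

    EarlyBoundCall(FunctionRegistry registry, String name) {
        this.target = registry.resolve(name); // unknown names fail here, not at run time
    }

    double evaluate(double arg) { return target.applyAsDouble(arg); }
}
```

The trade-off: an early-bound call no longer sees functions registered or replaced after compilation, and an unknown name is reported when the expression is compiled rather than when it runs.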

On 15 Nov 2004 21:27:22 -0000, ryan@bloglines.com <ryan@bloglines.com> wrote:

+ Current binary releases for all libraries were used.

+ DOM was used since it was supported across all libraries.

Questions for you guys:

1) What are your thoughts on a cross-engine compliance and performance harness? OASIS has something like this, but it's tied to XSLT. Each library has some sort of test suite. It could even be crafted so that the tests were cross-platform (C#, Ruby, ...) and the Java bits were standalone.
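One way the Java side of such a harness could look: a minimal sketch that runs (expression, expected string value) pairs against a DOM through the standard JAXP 1.3 `javax.xml.xpath` API, so any engine that provides an `XPathFactory` could be plugged in. The class and method names are hypothetical; the example uses the JDK's default `XPathFactory`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical cross-engine harness: each case maps an XPath expression
// to the string value it is expected to produce.
final class XPathConformance {

    // Parses an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse test document", e);
        }
    }

    // Builds an ordered case table from alternating expression/expected pairs.
    static Map<String, String> cases(String... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected expression/value pairs");
        }
        Map<String, String> table = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            table.put(pairs[i], pairs[i + 1]);
        }
        return table;
    }

    // Runs every case against one engine and returns a description of each failure.
    static List<String> run(XPathFactory engine, Document doc, Map<String, String> table) {
        XPath xpath = engine.newXPath();
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> c : table.entrySet()) {
            try {
                String actual = xpath.evaluate(c.getKey(), doc);
                if (!actual.equals(c.getValue())) {
                    failures.add(c.getKey() + ": expected '" + c.getValue() + "', got '" + actual + "'");
                }
            } catch (XPathExpressionException e) {
                failures.add(c.getKey() + ": threw " + e);
            }
        }
        return failures;
    }
}
```

Only the factory is engine-specific, so the same case tables can be run against each implementation and the failure lists compared.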

2) There seems to be a wealth of academic work on optimizations for XPath. Has this body of work been given consideration as it relates to Jaxen?

http://db.cs.utwente.nl/Publications/PaperStore/db-utwente-0000003587.pdf

http://db.uwaterloo.ca/~david/cs848/gottlob-koch-pichler-sigmodrecord.pdf

http://idealliance.org/papers/dx_xmle04/papers/02-03-02/02-03-02.pdf

http://pi3.informatik.uni-mannheim.de/downloads/hauptstudium/seminare/folien/seminar_matthias_071103.pdf

http://wam.inrialpes.fr/publications/2004/GenevesRose.pdf

http://wam.inrialpes.fr/publications/2004/tphols2004.pdf

http://wam.inrialpes.fr/publications/2004/VionDuryGenevesDocEng2004.pdf

http://www.adrem.ua.ac.be/~hidders/pubs/dbpl2003-avoidsort.pdf

http://www.adrem.ua.ac.be/biborb/bibs/ADReM/papers/DBPL-2003-cr.pdf

http://www.adrem.ua.ac.be/biborb/bibs/ADReM/papers/fernandez03techrep.pdf

http://www.adrem.ua.ac.be/biborb/bibs/ADReM/papers/plan-x2004.pdf

http://www.comp.nus.edu.sg/~chancy/vldbj02.pdf

http://www.cs.ust.hk/vldb2002/VLDB2002-proceedings/papers/S04P02.pdf

http://www.cs.washington.edu/homes/suciu/paper-sigmod2003.pdf

http://www.cs.washington.edu/homes/suciu/paper-xviz.pdf

http://www.csd.uch.gr/~hy561/Papers/XPath-Natix-wise02.pdf

http://www.dbai.tuwien.ac.at/staff/koch/download/icde2003.pdf

http://www.inf.uni-konstanz.de/~grust/files/xpath-accel.pdf

http://www.lfcs.inf.ed.ac.uk/research/database/publications/vldb04_taming.pdf

http://www.pms.ifi.lmu.de/publikationen/PMS-FB/PMS-FB-2001-16.pdf

http://www.pms.ifi.lmu.de/publikationen/PMS-FB/PMS-FB-2002-4.pdf

http://xmltk.sourceforge.net/cikm03.pdf

http://zmo.cwru.edu/433/BLAS.pdf

http://zmo.cwru.edu/433/grust.pdf

--- us@jaxen.codehaus.org wrote:

Well, that patch (jaxen 31) should be "considered harmful". Once I dug into it I found there were nasty problems with what I had done whenever you really need all the elements. There's a bunch of stuff that held me back from doing something better...

- I experimented with trying to use jaxen under a Tiger-alike API[1]; the saxpath and navigator bits are fine, but I couldn't square the rest of it up without breaking things. (BTW, did I mention that Navigator.getXPath() is evil? It prevents reuse of navigators with a different XPath API...)
- There are better XPath algorithms that could be used instead of just making jaxen fail-fast, but these would pretty much require API changes.
- The automated builds for the release never got going, so I lost a bit of motivation...
- And of course there's lots of non-jaxen stuff to do...

...however, when I last looked, I think the only way to get my patch to work would be to go back to jaxen as it is now and replace all the Lists with lazy implementations. Which seemed like a pile of work.
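A lazy replacement for those Lists could look roughly like this minimal sketch: a read-only `List` (hypothetical `LazyList`, not a Jaxen class) that pulls from an underlying node iterator only as far as a caller actually looks. A boolean-context check such as `isEmpty()` then touches one node instead of the whole result, while anything that calls `size()` still forces full evaluation.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical read-only List that materialises elements from a source
// Iterator on demand and keeps them so repeated access is cheap.
final class LazyList<T> extends AbstractList<T> {
    private final Iterator<T> source;
    private final List<T> buffer = new ArrayList<>();

    LazyList(Iterator<T> source) { this.source = source; }

    // Pulls from the source until `index` is buffered or the source is exhausted.
    private boolean fillTo(int index) {
        while (buffer.size() <= index && source.hasNext()) {
            buffer.add(source.next());
        }
        return index < buffer.size();
    }

    @Override public T get(int index) {
        if (index < 0 || !fillTo(index)) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        return buffer.get(index);
    }

    // size() genuinely needs every element, so it drains the source.
    @Override public int size() {
        while (source.hasNext()) {
            buffer.add(source.next());
        }
        return buffer.size();
    }

    // Boolean-context check: needs at most one element.
    @Override public boolean isEmpty() { return !fillTo(0); }

    // AbstractList's default iterator compares against size() on every step,
    // which would force full evaluation, so iterate lazily instead.
    @Override public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int cursor = 0;

            @Override public boolean hasNext() { return fillTo(cursor); }

            @Override public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return buffer.get(cursor++);
            }
        };
    }

    // Number of elements pulled from the source so far (for demonstration).
    int buffered() { return buffer.size(); }
}
```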

I'm curious about a couple of the results:

substring('12345', -42, 1 div 0) (jaxen fails)

Is this the release version of jaxen? I think the one in CVS does this correctly (I seem to remember fixing it). Ho hum, I'll have to check.
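For reference, the XPath 1.0 spec defines `substring()` in terms of rounded positions and IEEE 754 arithmetic, and lists `substring("12345", -42, 1 div 0)` as returning `"12345"`: round(-42) + Infinity is Infinity, so every position qualifies. A minimal sketch of those rules in Java (the class name is hypothetical, and this is not jaxen's implementation):

```java
// Sketch of XPath 1.0 substring() semantics; the class name is hypothetical.
final class XPathSubstring {
    // XPath round(): nearest integer, ties toward positive infinity;
    // NaN and the infinities pass through unchanged.
    // floor(d + 0.5) is a simplification that can misround values a hair below .5.
    static double round(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return d;
        }
        return Math.floor(d + 0.5);
    }

    // Three-argument form: keep characters at 1-based positions p with
    // round(start) <= p < round(start) + round(length).
    // Comparisons involving NaN are false, so a NaN bound yields "".
    static String substring(String s, double start, double length) {
        double first = round(start);
        double end = first + round(length); // e.g. -Infinity + Infinity = NaN
        return select(s, first, end, true);
    }

    // Two-argument form: keep every character at position p >= round(start).
    static String substring(String s, double start) {
        return select(s, round(start), Double.NaN, false);
    }

    private static String select(String s, double first, double end, boolean bounded) {
        StringBuilder out = new StringBuilder();
        int[] codePoints = s.codePoints().toArray(); // XPath counts characters, not UTF-16 units
        for (int i = 0; i < codePoints.length; i++) {
            int p = i + 1; // XPath positions start at 1
            if (p >= first && (!bounded || p < end)) {
                out.appendCodePoint(codePoints[i]);
            }
        }
        return out.toString();
    }
}
```

The spec's other edge cases fall out of the same rules: `substring("12345", 0 div 0, 3)` and `substring("12345", -1 div 0, 1 div 0)` both return `""` because their bounds involve NaN.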

Also, there's no mention of which object model was used with jaxen? I notice there are preceding:: tests in there, which I think jaxen only gets right with the DOM navigator?
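As a reminder of what a navigator has to get right for `preceding::`, here is a minimal sketch over a hypothetical `TreeNode` class (not any real object model). The axis holds every node that comes before the context node in document order, minus its ancestors; as a reverse axis it is enumerated nearest-first. Attribute and namespace nodes are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree; a real navigator would walk its own object model.
final class TreeNode {
    final String name;
    final TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }
}

final class PrecedingAxis {
    // preceding:: = nodes before the context in document order, excluding ancestors,
    // returned in reverse document order (nearest first).
    static List<TreeNode> preceding(TreeNode context) {
        List<TreeNode> result = new ArrayList<>();
        // Climb through the context and each ancestor: every preceding node lives
        // in the subtree of an earlier sibling of one of them.
        for (TreeNode n = context; n.parent != null; n = n.parent) {
            List<TreeNode> siblings = n.parent.children;
            int idx = siblings.indexOf(n); // TreeNode keeps Object.equals, so this is an identity match
            for (int i = idx - 1; i >= 0; i--) {
                addSubtreeReversed(siblings.get(i), result);
            }
        }
        return result;
    }

    // Appends a subtree in reverse document order: its last node first, its root last.
    private static void addSubtreeReversed(TreeNode node, List<TreeNode> out) {
        for (int i = node.children.size() - 1; i >= 0; i--) {
            addSubtreeReversed(node.children.get(i), out);
        }
        out.add(node);
    }
}
```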

-Baz

[1] I'm not sure how much I like that API: you can't register functions in the default namespace, and it's missing ones like current(). But it's a standard, a blessing and a curse.

On Mon, 15 Nov 2004 13:17:24 -0500, peter royal <pete@pobox.com> wrote:

On Nov 15, 2004, at 10:22 AM, ryan@bloglines.com wrote:

I've recently posted some comparison findings between available Java-based XPath engines. Jaxen is included.

Thanks for doing the comparison, Ryan. Incentive for us (me) to get off my lazy ass and apply Brian's patch!

-pete