Subject:NIO, Stax, Woodstox (was Re: Woodstox question)
From:Tatu Saloranta (cowt@yahoo.com)
Date:Aug 28, 2006 1:10:56 pm
List:org.codehaus.woodstox.dev

--- Colin Fleming <coli@coreproc.com> wrote:

> Hi,
>
> I've been looking into Woodstox; we're interested in using a StAX-based API for our new platform. I have a question I can't find the answer to on the web site, and the link for the mailing lists is broken: it just takes you back to the CodeHaus page.

Hi Colin! Apologies for the broken link -- Codehaus is, while a nice free environment, also a bit unpredictable and chaotic... so with the latest big changes, apparently the way mailing lists are to be handled has completely changed. Anyway, I think I fixed the link, so hopefully you can join the users/dev list(s). I will cc my reply to the dev list.

Now, regarding the question of using NIO with Stax/Woodstox:

> We have a component architecture based on NIO, and in some cases we need to parse XML coming in on an input stream. The biggest

NIO has been on my radar for quite a while, and I am hoping to be able to solve the associated problems. NIO is pretty much required to be able to solve some of the scalability problems, for example being able to implement highly parallel, high latency web services.

It is not trivial to match Stax and NIO, since NIO is by nature not stream-oriented, especially in its most common use cases (where it's more important to be able to poll multiple sources, because one thread per connection is not an option, etc).

The Stax API itself would have some problems with the asynchronous nature of an NIO-based source, but not too many or too serious (as far as I can see). One could create a Source object to represent such input, as long as the impl recognized it. Additionally, it would probably make sense to have an additional event type to indicate that not enough data is available yet to determine the event type (for example, when only some of the characters of a start element have been read). The reader could return XMLStreamConstants.INCOMPLETE or such; the caller would just call XMLStreamReader.next() again later on, and perhaps could also register for a callback to be told that new input has arrived and a new event may be fully parseable.

Unfortunately, what is much more difficult is writing a parser that keeps track not only of the current event, but also of the relative position within any given event. ;-/ It's not impossible to do, but quite involved (unless I am missing some obvious way to do it)... so I have been trying to think of ways to do it, but do not have a concrete plan quite yet.
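To make the INCOMPLETE idea a bit more concrete, here is a toy sketch in Java. None of this is Stax or Woodstox API: the class, the feed() method and the INCOMPLETE constant are all hypothetical, and the "parser" only understands plain start tags, end tags and text. It also sidesteps the hard part described above: instead of remembering where it stopped inside a half-read token, it simply keeps the unconsumed input and rescans it when more data arrives.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLStreamConstants;

/** Toy resumable scanner. Hypothetical names throughout; not part of any Stax implementation. */
class IncrementalTagScanner {
    /** Hypothetical extra event type: not enough input yet to produce an event. */
    static final int INCOMPLETE = -1;

    private final StringBuilder pending = new StringBuilder(); // input not yet turned into events
    private String text;                                       // name or text of the last event

    /** Called whenever a new chunk arrives (for example from an NIO callback). */
    void feed(String chunk) {
        pending.append(chunk);
    }

    /** Returns the next event type, or INCOMPLETE if the caller should try again later. */
    int next() {
        if (pending.length() == 0) {
            return INCOMPLETE;
        }
        if (pending.charAt(0) == '<') {
            int close = pending.indexOf(">");
            if (close < 0) {
                return INCOMPLETE; // only part of a tag has been read so far
            }
            text = pending.substring(1, close);
            pending.delete(0, close + 1);
            if (text.startsWith("/")) {
                text = text.substring(1);
                return XMLStreamConstants.END_ELEMENT;
            }
            return XMLStreamConstants.START_ELEMENT;
        }
        int lt = pending.indexOf("<");
        if (lt < 0) {
            return INCOMPLETE; // the text run may continue in the next chunk
        }
        text = pending.substring(0, lt);
        pending.delete(0, lt);
        return XMLStreamConstants.CHARACTERS;
    }

    String getText() {
        return text;
    }

    /** Feeds the chunks one by one and records every event, including INCOMPLETE. */
    static List<String> run(String... chunks) {
        IncrementalTagScanner scanner = new IncrementalTagScanner();
        List<String> trace = new ArrayList<>();
        for (String chunk : chunks) {
            scanner.feed(chunk);
            int event;
            while ((event = scanner.next()) != INCOMPLETE) {
                trace.add(event + ":" + scanner.getText());
            }
            trace.add("INCOMPLETE");
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run("<ro", "ot>hi</ro", "ot>"));
    }
}
```

The caller's loop is the interesting bit: it pulls events until it sees INCOMPLETE, then goes back to waiting for input. Rescanning buffered input is only workable for short tokens; a real parser would have to save its state inside the token, which is the involved part.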

> implication of using NIO is that we don't generally have access to the whole document in a single buffer, we receive chunks of data as they're read from the network using a callback mechanism rather than simply blocking until data is available. This

Yes. This is the main challenge: if you could emulate streams (which could be done by simple sleeps), it'd be quite easy to use any of the existing frameworks.

> would seem to be a good (and fairly common) use case for SAX or StAX, but I can't seem to find any useful info about it. Ideally I'd like to be able to have a main parsing loop where I add data to the document buffer if it's available, and then poll either SAX or StAX to get the XML data from the new chunk. Can StAX (and Woodstox) cope with this use case?

Right now Woodstox (and Stax) is unfortunately based entirely on a streaming (blocking) input model. If you could dedicate a thread to bridging the gap (essentially implementing an InputStream that uses NIO polling, and sleeps [or uses whatever NIO mechanism is available for signalling] until it gets data), it could be done quite easily, I think... but that might defeat the purpose of using NIO, and/or not work with your framework.
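A rough sketch of that bridging approach, in Java. The class names ChunkQueueInputStream and BlockingBridgeDemo and the offer()/finish() methods are made up for illustration, and a plain producer thread stands in for the NIO callback. The parser side uses only the standard javax.xml.stream API, so it should work with whatever Stax implementation is on the classpath; the thread running the parser blocks in read() until the next chunk is handed over.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical bridge: chunks are pushed in by one thread, read as a blocking stream by another. */
class ChunkQueueInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // end-of-input marker, compared by identity
    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();
    private byte[] current = EOF;
    private int pos;
    private boolean eof;

    /** Called from the I/O side (for example an NIO read callback) with newly received bytes. */
    void offer(byte[] chunk) {
        if (chunk.length > 0) {
            chunks.add(chunk.clone());
        }
    }

    /** Called from the I/O side when the connection has been closed. */
    void finish() {
        chunks.add(EOF);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) < 0 ? -1 : (one[0] & 0xFF);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (pos == current.length) { // current chunk used up: block until the next one
            if (eof) {
                return -1;
            }
            try {
                byte[] next = chunks.take();
                if (next == EOF) {
                    eof = true;
                } else {
                    current = next;
                    pos = 0;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }
}

class BlockingBridgeDemo {
    /** Parses the chunks as one document and counts its start elements. */
    static int countStartElements(String... chunks) throws XMLStreamException, InterruptedException {
        ChunkQueueInputStream in = new ChunkQueueInputStream();
        Thread producer = new Thread(() -> { // stands in for the NIO callback thread
            for (String chunk : chunks) {
                in.offer(chunk.getBytes(StandardCharsets.UTF_8));
            }
            in.finish();
        });
        producer.start();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        producer.join();
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countStartElements("<root><a>te", "xt</a><b", "/></root>"));
    }
}
```

The cost is exactly the one mentioned above: each document being parsed ties up one blocked thread, which is what NIO was supposed to avoid.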

For truly asynchronous operation more work is unfortunately needed. But I would be very interested in discussing it more, and I am pretty sure others would too.

As to SAX: I am not an expert with SAX, but it might be slightly easier to implement NIO sources, mostly because the parser is in full control of input access.

-+ Tatu +-

ps. You may also want to join the stax_builders mailing list (stax@yahoogroups.com), which is for discussion about the Stax API and impls in general.