Subject: [jdom-interest] ElementScanner and Memory
From: Brian Nahas (bria@yahoo.com)
Date: Nov 13, 2006 8:10:41 am
List: org.jdom.jdom-interest

I have a 1.2 GB XML file that I need to parse. Since it's nicely partitioned, I planned to use ElementScanner from the contrib package to load only one item at a time. Here's an equivalent structure:

<data>
  <item>...</item>
  <item>...</item>
  <item>...</item>
  ...
</data>

The path I'm using for my listener is "/data/item".

I assumed the parser would release each item once its listeners had been called. ElementScanner was very simple to set up for this, but I ran into an OutOfMemoryError on my first try. I was a little confused, as I thought ElementScanner was specifically designed to prevent this. Upon investigation, I found that the SAXHandler used by ElementScanner was holding onto the previous items after I was done with them: it adds them to the default root element that FragmentHandler creates, and nothing removes them after the listeners are called. This seems to be in direct conflict with this message I found, which states that ElementScanner doesn't build a document (the message is fairly old, though):


http://www.servlets.com/archive/servlet/ReadMsg?msgId=350607&listName=jdom-interest

I worked around this by explicitly detaching the element in my listener when I was done with it. Since this seems like a common pattern and a subtle trap, I thought I'd ask whether I was missing some setting or using ElementScanner improperly. There's a namespace declared on the data element, so I don't know if that has something to do with it.
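
To make the pattern concrete, here is a minimal JDK-only sketch of the behavior described above. It is not the ElementScanner or SAXHandler code: FragmentBuilder, retainedItems, and the "root" element name are hypothetical stand-ins, and org.w3c.dom is used in place of JDOM so the example runs without the contrib jar. The handler hangs every completed /data/item under one root, and the listener optionally detaches the item once it is done with it.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DetachSketch {

    /** Hypothetical stand-in for a fragment-building handler: items are attached to one root. */
    static class FragmentBuilder extends DefaultHandler {
        final Document doc;
        final Element root;
        private final Consumer<Element> listener;
        private Node current;
        private int depth = 0;

        FragmentBuilder(Consumer<Element> listener) throws Exception {
            this.listener = listener;
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            root = doc.createElement("root");
            doc.appendChild(root);
            current = root;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            depth++;
            if (depth == 1) return;            // <data> itself is not built, only its items
            Element e = doc.createElement(qName);
            current.appendChild(e);
            current = e;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth >= 2) {                  // keep text that sits inside an item
                current.appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (depth == 2) {                  // a complete /data/item
                Element item = (Element) current;
                current = item.getParentNode();
                listener.accept(item);         // nothing here removes the item from root
            } else if (depth > 2) {
                current = current.getParentNode();
            }
            depth--;
        }
    }

    /** Parses the XML and returns how many items are still attached to the root afterwards. */
    static int retainedItems(String xml, boolean detach) throws Exception {
        Consumer<Element> listener = item -> {
            // ... process the item here ...
            if (detach) {
                // The workaround: detach explicitly so the item can be garbage collected.
                item.getParentNode().removeChild(item);
            }
        };
        FragmentBuilder builder = new FragmentBuilder(listener);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), builder);
        return builder.root.getChildNodes().getLength();
    }
}
```

With three items in the input, retainedItems(xml, false) leaves all three attached to the root, while retainedItems(xml, true) leaves none. With a 1.2 GB file, that is the difference between holding every item in memory and holding one at a time.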

Thanks, -Brian