Subject:Re: VFS - Synapse Memory Leak
From:Andreas Veithen (andr@gmail.com)
Date:Mar 19, 2009 4:51:44 pm
List:org.apache.ws.synapse-dev

If N is the size of the file, the memory consumption caused by the transport is O(N) with transport.vfs.Streaming=false and O(1) with transport.vfs.Streaming=true. The getTextAsStream and writeTextTo methods in org.apache.axis2.format.ElementHelper are there to allow you to implement your mediator with O(1) memory usage, so that the overall memory consumption remains O(1). Does that answer your question?
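To make the O(N) versus O(1) distinction concrete, here is a minimal sketch in plain Java. It deliberately does not call the Synapse or Axis2 APIs; the class StreamingCopy and its two methods are hypothetical names used only for illustration. A mediator built on getTextAsStream and writeTextTo would presumably follow the second pattern, moving data from a Reader to a Writer through a fixed-size buffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical illustration only; not part of Synapse or Axis2.
class StreamingCopy {

    // O(N) memory: the whole content is accumulated before it is written,
    // similar in spirit to the IOUtils.toString call in the stack trace.
    static long copyBuffered(Reader in, Writer out) throws IOException {
        StringBuilder all = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            all.append(buf, 0, n);
        }
        out.write(all.toString());
        return all.length();
    }

    // O(1) memory: only one fixed-size buffer is held at any time.
    static long copyStreaming(Reader in, Writer out) throws IOException {
        char[] buf = new char[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        long copied = copyStreaming(new StringReader("record 1\r\nrecord 2\r\n"), out);
        System.out.println("copied " + copied + " characters");
    }
}
```

Both methods produce the same output; only the peak memory use differs, which is why the choice matters for very large files.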

Andreas

On Thu, Mar 19, 2009 at 23:33, Kim Horn <kim.@icsglobal.net> wrote:

It's the same synapse.xml as specified originally, and the same trace. If you are
using Nabble you can see this; in case you lost the prior emails, I can post them
again.

I must admit I did not set the extra parameters you mentioned, but I don't
see why you should have to set a parameter to stop a memory leak. I guessed these
parameters would just reduce the large amount of memory it appears to be using,
e.g. 10 times the file size, via streaming? Why are there 10 copies of the data
floating around? Lots of buffering. This issue suggests to me that any use of
VFS will eventually kill the server. Even with smaller files it will eventually
use all available memory. I guess I did not understand the actual reason for
this issue from the prior discussion.

I will try your extra parameters today though.

-----Original Message-----
From: Andreas Veithen [mailto:andr@gmail.com]
Sent: Thursday, 19 March 2009 5:48 PM
To: de@synapse.apache.org
Subject: Re: VFS - Synapse Memory Leak

Kim,

Can you post your current synapse.xml as well as the stack trace you get now?

On Thu, Mar 19, 2009 at 07:20, kimhorn <kim.@icsglobal.net> wrote:

Using the last stable build from 15 March 2009, I still get exactly the same behaviour as originally described with the above script. VFS still just dies. Would your fixes be in this build?

Andreas Veithen-2 wrote:

I committed the code and it will be available in the next WS-Commons transport build. The methods are located in org.apache.axis2.format.ElementHelper in the axis2-transport-base module.

On Thu, Mar 12, 2009 at 00:06, Kim Horn <kim.@icsglobal.net> wrote:

Hello Andreas, this is great and really helps. I have not had time to try it out but will soon.

Contributing the java.io.Reader would be a great help, but it will take me a while to get up to speed to do the Synapse iterator.

In the short term I am going to use a brute-force approach that is now feasible given that the memory issue is resolved. I just thought of this one today: use a VFS proxy to FTP the file locally, so streaming helps here. A POJOCommand on <out> splits the file into another directory, streaming in and out. Another independent VFS proxy watches that directory and submits each file to the web service. Hopefully memory will be fine. Overloading the destination may still be an issue?

-----Original Message-----
From: Andreas Veithen [mailto:andr@gmail.com]
Sent: Monday, 9 March 2009 10:55 PM
To: de@synapse.apache.org
Subject: Re: VFS - Synapse Memory Leak

The changes I did in the VFS transport and the message builders for text/plain and application/octet-stream certainly don't provide an out-of-the-box solution for your use case, but they are the prerequisite.

Concerning your first proposed solution (let the VFS write the content to a temporary file), I don't like this because it would create a tight coupling between the VFS transport and the mediator. A design goal should be that the solution will still work if the file comes from another source, e.g. an attachment in an MTOM or SwA message.

I think that an all-Synapse solution (2 or 3) should be possible, but this will require development of a custom mediator. This mediator would read the content, split it up (storing the chunks in memory or on disk) and execute a sub-sequence for each chunk. The execution of the sub-sequence would happen synchronously to limit the memory/disk space consumption (to the maximum chunk size) and to avoid flooding the destination service.
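The splitting logic of such a mediator could look roughly like the following sketch in plain Java. All names here (ChunkSplitter, ChunkHandler, split) are hypothetical and none of them exist in Synapse; the handler merely stands in for the sub-sequence, the chunk size is counted in characters, and the input is assumed to consist of line-oriented records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Synapse.
class ChunkSplitter {

    // Stands in for the sub-sequence executed for each chunk.
    interface ChunkHandler {
        void handle(String chunk) throws IOException;
    }

    // Reads line-oriented records and passes them to the handler in chunks
    // of at most maxChars characters (a single longer record is passed on
    // as a chunk of its own). The handler is called synchronously, so only
    // the chunk currently being built is held in memory.
    // Every record is re-terminated with CR LF, whatever the input used.
    static int split(Reader in, int maxChars, ChunkHandler handler) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        int chunks = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String record = line + "\r\n";
            // Flush the current chunk if this record would not fit into it.
            if (chunk.length() > 0 && chunk.length() + record.length() > maxChars) {
                handler.handle(chunk.toString());
                chunks++;
                chunk.setLength(0);
            }
            chunk.append(record);
        }
        // Flush whatever is left after the last record.
        if (chunk.length() > 0) {
            handler.handle(chunk.toString());
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        List<String> seen = new ArrayList<>();
        int n = split(new StringReader("r1\r\nr2\r\nr3\r\n"), 8, seen::add);
        System.out.println(n + " chunks: " + seen);
    }
}
```

Because the handler returns before the next chunk is read, the destination sees one request at a time, which is the synchronous behaviour described above.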

Note that it is probably not possible to implement the mediator using a script because of the problematic String handling. Also, Spring, POJO and class mediators don't support sub-sequences (I think). Therefore it should be implemented as a full-featured Java mediator, probably taking the existing iterate mediator as a template. I can contribute the required code to get the text content in the form of a java.io.Reader.

Regards,

On Mon, Mar 9, 2009 at 03:05, kimhorn <kim.@icsglobal.net> wrote:

Although this is a good feature, it may not solve the actual problem? The first issue on my list was the memory leak. However, the real problem is that once I get this massive file I have to send it to a web service that can only take it in small chunks (about 14MB). Streaming it straight out would just kill the destination web service; it would get the memory error. The text document can be split apart easily, as it has independent records on each line separated by <CR> <LF>.

In an earlier post, that was not responded to, I mentioned:

"Otherwise; for large EDI files a VFS iterator Mediator that streams through input file and outputs smaller chunks for processing, in Synapse, may be a solution ? "

So I had mentioned a few solutions in prior posts. The solutions now are:

1) VFS writes straight to a temporary file, then a Java mediator processes the file by splitting it into many smaller files. These files then trigger another VFS proxy that submits them to the final web service. The problem is that it uses the file system (not so bad).
2) A Java mediator takes the <text> package and splits it up by wrapping it into many XML <data> elements that can then be acted on by a Synapse iterator. So replace the text message with many smaller XML elements. The problem is that this loads the whole message into memory.
3) Create another iterator in Synapse that works on a regular expression (to split the text data) or actually uses a for-loop approach to chop the file into chunks based on the loop index value. E.g. Index = 23 means a 14K chunk 23 chunks into the data.
4) Using the approach proposed now, just submit the file straight (stream it) to another web service that chops it up. It may return an XML document with many sub-elements that allows the standard iterator to work. Similar to (2) but using another service rather than Java to split the document.
5) Using the approach proposed now, just submit the file straight (stream it) to another web service that chops it up but calls a Synapse proxy with each small packet of data, which then forwards it to the final web service. So the web service iterates across the data, and not Synapse.

Then other solutions replace Synapse with a stand alone Java program at the front end.

Another issue here is throttling: splitting the file is one issue, but submitting hundreds of calls in parallel to the destination service would result in timeouts. So throttling needs to be worked in.
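One common way to bound the number of parallel calls is a counting semaphore. The following is only a sketch in plain Java under that assumption; ThrottledSubmitter and its methods are hypothetical names, not a Synapse API, and the Runnable stands in for one call to the destination service.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not a Synapse API.
class ThrottledSubmitter {
    private final Semaphore permits;
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledSubmitter(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    // Blocks the caller until a permit is free, then runs the call on the pool.
    // The permit is released when the call finishes, even if it throws.
    void submit(Runnable call) throws InterruptedException {
        permits.acquire();
        pool.execute(() -> {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                call.run();
            } finally {
                inFlight.decrementAndGet();
                permits.release();
            }
        });
    }

    // Waits for outstanding calls and returns the highest concurrency observed.
    int shutdownAndGetPeak() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledSubmitter submitter = new ThrottledSubmitter(3);
        for (int i = 0; i < 10; i++) {
            final int chunk = i;
            submitter.submit(() -> System.out.println("submitting chunk " + chunk));
        }
        System.out.println("peak concurrency: " + submitter.shutdownAndGetPeak());
    }
}
```

With a limit of 1 this degenerates into fully sequential submission; a larger limit trades some load on the destination for throughput.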

I agree and can understand the time factor, and also +1 for reusing stuff rather than trying to reinvent the wheel :-)

Thanks, Ruwan

On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen <andr@gmail.com> wrote:

Ruwan,

It's not a question of possibility, it is a question of available time :-)

Also note that some of the features that we might want to implement have some similarities with what is done for attachments in Axiom (except that an attachment is only available once, while a file over VFS can be read several times). I think there is also some existing code in Axis2 that might be useful. We should not reimplement these things but try to make the existing code reusable. This however is only realistic for the next release after 1.3.

Andreas

On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <ruwa@gmail.com> wrote:

Andreas,

Can we have caching on the file system as a property, to support multiple layers touching the full message, and is it possible to specify a threshold for streaming? For example, if the message is touched several times we might still need streaming, but not for files of 100KB or less.

Thanks, Ruwan

On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <andr@gmail.com> wrote:

I've done an initial implementation of this feature. It is available in trunk and should be included in the next nightly build. In order to enable this in your configuration, you need to add the following property to the proxy:

<parameter name="transport.vfs.Streaming">true</parameter>

You also need to add the following mediators just before the <send> mediator:

<property action="remove" name="transportNonBlocking" scope="axis2"/>
<property action="set" name="OUT_ONLY" value="true"/>

With this configuration Synapse will stream the data directly from the incoming to the outgoing transport without storing it in memory or in a temporary file. Note that this has two other side effects:

* The incoming file (or connection in case of a remote file) will only be opened on demand. In this case this happens during execution of the <send> mediator.
* If during the mediation the content of the file is needed several times (which is not the case in your example), it will be read several times. The reason is of course that the content is not cached.
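The two side effects can be pictured with a small sketch in plain Java. This is not the actual VFS transport code; OnDemandContent, Opener and the other names are hypothetical and only model the "no caching + deferred connection" behaviour described here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch; not the actual VFS transport code.
class OnDemandContent {

    // Opens a fresh stream over the underlying file or connection.
    interface Opener {
        Reader open() throws IOException;
    }

    private final Opener opener;
    private int timesOpened = 0;

    OnDemandContent(Opener opener) {
        this.opener = opener; // nothing is opened or read at this point
    }

    // Streams the content to the target. Each call re-opens the source,
    // because nothing is cached between calls.
    void writeTo(Writer out) throws IOException {
        timesOpened++;
        try (Reader in = opener.open()) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    int getTimesOpened() {
        return timesOpened;
    }

    public static void main(String[] args) throws IOException {
        OnDemandContent content = new OnDemandContent(() -> new StringReader("payload"));
        System.out.println("opened before use: " + content.getTimesOpened());
        content.writeTo(new StringWriter());
        content.writeTo(new StringWriter());
        System.out.println("opened after two reads: " + content.getTimesOpened());
    }
}
```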

I tested the solution with a 2GB file and it worked fine. The performance of the implementation is not yet optimal, but at least the memory consumption is constant.

Some additional comments:

* The transport.vfs.Streaming property has no impact on XML and SOAP processing: this type of content is processed exactly as before.
* With the changes described here, we now have two different policies for plain text and binary content processing: in-memory caching + no streaming (transport.vfs.Streaming=false) and no caching + deferred connection + streaming (transport.vfs.Streaming=true). Probably we should define a wider range of policies in the future, including file system caching + streaming.
* It is necessary to remove the transportNonBlocking property (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send> mediator (more precisely the OperationClient) from executing the outgoing transport in a separate thread. This property is set by the incoming transport. I think this is a bug since I don't see any valid reason why the transport that handles the incoming request should determine the threading behavior of the transport that sends the outgoing request to the target service. Maybe Asankha can comment on this?

Andreas

On Thu, Mar 5, 2009 at 07:21, kimhorn <kim.@icsglobal.net> wrote:

That's good, as this stops us using Synapse.

Asankha C. Perera wrote:

Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
        at java.lang.StringBuffer.append(StringBuffer.java:307)
        at java.io.StringWriter.write(StringWriter.java:72)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
        at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
        at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)

Since the content type is text, the plain text formatter is trying to use a String to parse, as I see, which is a problem for large content.

A definite bug we need to fix ..

cheers asankha

-- Asankha C. Perera AdroitLogic, http://adroitlogic.org

http://esbmagic.blogspot.com

To unsubscribe, e-mail: dev-@synapse.apache.org For additional commands, e-mail: dev-@synapse.apache.org

-- View this message in context:

http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html

Sent from the Synapse - Dev mailing list archive at Nabble.com.
