Subject: Re: Debugging Solr memory usage/heap problems
From: Yonik Seeley (yon@apache.org)
Date: Feb 6, 2007 12:02:23 pm
List: org.apache.lucene.solr-user

On 2/6/07, Graham Stead <gst@ieee.org> wrote:

Hi everyone,

My Solr JVM runs out of heap space quite frequently. I'm trying to understand Solr/Lucene's memory usage so I can address the problem correctly. Otherwise, I feel I'm taking random shots in the dark.

I've tried previous troubleshooting suggestions. Here's what I've done:

1) Increased Tomcat's JVM heap space, e.g.:
   JAVA_OPTS='-Xmx1244m -Xms1244m -server'; # frequent heap space problems
   JAVA_OPTS='-XX:+AggressiveHeap -server'; # runs out of heap space at 2.0g
   JAVA_OPTS='-Xmx3072m -Xms3072m -server'; # jvm quickly hits 2.9g on 'top'

Solr is the only webapp deployed on this Tomcat instance.

2) I use Solr collection/distribution to separate indexing and searching. The indexer is stable now and memory problems only occur when searching on the Solr slave.

3) In solrconfig.xml, I reduced mergeFactor and maxBufferedDocs by 50%:
   <mergeFactor>5</mergeFactor>
   <maxBufferedDocs>500</maxBufferedDocs>

This helped the indexing server but not the Solr slave.

4) In solrconfig.xml, I set filterCache, queryResultCache, and documentCache to 0.

Now for my index details:
- To facilitate highlighting, I currently store doc contents in the index, so the index consumes 24GB on disk.
- numDocs : 4,953,736 maxDoc : 4,953,736 (just optimized)
- Term files:
  logs # du -ksh ../solr/data/index/*.t??
  5.9M ../solr/data/index/_1kjb.tii
  429M ../solr/data/index/_1kjb.tis
- I have 22 fields and yes, they currently have norms.

Other info that may be helpful:
- My Solr is from 2006-11-15. We have a few mods, including one extra fieldCache that stores ~40 bytes/doc.
- Thread counts from solr/admin/threaddump.jsp:
  Java HotSpot(TM) 64-Bit Server VM 1.5.0_08-b03
  Thread Count: current=37 deamon=34 peak=37

My machine has Gentoo Linux and 4gb RAM. 'top' indicates the JVM reaches 2.9g RAM (3472m virtual memory) after 10-20 searches and ~20 mins of use. It seems just a matter of time before more searches or a snapinstaller 'commit' will make it run out of heap space again.

I have flexibility in the changes we can make. I.e., I can omit norms for most fields, or I can stop storing the doc contents in the index. But before embarking on a new strategy, I need some assurance that the strategy will work (crazy, I know). For example, it doesn't seem that removing norms would save a great deal (I calculate saving 1 byte per norm per field on 21 fields is ~99MB).
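
As a quick check of the norms estimate above, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted in this message (4,953,736 docs, 21 fields, 1 byte per norm); the function name `norms_bytes` is made up for illustration and is not part of Solr or Lucene.

```python
# Figures taken from the message above.
NUM_DOCS = 4_953_736
FIELDS_WITHOUT_NORMS = 21   # fields that would have norms omitted
BYTES_PER_NORM = 1          # one byte per document per field, as stated above


def norms_bytes(num_docs, num_fields, bytes_per_norm=BYTES_PER_NORM):
    """Return the bytes used by norms (hypothetical helper, illustration only)."""
    return num_docs * num_fields * bytes_per_norm


saved = norms_bytes(NUM_DOCS, FIELDS_WITHOUT_NORMS)
saved_mib = saved / (1024 * 1024)
print(f"{saved:,} bytes = {saved_mib:.1f} MiB")
```

This agrees with the ~99MB figure, which is small next to a heap of 2-3 GB.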

So...how do I deduce what's taking up so much memory? Any suggestions would be very helpful to me (and hopefully to others, too).

many thanks,
-Graham

1) Sorting on fields currently takes up a lot of memory: the Lucene FieldCache can be large (4 bytes per doc per field sorted on, plus the unique strings).

2) If your stored fields are very large, try reducing the size of the doc cache.

During warming, there are *two* searchers open, so double the number for things like the FieldCache. If you can accept slow first queries (like maybe in an offline query system) then you can turn off all warming.
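
To put rough numbers on the FieldCache figures above, here is a small Python sketch. It is an estimate under stated assumptions, not a measurement: the doc count comes from the quoted message, the 4 bytes per doc per sort field comes from point 1, and the number of sort fields (3) is purely illustrative since the thread does not say how many are used. The cost of the unique strings is unknown here and is left at 0. The name `field_cache_bytes` is hypothetical.

```python
NUM_DOCS = 4_953_736              # numDocs from the quoted message
BYTES_PER_DOC_PER_SORT_FIELD = 4  # per point 1 above


def field_cache_bytes(num_docs, sort_fields, unique_string_bytes=0, open_searchers=1):
    """Rough FieldCache size in bytes (hypothetical helper, illustration only).

    unique_string_bytes: extra memory for the unique strings of string
    sort fields; unknown for this index, so it defaults to 0.
    open_searchers: 1 in steady state, 2 while a new searcher is warming.
    """
    per_searcher = (num_docs * BYTES_PER_DOC_PER_SORT_FIELD * sort_fields
                    + unique_string_bytes)
    return per_searcher * open_searchers


SORT_FIELDS = 3  # assumption: not stated anywhere in the thread

steady = field_cache_bytes(NUM_DOCS, SORT_FIELDS)
warming = field_cache_bytes(NUM_DOCS, SORT_FIELDS, open_searchers=2)

print(f"steady state:   {steady / 2**20:.1f} MiB")
print(f"during warming: {warming / 2**20:.1f} MiB")
```

The real total is higher by whatever the unique strings cost, which depends on the sorted fields' contents.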

-Yonik