Subject:Dismal performance of String.intern()
From:Steven Schlansker (stev@gmail.com)
Date:Jun 10, 2013 11:05:51 am
List:net.java.openjdk.core-libs-dev

Hi core-libs-dev,

While doing performance profiling of my application, I discovered that nearly
50% of the time deserializing JSON was spent within String.intern(). I
understand that in general interning Strings is not the best approach for
things, but I think I have a decent use case -- the value of a certain field is
one of a very limited number of valid values (that are not known at compile
time, so I cannot use an Enum), and is repeated many millions of times in the
JSON stream.

I discovered that replacing String.intern() with a ConcurrentHashMap improved
performance by almost an order of magnitude.
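
For illustration, a replacement of this kind could look roughly like the sketch below. It is not the code from the application described here; the class name MapInterner and its methods are hypothetical, and the only assumption is that one canonical instance per distinct value is all that is needed.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical user-level interner; the name and API are illustrative only.
final class MapInterner {
    // Maps each distinct value to its canonical instance.
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s, registering s if it is new.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct values seen so far.
    int size() {
        return pool.size();
    }
}
```

Unlike String.intern(), a map like this holds strong references, so it only suits a small, bounded set of values such as the field described above.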

I'm not the only person who discovered this and was surprised:
http://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern

I've been excited about starting to contribute to OpenJDK, so I am thinking that
this might be a fun project for me to take on and then contribute back. But I
figured I should check in on the list before spending a lot of time tracking
this down. I have a couple of preparatory questions:

* Has this bottleneck been examined thoroughly before? Am I wishing too hard
for performance here?

* String.intern() is a native method currently. My understanding is that there
is a nontrivial penalty to invoking native methods (at least via JNI, not sure
if this is also true for "built ins"?). I assume the reason that this is native
is so the Java intern is the same as C++-invoked interns from within the JVM
itself. Is this an actual requirement, or could String.intern be replaced with
Java code?

* If the interning itself must be handled by a symbol table in C++ land as it is
today, would a "second level cache" in Java land that invokes a native "intern0"
method be acceptable, so that there is a low-penalty "fast path"? If so, this
would involve a nonzero memory cost, although I assume that a few thousand
references inside of a Map is an OK price to pay for a (for example) 5x speedup.

* I assume the String class itself is loaded at a very sensitive time during VM
initialization. Having String initialization trigger (for example)
ConcurrentHashMap class initialization may cause problems or circularities. If
this is the case, would triggering such a load lazily on the first intern() call
be "late enough" to avoid such problems?
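
The last two questions can be sketched together in user-level code. This is only an approximation under stated assumptions: the class CachedIntern is hypothetical, the existing String.intern() stands in for the proposed native "intern0", and the nested holder class is one common way to defer creating the map until the first call. Whether anything like this would be safe inside java.lang.String itself is exactly the open question.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a Java-level cache in front of the native intern.
final class CachedIntern {
    private CachedIntern() {}

    // Holder idiom: Holder (and the map it creates) is initialized only
    // when intern() first touches Holder.CACHE, not when CachedIntern loads.
    private static final class Holder {
        static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();
    }

    static String intern(String s) {
        // Fast path: a plain map lookup, no native call.
        String cached = Holder.CACHE.get(s);
        if (cached != null) {
            return cached;
        }
        // Slow path: String.intern() plays the role of the proposed intern0.
        String canonical = s.intern();
        Holder.CACHE.putIfAbsent(canonical, canonical);
        return canonical;
    }
}
```

Because every cached value comes from String.intern(), results stay identical to what the VM's own table returns. The cache here is unbounded and strongly referenced, so a real version would need a size limit or weak references.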

I'm sure that if I get anywhere with this I will have more questions, but this
should get me started. Thank you for any advice / insight you may be able to
provide!

Steven