| From | Sent On | Attachments |
|---|---|---|
| Adriano Crestani | Jul 13, 2010 12:59 am | |
| Adriano Crestani | Jul 15, 2010 1:00 pm | |
| Subject: | Cloning TermAttribute objects | |
|---|---|---|
| From: | Adriano Crestani (adri...@apache.org) | |
| Date: | Jul 13, 2010 12:59:20 am | |
| List: | org.apache.lucene.java-dev | |
Hi,
Why does the TermAttributeImpl.clone() method use buff.clone() instead of System.arraycopy to clone its internal buffer? Is it for performance reasons?
I have the following scenario:
    ...
    public boolean incrementToken() {
      ...
      String twoHundredKCharsString = "abc....";
      String smallString = "test";

      termAttribute.setTermBuffer(twoHundredKCharsString);
      State largeStringState = captureState();

      termAttribute.setTermBuffer(smallString);
      State smallStringState = captureState();
      ...
    }
    ...
And guess what?! smallStringState has a TermAttribute object that holds an internal buffer of 200k chars in size!!!
From what I found by googling, clone() and System.arraycopy perform about the same for small arrays, and clone() only performs better for large arrays.
So, if large string inputs are not a real scenario, why not use System.arraycopy instead of clone()? And if they are a real scenario, Lucene should definitely not be copying the entire buffer for small strings.
Maybe the TermAttribute interface could expose a method like shrinkBuffer(), which the user could invoke when needed.
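To make the difference concrete, here is a minimal, self-contained sketch. It does not use Lucene's real classes: TermBufferSketch, cloneFull(), cloneTrimmed() and capacity() are hypothetical names invented for illustration, and shrinkBuffer() is only the method proposed above, not an existing API. The sketch assumes a buffer that grows on demand and never shrinks, which is the behaviour described in the scenario.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a term attribute; NOT Lucene's actual class. */
class TermBufferSketch {
    private char[] buffer = new char[16];
    private int length = 0;

    /** Copies the string into the buffer, growing it if needed; it never shrinks. */
    public void setTermBuffer(String s) {
        if (s.length() > buffer.length) {
            buffer = new char[s.length()];
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    public int capacity() { return buffer.length; }

    public String term() { return new String(buffer, 0, length); }

    /** Clones the whole backing array, including the unused tail. */
    public TermBufferSketch cloneFull() {
        TermBufferSketch copy = new TermBufferSketch();
        copy.buffer = buffer.clone();
        copy.length = length;
        return copy;
    }

    /** Copies only the chars in use (Arrays.copyOf uses System.arraycopy internally). */
    public TermBufferSketch cloneTrimmed() {
        TermBufferSketch copy = new TermBufferSketch();
        copy.buffer = Arrays.copyOf(buffer, length);
        copy.length = length;
        return copy;
    }

    /** Hypothetical shrinkBuffer(): reallocates the buffer down to the used length. */
    public void shrinkBuffer() {
        if (buffer.length > length) {
            buffer = Arrays.copyOf(buffer, length);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[200_000];
        Arrays.fill(big, 'a');

        TermBufferSketch att = new TermBufferSketch();
        att.setTermBuffer(new String(big)); // buffer grows to 200k chars
        att.setTermBuffer("test");          // only 4 chars are in use now

        System.out.println(att.cloneFull().capacity());    // 200000
        System.out.println(att.cloneTrimmed().capacity()); // 4

        att.shrinkBuffer();
        System.out.println(att.capacity());                // 4
    }
}
```

In this sketch the trimmed copy costs one allocation sized to the term, while the full clone carries the 200k-char buffer along with every captured state. Whether trimming on every clone or an explicit shrinkBuffer() call is the better trade-off for Lucene is exactly the open question.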
Thoughts?
Best Regards, Adriano Crestani





