|Subject:||[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7|
|From:||Uwe Schindler (usch...@apache.org)|
|Date:||Jul 28, 2011 2:13:36 pm|
Hello Apache Lucene & Apache Solr users,
Hello users of other Java-based Apache projects,
Oracle released Java 7 today. Unfortunately it contains hotspot compiler optimizations, which miscompile some loops. This can affect code of several Apache projects. Sometimes JVMs only crash, but in several cases, results calculated can be incorrect, leading to bugs in applications (see Hotspot bugs 7070134 , 7044738 , 7068051 ).
Apache Lucene Core and Apache Solr are two Apache projects, which are affected by these bugs, namely all versions released until today. Solr users with the default configuration will have Java crashing with SIGSEGV as soon as they start to index documents, as one affected part is the well-known Porter stemmer (see LUCENE-3335 ). Other loops in Lucene may be miscompiled, too, leading to index corruption (especially on Lucene trunk with pulsing codec; other loops may be affected, too - LUCENE-3346 ).
These problems were detected only 5 days before the official Java 7 release, so Oracle had no time to fix those bugs, affecting also many more applications. In response to our questions, they proposed to include the fixes into service release u2 (eventually into service release u1, see ). This means you cannot use Apache Lucene/Solr with Java 7 releases before Update 2! If you do, please don't open bug reports, it is not the committers' fault! At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruptions.
Please note: Also Java 6 users are affected, if they use one of those JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts
It is strongly recommended not to use any hotspot optimization switches in any Java version without extensive testing!
In case you upgrade to Java 7, remember that you may have to reindex, as the unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing). For more information, read JRE_VERSION_MIGRATION.txt in your distribution package!
On behalf of the Lucene project, Uwe
 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051  https://issues.apache.org/jira/browse/LUCENE-3335  https://issues.apache.org/jira/browse/LUCENE-3346  http://s.apache.org/StQ