| From | Sent On | Attachments |
|---|---|---|
| Grant Ingersoll | Feb 24, 2010 5:41 am | |
| Chris Lu | Feb 24, 2010 9:00 am | |
| Simon Wistow | Feb 24, 2010 11:59 am | |
| Yuval Feinstein | Feb 24, 2010 12:09 pm | |
| Avi Rosenschein | Feb 24, 2010 12:18 pm | |
| Marcelo Ochoa | Feb 24, 2010 12:22 pm | |
| Michael van Rooyen | Feb 24, 2010 12:24 pm | |
| Aaron Lav | Feb 24, 2010 1:20 pm | |
| Paul Libbrecht | Feb 24, 2010 1:21 pm | |
| Avi Rosenschein | Feb 24, 2010 1:38 pm | |
| Ganesh | Feb 24, 2010 9:40 pm | |
| luoc...@sohu.com | Feb 25, 2010 12:14 am | |
| Paul Taylor | Feb 25, 2010 12:19 am | |
| Uwe Schindler | Feb 25, 2010 12:29 am | |
| Avi Rosenschein | Feb 25, 2010 1:44 am | |
| luocanrao | Feb 25, 2010 3:46 am | |
| Michael McCandless | Feb 25, 2010 7:19 am | |
| Glen Newton | Feb 25, 2010 9:21 am | |
| Jason Rutherglen | Feb 25, 2010 9:51 am | |
| Grant Ingersoll | Feb 25, 2010 10:00 am | |
| Grant Ingersoll | Feb 25, 2010 10:01 am | |
| Grant Ingersoll | Feb 25, 2010 10:02 am | |
| Mark Miller | Feb 25, 2010 10:33 am | |
| Jason Rutherglen | Feb 25, 2010 3:18 pm | |
| N. Hira | Feb 25, 2010 3:37 pm | |
| Mark Miller | Feb 25, 2010 4:02 pm | |
| Thomas Guttesen | Feb 25, 2010 4:05 pm | |
| luoc...@sohu.com | Feb 25, 2010 10:47 pm | |
| Michael McCandless | Feb 26, 2010 12:46 am | |
| Paul Taylor | Feb 26, 2010 1:30 am | |
| Glen Newton | Feb 27, 2010 7:03 am | |
| Uwe Schindler | Feb 27, 2010 7:17 am | |
| Glen Newton | Feb 27, 2010 8:18 am | |
| Ganesh | Mar 1, 2010 12:56 am | |

| Subject: | boosts for unstemmed matches (was Re: If you could have one feature in Lucene...) | |
|---|---|---|
| From: | Aaron Lav (as...@pobox.com) | |
| Date: | Feb 24, 2010 1:20:31 pm | |
| List: | org.apache.lucene.java-user | |

On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:
> On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll <gsin...@apache.org> wrote:
> > What would it be?
> For scoring to take into account the non-analyzed token stream.
> That is, if a field is analyzed (stemmed, lowercased, maybe even stop words removed), that is fine for indexing. But tokens in the query matching the original form could still get a higher score than those that only match when analyzed.

You can get some of that effect by indexing stemmed and unstemmed forms, and letting IDF boost unstemmed results. (I picked this idea up from http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/)
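A rough sketch of the stemmed-plus-unstemmed idea, in plain Java with no Lucene dependency so it runs on its own. Everything here is an assumption made for illustration: the class name `DualFormIndexSketch`, the `exact:`/`stem:` term prefixes, the toy suffix-stripping `stem` method (not a real stemmer), and the scoring, which just sums idf over matching terms and ignores tf and length norms. The idf shape follows the `1 + ln(numDocs / (docFreq + 1))` form of Lucene's classic similarity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DualFormIndexSketch {
    // term -> ids of the documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> docIds = new HashSet<>();

    // Hypothetical toy stemmer: strips a few suffixes. Not Porter or Snowball.
    public static String stem(String token) {
        for (String suffix : new String[] {"ning", "ing", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Index every token twice: once as written, once stemmed.
    public void addDocument(int docId, String text) {
        docIds.add(docId);
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            post("exact:" + token, docId);
            post("stem:" + stem(token), docId);
        }
    }

    private void post(String term, int docId) {
        postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
    }

    // Rarer terms get a larger idf; exact forms are usually rarer than stems.
    public double idf(String term) {
        int docFreq = postings.getOrDefault(term, new HashSet<>()).size();
        return 1.0 + Math.log((double) docIds.size() / (docFreq + 1));
    }

    // Simplified score: sum of idf over the query's exact and stemmed terms
    // that occur in the document.
    public double score(int docId, String queryWord) {
        String word = queryWord.toLowerCase();
        double total = 0.0;
        for (String term : new String[] {"exact:" + word, "stem:" + stem(word)}) {
            Set<Integer> docs = postings.get(term);
            if (docs != null && docs.contains(docId)) total += idf(term);
        }
        return total;
    }

    public static void main(String[] args) {
        DualFormIndexSketch idx = new DualFormIndexSketch();
        idx.addDocument(0, "running shoes");
        idx.addDocument(1, "he runs daily");
        idx.addDocument(2, "a long run");
        idx.addDocument(3, "cooking dinner");
        for (int id = 0; id < 4; id++) {
            System.out.println("doc " + id + ": " + idx.score(id, "running"));
        }
    }
}
```

For the query word "running", document 0 matches both the exact and the stemmed term, while documents 1 and 2 match only the stemmed term, so document 0 ranks first. In Lucene itself the same effect would presumably come from two fields with different analyzers, queried together.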
> Also, this would maybe allow a flexible, run-time decision of what analyzers to include. For example, I might want stemming turned on for normal search, but not for a PhraseQuery.

That's harder - different field names for the different analyses might work, but not for run-time decisions. I think the way Sun's Minion does it is morphologically-based query expansion (see http://blogs.sun.com/searchguy/entry/lightweight_morphology_vs_stemming), and you might be able to implement that via query rewriting.
Aaron Lav (as...@pobox.com)
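
To make the query-rewriting suggestion concrete, here is a minimal sketch in plain Java. It is not how Minion or Lucene implement expansion. The class `MorphExpansionSketch`, the hand-written `VARIANTS` table, and the boost value 2.0 are all hypothetical. The output uses the `term^boost` and `OR` syntax of Lucene's classic query parser, and the `expand` flag stands in for the run-time decision: phrase queries and callers that pass `false` get the query back unchanged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MorphExpansionSketch {
    // Hypothetical variant table; a real system would derive these
    // from morphological rules or a lexicon.
    static final Map<String, List<String>> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put("run", Arrays.asList("runs", "running", "ran"));
    }

    // Rewrites each known word into (word^2.0 OR variant OR ...), so the
    // form the user typed is boosted above its expansions.
    public static String rewrite(String query, boolean expand) {
        String trimmed = query.trim();
        // Run-time decision: leave quoted phrases and opted-out queries alone.
        if (!expand || trimmed.startsWith("\"")) {
            return trimmed;
        }
        StringBuilder out = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            List<String> variants = VARIANTS.get(word.toLowerCase());
            if (variants == null) {
                out.append(word);
                continue;
            }
            out.append('(').append(word).append("^2.0");
            for (String variant : variants) {
                out.append(" OR ").append(variant);
            }
            out.append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: (run^2.0 OR runs OR running OR ran) fast
        System.out.println(rewrite("run fast", true));
        // prints: "run fast"
        System.out.println(rewrite("\"run fast\"", true));
    }
}
```

Because the expansion happens at query time, the index only needs the unstemmed terms, and stemming-like behaviour can be switched on or off per query.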





