Subject: boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)
From: Aaron Lav (as@pobox.com)
Date: Feb 24, 2010 1:20:31 pm
List: org.apache.lucene.java-user

On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:

On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll <gsin@apache.org> wrote:

What would it be?

For scoring to take into account the non-analyzed token stream.

That is, if a field is analyzed (stemmed, lowercased, maybe even stop words removed), that is fine for indexing. But tokens in the query matching the original form could still get a higher score than those that only match when analyzed.

You can get some of that effect by indexing stemmed and unstemmed forms, and letting IDF boost unstemmed results. (I picked this idea up from http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/)
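To make the IDF argument concrete, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is an assumption for illustration: the class name DualFieldIdfSketch, the toy suffix-stripping stemmer (not Porter), and the simplified score, which just adds a classic-style IDF, 1 + ln(numDocs / (docFreq + 1)), for each "field" the query term matches. The exact form occurs in no more documents than its stem does, so its IDF is at least as high, and a document containing the exact form collects both clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DualFieldIdfSketch {
    // Toy stemmer (NOT Porter): strips a few common suffixes, for illustration only.
    static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Lowercase and split on non-word characters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Classic-style IDF: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Score each document for a single query term against two conceptual fields:
    // an unstemmed field (exact token match) and a stemmed field (stem match).
    static double[] score(List<String> docs, String queryTerm) {
        String exact = queryTerm.toLowerCase(Locale.ROOT);
        String stem = toyStem(exact);
        int n = docs.size();
        boolean[] exactHit = new boolean[n];
        boolean[] stemHit = new boolean[n];
        int exactDf = 0;
        int stemDf = 0;
        for (int i = 0; i < n; i++) {
            for (String tok : tokenize(docs.get(i))) {
                if (tok.equals(exact)) exactHit[i] = true;
                if (toyStem(tok).equals(stem)) stemHit[i] = true;
            }
            if (exactHit[i]) exactDf++;
            if (stemHit[i]) stemDf++;
        }
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            // Each matching field contributes its own IDF; the rarer exact form contributes more.
            if (exactHit[i]) scores[i] += idf(n, exactDf);
            if (stemHit[i]) scores[i] += idf(n, stemDf);
        }
        return scores;
    }
}
```

With the documents "boosting exact matches", "the score is boosted", "index boosts" and "stop words removed", a query for "boosting" ranks the first document above the two that match only through the stem, and the last document scores zero. In Lucene itself the rough equivalent would be two fields with different analyzers and a query over both, but that wiring is not shown here.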

Also, this might allow a flexible, run-time decision about which analyzers to include. For example, I might want stemming turned on for normal search, but not for a PhraseQuery.

That's harder - different field names for the different analyses might work, but not for run-time decisions. I think Sun's Minion handles this with morphologically based query expansion (see http://blogs.sun.com/searchguy/entry/lightweight_morphology_vs_stemming), and you might be able to implement that via query rewriting.
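As a rough sketch of what expansion by query rewriting could look like (not a description of how Minion actually works), the plain-Java snippet below rewrites a single term into an OR of its variants, with the form the user typed boosted. The class name QueryExpansionSketch, the hand-written GROUPS table standing in for a real morphological generator, and the boost value are all assumptions. The output uses the term^boost and OR notation of Lucene's classic query parser syntax. The expand flag is where the run-time decision would sit: a caller could pass false for terms inside a phrase.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class QueryExpansionSketch {
    // Hypothetical variant groups; a real system would generate these morphologically.
    static final List<List<String>> GROUPS = Arrays.asList(
        Arrays.asList("boost", "boosts", "boosted", "boosting"),
        Arrays.asList("stem", "stems", "stemmed", "stemming"));

    // Rewrite one query term. With expand=false the term is returned unchanged
    // (apart from lowercasing), so only the exact form will match.
    static String rewrite(String term, boolean expand, float originalBoost) {
        String t = term.toLowerCase(Locale.ROOT);
        if (!expand) {
            return t;
        }
        for (List<String> group : GROUPS) {
            if (group.contains(t)) {
                // The typed form goes first and carries the boost; variants follow unboosted.
                StringBuilder sb = new StringBuilder("(");
                sb.append(t).append('^').append(originalBoost);
                for (String variant : group) {
                    if (!variant.equals(t)) {
                        sb.append(" OR ").append(variant);
                    }
                }
                return sb.append(')').toString();
            }
        }
        return t; // unknown term: nothing to expand
    }
}
```

For example, rewrite("Boosting", true, 2.0f) yields (boosting^2.0 OR boost OR boosts OR boosted), while rewrite("boosting", false, 2.0f) yields just boosting. Because the expansion happens at query time against an unstemmed index, the same index can serve both stemmed-style and exact searches.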