Subject: Re: Incremental Field Updates
From: Michael McCandless (luc@mikemccandless.com)
Date: May 12, 2010 2:55:10 am
List: org.apache.lucene.solr-dev

I think this would work perfectly fine w/ Shai's approach...

To Lucene a NumericField is just a series of terms w/ no positions indexed.

So when a value is changed, we'd get a new series of terms, do the delta, and then subtract & add accordingly in the stacked segments.
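
To make the "do the delta" step concrete, here is a rough sketch. It assumes a simplified model in which a trie-encoded number is indexed as one term per shift level, written as a `(shift, value >> shift)` pair. That is a stand-in for illustration, not Lucene's actual term encoding, and `trie_terms` and `term_delta` are hypothetical names.

```python
def trie_terms(value, bits=32, precision_step=4):
    """Return the set of terms for a non-negative integer.

    Simplified model (an assumption, not Lucene's real encoding):
    one term per shift level, each keeping only the high-order bits.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value out of range for the given bit width")
    return {(shift, value >> shift) for shift in range(0, bits, precision_step)}


def term_delta(old_value, new_value, bits=32, precision_step=4):
    """Return (terms_to_remove, terms_to_add) for a value change."""
    old_terms = trie_terms(old_value, bits, precision_step)
    new_terms = trie_terms(new_value, bits, precision_step)
    # Terms shared by both values need no change in the stacked segment.
    return old_terms - new_terms, new_terms - old_terms


if __name__ == "__main__":
    to_remove, to_add = term_delta(16, 17, bits=8, precision_step=4)
    print("remove:", sorted(to_remove))
    print("add:", sorted(to_add))
```

In this model, values that are close together share their coarse (high-shift) terms, so a small change only touches the fine-grained terms.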

Mike

On Wed, May 12, 2010 at 5:27 AM, Babak Farhang <farh@gmail.com> wrote:

Of course, it raises an interesting point, what are the implications for numeric
fields?

Not sure whether you're referring to the general or the specific, but with the approach Shai is proposing, if the numeric fields are indexed using the new trie structures, then it would be important to properly remove the postings for the old value (I imagine range queries would break otherwise). Again, that could be achieved by having the update API take the old value as well as the new one.
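
As a toy illustration of why the old postings matter, here is a sketch of an in-memory index with an update call that takes both values. `TinyIndex` and its methods are hypothetical, and it indexes exact values only (no trie terms), so it models the idea rather than any real Lucene API.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: numeric value -> set of doc ids (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, value):
        self.postings[value].add(doc_id)

    def update(self, doc_id, old_value, new_value):
        # The caller supplies the old value so its posting can be removed.
        self.postings[old_value].discard(doc_id)
        self.postings[new_value].add(doc_id)

    def range_query(self, low, high):
        # Collect every doc that has a posting with low <= value <= high.
        return {
            doc_id
            for value, docs in self.postings.items()
            if low <= value <= high
            for doc_id in docs
        }


if __name__ == "__main__":
    stale = TinyIndex()
    stale.add(1, 10)
    stale.add(1, 50)  # new value added, old posting left behind
    print(stale.range_query(0, 20))  # doc 1 still matches its old value

    clean = TinyIndex()
    clean.add(1, 10)
    clean.update(1, old_value=10, new_value=50)
    print(clean.range_query(0, 20))  # doc 1 no longer matches
```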

-Babak

On Tue, May 11, 2010 at 1:40 PM, Grant Ingersoll <gsin@apache.org> wrote:

On May 11, 2010, at 12:26 AM, Shai Erera wrote:

but because of the cost of preparing the inputs (i.e. text extraction) to Lucene.

You're right! That and also the cost of fetching the document, in systems where
the content lives on other servers/systems. Reindexing is usually (depends on
your analysis chain) the cheapest step.

Depends on the type of application, though, I suppose.  Many times the thing
being updated is just a number, like a rating/price/inventory as well, in which
case there is very little analysis.  Of course, it raises an interesting point,
what are the implications for numeric fields?