Subject: Re: Incremental Field Updates
From: Michael McCandless (luc...@mikemccandless.com)
Date: May 12, 2010 2:55:10 am
I think this would work perfectly fine w/ Shai's approach...
To Lucene, a NumericField is just a series of terms w/ no positions indexed.
So when a value is changed, we'd generate the new series of terms, compute the delta against the old series, and then subtract & add postings accordingly in the stacked segments.
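
To make the "do the delta" step concrete, here is a minimal sketch in Python. It is not Lucene code: `trie_terms` and `term_delta` are hypothetical names, each term is modelled as a `(shift, prefix)` pair rather than Lucene's actual prefix-coded term bytes, and values are assumed to be non-negative integers.

```python
def trie_terms(value, precision_step=4, bits=64):
    """Model the trie terms of a numeric value as (shift, prefix) pairs.

    Assumption: one term per precision level, obtained by dropping
    `shift` low-order bits. Real Lucene encoding details are omitted.
    """
    return {(shift, value >> shift) for shift in range(0, bits, precision_step)}


def term_delta(old_value, new_value, precision_step=4, bits=64):
    """Return (terms_to_remove, terms_to_add) for an in-place value change."""
    old_terms = trie_terms(old_value, precision_step, bits)
    new_terms = trie_terms(new_value, precision_step, bits)
    # Terms shared by both values need no change in the stacked segment.
    return old_terms - new_terms, new_terms - old_terms


# 1000 and 1001 differ only in their lowest 4 bits, so only the
# full-precision term changes; the coarser prefixes are shared.
print(term_delta(1000, 1001, precision_step=4, bits=16))
# -> ({(0, 1000)}, {(0, 1001)})
```

In this model, a small change to the value touches only a few terms, since the coarser prefixes are often identical for the old and new values.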
On Wed, May 12, 2010 at 5:27 AM, Babak Farhang <farh...@gmail.com> wrote:
Of course, it raises an interesting point, what are the implications for numeric fields?
Not sure whether you're referring to the general or the specific, but with the approach Shai is proposing, if the numeric fields are indexed using the new trie structures, then it would be important to properly remove the postings for the old value (I imagine range queries would break otherwise). Again, that could be achieved by having the update API take the old value as well as the new one.
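
As a rough illustration of why the old value's postings have to go, here is a toy in-memory index in Python. Everything in it is hypothetical (`ToyNumericIndex`, its methods, and the `(shift, prefix)` term model); it is a sketch of an update call that takes both the old and the new value, not a proposal for the actual Lucene API.

```python
from collections import defaultdict


class ToyNumericIndex:
    """Toy postings map: (shift, prefix) term -> set of doc ids."""

    def __init__(self, precision_step=4, bits=16):
        self.precision_step = precision_step
        self.bits = bits
        self.postings = defaultdict(set)

    def _terms(self, value):
        # Assumption: non-negative ints, one term per precision level.
        return {(s, value >> s) for s in range(0, self.bits, self.precision_step)}

    def add(self, doc_id, value):
        for term in self._terms(value):
            self.postings[term].add(doc_id)

    def update(self, doc_id, old_value, new_value):
        """Hypothetical update call that is given the old value too."""
        old_terms, new_terms = self._terms(old_value), self._terms(new_value)
        for term in old_terms - new_terms:
            self.postings[term].discard(doc_id)  # drop stale postings
        for term in new_terms - old_terms:
            self.postings[term].add(doc_id)

    def docs_with_value(self, value):
        # Exact match on the full-precision (shift 0) term.
        return set(self.postings.get((0, value), set()))


stale = ToyNumericIndex()
stale.add(1, 1000)
stale.add(1, 1001)       # "update" without removing the old terms
clean = ToyNumericIndex()
clean.add(1, 1000)
clean.update(1, 1000, 1001)

print(stale.docs_with_value(1000))  # doc 1 still matches the old value
print(clean.docs_with_value(1000))  # no match once old postings are removed
```

Without the old value, the index in this model has no cheap way to know which terms to subtract, which is the motivation for passing it to the update call.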
On Tue, May 11, 2010 at 1:40 PM, Grant Ingersoll <gsin...@apache.org> wrote:
On May 11, 2010, at 12:26 AM, Shai Erera wrote:
but because of the cost of preparing the inputs (i.e. text extraction) to Lucene.
You're right! That, and also the cost of fetching the document, in systems where
the content lives on other servers/systems. Reindexing is usually (depends on
your analysis chain) the cheapest step.
Depends on the type of application, though, I suppose. Many times the thing
being updated is just a number, like a rating/price/inventory as well, in which
case there is very little analysis. Of course, it raises an interesting point,
what are the implications for numeric fields?