Subject: Re: Incremental Field Updates
From: Jan Høydahl / Cominvent (jan.@cominvent.com)
Date: Oct 7, 2010 1:59:08 am
List: org.apache.lucene.solr-dev

Picking up on this very interesting discussion. Great and innovative piece of work, Shai!

I think this approach goes a long way toward addressing common scenarios.
Many customers really just need ACL or other metadata updates. One example is a
customer of mine who has an index of large docs for which the source data is
archived on tape. It is way too costly to retrieve the original data to compile
a new document for a metadata-only update.

Also, if I want to have the ability to update a whole field, I would be happy to
make it stored, rather than having to supply the original value to the API.
Seems like a reasonable tradeoff for getting incremental updates - nobody would
expect it to be free.

+1 for solving the "simple metadata" update case first, with full-field update
support for stored fields only.

Does this particular solution currently have an associated JIRA issue?

On 10 May 2010, at 10:40, Michael McCandless wrote:

On Mon, May 10, 2010 at 4:05 AM, Shai Erera <ser@gmail.com> wrote:

That's an interesting scenario Mike.

Previously, I only handled boolean-like terms, as the scenarios we were asked to support involved just those types of terms. Obviously, when the approach allows for more, more scenarios pop to mind :).

OK.

I think we may still be able to resolve that case, but it becomes much more complicated. My design approach of adding the +/- affected the entire posting element, whereas the scenario you describe affects the positions of the posting element. This calls for a more complicated design and solution.

Right.
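
For readers following along, the "+/-" idea discussed above can be pictured roughly as follows. This is only a minimal sketch under assumptions: `PostingUpdate` and `PostingMerge` are hypothetical names, not Lucene classes, and a posting is reduced to a bare doc ID, which is exactly why this model says nothing about positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model: one update entry adds (+) or removes (-) a whole posting for a doc. */
final class PostingUpdate {
    final int docId;
    final boolean add; // true = "+", false = "-"

    PostingUpdate(int docId, boolean add) {
        this.docId = docId;
        this.add = add;
    }
}

final class PostingMerge {
    /** Applies the updates in order on top of the base doc IDs and returns sorted doc IDs. */
    static List<Integer> merge(List<Integer> baseDocIds, List<PostingUpdate> updates) {
        TreeSet<Integer> docs = new TreeSet<>(baseDocIds);
        for (PostingUpdate u : updates) {
            if (u.add) {
                docs.add(u.docId);    // "+": the term now applies to this doc
            } else {
                docs.remove(u.docId); // "-": the whole posting element goes away
            }
        }
        return new ArrayList<>(docs);
    }
}
```

Merging base doc IDs 1, 4, 7 with the updates "-4" and "+9" yields 1, 7, 9. Because each update touches the entire posting element, the model fits boolean-like terms and has no way to say "remove only some positions of this term in this doc".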

My take on it is that if someone wants to update the catch-all field, then reindexing the document may not be such a bad idea anyway. The purpose of those incremental updates is to cope with a high frequency of updates, which usually happen on metadata fields, not on fields like title.

I agree.

But since one could add the 'tags' to the catch-all field as well, it brings us to the same point - how do I remove the positions of term X that relate to the tag X and not the potentially original term X that existed in the document?
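
To make the question concrete, here is a small illustration, assuming a naive whitespace-tokenized catch-all field. `CatchAllPositions` is a hypothetical helper, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

final class CatchAllPositions {
    /** Returns the positions at which {@code term} occurs in the token stream. */
    static List<Integer> positionsOf(String term, String[] tokens) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

If the original text is "lucene in action" and the tag "lucene" is appended to the catch-all field, `positionsOf("lucene", ...)` returns positions 0 and 3. Nothing in that list records that position 3 came from the tag, so removing the tag later needs extra bookkeeping beyond a per-document "+/-" marker.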

This is a very advanced case (and interesting). I don't want to hold up the discussion on it, but want to make sure we do not deviate from getting the simpler cases in first. Depending on the API, this might be very easy to solve, but might also complicate matters. Maybe, for an incr-field-updates-v1, we can do without it?

Definitely, let's take this (incrementally updating the positions as well) out of scope for the first cut, when we actually start building things. One simple way to do this might be to only allow incremental update on fields that have omitTFAP=true.
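
One way to picture that restriction is a simple guard that rejects incremental updates for any field that still indexes positions. This is a sketch only: `IncrementalUpdateGuard` and its methods are hypothetical, and the per-field flag merely stands in for the omitTFAP setting mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementalUpdateGuard {
    // Field name -> whether the field omits term frequencies and positions.
    private final Map<String, Boolean> omitTfap = new HashMap<>();

    void declareField(String name, boolean omitsTermFreqAndPositions) {
        omitTfap.put(name, omitsTermFreqAndPositions);
    }

    /** Only fields declared with omitTFAP=true qualify; unknown fields are rejected. */
    boolean allowsIncrementalUpdate(String name) {
        return Boolean.TRUE.equals(omitTfap.get(name));
    }
}
```

With "acl" declared as omitting positions and "body" declared as keeping them, the guard accepts incremental updates to "acl" and rejects them for "body", which keeps the position-tracking problem out of a first version.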

When brainstorming/designing a new feature, I like to cast a wide net during the discussion/thinking (what we are doing now), but then when it comes to what to actually build for phase one we'll pull it way back in and aim for baby steps / progress not perfection. We are able to do much more imagining than we can actually write code :)

The wide net during brainstorming gives us a better view/context of the road ahead, e.g. to validate that the baby step is in the right direction, so that it doesn't preclude other things we might imagine later.

In this case, it does sound like the approach should work (in theory) fine w/ positions, too.