Subject: Re: Getting multi-values to use in filter?
From: Shai Erera (ser@gmail.com)
Date: Apr 23, 2014 8:13:25 am
List: org.apache.lucene.java-user

A NumericDocValues field can only hold one value per document. Have you thought about encoding the values in a BinaryDocValues field? Or are you talking about multiple fields (different names), each with its own single value, where at search time you sum the values from a different set of fields?

If it's one field, multiple values, then why do you need to separate the values? Is it because you sometimes sum and sometimes e.g. avg? Do you always include all values of a document in the formula, but the formula changes between searches, or do you sometimes use only a subset of the values?

If you always use all values, but change the formula between queries, then perhaps you can just encode the pre-computed values under different NDV fields? If you only use a handful of functions (and they are known in advance), it may not be too heavy on the index, and it will definitely perform better during search.
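To make the pre-computation idea concrete, here is a minimal sketch in plain Java. It only computes the aggregate values and the field names they could be stored under; the class name and the `_sum` / `_min` / `_max` suffixes are hypothetical, and adding each entry to the document as a NumericDocValues field is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.LongStream;

/** Hypothetical helper: derives one single-valued aggregate per known function. */
final class PrecomputedAggregates {
    private PrecomputedAggregates() {}

    /**
     * Returns a map from derived field name to its pre-computed value.
     * Each entry would be indexed as its own single-valued numeric field.
     */
    static Map<String, Long> compute(String field, long... values) {
        Map<String, Long> out = new LinkedHashMap<>();
        if (values.length == 0) {
            return out; // nothing to aggregate for this document
        }
        out.put(field + "_sum", LongStream.of(values).sum());
        out.put(field + "_min", LongStream.of(values).min().getAsLong());
        out.put(field + "_max", LongStream.of(values).max().getAsLong());
        return out;
    }
}
```

With the values from the original question (60 and 40), the `field_sum` entry is 100, so sum(field)=100 becomes an ordinary lookup on a single-valued field.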

Otherwise, I believe I'd consider indexing them as a BDV field. For facets, we basically need the same multi-valued numeric field, and given that NDV is single valued, we went w/ BDV.
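As a rough illustration of the BDV idea, the sketch below packs all values of a document into one byte array at index time and sums them again at search time. It is plain Java with no Lucene calls: the class name and the fixed 8-bytes-per-value layout are assumptions made for this example (not the encoding the facet module uses), and the offset/length parameters simply mirror how a slice of a larger byte array is usually passed around.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec: several long values stored in a single binary value. */
final class MultiValueCodec {
    private MultiValueCodec() {}

    /** Index time: 8 bytes per value, big-endian, no length prefix. */
    static byte[] encode(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Search time: sum every value found in bytes[offset, offset + length). */
    static long sum(byte[] bytes, int offset, int length) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        long total = 0;
        while (buf.remaining() >= Long.BYTES) {
            total += buf.getLong();
        }
        return total;
    }
}
```

For the document in the original question, `MultiValueCodec.encode(60, 40)` yields 16 bytes, and summing them back gives 100. A variable-length encoding would save space for small values, at the cost of a slightly more involved decoder.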

If I misunderstood the scenario, I'd appreciate it if you could clarify :)

Shai

On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <rob.@gmail.com> wrote:

Hi Shai, all,

I am trying to write that Filter :). But I'm a bit at a loss as to how to efficiently grab the multi-values. I can call context.reader().document(), which accesses the stored fields, but that seems slow.

For single-value fields I use a compiled JavaScript Expression with SimpleBindings as a ValueSource, which seems to work quite well. The downside is that I cannot find a way to implement multi-value support through that solution.

These create, for example, a LongFieldSource, which uses the FieldCache.LongParser. These parsers only seem to parse one field.

Is there an efficient way to get -all- of the (numeric) values for a field in a document?

On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <ser@gmail.com> wrote:

You can do that by writing a Filter which returns matching documents based on a sum of the field's value. However I suspect that is going to be slow, unless you know that you will need several such filters and can cache them.

Another approach would be to write a Collector which serves as a Filter, but computes the sum only for documents that match the query. Hopefully that would mean you compute the sum for fewer documents than you would have w/ the Filter approach.
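The shape of that Collector-style approach can be sketched in plain Java as follows. This is not Lucene's Collector API: the class, its `collect(int)` method and the `valuesOf` lookup are hypothetical stand-ins, meant only to show that the sum is computed for documents the query has already matched and for nothing else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical collector that keeps only hits whose values add up to a target. */
final class SumMatchCollector {
    private final IntFunction<long[]> valuesOf; // assumed per-document value lookup
    private final long target;
    private final List<Integer> accepted = new ArrayList<>();

    SumMatchCollector(IntFunction<long[]> valuesOf, long target) {
        this.valuesOf = valuesOf;
        this.target = target;
    }

    /** Called once for each document the query matched. */
    void collect(int doc) {
        long sum = 0;
        for (long v : valuesOf.apply(doc)) {
            sum += v;
        }
        if (sum == target) {
            accepted.add(doc);
        }
    }

    /** Documents that matched the query and passed the sum check. */
    List<Integer> accepted() {
        return accepted;
    }
}
```

Documents the query never matches are never passed to `collect`, so their values are never read; that is the saving over a Filter that has to evaluate every document up front.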

On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <msok@safaribooksonline.com> wrote:

This isn't really a good use case for an index like Lucene. The most essential property of an index is that it lets you look up documents very quickly based on *precomputed* values.

On 04/23/2014 06:56 AM, Rob Audenaerde wrote:

Hi all,

I'm looking for a way to use multi-values in a filter.

I want to be able to search on sum(field)=100, where field has multiple values in one document:

field=60 field=40

In this case 'field' is a LongField. I examined the code in the FieldCache, but that seems to focus on single-valued fields only, or

Is this something that can be done in Lucene? And what would be a good approach?

Thanks in advance,