|Rob Audenaerde||Apr 23, 2014 3:56 am|
|Michael Sokolov||Apr 23, 2014 7:11 am|
|Rob Audenaerde||Apr 23, 2014 7:30 am|
|Shai Erera||Apr 23, 2014 7:38 am|
|Rob Audenaerde||Apr 23, 2014 7:49 am|
|Shai Erera||Apr 23, 2014 8:13 am|
|Rob Audenaerde||Apr 23, 2014 8:49 am|
|Shai Erera||Apr 24, 2014 3:20 am|
|Shai Erera||Apr 27, 2014 12:27 pm|
|Rob Audenaerde||Apr 29, 2014 12:04 am|
|Shai Erera||Apr 29, 2014 12:43 am|
|Subject:||Re: Getting multi-values to use in filter?|
|From:||Rob Audenaerde (rob....@gmail.com)|
|Date:||Apr 29, 2014 12:04:32 am|
I read the article on your blog, thanks for it! It seems to be a natural fit to
do multi-values like this, and it is helpful indeed. For my specific problem,
the number of values per field is not fixed, so a document can have 0 values
or 10. I think the best way to solve this is to encode the number of values
as the first entry in the BDV. This is not that hard, so I will take this road.
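
A minimal sketch of the count-prefixed layout described above, assuming a 4-byte count followed by fixed-width 8-byte longs. The class and method names are hypothetical, and this is not code from the thread or the blog post; a variable-length encoding would work just as well.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: encodes a variable number of longs as
// [count (4 bytes)][value 0 (8 bytes)]...[value n-1 (8 bytes)].
final class CountPrefixedLongs {

    // Encode the values with their count as the first entry.
    static byte[] encode(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Long.BYTES);
        buf.putInt(values.length);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Decode by reading the count first, then exactly that many longs.
    static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] values = new long[buf.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }
}
```

At index time the resulting byte array would be wrapped in a BytesRef and added as a BinaryDocValuesField; at search time the same bytes come back from the BDV and go through decode.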
On Apr 27, 2014, at 21:27, Shai Erera <ser...@gmail.com> wrote:
Your question got me interested, so I wrote a quick prototype of what I think solves your problem (and if not, I hope it solves someone else's! :)). The idea is to write a special ValueSource, e.g. MaxValueSource, which reads a BinaryDocValues, decodes the values and returns the maximum one. It can then be embedded in an expression quite easily.
I published a post on Lucene expressions and included some prototype code which demonstrates how to do it. Hope it's still helpful to you: http://shaierera.blogspot.com/2014/04/expressions-with-lucene.html.
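
The decoding step behind such a MaxValueSource might look roughly like the sketch below. It is not the prototype from the blog post: it assumes the values were written as fixed-width 8-byte big-endian longs with no header, and the class name is hypothetical. The bytes/offset/length parameters mirror the fields of a BytesRef.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder: returns the maximum of the longs packed into a byte range.
final class MaxOfEncodedLongs {

    static long max(byte[] bytes, int offset, int length) {
        if (length < Long.BYTES || length % Long.BYTES != 0) {
            throw new IllegalArgumentException("expected one or more 8-byte values");
        }
        // wrap() positions the buffer at offset and limits it to offset + length.
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        long max = Long.MIN_VALUE;
        while (buf.remaining() >= Long.BYTES) {
            max = Math.max(max, buf.getLong());
        }
        return max;
    }
}
```

A real ValueSource would call something like this from the method that returns the per-document value, after fetching the document's bytes from the BinaryDocValues.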
On Thu, Apr 24, 2014 at 1:20 PM, Shai Erera <ser...@gmail.com> wrote:
I don't think that you should use the facet module. If all you want is to encode a bunch of numbers under a 'foo' field, you can encode them into a byte[] and index them as a BDV. Then at search time you get the BDV and decode the numbers back. The facet module adds complexity here: yes, you get the encoding/decoding for free, but at the cost of adding mock categories to the taxonomy, or using associations, for no good reason IMO.
On Wed, Apr 23, 2014 at 6:49 PM, Rob Audenaerde <rob....@gmail.com> wrote:
Thanks for all the questions, gives me an opportunity to clarify it :)
Currently, using single values, I can handle expressions in the form of "fieldA - fieldB - fieldC > 0" and evaluate the long value that I receive from the FunctionValues and the ValueSource. I also optimize the query by ensuring the field exists and has a value, etc., to keep the search fast enough. This works well, but for single values only.
I also looked into the facets Association Fields, as they look somewhat like what I want. However, in the faceting module all ordinals and values are stored in one field, so there is no easy way to extract the fields that are used in the expression.
I like the first solution you suggested: encode all the numeric values into a byte[] like the facets do, but on a per-field basis, so that each numeric field has a BDV field that contains all the values of that field for that document.
Now that I am typing this, I think there is another way. I could use the faceting module and add a different facet field ($facetFIELDA, $facetFIELDB) in the FacetsConfig for each field. That way it would be relatively straightforward to get all the values for a field, as they are exactly the values in the BDV of that document's facet field. Only aggregating all facets will be harder, as the TaxonomyFacetSum*Associations would need to do this for all fields that I need facet counts/sums for.
What do you think?
On Wed, Apr 23, 2014 at 5:13 PM, Shai Erera <ser...@gmail.com> wrote:
A NumericDocValues field can only hold one value. Have you thought about encoding the values in a BinaryDocValues field? Or are you talking about multiple fields (different names), each has its own single value, and at search time you sum the values from a different set of fields?
If it's one field, multiple values, then why do you need to separate the values? Is it because you sometimes sum and sometimes e.g. avg? Do you always include all values of a document in the formula, but the formula changes between searches, or do you sometimes use only a subset of the values?
If you always use all values, but change the formula between queries, then perhaps you can just encode the pre-computed value under different NDV fields? If you only use a handful of functions (and they are known in advance), it may not be too heavy on the index, and it will definitely perform better during search.
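
As a rough illustration of the pre-computation idea, the sketch below derives a few aggregates from a document's values at index time. The field-name suffixes (_sum, _min, _max) and the class name are hypothetical, and the choice of functions is only an example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical index-time helper: one pre-computed value per aggregate function.
final class PrecomputedAggregates {

    // Returns e.g. {"price_sum": ..., "price_min": ..., "price_max": ...};
    // empty when the document has no values for the field.
    static Map<String, Long> compute(String field, long... values) {
        Map<String, Long> out = new LinkedHashMap<>();
        if (values.length == 0) {
            return out;
        }
        long sum = 0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        out.put(field + "_sum", sum);
        out.put(field + "_min", min);
        out.put(field + "_max", max);
        return out;
    }
}
```

Each map entry would then be indexed as its own NumericDocValuesField, so a search-time expression can read a single value per document instead of decoding all of them.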
Otherwise, I believe I'd consider indexing them as a BDV field. For facets, we basically need the same multi-valued numeric field, and given that NDV is single valued, we went w/ BDV.
If I misunderstood the scenario, I'd appreciate if you clarify it :)
On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <
Hi Shai, all,
I am trying to write that Filter :). But I'm a bit at a loss as to how to efficiently grab the multi-values. I can access the context.reader().document() that accesses the stored fields, but that seems slow.
These create for example a LongFieldSource, which uses the FieldCache.LongParser. These parsers only seem to parse one field.
Is there an efficient way to get -all- of the (numeric) values for a field in a document?
On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <ser...@gmail.com> wrote:
You can do that by writing a Filter which returns matching documents based on a sum of the field's value. However I suspect that is going to be slow, unless you know that you will need several such filters and can
Another approach would be to write a Collector which serves as a Filter, but computes the sum only for documents that match the query. Hopefully that would mean you compute the sum for fewer documents than you would
w/ the Filter approach.
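
The Collector idea can be modelled in plain Java as below. This is a simplified stand-in, not Lucene's Collector API: the class name is hypothetical, and the per-document value lookup is passed in as a function where a real implementation would read doc values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical collector: called only for documents that already matched the
// query, and keeps those whose values sum to the target.
final class SumFilteringCollector {
    private final IntFunction<long[]> valuesForDoc;
    private final long target;
    private final List<Integer> accepted = new ArrayList<>();

    SumFilteringCollector(IntFunction<long[]> valuesForDoc, long target) {
        this.valuesForDoc = valuesForDoc;
        this.target = target;
    }

    // Sum the document's values and keep the doc only if the sum equals the target.
    void collect(int docId) {
        long sum = 0;
        for (long v : valuesForDoc.apply(docId)) {
            sum += v;
        }
        if (sum == target) {
            accepted.add(docId);
        }
    }

    List<Integer> accepted() {
        return accepted;
    }
}
```

Because collect runs only on query matches, the sum is never computed for documents the query has already excluded.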
On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <msok...@safaribooksonline.com> wrote:
This isn't really a good use case for an index like Lucene. The most essential property of an index is that it lets you look up
quickly based on *precomputed* values.
On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
I'm looking for a way to use multi-values in a filter.
I want to be able to search on sum(field)=100, where field has
In this case 'field' is a LongField. I examined the code in the FieldCache, but that seems to focus on single-valued fields only, or
Is this something that can be done in Lucene? And what would be a good approach?
Thanks in advance,