| From | Sent On | Attachments |
|---|---|---|
| Grant Ingersoll | Nov 20, 2009 7:55 am | |
| Mark Miller | Nov 20, 2009 8:04 am | |
| Jake Mannix | Nov 20, 2009 8:14 am | |
| Mark Miller | Nov 20, 2009 8:14 am | |
| Jake Mannix | Nov 20, 2009 8:18 am | |
| Grant Ingersoll | Nov 20, 2009 10:08 am | |
| Jake Mannix | Nov 20, 2009 10:24 am | |
| Grant Ingersoll | Nov 20, 2009 1:58 pm | |
| Mark Miller | Nov 20, 2009 2:24 pm | |
| Jake Mannix | Nov 20, 2009 2:31 pm | |
| Mark Miller | Nov 20, 2009 2:39 pm | |
| Mark Miller | Nov 20, 2009 2:50 pm | |
| Jake Mannix | Nov 20, 2009 3:39 pm | |
| Mark Miller | Nov 20, 2009 4:09 pm | |
| Mark Miller | Nov 20, 2009 4:20 pm | |
| Jake Mannix | Nov 20, 2009 4:36 pm | |
| Jake Mannix | Nov 20, 2009 4:42 pm | |
| Jake Mannix | Nov 20, 2009 4:49 pm | |
| Mark Miller | Nov 20, 2009 4:49 pm | |
| Mark Miller | Nov 20, 2009 4:51 pm | |
| Jake Mannix | Nov 20, 2009 4:56 pm | |
| Mark Miller | Nov 20, 2009 5:02 pm | |
| Jake Mannix | Nov 20, 2009 5:10 pm | |
| Jake Mannix | Nov 20, 2009 5:13 pm | |
| Otis Gospodnetic | Nov 24, 2009 9:18 pm | |
| Otis Gospodnetic | Nov 24, 2009 9:31 pm | |
| Jake Mannix | Nov 24, 2009 9:39 pm | |
| Jake Mannix | Nov 24, 2009 9:43 pm | |
| Jake Mannix | Nov 24, 2009 9:55 pm | |
| Jake Mannix | Nov 24, 2009 10:30 pm | |

| Subject: | Re: Whither Query Norm? | |
|---|---|---|
| From: | Mark Miller (mark...@gmail.com) | |
| Date: | Nov 20, 2009 4:51:23 pm | |
| List: | org.apache.lucene.java-dev | |
Okay - my fault - I'm not really talking in terms of Lucene. Though even there I consider it possible. You'd just have to like, rewrite it :) And it would likely be pretty slow.
Jake Mannix wrote:

> On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller <mark...@gmail.com> wrote:
>
>> Mark Miller wrote:
>>
>>> it looks expensive to me to do both of them properly.
>>
>> Okay - I guess that somewhat makes sense - you can calculate the magnitude of the doc vectors at index time. How is that impossible with incremental indexing though? Isn't it just expensive? Seems somewhat expensive in the non-incremental case as well - you're just eating it at index time rather than query time - though the same could be done for incremental? The information is all there in either case.
>
> The expense, if you have the idfs of all terms in the vocabulary (keep them in the form of idf^2 for efficiency at index time), is pretty trivial, isn't it? If you have a document with 1000 terms, it's maybe 3000 floating point operations, all CPU actions, in memory, no disk seeks.
>
> What it does require is knowing, even when you have no documents yet on disk, what the idfs of the terms in the first few documents are. Where do you get this from, in Lucene, if you haven't externalized some notion of idf?
>
> -jake
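For readers following the arithmetic in the quoted message, here is a minimal sketch of the index-time norm computation Jake describes. It is plain Python rather than Lucene code, and everything named in it is an assumption made for illustration: the helper names, the sample statistics, the use of raw term frequency as the tf weight, and the `1 + ln(N / (df + 1))` idf form. The loop does two multiplies and one add per distinct term, which is where a figure of roughly 3000 floating point operations for 1000 terms would come from.

```python
import math
from collections import Counter


def idf_squared(doc_freq, num_docs):
    """Squared idf, assuming the form idf = 1 + ln(N / (df + 1))."""
    idf = 1.0 + math.log(num_docs / (doc_freq + 1.0))
    return idf * idf


def doc_vector_norm(tokens, idf_sq, default_idf_sq=1.0):
    """Euclidean length of a tf-idf document vector.

    idf_sq maps term -> idf^2 and is supplied from outside the document,
    i.e. the "externalized" idf table the message asks about.
    Terms missing from the table fall back to default_idf_sq (an assumption).
    """
    term_freqs = Counter(tokens)
    sum_sq = 0.0
    for term, tf in term_freqs.items():
        # Two multiplies and one add per distinct term.
        sum_sq += tf * tf * idf_sq.get(term, default_idf_sq)
    return math.sqrt(sum_sq)


# Hypothetical corpus statistics, known before any document is indexed.
NUM_DOCS = 1000
DOC_FREQS = {"lucene": 9, "query": 99, "norm": 49}
IDF_SQ = {term: idf_squared(df, NUM_DOCS) for term, df in DOC_FREQS.items()}

norm = doc_vector_norm(["lucene", "query", "query", "norm"], IDF_SQ)
print(norm)
```

Note that `doc_vector_norm` cannot run without `IDF_SQ`, which mirrors the objection in the message: the per-document work is cheap, but the idf table has to come from somewhere before the first documents are on disk.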





