| Subject: | Re: Whither Query Norm? | |
|---|---|---|
| From: | Mark Miller (mark...@gmail.com) | |
| Date: | Nov 20, 2009 5:02:09 pm | |
| List: | org.apache.lucene.java-dev | |
Go back and put it in after you have all the documents for that commit point. Or on reader load, calculate it.
- Mark
http://www.lucidimagination.com (mobile)
On Nov 20, 2009, at 7:56 PM, Jake Mannix <jake...@gmail.com> wrote:
On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller <mark...@gmail.com> wrote:
Okay - my fault - I'm not really talking in terms of Lucene. Though even there I consider it possible. You'd just have to like, rewrite it :) And it would likely be pretty slow.
Rewrite it how? When you index the very first document, the docFreq of every term is 1, out of numDocs = 1 docs in the corpus, so every term has the same idf. No matter how you normalize this, it'll be wrong once you've indexed a million documents. This isn't a matter of Lucene architecture; it's a matter of idf being a value that is only available exactly at query time (you can approximate it partway through indexing, but you don't know it at all when you start).
-jake
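
To make the disagreement concrete, here is a minimal sketch of a toy index that is not Lucene code. It assumes a smoothed idf of the form 1 + ln(numDocs / (docFreq + 1)), and the names `ToyIndex`, `idf_at_index_time`, and `idf_now` are hypothetical. It freezes an idf for each term when the term is first indexed (the case Jake objects to) and compares it with the value recomputed from the final counts, roughly what Mark's "on reader load, calculate it" suggests.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Smoothed idf, assumed here to be 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


class ToyIndex:
    """Hypothetical in-memory index that only tracks document frequencies."""

    def __init__(self) -> None:
        self.num_docs = 0
        self.doc_freq: dict[str, int] = {}
        # idf frozen at the moment each term was first indexed
        self.idf_at_index_time: dict[str, float] = {}

    def add(self, text: str) -> None:
        self.num_docs += 1
        for term in set(text.split()):
            self.doc_freq[term] = self.doc_freq.get(term, 0) + 1
            self.idf_at_index_time.setdefault(
                term, idf(self.doc_freq[term], self.num_docs)
            )

    def idf_now(self, term: str) -> float:
        """idf recomputed from the current counts (e.g. on reader load)."""
        return idf(self.doc_freq[term], self.num_docs)


index = ToyIndex()
index.add("lucene query norm")  # the very first document
first_doc_idfs = {t: index.idf_now(t) for t in ("lucene", "query", "norm")}

# Index 999 more documents with a skewed term distribution.
for i in range(999):
    index.add("lucene" if i % 2 else "lucene query")

for term in ("lucene", "query", "norm"):
    print(term, index.idf_at_index_time[term], index.idf_now(term))
```

After the first document all three terms share one idf, so any normalization baked in at that point cannot distinguish them. Once the remaining documents are in, the recomputed values separate the rare term from the common ones, which is only possible because the final docFreq and numDocs are known.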





