Thread: Include BM25 in Lucene? (7 messages)

From              Sent On
J.Zhu             Oct 17, 2006 2:50 am
Grant Ingersoll   Oct 17, 2006 3:56 am
J.Zhu             Oct 17, 2006 3:58 am
Vic Bancroft      Oct 17, 2006 5:44 am
J.Zhu             Oct 17, 2006 9:03 am
Chuck Williams    Oct 17, 2006 12:41 pm
Vic Bancroft      Oct 19, 2006 5:27 am
Subject: Re: Include BM25 in Lucene?
From: Chuck Williams (
Date: Oct 17, 2006 12:41:33 pm

Vic Bancroft wrote on 10/17/2006 02:44 AM:

In some of my group's usage of Lucene over large document collections, we have split the documents across several machines. This has led to a concern about whether the inverse document frequency is appropriate, since the score seems to be dependent on the partitioning of documents over indexing hosts. We have not formulated an experiment to determine whether it seriously affects our results, though it has been discussed.

What version of Lucene are you using? Are you using ParallelMultiSearcher to manage the distributed indexes, or have you implemented your own mechanism? There was a bug a couple of years ago, in version 1.4.3 as I recall, where ParallelMultiSearcher was not computing df's correctly, but that has been fixed for a long time now. The df's are the sum of the df's from each distributed index and are thus independent of the partitioning.
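To illustrate why summing df's removes the dependence on partitioning, here is a small self-contained sketch. It assumes the classic Lucene DefaultSimilarity formula idf = 1 + ln(numDocs / (docFreq + 1)), and it also assumes the total document count is summed across indexes the same way the df's are. The class DistributedIdfDemo, the ShardStats record, and the methods idf/globalIdf are hypothetical names for illustration only, not Lucene API. It splits the same 1000-document collection two ways and compares per-host IDFs with the aggregated IDF.

```java
import java.util.List;

/**
 * Sketch (hypothetical names): per-host IDF depends on how documents are
 * partitioned, but IDF computed from summed statistics does not.
 * Assumes the classic DefaultSimilarity-style formula
 * idf = 1 + ln(numDocs / (docFreq + 1)).
 */
public class DistributedIdfDemo {

    /** Statistics for one term on one index host (hypothetical helper type). */
    record ShardStats(int docFreq, int numDocs) {}

    /** DefaultSimilarity-style IDF for a single index. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Global IDF: sum df and document counts over all hosts, then apply the formula once. */
    static double globalIdf(List<ShardStats> shards) {
        int df = 0;
        int n = 0;
        for (ShardStats s : shards) {
            df += s.docFreq();
            n += s.numDocs();
        }
        return idf(df, n);
    }

    public static void main(String[] args) {
        // One 1000-document collection in which the term occurs in 50 documents,
        // split across two hosts in two different ways.
        List<ShardStats> evenSplit = List.of(new ShardStats(25, 500), new ShardStats(25, 500));
        List<ShardStats> skewedSplit = List.of(new ShardStats(45, 500), new ShardStats(5, 500));

        // Local IDFs on each host of the skewed split differ from each other...
        for (ShardStats s : skewedSplit) {
            System.out.printf("local idf (df=%d, n=%d): %.4f%n",
                    s.docFreq(), s.numDocs(), idf(s.docFreq(), s.numDocs()));
        }
        // ...while the aggregated IDF equals that of a single unpartitioned index.
        System.out.printf("single index: %.4f%n", idf(50, 1000));
        System.out.printf("even split:   %.4f%n", globalIdf(evenSplit));
        System.out.printf("skewed split: %.4f%n", globalIdf(skewedSplit));
    }
}
```

Because the summed counts are identical (df = 50, numDocs = 1000) for every split, the aggregated IDF is the same regardless of how documents are distributed; only scoring with each host's local statistics would make results depend on the partitioning.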