7 messages in org.apache.lucene.java-dev
Re: Include BM25 in Lucene?

From              Sent On
J.Zhu             Oct 17, 2006 2:50 am
Grant Ingersoll   Oct 17, 2006 3:56 am
J.Zhu             Oct 17, 2006 3:58 am
Vic Bancroft      Oct 17, 2006 5:43 am
J.Zhu             Oct 17, 2006 9:02 am
Chuck Williams    Oct 17, 2006 12:41 pm
Vic Bancroft      Oct 19, 2006 5:27 am
Subject: Re: Include BM25 in Lucene?
From: Vic Bancroft (banc@america.net)
Date: Oct 17, 2006 5:43:44 am
List: org.apache.lucene.java-dev

J.Zhu wrote:

If I would like to contribute, what should I do? I am not a good Java developer myself though. Can I work with someone also interested?

In some of my group's usage of Lucene over large document collections, we have split the documents across several machines. This has led to a concern about whether the inverse document frequency is appropriate, since the score seems to be dependent on the partitioning of documents over indexing hosts. We have not formulated an experiment to determine whether it seriously affects our results, though it has been discussed.
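
To make the partitioning concern concrete, here is a minimal sketch, not Lucene code. It assumes the idf form 1 + ln(numDocs / (docFreq + 1)), which is my understanding of what Lucene's default similarity uses. The class name, method names, and shard statistics are all hypothetical.

```java
// Illustrative sketch only; class, methods, and numbers are hypothetical.
final class ShardIdfSketch {

    /** Assumed idf form: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /** idf computed from statistics summed over all shards. */
    static double globalIdf(long[] docFreqs, long[] numDocs) {
        long totalDocFreq = 0;
        long totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        // Made-up statistics: the term is common on shard 0 and rare on shard 1.
        long[] docFreqs = {900, 10};
        long[] numDocs = {1000, 1000};
        for (int i = 0; i < docFreqs.length; i++) {
            System.out.printf("shard %d idf = %.3f%n", i, idf(docFreqs[i], numDocs[i]));
        }
        System.out.printf("global idf  = %.3f%n", globalIdf(docFreqs, numDocs));
    }
}
```

With skewed shards like these, each host assigns the same term a different weight, so scores from different hosts are not directly comparable unless document frequencies are aggregated before scoring.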

If someone could elaborate on how BM25 or some DFR algorithm would differ from the TF/IDF scoring implemented in Lucene, I would be willing to help translate that into Java as an indexing/searching option . . .
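
As a starting point for that comparison, here is a rough sketch of the two per-term weights in plain Java with no Lucene API calls. The TF/IDF side is a simplified reading of Lucene's default scoring (sqrt term frequency, log idf, 1/sqrt(length) norm) that leaves out query normalization, coord, and boosts. The BM25 side is the textbook Okapi form with the commonly quoted defaults k1 = 1.2 and b = 0.75. All names here are hypothetical.

```java
// Illustrative sketch only; not Lucene's Similarity API.
final class ScoringSketch {
    static final double K1 = 1.2;  // BM25 term-frequency saturation (assumed default)
    static final double B = 0.75;  // BM25 length-normalization strength (assumed default)

    /** Simplified TF/IDF term weight: sqrt(freq) * idf * 1/sqrt(docLen). */
    static double tfIdfTerm(double freq, long docFreq, long numDocs, double docLen) {
        double idf = 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
        return Math.sqrt(freq) * idf / Math.sqrt(docLen);
    }

    /** Classic BM25 idf; it goes negative when a term is in over half the documents. */
    static double bm25Idf(long docFreq, long numDocs) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Okapi BM25 term weight with length normalization against the average length. */
    static double bm25Term(double freq, long docFreq, long numDocs,
                           double docLen, double avgDocLen) {
        double lengthFactor = 1.0 - B + B * (docLen / avgDocLen);
        return bm25Idf(docFreq, numDocs) * (freq * (K1 + 1.0)) / (freq + K1 * lengthFactor);
    }

    public static void main(String[] args) {
        long docFreq = 50;
        long numDocs = 10_000;
        double docLen = 300.0;
        double avgDocLen = 300.0;
        for (double freq : new double[] {1, 10, 100, 1000}) {
            System.out.printf("freq=%6.0f  tfidf=%.4f  bm25=%.4f%n", freq,
                    tfIdfTerm(freq, docFreq, numDocs, docLen),
                    bm25Term(freq, docFreq, numDocs, docLen, avgDocLen));
        }
    }
}
```

The main behavioral difference this shows is saturation: the TF/IDF weight keeps growing with sqrt(freq), while the BM25 weight is bounded above by (k1 + 1) times the idf. BM25 also needs the average document length, which is another collection-wide statistic that would depend on how documents are partitioned.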

more, l8r, v