|Subject:||Re: Summer of Code idea for lucene|
|From:||Mark Miller (mark...@gmail.com)|
|Date:||Sep 13, 2008 7:47:56 am|
Your work here looks very interesting. The Lucene community has shown a strong interest in this area before (see LUCENE-965).
I see you went with an LGPL license, though. This might be a bit of a barrier to getting feedback from a community based on Apache License software. Obviously, there still might be interest, learning, and an exchange of ideas, but none of your code can be distributed with Lucene, so what you have done loses some of its appeal in that sense. Is there any chance you would be willing to relax the license, possibly gaining more feedback, contributors, and possible inclusion in Lucene? Certainly not necessary to receive feedback, but I think it would help -- I'd certainly be looking closer anyway.
Joaquin Perez Iglesias wrote:
I finally got some time to finish the BM25/BM25F implementation for Lucene. You can find more details at http://nlp.uned.es/~jperezi/Lucene-BM25/. It has been tested, but I cannot guarantee that it is bug-free. It would be great to receive some feedback about it.
There are some details about the implementation that I think will be of interest, such as how to calculate the average_length or the idf at document level. If you find any bug or mistake in the supplied implementation, please let me know and I will try to fix it; the same goes for questions.
Hope that some of you will find it useful.
Thanks in advance.
As my colleague said, we have a first implementation of BM25 on top of Lucene. This development is part of an (almost finished) thesis project that compares different IR models over a standard collection. At the same time, we are trying to extend this first implementation to support BM25F for multifield queries. Unfortunately, we are too busy right now to prepare a final version of this code, so we will have to finish it over the summer (hopefully we will have more time :-))) and make it public then.
We will inform this list when we finish preparing a final version.
Thanks to everybody for the interest!!!
-----------------------------------------------------------
Joaquín Pérez Iglesias
Dpto. Lenguajes y Sistemas Informáticos
E.T.S.I. Informática (UNED)
Ciudad Universitaria
C/ Juan del Rosal nº 16
28040 Madrid - Spain
Phone. +34 91 398 87 25
Fax +34 91 398 65 35
Office 2.07
Email: joaq...@lsi.uned.es
-----------------------------------------------------------
Otis Gospodnetic <otis...@yahoo.com> wrote:
I was wondering if you ever got to this. I would love to see and try BM25 for Lucene!
I'm looking at http://code.google.com/soc/2008/asf/about.html and it looks like this didn't make it into GSoC, but this would still be great to have.
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
Joaquin Perez-Iglesias <joaq...@gmail.com>
Sent: Saturday, March 15, 2008 4:54:08 AM
Subject: Re: Summer of Code idea for lucene
We have almost finished implementing BM25 using Lucene's structures, but we need help to finish the query parser and other details. If you or somebody else wants, we can send you the code, and you can help us implement the query parser and prepare the code for the sandbox.
If there are people interested, I can make a web page for the project and put our implementation up for download.
Is anybody interested?
-- José Ramón Pérez Agüera
Dept. de Ingeniería del Software e Inteligencia Artificial Despacho 411 tlf. 913947599 Facultad de Informática Universidad Complutense de Madrid
On Sat, Mar 15, 2008 at 5:32 AM, Ian Holsman wrote:
If no one objects (I don't think it's too late)
would you mind a GSOC project to implement BM25