7 messages in org.apache.lucene.java-dev: Re: Summer of Code idea for lucene
From, Sent On
Otis Gospodnetic, May 16, 2008 9:10 pm
joaq...@lsi.uned.es, May 17, 2008 12:49 am
Joaquin Perez Iglesias, Sep 2, 2008 2:27 am
Mark Miller, Sep 13, 2008 7:47 am
joaq...@lsi.uned.es, Sep 13, 2008 10:48 am
Mark Miller, Sep 13, 2008 3:35 pm
joaq...@lsi.uned.es, Sep 14, 2008 2:53 am
Subject:Re: Summer of Code idea for lucene
From:Mark Miller (mark@gmail.com)
Date:Sep 13, 2008 3:35:48 pm
List:org.apache.lucene.java-dev

Cool, thanks.

Have you done any comparisons with the current scoring system? Can you claim strong improvements? Have you looked at the performance impact at all yet? The score method in your TermScorer looks particularly slow. Could you explain in a bit more depth how your BM25 implementation differs from the current implementation? It also looks like your BooleanQuery is pretty simplified compared to the old one.

joaq@lsi.uned.es wrote:

Hi Mark,

Thank you for your advice. I don't know much about licenses. I have just changed the license to Apache-2.0; I hope that this is OK and makes things easier.

If you need any help or have any comments about the implementation, please let me know. I would be really happy if this implementation were eventually integrated into Lucene.

Mark Miller <mark@gmail.com> wrote:

Hey Joaquin,

Your work here looks very interesting. The Lucene community has shown a strong interest in this area before (see LUCENE-965).

I see you went with an LGPL license though. This might be a bit of a barrier to getting feedback from a community based on Apache-licensed software. Obviously, there still might be interest, learning, and an exchange of ideas, but none of your code can be distributed with Lucene, so what you have done loses some of its appeal in that sense. Is there any chance you would be willing to relax the license, possibly gaining more feedback, contributors, and possible inclusion in Lucene? It is certainly not necessary in order to receive feedback, but I think it would help; I'd certainly be looking closer anyway.

- Mark

Joaquin Perez Iglesias wrote:

Hi all,

I finally found some time to finish the BM25/BM25F implementation for Lucene. You can find more details at http://nlp.uned.es/~jperezi/Lucene-BM25/.

It has been tested, but I cannot guarantee that it is bug-free. It would be great to receive some feedback about it.

There are some details of the implementation that I think will be of interest, such as how to calculate the average_length or the idf at document level. If you find any bug or mistake in the supplied implementation, please let me know and I will try to fix it; the same goes for questions.
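
For readers who have not met the two quantities mentioned above, here is a minimal sketch of how the average length and the idf enter a BM25 term score. It uses the textbook Okapi BM25 formula and is not taken from the linked implementation; the class name Bm25Sketch, the method names, and the values k1 = 1.2 and b = 0.75 are assumptions chosen only for illustration.

```java
// Sketch only: textbook Okapi BM25, not the code from the linked project.
final class Bm25Sketch {
    // Assumed parameter values, picked for illustration.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Robertson/Sparck Jones idf: log((N - n + 0.5) / (n + 0.5)). */
    static double idf(long numDocs, long docFreq) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Average document length: total number of terms divided by number of documents. */
    static double averageLength(long[] docLengths) {
        long total = 0;
        for (long len : docLengths) {
            total += len;
        }
        return (double) total / docLengths.length;
    }

    /** BM25 contribution of a single query term to a single document. */
    static double termScore(double tf, long docLength, double avgLength,
                            long numDocs, long docFreq) {
        // Length normalisation: longer-than-average documents are penalised.
        double norm = K1 * (1.0 - B + B * docLength / avgLength);
        return idf(numDocs, docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        long[] lengths = {100, 200, 300};
        double avg = averageLength(lengths);
        System.out.println("average length = " + avg);
        System.out.println("score = " + termScore(2.0, 100, avg, 3, 1));
    }
}
```

The full document score would be the sum of termScore over the query terms. Note that the average length and the document frequency are collection-wide statistics, which is why they need separate handling in an index-based implementation.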

I hope that some of you will find it useful.

Thanks in advance.

joaq@lsi.uned.es wrote:

Hi Otis,

As my colleague said, we have a first implementation of BM25 on top of Lucene. This development is part of an (almost finished) thesis project that compares different IR models over a standard collection. At the same time, we are trying to extend this first implementation to support BM25F for multifield queries. Unfortunately, we are too busy at the moment to prepare a final version of this code, so we will have to finish it over the summer (hopefully we will have more time :-))) and make it public then.

We will inform this list when we have finished preparing a final version.

Thanks to everybody for the interest!!!

Bye Joaquin

wrote:

Hi Jose,

I was wondering if you ever got to this. I would love to see and try BM25 for Lucene!

I'm looking at http://code.google.com/soc/2008/asf/about.html and it looks like this didn't make it into GSoC, but this would still be great to have.

Thanks, Otis

-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

From: José Ramón Pérez Agüera <jose@gmail.com>
To: java@lucene.apache.org; Joaquin Perez-Iglesias <joaq@gmail.com>
Sent: Saturday, March 15, 2008 4:54:08 AM
Subject: Re: Summer of Code idea for lucene

We have almost implemented BM25 using the Lucene structure, but we need help to finish the query parser and other details. If you or somebody else wants, we can send you the code and you can help us implement the query parser and prepare the code for the sandbox.

If people are interested, I can make a web page for the project and put our implementation up for download.

Is anybody interested?

jose

-- José Ramón Pérez Agüera

Dept. de Ingeniería del Software e Inteligencia Artificial Despacho 411 tlf. 913947599 Facultad de Informática Universidad Complutense de Madrid

On Sat, Mar 15, 2008 at 5:32 AM, Ian Holsman wrote:

If no one objects (I don't think it's too late), would you mind a GSoC project to implement BM25 relevancy/scoring?

To unsubscribe, e-mail: java@lucene.apache.org For additional commands, e-mail: java@lucene.apache.org

________________________________________________ Servicio WebMail de CiberUNED http://www.uned.es
