17 messages in org.apache.lucene.java-user
Re: Wikia search goes live today
Subject: Re: Wikia search goes live today
From: Lukas Vlcek (luka@gmail.com)
Date: Jan 8, 2008 12:38:11 pm
List: org.apache.lucene.java-user

I should note that this technique is probably not easily applicable to the current Lucene scoring mechanism without additional development.

After checking the Lucene API of ParallelReader, it seems that the star score could be stored in a different index which shares the same identifiers for the documents. Such an index could be small (partitioned into many small indices?) so that updates can be fast. Is that what you meant, Andrzej? ;-)
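To make the idea concrete, here is a rough, library-free sketch in Python. It does not use Lucene or ParallelReader at all; it only illustrates the principle of keeping star counts in a small side store keyed by the same document identifiers as the main index, so that a new star changes ranking without reindexing the main content. The names `StarIndex`, `combined_score`, and `search`, as well as the log-based blending formula and the `star_weight` parameter, are assumptions made up for this illustration.

```python
import math


class StarIndex:
    """Small side store of star counts, keyed by the same doc ids as the main index."""

    def __init__(self):
        self._stars = {}

    def add_star(self, doc_id):
        # Cheap update: only the side store changes, the main index is untouched.
        self._stars[doc_id] = self._stars.get(doc_id, 0) + 1

    def stars(self, doc_id):
        return self._stars.get(doc_id, 0)


def combined_score(text_score, star_count, star_weight=0.5):
    # Hypothetical blend: boost the text score by a damped function of the stars.
    return text_score * (1.0 + star_weight * math.log1p(star_count))


def search(text_scores, star_index):
    """Rank doc ids by text score combined with the current star counts."""
    ranked = sorted(
        text_scores.items(),
        key=lambda item: combined_score(item[1], star_index.stars(item[0])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked]


# Pretend these are text relevance scores coming from the (static) main index.
main_hits = {"doc-a": 1.0, "doc-b": 0.9}

stars = StarIndex()
before = search(main_hits, stars)

# Two users star doc-b; no reindexing of the main content is needed.
stars.add_star("doc-b")
stars.add_star("doc-b")
after = search(main_hits, stars)
```

In this toy run `doc-b` overtakes `doc-a` as soon as the stars are added. Whether the same effect is achievable in Lucene with acceptable update cost is exactly the open question above.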

Anyway, I remember a different technique which I once mentioned on the Lucene mailing list, taking inspiration from a book called Programming Collective Intelligence <http://www.oreilly.com/catalog/9780596529321/>. The idea is not to store the score (maybe I should call it user preference) in the index but in a neural net. One useful side effect is that this technique could reasonably score even documents without any stars (meaning documents "similar" to highly starred documents could score better even if they haven't been starred by any user yet).
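A toy sketch of that side effect, in Python. This is not the network from the book and not anything Wikia uses; it is a single logistic neuron over bag-of-words features, which is about the simplest thing that shows the generalization. The vocabulary, the training texts, and the assumption that documents seen but never starred count as negative examples are all invented for the illustration.

```python
import math


def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    words = set(text.lower().split())
    return [1.0 if word in words else 0.0 for word in vocab]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, features):
    """Predicted user preference in (0, 1) for one document."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, n_features, epochs=500, learning_rate=0.5):
    """Fit one logistic neuron with plain stochastic gradient descent."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


vocab = ["lucene", "index", "search", "cooking", "recipe", "pasta"]

# Assumed training data: 1.0 = starred by users, 0.0 = seen but not starred.
training = [
    ("lucene index search", 1.0),
    ("lucene search", 1.0),
    ("cooking recipe pasta", 0.0),
    ("pasta recipe", 0.0),
]
examples = [(featurize(text, vocab), label) for text, label in training]
weights, bias = train(examples, len(vocab))

# Neither of these documents has ever been starred.
similar = predict(weights, bias, featurize("lucene index", vocab))
dissimilar = predict(weights, bias, featurize("cooking pasta", vocab))
```

Here `similar` comes out above 0.5 and `dissimilar` below it, so the unstarred document that resembles the starred ones gets the better preference score. How to feed such a score into Lucene ranking is the part that would need the additional development.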

Regards, Lukas

On 1/8/08, Andrzej Bialecki <ab@getopt.org> wrote:

Lukas Vlcek wrote:

So starring will be accommodated only during the indexing phase. Does that mean it will be a pretty static value, not a dynamically changing variable... correct? In other words, if I add my stars to some document, it won't affect the scoring immediately but only after the indexing cycle. Correct?

(I'm not involved in Wikia development). There are some ways to go about it even in the pure Lucene-land, so that the updates are fast without reindexing the main content. Hint: ParallelReader.