49 messages in com.mysql.lists.mysqlRe: Distributed Fulltext?
FromSent OnAttachments
Brian DeFeyter07 Feb 2002 11:52 
Brian DeFeyter07 Feb 2002 11:52 
Tod Harter07 Feb 2002 12:40 
Tod Harter07 Feb 2002 12:40 
Brian DeFeyter07 Feb 2002 13:21 
Brian DeFeyter07 Feb 2002 13:21 
Brian Bray07 Feb 2002 14:50 
Brian Bray07 Feb 2002 14:50 
James Montebello07 Feb 2002 15:18 
James Montebello07 Feb 2002 15:18 
Alex Aulbach07 Feb 2002 17:45 
Alex Aulbach07 Feb 2002 17:45 
Amir Aliabadi07 Feb 2002 22:45 
alec...@quantel.com08 Feb 2002 01:20 
alec...@quantel.com08 Feb 2002 01:20 
Steve Rapaport08 Feb 2002 02:11 
Steve Rapaport08 Feb 2002 02:11 
Steve Rapaport08 Feb 2002 02:49 
Steve Rapaport08 Feb 2002 02:49 
Brian DeFeyter08 Feb 2002 05:42 
Brian DeFeyter08 Feb 2002 05:42 
Steve Rapaport08 Feb 2002 07:05 
Steve Rapaport08 Feb 2002 07:05 
James Montebello08 Feb 2002 10:51 
James Montebello08 Feb 2002 10:51 
Steve Rapaport08 Feb 2002 11:47 
Steve Rapaport08 Feb 2002 11:47 
George M. Ellenburg10 Feb 2002 03:59 
George M. Ellenburg10 Feb 2002 03:59 
Steve Rapaport10 Feb 2002 04:04 
Steve Rapaport10 Feb 2002 04:04 
David Axmark11 Feb 2002 23:32 
Steve Rapaport12 Feb 2002 06:38 
Mike Wexler12 Feb 2002 13:55 
David Axmark12 Feb 2002 18:26 
Steven Roussey12 Feb 2002 20:26 
Mike Wexler13 Feb 2002 09:08 
Mike Wexler13 Feb 2002 09:27 
Brian DeFeyter13 Feb 2002 12:34 
Mike Wexler13 Feb 2002 13:39 
Brian DeFeyter13 Feb 2002 13:46 
Mike Wexler13 Feb 2002 14:02 
hoo...@rave.iinet.net.au13 Feb 2002 15:28 
hoo...@rave.iinet.net.au13 Feb 2002 15:49 
Steven Roussey13 Feb 2002 18:44 
Alex Aulbach14 Feb 2002 16:31 
Alex Aulbach14 Feb 2002 16:51 
Alex Aulbach14 Feb 2002 17:43 
David Axmark15 Feb 2002 06:32 
Subject:Re: Distributed Fulltext?
From:Mike Wexler (mwex@tias.com)
Date:02/13/2002 09:08:53 AM
List:com.mysql.lists.mysql

My understanding is that part of how google and Altavista get such high speeds is to keep everything in memory. Is it possible to create a HEAP table with a full text index? If so, does the full text index take advantage of being in memory? For example, I would imagine that if you were keeping the whole index in memory, details like the index page size, and the format of the pointers/record numbers would be different.

Then you could just do something roughly like (i know the syntax is a little
off)

CREATE HEAP TABLE fooFast SELECT * FROM fooSlow ALTER fooFast ADD fulltext(a, b, c)

Or maybe you could just have fooSlow on one server. And then have it replicated on N other servers. But on the other servers you could alter the table type so it was a heap table. So you would have one persistent table and a bunch of replicated heap tables. And all the search go could against the heap tables.

Brian Bray wrote:

It seems to me like the best solution that could be implemented as-is would be to keep a random int column in your table (with a range of say 1-100) and then have fulltext server 1 psudo-replicate records with a the random number in the range of 1-10, server 2 11-20 and server 3 21-30 and so on.

Then run your query on all 10 servers and merge the result sets and possibly re-sort them if you use the score column.

The problem with splitting the index up by word is that is messes up all your scoring and ranking. For example what if you search using 5 keywords, all starting with letters from different groups? Your going to get pretty bad score for each match, and it could totally break boolean searches.

-- Brian Bray

Brian DeFeyter wrote:

On Thu, 2002-02-07 at 15:40, Tod Harter wrote: [snip]

Wouldn't be too tough to write a little query routing system if you are using perl. Use DBD::Proxy on the web server side, and just hack the perl proxy server so it routes the query to several places and returns a single result set. Ordering could be achieved as well. I'm sure there are commercial packages out there as well. I don't see why the individual database servers would need to do anything special.

[snip]

If I'm understanding you correctly, I think you're refering to routing based on the first character of the word. That would work for cases where the query is searching for a word that begins with a certain character.. however fulltext searches also return results with the term in the middle.

ie: a search for 'foo' could return: foo.txt foobar

but also could return: thisisfoo that_is_foolish

I could be wrong, but it's my understanding that MySQL stores it's fulltext index based on all the 'unique words' found. For such a system as you mentioned above, you'd probably have to create your own fulltext indexing system to determine: a) where to store the data 'segments' and b) how to route queries. It seems like this could probably be done much more efficiently inside of the server.

- Brian

To request this thread, e-mail <mysq@lists.mysql.com> To unsubscribe, e-mail <mysql-unsubscribe-bbray=xmis@lists.mysql.com> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

--------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive)

To request this thread, e-mail <mysq@lists.mysql.com> To unsubscribe, e-mail <mysql-unsubscribe-cyon=best@lists.mysql.com> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

To request this thread, e-mail <mysq@lists.mysql.com> To unsubscribe, e-mail <mysql-unsubscribe-mwexler=tias@lists.mysql.com> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php