49 messages in com.mysql.lists.mysqlRe: Distributed Fulltext?| From | Sent On | Attachments |
|---|---|---|
| Brian DeFeyter | 07 Feb 2002 11:52 | |
| Brian DeFeyter | 07 Feb 2002 11:52 | |
| Tod Harter | 07 Feb 2002 12:40 | |
| Tod Harter | 07 Feb 2002 12:40 | |
| Brian DeFeyter | 07 Feb 2002 13:21 | |
| Brian DeFeyter | 07 Feb 2002 13:21 | |
| Brian Bray | 07 Feb 2002 14:50 | |
| Brian Bray | 07 Feb 2002 14:50 | |
| James Montebello | 07 Feb 2002 15:18 | |
| James Montebello | 07 Feb 2002 15:18 | |
| Alex Aulbach | 07 Feb 2002 17:45 | |
| Alex Aulbach | 07 Feb 2002 17:45 | |
| Amir Aliabadi | 07 Feb 2002 22:45 | |
| alec...@quantel.com | 08 Feb 2002 01:20 | |
| alec...@quantel.com | 08 Feb 2002 01:20 | |
| Steve Rapaport | 08 Feb 2002 02:11 | |
| Steve Rapaport | 08 Feb 2002 02:11 | |
| Steve Rapaport | 08 Feb 2002 02:49 | |
| Steve Rapaport | 08 Feb 2002 02:49 | |
| Brian DeFeyter | 08 Feb 2002 05:42 | |
| Brian DeFeyter | 08 Feb 2002 05:42 | |
| Steve Rapaport | 08 Feb 2002 07:05 | |
| Steve Rapaport | 08 Feb 2002 07:05 | |
| James Montebello | 08 Feb 2002 10:51 | |
| James Montebello | 08 Feb 2002 10:51 | |
| Steve Rapaport | 08 Feb 2002 11:47 | |
| Steve Rapaport | 08 Feb 2002 11:47 | |
| George M. Ellenburg | 10 Feb 2002 03:59 | |
| George M. Ellenburg | 10 Feb 2002 03:59 | |
| Steve Rapaport | 10 Feb 2002 04:04 | |
| Steve Rapaport | 10 Feb 2002 04:04 | |
| David Axmark | 11 Feb 2002 23:32 | |
| Steve Rapaport | 12 Feb 2002 06:38 | |
| Mike Wexler | 12 Feb 2002 13:55 | |
| David Axmark | 12 Feb 2002 18:26 | |
| Steven Roussey | 12 Feb 2002 20:26 | |
| Mike Wexler | 13 Feb 2002 09:08 | |
| Mike Wexler | 13 Feb 2002 09:27 | |
| Brian DeFeyter | 13 Feb 2002 12:34 | |
| Mike Wexler | 13 Feb 2002 13:39 | |
| Brian DeFeyter | 13 Feb 2002 13:46 | |
| Mike Wexler | 13 Feb 2002 14:02 | |
| hoo...@rave.iinet.net.au | 13 Feb 2002 15:28 | |
| hoo...@rave.iinet.net.au | 13 Feb 2002 15:49 | |
| Steven Roussey | 13 Feb 2002 18:44 | |
| Alex Aulbach | 14 Feb 2002 16:31 | |
| Alex Aulbach | 14 Feb 2002 16:51 | |
| Alex Aulbach | 14 Feb 2002 17:43 | |
| David Axmark | 15 Feb 2002 06:32 |
| Subject: | Re: Distributed Fulltext?![]() |
|---|---|
| From: | Mike Wexler (mwex...@tias.com) |
| Date: | 02/13/2002 09:08:53 AM |
| List: | com.mysql.lists.mysql |
My understanding is that part of how google and Altavista get such high speeds is to keep everything in memory. Is it possible to create a HEAP table with a full text index? If so, does the full text index take advantage of being in memory? For example, I would imagine that if you were keeping the whole index in memory, details like the index page size, and the format of the pointers/record numbers would be different.
Then you could just do something roughly like (i know the syntax is a little
off)
CREATE HEAP TABLE fooFast SELECT * FROM fooSlow ALTER fooFast ADD fulltext(a, b, c)
Or maybe you could just have fooSlow on one server. And then have it replicated on N other servers. But on the other servers you could alter the table type so it was a heap table. So you would have one persistent table and a bunch of replicated heap tables. And all the search go could against the heap tables.
Brian Bray wrote:
It seems to me like the best solution that could be implemented as-is would be to keep a random int column in your table (with a range of say 1-100) and then have fulltext server 1 psudo-replicate records with a the random number in the range of 1-10, server 2 11-20 and server 3 21-30 and so on.
Then run your query on all 10 servers and merge the result sets and possibly re-sort them if you use the score column.
The problem with splitting the index up by word is that is messes up all your scoring and ranking. For example what if you search using 5 keywords, all starting with letters from different groups? Your going to get pretty bad score for each match, and it could totally break boolean searches.
-- Brian Bray
Brian DeFeyter wrote:
On Thu, 2002-02-07 at 15:40, Tod Harter wrote: [snip]
Wouldn't be too tough to write a little query routing system if you are using perl. Use DBD::Proxy on the web server side, and just hack the perl proxy server so it routes the query to several places and returns a single result set. Ordering could be achieved as well. I'm sure there are commercial packages out there as well. I don't see why the individual database servers would need to do anything special.
[snip]
If I'm understanding you correctly, I think you're refering to routing based on the first character of the word. That would work for cases where the query is searching for a word that begins with a certain character.. however fulltext searches also return results with the term in the middle.
ie: a search for 'foo' could return: foo.txt foobar
but also could return: thisisfoo that_is_foolish
I could be wrong, but it's my understanding that MySQL stores it's fulltext index based on all the 'unique words' found. For such a system as you mentioned above, you'd probably have to create your own fulltext indexing system to determine: a) where to store the data 'segments' and b) how to route queries. It seems like this could probably be done much more efficiently inside of the server.
- Brian
--------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <mysq...@lists.mysql.com> To unsubscribe, e-mail <mysql-unsubscribe-bbray=xmis...@lists.mysql.com> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
--------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <mysq...@lists.mysql.com> To unsubscribe, e-mail <mysql-unsubscribe-cyon=best...@lists.mysql.com> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
--------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <mysq...@lists.mysql.com> To unsubscribe, e-mail <mysql-unsubscribe-mwexler=tias...@lists.mysql.com> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




