8 messages in com.mysql.lists.mysqlRe: Re[2]: explain & optimize
FromSent OnAttachments
Alexander Sergeev09 Dec 1999 09:38 
sin...@mysql.com10 Dec 1999 04:20 
Alexander Sergeev11 Dec 1999 06:55 
sin...@mysql.com11 Dec 1999 08:29 
Philips13 Dec 1999 02:30 
sin...@mysql.com13 Dec 1999 05:10 
Philips14 Dec 1999 01:31 
sin...@mysql.com14 Dec 1999 08:02 
Subject:Re: Re[2]: explain & optimize
From:sin...@mysql.com (sin@mysql.com)
Date:12/11/1999 08:29:10 AM
List:com.mysql.lists.mysql

Alexander Sergeev writes:

Hello Sinisa, you wrote:

smc> I was not able to perceive difference between two queries. Are you smc> sure you have not sent two copies of a single query. sorry. smc> Regarding using word indexex column, you should now that there is no smc> way of using indices with REGEXP. ye, that's right. But, why we can see 'const' in the 'explain' query (in the second case) ??

smc> Last, but not least, it is impossible to make auto_increment as a smc> non-unique key. How have you done it ?? It was stupid mistake. It was corrected.

So, some changes were made due to the incredible book "SQL For Smarties". But nevertheless the problem still exists.

mysql> describe dict; +--------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------+---------+------+-----+---------+-------+ | url_id | int(11) | | MUL | 0 | | | word | int(10) | | MUL | 0 | | | intag | int(5) | | MUL | 0 | | +--------+---------+------+-----+---------+-------+ mysql> describe tr; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | word | varchar(40) | | PRI | | | | code | int(10) | | PRI | 0 | | +-------+-------------+------+-----+---------+-------+

mysql> show index from dict; | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |
Cardinality | Sub_part | | dict | 1 | word | 1 | word | A | NULL
| NULL | | dict | 1 | url_id | 1 | url_id | A | NULL
| NULL | | dict | 1 | intag | 1 | intag | A | NULL
| NULL |

mysql> show index from tr; | tr | 0 | PRIMARY | 1 | word | A | 27286
| NULL | | tr | 0 | PRIMARY | 2 | code | A | 27286
| NULL | | tr | 1 | code | 1 | code | A | NULL
| NULL |

<dict.url_id> - id of the headline; <dict.word> - code of the word from the headline; <dict.intag> - position of the word in the headline; <tr.word> - the word from the headline; <tr.code> - code of the word;

some dumps:

dict tr +--------+------+-------+ +----------+------+ | url_id | word | intag | | word | code | +--------+------+-------+ +----------+------+ | 5035 | 899 | 274 | | Burns | 1116 | | 5035 | 448 | 273 | | McCrae | 1115 | | 5035 | 45 | 272 | | doubts | 1114 | | 5035 | 156 | 271 | | athlete' | 1113 | | 5035 | 786 | 270 | | killed | 1112 | | 5035 | 8684 | 269 | | 'may | 1111 | | 5035 | 51 | 268 | | reaction | 1110 | | 5035 | 447 | 267 | | Drug | 1109 | | 5035 | 3452 | 266 | | defence | 1108 | | 5035 | 592 | 265 | | spring | 1107 | +--------+------+-------+ +----------+------+ Here are some examples of queries I need to run (such queries are generated dynamically when the end user input the string for searching among headlines):

I) query: (this query is for searching phrase "japan tr* to*")

SELECT distinct h_0.url_id FROM tr as t_0 , tr as t_1, tr as t_2, dict as h_0, dict as h_1, dict as h_2 WHERE (h_1.intag=h_0.intag+1) AND (h_2.intag=h_1.intag+1) AND (h_0.url_id = h_1.url_id) AND (h_0.url_id = h_2.url_id) AND (t_0.code = h_0.word) AND (t_1.code = h_1.word) AND (t_2.code = h_2.word) AND !ISNULL(t_0.word) AND !ISNULL(t_1.word) AND !ISNULL(t_2.word) AND ( t_0.word='japan' && UCASE(t_1.word) REGEXP '^TRI.*' && UCASE(t_2.word) REGEXP '^TO.*' ) 'explain' gives us: table type possible_keys key key_len ref rows Extra t_0 range PRIMARY,code PRIMARY NULL NULL 1 h_0 ref word,url_id word 4 t_0.code 13 h_1 ref word,url_id,intag url_id 4 h_0.url_id 13
where used h_2 ref word,url_id,intag url_id 4 h_0.url_id 13
where used t_2 ref code code 4 h_2.word 19 where used t_1 ref code code 4 h_1.word 19 where used

II) query: (this query is for searching phrases "japan || tr* || to*")

SELECT distinct h_0.url_id FROM tr as t_0 , tr as t_1, tr as t_2, dict as h_0, dict as h_1, dict as h_2 WHERE (h_1.intag=h_0.intag+1) AND (h_2.intag=h_1.intag+1) AND (h_0.url_id = h_1.url_id) AND (h_0.url_id = h_2.url_id) AND (t_0.code = h_0.word) AND (t_1.code = h_1.word) AND (t_2.code = h_2.word) AND !ISNULL(t_0.word) AND !ISNULL(t_1.word) AND !ISNULL(t_2.word) AND ( t_0.word='japan' || UCASE(t_1.word) REGEXP '^TRI.*' || UCASE(t_2.word) REGEXP '^TO.*' )

'explain' gives us: t_0 ALL PRIMARY,code NULL NULL NULL 27286 h_0 ref word,url_id word 4 t_0.code 13 h_1 ref word,url_id,intag url_id 4 h_0.url_id 13
where used h_2 ref word,url_id,intag url_id 4 h_0.url_id 13
where used t_2 ref code code 4 h_2.word 19

The question have changed a bit. How can I optimize the second query ?? Should I use decomposition of dict ?

Some info about gathering headelines: they are flattened into table "News" (Id, url, title,category,date,header, etc.). Also the headlines comes to the indexer which parses each header: splits it into atoms (words,punctuation, etc), and inserts into tr (if its required - new word) && finally inserts all tokens (lineary) (words,etc) into dict for futher using.

Size of table <dict> is approx. 2Mb. rows... The middle size of tr is about 90,000 rows.

ps. sorry for english, i'm russian. pss. appreciate your advices twicely if they will be fast , f.e. using icq uin: 44907890

Hi!

As h_? all point to the same table, as well as t_? also all point to the second table, then

(h_1.intag=h_0.intag+1) AND (h_2.intag=h_1.intag+1) AND (h_0.url_id = h_1.url_id) AND (h_0.url_id = h_2.url_id) AND (t_0.code = h_0.word) AND (t_1.code = h_1.word) AND

should work as well as

(h_1.intag=h_0.intag+1) AND (h_0.url_id = h_1.url_id) AND (t_0.code = h_0.word) AND

while !ISNULL(t_0.word) AND !ISNULL(t_1.word) AND !ISNULL(t_2.word) AND

is obsolte.

Last but not least, t_0.word='japan' should be made a first expression in a WHERE clause, but for a first query only !!

Sinisa

+----------------------------------------------------------------------+ | TcX ____ __ _____ _____ ___ == mys@tcx.se | | /*/\*\/\*\ /*/ \*\ /*/ \*\ |*| Sinisa Milivojevic | | /*/ /*/ /*/ \*\_ |*| |*||*| mailto:sin@cytanet.com.cy| | /*/ /*/ /*/\*\/*/ \*\|*| |*||*| Larnaka, Cyprus | | /*/ /*/ /*/\*\_/*/ \*\_/*/ |*|____ | | ^^^^^^^^^^^^/*/^^^^^^^^^^^\*\^^^^^^^^^^^ | | /*/ \*\ Developers Team | +----------------------------------------------------------------------+