8 messages in com.mysql.lists.mysqlRe: Re[2]: explain & optimize| From | Sent On | Attachments |
|---|---|---|
| Alexander Sergeev | 09 Dec 1999 09:38 | |
| sin...@mysql.com | 10 Dec 1999 04:20 | |
| Alexander Sergeev | 11 Dec 1999 06:55 | |
| sin...@mysql.com | 11 Dec 1999 08:29 | |
| Philips | 13 Dec 1999 02:30 | |
| sin...@mysql.com | 13 Dec 1999 05:10 | |
| Philips | 14 Dec 1999 01:31 | |
| sin...@mysql.com | 14 Dec 1999 08:02 |
| Subject: | Re: Re[2]: explain & optimize![]() |
|---|---|
| From: | sin...@mysql.com (sin...@mysql.com) |
| Date: | 12/11/1999 08:29:10 AM |
| List: | com.mysql.lists.mysql |
Alexander Sergeev writes:
Hello Sinisa, you wrote:
smc> I was not able to perceive difference between two queries. Are you smc> sure you have not sent two copies of a single query. sorry. smc> Regarding using word indexex column, you should now that there is no smc> way of using indices with REGEXP. ye, that's right. But, why we can see 'const' in the 'explain' query (in the second case) ??
smc> Last, but not least, it is impossible to make auto_increment as a smc> non-unique key. How have you done it ?? It was stupid mistake. It was corrected.
So, some changes were made due to the incredible book "SQL For Smarties". But nevertheless the problem still exists.
mysql> describe dict; +--------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------+---------+------+-----+---------+-------+ | url_id | int(11) | | MUL | 0 | | | word | int(10) | | MUL | 0 | | | intag | int(5) | | MUL | 0 | | +--------+---------+------+-----+---------+-------+ mysql> describe tr; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | word | varchar(40) | | PRI | | | | code | int(10) | | PRI | 0 | | +-------+-------------+------+-----+---------+-------+
mysql> show index from dict;
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |
Cardinality | Sub_part |
| dict | 1 | word | 1 | word | A | NULL
| NULL |
| dict | 1 | url_id | 1 | url_id | A | NULL
| NULL |
| dict | 1 | intag | 1 | intag | A | NULL
| NULL |
mysql> show index from tr;
| tr | 0 | PRIMARY | 1 | word | A | 27286
| NULL |
| tr | 0 | PRIMARY | 2 | code | A | 27286
| NULL |
| tr | 1 | code | 1 | code | A | NULL
| NULL |
<dict.url_id> - id of the headline; <dict.word> - code of the word from the headline; <dict.intag> - position of the word in the headline; <tr.word> - the word from the headline; <tr.code> - code of the word;
some dumps:
dict tr +--------+------+-------+ +----------+------+ | url_id | word | intag | | word | code | +--------+------+-------+ +----------+------+ | 5035 | 899 | 274 | | Burns | 1116 | | 5035 | 448 | 273 | | McCrae | 1115 | | 5035 | 45 | 272 | | doubts | 1114 | | 5035 | 156 | 271 | | athlete' | 1113 | | 5035 | 786 | 270 | | killed | 1112 | | 5035 | 8684 | 269 | | 'may | 1111 | | 5035 | 51 | 268 | | reaction | 1110 | | 5035 | 447 | 267 | | Drug | 1109 | | 5035 | 3452 | 266 | | defence | 1108 | | 5035 | 592 | 265 | | spring | 1107 | +--------+------+-------+ +----------+------+ Here are some examples of queries I need to run (such queries are generated dynamically when the end user input the string for searching among headlines):
I) query: (this query is for searching phrase "japan tr* to*")
SELECT distinct h_0.url_id
FROM
tr as t_0 ,
tr as t_1,
tr as t_2,
dict as h_0,
dict as h_1,
dict as h_2
WHERE
(h_1.intag=h_0.intag+1) AND
(h_2.intag=h_1.intag+1) AND
(h_0.url_id = h_1.url_id) AND
(h_0.url_id = h_2.url_id) AND
(t_0.code = h_0.word) AND
(t_1.code = h_1.word) AND
(t_2.code = h_2.word) AND
!ISNULL(t_0.word) AND
!ISNULL(t_1.word) AND
!ISNULL(t_2.word) AND
(
t_0.word='japan' &&
UCASE(t_1.word) REGEXP '^TRI.*' &&
UCASE(t_2.word) REGEXP '^TO.*'
)
'explain' gives us:
table type possible_keys key key_len ref rows Extra
t_0 range PRIMARY,code PRIMARY NULL NULL 1
h_0 ref word,url_id word 4 t_0.code 13
h_1 ref word,url_id,intag url_id 4 h_0.url_id 13
where used
h_2 ref word,url_id,intag url_id 4 h_0.url_id 13
where used
t_2 ref code code 4 h_2.word 19 where used
t_1 ref code code 4 h_1.word 19 where used
II) query: (this query is for searching phrases "japan || tr* || to*")
SELECT distinct h_0.url_id FROM tr as t_0 , tr as t_1, tr as t_2, dict as h_0, dict as h_1, dict as h_2 WHERE (h_1.intag=h_0.intag+1) AND (h_2.intag=h_1.intag+1) AND (h_0.url_id = h_1.url_id) AND (h_0.url_id = h_2.url_id) AND (t_0.code = h_0.word) AND (t_1.code = h_1.word) AND (t_2.code = h_2.word) AND !ISNULL(t_0.word) AND !ISNULL(t_1.word) AND !ISNULL(t_2.word) AND ( t_0.word='japan' || UCASE(t_1.word) REGEXP '^TRI.*' || UCASE(t_2.word) REGEXP '^TO.*' )
'explain' gives us:
t_0 ALL PRIMARY,code NULL NULL NULL 27286
h_0 ref word,url_id word 4 t_0.code 13
h_1 ref word,url_id,intag url_id 4 h_0.url_id 13
where used
h_2 ref word,url_id,intag url_id 4 h_0.url_id 13
where used
t_2 ref code code 4 h_2.word 19
The question have changed a bit. How can I optimize the second query ?? Should I use decomposition of dict ?
Some info about gathering headelines: they are flattened into table "News" (Id, url, title,category,date,header, etc.). Also the headlines comes to the indexer which parses each header: splits it into atoms (words,punctuation, etc), and inserts into tr (if its required - new word) && finally inserts all tokens (lineary) (words,etc) into dict for futher using.
Size of table <dict> is approx. 2Mb. rows... The middle size of tr is about 90,000 rows.
ps. sorry for english, i'm russian. pss. appreciate your advices twicely if they will be fast , f.e. using icq uin: 44907890
Best regards, Alexander mailto:left...@softhome.net
Hi!
As h_? all point to the same table, as well as t_? also all point to the second table, then
(h_1.intag=h_0.intag+1) AND (h_2.intag=h_1.intag+1) AND (h_0.url_id = h_1.url_id) AND (h_0.url_id = h_2.url_id) AND (t_0.code = h_0.word) AND (t_1.code = h_1.word) AND
should work as well as
(h_1.intag=h_0.intag+1) AND (h_0.url_id = h_1.url_id) AND (t_0.code = h_0.word) AND
while !ISNULL(t_0.word) AND !ISNULL(t_1.word) AND !ISNULL(t_2.word) AND
is obsolte.
Last but not least, t_0.word='japan' should be made a first expression in a WHERE clause, but for a first query only !!
Sinisa
+----------------------------------------------------------------------+ | TcX ____ __ _____ _____ ___ == mys...@tcx.se | | /*/\*\/\*\ /*/ \*\ /*/ \*\ |*| Sinisa Milivojevic | | /*/ /*/ /*/ \*\_ |*| |*||*| mailto:sin...@cytanet.com.cy| | /*/ /*/ /*/\*\/*/ \*\|*| |*||*| Larnaka, Cyprus | | /*/ /*/ /*/\*\_/*/ \*\_/*/ |*|____ | | ^^^^^^^^^^^^/*/^^^^^^^^^^^\*\^^^^^^^^^^^ | | /*/ \*\ Developers Team | +----------------------------------------------------------------------+




