5 messages in com.mysql.lists.mysqlRe: GROUP BY question| From | Sent On | Attachments |
|---|---|---|
| Bengt Lindholm | 10 Feb 2004 06:23 | |
| Brian Power | 10 Feb 2004 07:21 | |
| Bengt Lindholm | 10 Feb 2004 08:07 | |
| Jochem van Dieten | 10 Feb 2004 13:58 | |
| Roger Baklund | 11 Feb 2004 08:36 |
| Subject: | Re: GROUP BY question![]() |
|---|---|
| From: | Roger Baklund (rog...@charlott.no) |
| Date: | 02/11/2004 08:36:58 AM |
| List: | com.mysql.lists.mysql |
* Bengt Lindholm
I have a table where I need to group the content on a timestamp. Any record that is less than say 5 minutes from any other record needs to be grouped with that other record.
ID timestamp 1 2004-02-02 12:00:00 2 2004-02-02 12:00:05 3 2004-02-02 12:05:20 4 2004-02-02 12:15:00 5 2004-02-02 12:17:15 6 2004-02-02 12:21:20
With this recordset I would get: 1 & 2 should be grouped since there's less than 5 minutes between the records 3 will be in a group of its own 4, 5 & 6 is a third group since any record in the group has less than 5 minutes to another record in the group
Is this possible to do in MySQL, and how is it accomplished?
You could use mysql user variables:
mysql> set @a:=NULL,@b:=0; Query OK, 0 rows affected (0.00 sec)
mysql> select if(clicktime-interval 5 minute>=@a,@b:=@b+1,@b) x, -> @a:=clicktime from stats where clicktime between -> "2003-11-24 02:00:00" and "2003-11-24 02:59:59" -> order by clicktime; +------+---------------------+ | x | @a:=clicktime | +------+---------------------+ | 0 | 2003-11-24 02:09:57 | | 0 | 2003-11-24 02:10:05 | | 0 | 2003-11-24 02:10:09 | | 0 | 2003-11-24 02:10:22 | | 0 | 2003-11-24 02:10:24 | : | 0 | 2003-11-24 02:41:42 | | 0 | 2003-11-24 02:41:43 | | 0 | 2003-11-24 02:43:12 | | 1 | 2003-11-24 02:49:19 | | 1 | 2003-11-24 02:50:46 | : | 1 | 2003-11-24 02:56:28 | | 1 | 2003-11-24 02:56:41 | | 1 | 2003-11-24 02:56:50 | +------+---------------------+ 136 rows in set (0.01 sec)
As you can see, I only get two groups with my test data, and the result seems to be correct, but when I apply a GROUP BY and a COUNT() I get a different result. I get groups of all rows within five minutes of the _first_ row of each group:
mysql> set @a:=NULL,@b:=0; Query OK, 0 rows affected (0.00 sec)
mysql> select if(clicktime-interval 5 minute>=@a,@b:=@b+1,@b) x, -> @a:=clicktime,count(*) from stats where clicktime between -> "2003-11-24 02:00:00" and "2003-11-24 02:59:59" -> group by x order by clicktime; +------+---------------------+----------+ | x | @a:=clicktime | count(*) | +------+---------------------+----------+ | 0 | 2003-11-24 02:09:57 | 22 | | 2 | 2003-11-24 02:15:17 | 29 | | 4 | 2003-11-24 02:20:55 | 17 | | 6 | 2003-11-24 02:25:55 | 22 | | 8 | 2003-11-24 02:31:12 | 4 | | 10 | 2003-11-24 02:36:19 | 21 | | 12 | 2003-11-24 02:41:42 | 3 | | 14 | 2003-11-24 02:49:19 | 13 | | 16 | 2003-11-24 02:56:12 | 5 | +------+---------------------+----------+ 9 rows in set (0.00 sec)
This seems to have something to do with how mysql handles GROUP BY queries, the @a variable is not re-assigned for each row. Note that this is a non-standard GROUP BY statement, as a column (clicktime) is used in the field list but not in the GROUP BY clause. Using a group function (MIN() in this case) on the assignment expression makes this a more standard compliant GROUP BY, and it seems to give the result we want:
mysql> set @a:=NULL,@b:=0; Query OK, 0 rows affected (0.00 sec)
mysql> select if(clicktime-interval 5 minute>=@a,@b:=@b+1,@b) x, -> min(@a:=clicktime),count(*) from stats where clicktime between -> "2003-11-24 02:00:00" and "2003-11-24 02:59:59" -> group by x order by clicktime; +------+---------------------+----------+ | x | min(@a:=clicktime) | count(*) | +------+---------------------+----------+ | 0 | 2003-11-24 02:09:57 | 118 | | 1 | 2003-11-24 02:49:19 | 18 | +------+---------------------+----------+ 2 rows in set (0.01 sec)
-- Roger




