5 messages in com.mysql.lists.mysqlRe: GROUP BY question
FromSent OnAttachments
Bengt Lindholm10 Feb 2004 06:23 
Brian Power10 Feb 2004 07:21 
Bengt Lindholm10 Feb 2004 08:07 
Jochem van Dieten10 Feb 2004 13:58 
Roger Baklund11 Feb 2004 08:36 
Subject:Re: GROUP BY question
From:Roger Baklund (rog@charlott.no)
Date:02/11/2004 08:36:58 AM
List:com.mysql.lists.mysql

* Bengt Lindholm

I have a table where I need to group the content on a timestamp. Any record that is less than say 5 minutes from any other record needs to be grouped with that other record.

ID timestamp 1 2004-02-02 12:00:00 2 2004-02-02 12:00:05 3 2004-02-02 12:05:20 4 2004-02-02 12:15:00 5 2004-02-02 12:17:15 6 2004-02-02 12:21:20

With this recordset I would get: 1 & 2 should be grouped since there's less than 5 minutes between the records 3 will be in a group of its own 4, 5 & 6 is a third group since any record in the group has less than 5 minutes to another record in the group

Is this possible to do in MySQL, and how is it accomplished?

You could use mysql user variables:

mysql> set @a:=NULL,@b:=0; Query OK, 0 rows affected (0.00 sec)

mysql> select if(clicktime-interval 5 minute>=@a,@b:=@b+1,@b) x, -> @a:=clicktime from stats where clicktime between -> "2003-11-24 02:00:00" and "2003-11-24 02:59:59" -> order by clicktime; +------+---------------------+ | x | @a:=clicktime | +------+---------------------+ | 0 | 2003-11-24 02:09:57 | | 0 | 2003-11-24 02:10:05 | | 0 | 2003-11-24 02:10:09 | | 0 | 2003-11-24 02:10:22 | | 0 | 2003-11-24 02:10:24 | : | 0 | 2003-11-24 02:41:42 | | 0 | 2003-11-24 02:41:43 | | 0 | 2003-11-24 02:43:12 | | 1 | 2003-11-24 02:49:19 | | 1 | 2003-11-24 02:50:46 | : | 1 | 2003-11-24 02:56:28 | | 1 | 2003-11-24 02:56:41 | | 1 | 2003-11-24 02:56:50 | +------+---------------------+ 136 rows in set (0.01 sec)

As you can see, I only get two groups with my test data, and the result seems to be correct, but when I apply a GROUP BY and a COUNT() I get a different result. I get groups of all rows within five minutes of the _first_ row of each group:

mysql> set @a:=NULL,@b:=0; Query OK, 0 rows affected (0.00 sec)

mysql> select if(clicktime-interval 5 minute>=@a,@b:=@b+1,@b) x, -> @a:=clicktime,count(*) from stats where clicktime between -> "2003-11-24 02:00:00" and "2003-11-24 02:59:59" -> group by x order by clicktime; +------+---------------------+----------+ | x | @a:=clicktime | count(*) | +------+---------------------+----------+ | 0 | 2003-11-24 02:09:57 | 22 | | 2 | 2003-11-24 02:15:17 | 29 | | 4 | 2003-11-24 02:20:55 | 17 | | 6 | 2003-11-24 02:25:55 | 22 | | 8 | 2003-11-24 02:31:12 | 4 | | 10 | 2003-11-24 02:36:19 | 21 | | 12 | 2003-11-24 02:41:42 | 3 | | 14 | 2003-11-24 02:49:19 | 13 | | 16 | 2003-11-24 02:56:12 | 5 | +------+---------------------+----------+ 9 rows in set (0.00 sec)

This seems to have something to do with how mysql handles GROUP BY queries, the @a variable is not re-assigned for each row. Note that this is a non-standard GROUP BY statement, as a column (clicktime) is used in the field list but not in the GROUP BY clause. Using a group function (MIN() in this case) on the assignment expression makes this a more standard compliant GROUP BY, and it seems to give the result we want:

mysql> set @a:=NULL,@b:=0; Query OK, 0 rows affected (0.00 sec)

mysql> select if(clicktime-interval 5 minute>=@a,@b:=@b+1,@b) x, -> min(@a:=clicktime),count(*) from stats where clicktime between -> "2003-11-24 02:00:00" and "2003-11-24 02:59:59" -> group by x order by clicktime; +------+---------------------+----------+ | x | min(@a:=clicktime) | count(*) | +------+---------------------+----------+ | 0 | 2003-11-24 02:09:57 | 118 | | 1 | 2003-11-24 02:49:19 | 18 | +------+---------------------+----------+ 2 rows in set (0.01 sec)