07月04, 2022
收藏本站

mysql分组取第一条

好久没写文章了,废话不多说,直接上。


一、问题

今天有个同事咨询怎么分组取出时间最早的一条数据,他也知道先子查询按时间排序,再嵌套一层group by,可以取出最早一条记录。但他的SQL是这样子的↓

root bjh_wish>select * from (SELECT x.id wishId,x.buyer_id userId,x.wish_date wishDate FROM wish_info x
    ->     WHERE x.buyer_id in ('100000', '165904')
    ->     and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY'
    ->     order by x.wish_date asc) a group by a.userId;
+---------------------+--------+----------+
| wishId              | userId | wishDate |
+---------------------+--------+----------+
| 1511914153873133568 | 100000 | 05-07    |
| 1517037810018570240 | 165904 | 09-21    |
+---------------------+--------+----------+
2 rows in set (0.00 sec)

结果是错的。我们看下里面的子查询, image.png

从图中可以看出,userId=100000,最小wishDate应是01-20;userId=165904,最小wishDate应是01-01。 为什么有问题呢?explain看下。 4551c2b8acb9c90760bcf285656e0a2.png

被mysql优化器优化了!
从show warnings结果可以看出子查询被mysql优化器优化成一条SQL,因此最终是按group by buyer_id的顺序排序取对应第一条。

二、解决

那如何避免“被优化”呢?子查询加个distinct

root xxxx>explain select * from (SELECT distinct x.id wishId,x.buyer_id userId,x.wish_date wishDate FROM wish_info x     WHERE x.buyer_id in ('100000', '165904')     and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY'     order by x.wish_date asc) a group by a.userId;
+----+-------------+------------+------------+-------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+------------+------------+-------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL |   25 |   100.00 | Using temporary; Using filesort                    |
|  2 | DERIVED     | x          | NULL       | range | idx_user_id   | idx_user_id | 99      | NULL |  346 |     7.29 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------------+-------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

对应show warnings结果
image.png

查询下

root bjh_wish>select * from (SELECT distinct x.id wishId,x.buyer_id userId,x.wish_date wishDate FROM wish_info x     WHERE x.buyer_id in ('100000', '165904')     and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY'     order by x.wish_date asc) a group by a.userId;
+---------------------+--------+----------+
| wishId              | userId | wishDate |
+---------------------+--------+----------+
| 1513796891244908544 | 100000 | 01-20    |
| 187350              | 165904 | 01-01    |
+---------------------+--------+----------+
2 rows in set (0.00 sec)

结果符合预期。

###·三、建议 然而我还是不建议使用上面SQL,靠排序取第一条,因为存在不确定性(假如数据库换成oracle了呢)。所以还是建议先group by取min再inner join方式,SQL如下:

select t2.id wishId,t2.buyer_id userId,t2.wish_date wishDate from 
(SELECT x.buyer_id userId,substring_index(group_concat(id order by wish_date),',',1) as mid
FROM wish_info x
    WHERE x.buyer_id in ('100000', '165904')
    and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY'
    group by x.buyer_id
)t1 inner join wish_info t2 on t1.mid=t2.id

里面有个取巧的方式substring_index(group_concat(id order by wish_date),',',1),可以好好研究


Comments