好久没写文章了,废话不多说,直接上。
一、问题
今天有个同事咨询怎么分组取出时间最早的一条数据,他也知道先子查询按时间排序,再嵌套一层group by,可以取出最早一条记录。但他的SQL是这样子的↓
root bjh_wish>select * from (SELECT x.id wishId,x.buyer_id userId,x.wish_date wishDate FROM wish_info x
-> WHERE x.buyer_id in ('100000', '165904')
-> and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY'
-> order by x.wish_date asc) a group by a.userId;
+---------------------+--------+----------+
| wishId | userId | wishDate |
+---------------------+--------+----------+
| 1511914153873133568 | 100000 | 05-07 |
| 1517037810018570240 | 165904 | 09-21 |
+---------------------+--------+----------+
2 rows in set (0.00 sec)
结果是错的。我们看下里面的子查询,
从图中可以看出,userId=100000,最小wishDate应是01-20;userId=165904,最小wishDate应是01-01。 为什么有问题呢?explain看下。
被mysql优化器优化了!
从show warnings结果可以看出子查询被mysql优化器优化成一条SQL,因此最终是按group by buyer_id的顺序排序取对应第一条。
二、解决
那如何避免“被优化”呢?子查询加个distinct
root xxxx>explain select * from (SELECT distinct x.id wishId,x.buyer_id userId,x.wish_date wishDate FROM wish_info x WHERE x.buyer_id in ('100000', '165904') and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY' order by x.wish_date asc) a group by a.userId;
+----+-------------+------------+------------+-------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 25 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | x | NULL | range | idx_user_id | idx_user_id | 99 | NULL | 346 | 7.29 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------------+-------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
对应show warnings结果
查询下
root bjh_wish>select * from (SELECT distinct x.id wishId,x.buyer_id userId,x.wish_date wishDate FROM wish_info x WHERE x.buyer_id in ('100000', '165904') and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY' order by x.wish_date asc) a group by a.userId;
+---------------------+--------+----------+
| wishId | userId | wishDate |
+---------------------+--------+----------+
| 1513796891244908544 | 100000 | 01-20 |
| 187350 | 165904 | 01-01 |
+---------------------+--------+----------+
2 rows in set (0.00 sec)
结果符合预期。
###·三、建议 然而我还是不建议使用上面SQL,靠排序取第一条,因为存在不确定性(假如数据库换成oracle了呢)。所以还是建议先group by取min再inner join方式,SQL如下:
select t2.id wishId,t2.buyer_id userId,t2.wish_date wishDate from
(SELECT x.buyer_id userId,substring_index(group_concat(id order by wish_date),',',1) as mid
FROM wish_info x
WHERE x.buyer_id in ('100000', '165904')
and x.wish_date is not null and x.wish_date != '' and x.is_delete = 0 and wish_type != 'TA_BUY'
group by x.buyer_id
)t1 inner join wish_info t2 on t1.mid=t2.id
里面有个取巧的方式substring_index(group_concat(id order by wish_date),',',1)
,可以好好研究
Comments