05月20, 2022
收藏本站

MySQL in子查询导致全表扫描

这个SQL性能如何?

select gvm2.* 
from goods_version_manage gvm2 
where id in (
  select max(id) as id 
  from goods_version_manage gvm1
  where gvm1.version_flag = 1
    and gvm1.goods_id in (
      '1496886471670255616',
      '1501485202432479232',
      '1507187868873674752',
      '1508018538864857088',
      '1508019855326859264',
      '1514527041460785152'
    )
  group by gvm1.goods_id
);

常理分析,先执行in里面的子查询gvm1,取出结果放到in里进行外层查询,内层goods_id有索引,外层id本身为主键,效率还会差?

root bjh_goods>explain 
   -> select gvm2.* 
   -> from goods_version_manage gvm2 
   -> where id in (
   ->   select max(id) as id 
   ->   from goods_version_manage gvm1
   ->   where gvm1.version_flag = 1
   ->     and gvm1.goods_id in (
   ->       '1496886471670255616',
   ->       '1501485202432479232',
   ->       '1507187868873674752',
   ->       '1508018538864857088',
   ->       '1508019855326859264',
   ->       '1514527041460785152'
   ->     )
   ->   group by gvm1.goods_id
   -> );
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                              |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------+
|  1 | PRIMARY     | gvm2  | NULL       | ALL   | NULL          | NULL         | NULL    | NULL |  445 |   100.00 | Using where                        |
|  2 | SUBQUERY    | gvm1  | NULL       | range | idx_goods_id  | idx_goods_id | 98      | NULL |   27 |    10.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------+
2 rows in set, 1 warning (0.00 sec)

从explain看,外层gvm2表使用ALL全表扫描,当gvm2全表数据量越来越大,该SQL查询则更慢。

为什么外层gvm2用不上主键索引? 子查询结果未知,子查询中的条件可以使用外层的字段值,所以外层无法直接使用索引(即使对gvm2强制使用索引也失效)。

优化后: 更改为派生表inner join

root bjh_goods>explain 
   -> select gvm2.* 
   -> from goods_version_manage gvm2 
   ->  inner join (
   ->    select max(id) as id 
   ->    from goods_version_manage gvm1
   ->    where gvm1.version_flag = 1
   ->      and gvm1.goods_id in (
   ->        '1496886471670255616',
   ->        '1501485202432479232',
   ->        '1507187868873674752',
   ->        '1508018538864857088',
   ->        '1508019855326859264',
   ->        '1514527041460785152'
   ->      )
   ->    group by gvm1.goods_id
   ->  )t1 on gvm2.id=t1.id ;
+----+-------------+------------+------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
| id | select_type | table      | partitions | type   | possible_keys | key          | key_len | ref   | rows | filtered | Extra                              |
+----+-------------+------------+------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL         | NULL    | NULL  |    2 |   100.00 | Using where                        |
|  1 | PRIMARY     | gvm2       | NULL       | eq_ref | PRIMARY       | PRIMARY      | 98      | t1.id |    1 |   100.00 | NULL                               |
|  2 | DERIVED     | gvm1       | NULL       | range  | idx_goods_id  | idx_goods_id | 98      | NULL  |   27 |    10.00 | Using index condition; Using where |
+----+-------------+------------+------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
3 rows in set, 1 warning (0.00 sec)

使用上索引,整体扫描行从原先1600降到36,全表数据量再增加差值更明显。 当使用高版本mysql,如8.0,可开启optimizer_switch中的subquery_to_derived,MySQL优化器会自动将子查询转换成派生表。(目前所有环境使用版本MySQL 5.7.18)

建议

谨慎使用in子查询,虽然SQL上更容易理解,高版本也能支持优化器自动转换,但没必要增加MySQL优化时间。建议直接使用关联查询join代替子查询!

思考:gvm1扫描行为什么是27?含有索引外字段version_flag,所以需要用到ICP(索引下推)。


Comments