这个SQL性能如何?
select gvm2.*
from goods_version_manage gvm2
where id in (
select max(id) as id
from goods_version_manage gvm1
where gvm1.version_flag = 1
and gvm1.goods_id in (
'1496886471670255616',
'1501485202432479232',
'1507187868873674752',
'1508018538864857088',
'1508019855326859264',
'1514527041460785152'
)
group by gvm1.goods_id
);
常理分析,先执行in里面的子查询gvm1
,取出结果放到in里进行外层查询,内层goods_id有索引,外层id本身为主键,效率还会差?
root bjh_goods>explain
-> select gvm2.*
-> from goods_version_manage gvm2
-> where id in (
-> select max(id) as id
-> from goods_version_manage gvm1
-> where gvm1.version_flag = 1
-> and gvm1.goods_id in (
-> '1496886471670255616',
-> '1501485202432479232',
-> '1507187868873674752',
-> '1508018538864857088',
-> '1508019855326859264',
-> '1514527041460785152'
-> )
-> group by gvm1.goods_id
-> );
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------+
| 1 | PRIMARY | gvm2 | NULL | ALL | NULL | NULL | NULL | NULL | 445 | 100.00 | Using where |
| 2 | SUBQUERY | gvm1 | NULL | range | idx_goods_id | idx_goods_id | 98 | NULL | 27 | 10.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------+
2 rows in set, 1 warning (0.00 sec)
从explain看,外层gvm2表使用ALL全表扫描,当gvm2全表数据量越来越大,该SQL查询则更慢。
为什么外层gvm2用不上主键索引? 子查询结果未知,子查询中的条件可以使用外层的字段值,所以外层无法直接使用索引(即使对gvm2强制使用索引也失效)。
优化后:
更改为派生表inner join
root bjh_goods>explain
-> select gvm2.*
-> from goods_version_manage gvm2
-> inner join (
-> select max(id) as id
-> from goods_version_manage gvm1
-> where gvm1.version_flag = 1
-> and gvm1.goods_id in (
-> '1496886471670255616',
-> '1501485202432479232',
-> '1507187868873674752',
-> '1508018538864857088',
-> '1508019855326859264',
-> '1514527041460785152'
-> )
-> group by gvm1.goods_id
-> )t1 on gvm2.id=t1.id ;
+----+-------------+------------+------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
| 1 | PRIMARY | gvm2 | NULL | eq_ref | PRIMARY | PRIMARY | 98 | t1.id | 1 | 100.00 | NULL |
| 2 | DERIVED | gvm1 | NULL | range | idx_goods_id | idx_goods_id | 98 | NULL | 27 | 10.00 | Using index condition; Using where |
+----+-------------+------------+------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
3 rows in set, 1 warning (0.00 sec)
使用上索引,整体扫描行从原先1600降到36,全表数据量再增加差值更明显。
当使用高版本mysql,如8.0,可开启optimizer_switch
中的subquery_to_derived
,MySQL优化器会自动将子查询转换成派生表。(目前所有环境使用版本MySQL 5.7.18)
建议
谨慎使用in子查询,虽然SQL上更容易理解,高版本也能支持优化器自动转换,但没必要增加MySQL优化时间。建议直接使用关联查询join代替子查询!
思考:gvm1扫描行为什么是27?含有索引外字段version_flag,所以需要用到ICP(索引下推)。
Comments