工作和面试中,很频繁被问到的一个问题,如何分组取前N个数据?
今天Abul也来总结下。
1. 示例结构和数据:
create table t_score(
id int not null auto_increment,
cid int not null comment '班级ID',
sid int not null comment '学生ID',
score int not null comment '分数',
primary key (id)
) comment '学生分数表';
create table t_student(
id int not null auto_increment ,
name varchar(20) comment '姓名',
primary key(id)
) comment '学生表';
# 学生
insert into t_student(id,name) value(1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'F'),(7,'G'),(8,'H'),(9,'I'),(10,'J'),(11,'K');
# 分数
insert into t_score(cid,sid,score) value
(1,1,88),
(1,2,90),
(2,3,100),
(2,4,60),
(2,5,80),
(2,6,80),
(3,7,90),
(3,8,100),
(3,9,100),
(3,10,88),
(3,11,60);
2. 要求
输出每个班级前3名(分数相同都显示,不跳排名)
# 示例结果
+-----+-------+-------+
| cid | sname | score |
+-----+-------+-------+
| 1 | B | 90 |
| 1 | A | 88 |
| 2 | C | 100 |
| 2 | E | 80 |
| 2 | F | 80 |
| 2 | D | 60 |
| 3 | H | 100 |
| 3 | I | 100 |
| 3 | G | 90 |
| 3 | J | 88 |
+-----+-------+-------+
3. 解法
a. 子查询
select t1.cid,t2.name sname,t1.score
from t_score t1
inner join t_student t2 on t1.sid=t2.id
where 3>(
select count(distinct score)
from t_score m
where t1.cid=m.cid and t1.score<m.score
)
order by t1.cid asc,t1.score desc;
b. join
select t3.cid,t4.name sname,t3.score
from t_score t3
inner join t_student t4 on t3.sid=t4.id
inner join (
select t1.id
from t_score t1
left join t_score t2 on t1.cid=t2.cid and t1.score < t2.score
group by t1.id
having count(distinct t2.score)<3
)m on m.id = t3.id
order by t3.cid asc,t3.score desc;
c. 自定义变量
select n.cid,m.name sname,n.score
from t_student m
inner join (
select t1.cid,t1.sid,t1.score,
if(@last_cid=t1.cid,if(@last_score=t1.score,@i,@i:=@i+1),@i:=1) as num,
@last_cid:=t1.cid,
@last_score:=t1.score
from (select * from t_score order by cid asc,score desc) t1,
(select @i:=0,@last_cid:=0,@last_score:=-1)t2
)n on m.id=n.sid
where n.num<=3
order by n.cid asc,n.score desc;
d. 窗口函数(mysql8.0)
select n.cid,m.name sname,n.score
from t_student m
inner join (
select cid,sid,score,
dense_rank() over(partition by cid order by score desc) as num
from t_score
) n on m.id=n.sid
where n.num<=3
order by n.cid asc,n.score desc;
4. 总结
需求还可能是查询前N名,相同得分要跳过名次;或者只取前N个学生(即使得分相同也可能不显示)
总的来说窗口函数
解法算是比较简洁易懂,其次是子查询。但从查询效率来看,每个的复杂度基本差不多。
建议优先选择: 窗口函数 > 子查询 > join > 自定义变量
Comments