07月11, 2022
收藏本站

分组取前N名数据SQL

工作和面试中,很频繁被问到的一个问题,如何分组取前N个数据?
今天Abul也来总结下。

1. 示例结构和数据:

create table t_score(
  id int not null auto_increment,
  cid int not null comment '班级ID',
  sid int not null comment '学生ID',
  score int not null comment '分数',
  primary key (id)
) comment '学生分数表';

create table t_student(
  id int not null auto_increment ,
  name varchar(20) comment '姓名',
  primary key(id)
) comment '学生表';

# 学生
insert into t_student(id,name) value(1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'F'),(7,'G'),(8,'H'),(9,'I'),(10,'J'),(11,'K');

# 分数
insert into t_score(cid,sid,score) value
(1,1,88),
(1,2,90),
(2,3,100),
(2,4,60),
(2,5,80),
(2,6,80),
(3,7,90),
(3,8,100),
(3,9,100),
(3,10,88),
(3,11,60);

image.png

2. 要求

输出每个班级前3名(分数相同都显示,不跳排名)

# 示例结果
+-----+-------+-------+
| cid | sname | score |
+-----+-------+-------+
|   1 | B     |    90 |
|   1 | A     |    88 |
|   2 | C     |   100 |
|   2 | E     |    80 |
|   2 | F     |    80 |
|   2 | D     |    60 |
|   3 | H     |   100 |
|   3 | I     |   100 |
|   3 | G     |    90 |
|   3 | J     |    88 |
+-----+-------+-------+

3. 解法

a. 子查询
select t1.cid,t2.name sname,t1.score
from t_score t1
  inner  join t_student t2 on t1.sid=t2.id
where 3>(
  select count(distinct score)
  from t_score m
  where t1.cid=m.cid and t1.score<m.score
)
order by t1.cid asc,t1.score desc;
b. join
select t3.cid,t4.name sname,t3.score
from t_score t3
  inner join t_student t4 on t3.sid=t4.id
  inner join (
    select t1.id
    from t_score t1
    left join t_score t2 on t1.cid=t2.cid and t1.score < t2.score
    group by t1.id
    having count(distinct t2.score)<3
  )m on m.id = t3.id
order by t3.cid asc,t3.score desc;
c. 自定义变量
select n.cid,m.name sname,n.score
from t_student m
  inner join (
    select t1.cid,t1.sid,t1.score,
    if(@last_cid=t1.cid,if(@last_score=t1.score,@i,@i:=@i+1),@i:=1) as num,
    @last_cid:=t1.cid,
    @last_score:=t1.score
    from (select * from t_score order by cid asc,score desc) t1,
        (select @i:=0,@last_cid:=0,@last_score:=-1)t2
  )n on m.id=n.sid
where n.num<=3
order by n.cid asc,n.score desc;
d. 窗口函数(mysql8.0)
select n.cid,m.name sname,n.score
from t_student m
  inner join (
    select cid,sid,score,
    dense_rank() over(partition by cid order by score desc) as num
    from t_score
  ) n on m.id=n.sid
where n.num<=3
order by n.cid asc,n.score desc;

4. 总结

需求还可能是查询前N名,相同得分要跳过名次;或者只取前N个学生(即使得分相同也可能不显示)
总的来说窗口函数解法算是比较简洁易懂,其次是子查询。但从查询效率来看,每个的复杂度基本差不多。
建议优先选择: 窗口函数 > 子查询 > join > 自定义变量


Comments