[oracle/sql]关于清除重复,not in方案和not exists方案的对比
2024-08-26 12:43:29
有这样一张表:
CREATE TABLE tb_sc
(
id NUMBER not null primary key,
studentid int not null,
courseid int not null,
score int not null
)
用以下语句给它充值十万条数据:
Insert into tb_sc
select rownum,dbms_random.value(0,10000),dbms_random.value(1,5),dbms_random.value(0,150) from dual
connect by level<=100000
order by dbms_random.random
当然上面这样填充完会给同一studentid和courseid时不同的score记录,相当于学生同一科目考了两次,这在现实中是不合理的,因此我们要剔除掉,只保留同一studentid和courseid时score最高的那条记录。
用下面的sql能查询出该保留的记录:
select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid
如果直接去删除不在上面的sql查出的结果集中的记录,会写出如下的sql:
delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid)
但是,真执行起来就会发现,要等这条delete语句执行完简直遥遥无期。
而用同为反连接的not exist做就很快:
delete from tb_sc where not exists (
select 'x' from tb_sc a,
(select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) b
where a.studentid=b.studentid and a.courseid=b.courseid and a.score=b.score and tb_sc.id=a.id)
SQL> delete from tb_sc where not exists (
2 select 'x' from tb_sc a,
3 (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) b
4 where a.studentid=b.studentid and a.courseid=b.courseid and a.score=b.score and tb_sc.id=a.id); 已删除58032行。 已用时间: 00: 00: 00.75
到这里很多人可能直接否决的not in,直接认为它慢,放弃了,但是看看下面sql,它会跑多久呢?
delete from tb_sc where id not in (select tb_sc.id from tb_sc,( select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid ) tb_sc2
where tb_sc.studentid=tb_sc2.studentid and tb_sc.courseid=tb_sc2.courseid and tb_sc.score=tb_sc2.score)
让我们实际跑一下:
SQL> delete from tb_sc where id not in (select tb_sc.id from tb_sc,( select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid ) tb_sc2
2 where tb_sc.studentid=tb_sc2.studentid and tb_sc.courseid=tb_sc2.courseid and tb_sc.score=tb_sc2.score); 已删除58032行。 已用时间: 00: 00: 00.56
发现和上面的not exist差不多!你是不是又恢复对not in的信心了呢?
但是
delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid)
确实是慢得让人发指,而将delete换成select之后,却并不慢,不信大家请执行下面sql:
select a.* from tb_sc a where (a.studentid,a.courseid,a.score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid)
我执行的结果是:
19563 9998 2 6
27799 9998 3 95 ID STUDENTID COURSEID SCORE
---------- ---------- ---------- ----------
37515 9998 4 60
35809 9998 4 144
65663 9998 5 64
84961 9999 3 2
24730 9999 3 14
99371 9999 3 16
4349 10000 5 2 已选择58032行。 已用时间: 00: 00: 16.01
并没有多长时间,为什么同样的条件,查询就快,而删除就慢得不要不要的,其中的原因还有待继续研究。
--2020年1月24日--
最新文章
- 15 个实用的 PHP 正则表达式
- 跨域请求ajax jsonp的使用解惑
- 解决打开CHM文件后,右侧空白
- HDU 5422 Rikka with Graph
- url(data:image/png;base64,R0**)(转)
- UVa 699 The Falling Leaves
- jQuery 黑白插件
- Android Service 通过 BroadcastReceiver 更新Activity UI
- 用C++如何实现开放API接口服务器
- AngularJS学习篇(十四)
- Android WebView重定向问题的解决方案
- C++反射机制:可变参数模板实现C++反射
- vue入手
- 【剑指offer】链表第一个公共子结点
- python 装饰方法
- burpsuite的使用(三)
- Vue 中组件概念
- Python zipfile 编码问题
- Syslink Control in MFC 9.0(转)
- 深入理解java集合框架之---------LinkedList