⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 探讨一下这个sql的执行(统计,join相关).txt

📁 oracle问题集
💻 TXT
📖 第 1 页 / 共 2 页
字号:
ITPUB论坛 - 探讨一下这个sql的执行!
ITPUB论坛 (http://www.itpub.net/index.php)
- Oracle数据库管理 (http://www.itpub.net/forumdisplay.php?forumid=2)
-- 探讨一下这个sql的执行! (http://www.itpub.net/showthread.php?threadid=536)


由 ALAN.YANG 于 01-10-10 16:42 发表: 
 探讨一下这个sql的执行!


请大家看看下面这个SQL:

SELECT COUNT(*) 
FROM TABLE_A A
WHERE SUBSTR(A.COL,1,7) NOT IN 
( SELECT SUBSTR(B.COL,1,7) 
FROM TABLE_B B 
WHERE B.DH='XXX')

我的问题是:当比较A.COL时 , 子查询是不是都要执行一次?
说明一下:A ,B都是千万数据级的,如果是,那有什么办法提高查询速度?

__________________
如果ORACLE是大海
那我是在浅水区徘徊......




由 ALAN.YANG 于 01-10-10 17:04 发表: 
呜呜,大虾们都还在睡觉吗?
chao_ping,tidycc 帮我看一看可以吗?
__________________
如果ORACLE是大海
那我是在浅水区徘徊......




由 kezizi 于 01-10-10 17:17 发表: 
I like this question. I have lots of experience with optimizing sql with VLDB, 
but just can't remember the syntax 
Method 1, not very good, but good for learning purpose 
---------------------------------- 
1. the sub query run once --to answer your question
2. if the sub query "select sub(b.col,1,7) from table_b b where ... " reture 
small amount of data (even if table_b is large), the query is OK
3. if sub query return lots of records (more than 100000) you may want to 
consider rewrite the query as follow
select distinct sub(b.col,1,7) x from table_b b where ... into a temp table xx
create index on xx (x) -- depends on the optimizor, no index needed if it 
chooses hash join
SELECT COUNT(*) FROM TABLE_A A WHERE SUBSTR(A.COL,1,7) NOT IN ( SELECT x from 
xx) 
4. depend on how the data looks like , you can also optimize the select col(1,7) 
from a with temp table ,etc. no need to explain i think

-----Method 2 , the one i like, but not work if a, b are both very very very 
large -----------------
select count(*) from a, outer b where a.col(1,7) = b.col(1.7) and b.col(1.7) is 
null (I forget if it's Oracle syntax)
or for safety, split the above sql into two sql statements
select a.col(1,7) a ,b.col(1.7) b from a, outer b where a.col(1.7) = b.col(1.7) 
into a temp table x
select count(*) from table x where b is null


由 kezizi 于 01-10-10 17:20 发表: 
you can combine above method 1, and 2 to take both advantages. depend on how the 
data look like and the size. 

Next time, you should like us know the column length and number of records when 
ask such question.


由 ALAN.YANG 于 01-10-10 17:26 发表: 
Thank you very much for your ansewer !

I'am sorry ! That is a mistake:In fact , table_b have
about 200,000 records。 The table_a have about 
10,000,000 records 。The sub_query return about 40000 records 。And the sql 
executes for many hours ! if I use hash join but not sub_query ,the time is 
longer!

so , what should I do ,use method 2 ?
__________________
如果ORACLE是大海
那我是在浅水区徘徊......




由 kezizi 于 01-10-10 17:32 发表: 
how many records returned by the select distinct col(1,7) from b where ? = 'xxx'


由 ALAN.YANG 于 01-10-10 17:35 发表: 
  quote:

  最初由 kezizi 发布
  how many records returned by the select distinct col(1,7) from b where ? = 
  'xxx' 





about 40000
__________________
如果ORACLE是大海
那我是在浅水区徘徊......




由 smile_lan 于 01-10-10 17:41 发表: 
你这种建临时表的办法是用于A,B表随时有DML操作的情况吗
rt


由 smile_lan 于 01-10-10 18:15 发表: 
再试试
上一篇再首页没显示.

刚才当机后,我的浏览器有no thread 的提示


由 tidycc 于 01-10-10 18:19 发表: 
你的语句用了NOT IN ,应来说极慢的
我只能提出我的改进方法,即不用NOT IN
SELECT COUNT(*) FROM 
(
SELECT SUBSTR(A.COL,1,7) FROM TABLE_A A 
MINUES 
SELECT SUBSTR(B.COL,1,7) FROM TABLE_B B WHERE B.DH='XXX'
)
以上的方法应比NOT IN 快一些.
你也可参照KEZIZI的方法, 建些临时表,但最后查找的时候不要用NOT IN,
一般都可以改为 MINUES, 
请其它大虾提出改进意见.
不防从外联接的角度着手.


由 oldwain 于 01-10-10 18:23 发表: 
1. 用等价的exists语句代替

SELECT COUNT(*) 
FROM TABLE_A A 
WHERE NOT EXISTS 
( SELECT 1 FROM 
FROM TABLE_B B 
WHERE B.DH='XXX'
AND SUBSTR(A.COL,1,7) = SUBSTR(B.COL, 1, 7)
) 

注意在a.col, b.col可能为null时与原语句语义上的差别, 并考虑使用外连接.

2. 建立基于SUBSTR(A.COL, 7, 1), SUBSTR(B.COL,1,7)的function based index会有帮助
__________________
原      ▇▇▇▇▇▇▇ 
装      ▇ITPUB▇ 
       ▇狂热☆分子▇ 
正      ▇灌水专用章▇ 
版      ▇▇▇▇▇▇▇ 

    下 载 I T P U B 电 子 杂 志




由 zhuyg 于 01-10-10 18:28 发表: 
Not in is the slowest 
use out join instead
Not exists is the fastest way to deal with your problem

SELECT COUNT(*) 
FROM TABLE_A A 
WHERE not exists (SELECT 'X' 
FROM TABLE_B B 
WHERE B.DH='XXX' and 
SUBSTR(A.COL,1,7) = SUBSTR(B.COL,1,7) )

create index on table_aa and table_bb but also consider to deal with the substr 
function since it will prevend u to use the index


由 kezizi 于 01-10-10 18:44 发表: 
if you still need my help please run following sql and let me know the result 
and how long it takes

create table x as select substr(a.col,1,7) s ,count(*) c from a group by s 
select count(*), sum(c) from x


由 ttdb 于 01-10-10 19:11 发表: 
就本例来说,是
tidycc的minues快
还是oldwain的not exists快?

俺想知道 结果的说^^

当然 还可以比较 create function index前后的差别
__________________
whopawho

http://www.itpub.net/search.php
今天你google了没有?



由 ALAN.YANG 于 01-10-10 19:53 发表: 
  quote:

  最初由 kezizi 发布
  if you still need my help please run following sql and let me know the result 
  and how long it takes

  create table x as select substr(a.col,1,7) s ,count(*) c from a group by s 
  select count(*), sum(c) from x 




The first sql takes about 1000 seconds
and the second sql return follow results:
COUNT(S) SUM(C)
15 12126713

And kezizi, are you have good advices?

BTW , 谢谢各位大虾的鼎力相助!
如果我的语句还要统计其他项目,如统计其他列的和呢?
例如:
SELECT COUNT(*),SUM(A.COL2)
FROM TABLE_A A
WHERE .......(省略)
__________________
如果ORACLE是大海
那我是在浅水区徘徊......




由 kezizi 于 01-10-10 20:17 发表: 
count(*) = 15?!!, so small?

1. create index on table_a and/or table_b will not help because of the table 
size, plus it creates trouble later when data is updated
2. the different between 'not in' and 'exists' is a good point where I didn't 
pay attention. acturelly Internally in any kind of database, the 'exists' is use 
the same algorithm (in certain degree) to get data as the 'method 2' i mentioned 
before. ( use sort join or other join instead of inner loop join as in 'not in')

if 30 minutes is acceptable, then just do following (much better than several 
hours). 
by the way, There following sql can be optimized as well, but I don't believe 
any tuning will speed it up for another 10 times. because the following sql 
perform full table scan for 2 times (on two table) and perform a join on small 
tables whose time can be ignored. Mathematiclly there are no other significant 
algorithm better than this one

create table x as select substr(a.col,1,7) s ,count(*) c from a group by s 
create table y as select distinct substr(b.col,1,7) s from b where b.dh = 'xxx'
select sum(c) from x where not exists (select 1 from x,y where x.s = y.s)


由 kezizi 于 01-10-10 20:19 发表: 
by the way the sum(c) is the result you want


由 ALAN.YANG 于 01-10-10 20:32 发表: 
呵呵,你说15 太少?那是因为我是按substr(a.col1,1,7)
去分组统计的,如果不是那就和后面那个一千多万的数是一样的?

非常感谢!在您的英明指导下问题已经解决!

非常想跟你多学习!因为平时处理大数据量的查询统计工作
比较多。
__________________
如果ORACLE是大海
那我是在浅水区徘徊......




由 kezizi 于 01-10-10 20:36 发表: 
I was wrong. create index on b(bh) and/or b(col,1,7) will help on step two. and 
save you a few minutes. 

you can also save several minutes by tuning configuration, fragmentation and 
refine the sql to speed the query up, but not easy to speed it up for another 10 
times.


由 ALAN.YANG 于 01-10-10 20:41 发表: 
  quote:

  最初由 kezizi 发布
  I was wrong. create index on b(bh) and/or b(col,1,7) will help on step two. 
  and save you a few minutes. 

  you can also save several minutes by tuning configuration, fragmentation and 
  refine the sql to speed the query up, but not easy to speed it up for another 
  10 times. 




run the query takes about 30 minutes ! 
Thanks a lot!
BTW, you may check you message! 
Can we take a secret talk ? 
Hope to make friends with you ! 
And hope to learn more from you about the sql tunning !
__________________
如果ORACLE是大海
那我是在浅水区徘徊......

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -