subselects - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | subselects |
Date | |
Msg-id | 199803070547.AAA02928@candle.pha.pa.us Whole thread Raw |
List | pgsql-hackers |
Here is a discussion from the Informix group on subselect performance. I think it makes us look pretty good. --------------------------------------------------------------------------- Path: readme1.op.net!op.net!cezanne.op.net!op.net!newsfeed.direct.ca!cpk-news-hub1.bbnplanet.com!cpk-news-feed4.bbnplanet.com!cpk-news-feed1.bbnplanet.com!news.bbnplanet.com!news.iquest.net!not-for-mail From: "Matt Reprogle" <reprogle@iquest.net> Newsgroups: comp.databases.informix Subject: SELECT subquery much slower than IN ( list...)??? Date: 5 Mar 1998 05:07:31 GMT Organization: IQuest Internet, Inc. Lines: 30 Message-ID: <01bd47f3$ab35d9a0$55392bd1@reprogle> NNTP-Posting-Host: iq-ind-ns006-21.iquest.net X-Newsreader: Microsoft Internet News 4.70.1155 Xref: readme1.op.net comp.databases.informix:43794 I have been having problems with a select statement of the type: select col1, col2 from bigtable where col1 in (select key from temp_list_table); In one case I looked at, the subquery returns just 13 unique values in subsecond time, yet it took almost 7 minutes for the main query to complete. On the other hand, if I write out the result of the subquery explicitly, such as: select col1, col2 from bigtable where col1 in ('A','B','C','D','E','F','G','H','I','J','K','L','M'); the query completes in less than 2 seconds. I guess I had the mistaken assumption that the main query treated the subquery result like an explicit list of the form ('val1','val2',...). What could cause the huge performance difference between the two query forms? I am on 7.23 and Solaris 2.5.1, Sun E3000. -- Matt Reprogle reprogle@iquest.net Path: readme1.op.net!op.net!cezanne.op.net!op.net!howland.erols.net!news.idt.net!woodstock.news.demon.net!demon!news.demon.co.uk!demon!smooth1.demon.co.uk!djw From: David Williams <djw@smooth1.demon.co.uk> Newsgroups: comp.databases.informix Subject: Re: SELECT subquery much slower than IN ( list...)??? Date: Thu, 5 Mar 1998 22:00:41 +0000 Organization: not applicable Message-ID: <94a6CHAJCy$0Ew0s@smooth1.demon.co.uk> References: <01bd47f3$ab35d9a0$55392bd1@reprogle> NNTP-Posting-Host: smooth1.demon.co.uk X-NNTP-Posting-Host: smooth1.demon.co.uk [194.222.39.154] MIME-Version: 1.0 X-Newsreader: Turnpike (32) Version 3.05 <9Hhi+s$5$1$z+XxjwCrFWIswYg> Lines: 65 Xref: readme1.op.net comp.databases.informix:43884 In article <01bd47f3$ab35d9a0$55392bd1@reprogle>, Matt Reprogle <reprogle@iquest.net> writes >I have been having problems with a select statement of the type: > >select col1, col2 >from bigtable >where col1 in > (select key from temp_list_table); > Foreach row in big table get value of col1 (A) run the subquery and get the results (B) check if a in B end foreach If bigtable has n rows this runs the subquery n times. Also it scans every row in BIGTABLE!!! No indexeson BIG TABLE are used. >In one case I looked at, the subquery returns just 13 unique values in >subsecond time, yet it took almost 7 minutes for the main query to >complete. > >On the other hand, if I write out the result of the subquery explicitly, >such as: > >select col1, col2 >from bigtable >where col1 in ('A','B','C','D','E','F','G','H','I','J','K','L','M'); > This will use the index on col1.. >the query completes in less than 2 seconds. > >I guess I had the mistaken assumption that the main query treated the >subquery result like an explicit list of the form ('val1','val2',...). > >What could cause the huge performance difference between the two query >forms? > >I am on 7.23 and Solaris 2.5.1, Sun E3000. Try select col1, col2 from bigtable,temp_list_table where bigtable.col1 = temp_list_table.key i.e. do a join not a corelated subquery!! -- David Williams Maintainer of the Informix FAQ Primary site (Beta Version) http://www.smooth1.demon.co.uk Official site http://www.iiug.org/techinfo/faq/faq_top.html I see you standin', Standin' on your own, It's such a lonely place for you, For you to be If you need a shoulder, Or if you need a friend, I'll be here standing, Until the bitter end... So don't chastise me Or think I, I mean you harm... All I ever wanted Was for you To know that I care Path: readme1.op.net!op.net!cezanne.op.net!op.net!news1.ispnews.com!europa.clark.net!206.251.127.50!newsfeed.gte.net!news.gte.net!not-for-mail From: dgwilson@gte.net (Douglas Wilson) Newsgroups: comp.databases.informix Subject: Re: SELECT subquery much slower than IN ( list...)??? Date: Fri, 06 Mar 1998 18:21:30 GMT Organization: gte.net Lines: 31 Message-ID: <6dpenh$dkd$1@gte1.gte.net> References: <01bd47f3$ab35d9a0$55392bd1@reprogle> <94a6CHAJCy$0Ew0s@smooth1.demon.co.uk> NNTP-Posting-Host: fw.brightpoint.com X-Auth: D203870C029BCB8A4BC48491 X-Newsreader: Forte Free Agent 1.11/32.235 Xref: readme1.op.net comp.databases.informix:43942 On Thu, 5 Mar 1998 22:00:41 +0000, David Williams <djw@smooth1.demon.co.uk> wrote: >In article <01bd47f3$ab35d9a0$55392bd1@reprogle>, Matt Reprogle ><reprogle@iquest.net> writes >>I have been having problems with a select statement of the type: >> >>select col1, col2 >>from bigtable >>where col1 in >> (select key from temp_list_table); (stuff clipped) > i.e. do a join not a corelated subquery!! True, if the 'key' in the temp table has no duplicates then just join; if there are duplicates, you can 'select unique key' from the temp table into another temp table, but I dont think this is a corelated subquery, just a subquery. A corelated subquery would be something like select col1, col2 from bigtable where col1 in (select col1 from temp_list_table where temp_list_table.col2=bigtable.col2); Cheers, Douglas Wilson Path: readme1.op.net!op.net!cezanne.op.net!op.net!darla.visi.com!news-out.visi.com!feed2.news.erols.com!erols!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!newsfeed.gte.net!news.gte.net!not-for-mail From: dgwilson@gte.net (Douglas Wilson) Newsgroups: comp.databases.informix Subject: Re: SELECT subquery much slower than IN ( list...)??? Date: Thu, 05 Mar 1998 23:39:05 GMT Organization: gte.net Lines: 24 Message-ID: <6dncuu$hsh$1@gte2.gte.net> References: <01bd47f3$ab35d9a0$55392bd1@reprogle> NNTP-Posting-Host: fw.brightpoint.com X-Auth: D203990A1986CB8653C88491 X-Newsreader: Forte Free Agent 1.11/32.235 Xref: readme1.op.net comp.databases.informix:43888 On 5 Mar 1998 05:07:31 GMT, "Matt Reprogle" <reprogle@iquest.net> wrote: >I have been having problems with a select statement of the type: > >select col1, col2 >from bigtable >where col1 in > (select key from temp_list_table); > >In one case I looked at, the subquery returns just 13 unique values in >subsecond time, yet it took almost 7 minutes for the main query to >complete. have you done a 'set explain on'? I had a similar situation once, and I didn't realize (until the 'explain') that the table in the main query was really an alias (synonym, whatever) for a table in another database on another machine. The optimizer therefore could not use the index on the main table. Also could be an 'update statistics' thing. Cheers, Douglas Wilson Path: readme1.op.net!op.net!cezanne.op.net!op.net!news.bconnex.net!nac!news-xfer.netaxs.com!fastnet!howland.erols.net!news.idt.net!nntp2.cerf.net!nntp3.cerf.net!hacgate2.hac.com!news.delcoelect.com!not-for-mail From: Matt Reprogle <mcreprog@ictest.delcoelect.com> Newsgroups: comp.databases.informix Subject: Re: SELECT subquery much slower than IN ( list...)??? Date: Fri, 06 Mar 1998 15:31:10 -0500 Organization: Delco Electronics Lines: 120 Message-ID: <35005D0E.4E76@ictest.delcoelect.com> References: <01bd47f3$ab35d9a0$55392bd1@reprogle> <6dncuu$hsh$1@gte2.gte.net> NNTP-Posting-Host: koicew00.delcoelect.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Mailer: Mozilla 3.0 (X11; I; HP-UX A.09.01 9000/715) Xref: readme1.op.net comp.databases.informix:43973 Douglas Wilson wrote: > have you done a 'set explain on'? > I had a similar situation once, and I didn't realize > (until the 'explain') that the > table in the main query was really an alias (synonym, whatever) for > a table in another database on another machine. The optimizer > therefore could not use the index on the main table. > Also could be an 'update statistics' thing. > > Cheers, > Douglas Wilson First, some additional information: 1) the main table I am querying is about 3,000,000 rows. 2) I have a unique index for table h_tab on columns (l_key, h_seq) Here is the sqexplain.out for each query mode: EXPLICIT LIST (runs in about 3 seconds) QUERY: ------ select l_key,max(h_seq) last_h_seq from h_tab where l_key in ( '80914', '80D74', '80C30', '80C28', '80F98', '80915', '80A26', '80917', '80F92', '80A25', '80A24', '80A23', '80811') group by l_key into temp last_temp with no log Estimated Cost: 362 Estimated # of Rows Returned: 2 Temporary Files Required For: Group By 1) h_tab: INDEX PATH (1) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80914' (2) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80D74' (3) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80C30' (4) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80C28' (5) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80F98' (6) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80915' (7) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80A26' (8) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80917' (9) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80F92' (10) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80A25' (11) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80A24' (12) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80A23' (13) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Lower Index Filter: h_tab.l_key = '80811' SUBQUERY: runs in about 7 minutes QUERY: ------ select l_key,max(h_seq) last_h_seq from h_tab where l_key in (select temp_l from l_temp_tab) group by l_key into temp last_temp with no log Estimated Cost: 88140 Estimated # of Rows Returned: 9142 1) h_tab: INDEX PATH Filters: h_tab.l_key = ANY <subquery> (1) Index Keys: l_key h_seq (Key-Only) (Serial, fragments: ALL) Subquery: --------- Estimated Cost: 2 Estimated # of Rows Returned: 10 1) mcreprog.l_temp_tab: SEQUENTIAL SCAN (Serial, fragments: ALL) This tells me that it is doing a key-only query on the big table, and a sequential scan on the temp table. Isn't that what you would expect? -- Matt Reprogle IS Engineer, Delphi Delco Electronics Systems phone:(765)451-9651 FAX: (765)451-8230 mcreprog@ictest.delcoelect.com Path: readme1.op.net!op.net!cezanne.op.net!op.net!darla.visi.com!news-out.visi.com!feed2.news.erols.com!erols!newsfeed.internetmci.com!131.103.1.116!news2.chicago.iagnet.net!qual.net!iagnet.net!203.29.160.2!ihug.co.nz!nsw1.news.telstra.net!egprod05.westpac.com.au!fbox@westpac.com.au From: jharris@westpac.com.au (Jason Harris) Newsgroups: comp.databases.informix Subject: Re: SELECT subquery much slower than IN ( list...)??? Date: Fri, 06 Mar 1998 00:05:57 GMT Organization: Westpac Banking Corporation Lines: 48 Message-ID: <34ff3cbd.261919430@10.209.30.200> References: <01bd47f3$ab35d9a0$55392bd1@reprogle> Reply-To: jharris@westpac.com.au NNTP-Posting-Host: egprod03.westpac.com.au X-Newsreader: Forte Free Agent 1.11/32.235 Xref: readme1.op.net comp.databases.informix:43876 Matt, I too am interested in this. I have approx 50 delete statements that use a subquery on a key. All tables have at least one index on the column that I am using, with that column as the first or only element. At the moment around half use the indexes and about half sequential scan. I have not been able to figure out why they dont all use the index. If I found out more I will let you know. Jason On 5 Mar 1998 05:07:31 GMT, "Matt Reprogle" <reprogle@iquest.net> wrote: >I have been having problems with a select statement of the type: > >select col1, col2 >from bigtable >where col1 in > (select key from temp_list_table); > >In one case I looked at, the subquery returns just 13 unique values in >subsecond time, yet it took almost 7 minutes for the main query to >complete. > >On the other hand, if I write out the result of the subquery explicitly, >such as: > >select col1, col2 >from bigtable >where col1 in ('A','B','C','D','E','F','G','H','I','J','K','L','M'); > >the query completes in less than 2 seconds. > >I guess I had the mistaken assumption that the main query treated the >subquery result like an explicit list of the form ('val1','val2',...). > >What could cause the huge performance difference between the two query >forms? > >I am on 7.23 and Solaris 2.5.1, Sun E3000. >-- >Matt Reprogle >reprogle@iquest.net -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
pgsql-hackers by date: