subselects - Mailing list pgsql-hackers

From Bruce Momjian
Subject subselects
Date
Msg-id 199803070547.AAA02928@candle.pha.pa.us
Whole thread Raw
List pgsql-hackers
Here is a discussion from the Informix group on subselect performance.
I think it makes us look pretty good.

---------------------------------------------------------------------------

Path:
readme1.op.net!op.net!cezanne.op.net!op.net!newsfeed.direct.ca!cpk-news-hub1.bbnplanet.com!cpk-news-feed4.bbnplanet.com!cpk-news-feed1.bbnplanet.com!news.bbnplanet.com!news.iquest.net!not-for-mail
From: "Matt Reprogle" <reprogle@iquest.net>
Newsgroups: comp.databases.informix
Subject: SELECT subquery much slower than IN ( list...)???
Date: 5 Mar 1998 05:07:31 GMT
Organization: IQuest Internet, Inc.
Lines: 30
Message-ID: <01bd47f3$ab35d9a0$55392bd1@reprogle>
NNTP-Posting-Host: iq-ind-ns006-21.iquest.net
X-Newsreader: Microsoft Internet News 4.70.1155
Xref: readme1.op.net comp.databases.informix:43794

I have been having problems with a select statement of the type:

select col1, col2
from bigtable
where col1 in
    (select key from temp_list_table);

In one case I looked at, the subquery returns just 13 unique values in
subsecond time, yet it took almost 7 minutes for the main query to
complete.

On the other hand, if I write out the result of the subquery explicitly,
such as:

select col1, col2
from bigtable
where col1 in ('A','B','C','D','E','F','G','H','I','J','K','L','M');

the query completes in less than 2 seconds.

I guess I had the mistaken assumption that the main query treated the
subquery result like an explicit list of the form ('val1','val2',...).

What could cause the huge performance difference between the two query
forms?

I am on 7.23 and Solaris 2.5.1, Sun E3000.
--
Matt Reprogle
reprogle@iquest.net
Path:
readme1.op.net!op.net!cezanne.op.net!op.net!howland.erols.net!news.idt.net!woodstock.news.demon.net!demon!news.demon.co.uk!demon!smooth1.demon.co.uk!djw
From: David Williams <djw@smooth1.demon.co.uk>
Newsgroups: comp.databases.informix
Subject: Re: SELECT subquery much slower than IN ( list...)???
Date: Thu, 5 Mar 1998 22:00:41 +0000
Organization: not applicable
Message-ID: <94a6CHAJCy$0Ew0s@smooth1.demon.co.uk>
References: <01bd47f3$ab35d9a0$55392bd1@reprogle>
NNTP-Posting-Host: smooth1.demon.co.uk
X-NNTP-Posting-Host: smooth1.demon.co.uk [194.222.39.154]
MIME-Version: 1.0
X-Newsreader: Turnpike (32) Version 3.05 <9Hhi+s$5$1$z+XxjwCrFWIswYg>
Lines: 65
Xref: readme1.op.net comp.databases.informix:43884

In article <01bd47f3$ab35d9a0$55392bd1@reprogle>, Matt Reprogle
<reprogle@iquest.net> writes
>I have been having problems with a select statement of the type:
>
>select col1, col2
>from bigtable
>where col1 in
>    (select key from temp_list_table);
>
  Foreach row in big table
     get value of col1 (A)
     run the subquery and get the results (B)
     check if a in B
  end foreach

   If bigtable has n rows this runs the subquery n times.

   Also it scans every row in BIGTABLE!!! No indexeson BIG TABLE are
   used.



>In one case I looked at, the subquery returns just 13 unique values in
>subsecond time, yet it took almost 7 minutes for the main query to
>complete.
>
>On the other hand, if I write out the result of the subquery explicitly,
>such as:
>
>select col1, col2
>from bigtable
>where col1 in ('A','B','C','D','E','F','G','H','I','J','K','L','M');
>
  This will use the index on col1..

>the query completes in less than 2 seconds.
>
>I guess I had the mistaken assumption that the main query treated the
>subquery result like an explicit list of the form ('val1','val2',...).
>
>What could cause the huge performance difference between the two query
>forms?
>
>I am on 7.23 and Solaris 2.5.1, Sun E3000.

  Try
 select col1, col2
 from bigtable,temp_list_table
 where bigtable.col1 = temp_list_table.key


 i.e. do a join not a corelated subquery!!

--
David Williams

Maintainer of the Informix FAQ
 Primary site (Beta Version)  http://www.smooth1.demon.co.uk
 Official site                http://www.iiug.org/techinfo/faq/faq_top.html

I see you standin', Standin' on your own, It's such a lonely place for you, For
you to be If you need a shoulder, Or if you need a friend, I'll be here
standing, Until the bitter end...
So don't chastise me Or think I, I mean you harm...
All I ever wanted Was for you To know that I care
Path:
readme1.op.net!op.net!cezanne.op.net!op.net!news1.ispnews.com!europa.clark.net!206.251.127.50!newsfeed.gte.net!news.gte.net!not-for-mail
From: dgwilson@gte.net (Douglas Wilson)
Newsgroups: comp.databases.informix
Subject: Re: SELECT subquery much slower than IN ( list...)???
Date: Fri, 06 Mar 1998 18:21:30 GMT
Organization: gte.net
Lines: 31
Message-ID: <6dpenh$dkd$1@gte1.gte.net>
References: <01bd47f3$ab35d9a0$55392bd1@reprogle> <94a6CHAJCy$0Ew0s@smooth1.demon.co.uk>
NNTP-Posting-Host: fw.brightpoint.com
X-Auth: D203870C029BCB8A4BC48491
X-Newsreader: Forte Free Agent 1.11/32.235
Xref: readme1.op.net comp.databases.informix:43942

On Thu, 5 Mar 1998 22:00:41 +0000, David Williams
<djw@smooth1.demon.co.uk> wrote:

>In article <01bd47f3$ab35d9a0$55392bd1@reprogle>, Matt Reprogle
><reprogle@iquest.net> writes
>>I have been having problems with a select statement of the type:
>>
>>select col1, col2
>>from bigtable
>>where col1 in
>>    (select key from temp_list_table);

(stuff clipped)

> i.e. do a join not a corelated subquery!!

True, if the 'key' in the temp table has no duplicates
then just join; if there are duplicates, you can
'select unique key' from the temp table into another
temp table, but I dont think this is a corelated
subquery, just a subquery. A corelated subquery
would be something like

select col1, col2
from bigtable
where col1 in
    (select col1 from temp_list_table
     where temp_list_table.col2=bigtable.col2);

Cheers,
Douglas Wilson
Path:
readme1.op.net!op.net!cezanne.op.net!op.net!darla.visi.com!news-out.visi.com!feed2.news.erols.com!erols!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!newsfeed.gte.net!news.gte.net!not-for-mail
From: dgwilson@gte.net (Douglas Wilson)
Newsgroups: comp.databases.informix
Subject: Re: SELECT subquery much slower than IN ( list...)???
Date: Thu, 05 Mar 1998 23:39:05 GMT
Organization: gte.net
Lines: 24
Message-ID: <6dncuu$hsh$1@gte2.gte.net>
References: <01bd47f3$ab35d9a0$55392bd1@reprogle>
NNTP-Posting-Host: fw.brightpoint.com
X-Auth: D203990A1986CB8653C88491
X-Newsreader: Forte Free Agent 1.11/32.235
Xref: readme1.op.net comp.databases.informix:43888

On 5 Mar 1998 05:07:31 GMT, "Matt Reprogle" <reprogle@iquest.net>
wrote:

>I have been having problems with a select statement of the type:
>
>select col1, col2
>from bigtable
>where col1 in
>    (select key from temp_list_table);
>
>In one case I looked at, the subquery returns just 13 unique values in
>subsecond time, yet it took almost 7 minutes for the main query to
>complete.

have you done a 'set explain on'?
I had a similar situation once, and I didn't realize
(until the 'explain') that the
table in the main query was really an alias (synonym, whatever) for
a table in another database on another machine. The optimizer
therefore could not use the index on the main table.
Also could be an 'update statistics' thing.

Cheers,
Douglas Wilson
Path:
readme1.op.net!op.net!cezanne.op.net!op.net!news.bconnex.net!nac!news-xfer.netaxs.com!fastnet!howland.erols.net!news.idt.net!nntp2.cerf.net!nntp3.cerf.net!hacgate2.hac.com!news.delcoelect.com!not-for-mail
From: Matt Reprogle <mcreprog@ictest.delcoelect.com>
Newsgroups: comp.databases.informix
Subject: Re: SELECT subquery much slower than IN ( list...)???
Date: Fri, 06 Mar 1998 15:31:10 -0500
Organization: Delco Electronics
Lines: 120
Message-ID: <35005D0E.4E76@ictest.delcoelect.com>
References: <01bd47f3$ab35d9a0$55392bd1@reprogle> <6dncuu$hsh$1@gte2.gte.net>
NNTP-Posting-Host: koicew00.delcoelect.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0 (X11; I; HP-UX A.09.01 9000/715)
Xref: readme1.op.net comp.databases.informix:43973

Douglas Wilson wrote:
> have you done a 'set explain on'?
> I had a similar situation once, and I didn't realize
> (until the 'explain') that the
> table in the main query was really an alias (synonym, whatever) for
> a table in another database on another machine. The optimizer
> therefore could not use the index on the main table.
> Also could be an 'update statistics' thing.
>
> Cheers,
> Douglas Wilson
First, some additional information:
1) the main table I am querying is about 3,000,000 rows.
2) I have a unique index for table h_tab on columns (l_key, h_seq)

Here is the sqexplain.out for each query mode:

EXPLICIT LIST (runs in about 3 seconds)
QUERY:
------
select l_key,max(h_seq) last_h_seq
from h_tab
where l_key in (
'80914',
'80D74',
'80C30',
'80C28',
'80F98',
'80915',
'80A26',
'80917',
'80F92',
'80A25',
'80A24',
'80A23',
'80811')
group by l_key
into temp last_temp
with no log

Estimated Cost: 362
Estimated # of Rows Returned: 2
Temporary Files Required For: Group By

1) h_tab: INDEX PATH

    (1) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80914'

    (2) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80D74'

    (3) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80C30'

    (4) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80C28'

    (5) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80F98'

    (6) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80915'

    (7) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80A26'

    (8) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80917'

    (9) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80F92'

    (10) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80A25'

    (11) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80A24'

    (12) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80A23'

    (13) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)
        Lower Index Filter: h_tab.l_key = '80811'


SUBQUERY: runs in about 7 minutes
QUERY:
------
select l_key,max(h_seq) last_h_seq
from h_tab
where l_key in (select temp_l from l_temp_tab)
group by l_key
into temp last_temp
with no log

Estimated Cost: 88140
Estimated # of Rows Returned: 9142

1) h_tab: INDEX PATH

    Filters: h_tab.l_key = ANY <subquery>

    (1) Index Keys: l_key h_seq   (Key-Only)  (Serial, fragments: ALL)

    Subquery:
    ---------
    Estimated Cost: 2
    Estimated # of Rows Returned: 10

    1) mcreprog.l_temp_tab: SEQUENTIAL SCAN  (Serial, fragments: ALL)

This tells me that it is doing a key-only query on the big table, and a
sequential scan on the temp table.  Isn't that what you would expect?

--
Matt Reprogle
IS Engineer, Delphi Delco Electronics Systems
phone:(765)451-9651    FAX:  (765)451-8230
mcreprog@ictest.delcoelect.com
Path:
readme1.op.net!op.net!cezanne.op.net!op.net!darla.visi.com!news-out.visi.com!feed2.news.erols.com!erols!newsfeed.internetmci.com!131.103.1.116!news2.chicago.iagnet.net!qual.net!iagnet.net!203.29.160.2!ihug.co.nz!nsw1.news.telstra.net!egprod05.westpac.com.au!fbox@westpac.com.au
From: jharris@westpac.com.au (Jason Harris)
Newsgroups: comp.databases.informix
Subject: Re: SELECT subquery much slower than IN ( list...)???
Date: Fri, 06 Mar 1998 00:05:57 GMT
Organization: Westpac Banking Corporation
Lines: 48
Message-ID: <34ff3cbd.261919430@10.209.30.200>
References: <01bd47f3$ab35d9a0$55392bd1@reprogle>
Reply-To: jharris@westpac.com.au
NNTP-Posting-Host: egprod03.westpac.com.au
X-Newsreader: Forte Free Agent 1.11/32.235
Xref: readme1.op.net comp.databases.informix:43876

Matt,

I too am interested in this.

I have approx 50 delete statements that use a subquery on a key. All
tables have at least one index on the column that I am using, with
that column as the first or only element. At the moment around half
use the indexes and about half sequential scan. I have not been able
to figure out why they dont all use the index.

If I found out more I will let you know.

Jason

On 5 Mar 1998 05:07:31 GMT, "Matt Reprogle" <reprogle@iquest.net>
wrote:

>I have been having problems with a select statement of the type:
>
>select col1, col2
>from bigtable
>where col1 in
>    (select key from temp_list_table);
>
>In one case I looked at, the subquery returns just 13 unique values in
>subsecond time, yet it took almost 7 minutes for the main query to
>complete.
>
>On the other hand, if I write out the result of the subquery explicitly,
>such as:
>
>select col1, col2
>from bigtable
>where col1 in ('A','B','C','D','E','F','G','H','I','J','K','L','M');
>
>the query completes in less than 2 seconds.
>
>I guess I had the mistaken assumption that the main query treated the
>subquery result like an explicit list of the form ('val1','val2',...).
>
>What could cause the huge performance difference between the two query
>forms?
>
>I am on 7.23 and Solaris 2.5.1, Sun E3000.
>--
>Matt Reprogle
>reprogle@iquest.net


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] errors in pg_attribute.h
Next
From: "Thomas G. Lockhart"
Date:
Subject: Re: [HACKERS] dec alpha/64bit stuff