Thread: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
"Alexander Steffens"
Date:
The following bug has been logged online:

Bug reference:      3826
Logged by:          Alexander Steffens
Email address:      mail@a-st.de
PostgreSQL version: 8.3b4
Operating system:   Win2003R2x64
Description:        Very Slow Execution of examplequery (wrong plan?)
Details:

Hello, I have found an Query (with data)
that need to execute on MS-SQL 2005 < 9sec,
on Postgresql I will stop it now after more than 30 mins:

create table t1 (a int);
create table t2 (a int);

insert into t1 select 1; --for t-sql compat
insert into t1 select 2;insert into t1 select 3;

insert into t2 select 1;
insert into t2 select 2;
insert into t2 select 5;

--execute 8 times QUERY A
insert into t1
select distinct (t1.a + t2.a)*2
from t1,t2
where not exists (
 select * from t1  tt where tt.a = (t1.a + t2.a)*2
)

--execute 1 times
insert into t2
select distinct (t1.a + t2.a)*3
from t1,t2
where not exists (
 select * from t2  tt where tt.a = (t1.a + t2.a)*3
)

--
--data now t1: 1642 t2: 3301
--
--now again QUERY A
--will need much too much time:
insert into t1
select distinct (t1.a + t2.a)*2
from t1,t2
where not exists (
 select * from t1  tt where tt.a = (t1.a + t2.a)*2
)

Alex.

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
Gregory Stark
Date:
"Alexander Steffens" <mail@a-st.de> writes:

> Hello, I have found an Query (with data)
> that need to execute on MS-SQL 2005 < 9sec,
> on Postgresql I will stop it now after more than 30 mins:

> insert into t1
> select distinct (t1.a + t2.a)*2
> from t1,t2
> where not exists (
>  select * from t1  tt where tt.a = (t1.a + t2.a)*2
> )

What plan does MS-SQL use to complete this? I wonder whether it's producing
the same answer Postgres is.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
>> insert into t1
>> select distinct (t1.a + t2.a)*2
>> from t1,t2
>> where not exists (
>> select * from t1  tt where tt.a = (t1.a + t2.a)*2
>> )

> What plan does MS-SQL use to complete this? I wonder whether it's producing
> the same answer Postgres is.

AFAICS there is just no very good way to execute a query with such
little structure.  You pretty much have to form the cartesian product
of t1 x t2 and then do a rather expensive subquery probe for each row.
There isn't even an index on tt.a to help :-(

You could probably make it slightly less bad by changing the query to

select distinct (t1.a + t2.a)*2
from t1,t2
where (t1.a + t2.a)*2 not in (
 select tt.a from t1 tt
);

which would enable PG to use a hashed-subplan implementation of the NOT
IN probe.  This transformation is not legal in general --- it will
produce different answers if t1.a or t2.a could be NULL --- but if you
know you don't care about that then it's OK.

It's possible that MS-SQL is doing something analogous to the
hashed-subplan approach (hopefully with suitable tweaking for the NULL
case) but even then it's hard to see how it could take only 9 sec.
The cartesian product is too big.

BTW, increasing work_mem should help; it looks to me like a sizable
amount of time goes into the sort for the final DISTINCT.

            regards, tom lane

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> It's possible that MS-SQL is doing something analogous to the
> hashed-subplan approach (hopefully with suitable tweaking for the NULL
> case) but even then it's hard to see how it could take only 9 sec.
> The cartesian product is too big.

Fwiw it seems MS-SQL is doing something funny. The three plans posted in the
screenshots for the "small", "mediu", and "large" cases are:

Top
    Sort (Distinct Sort)
        Nested Loop (Left Anti Semi Join)
            Nested Loop
                Table Scan
                Table Scan
            Top
                Table Scan



[cut off by the screenshot] Match
    Hash Match (Right Anti Semi Join)
        Table Scan
        Nested Loop
            Table Scan
            Table Scan



Hash Match (Right Anti Semi Join)
    Parallelism (Repartition Streams)
        Table Scan
    Parallelism (Repartition Streams)
        Nested Loop (Inner Join)
            Table Scan
            Table Spool (Lazy Spool)
                Table Scan


Postgres is doing something equivalent to the first plan.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> It's possible that MS-SQL is doing something analogous to the
>> hashed-subplan approach (hopefully with suitable tweaking for the NULL
>> case) but even then it's hard to see how it could take only 9 sec.
>> The cartesian product is too big.

> Fwiw it seems MS-SQL is doing something funny. The three plans posted in the
> screenshots for the "small", "mediu", and "large" cases are:
> ...
> Postgres is doing something equivalent to the first plan.

Hmm.  I think the second plan is probably equivalent to the
hashed-subplan behavior that you can get in PG by rewriting the query to
NOT IN as I illustrated.  The third plan looks to be the same thing plus
some parallelization frammishes.

I'm not clear on what "small/medium/large" means, in particular not on
which of these corresponds to the OP's report of 9-second execution.

            regards, tom lane

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
"Alexander Steffens"
Date:
So sorry, I send you only the half of information:
MS-SQL uses the the plan with only nested loops when i have abt less
than 100 tuples in both the tables (="small")

when t1 gets abt > 1000 tuples (="medium") it switches to the
hash-anti-semi-join, when both tables gets more than 1000 tuples
(="big") it too adds parallelism. (on my 2cpu machine).

from the file i attached in the last mail in the big-plan-xml-file I
extraxced this plan by cutting of what i thought is too much detail:


<TopExpression>
  <Parallelism>
      <Sort Distinct="true">
          <Parallelism PartitioningType="Hash">
              <Hash>
                  <ComputeScalar>
                      <Hash>
                          <Parallelism PartitioningType="Hash">
                              <TableScan>
                          <Parallelism PartitioningType="Hash">
                              <ComputeScalar>
                                  <NestedLoops Optimized="false">
                                      <TableScan>
                                      <Spool>
                                          <TableScan>



It is the third one Gregory had get from the image (that had a cutoff
on the top because of my resolution)

If you look inside the xml (plan_bigdata.sqlplan) you can find
interesting details i think.

for me it's clear that the query is not nice. I used it to provoke the
optimizer only for studiing the possibilities of what can be optimized
how far.

from my POV it's not clear why PostgreSQL runs into the triple-table
nested-loop which will lead to a cardinality of abt 8*10^9 where it
could make the t1,t2-nested loop with cardinality 5*10^6 and then a
merge-anti-semi-join on t1 (#1300) which should be able to do in about
log(5*10^6)*5*10^6. so there is a gap of nearly factor 1000.

for me it looks like it can not rotate the calculation of the
expression (2*(a1+a2)) outside of the "not exists"?

best regards, alexander.

PS: I will now let the query run to an end if it takes less than 10 hours

2007/12/19, Tom Lane <tgl@sss.pgh.pa.us>:
> Gregory Stark <stark@enterprisedb.com> writes:
> > "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >> It's possible that MS-SQL is doing something analogous to the
> >> hashed-subplan approach (hopefully with suitable tweaking for the NULL
> >> case) but even then it's hard to see how it could take only 9 sec.
> >> The cartesian product is too big.
>
> > Fwiw it seems MS-SQL is doing something funny. The three plans posted in the
> > screenshots for the "small", "mediu", and "large" cases are:
> > ...
> > Postgres is doing something equivalent to the first plan.
>
> Hmm.  I think the second plan is probably equivalent to the
> hashed-subplan behavior that you can get in PG by rewriting the query to
> NOT IN as I illustrated.  The third plan looks to be the same thing plus
> some parallelization frammishes.
>
> I'm not clear on what "small/medium/large" means, in particular not on
> which of these corresponds to the OP's report of 9-second execution.
>
>                         regards, tom lane
>


--
Alexander Steffens
Georgstr. 3
53111 Bonn
+49 228 2661615

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
Alexander Steffens
Date:
Congratulations PostgreSQL-Team for your work!

with the current 8.4 B1 Version of PostgreSQL the bad execution time
of my example-Query containing a NOT EXISTS
changed from "aborted after hours" to just 5 seconds - double as quick
as my result with MS-SQL.

That's wonderful!

> insert into t1
> select distinct (t1.a + t2.a)*2
> from t1,t2
> where not exists (
>  select * from t1  tt where tt.a =3D (t1.a + t2.a)*2
> )

I assume there is now a hashed ANTI-SEMI-JOIN operator implemented?

Thanks

Alexander
--=20
Dipl.-Math. Alexander Steffens
Dorotheenstr. 16
23564 L=C3=BCbeck

Re: BUG #3826: Very Slow Execution of examplequery (wrong plan?)

From
Tom Lane
Date:
Alexander Steffens <mail@a-st.de> writes:
> I assume there is now a hashed ANTI-SEMI-JOIN operator implemented?

What does EXPLAIN say about it?

            regards, tom lane