Home > mailing lists

Re: optimizer picks smaller table to drive nested loops? - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: optimizer picks smaller table to drive nested loops?
Date	July 14, 2003 18:04:16
Msg-id	10251.1058205875@sss.pgh.pa.us Whole thread Raw
In response to	optimizer picks smaller table to drive nested loops? (Greg Stark <gsstark@mit.edu>)
Responses	Re: optimizer picks smaller table to drive nested loops?
List	pgsql-performance

Tree view

Greg Stark <gsstark@mit.edu> writes:
> slo=> explain analyze select * from region, (select 1 union all select 2) as x;
>                                                        QUERY PLAN
  
>
------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..11162.00 rows=5534 width=108) (actual time=0.13..541.19 rows=5534 loops=1)
>    ->  Subquery Scan x  (cost=0.00..2.00 rows=2 width=0) (actual time=0.03..0.08 rows=2 loops=1)
>          ->  Append  (cost=0.00..2.00 rows=2 width=0) (actual time=0.02..0.05 rows=2 loops=1)
>                ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.02 rows=1
loops=1)
>                      ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
>                ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.02 rows=1
loops=1)
>                      ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
>    ->  Seq Scan on region  (cost=0.00..2813.00 rows=2767 width=104) (actual time=0.03..123.44 rows=2767 loops=2)
>  Total runtime: 566.24 msec
> (9 rows)

> Wouldn't it be faster to drive the nested loop the other way around?

You seem to be using a rather wacko value of cpu_tuple_cost; those
Result nodes ought to be costed at 0.01 not 1.00.  With the default
cost settings I get an other-way-around plan for a similar test.
(I used tenk1 from the regression database as the outer table.)

However, it looks to me like the subquery-scan-outside plan probably
is the faster one, on both my machine and yours.  I get

regression=# explain analyze select * from tenk1, (select 1 union all select 2) as x;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..858.00 rows=20000 width=248) (actual time=0.42..3648.61 rows=20000 loops=1)
   ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.23..199.97 rows=10000 loops=1)
   ->  Subquery Scan x  (cost=0.00..0.02 rows=2 width=0) (actual time=0.07..0.24 rows=2 loops=10000)
         ->  Append  (cost=0.00..0.02 rows=2 width=0) (actual time=0.05..0.17 rows=2 loops=10000)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.01 rows=1 width=0) (actual time=0.03..0.06 rows=1
loops=10000)
                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=10000)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.01 rows=1 width=0) (actual time=0.03..0.06 rows=1
loops=10000)
                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=10000)
 Total runtime: 3807.39 msec
(9 rows)

regression=# set cpu_tuple_cost = 1;
SET
regression=# explain analyze select * from tenk1, (select 1 union all select 2) as x;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..40718.00 rows=20000 width=248) (actual time=0.39..1214.42 rows=20000 loops=1)
   ->  Subquery Scan x  (cost=0.00..2.00 rows=2 width=0) (actual time=0.10..0.31 rows=2 loops=1)
         ->  Append  (cost=0.00..2.00 rows=2 width=0) (actual time=0.06..0.22 rows=2 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.05..0.08 rows=1 loops=1)
                     ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.03..0.04 rows=1 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.05..0.08 rows=1 loops=1)
                     ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.02..0.03 rows=1 loops=1)
   ->  Seq Scan on tenk1  (cost=0.00..10358.00 rows=10000 width=244) (actual time=0.17..188.37 rows=10000 loops=2)
 Total runtime: 1371.17 msec
(9 rows)

The flipover point between the two plans is cpu_tuple_cost = 0.04 in
my tests.

It looks to me like we've neglected to charge any cost associated with
Subquery Scan or Append nodes.  Certainly Subquery Scan ought to charge
at least a cpu_tuple_cost per row.  Perhaps Append ought to as well ---
although since it doesn't do selection or projection, I'm not quite sure
where the time is going in that case.  (Hmmm... time to get out the
profiler...)

            regards, tom lane

pgsql-performance by date:

From: "Nick Fankhauser"
Date: 14 July 2003, 18:00:46
Subject: Re: Tunning FreeeBSD and PostgreSQL

From: Greg Stark
Date: 14 July 2003, 18:40:56
Subject: Re: optimizer picks smaller table to drive nested loops?

Re: optimizer picks smaller table to drive nested loops? - Mailing list pgsql-performance

Previous

Next