Thread: plan difference between set-returning function with ROWS within IN() and a plain join

L.S.

I'm noticing a difference in planning between a join and an in() clause,
before trying to create an independent test-case, I'd like to know if there's
an obvious reason why this would be happening:


=> the relatively simple PLPGSQL si_credit_tree() function has 'ROWS 5' in
it's definition


df=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 8.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
(1 row)



db=# explain analyse
    select sum(si.base_total_val)
    from sales_invoice si, si_credit_tree(80500007) foo(id)
    where si.id = foo.id;
                                                                    QUERY PLAN
                   
-----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=42.73..42.74 rows=1 width=8) (actual time=0.458..0.459
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..42.71 rows=5 width=8) (actual
time=0.361..0.429 rows=5 loops=1)
         ->  Function Scan on si_credit_tree foo  (cost=0.00..1.30 rows=5
width=4) (actual time=0.339..0.347 rows=5 loops=1)
         ->  Index Scan using sales_invoice_pkey on sales_invoice si
(cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5)
               Index Cond: (si.id = foo.id)

Total runtime: 0.562 ms




db=# explain analyse
    select sum(base_total_val)
    from sales_invoice
    where id in (select id from si_credit_tree(80500007));
                                                                   QUERY PLAN
                   
-----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15338.31..15338.32 rows=1 width=8) (actual
time=3349.401..3349.402 rows=1 loops=1)
   ->  Seq Scan on sales_invoice  (cost=0.00..15311.19 rows=10846 width=8)
(actual time=0.781..3279.046 rows=21703 loops=1)
         Filter: (subplan)
         SubPlan
           ->  Function Scan on si_credit_tree  (cost=0.00..1.30 rows=5
width=0) (actual time=0.146..0.146 rows=1 loops=21703)

Total runtime: 3349.501 ms





I'd hoped the planner would use the ROWS=5 knowledge a bit better:


db=# explain analyse
    select sum(base_total_val)
    from sales_invoice
    where id in (80500007,80500008,80500009,80500010,80500011);
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=40.21..40.22 rows=1 width=8) (actual time=0.105..0.106
rows=1 loops=1)
   ->  Bitmap Heap Scan on sales_invoice  (cost=21.29..40.19 rows=5 width=8)
(actual time=0.061..0.070 rows=5 loops=1)
         Recheck Cond: (id = ANY
('{80500007,80500008,80500009,80500010,80500011}'::integer[]))
         ->  Bitmap Index Scan on sales_invoice_pkey  (cost=0.00..21.29 rows=5
width=0) (actual time=0.049..0.049 rows=5 loops=1)
               Index Cond: (id = ANY
('{80500007,80500008,80500009,80500010,80500011}'::integer[]))

Total runtime: 0.201 ms






--
Best,




Frank.

On Tue, 06 May 2008 10:21:43 +0200, Frank van Vugt <ftm.van.vugt@foxi.nl>
wrote:

> L.S.
>
> I'm noticing a difference in planning between a join and an in() clause,
> before trying to create an independent test-case, I'd like to know if
> there's
> an obvious reason why this would be happening:

Is the function STABLE ?

> > I'm noticing a difference in planning between a join and an in() clause,
> > before trying to create an independent test-case, I'd like to know if
> > there's
> > an obvious reason why this would be happening:
>
> Is the function STABLE ?

Yep.

For the record, even changing it to immutable doesn't make a difference in
performance here.



--
Best,




Frank.

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> db=# explain analyse
>     select sum(base_total_val)
>     from sales_invoice
>     where id in (select id from si_credit_tree(80500007));

Did you check whether this query even gives the right answer?  The
EXPLAIN output shows that 21703 rows of sales_invoice are being
selected, which is a whole lot different than the other behavior.

I think you forgot the alias foo(id) in the subselect and it's
actually reducing to "where id in (id)", ie, TRUE.

            regards, tom lane

> > db=# explain analyse
> >     select sum(base_total_val)
> >     from sales_invoice
> >     where id in (select id from si_credit_tree(80500007));
>
> Did you check whether this query even gives the right answer?

You knew the right answer to that already ;)

> I think you forgot the alias foo(id) in the subselect and it's
> actually reducing to "where id in (id)", ie, TRUE.

Tricky, but completely obvious once pointed out, that's _exactly_ what was
happening.


db=# explain analyse
    select sum(base_total_val)
    from sales_invoice
    where id in (select id from si_credit_tree(80500007) foo(id));
                                                                     QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=42.79..42.80 rows=1 width=8) (actual time=0.440..0.441
rows=1 loops=1)
   ->  Nested Loop  (cost=1.31..42.77 rows=5 width=8) (actual
time=0.346..0.413 rows=5 loops=1)
         ->  HashAggregate  (cost=1.31..1.36 rows=5 width=4) (actual
time=0.327..0.335 rows=5 loops=1)
               ->  Function Scan on si_credit_tree foo  (cost=0.00..1.30
rows=5 width=4) (actual time=0.300..0.306 rows=5 loops=1)
         ->  Index Scan using sales_invoice_pkey on sales_invoice
(cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5)
               Index Cond: (sales_invoice.id = foo.id)

Total runtime: 0.559 ms




Thanks for the replies!


--
Best,




Frank.

On Tue, May 6, 2008 at 11:27 AM, Frank van Vugt <ftm.van.vugt@foxi.nl> wrote:
>> > db=# explain analyse
>> >     select sum(base_total_val)
>> >     from sales_invoice
>> >     where id in (select id from si_credit_tree(80500007));
>>
>> Did you check whether this query even gives the right answer?
>
> You knew the right answer to that already ;)
>
>> I think you forgot the alias foo(id) in the subselect and it's
>> actually reducing to "where id in (id)", ie, TRUE.
>
> Tricky, but completely obvious once pointed out, that's _exactly_ what was
> happening.

This is one of the reasons why, for a table named 'foo', I name the
columns 'foo_id', not 'id'.  Also, if you prefix the id column with
the table name, you can usually use JOIN USING which is a little bit
tighter and easier than JOIN ON.

merlin