Thread: plan difference between set-returning function with ROWS within IN() and a plain join
plan difference between set-returning function with ROWS within IN() and a plain join
From
Frank van Vugt
Date:
L.S. I'm noticing a difference in planning between a join and an in() clause, before trying to create an independent test-case, I'd like to know if there's an obvious reason why this would be happening: => the relatively simple PLPGSQL si_credit_tree() function has 'ROWS 5' in it's definition df=# select version(); version ------------------------------------------------------------------------ PostgreSQL 8.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 (1 row) db=# explain analyse select sum(si.base_total_val) from sales_invoice si, si_credit_tree(80500007) foo(id) where si.id = foo.id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=42.73..42.74 rows=1 width=8) (actual time=0.458..0.459 rows=1 loops=1) -> Nested Loop (cost=0.00..42.71 rows=5 width=8) (actual time=0.361..0.429 rows=5 loops=1) -> Function Scan on si_credit_tree foo (cost=0.00..1.30 rows=5 width=4) (actual time=0.339..0.347 rows=5 loops=1) -> Index Scan using sales_invoice_pkey on sales_invoice si (cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5) Index Cond: (si.id = foo.id) Total runtime: 0.562 ms db=# explain analyse select sum(base_total_val) from sales_invoice where id in (select id from si_credit_tree(80500007)); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=15338.31..15338.32 rows=1 width=8) (actual time=3349.401..3349.402 rows=1 loops=1) -> Seq Scan on sales_invoice (cost=0.00..15311.19 rows=10846 width=8) (actual time=0.781..3279.046 rows=21703 loops=1) Filter: (subplan) SubPlan -> Function Scan on si_credit_tree (cost=0.00..1.30 rows=5 width=0) (actual time=0.146..0.146 rows=1 loops=21703) Total runtime: 3349.501 ms I'd hoped the planner would use the ROWS=5 knowledge a bit better: db=# explain analyse select sum(base_total_val) from sales_invoice where id in (80500007,80500008,80500009,80500010,80500011); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=40.21..40.22 rows=1 width=8) (actual time=0.105..0.106 rows=1 loops=1) -> Bitmap Heap Scan on sales_invoice (cost=21.29..40.19 rows=5 width=8) (actual time=0.061..0.070 rows=5 loops=1) Recheck Cond: (id = ANY ('{80500007,80500008,80500009,80500010,80500011}'::integer[])) -> Bitmap Index Scan on sales_invoice_pkey (cost=0.00..21.29 rows=5 width=0) (actual time=0.049..0.049 rows=5 loops=1) Index Cond: (id = ANY ('{80500007,80500008,80500009,80500010,80500011}'::integer[])) Total runtime: 0.201 ms -- Best, Frank.
Re: plan difference between set-returning function with ROWS within IN() and a plain join
From
PFC
Date:
On Tue, 06 May 2008 10:21:43 +0200, Frank van Vugt <ftm.van.vugt@foxi.nl> wrote: > L.S. > > I'm noticing a difference in planning between a join and an in() clause, > before trying to create an independent test-case, I'd like to know if > there's > an obvious reason why this would be happening: Is the function STABLE ?
Re: plan difference between set-returning function with ROWS within IN() and a plain join
From
Frank van Vugt
Date:
> > I'm noticing a difference in planning between a join and an in() clause, > > before trying to create an independent test-case, I'd like to know if > > there's > > an obvious reason why this would be happening: > > Is the function STABLE ? Yep. For the record, even changing it to immutable doesn't make a difference in performance here. -- Best, Frank.
Re: plan difference between set-returning function with ROWS within IN() and a plain join
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > db=# explain analyse > select sum(base_total_val) > from sales_invoice > where id in (select id from si_credit_tree(80500007)); Did you check whether this query even gives the right answer? The EXPLAIN output shows that 21703 rows of sales_invoice are being selected, which is a whole lot different than the other behavior. I think you forgot the alias foo(id) in the subselect and it's actually reducing to "where id in (id)", ie, TRUE. regards, tom lane
Re: plan difference between set-returning function with ROWS within IN() and a plain join
From
Frank van Vugt
Date:
> > db=# explain analyse > > select sum(base_total_val) > > from sales_invoice > > where id in (select id from si_credit_tree(80500007)); > > Did you check whether this query even gives the right answer? You knew the right answer to that already ;) > I think you forgot the alias foo(id) in the subselect and it's > actually reducing to "where id in (id)", ie, TRUE. Tricky, but completely obvious once pointed out, that's _exactly_ what was happening. db=# explain analyse select sum(base_total_val) from sales_invoice where id in (select id from si_credit_tree(80500007) foo(id)); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=42.79..42.80 rows=1 width=8) (actual time=0.440..0.441 rows=1 loops=1) -> Nested Loop (cost=1.31..42.77 rows=5 width=8) (actual time=0.346..0.413 rows=5 loops=1) -> HashAggregate (cost=1.31..1.36 rows=5 width=4) (actual time=0.327..0.335 rows=5 loops=1) -> Function Scan on si_credit_tree foo (cost=0.00..1.30 rows=5 width=4) (actual time=0.300..0.306 rows=5 loops=1) -> Index Scan using sales_invoice_pkey on sales_invoice (cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5) Index Cond: (sales_invoice.id = foo.id) Total runtime: 0.559 ms Thanks for the replies! -- Best, Frank.
Re: plan difference between set-returning function with ROWS within IN() and a plain join
From
"Merlin Moncure"
Date:
On Tue, May 6, 2008 at 11:27 AM, Frank van Vugt <ftm.van.vugt@foxi.nl> wrote: >> > db=# explain analyse >> > select sum(base_total_val) >> > from sales_invoice >> > where id in (select id from si_credit_tree(80500007)); >> >> Did you check whether this query even gives the right answer? > > You knew the right answer to that already ;) > >> I think you forgot the alias foo(id) in the subselect and it's >> actually reducing to "where id in (id)", ie, TRUE. > > Tricky, but completely obvious once pointed out, that's _exactly_ what was > happening. This is one of the reasons why, for a table named 'foo', I name the columns 'foo_id', not 'id'. Also, if you prefix the id column with the table name, you can usually use JOIN USING which is a little bit tighter and easier than JOIN ON. merlin