Re: optimizing constant quals within outer joins - Mailing list pgsql-hackers

From Phil Frost
Subject Re: optimizing constant quals within outer joins
Date
Msg-id 20060628162404.GB23509@unununium.org
Whole thread Raw
In response to Re: optimizing constant quals within outer joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Jun 28, 2006 at 11:40:52AM -0400, Tom Lane wrote:
> Phil Frost <indigo@bitglue.com> writes:
> > The planner in fact can move the function around without changing the
> > output.
> 
> Not when it's within the nullable side of an outer join --- moving a
> WHERE clause up out of that would make the difference between no row
> out, and a null-extended row out, which are certainly not the same.
> 
> I'm not sure why it's not pulling up from the left side of the left join
> though.  That might be a bug.  What PG version is this exactly?
> 
> Of course the real question is why is your app generating such poorly
> phrased queries ;-)

Sure it can't pull the condition to the root result node, but it can
make an intermediate result node that is a child of the join and wraps
the sequential scan. "offset 0" makes it do this. I'd like this:

create table a(i int);
create table b(i int);
create function stable_function() returns bool language plpgsql stable as $$   begin return true; end $$;
create view c as select * from b where stable_function();
explain select * from a left join c using (i);                          QUERY PLAN                            
-----------------------------------------------------------------Merge Right Join  (cost=220.32..338.32 rows=7629
width=4) Merge Cond: ("outer".i = "inner".i)  ->  Sort  (cost=70.54..72.32 rows=713 width=4)        Sort Key: b.i
->  Seq Scan on b  (cost=0.00..36.75 rows=713 width=4)              Filter: stable_function()  ->  Sort
(cost=149.78..155.13rows=2140 width=4)        Sort Key: a.i        ->  Seq Scan on a  (cost=0.00..31.40 rows=2140
width=4)

to become this:
                          QUERY PLAN                            
-----------------------------------------------------------------Merge Right Join  (cost=220.32..338.32 rows=7629
width=4) Merge Cond: ("outer".i = "inner".i)  ->  Sort  (cost=70.54..72.32 rows=713 width=4)        Sort Key: b.i
->  Result              One-Time Filter: stable_function()              ->  Seq Scan on b  (cost=0.00..36.75 rows=713
width=4)                   Filter: stable_function()  ->  Sort  (cost=149.78..155.13 rows=2140 width=4)        Sort
Key:a.i        ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=4)
 

That will make the same results. Maybe there is something about the
implementation that I don't understand that makes it hard, but the
concept is simple: before you do a seq scan on b, you call
stable_function(), and if it returns true, you just do the sequential
scan without calling stable_function() for each row. If it returns
false, you can not do the sequental scan at all, and return the empty
set immediately.

I wasn't aware my queries are "badly phrased". The application generates
quite nice queries like "select * from saleorder_summary", which is a view
along the lines of 'select * from "order" left join saleorder using
(objectid)'. "order" and "saleorder" are views like "select * from
private.order where i_have_global_priv(20)". The subqueries are in the
examples I gave just to make it simpler to demonstrate.

The only other way I can think of phrasing a query like that is perhaps

select *
from private.order
left join purchaseorder on (   order.objectid = purchaseorder.objectid and i_have_global_priv(31)
)

This of course would not only be hugely inconvinent, but would require
that regular users have unrestricted access to the base tables, which
totally defeats the purpose of using veil. Also, that too is not
optimized as well as it could be:

test=# explain select * from a left join b on (a.i = b.i and stable_function());                          QUERY PLAN
                       
 
-----------------------------------------------------------------Merge Left Join  (cost=299.56..710.97 rows=7633
width=8) Merge Cond: ("outer".i = "inner".i)  Join Filter: stable_function()  ->  Sort  (cost=149.78..155.13 rows=2140
width=4)       Sort Key: a.i        ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=4)  ->  Sort
(cost=149.78..155.13rows=2140 width=4)        Sort Key: b.i        ->  Seq Scan on b  (cost=0.00..31.40 rows=2140
width=4)

stable_function() will still be called multiple times needlessly.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Instability in TRUNCATE regression test
Next
From: Marc Munro
Date:
Subject: Index corruption