Thread: View's plan not taking advantage of WHERE?

View's plan not taking advantage of WHERE?

From
Mike Summers
Date:
I have a VIEW that does not appear to take advantage of the WHERE when given the opportunity:

db=# explain select * from best_for_sale_layouts;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Aggregate  (cost=1977.21..1977.22 rows=1 width=118)
   ->  Sort  (cost=1938.18..1940.05 rows=748 width=122)
         Sort Key: sources.for_sale_layout_rank
         ->  Hash Join  (cost=1.04..1902.48 rows=748 width=122)
               Hash Cond: (for_sale_layouts.source_id = sources.id)
               ->  Append  (cost=0.00..1613.60 rows=74760 width=118)
                     ->  Seq Scan on for_sale_layouts  (cost=0.00..806.74 rows=37374 width=118)
                     ->  Seq Scan on assessor_records  (cost=0.00..806.86 rows=37386 width=118)
               ->  Hash  (cost=1.02..1.02 rows=2 width=8)
                     ->  Seq Scan on sources  (cost=0.00..1.02 rows=2 width=8)
(10 rows)

db=# explain analyze select * from best_for_sale_layouts where address_id = 2871034;;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1979.33..1979.34 rows=1 width=118) (actual time=93569.509..93569.510 rows=1 loops=1)
   Filter: (first_not_null(for_sale_layouts.address_id) = 2871034)
   ->  Sort  (cost=1938.18..1940.05 rows=748 width=122) (actual time=320.652..464.523 rows=74748 loops=1)
         Sort Key: sources.for_sale_layout_rank
         Sort Method: external sort  Disk: 5840kB
         ->  Hash Join  (cost=1.04..1902.48 rows=748 width=122) (actual time=0.057..198.500 rows=74748 loops=1)
               Hash Cond: (for_sale_layouts.source_id = sources.id)
               ->  Append  (cost=0.00..1613.60 rows=74760 width=118) (actual time=0.022..94.871 rows=74748 loops=1)
                     ->  Seq Scan on for_sale_layouts  (cost=0.00..806.74 rows=37374 width=118) (actual time=0.021..22.361 rows=37374 loops=1)
                     ->  Seq Scan on assessor_records  (cost=0.00..806.86 rows=37386 width=118) (actual time=0.011..23.383 rows=37374 loops=1)
               ->  Hash  (cost=1.02..1.02 rows=2 width=8) (actual time=0.015..0.015 rows=2 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Seq Scan on sources  (cost=0.00..1.02 rows=2 width=8) (actual time=0.007..0.010 rows=2 loops=1)
 Total runtime: 93573.390 ms
(14 rows)

If I run the View's select with the WHERE in psql I get what I expect (first_not_null is an aggregate function):

db=# explain analyze SELECT                                                          
    first_not_null(a.id) as id,
    first_not_null(a.address_id) as address_id,
    ....
    first_not_null(a.created_at) as created_at,
    first_not_null(a.updated_at) as updated_at
  FROM (SELECT b.*, for_sale_layout_rank
        FROM ((SELECT *
               FROM for_sale_layouts
               UNION ALL SELECT *
                         FROM assessor_records) AS b INNER JOIN sources ON b.source_id = sources.id)
  ORDER BY for_sale_layout_rank) AS a 
where address_id = 2871034;
                                                                                   QUERY PLAN                                                                                    
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21.23..21.24 rows=1 width=118) (actual time=0.571..0.571 rows=1 loops=1)
   ->  Sort  (cost=17.64..17.64 rows=2 width=122) (actual time=0.272..0.274 rows=2 loops=1)
         Sort Key: sources.for_sale_layout_rank
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.00..17.63 rows=2 width=122) (actual time=0.199..0.253 rows=2 loops=1)
               Join Filter: (for_sale_layouts.source_id = sources.id)
               Rows Removed by Join Filter: 2
               ->  Append  (cost=0.00..16.54 rows=2 width=118) (actual time=0.140..0.185 rows=2 loops=1)
                     ->  Index Scan using index_for_sale_layouts_on_address_id on for_sale_layouts  (cost=0.00..8.27 rows=1 width=118) (actual time=0.139..0.142 rows=1 loops=1)
                           Index Cond: (address_id = 2871034)
                     ->  Index Scan using index_assessor_layouts_on_address_id on assessor_records  (cost=0.00..8.27 rows=1 width=118) (actual time=0.038..0.039 rows=1 loops=1)
                           Index Cond: (address_id = 2871034)
               ->  Materialize  (cost=0.00..1.03 rows=2 width=8) (actual time=0.022..0.025 rows=2 loops=2)
                     ->  Seq Scan on sources  (cost=0.00..1.02 rows=2 width=8) (actual time=0.020..0.023 rows=2 loops=1)
 Total runtime: 0.802 ms
(15 rows)

Is there anything I can do to get the View to update its plan? 

Improvements are welcome, although for other reasons (Rails' ActiveRecord) the View is a must.

Thanks in advance.

Re: View's plan not taking advantage of WHERE?

From
Mike Summers
Date:
It appears that the culprit is a cached query plan, the tables in the UNION have changed and no long match however the View does not throw a "each UNION query must have the same number of columns" error.

Is there a way to force the View's query plan to be updated on each access?

Re: View's plan not taking advantage of WHERE?

From
Scott Marlowe
Date:
On Wed, Jun 5, 2013 at 5:31 AM, Mike Summers <msummers57@gmail.com> wrote:
> It appears that the culprit is a cached query plan, the tables in the UNION
> have changed and no long match however the View does not throw a "each UNION
> query must have the same number of columns" error.
>
> Is there a way to force the View's query plan to be updated on each access?

In postgresql, a view is simply a re-write driven by a rule. So when
you use it it always creates a new plan. Are you sure you don't have >
1 view def in another schema or something?


Re: View's plan not taking advantage of WHERE?

From
Mike Summers
Date:
From what I'm reading the View is frozen when it's created, including it's plan, and the usual solution is to use a set returning function... is this not true?

I've double checked all schemas and the view is only defined once.

Thanks.

Re: View's plan not taking advantage of WHERE?

From
Scott Marlowe
Date:
On Wed, Jun 5, 2013 at 6:01 AM, Mike Summers <msummers57@gmail.com> wrote:
> From what I'm reading the View is frozen when it's created, including it's
> plan, and the usual solution is to use a set returning function... is this
> not true?

No it is not.  Here:

smarlowe=# create table a (id int);
CREATE TABLE
smarlowe=# create index a_id on a(id);
CREATE INDEX
smarlowe=# insert into a values (1),(2),(3);
INSERT 0 3
smarlowe=# create view x as select * from a;
CREATE VIEW

smarlowe=# analyze a;
ANALYZE
smarlowe=# show enable_seqscan;
 enable_seqscan
----------------
 on
(1 row)
smarlowe=# explain select * from x where id=1;
                   QUERY PLAN
-------------------------------------------------
 Seq Scan on a  (cost=0.00..1.04 rows=1 width=4)
   Filter: (id = 1)
(2 rows)

smarlowe=# set enable_seqscan =off;
SET
smarlowe=# explain select * from x where id=1;
                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using a_id on a  (cost=0.00..8.27 rows=1 width=4)
   Index Cond: (id = 1)
(2 rows)

smarlowe=#

> I've double checked all schemas and the view is only defined once.

Well you're gonna have to come up with some kind of self-contained
test to show what's happening then.


Re: View's plan not taking advantage of WHERE?

From
Scott Marlowe
Date:
Note that the view DEFINITION is frozen, the query PLAN is NOT.


Re: View's plan not taking advantage of WHERE?

From
Mike Summers
Date:
Thanks Scott, interesting.

Other than the tests in the original post do you have any suggestions?

Thanks for your time.

Re: View's plan not taking advantage of WHERE?

From
Tom Lane
Date:
Mike Summers <msummers57@gmail.com> writes:
> Other than the tests in the original post do you have any suggestions?

If you're speaking of
http://www.postgresql.org/message-id/CAJGeMG89QbDxMab7-aPD_yXVsGx7Q=auXYM9UFVaq06cRZ4E2A@mail.gmail.com
that has nothing to do with cached plans, obsolete or otherwise.
You seem to be wishing that the planner would deduce "x = constant" from
"aggregate_function(x) = constant", which is wrong on its face.
AFAICS it's not even correct for the special case that the aggregate
function is first_not_null(), since you have multiple occurrences of
that in the view and there's no certainty that they all choose to return
values from the same row.

Even if the optimization is valid given some additional assumptions that
you've not told us, it's going to be a sufficiently narrow case that
I doubt we'd ever be willing to expend planner cycles on checking for it.

If you want WHERE clauses to be pushed down into this query you need to
think of some other way to define the query.  Perhaps something
involving GROUP BY or DISTINCT instead of first_not_null() would be more
amenable to optimization.

            regards, tom lane


Re: View's plan not taking advantage of WHERE?

From
Mike Summers
Date:
Thanks Tom,

I've found other discussion of this, that aggregates foul-up the planner with views.

GROUP BY & DISTINCT don't work, we're trying to grab a subset of records and backfill any nulls to present a complete, single record...  we're stuck with a view as this is used by a Rails app.

We'll see what else we can come-up with.

Thanks again.


On Wed, Jun 5, 2013 at 9:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mike Summers <msummers57@gmail.com> writes:
> Other than the tests in the original post do you have any suggestions?

If you're speaking of
http://www.postgresql.org/message-id/CAJGeMG89QbDxMab7-aPD_yXVsGx7Q=auXYM9UFVaq06cRZ4E2A@mail.gmail.com
that has nothing to do with cached plans, obsolete or otherwise.
You seem to be wishing that the planner would deduce "x = constant" from
"aggregate_function(x) = constant", which is wrong on its face.
AFAICS it's not even correct for the special case that the aggregate
function is first_not_null(), since you have multiple occurrences of
that in the view and there's no certainty that they all choose to return
values from the same row.

Even if the optimization is valid given some additional assumptions that
you've not told us, it's going to be a sufficiently narrow case that
I doubt we'd ever be willing to expend planner cycles on checking for it.

If you want WHERE clauses to be pushed down into this query you need to
think of some other way to define the query.  Perhaps something
involving GROUP BY or DISTINCT instead of first_not_null() would be more
amenable to optimization.

                        regards, tom lane