Thread: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
"Ross J. Reedstrom"
Date:
Hey Hackers - 
I was testing beta5 and found a performance regression involving
application of constraints into a VIEW - I've got a view that is fairly
expensive, involving a subselet and an aggregate.  When the query is
rewritten in 7.2.3, the toplevel constraint is used to filter before
the subselect - in 7.3b5, it comes after.

For this query, the difference is 160 ms vs. 2 sec. Any reason for this
change?

Here's the view def., and explain analyzes for the view, and two hand
rewritten versions (since the explain analyze in 7.2.3 doesn't display
the filter parameters)

Ross

CREATE VIEW current_modules AS       SELECT * FROM modules m              WHERE module_ident =
(SELECTmax(module_ident) FROM modules                            WHERE m.moduleid = moduleid GROUP BY moduleid);
 

repository=# explain analyze select * from current_modules where name ~ 'Fourier';
                          QUERY PLAN                                                                  
 

----------------------------------------------------------------------------------------------------------------------------------------------Seq
Scanon modules m  (cost=0.00..116090.23 rows=1 width=135) (actual time=18.74..1968.01 rows=37 loops=1)  Filter:
((module_ident= (subplan)) AND (name ~ 'Fourier'::text))  SubPlan    ->  Aggregate  (cost=0.00..25.57 rows=1 width=13)
(actualtime=0.41..0.41 rows=1 loops=4534)          ->  Group  (cost=0.00..25.55 rows=6 width=13) (actual
time=0.08..0.37rows=10 loops=4534)                ->  Index Scan using moduleid_idx on modules  (cost=0.00..25.54
rows=6width=13) (actual time=0.06..0.27 rows=10 loops=4534)                      Index Cond: ($0 = moduleid)Total
runtime:1968.65 msec
 
(8 rows)

repository=# explain analyze select module_ident from modules m where m.name ~ 'Fourier' and m.module_ident = (SELECT
max(modules.module_ident)as max from modules where (m.moduleid=moduleid) group by modules.moduleid);
                                           QUERY PLAN                                                                 
 

--------------------------------------------------------------------------------------------------------------------------------------------Seq
Scanon modules m  (cost=0.00..116090.23 rows=1 width=4) (actual time=2.46..158.33 rows=37 loops=1)  Filter: ((name ~
'Fourier'::text)AND (module_ident = (subplan)))  SubPlan    ->  Aggregate  (cost=0.00..25.57 rows=1 width=13) (actual
time=0.35..0.35rows=1 loops=270)          ->  Group  (cost=0.00..25.55 rows=6 width=13) (actual time=0.07..0.31 rows=9
loops=270)               ->  Index Scan using moduleid_idx on modules  (cost=0.00..25.54 rows=6 width=13) (actual
time=0.06..0.22rows=9 loops=270)                      Index Cond: ($0 = moduleid)Total runtime: 158.81 msec
 
(8 rows)

repository=# explain analyze  select module_ident from modules m where m.module_ident = (SELECT
max(modules.module_ident)as max from modules where (m.moduleid=moduleid) group by modules.moduleid) and m.name ~
'Fourier';                                                                QUERY PLAN
                             
 

----------------------------------------------------------------------------------------------------------------------------------------------Seq
Scanon modules m  (cost=0.00..116090.23 rows=1 width=4) (actual time=18.66..1959.31 rows=37 loops=1)  Filter:
((module_ident= (subplan)) AND (name ~ 'Fourier'::text))  SubPlan    ->  Aggregate  (cost=0.00..25.57 rows=1 width=13)
(actualtime=0.41..0.41 rows=1 loops=4534)          ->  Group  (cost=0.00..25.55 rows=6 width=13) (actual
time=0.08..0.37rows=10 loops=4534)                ->  Index Scan using moduleid_idx on modules  (cost=0.00..25.54
rows=6width=13) (actual time=0.06..0.27 rows=10 loops=4534)                      Index Cond: ($0 = moduleid)Total
runtime:1959.84 msec
 
(8 rows)


Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
Mike Mascari
Date:
Ross J. Reedstrom wrote:
> Hey Hackers - 
> I was testing beta5 and found a performance regression involving
> application of constraints into a VIEW - I've got a view that is fairly
> expensive, involving a subselet and an aggregate.  When the query is
> rewritten in 7.2.3, the toplevel constraint is used to filter before
> the subselect - in 7.3b5, it comes after.
> 
> For this query, the difference is 160 ms vs. 2 sec. Any reason for this
> change?

I could be way off base, but here's a shot in the dark:


http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3D0885E1.8F369ACA%40mascari.com&rnum=3&prev=/groups%3Fq%3DMike%2BMascari%2Bsecurity%2BTom%2BLane%26ie%3DUTF-8%26oe%3DUTF-8%26hl%3Den

At the time I thought PostgreSQL was doing something naughty by 
allowing user functions to be invoked on data that would 
ultimately not be returned. Now I know how Oracle uses VIEWS for 
row security: Oracle functions invoked in DML statements can't 
record any changes to the database. So if the above is the 
cause, I wouldn't have any problems with the patch being 
reversed. Maybe separate privileges for read-only vs. read-write 
functions are in order at some point in the future though...

Mike Mascari
mascarm@mascari.com




Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
"Ross J. Reedstrom"
Date:
On Wed, Nov 13, 2002 at 02:40:40AM -0500, Mike Mascari wrote:
> Ross J. Reedstrom wrote:
> >
> >For this query, the difference is 160 ms vs. 2 sec. Any reason for this
> >change?
> 
> I could be way off base, but here's a shot in the dark:
> 
>
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3D0885E1.8F369ACA%40mascari.com&rnum=3&prev=/groups%3Fq%3DMike%2BMascari%2Bsecurity%2BTom%2BLane%26ie%3DUTF-8%26oe%3DUTF-8%26hl%3Den
> 
> At the time I thought PostgreSQL was doing something naughty by 
> allowing user functions to be invoked on data that would 
> ultimately not be returned. Now I know how Oracle uses VIEWS for 
> row security: Oracle functions invoked in DML statements can't 
> record any changes to the database. So if the above is the 
> cause, I wouldn't have any problems with the patch being 
> reversed. Maybe separate privileges for read-only vs. read-write 
> functions are in order at some point in the future though...

Bingo, that solved it. I'm back to 160 ms. What does Tom feel about
removing this? Is there some way the planner could have known which
was the smarter/faster order of application?

Ross


Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
Tommi Maekitalo
Date:
Am Mittwoch, 13. November 2002 07:22 schrieb Ross J. Reedstrom:
> Hey Hackers -
...
>
> CREATE VIEW current_modules AS
>        SELECT * FROM modules m
>               WHERE module_ident =
>                     (SELECT max(module_ident) FROM modules
>                             WHERE m.moduleid = moduleid GROUP BY moduleid);
>
...

I just wonder if you really need the GROUP BY. The subselect should return
exactly one row and so max does without GROUP BY: CREATE VIEW current_modules AS        SELECT * FROM modules m
     WHERE module_ident =                     (SELECT max(module_ident) FROM modules                             WHERE
m.moduleid= moduleid); 


Tommi

--
Dr. Eckhardt + Partner GmbH
http://www.epgmbh.de


Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
Tom Lane
Date:
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> Bingo, that solved it. I'm back to 160 ms. What does Tom feel about
> removing this? Is there some way the planner could have known which
> was the smarter/faster order of application?

As I said in the previous thread, I don't have a lot of patience with
the notion of expecting the planner to promise anything about evaluation
order of WHERE clauses.  I wasn't thrilled with adding the patch, but
I'm even less thrilled with the idea of backing it out now.

There has been some discussion of reordering WHERE clauses based on
estimated cost --- a simple form of this would be to push any clauses
involving subplans to the end of the list.  I haven't done anything
about that yet, mainly because I'm unsure if there are cases where it
would be worse than not doing it.
        regards, tom lane


Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
"Ross J. Reedstrom"
Date:
On Wed, Nov 13, 2002 at 08:58:04AM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > Bingo, that solved it. I'm back to 160 ms. What does Tom feel about
> > removing this? Is there some way the planner could have known which
> > was the smarter/faster order of application?
> 
> As I said in the previous thread, I don't have a lot of patience with
> the notion of expecting the planner to promise anything about evaluation
> order of WHERE clauses.  I wasn't thrilled with adding the patch, but
> I'm even less thrilled with the idea of backing it out now.

Having read the previous thread, I realized you wouldn't be thrilled
about it, that's why I asked. While I agree in principle (don't promise
a particular order), the pragmatic corollary of that principle would say
if you don't favor a particular order, then don't change the order from
previous stable releases.

Unlike the previous thread, I'm not looking for a particular order:
there're no side-effects I'm trying to exploit, I just want the best
possible performance.

> There has been some discussion of reordering WHERE clauses based on
> estimated cost --- a simple form of this would be to push any clauses
> involving subplans to the end of the list.  I haven't done anything
> about that yet, mainly because I'm unsure if there are cases where it
> would be worse than not doing it.

Me either, though my gut says subplans are expensive. I _can_ trivially
write queries that do the wrong thing (suboptimal order of WHERE clauses)
with or without this patch. 

It's clearly the wrong time to try to do anything fancier, but the
conservative thing to do (in my unbiased opinion ;-) is put it back
the way it was for the last stable release, on the principle of least
surprise - there seems to be no bug fixed or functionality gained by
keeping the change.

Seems like this is at least worth a TODO:

* Examine WHERE clause order optimization possibilities, particularly
with subplans

Ross


Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

From
"Ross J. Reedstrom"
Date:
You're right, I should remove that (cruft left over from when the
subselect wasn't). However, it has no impact on the planner at hand:
removing it does trim 25% from the execution time, but getting the
WHERE clauses used in the right order gains an order of magnitude.

Both apply. Thanks, I'll fix it.

Ross

On Wed, Nov 13, 2002 at 09:28:38AM +0100, Tommi Maekitalo wrote:
> Am Mittwoch, 13. November 2002 07:22 schrieb Ross J. Reedstrom:
> > Hey Hackers -
> ...
> >
> > CREATE VIEW current_modules AS
> >        SELECT * FROM modules m
> >               WHERE module_ident =
> >                     (SELECT max(module_ident) FROM modules
> >                             WHERE m.moduleid = moduleid GROUP BY moduleid);
> >
> ...
> 
> I just wonder if you really need the GROUP BY. The subselect should return 
> exactly one row and so max does without GROUP BY:
>   CREATE VIEW current_modules AS
>          SELECT * FROM modules m
>                 WHERE module_ident =
>                       (SELECT max(module_ident) FROM modules
>                               WHERE m.moduleid = moduleid);