Re: Possible bug in CASE evaluation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Possible bug in CASE evaluation
Date
Msg-id 20130625130152.GA7716@awork2.anarazel.de
Whole thread Raw
In response to Re: Possible bug in CASE evaluation  (Noah Misch <noah@leadboat.com>)
Responses Re: Possible bug in CASE evaluation
Re: Possible bug in CASE evaluation
List pgsql-hackers
On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
> On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
> > On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
> > > On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> > > > That being said, if we discover a simple-enough fix that performs well, we may
> > > > as well incorporate it.
> > >
> > > What about passing another parameter down eval_const_expressions_mutator
> > > (which is static, so changing the API isn't a problem) that basically
> > > tells us whether we actually should evaluate expressions or just perform
> > > the transformation?
> > > There's seems to be basically a couple of places where we call dangerous
> > > things:
> > > * simplify_function (via ->evaluate_function->evaluate_expr) which is
> > >   called in a bunch of places
> > > * evaluate_expr which is directly called in T_ArrayExpr
> > >   T_ArrayCoerceExpr
> > >
> > > All places I've inspected so far need to deal with simplify_function
> > > returning NULL anyway, so that seems like a viable fix.
> >
> > *Prototype* patch - that seems simple enough - attached. Opinions?
>
> Simple enough, yes.  The other point still stands.

You mean performance? Primarily I still think we should first worry
about correctness first and then about performance. And CASE is the
documented (and really only, without writing procedual code) solution to
use for the cases where evaluation order actually *is* important.

But anyway, the question is to find realistic cases to measure the
performance of. Obviously you can just make arbitrarily expensive
expressions that can be computed full during constant folding. Which I
don't find very interesting, do you?

So, what I've done is to measure the performance difference when doing
full table queries of some CASE containing system views.

best of 5 everytime:
SELECT * FROM pg_stats;
master: 28.287 patched: 28.565

SELECT * FROM information_schema.applicable_roles;
master: 0.757 patched: 0.755

regression=# SELECT * FROM information_schema.attributes:
master: 8.392 patched: 8.555

SELECT * FROM information_schema.column_privileges;
master: 90.853 patched: 88.551

SELECT * FROM information_schema.columns;
master: 259.436 patched: 274.145

SELECT * FROM information_schema.constraint_column_usage ;
master: 14.736 patched 15.005

SELECT * FROM information_schema.parameters;
master: 76.173 patched: 79.850

SELECT * FROM information_schema.routines;
master: 45.102 patched: 46.517 ms

...

So, on those queries there's some difference (I've left out the ones
which are too short), but it's not big.

Now, for the other extreme, the following completely random query I just
typed out:
SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1,
1000000)g(i)) s WHERE f = 0;
 
master: 491.931 patched: 943.629

suffers way much worse because the division is so expensive...

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: "MauMau"
Date:
Subject: Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Next
From: Pavel Stehule
Date:
Subject: Re: Possible bug in CASE evaluation