Re: Possible bug in CASE evaluation - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Possible bug in CASE evaluation
Date
Msg-id CAFj8pRDmqx8HgsYPT9h-5gQ_T0ZhCZTbzeA1+TJocKCijaj4xw@mail.gmail.com
Whole thread Raw
In response to Re: Possible bug in CASE evaluation  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
2013/6/25 Andres Freund <andres@2ndquadrant.com>:
> On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
>> On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
>> > On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
>> > > On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
>> > > > That being said, if we discover a simple-enough fix that performs well, we may
>> > > > as well incorporate it.
>> > >
>> > > What about passing another parameter down eval_const_expressions_mutator
>> > > (which is static, so changing the API isn't a problem) that basically
>> > > tells us whether we actually should evaluate expressions or just perform
>> > > the transformation?
>> > > There's seems to be basically a couple of places where we call dangerous
>> > > things:
>> > > * simplify_function (via ->evaluate_function->evaluate_expr) which is
>> > >   called in a bunch of places
>> > > * evaluate_expr which is directly called in T_ArrayExpr
>> > >   T_ArrayCoerceExpr
>> > >
>> > > All places I've inspected so far need to deal with simplify_function
>> > > returning NULL anyway, so that seems like a viable fix.
>> >
>> > *Prototype* patch - that seems simple enough - attached. Opinions?
>>
>> Simple enough, yes.  The other point still stands.
>
> You mean performance? Primarily I still think we should first worry
> about correctness first and then about performance. And CASE is the
> documented (and really only, without writing procedual code) solution to
> use for the cases where evaluation order actually *is* important.
>
> But anyway, the question is to find realistic cases to measure the
> performance of. Obviously you can just make arbitrarily expensive
> expressions that can be computed full during constant folding. Which I
> don't find very interesting, do you?
>
> So, what I've done is to measure the performance difference when doing
> full table queries of some CASE containing system views.
>
> best of 5 everytime:
> SELECT * FROM pg_stats;
> master: 28.287 patched: 28.565
>
> SELECT * FROM information_schema.applicable_roles;
> master: 0.757 patched: 0.755
>
> regression=# SELECT * FROM information_schema.attributes:
> master: 8.392 patched: 8.555
>
> SELECT * FROM information_schema.column_privileges;
> master: 90.853 patched: 88.551
>
> SELECT * FROM information_schema.columns;
> master: 259.436 patched: 274.145
>
> SELECT * FROM information_schema.constraint_column_usage ;
> master: 14.736 patched 15.005
>
> SELECT * FROM information_schema.parameters;
> master: 76.173 patched: 79.850
>
> SELECT * FROM information_schema.routines;
> master: 45.102 patched: 46.517 ms
>
> ...
>
> So, on those queries there's some difference (I've left out the ones
> which are too short), but it's not big.
>
> Now, for the other extreme, the following completely random query I just
> typed out:
> SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1,
1000000)g(i)) s WHERE f = 0;
 
> master: 491.931 patched: 943.629
>
> suffers way much worse because the division is so expensive...

:-(

it is too high price

Pavel


>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Possible bug in CASE evaluation
Next
From: Bruce Momjian
Date:
Subject: Re: Hash partitioning.