Re: Possible bug in CASE evaluation - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Possible bug in CASE evaluation |
Date | |
Msg-id | 20130625130152.GA7716@awork2.anarazel.de Whole thread Raw |
In response to | Re: Possible bug in CASE evaluation (Noah Misch <noah@leadboat.com>) |
Responses |
Re: Possible bug in CASE evaluation
Re: Possible bug in CASE evaluation |
List | pgsql-hackers |
On 2013-06-24 21:35:53 -0400, Noah Misch wrote: > On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote: > > On 2013-06-21 16:45:28 +0200, Andres Freund wrote: > > > On 2013-06-21 09:51:05 -0400, Noah Misch wrote: > > > > That being said, if we discover a simple-enough fix that performs well, we may > > > > as well incorporate it. > > > > > > What about passing another parameter down eval_const_expressions_mutator > > > (which is static, so changing the API isn't a problem) that basically > > > tells us whether we actually should evaluate expressions or just perform > > > the transformation? > > > There's seems to be basically a couple of places where we call dangerous > > > things: > > > * simplify_function (via ->evaluate_function->evaluate_expr) which is > > > called in a bunch of places > > > * evaluate_expr which is directly called in T_ArrayExpr > > > T_ArrayCoerceExpr > > > > > > All places I've inspected so far need to deal with simplify_function > > > returning NULL anyway, so that seems like a viable fix. > > > > *Prototype* patch - that seems simple enough - attached. Opinions? > > Simple enough, yes. The other point still stands. You mean performance? Primarily I still think we should first worry about correctness first and then about performance. And CASE is the documented (and really only, without writing procedual code) solution to use for the cases where evaluation order actually *is* important. But anyway, the question is to find realistic cases to measure the performance of. Obviously you can just make arbitrarily expensive expressions that can be computed full during constant folding. Which I don't find very interesting, do you? So, what I've done is to measure the performance difference when doing full table queries of some CASE containing system views. best of 5 everytime: SELECT * FROM pg_stats; master: 28.287 patched: 28.565 SELECT * FROM information_schema.applicable_roles; master: 0.757 patched: 0.755 regression=# SELECT * FROM information_schema.attributes: master: 8.392 patched: 8.555 SELECT * FROM information_schema.column_privileges; master: 90.853 patched: 88.551 SELECT * FROM information_schema.columns; master: 259.436 patched: 274.145 SELECT * FROM information_schema.constraint_column_usage ; master: 14.736 patched 15.005 SELECT * FROM information_schema.parameters; master: 76.173 patched: 79.850 SELECT * FROM information_schema.routines; master: 45.102 patched: 46.517 ms ... So, on those queries there's some difference (I've left out the ones which are too short), but it's not big. Now, for the other extreme, the following completely random query I just typed out: SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000)g(i)) s WHERE f = 0; master: 491.931 patched: 943.629 suffers way much worse because the division is so expensive... Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: