Thread: performance issue: logical operators are slow inside SQL function: missing optimalization?
performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Pavel Stehule
Date:
Hello I testing a simple use case and I was surprised with very slow execution of SQL functions create or replace function empty_string1(text) returns bool as $$ select $1 is NULL or $1 = '' $$ language sql; postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);count --------100000 (1 row) Time: 448.616 ms little bit updated function is much faster create or replace function empty_string2(text) returns bool as $$ select coalesce($1,'') = '' $$ language sql; postgres=# select count(empty_string2(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);count --------100000 (1 row) Time: 64.437 ms just null test function is fast too (or just empty str function) postgres=# create or replace function empty_string1(text) returns bool as $$select $1 is NULL $$ language sql; CREATE FUNCTION Time: 21.929 ms postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);count --------100000 (1 row) Time: 48.554 ms Is strange - so slow function can be replaced by plpgsql function and it's faster postgres=# create or replace function empty_string1(text) returns bool as $$begin return $1 is null or $1 = ''; end$$ language plpgsql immutable; CREATE FUNCTION Time: 70.359 ms postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);count --------100000 (1 row) Time: 220.131 ms Tested on 9.1 without assertions Regards Pavel Stehule
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Marko Tiikkaja
Date:
On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Is strange - so slow function can be replaced by plpgsql function and > it's faster All your SQL language functions were VOLATILE. Regards, Marko Tiikkaja
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Pavel Stehule
Date:
Hello 2010/8/29 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>: > On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> Is strange - so slow function can be replaced by plpgsql function and >> it's faster > > All your SQL language functions were VOLATILE. > It's not a problem - planner see inside SQL function - so you don't need set a flags. Regards Pavel Stehule > > Regards, > Marko Tiikkaja >
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes: > I testing a simple use case and I was surprised with very slow > execution of SQL functions The one case is inline-able and the other not (because it would result in double evaluation of the volatile function random()). See EXPLAIN VERBOSE. regards, tom lane
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Pavel Stehule
Date:
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> I testing a simple use case and I was surprised with very slow >> execution of SQL functions > > The one case is inline-able and the other not (because it would result > in double evaluation of the volatile function random()). > See EXPLAIN VERBOSE. I understand now. So it means general advice - don't use a boolean operators in SQL function? This issue should be documented somewhere? Regards Pavel Stehule > > regards, tom lane >
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>: >> The one case is inline-able and the other not (because it would result >> in double evaluation of the volatile function random()). >> See EXPLAIN VERBOSE. > I understand now. So it means general advice - don't use a boolean > operators in SQL function? This issue should be documented somewhere? It has nothing to do with boolean operators, just double evaluation. regards, tom lane
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Pavel Stehule
Date:
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>: >>> The one case is inline-able and the other not (because it would result >>> in double evaluation of the volatile function random()). >>> See EXPLAIN VERBOSE. > >> I understand now. So it means general advice - don't use a boolean >> operators in SQL function? This issue should be documented somewhere? > > It has nothing to do with boolean operators, just double evaluation. > sure. I was blind. I have a question. It is possible do following optimalisation? I can write a function CREATE OR REPLACE FUNCTION estring(text) RETURNS bool AS $$ SELECT x IS NULL || x = '' FROM (VALUES($1)) g(x) $$ LANGUAGE sql; Now this function isn't inlined, because optimaliser doesn't know a VALUES clause. But with this knowleade, this can be a protection before double evaluation. Or different way - generate_subplan with parameters - it is still faster, than plpgsql or not inlined sql. p.s. this query is badly planed postgres=# select sum((select x is null or x = '' from (values(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int) FROM generate_series(1,100000); sum --------100000 (1 row) for corect behave a had to append a second variable postgres=# select sum((select x is null or x = '' and i = i from (values(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int) FROM generate_series(1,100000) x(i); sum -------50036 (1 row) Regards Pavel Stehule > regards, tom lane >
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Martijn van Oosterhout
Date:
On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: > > I understand now. So it means general advice - don't use a boolean > > operators in SQL function? This issue should be documented somewhere? > > It has nothing to do with boolean operators, just double evaluation. I was wondering, wouldn't it be possible to avoid the double evaluation by simply creating an extra slot for the intermediate value. So you get: $1 = CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END $2 = $1 IS NULL or $1 = '' Sort of the way WITH works, but then for parts of expressions. I don't believe currently expressions can refer to Vars at the same level (it would make projections somewhat messy) but if you could fix that you could avoid the double evaluation and still have decent performance, right? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote: >> It has nothing to do with boolean operators, just double evaluation. > I was wondering, wouldn't it be possible to avoid the double evaluation > by simply creating an extra slot for the intermediate value. Possibly, but the trick would be to figure out when to evaluate the values so that it would still behave the same as without inlining. I don't think the existing Param mechanism could do this without some additional help. regards, tom lane
Re: performance issue: logical operators are slow inside SQL function: missing optimalization?
From
Pavel Stehule
Date:
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>: > Martijn van Oosterhout <kleptog@svana.org> writes: >> On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote: >>> It has nothing to do with boolean operators, just double evaluation. > >> I was wondering, wouldn't it be possible to avoid the double evaluation >> by simply creating an extra slot for the intermediate value. > > Possibly, but the trick would be to figure out when to evaluate the > values so that it would still behave the same as without inlining. > I don't think the existing Param mechanism could do this without > some additional help. > maybe subject for ToDo? Regards Pavel Stehule > regards, tom lane >