Thread: performance issue: logical operators are slow inside SQL function: missing optimalization?

performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Pavel Stehule

Date:

29 August 2010, 10:21:40

Hello

I testing a simple use case and I was surprised with very slow
execution of SQL functions

create or replace function empty_string1(text)
returns bool as $$
select $1 is NULL or $1 = ''
$$ language sql;

postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);count
--------100000
(1 row)

Time: 448.616 ms

little bit updated function is much faster

create or replace function empty_string2(text)
returns bool as $$
select coalesce($1,'') = ''
$$ language sql;

postgres=# select count(empty_string2(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);count
--------100000
(1 row)

Time: 64.437 ms

just null test function is fast too (or just empty str function)

postgres=# create or replace function empty_string1(text) returns bool
as $$select $1 is NULL $$ language sql;
CREATE FUNCTION
Time: 21.929 ms
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);count
--------100000
(1 row)

Time: 48.554 ms

Is strange - so slow function can be replaced by plpgsql function and
it's faster

postgres=# create or replace function empty_string1(text) returns bool
as $$begin return $1 is null or $1 = ''; end$$ language plpgsql
immutable;
CREATE FUNCTION
Time: 70.359 ms
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);count
--------100000
(1 row)

Time: 220.131 ms

Tested on 9.1 without assertions

Regards

Pavel Stehule

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Marko Tiikkaja

Date:

29 August 2010, 10:55:32

On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Is strange - so slow function can be replaced by plpgsql function and
> it's faster

All your SQL language functions were VOLATILE.


Regards,
Marko Tiikkaja

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Pavel Stehule

Date:

29 August 2010, 10:57:16

Hello

2010/8/29 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>:
> On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> Is strange - so slow function can be replaced by plpgsql function and
>> it's faster
>
> All your SQL language functions were VOLATILE.
>

It's not a problem - planner see inside SQL function - so you don't
need set a flags.

Regards

Pavel Stehule

>
> Regards,
> Marko Tiikkaja
>

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Tom Lane

Date:

29 August 2010, 14:31:14

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I testing a simple use case and I was surprised with very slow
> execution of SQL functions

The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.
        regards, tom lane

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Pavel Stehule

Date:

29 August 2010, 15:22:13

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I testing a simple use case and I was surprised with very slow
>> execution of SQL functions
>
> The one case is inline-able and the other not (because it would result
> in double evaluation of the volatile function random()).
> See EXPLAIN VERBOSE.

I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?

Regards

Pavel Stehule

>
>                        regards, tom lane
>

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Tom Lane

Date:

29 August 2010, 15:23:38

Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
>> The one case is inline-able and the other not (because it would result
>> in double evaluation of the volatile function random()).
>> See EXPLAIN VERBOSE.

> I understand now. So it means general advice - don't use a boolean
> operators in SQL function? This issue should be documented somewhere?

It has nothing to do with boolean operators, just double evaluation.
        regards, tom lane

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Pavel Stehule

Date:

29 August 2010, 15:43:35

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
>>> The one case is inline-able and the other not (because it would result
>>> in double evaluation of the volatile function random()).
>>> See EXPLAIN VERBOSE.
>
>> I understand now. So it means general advice - don't use a boolean
>> operators in SQL function? This issue should be documented somewhere?
>
> It has nothing to do with boolean operators, just double evaluation.
>

sure. I was blind. I have a question. It is possible do following
optimalisation?

I can write a function

CREATE OR REPLACE FUNCTION estring(text)
RETURNS bool AS $$
SELECT x IS NULL || x = ''  FROM (VALUES($1)) g(x)
$$ LANGUAGE sql;

Now this function isn't inlined, because optimaliser doesn't know a
VALUES clause. But with this knowleade, this can be a protection
before double evaluation. Or different way - generate_subplan with
parameters - it is still faster, than plpgsql or not inlined sql.

p.s. this query is badly planed

postgres=# select sum((select x is null or x = '' from (values(CASE
WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int) FROM
generate_series(1,100000); sum
--------100000
(1 row)

for corect behave a had to append a second variable
postgres=# select sum((select x is null or x = '' and i = i from
(values(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int)
FROM generate_series(1,100000) x(i); sum
-------50036
(1 row)

Regards

Pavel Stehule


>                        regards, tom lane
>

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Martijn van Oosterhout

Date:

29 August 2010, 15:46:43

On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > I understand now. So it means general advice - don't use a boolean
> > operators in SQL function? This issue should be documented somewhere?
>
> It has nothing to do with boolean operators, just double evaluation.

I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value. So you
get:

$1 = CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END
$2 = $1 IS NULL or $1 = ''

Sort of the way WITH works, but then for parts of expressions.

I don't believe currently expressions can refer to Vars at the same
level (it would make projections somewhat messy) but if you could
fix that you could avoid the double evaluation and still have decent
performance, right?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Tom Lane

Date:

29 August 2010, 18:09:05

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:
>> It has nothing to do with boolean operators, just double evaluation.

> I was wondering, wouldn't it be possible to avoid the double evaluation
> by simply creating an extra slot for the intermediate value.

Possibly, but the trick would be to figure out when to evaluate the
values so that it would still behave the same as without inlining.
I don't think the existing Param mechanism could do this without
some additional help.
        regards, tom lane

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

From

Pavel Stehule

Date:

29 August 2010, 18:16:36

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:
>>> It has nothing to do with boolean operators, just double evaluation.
>
>> I was wondering, wouldn't it be possible to avoid the double evaluation
>> by simply creating an extra slot for the intermediate value.
>
> Possibly, but the trick would be to figure out when to evaluate the
> values so that it would still behave the same as without inlining.
> I don't think the existing Param mechanism could do this without
> some additional help.
>

maybe subject for ToDo?

Regards

Pavel Stehule

>                        regards, tom lane
>