Thread: BUG #13798: Unexpected multiple exection of user defined function with out parameters

BUG #13798: Unexpected multiple exection of user defined function with out parameters

From
mike.lang1010@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      13798
Logged by:          Michael Lang
Email address:      mike.lang1010@gmail.com
PostgreSQL version: 9.4.5
Operating system:   Ubuntu 12.04
Description:

I've found that when a user defined function has
out parameters, it is invoked once per out parameter if invoked with the
syntax:

`SELECT (udf()).*`

Is this the expected behavior?  It seems like it shouldn't.

This syntax is undesireable because it is the only way I've found so far to
get the postgresql backend to return all of the out parameters together
as a row, together with the parameters type information, instead of
returning the out parameters together as the text representation of
the composite type that they form together.

To demonstrate, take the function as follows:
```
CREATE FUNCTION demo(
  OUT param1 text,
  OUT param2 text,
  OUT param3 text
) AS $$
BEGIN
  param1 := 'foo';
  param2 := 'bar';
  param3 := 'baz';
END;
$$ LANGUAGE plpgsql
```

The query `SELECT demo();` produces the result
```
testdb=# SELECT demo();
     demo
---------------
 (foo,bar,baz)
(1 row)
```
Whereas the query `SELECT (demo()).*` produce the result
```
testdb=# SELECT (demo()).*;
 param1 | param2 | param3
--------+--------+--------
 foo    | bar    | baz
(1 row)
```

I've yet to find another means to get postgresql to produce the result
in such a form.

Unfortunately, I've found that the `SELECT (udf()).*` form executes the
udf once per out parameter.  This is undesirable for both performance
reasons and unacceptable for functions that cause side effects.  To
demonstrate that this is happening I've provided the following example:

```
CREATE TABLE test (
  i integer
);

INSERT into test (i) VALUES (0);

CREATE FUNCTION reproduceBehavior(
  OUT message1 text,
  OUT message2 text,
  OUT message3 text,
  OUT message4 text
)
AS $$
DECLARE t integer;
BEGIN
  SELECT i INTO t FROM test limit 1;
  IF t = 0 THEN
    update test set i=1;
    message1 := 'The value of i is now 1';
  END IF;
  IF t = 1 THEN
    update test set i=2;
    message2 := 'The value of i is now 2';
  END IF;
  IF t = 2 THEN
    update test set i=3;
    message3 := 'The value of i is now 3';
  END IF;
  IF t = 3 THEN
    update test set i=4;
    message4 := 'The value of i is now 4';
  END IF;
  RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT (reproduceBehavior()).*;
```

Which produces the result:

```
        message1         |        message2         |        message3
|        message4
-------------------------+-------------------------+-------------------------+-------------------------
 The value of i is now 1 | The value of i is now 2 | The value of i is now 3
| The value of i is now 4
(1 row)
```

I've reproduced this behavior on:
PostgreSQL 9.3.10 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

and

 PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
On Sat, Dec 5, 2015 at 12:30 AM,  <mike.lang1010@gmail.com> wrote:

> I've found that when a user defined function has
> out parameters, it is invoked once per out parameter if invoked with the
> syntax:
>
> `SELECT (udf()).*`
>
> Is this the expected behavior?  It seems like it shouldn't.

You can see why this happens if you use EXPLAIN this way:

test=# explain (analyze, buffers, verbose) SELECT (reproduceBehavior()).*;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..1.01 rows=1 width=0) (actual time=1.931..1.932
rows=1 loops=1)
   Output: (reproducebehavior()).message1,
(reproducebehavior()).message2, (reproducebehavior()).message3,
(reproducebehavior()).message4
   Buffers: shared hit=17
 Planning time: 0.038 ms
 Execution time: 1.968 ms
(5 rows)

That shows that the * causes expansion to the following query:

SELECT
    (reproducebehavior()).message1,
    (reproducebehavior()).message2,
    (reproducebehavior()).message3,
    (reproducebehavior()).message4;

From that you can see why it is not surprising that the function is
executed once per OUT parameter, especially if it is VOLATILE.

You seem to be expecting it to behave like this:

SELECT * FROM (SELECT * FROM reproduceBehavior()) x;

Which is interpreted as:

SELECT x.message1, x.message2, x.message3, x.message4
  FROM (SELECT * FROM reproduceBehavior()) x;

To avoid surprises, avoid using *.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

From
"David G. Johnston"
Date:
On Fri, Dec 4, 2015 at 11:30 PM, <mike.lang1010@gmail.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      13798
> Logged by:          Michael Lang
> Email address:      mike.lang1010@gmail.com
> PostgreSQL version: 9.4.5
> Operating system:   Ubuntu 12.04
> Description:
>
> I've found that when a user defined function has
> out parameters, it is invoked once per out parameter if invoked with the
> syntax:
>
> `SELECT (udf()).*`
>
> Is this the expected behavior?


=E2=80=8BYes, it is.

Using the recently introduced "LATERAL" feature is the preferred method.

=E2=80=8B(i think...)=E2=80=8B
=E2=80=8BSELECT *=E2=80=8B
=E2=80=8BFROM src
LATERAL func_call(src.col);=E2=80=8B

If that is not possible you can use a CTE.

WITH func_cte AS (
SELECT function_call(...)  --do not expand this yet; pass it out as a
composite
)
SELECT (func_cte.function_call).*  --now we expand the composite; not the
unusual parens - they are required.
FROM func_cte;

David J.

Re: BUG #13798: Unexpected multiple exection of user defined function with out parameters

From
"David G. Johnston"
Date:
On Tue, Dec 8, 2015 at 9:08 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

> On Sat, Dec 5, 2015 at 12:30 AM,  <mike.lang1010@gmail.com> wrote:
>
> > I've found that when a user defined function has
> > out parameters, it is invoked once per out parameter if invoked with th=
e
> > syntax:
> >
> > `SELECT (udf()).*`
> >
> > Is this the expected behavior?  It seems like it shouldn't.
>
> You can see why this happens if you use EXPLAIN this way:
>

=E2=80=8BYou can see "what" happens.  The why is still a mystery...


> test=3D# explain (analyze, buffers, verbose) SELECT (reproduceBehavior())=
.*;
>                                                                 QUERY PLA=
N
>
> -------------------------------------------------------------------------=
-----------------------------------------------------------------
>  Result  (cost=3D0.00..1.01 rows=3D1 width=3D0) (actual time=3D1.931..1.9=
32
> rows=3D1 loops=3D1)
>    Output: (reproducebehavior()).message1,
> (reproducebehavior()).message2, (reproducebehavior()).message3,
> (reproducebehavior()).message4
>    Buffers: shared hit=3D17
>  Planning time: 0.038 ms
>  Execution time: 1.968 ms
> (5 rows)
>
> That shows that the * causes expansion to the following query:
>
> SELECT
>     (reproducebehavior()).message1,
>     (reproducebehavior()).message2,
>     (reproducebehavior()).message3,
>     (reproducebehavior()).message4;
>
> From that you can see why it is not surprising that the function is
> executed once per OUT parameter, especially if it is VOLATILE.
>
>
=E2=80=8BWhile you've explained how to see what is happening it doesn't rem=
ove the
POLA violation that has occurred here.=E2=80=8B

You seem to be expecting it to behave like this:
>
> SELECT * FROM (SELECT * FROM reproduceBehavior()) x;
>
> Which is interpreted as:
>
> SELECT x.message1, x.message2, x.message3, x.message4
>   FROM (SELECT * FROM reproduceBehavior()) x;
>
> To avoid surprises, avoid using *
>

=E2=80=8BOK - but how is one supposed to do that?  There is no good way to =
explode
a composite type, especially one created using a function, without using *.

I've responded to the original thread with two possible alternative query
forms.  The CTE one is a hack while the implementation of LATERAL finally
provided a non-hackey means to accomplish the goal.  The behavior of
(SELECT (func_call()).*) will likely never change but I'd still argue that
not repeatedly invoking the function would be the better implementation and
the least astonishing one.

David J.
On Tue, Dec 8, 2015 at 10:24 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Tue, Dec 8, 2015 at 9:08 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

>> [...] * causes expansion to the following query:
>>
>> SELECT
>>     (reproducebehavior()).message1,
>>     (reproducebehavior()).message2,
>>     (reproducebehavior()).message3,
>>     (reproducebehavior()).message4;
>>
>> From that you can see why it is not surprising that the function is
>> executed once per OUT parameter, especially if it is VOLATILE.
>
> While you've explained how to see what is happening it doesn't remove
> the POLA violation that has occurred here.

I tend to think of x.* as being something that is resolved at parse
analysis time, before any consideration is given to how to execute;
thus, when x is a function I didn't find it at all astonishing that
it resolved as above.  If you think of * as a macro that expands in
the above manner before planning, the fact that it does not
optimize to a single call is not surprising for a VOLATILE
function; that is required behavior.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@gmail.com> writes:
> On Tue, Dec 8, 2015 at 10:24 AM, David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>> While you've explained how to see what is happening it doesn't remove
>> the POLA violation that has occurred here.

> I tend to think of x.* as being something that is resolved at parse
> analysis time, before any consideration is given to how to execute;
> thus, when x is a function I didn't find it at all astonishing that
> it resolved as above.

As far as table references go, it's effectively required by the SQL spec
that "tab.*" in a SELECT list is expanded to "tab.c1, tab.c2, etc" at
parse time.  (I draw this conclusion from the parts of the spec that say
that tab.*'s meaning in a view referencing tab does not change if tab
subsequently gains new columns.)

It's definitely annoying that (foo()).* is expanded similarly; but to fix
that we'd have to introduce an additional layer of evaluation into SELECT
lists, and I think that might create some visible semantic oddities of its
own.  Between backwards-compatibility worries and the existence of the
LATERAL workaround, it's unlikely we'll ever change this.

I'm not sure how well this point is documented, though.  Might be worth
some effort in that direction.

            regards, tom lane
On Tue, Dec 8, 2015 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Kevin Grittner <kgrittn@gmail.com> writes:
> > On Tue, Dec 8, 2015 at 10:24 AM, David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> >> While you've explained how to see what is happening it doesn't remove
> >> the POLA violation that has occurred here.
>
> > I tend to think of x.* as being something that is resolved at parse
> > analysis time, before any consideration is given to how to execute;
> > thus, when x is a function I didn't find it at all astonishing that
> > it resolved as above.
>
> As far as table references go, it's effectively required by the SQL spec
> that "tab.*" in a SELECT list is expanded to "tab.c1, tab.c2, etc" at
> parse time.  (I draw this conclusion from the parts of the spec that say
> that tab.*'s meaning in a view referencing tab does not change if tab
> subsequently gains new columns.)
>
> It's definitely annoying that (foo()).* is expanded similarly; but to fix
> that we'd have to introduce an additional layer of evaluation into SELECT
> lists, and I think that might create some visible semantic oddities of its
> own.  Between backwards-compatibility worries and the existence of the
> LATERAL workaround, it's unlikely we'll ever change this.
>
> I'm not sure how well this point is documented, though.  Might be worth
> some effort in that direction.
>
>                         regards, tom lane
>

Thanks to everyone for the attention given, and the illuminating
responses.  I'm not sure how I passed over the SELECT * FROM udf(); form
suggested by Oleksandr Shulgin, but it produces the desired result -
executing the function once while returning the out parameters as a row
rather than as the text literal representation of the composite type.  It's
an odd quirk that (udf()).* may execute the function multiple times, but
the workarounds make it simple enough to avoid.

Thanks!
Mike Lang