Thread: Function returning 2 columns evaluated twice when both columns are needed

Function returning 2 columns evaluated twice when both columns are needed

From
Gerhard Wiesinger
Date:
Hello,

I'm having a problem with the following:
CREATE TYPE Sums AS (sum_m1 double precision, sum_m2 double precision);
CREATE TYPE date_m1_m2 AS (cur_date date, sum_m1 double precision, sum_m2 double precision);


CREATE OR REPLACE FUNCTION getSums(IN start_ts timestamp with time
zone, IN stop_ts timestamp with time zone) RETURNS Sums AS $$
...

CREATE OR REPLACE FUNCTION getsumInterval(date, date) RETURNS SETOF date_m1_m2 AS $$
   SELECT
     cur_date,
     (getSums(start_ts, stop_ts)).* -- No optimal since function is evaluated 2 times => 24s
     -- getSums(start_ts, stop_ts) -- in one column and not usable as I need 2 columns, but takes only 12s
   FROM
     getDatesTimestamps($1, $2)
   ;
$$ LANGUAGE SQL;

Since getSums() is a cursor and is complex and takes long time getSums
should only be evaluated once. Is there a better solution available to
get both columns from the function in the select?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Gerhard Wiesinger <lists@wiesinger.com> writes:
> Since getSums() is a cursor and is complex and takes long time getSums
> should only be evaluated once. Is there a better solution available to
> get both columns from the function in the select?

You need a sub-select, along the lines of

SELECT
  cur_date,
  (gs).sum_m1,
  (gs).sum_m2
  FROM
  (
   SELECT
     cur_date,
     getSums(start_ts, stop_ts) AS gs
   FROM
     getDatesTimestamps($1, $2)
   OFFSET 0
  ) AS ss
   ;

The OFFSET bit is a kluge, but is needed to keep the planner from
flattening the subquery and undoing your work.

            regards, tom lane

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Gerhard Wiesinger
Date:
On Sun, 18 Oct 2009, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> Since getSums() is a cursor and is complex and takes long time getSums
>> should only be evaluated once. Is there a better solution available to
>> get both columns from the function in the select?
>
> You need a sub-select, along the lines of
>
> SELECT
>  cur_date,
>  (gs).sum_m1,
>  (gs).sum_m2
>  FROM
>  (
>   SELECT
>     cur_date,
>     getSums(start_ts, stop_ts) AS gs
>   FROM
>     getDatesTimestamps($1, $2)
>   OFFSET 0
>  ) AS ss
>   ;
>
> The OFFSET bit is a kluge, but is needed to keep the planner from
> flattening the subquery and undoing your work.
>

Thnx Tom. It also works without the OFFSET kludge. Any ideas why?
Looks also reasonable to me because there is no SETOF returned.

BTW: Why is the function in the original statement evaluated twice? On
"SELECT table.*" I guess query is also executed once and not n times (for
each column).

PG is version 8.3.8.

Thnx.

Ciao,
Gerhard

Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Sun, 18 Oct 2009, Tom Lane wrote:
>> The OFFSET bit is a kluge, but is needed to keep the planner from
>> flattening the subquery and undoing your work.

> Thnx Tom. It also works without the OFFSET kludge. Any ideas why?

Probably because you have the function declared VOLATILE.

            regards, tom lane

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Gerhard Wiesinger
Date:
On Mon, 19 Oct 2009, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Sun, 18 Oct 2009, Tom Lane wrote:
>>> The OFFSET bit is a kluge, but is needed to keep the planner from
>>> flattening the subquery and undoing your work.
>
>> Thnx Tom. It also works without the OFFSET kludge. Any ideas why?
>
> Probably because you have the function declared VOLATILE.
>

None of the function is declared VOLATILE. Any other idea?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Christophe Pettus
Date:
On Oct 19, 2009, at 10:49 AM, Gerhard Wiesinger wrote:
> None of the function is declared VOLATILE. Any other idea?


If they are not declared with a volatility category at all, the
default is VOLATILE.  Is that a possibility?

--
-- Christophe Pettus
    xof@thebuild.com


Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Mon, 19 Oct 2009, Tom Lane wrote:
>> Probably because you have the function declared VOLATILE.

> None of the function is declared VOLATILE. Any other idea?

[ shrug... ]  There are other possible reasons why the planner would
fail to flatten a subquery, but none of them apply to the example you
showed.  And your example function *was* VOLATILE, by default.

            regards, tom lane

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Pavel Stehule
Date:
Hello

2009/10/19 Tom Lane <tgl@sss.pgh.pa.us>:
> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Mon, 19 Oct 2009, Tom Lane wrote:
>>> Probably because you have the function declared VOLATILE.
>
>> None of the function is declared VOLATILE. Any other idea?
>
> [ shrug... ]  There are other possible reasons why the planner would
> fail to flatten a subquery, but none of them apply to the example you
> showed.  And your example function *was* VOLATILE, by default.

I checked this on 8.5 and function is evaluated more time although is immutable.

postgres=# create or replace function foo(out a int, out b int)
returns record as $$
begin
raise notice 'start foo';
a := 10; b := 20;
return;
end;
$$ language plpgsql immutable;
CREATE FUNCTION

postgres=# select (foo()).*;
NOTICE:  start foo
NOTICE:  start foo
 a  │ b
────┼────
 10 │ 20
(1 row)

I was surprised, there are necessary subselect, but "offset" is optional:

postgres=# select (foo).* from (select foo()) f;
NOTICE:  start foo
 a  │ b
────┼────
 10 │ 20
(1 row)

postgres=# select (foo).* from (select foo() offset 0) f;
NOTICE:  start foo
 a  │ b
────┼────
 10 │ 20
(1 row)

regards
Pavel Stehule
>
>                        regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Merlin Moncure
Date:
On Tue, Oct 20, 2009 at 5:12 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> [ shrug... ]  There are other possible reasons why the planner would
>> fail to flatten a subquery, but none of them apply to the example you
>> showed.  And your example function *was* VOLATILE, by default.
>
> I checked this on 8.5 and function is evaluated more time although is immutable.
>
> postgres=# create or replace function foo(out a int, out b int)
> returns record as $$
> begin
> raise notice 'start foo';
> a := 10; b := 20;
> return;
> end;
> $$ language plpgsql immutable;
> CREATE FUNCTION
>
> postgres=# select (foo()).*;

This is because select (func()).* is expanded to mean:
select func(f1), func(f2) ... func(fn);

This is a general issue with '*' because in postgres it means:
'evaluate me for each field of me', not 'return all fields of me'. I
don't think our behavior in this regard is correct (afaict i'm in the
minority though).

merlin

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Pavel Stehule
Date:
2009/10/21 Merlin Moncure <mmoncure@gmail.com>:
> On Tue, Oct 20, 2009 at 5:12 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> [ shrug... ]  There are other possible reasons why the planner would
>>> fail to flatten a subquery, but none of them apply to the example you
>>> showed.  And your example function *was* VOLATILE, by default.
>>
>> I checked this on 8.5 and function is evaluated more time although is immutable.
>>
>> postgres=# create or replace function foo(out a int, out b int)
>> returns record as $$
>> begin
>> raise notice 'start foo';
>> a := 10; b := 20;
>> return;
>> end;
>> $$ language plpgsql immutable;
>> CREATE FUNCTION
>>
>> postgres=# select (foo()).*;
>
> This is because select (func()).* is expanded to mean:
> select func(f1), func(f2) ... func(fn);
>
> This is a general issue with '*' because in postgres it means:
> 'evaluate me for each field of me', not 'return all fields of me'. I
> don't think our behavior in this regard is correct (afaict i'm in the
> minority though).
>

I understand to this mechanism. This is only correction some previous
messages. This behave isn't depend on function immutability or
volatility. But I agree with you, so this is really problem - it is
very silent. Maybe we could to raise some warning or we could to move
funccall to subselect

like
SELECT (foo()).* to SELECT ( (SELECT foo()) ).*

Regards
Pavel Stehule

> merlin
>

Re: Function returning 2 columns evaluated twice when both columns are needed

From
Merlin Moncure
Date:
On Wed, Oct 21, 2009 at 12:37 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/10/21 Merlin Moncure <mmoncure@gmail.com>:
>> On Tue, Oct 20, 2009 at 5:12 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>>> [ shrug... ]  There are other possible reasons why the planner would
>>>> fail to flatten a subquery, but none of them apply to the example you
>>>> showed.  And your example function *was* VOLATILE, by default.
>>>
>>> I checked this on 8.5 and function is evaluated more time although is immutable.
>>>
>>> postgres=# create or replace function foo(out a int, out b int)
>>> returns record as $$
>>> begin
>>> raise notice 'start foo';
>>> a := 10; b := 20;
>>> return;
>>> end;
>>> $$ language plpgsql immutable;
>>> CREATE FUNCTION
>>>
>>> postgres=# select (foo()).*;
>>
>> This is because select (func()).* is expanded to mean:
>> select func(f1), func(f2) ... func(fn);
>>
>> This is a general issue with '*' because in postgres it means:
>> 'evaluate me for each field of me', not 'return all fields of me'. I
>> don't think our behavior in this regard is correct (afaict i'm in the
>> minority though).
>>
>
> I understand to this mechanism. This is only correction some previous
> messages. This behave isn't depend on function immutability or
> volatility. But I agree with you, so this is really problem - it is
> very silent. Maybe we could to raise some warning or we could to move
> funccall to subselect
>
> like
> SELECT (foo()).* to SELECT ( (SELECT foo()) ).*

If we are going to change I think ultimately the best answer is that
'(me).*' should mean: 'return all the fields of me', whatever 'me' is,
meaning that:

create view v as select (foo).* from foo;

should now change the definition of v if we add a column to foo.  As opposed to:

create view v as select * from foo;

which would not (and shouldn't).

merlin