Thread: BUG #16368: Incorrect function inlining in the presence of a window function

BUG #16368: Incorrect function inlining in the presence of a window function

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16368
Logged by:          Elvis Pranskevichus
Email address:      elprans@gmail.com
PostgreSQL version: 12.2
Operating system:   Gentoo Linux
Description:

Consider the following function:

CREATE OR REPLACE FUNCTION intfmt(input text, fmt text)
 RETURNS bigint
 LANGUAGE sql
AS $$ 
SELECT 
  CASE WHEN fmt IS NULL 
    THEN input::bigint 
    ELSE to_number(input, fmt)::bigint 
  END; 
$$;

And the following query:

SELECT
    intfmt('123,456', q.fmt) AS "out"
FROM
    (SELECT
            ("v" = first_value("v") OVER ()),
            '999,999' AS "fmt"
        FROM
            (SELECT 1 AS "v") AS "q2"
    ) AS "q";

The expected result is the integer 123456, but the query fails with:

ERROR:  invalid input syntax for type bigint: "123,456"
CONTEXT:  SQL function "intfmt" during inlining

Which means that somehow during inlining of "intfmt" Postgres incorrectly
takes the first branch in the `CASE` expression. This only happens in the
presence of the "first_value" window call in the nested query.

                Elvis


Re: BUG #16368: Incorrect function inlining in the presence of awindow function

From
"David G. Johnston"
Date:
On Wed, Apr 15, 2020 at 11:07 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      16368
Logged by:          Elvis Pranskevichus
Email address:      elprans@gmail.com
PostgreSQL version: 12.2
Operating system:   Gentoo Linux
Description:       

Consider the following function:

CREATE OR REPLACE FUNCTION intfmt(input text, fmt text)
[...]
SELECT
  CASE WHEN fmt IS NULL
    THEN input::bigint
    ELSE to_number(input, fmt)::bigint
  END;
[...]
SELECT
[...] 
    intfmt('123,456', q.fmt) AS "out"
 

The expected result is the integer 123456, but the query fails with:

ERROR:  invalid input syntax for type bigint: "123,456"
CONTEXT:  SQL function "intfmt" during inlining

Which means that somehow during inlining of "intfmt" Postgres incorrectly
takes the first branch in the `CASE` expression.

During inlining the case expression becomes:

CASE WHEN q.fmt IS NULL
THEN '123,456'::bigint
ELSE to_number('123,456', q.fmt)
END;

It doesn't "take" a branch - it turns variables into constants and, as written, some of those constants are invalid for the types they are being assigned to.
 
This only happens in the
presence of the "first_value" window call in the nested query.

The ability to optimize, and how, depends on the whole query.

I don't actually know whether this is a bug or just an expected downside to using inline-able functions and case statements to avoid malformed data parsing.

Writing the function in pl/pgsql prevents the inlining and stabilizes the query.

David J.

Re: BUG #16368: Incorrect function inlining in the presence of a window function

From
Elvis Pranskevichus
Date:
On Wednesday, April 15, 2020 11:32:54 A.M. PDT David G. Johnston wrote:
> During inlining the case expression becomes:
> 
> CASE WHEN q.fmt IS NULL
> THEN '123,456'::bigint
> ELSE to_number('123,456', q.fmt)
> END;
> 
> It doesn't "take" a branch - it turns variables into constants and, as
> written, some of those constants are invalid for the types they are
> being assigned to.
> 
> > This only happens in the
> > presence of the "first_value" window call in the nested query.
> 
> The ability to optimize, and how, depends on the whole query.
> 
> I don't actually know whether this is a bug or just an expected
> downside to using inline-able functions and case statements to avoid
> malformed data parsing.
> 
> Writing the function in pl/pgsql prevents the inlining and stabilizes
> the query.

IMO, an optimization that leads to wrong query results is unquestionably 
a bug.  And "use pl/pgsql" (which is much slower) is arguably not the 
best answer here.

> inline-able functions and case statements to avoid
> malformed data parsing.

Consider any other case where an error is guarded by a "CASE WHEN", such 
as division by zero.  I think the use of "CASE WHEN" should disqualify 
the function from being inlined, or, maybe, constant folding should be 
disabled in the branches of "CASE WHEN" when inlining and when the 
optimizer is unable to reason about the "CASE" condition.

Thoughts?

                                 Elvis





Re: BUG #16368: Incorrect function inlining in the presence of awindow function

From
"David G. Johnston"
Date:
On Wed, Apr 15, 2020 at 1:08 PM Elvis Pranskevichus <elprans@gmail.com> wrote:
On Wednesday, April 15, 2020 11:32:54 A.M. PDT David G. Johnston wrote:
> During inlining the case expression becomes:
>
> CASE WHEN q.fmt IS NULL
> THEN '123,456'::bigint
> ELSE to_number('123,456', q.fmt)
> END;
>
> It doesn't "take" a branch - it turns variables into constants and, as
> written, some of those constants are invalid for the types they are
> being assigned to.
>
> > This only happens in the
> > presence of the "first_value" window call in the nested query.
>
> The ability to optimize, and how, depends on the whole query.
>
> I don't actually know whether this is a bug or just an expected
> downside to using inline-able functions and case statements to avoid
> malformed data parsing.
>
> Writing the function in pl/pgsql prevents the inlining and stabilizes
> the query.

IMO, an optimization that leads to wrong query results is unquestionably
a bug.  And "use pl/pgsql" (which is much slower) is arguably not the
best answer here.

Maybe not, but if you want something that works it is a solution.


> inline-able functions and case statements to avoid
> malformed data parsing.

Consider any other case where an error is guarded by a "CASE WHEN", such
as division by zero.

There aren't all that many and throwing out a perfectly good optimization for boundary cases that will error, as opposed to returning but including invalid results, isn't that desirable either.

  I think the use of "CASE WHEN" should disqualify
the function from being inlined, or, maybe, constant folding should be
disabled in the branches of "CASE WHEN" when inlining and when the
optimizer is unable to reason about the "CASE" condition.

Thoughts?


Outside my wheelhouse as to what is practical.  I suspect we'll get a better answer on that front in due time.
The world isn't perfect and I sure cannot write a patch to change this behavior and for my money its something I could live with if it was decided that this is the best option at this time.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Apr 15, 2020 at 1:08 PM Elvis Pranskevichus <elprans@gmail.com>
> wrote:
>> Consider any other case where an error is guarded by a "CASE WHEN", such
>> as division by zero.

> There aren't all that many and throwing out a perfectly good optimization
> for boundary cases that will error, as opposed to returning but including
> invalid results, isn't that desirable either.

In point of fact, there are many ways in which CASE and related constructs
fail to guarantee evaluation order, as noted in
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

The particular case mentioned there seems to be about the same as here:
constant-folding happens even in CASE arms that will never be reached at
runtime.  Problems could happen earlier than that, too.  It's not hard to
imagine cases that wouldn't execute "as desired" unless we didn't even do
parse analysis (e.g, subexpression type determination) on a CASE arm until
it's reached at runtime.  But we're not going to make that sort of thing
happen; it's just too much contortion and inefficiency for too little
benefit.  In particular, people for whom the current implementation works
OK would be quite unhappy with the performance impact of de-optimizing
CASE that way.

            regards, tom lane



Re: BUG #16368: Incorrect function inlining in the presence of a window function

From
Elvis Pranskevichus
Date:
On Wednesday, April 15, 2020 5:14:05 P.M. PDT Tom Lane wrote:

> In point of fact, there are many ways in which CASE and related
> constructs fail to guarantee evaluation order, as noted in
> https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EX
> PRESS-EVAL
>
> The particular case mentioned there seems to be about the same as
> here: constant-folding happens even in CASE arms that will never be
> reached at runtime. 

Yes, but function arguments aren't constants are they?  At least the 
documentation makes no effort to mention that.

> would be quite unhappy with the performance impact of de-optimizing
> CASE that way.

I'm not arguing for the general de-optimization for CASE, just for not 
treating arguments of inlined functions as constants in the CASE 
statement.  For arguments of a prepared statement this optimization 
makes even less sense.  

                                Elvis





Re: BUG #16368: Incorrect function inlining in the presence of a window function

From
Elvis Pranskevichus
Date:
> The particular case mentioned there seems to be about the same as
> here: constant-folding happens even in CASE arms that will never be
> reached at runtime.  

Since the actual issue here is false positive exceptions, why can't we 
treat errors raised during const eval as a signal to cancel the constant 
folding, i.e. ignore them and keep the failing expression unoptimized?

                              Elvis





Re: BUG #16368: Incorrect function inlining in the presence of awindow function

From
Kyotaro Horiguchi
Date:
At Wed, 15 Apr 2020 18:17:30 -0700, Elvis Pranskevichus <elprans@gmail.com> wrote in 
> On Wednesday, April 15, 2020 5:14:05 P.M. PDT Tom Lane wrote:
> 
> > In point of fact, there are many ways in which CASE and related
> > constructs fail to guarantee evaluation order, as noted in
> > https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EX
> > PRESS-EVAL
> >
> > The particular case mentioned there seems to be about the same as
> > here: constant-folding happens even in CASE arms that will never be
> > reached at runtime. 
> 
> Yes, but function arguments aren't constants are they?  At least the 
> documentation makes no effort to mention that.
> 
> > would be quite unhappy with the performance impact of de-optimizing
> > CASE that way.
> 
> I'm not arguing for the general de-optimization for CASE, just for not 
> treating arguments of inlined functions as constants in the CASE 
> statement.  For arguments of a prepared statement this optimization 
> makes even less sense.  

Perhaps there's seems to be no nice way to detect the case where an
arm in a case expression cannot even be executed although the
expression is seemingly constant-foldable. (Note that all arms of the
case has a chance to be reached since the case-expression is NOT a
constant at the parse time.)  Isn't there any workaround usable for
you? I'm not sure what is your real issue, but it seems to me
unnatural that the first argument of intfmt is a literal string.

For example, the following query works.

SELECT
    intfmt(q.data, q.fmt) AS "out"
FROM
    (SELECT
            ("v" = first_value("v") OVER ()),
            '123,456' AS data, '999,999' AS fmt
        FROM
            (SELECT 1 AS "v") AS "q2"
    ) AS "q";

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center