Thread: Odd Shortcut behaviour in PG14

Odd Shortcut behaviour in PG14

From
Zahir Lalani
Date:

Hello all

 

Got a really weird problem with shortcut processing on one server.

 

We have just upgraded to PG14 from PG11. The following code works as expected on our primary Dev server, and we recently upgraded our QA server to the same level. However in this case the shortcut processing seems broken.

 

Here is the code in question:

 

LEFT JOIN lateral (

    SELECT

        CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 'utf8')::JSON ELSE NULL END AS edata

        FROM crypto_secretbox_open(

                sc.data,

                sc.nonce,

                boxkey)

) enc ON true and (ekey > 0)

 

This code has worked up till now and works on the dev server. The expectation being that if ekey=0 the lateral join will be ignored. However on the new QA server this is hit and miss. It fails many times as it seems to ignore the shortcut even though ekey is 0. We can make it work by doing this:

 

LEFT JOIN lateral (

    SELECT

        CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 'utf8')::JSON ELSE NULL END AS edata

        FROM crypto_secretbox_open(

                sc.data,

                sc.nonce,

                boxkey)

) enc ON (ekey > 0) and true

 

This should theoretically be no different – but it solves the issue 100%. Any guidance on why this would be the case?

 

 

Zahir Lalani
Director of Enterprise Systems

signature_2000159722 

151 Rosebery Avenue

London EC1R 4AB

 

t: +44 (0)7956 455 168

e: zahirlalani@oliver.agency

 

Thank you for your email. For a faster response please take note of the below:

 

  1. If your query is regarding a hosting issue/new request, please direct your request to Devops.support@oliver.agency
  2. If your request is regarding the data warehouse or zoho, please direct your request to mi_reporting@oliver.agency
  3. If your request is regarding OMG issues/configuration, please direct your request to omgsupport@oliver.agency

 

For all other requests I will try and respond as soon as I can. Please note our response SLA’s

  • Critical (P1) – systems down or unresponsive = within 1 hour
  • High (P2) – Issues with data accuracy/access/functional = within 4 business hours
  • Medium (P3) – new requests / amendments to current setups etc = within 2 business days
  • Low – general requests regarding any system = best endeavour

 

Attachment

Re: Odd Shortcut behaviour in PG14

From
Adrian Klaver
Date:
On 11/23/23 08:55, Zahir Lalani wrote:
> Hello all
> 
> Got a really weird problem with shortcut processing on one server.
> 
> We have just upgraded to PG14 from PG11. The following code works as 
> expected on our primary Dev server, and we recently upgraded our QA 
> server to the same level. However in this case the shortcut processing 
> seems broken.

To be clear both the Dev and QA servers are at 14.x now?

If so are they at same minor version?

Same OS and version?

Using same data set?

Also from here:

https://www.postgresql.org/docs/14/sql-expressions.html#SYNTAX-EXPRESS-EVAL

"
4.2.14. Expression Evaluation Rules

The order of evaluation of subexpressions is not defined. In particular, 
the inputs of an operator or function are not necessarily evaluated 
left-to-right or in any other fixed order.

Furthermore, if the result of an expression can be determined by 
evaluating only some parts of it, then other subexpressions might not be 
evaluated at all. For instance, if one wrote:

SELECT true OR somefunc();

then somefunc() would (probably) not be called at all. The same would be 
the case if one wrote:

SELECT somefunc() OR true;

Note that this is not the same as the left-to-right “short-circuiting” 
of Boolean operators that is found in some programming languages.
"

This also held in version 11.

> 
> Here is the code in question:
> 
> LEFT JOIN lateral (
> 
>      SELECT
> 
>          CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 
> 'utf8')::JSON ELSE NULL END AS edata
> 
>          FROM crypto_secretbox_open(
> 
>                  sc.data,
> 
>                  sc.nonce,
> 
>                  boxkey)
> 
> ) *enc ON true and (ekey > 0)*
> 
> This code has worked up till now and works on the dev server. The 
> expectation being that if ekey=0 the lateral join will be ignored. 
> However on the new QA server this is hit and miss. It fails many times 
> as it seems to ignore the shortcut even though ekey is 0. We can make it 
> work by doing this:
> 
> LEFT JOIN lateral (
> 
>      SELECT
> 
>          CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 
> 'utf8')::JSON ELSE NULL END AS edata
> 
>          FROM crypto_secretbox_open(
> 
>                  sc.data,
> 
>                  sc.nonce,
> 
>                  boxkey)
> 
> *) enc ON (ekey > 0) and true*
> 
> This should theoretically be no different – but it solves the issue 
> 100%. Any guidance on why this would be the case?
> 
> *Zahir Lalani**
> *Director of Enterprise Systems
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Odd Shortcut behaviour in PG14

From
Tom Lane
Date:
Zahir Lalani <ZahirLalani@oliver.agency> writes:
> Got a really weird problem with shortcut processing on one server.

This question is unanswerable as given.  You have not even defined
what you mean by "fail" (error? wrong query result?), let alone
provided enough detail for someone else to reproduce the problem.

            regards, tom lane



Odd Shortcut behaviour in PG14

From
"David G. Johnston"
Date:
On Thursday, November 23, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Zahir Lalani <ZahirLalani@oliver.agency> writes:
> Got a really weird problem with shortcut processing on one server.

This question is unanswerable as given.  You have not even defined
what you mean by "fail" (error? wrong query result?), let alone
provided enough detail for someone else to reproduce the problem.

The OP complains about the apparent inconsistent optimizing away of the SRF function call of crypto_secretbox_open in the lateral when the join predicate is known to be false.

I agree that they could make others lives easier by giving more evidence and self-contained example; but the assertions being made are complete.

David J.

Re: Odd Shortcut behaviour in PG14

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thursday, November 23, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This question is unanswerable as given.  You have not even defined
>> what you mean by "fail" (error? wrong query result?), let alone
>> provided enough detail for someone else to reproduce the problem.

> The OP complains about the apparent inconsistent optimizing away of the SRF
> function call of crypto_secretbox_open in the lateral when the join
> predicate is known to be false.

Hmm, your crystal ball apparently works better than mine today,
because I sure do not see where that information was presented.

            regards, tom lane



Re: Odd Shortcut behaviour in PG14

From
"David G. Johnston"
Date:


On Thursday, November 23, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thursday, November 23, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This question is unanswerable as given.  You have not even defined
>> what you mean by "fail" (error? wrong query result?), let alone
>> provided enough detail for someone else to reproduce the problem.

> The OP complains about the apparent inconsistent optimizing away of the SRF
> function call of crypto_secretbox_open in the lateral when the join
> predicate is known to be false.

Hmm, your crystal ball apparently works better than mine today,
because I sure do not see where that information was presented.

                       

My interpretation of:

 The expectation being that if ekey=0 the lateral join will be ignored. “

David J.
 

Re: Odd Shortcut behaviour in PG14

From
Ron Johnson
Date:
Out of curiosity, what is the point of adding the "true" predicate no matter the position?  Maybe I've created an incorrect truth table, but "true AND" (and "AND true") don't make any logical difference when added to (ekey > 0).


On Thu, Nov 23, 2023 at 11:56 AM Zahir Lalani <ZahirLalani@oliver.agency> wrote:

Hello all

 

Got a really weird problem with shortcut processing on one server.

 

We have just upgraded to PG14 from PG11. The following code works as expected on our primary Dev server, and we recently upgraded our QA server to the same level. However in this case the shortcut processing seems broken.

 

Here is the code in question:

 

LEFT JOIN lateral (

    SELECT

        CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 'utf8')::JSON ELSE NULL END AS edata

        FROM crypto_secretbox_open(

                sc.data,

                sc.nonce,

                boxkey)

) enc ON true and (ekey > 0)

 

[snip] 

LEFT JOIN lateral (

    SELECT

        CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 'utf8')::JSON ELSE NULL END AS edata

        FROM crypto_secretbox_open(

                sc.data,

                sc.nonce,

                boxkey)

) enc ON (ekey > 0) and true

 

This should theoretically be no different – but it solves the issue 100%. Any guidance on why this would be the case?

 

Re: Odd Shortcut behaviour in PG14

From
Tom Lane
Date:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> Out of curiosity, what is the point of adding the "true" predicate no
> matter the position?  Maybe I've created an incorrect truth table, but
> "true AND" (and "AND true") don't make any logical difference when added
> to (ekey > 0)*.*

Not only does it not make any logical difference, but it shouldn't
make any practical difference either, because the useless "true"
subclause will be thrown away very early in planning, before any
decisions would be taken on the strength of what is in the ON clause.
So I was skeptical that the details presented were even correct.
I think whatever is causing the behavioral change is something else
that the OP hasn't identified/controlled for.  Hard to tell with
such fragmentary details.

            regards, tom lane



RE: Odd Shortcut behaviour in PG14

From
Zahir Lalani
Date:

> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Thursday, November 23, 2023 7:45 PM
> To: Ron Johnson <ronljohnsonjr@gmail.com>
> Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Odd Shortcut behaviour in PG14
>
> Ron Johnson <ronljohnsonjr@gmail.com> writes:
> > Out of curiosity, what is the point of adding the "true" predicate no
> > matter the position?  Maybe I've created an incorrect truth table, but
> > "true AND" (and "AND true") don't make any logical difference when
> > added to (ekey > 0)*.*
>
> Not only does it not make any logical difference, but it shouldn't make any
> practical difference either, because the useless "true"
> subclause will be thrown away very early in planning, before any decisions
> would be taken on the strength of what is in the ON clause.
> So I was skeptical that the details presented were even correct.
> I think whatever is causing the behavioral change is something else that the OP
> hasn't identified/controlled for.  Hard to tell with such fragmentary details.
>
>                         regards, tom lane
>

Apologies but I am struggling to make this happen in isolation. A few things:

The true was an error - we previously did not have the AND so needed the true. The devs added the AND of ekey but
shouldhave removed the true. We have done that now. 

There is a difference between the PG versions. The Dev server which works is 14.9 and the QA server which fails is
14.10

Looking at the application logs this function is being called once per display row - it is running successfully around
10times with the same input params. When it fails, it is with the same params! All I can say for sure is that the first
10it ignores the lateral join as the (ekey > 0) evaluates to (0>0). However on the next invocation, for whatever
reason,even though ekey is still 0, it decides to evaluate the lateral join and fails. 

If we change the code from (ekey > 0) to (0 > 0) - it always works!! So the planner seems to be making different
choicesbased on other functions being called higher up in the chain. We need a way of telling the planner not to run
thelateral join if ekey=0. 

Sorry I can't provide a testable case - have not been able to reproduce just running this one function multiple times
ina loop. 

Thx

Z



Re: Odd Shortcut behaviour in PG14

From
Ron Johnson
Date:
On Fri, Nov 24, 2023 at 8:01 AM Zahir Lalani <ZahirLalani@oliver.agency> wrote:


> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Thursday, November 23, 2023 7:45 PM
> To: Ron Johnson <ronljohnsonjr@gmail.com>
> Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Odd Shortcut behaviour in PG14
>
> Ron Johnson <ronljohnsonjr@gmail.com> writes:
> > Out of curiosity, what is the point of adding the "true" predicate no
> > matter the position?  Maybe I've created an incorrect truth table, but
> > "true AND" (and "AND true") don't make any logical difference when
> > added to (ekey > 0)*.*
>
> Not only does it not make any logical difference, but it shouldn't make any
> practical difference either, because the useless "true"
> subclause will be thrown away very early in planning, before any decisions
> would be taken on the strength of what is in the ON clause.
> So I was skeptical that the details presented were even correct.
> I think whatever is causing the behavioral change is something else that the OP
> hasn't identified/controlled for.  Hard to tell with such fragmentary details.
>
>                         regards, tom lane
>

Apologies but I am struggling to make this happen in isolation. A few things:

The true was an error - we previously did not have the AND so needed the true. The devs added the AND of ekey but should have removed the true. We have done that now.

There is a difference between the PG versions. The Dev server which works is 14.9 and the QA server which fails is 14.10

Looking at the application logs this function is being called once per display row - it is running successfully around 10 times with the same input params. When it fails, it is with the same params! All I can say for sure is that the first 10 it ignores the lateral join as the (ekey > 0) evaluates to (0>0). However on the next invocation, for whatever reason, even though ekey is still 0, it decides to evaluate the lateral join and fails.

If we change the code from (ekey > 0) to (0 > 0) - it always works!! So the planner seems to be making different choices based on other functions being called higher up in the chain. We need a way of telling the planner not to run the lateral join if ekey=0.

Sorry I can't provide a testable case - have not been able to reproduce just running this one function multiple times in a loop.

A PL/PgSQL loop, or a bash loop?  Because if the same code, with the same parameters, runs the exact same 10 times in a row, then There's Some Other Problem You're Not Telling Us.
 
Put the SELECT statement in a file with EXPLAIN ANALYZE prefixed to it.

Then run this bash loop:
for x in `seq 0 9`; do psql --host=foo -U bar -af thequery.sql &> thequery_analyzed_$x.out; done

diff them.

I'm betting that what the developers wrote relies on something that SQL doesn't say is reliable, and which changed between 14.9 and 14.10.

Or did you also upgrade the OS when upgrading to .10?  That could cause a problem; always rebuild all indices with CHAR, VARCHAR & TEXT fields after upgrading glibc.

Re: Odd Shortcut behaviour in PG14

From
Tom Lane
Date:
Zahir Lalani <ZahirLalani@oliver.agency> writes:
> Looking at the application logs this function is being called once per
> display row - it is running successfully around 10 times with the same
> input params. When it fails, it is with the same params!

You *still* haven't defined what you mean by "fails".  We can't help
you effectively with such tiny dribs and drabs of information.
At the very least I'd like to see the whole query, because the
fragment you've shown us does not reveal what ekey is or why you
think the system should believe that it is or is not zero.
But it's also unclear why that should matter.

Having said that ... if the statement is being executed with a cached
plan (via a named statement, or PREPARE, or inside plpgsql) then
maybe the problem occurs if the plan switches from custom to generic?
If so, messing with the plan_cache_mode setting might provide a
workaround.

            regards, tom lane



RE: Odd Shortcut behaviour in PG14

From
Zahir Lalani
Date:

> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Friday, November 24, 2023 3:35 PM
> To: Zahir Lalani <ZahirLalani@oliver.agency>
> Cc: Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-
> generallists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Odd Shortcut behaviour in PG14


> You *still* haven't defined what you mean by "fails".  We can't help you
> effectively with such tiny dribs and drabs of information.
> At the very least I'd like to see the whole query, because the fragment you've
> shown us does not reveal what ekey is or why you think the system should
> believe that it is or is not zero.
> But it's also unclear why that should matter.
>
> Having said that ... if the statement is being executed with a cached plan (via a
> named statement, or PREPARE, or inside plpgsql) then maybe the problem
> occurs if the plan switches from custom to generic?
> If so, messing with the plan_cache_mode setting might provide a workaround.
>
>                         regards, tom lane

Sorry Tom - let me try and clarify:

(ekey is a variable passed into the function)

    RETURN QUERY
        SELECT
            CASE WHEN (ekey = 0) THEN person.first_name ELSE (enc.edata->>'firstname')::text END AS first_name,
            CASE WHEN (ekey = 0) THEN person.last_name ELSE (enc.edata->>'lastname')::text END AS last_name,
            CASE WHEN (ekey = 0) THEN person.email ELSE (enc.edata->>'emailaddress')::text END AS email,
            CASE WHEN (ekey = 0) THEN person.first_name || ' ' || person.last_name ELSE (enc.edata->>'firstname')::text
||' ' || (enc.edata->>'lastname')::text END AS full_name 

        FROM "public".person AS person

        -- join to decrypt person data
        LEFT OUTER JOIN "public".secure AS sc ON
            CASE WHEN (ekey > 0) THEN sc.fk_org_id = org_id and sc.fk_entity_id = 6 and sc.fk_entity_obj_id = person.id
ELSEfalse END 
        LEFT JOIN lateral (
            SELECT
                CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 'utf8')::JSON ELSE NULL END AS edata
            FROM crypto_secretbox_open(
                sc.data,
                sc.nonce,
                boxkey)
        ) enc ON (ekey > 0)
        WHERE person.id = person_id

-----------------------
We have certain clients with encrypted data and this query needs to apply the decryption only if the record is deemed
tobe encrypted - this is determined by the variable ekey being > 0. The "failure" is the invocation of the lateral join
whenekey is 0 - i.e. its not encrypted data. The crypto function will fail if sc.data is null - as will be the case for
anon-encrypted record. 

Our assumption - probably incorrectly it seems, is that if the expression (ekey > 0) is false, PG will not invoke the
lateraljoin and hence not execute crypto. This has worked in PG11 and worked in 14.9. What is very odd is that in 14.10
thisstill seems to work in many cases, but not all. This particular function works in psql, and sometimes in the
applicationbut not always. The parameters are identical for when it works as expected and for when it does not - i.e,
whenit invokes the lateral even though ekey = 0 

To prove that the issue lies in this area alone, we have split the above function and run specific code for the two
casesof ekey which guarantees that it will never do the unexpected - this fixes the issue, but I would still like to
knowif our fundamental assumption about when the lateral runs was misguided 

Hope that gives more context

Z





Re: Odd Shortcut behaviour in PG14

From
Tom Lane
Date:
Zahir Lalani <ZahirLalani@oliver.agency> writes:
> Sorry Tom - let me try and clarify:
> (ekey is a variable passed into the function)

OK, so if this is a plpgsql function and ekey is a function
variable, the planner will definitely perceive this as a query
parameterized by the value of "ekey".  We will consider a
"custom" plan where the value is directly substituted into the
query (allowing plan-time folding based on whether ekey is zero
or not), but we will also consider a "generic" plan where the
value of ekey is not known at plan time so no such folding occurs,
and that's probably where your failure is happening.  Replanning
for every query execution is expensive so there's a preference
for using generic plans if we can.

I don't really understand why you wrote

>             SELECT
>                 CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open, 'utf8')::JSON ELSE NULL END AS edata
>             FROM crypto_secretbox_open(
>                 sc.data,
>                 sc.nonce,
>                 boxkey)

rather than just

            SELECT
                CASE WHEN (ekey > 0) THEN convert_from(
                     crypto_secretbox_open(sc.data,
                                           sc.nonce,
                                           boxkey),
                     'utf8')::JSON ELSE NULL END AS edata

I see no reason why you should feel entitled to assume that
crypto_secretbox_open won't get called in the first formulation.
The normal understanding of such a SELECT is that we evaluate
FROM and then apply the SELECT expressions to its result, so the
existence of a CASE in the SELECT expression doesn't cause the
function call in FROM to get bypassed.

Likewise, the fact that the JOIN ON condition is false seems
like a poor reason to assume that the join's input relation
won't get evaluated.

Another approach could be to force matters in the plpgsql logic:

    IF ekey > 0 THEN
        RETURN QUERY query-with-decryption;
    ELSE
        RETURN QUERY query-without-decryption;
    END IF;

which seems a good deal safer than relying on undocumented details
of planner optimization behavior.

I also wonder why you don't make crypto_secretbox_open a bit
more robust --- at the very least mark it strict (RETURNS NULL
ON NULL INPUT).

            regards, tom lane



RE: Odd Shortcut behaviour in PG14

From
Zahir Lalani
Date:
> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Friday, November 24, 2023 6:44 PM
> To: Zahir Lalani <ZahirLalani@oliver.agency>
> Cc: Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-
> generallists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Odd Shortcut behaviour in PG14
>
> OK, so if this is a plpgsql function and ekey is a function variable, the planner
> will definitely perceive this as a query parameterized by the value of "ekey".
> We will consider a "custom" plan where the value is directly substituted into
> the query (allowing plan-time folding based on whether ekey is zero or not),
> but we will also consider a "generic" plan where the value of ekey is not known
> at plan time so no such folding occurs, and that's probably where your failure
> is happening.  Replanning for every query execution is expensive so there's a
> preference for using generic plans if we can.
>
> I don't really understand why you wrote
>
> >             SELECT
> >                 CASE WHEN (ekey > 0) THEN convert_from(crypto_secretbox_open,
> 'utf8')::JSON ELSE NULL END AS edata
> >             FROM crypto_secretbox_open(
> >                 sc.data,
> >                 sc.nonce,
> >                 boxkey)
>
> rather than just
>
>             SELECT
>                 CASE WHEN (ekey > 0) THEN convert_from(
>                      crypto_secretbox_open(sc.data,
>                                            sc.nonce,
>                                            boxkey),
>                      'utf8')::JSON ELSE NULL END AS edata
>
> I see no reason why you should feel entitled to assume that
> crypto_secretbox_open won't get called in the first formulation.
> The normal understanding of such a SELECT is that we evaluate FROM and
> then apply the SELECT expressions to its result, so the existence of a CASE in
> the SELECT expression doesn't cause the function call in FROM to get
> bypassed.
>
> Likewise, the fact that the JOIN ON condition is false seems like a poor reason
> to assume that the join's input relation won't get evaluated.
>
> Another approach could be to force matters in the plpgsql logic:
>
>         IF ekey > 0 THEN
>             RETURN QUERY query-with-decryption;
>         ELSE
>             RETURN QUERY query-without-decryption;
>         END IF;
>
> which seems a good deal safer than relying on undocumented details of
> planner optimization behavior.
>
> I also wonder why you don't make crypto_secretbox_open a bit more robust -
> -- at the very least mark it strict (RETURNS NULL ON NULL INPUT).
>
>                         regards, tom lane


Thank you for the detailed explanation Tom - much appreciated.

So our assumption was made as it worked for so long - ok - so we have learnt not to do that again....

>         IF ekey > 0 THEN
>             RETURN QUERY query-with-decryption;
>         ELSE
>             RETURN QUERY query-without-decryption;
>         END IF;

This is indeed what we have done - we were trying to keep code maintenance down as the above requires updates in two
placesrather than 1 - but better that than a broken system! 

> I also wonder why you don't make crypto_secretbox_open a bit more robust -
> -- at the very least mark it strict (RETURNS NULL ON NULL INPUT).

This is a third party plugin (pgsodium).

Thank you to this group for your expertise!

Z