Thread: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17623
Logged by:          Stamatis Zampetakis
Email address:      zabetak@gmail.com
PostgreSQL version: 14.5
Operating system:   Debian 10.2.1-6
Description:

Steps to reproduce:
CREATE TABLE emp (empno INT, name VARCHAR, deptno INT);
INSERT INTO emp VALUES (0, 'Alex', 0);
INSERT INTO emp VALUES (10, 'Bob', 1);

CREATE TABLE dept (deptno INT);
INSERT INTO dept VALUES (1);

SELECT e.name
FROM emp e 
INNER JOIN dept d ON e.deptno = d.deptno
WHERE (10 / e.empno) = 1

Actual output:
ERROR:  division by zero

Expected output:
Bob

The error is caused since the filter condition in the WHERE clause is
evaluated before the join. Filter push-down is a very common and powerful
optimization but when there are operators in the WHERE clause that may throw
(such as division, cast, etc) this optimization is unsafe. 

The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
applied to the result of FROM so in the case above pushing filters below the
join seems to violate the standard. 

Citing the standard:
"If all optional clauses are omitted, then the result of the <table
expression> is the same as the result of the
<from clause>. Otherwise, each specified clause is applied to the result of
the previously specified clause
and the result of the <table expression> is the result of the application of
the last specified clause."

One of the optional clauses mentioned in the previous paragraph is the
<where clause>. There seems to be a clearly defined order between the <from
clause>, which includes inner joins, and the <where clause>.


Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

From
"David G. Johnston"
Date:
On Wed, Sep 28, 2022 at 5:29 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      17623
Logged by:          Stamatis Zampetakis
Email address:      zabetak@gmail.com
PostgreSQL version: 14.5
Operating system:   Debian 10.2.1-6
Description:       

Steps to reproduce:
CREATE TABLE emp (empno INT, name VARCHAR, deptno INT);
INSERT INTO emp VALUES (0, 'Alex', 0);
INSERT INTO emp VALUES (10, 'Bob', 1);

CREATE TABLE dept (deptno INT);
INSERT INTO dept VALUES (1);

SELECT e.name
FROM emp e
INNER JOIN dept d ON e.deptno = d.deptno
WHERE (10 / e.empno) = 1

Actual output:
ERROR:  division by zero

Expected output:
Bob

The error is caused since the filter condition in the WHERE clause is
evaluated before the join. Filter push-down is a very common and powerful
optimization but when there are operators in the WHERE clause that may throw
(such as division, cast, etc) this optimization is unsafe.

 
The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
applied to the result of FROM so in the case above pushing filters below the
join seems to violate the standard.

The failure to document such a deviation from the standard can be considered a bug but not the deviation itself.  That is intentional.  In terms of trade-offs the current behavior seems reasonable.  You'd need a real example motivating a desire to make a change that will likely add complexity and cost to every query most of which work just fine with relevant clauses pushed down to restrict the volume of data that needs to be joined.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
>> The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
>> applied to the result of FROM so in the case above pushing filters below
>> the join seems to violate the standard.

> The failure to document such a deviation from the standard can be
> considered a bug but not the deviation itself.  That is intentional.  In
> terms of trade-offs the current behavior seems reasonable.

Enforcing such a restriction would be absolutely disastrous for the
performance of a large number of real-world queries.  I doubt that
any other RDBMS takes that spec wording literally either.

I'd suggest something like

    CASE WHEN e.empno = 0 THEN NULL ELSE 10 / e.empno END

if you need to program around the lack of guarantees about the
order of evaluation.  A WITH clause can also be used as an
optimization fence when you need one.

BTW, this *is* documented, see for example

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

            regards, tom lane



Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

From
Stamatis Zampetakis
Date:
I agree that brutally disabling filter pushdown is out of question. However, it may not be so bad in terms of performance if done selectively only for operators that may raise an error.

If for instance you have a big conjunction and only one conjunct contains a cast you can push everything below the join and keep only the cast on the top.

There may be other ways to retain performance while being inline with standard SQL semantics; I haven't given too much though about it.

I didn't find a relevant discussion in the archives so I thought it was worth putting this down for people who may bump into this in the future.

I don't have a strong opinion if it is worth trying to fix this or not cause I don't really know much about Postgres internals.

In any case, it may be good to document the current behavior somewhere.

Note that the workaround with CASE WHEN can address this simplistic example but it can easily get very complicated if you want to perform a cast based on the result of a complex join; every join condition has to become part of the WHEN clause.

There are probably other workarounds such as wrapping the join in a sub query, putting the risky computation in project, and the filtering in a outer query but these syntactic rewritings may become meaningless in the eyes of the query optimizer. I don't know what Postgres will do in this case but I know other optimizers that will happily pushdown the filter all the way down and on top of the scan.

All that to say that the SQL semantics that say that WHERE is applied on the result of FROM are quite well-known so when exceptions come into play things get tricky.

Apologies for the long message and many thanks for taking the time to look into this.



On Wed, Sep 28, 2022, 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
>> The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
>> applied to the result of FROM so in the case above pushing filters below
>> the join seems to violate the standard.

> The failure to document such a deviation from the standard can be
> considered a bug but not the deviation itself.  That is intentional.  In
> terms of trade-offs the current behavior seems reasonable.

Enforcing such a restriction would be absolutely disastrous for the
performance of a large number of real-world queries.  I doubt that
any other RDBMS takes that spec wording literally either.

I'd suggest something like

        CASE WHEN e.empno = 0 THEN NULL ELSE 10 / e.empno END

if you need to program around the lack of guarantees about the
order of evaluation.  A WITH clause can also be used as an
optimization fence when you need one.

BTW, this *is* documented, see for example

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

                        regards, tom lane

Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

From
Stamatis Zampetakis
Date:
Just for the sake of completeness, (in case it was not clear in my previous message), I would like to highlight that this problem is different from the evaluation order of predicates in the WHERE clause, which is implementation specific. Due to that, the documentation below (shared by Tom) does not apply here.


If we want to document this deviation from the standard it should be done in the page below:


Best,
Stamatis

On Wed, Sep 28, 2022 at 5:44 PM Stamatis Zampetakis <zabetak@gmail.com> wrote:
I agree that brutally disabling filter pushdown is out of question. However, it may not be so bad in terms of performance if done selectively only for operators that may raise an error.

If for instance you have a big conjunction and only one conjunct contains a cast you can push everything below the join and keep only the cast on the top.

There may be other ways to retain performance while being inline with standard SQL semantics; I haven't given too much though about it.

I didn't find a relevant discussion in the archives so I thought it was worth putting this down for people who may bump into this in the future.

I don't have a strong opinion if it is worth trying to fix this or not cause I don't really know much about Postgres internals.

In any case, it may be good to document the current behavior somewhere.

Note that the workaround with CASE WHEN can address this simplistic example but it can easily get very complicated if you want to perform a cast based on the result of a complex join; every join condition has to become part of the WHEN clause.

There are probably other workarounds such as wrapping the join in a sub query, putting the risky computation in project, and the filtering in a outer query but these syntactic rewritings may become meaningless in the eyes of the query optimizer. I don't know what Postgres will do in this case but I know other optimizers that will happily pushdown the filter all the way down and on top of the scan.

All that to say that the SQL semantics that say that WHERE is applied on the result of FROM are quite well-known so when exceptions come into play things get tricky.

Apologies for the long message and many thanks for taking the time to look into this.



On Wed, Sep 28, 2022, 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
>> The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
>> applied to the result of FROM so in the case above pushing filters below
>> the join seems to violate the standard.

> The failure to document such a deviation from the standard can be
> considered a bug but not the deviation itself.  That is intentional.  In
> terms of trade-offs the current behavior seems reasonable.

Enforcing such a restriction would be absolutely disastrous for the
performance of a large number of real-world queries.  I doubt that
any other RDBMS takes that spec wording literally either.

I'd suggest something like

        CASE WHEN e.empno = 0 THEN NULL ELSE 10 / e.empno END

if you need to program around the lack of guarantees about the
order of evaluation.  A WITH clause can also be used as an
optimization fence when you need one.

BTW, this *is* documented, see for example

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

                        regards, tom lane