Re: [HACKERS] Removing LEFT JOINs in more cases - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Removing LEFT JOINs in more cases
Date
Msg-id 31978.1520176772@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> select distinct nextval('foo') from a left join b ...
>> The presence of the DISTINCT again doesn't excuse changing how often
>> nextval() gets called.

> While working on the cases where the join removal should be disallowed
> I discovered that the existing code is not too careful about this
> either:
> [ a volatile function can be removed in a case like this ]
> select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
> Should this be fixed? or is this case somehow not worth worrying about?

I don't find that example troubling.  The execution of functions in WHERE
has never been particularly constrained.  Would you insist, say, that
notice() be evaluated for every pair of rows in the cross product of
t1 and t2, even the ones that don't pass the t1.a = t2.a condition?
Or for a more interesting example, consider these two cases:

select * from t1, t2 where notice(t2.a) = t1.a;
select * from t1, t2 where notice(t2.a) = t2.b;

With our current implementation, the first will result in executing
notice() for every row pair in the cross product, while the second
will evaluate it only once per row of t2, because the condition is
pushed down to the scan level.  Should we stop doing that?

In short, the number of executions of functions in WHERE or JOIN/ON
isn't at all implementation-independent.  We document this in
https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
where it says

     It is particularly dangerous to rely on side effects or evaluation
     order in WHERE and HAVING clauses, since those clauses are
     extensively reprocessed as part of developing an execution
     plan.

Maybe we ought to be more verbose there, but I don't care to abandon
the principle that we can reorder WHERE clauses, or skip the evaluation
of unnecessary clauses, as much as we want.

The case I was complaining about upthread involved volatiles in
the SELECT target list, which *does* have a well-defined number
of executions, ie once per row produced by the FROM/WHERE clause.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: select_parallel test failure: gather sometimes losing tuples(maybe during rescans)?
Next
From: Pavel Stehule
Date:
Subject: Fwd: automatic disable unicode line style when terminal is not unicode