Thread: [HACKERS] Removing LEFT JOINs in more cases

[HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
Hackers,

Normally we'll only ever remove a LEFT JOIN relation if it's unused
and there's no possibility that the join would cause row duplication.
To check that the join wouldn't cause row duplicate we make use of
proofs, such as unique indexes, or for sub-queries, we make use of
DISTINCT and GROUP BY clauses.

There's another case that we don't handle, and it's VERY simple to test for.

Quite simply, it seems we could remove the join in cases such as:

create table t1 (id int primary key);
create table t2 (id int primary key, b int not null);

insert into t2 values(1,1),(2,1);
insert into t1 values(1);

select distinct t1.* from t1 left join t2 on t1.id=t2.b;

and

select t1.id from t1 left join t2 on t1.id=t2.b GROUP BY t1.id;

but not:

select t1.id,count(*) from t1 left join t2 on t1.id=t2.b GROUP BY t1.id;

In this case, the join *can* cause row duplicates, but the distinct or
group by would filter these out again anyway, so in these cases, we'd
not only get the benefit of not joining but also not having to remove
the duplicate rows caused by the join.

Given how simple the code is to support this, it seems to me to be
worth handling.

A patch to do this is attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Removing LEFT JOINs in more cases

From
Ashutosh Bapat
Date:
On Wed, Nov 1, 2017 at 5:39 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

> In this case, the join *can* cause row duplicates, but the distinct or
> group by would filter these out again anyway, so in these cases, we'd
> not only get the benefit of not joining but also not having to remove
> the duplicate rows caused by the join.

+1.

>
> Given how simple the code is to support this, it seems to me to be
> worth handling.
>

I find this patch very simple and still useful.

@@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root,
RelOptInfo *rel)
+        if (root->parse->distinctClause != NIL)
+            return true;
+
+        if (root->parse->groupClause != NIL && !root->parse->hasAggs)
+            return true;
+

The other callers of rel_supports_distinctness() are looking for distinctness
of the given relation, whereas the code change here are applicable to any
relation, not just the given relation. I find that confusing. Looking at the
way we are calling rel_supports_distinctness() in join_is_removable() this
change looks inevitable, but still if we could reduce the confusion, that will
be good. Also if we could avoid duplication of comment about unique index, that
will be good.

DISTINCT ON allows only a subset of columns being selected to be listed in that
clause. I initially thought that specifying only a subset would be a problem
and we should check whether the DISTINCT applies to all columns being selected.
But that's not true, since DISTINCT ON would eliminate any duplicates in the
columns listed in that clause, effectively deduplicating the row being
selected. So, we are good there. May be you want to add a testcase with
DISTINCT ON.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Removing LEFT JOINs in more cases

From
Michael Paquier
Date:
On Wed, Nov 22, 2017 at 10:30 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Nov 1, 2017 at 5:39 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>
>> In this case, the join *can* cause row duplicates, but the distinct or
>> group by would filter these out again anyway, so in these cases, we'd
>> not only get the benefit of not joining but also not having to remove
>> the duplicate rows caused by the join.
>
> +1.
>
>>
>> Given how simple the code is to support this, it seems to me to be
>> worth handling.
>>
>
> I find this patch very simple and still useful.
>
> @@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root,
> RelOptInfo *rel)
> +        if (root->parse->distinctClause != NIL)
> +            return true;
> +
> +        if (root->parse->groupClause != NIL && !root->parse->hasAggs)
> +            return true;
> +
>
> The other callers of rel_supports_distinctness() are looking for distinctness
> of the given relation, whereas the code change here are applicable to any
> relation, not just the given relation. I find that confusing. Looking at the
> way we are calling rel_supports_distinctness() in join_is_removable() this
> change looks inevitable, but still if we could reduce the confusion, that will
> be good. Also if we could avoid duplication of comment about unique index, that
> will be good.
>
> DISTINCT ON allows only a subset of columns being selected to be listed in that
> clause. I initially thought that specifying only a subset would be a problem
> and we should check whether the DISTINCT applies to all columns being selected.
> But that's not true, since DISTINCT ON would eliminate any duplicates in the
> columns listed in that clause, effectively deduplicating the row being
> selected. So, we are good there. May be you want to add a testcase with
> DISTINCT ON.

I am counting that as a review, which got no replies yet. The thing is
somewhat fresh so I am moving it to next CF with waiting on author as
status.
-- 
Michael


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
Thanks for looking over this and my apologies for the delay in my
response. I've been on leave and out of range of the internet for most
of that time.

On 23 November 2017 at 02:30, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
> @@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root,
> RelOptInfo *rel)
> +        if (root->parse->distinctClause != NIL)
> +            return true;
> +
> +        if (root->parse->groupClause != NIL && !root->parse->hasAggs)
> +            return true;
> +
>
> The other callers of rel_supports_distinctness() are looking for distinctness
> of the given relation, whereas the code change here are applicable to any
> relation, not just the given relation. I find that confusing. Looking at the
> way we are calling rel_supports_distinctness() in join_is_removable() this
> change looks inevitable, but still if we could reduce the confusion, that will
> be good. Also if we could avoid duplication of comment about unique index, that
> will be good.

When I put this together I'd considered leaving
rel_supports_distinctness() alone since the other uses of the function
can't make use of the new checks. Since you mention this, then I think
it's likely better just to go with that and just special case the code
in join_is_removable() to check for rel_supports_distinctness() and
the DISTINCT/GROUP BY clauses too. This will mean less false positives
for usages such as in innerrel_is_unique() which would end up causing
a bit of extra work before possibly failing a bit later on. I've made
changes in the attached to do things this way.

> May be you want to add a testcase with DISTINCT ON.

Good idea. I've also added a test case for this, and also one to
ensure non-RTE_RELATION relations can be removed too.

Updated patch attached.

Thanks again for the review.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Removing LEFT JOINs in more cases

From
Ashutosh Bapat
Date:
On Thu, Nov 30, 2017 at 12:20 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Thanks for looking over this and my apologies for the delay in my
> response. I've been on leave and out of range of the internet for most
> of that time.
>
> On 23 November 2017 at 02:30, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>
>> @@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root,
>> RelOptInfo *rel)
>> +        if (root->parse->distinctClause != NIL)
>> +            return true;
>> +
>> +        if (root->parse->groupClause != NIL && !root->parse->hasAggs)
>> +            return true;
>> +
>>
>> The other callers of rel_supports_distinctness() are looking for distinctness
>> of the given relation, whereas the code change here are applicable to any
>> relation, not just the given relation. I find that confusing. Looking at the
>> way we are calling rel_supports_distinctness() in join_is_removable() this
>> change looks inevitable, but still if we could reduce the confusion, that will
>> be good. Also if we could avoid duplication of comment about unique index, that
>> will be good.
>
> When I put this together I'd considered leaving
> rel_supports_distinctness() alone since the other uses of the function
> can't make use of the new checks. Since you mention this, then I think
> it's likely better just to go with that and just special case the code
> in join_is_removable() to check for rel_supports_distinctness() and
> the DISTINCT/GROUP BY clauses too. This will mean less false positives
> for usages such as in innerrel_is_unique() which would end up causing
> a bit of extra work before possibly failing a bit later on. I've made
> changes in the attached to do things this way.
>
>> May be you want to add a testcase with DISTINCT ON.
>
> Good idea. I've also added a test case for this, and also one to
> ensure non-RTE_RELATION relations can be removed too.
>
> Updated patch attached.
>
> Thanks again for the review.

Thanks for considering those suggestions. The patch looks good to me.
It applies cleanly, compiles on my laptop without warnings or errors.
make check does not show any failures. Marking it as ready for
committer.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 1 December 2017 at 01:40, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> The patch looks good to me.
> It applies cleanly, compiles on my laptop without warnings or errors.
> make check does not show any failures. Marking it as ready for
> committer.

Great. Many thanks for the review!


-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ remove_left_join_distinct_2017-11-30.patch ]

I looked at this, and it seems like an interesting idea, but I'm
unconvinced that the specific conditions here are OK.

One clear problem is that you consider a DISTINCT to be an automatic
get-out-of-jail-free card, but that doesn't work in the presence of
aggregates:

     select distinct count(*) from a left join b on ...

So at the very least it needs to be more like

    if ((root->parse->distinctClause != NIL ||
         root->parse->groupClause != NIL) && !root->parse->hasAggs)

I believe it is probably necessary to reject this optimization if window
functions are present, as well, because DISTINCT would not happen till
after the wfunc calculation and wfuncs are definitely sensitive to the
number of rows they see.  (But maybe wfuncs + GROUP BY are OK.)

Another case that seems less than safe is if the proposed-to-be-dropped
left join is within the LHS of any LATERAL join.  For instance,

    select distinct ... from (a left join b ...), lateral nextval('foo');

The row duplication from b would affect how often nextval gets called,
and the presence of the DISTINCT later on doesn't entitle us to change
that.  (Maybe it'd be all right to perform the optimization if the
lateral function is known stable/immutable, but I think that's getting
too far afield; I'd be inclined to just drop the idea as soon as we
see a lateral dependency.)

Actually, that same case occurs for volatile functions in the tlist,
ie

    select distinct nextval('foo') from a left join b ...

The presence of the DISTINCT again doesn't excuse changing how often
nextval() gets called.

I kinda doubt this list of counterexamples is exhaustive, either;
it's just what occurred to me in five or ten minutes thought.
So maybe you can make this idea work, but you need to think much
harder about what the counterexamples are.

As far as code structure goes, it's surely not going to be sane to
have two copies of the checking logic, much less try to cram one
of them into a single if.  I'd suggest factoring it out into a
separate function that can be called from both places.

I'll set the patch back to Waiting on Author.

            regards, tom lane


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'll set the patch back to Waiting on Author.

Many thanks for looking at this. I'll try to resolve the things you've
mentioned this coming weekend.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
Andres Freund
Date:
On 2018-01-10 11:14:27 +1300, David Rowley wrote:
> On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'll set the patch back to Waiting on Author.
> 
> Many thanks for looking at this. I'll try to resolve the things you've
> mentioned this coming weekend.

This hasn't happened yet. As the last CF is already in progress I'm
unfortunately inclined to mark this returned with feedback?

Andres Freund


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 2 March 2018 at 09:49, Andres Freund <andres@anarazel.de> wrote:
> On 2018-01-10 11:14:27 +1300, David Rowley wrote:
>> On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > I'll set the patch back to Waiting on Author.
>>
>> Many thanks for looking at this. I'll try to resolve the things you've
>> mentioned this coming weekend.
>
> This hasn't happened yet. As the last CF is already in progress I'm
> unfortunately inclined to mark this returned with feedback?

I'm planning on making the required changes at the weekend.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
Andres Freund
Date:
On 2018-03-02 09:53:38 +1300, David Rowley wrote:
> On 2 March 2018 at 09:49, Andres Freund <andres@anarazel.de> wrote:
> > On 2018-01-10 11:14:27 +1300, David Rowley wrote:
> >> On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > I'll set the patch back to Waiting on Author.
> >>
> >> Many thanks for looking at this. I'll try to resolve the things you've
> >> mentioned this coming weekend.
> >
> > This hasn't happened yet. As the last CF is already in progress I'm
> > unfortunately inclined to mark this returned with feedback?
> 
> I'm planning on making the required changes at the weekend.

Sorry if I'm grumpy, but why shouldn't this just mean it's slipped to
the next fest?

- Andres


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 2 March 2018 at 09:56, Andres Freund <andres@anarazel.de> wrote:
> On 2018-03-02 09:53:38 +1300, David Rowley wrote:
>> On 2 March 2018 at 09:49, Andres Freund <andres@anarazel.de> wrote:
>> > On 2018-01-10 11:14:27 +1300, David Rowley wrote:
>> >> On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> > I'll set the patch back to Waiting on Author.
>> >>
>> >> Many thanks for looking at this. I'll try to resolve the things you've
>> >> mentioned this coming weekend.
>> >
>> > This hasn't happened yet. As the last CF is already in progress I'm
>> > unfortunately inclined to mark this returned with feedback?
>>
>> I'm planning on making the required changes at the weekend.
>
> Sorry if I'm grumpy, but why shouldn't this just mean it's slipped to
> the next fest?

Because in most of the world it's day 1 of the commitfest.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
Andres Freund
Date:
On 2018-03-02 09:57:35 +1300, David Rowley wrote:
> On 2 March 2018 at 09:56, Andres Freund <andres@anarazel.de> wrote:
> > On 2018-03-02 09:53:38 +1300, David Rowley wrote:
> >> I'm planning on making the required changes at the weekend.
> >
> > Sorry if I'm grumpy, but why shouldn't this just mean it's slipped to
> > the next fest?
> 
> Because in most of the world it's day 1 of the commitfest.

If a patch hasn't been updated after moved waiting-on-author from the
last CF, and the next CF started, that seems too late. Particularly in
the last CF.

Greetings,

Andres Freund


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 2 March 2018 at 09:59, Andres Freund <andres@anarazel.de> wrote:
> If a patch hasn't been updated after moved waiting-on-author from the
> last CF, and the next CF started, that seems too late. Particularly in
> the last CF.

I understand that I should have resolved these issues before the
commitfest, so please do what you see fit, but...

One observation is that it appears you're already running damage
control to triage the patches that will certainly not make it. I don't
think this particular patch is overly complex, so is that really a
good thing to do on the first of the month? It's certainly not that
encouraging for patch authors... I'd have imagined encouragement to be
better tactic at this stage, but will defer to your better judgment.

For the patch, it's a weekender patch for me, not related to 2q work.
Time is just harder to find, but I happen to have some this weekend
which I'd really like to dedicate to the patch. If you're willing to
hold fire until Sunday, if I've not done the work before then I'll
personally return it with feedback.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
Andres Freund
Date:
On 2018-03-02 10:14:52 +1300, David Rowley wrote:
> One observation is that it appears you're already running damage
> control to triage the patches that will certainly not make it. I don't
> think this particular patch is overly complex, so is that really a
> good thing to do on the first of the month? It's certainly not that
> encouraging for patch authors... I'd have imagined encouragement to be
> better tactic at this stage, but will defer to your better judgment.

Well, we have a 250 patch fest, with relatively few people doing active
review and commit work.  We're going to make unpleasant choice :(

I definitely agree it'd be a lot nicer if weren't as pressed for serious
review and committer cycles :(


> For the patch, it's a weekender patch for me, not related to 2q work.
> Time is just harder to find, but I happen to have some this weekend
> which I'd really like to dedicate to the patch. If you're willing to
> hold fire until Sunday, if I've not done the work before then I'll
> personally return it with feedback.

Ok.

Greetings,

Andres Freund


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>         select distinct nextval('foo') from a left join b ...
>
> The presence of the DISTINCT again doesn't excuse changing how often
> nextval() gets called.
>
> I kinda doubt this list of counterexamples is exhaustive, either;
> it's just what occurred to me in five or ten minutes thought.
> So maybe you can make this idea work, but you need to think much
> harder about what the counterexamples are.

While working on the cases where the join removal should be disallowed
I discovered that the existing code is not too careful about this
either:

drop table if exists t1;

create table t1 (a int);
insert into t1 values(1);

create or replace function notice(pn int) returns int as $$
begin
raise notice '%', pn;
return pn;
end;
$$ volatile language plpgsql;

create unique index t1_a_uidx on t1(a);

explain (costs off, analyze, timing off, summary off)
select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
               QUERY PLAN
----------------------------------------
 Seq Scan on t1 (actual rows=1 loops=1)
(1 row)

drop index t1_a_uidx; -- drop the index to disallow left join removal.

explain (costs off, analyze, timing off, summary off)
select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
NOTICE:  1
                        QUERY PLAN
----------------------------------------------------------
 Nested Loop Left Join (actual rows=1 loops=1)
   Join Filter: ((t1.a = t2.a) AND (notice(t2.a) = t1.a))
   ->  Seq Scan on t1 (actual rows=1 loops=1)
   ->  Seq Scan on t1 t2 (actual rows=1 loops=1)
(4 rows)

Should this be fixed? or is this case somehow not worth worrying about?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 4 March 2018 at 18:35, David Rowley <david.rowley@2ndquadrant.com> wrote:
> drop table if exists t1;
>
> create table t1 (a int);
> insert into t1 values(1);
>
> create or replace function notice(pn int) returns int as $$
> begin
> raise notice '%', pn;
> return pn;
> end;
> $$ volatile language plpgsql;
>
> create unique index t1_a_uidx on t1(a);
>
> explain (costs off, analyze, timing off, summary off)
> select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
>                QUERY PLAN
> ----------------------------------------
>  Seq Scan on t1 (actual rows=1 loops=1)
> (1 row)
>
> drop index t1_a_uidx; -- drop the index to disallow left join removal.
>
> explain (costs off, analyze, timing off, summary off)
> select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
> NOTICE:  1
>                         QUERY PLAN
> ----------------------------------------------------------
>  Nested Loop Left Join (actual rows=1 loops=1)
>    Join Filter: ((t1.a = t2.a) AND (notice(t2.a) = t1.a))
>    ->  Seq Scan on t1 (actual rows=1 loops=1)
>    ->  Seq Scan on t1 t2 (actual rows=1 loops=1)
> (4 rows)
>
> Should this be fixed? or is this case somehow not worth worrying about?

Please find attached two patches. The first of which is intended to
resolve the issue mentioned above with consideration that it may need
to be back-patched to where LEFT JOIN removals where introduced.

Patch two is intended to implement LEFT JOIN removal for cases that
any duplicates rows that the join causes would be subsequently removed
again via a GROUP BY or DISTINCT clause.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Removing LEFT JOINs in more cases

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 10 January 2018 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> select distinct nextval('foo') from a left join b ...
>> The presence of the DISTINCT again doesn't excuse changing how often
>> nextval() gets called.

> While working on the cases where the join removal should be disallowed
> I discovered that the existing code is not too careful about this
> either:
> [ a volatile function can be removed in a case like this ]
> select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
> Should this be fixed? or is this case somehow not worth worrying about?

I don't find that example troubling.  The execution of functions in WHERE
has never been particularly constrained.  Would you insist, say, that
notice() be evaluated for every pair of rows in the cross product of
t1 and t2, even the ones that don't pass the t1.a = t2.a condition?
Or for a more interesting example, consider these two cases:

select * from t1, t2 where notice(t2.a) = t1.a;
select * from t1, t2 where notice(t2.a) = t2.b;

With our current implementation, the first will result in executing
notice() for every row pair in the cross product, while the second
will evaluate it only once per row of t2, because the condition is
pushed down to the scan level.  Should we stop doing that?

In short, the number of executions of functions in WHERE or JOIN/ON
isn't at all implementation-independent.  We document this in
https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
where it says

     It is particularly dangerous to rely on side effects or evaluation
     order in WHERE and HAVING clauses, since those clauses are
     extensively reprocessed as part of developing an execution
     plan.

Maybe we ought to be more verbose there, but I don't care to abandon
the principle that we can reorder WHERE clauses, or skip the evaluation
of unnecessary clauses, as much as we want.

The case I was complaining about upthread involved volatiles in
the SELECT target list, which *does* have a well-defined number
of executions, ie once per row produced by the FROM/WHERE clause.

            regards, tom lane


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 5 March 2018 at 04:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> select * from t1, t2 where notice(t2.a) = t1.a;
> select * from t1, t2 where notice(t2.a) = t2.b;
>
> With our current implementation, the first will result in executing
> notice() for every row pair in the cross product, while the second
> will evaluate it only once per row of t2, because the condition is
> pushed down to the scan level.  Should we stop doing that?

I think we'd get complaints.

I admit to finding it difficult to know what the rules are here. For
example, qual_is_pushdown_safe() refuses to push volatile quals down
into subqueries. I'm a bit unsure why that's disallowed, but
reordering WHERE clause items containing volatile functions is
allowed.

Volatile functions in a subquery targetlist are not guaranteed to be
evaluated as we may push down a non-volatile qual into the subquery
causing fewer rows to be produced by the subquery resulting in fewer
evaluations of the targetlist expressions.

Here's an example of that happening:

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM (SELECT id,notice(id) FROM t1 GROUP BY id) t1 WHERE t1.id = -10;
                  QUERY PLAN
----------------------------------------------
 Group (actual rows=0 loops=1)
   Group Key: t1.id
   ->  Seq Scan on t1 (actual rows=0 loops=1)
         Filter: (id = '-10'::integer)
         Rows Removed by Filter: 1
(5 rows)

-- use OFFSET 0 to block the pushdown of the non-volatile qual.
EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM (SELECT id,notice(id) FROM t1 GROUP BY id OFFSET 0) t1
WHERE t1.id = -10;
NOTICE:  1
                       QUERY PLAN
---------------------------------------------------------
 Subquery Scan on t1 (actual rows=0 loops=1)
   Filter: (t1.id = '-10'::integer)
   Rows Removed by Filter: 1
   ->  HashAggregate (actual rows=1 loops=1)
         Group Key: t1_1.id
         ->  Seq Scan on t1 t1_1 (actual rows=1 loops=1)
(6 rows)

> In short, the number of executions of functions in WHERE or JOIN/ON
> isn't at all implementation-independent.  We document this in
> https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
> where it says
>
>      It is particularly dangerous to rely on side effects or evaluation
>      order in WHERE and HAVING clauses, since those clauses are
>      extensively reprocessed as part of developing an execution
>      plan.
>
> Maybe we ought to be more verbose there, but I don't care to abandon
> the principle that we can reorder WHERE clauses, or skip the evaluation
> of unnecessary clauses, as much as we want.
>
> The case I was complaining about upthread involved volatiles in
> the SELECT target list, which *does* have a well-defined number
> of executions, ie once per row produced by the FROM/WHERE clause.

I'm not so sure that's completely true.  If you add an SRF then we
seem to get an extra evaluation per row.

# select generate_Series(1,2),notice(id) from t1;
NOTICE:  1
NOTICE:  1
NOTICE:  1
 generate_series | notice
-----------------+--------
               1 |      1
               2 |      1
(2 rows)

You also complained about LATERAL joins to volatile functions, but
that too seems not guaranteed and seems to depend on the query plan
chosen, as highlighted in:

CREATE OR REPLACE FUNCTION notice(pn INT) RETURNS INT AS $$
BEGIN
RAISE NOTICE '%', pn;
RETURN pn;
END;
$$ VOLATILE LANGUAGE plpgsql;

CREATE TABLE t1 (id int);
CREATE TABLE t2 (a int);

INSERT INTO t1 values(1);
INSERT INTO t2 values(1),(1);

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.a, LATERAL notice(t1.id);
NOTICE:  1
                            QUERY PLAN
-------------------------------------------------------------------
 Merge Left Join (actual rows=2 loops=1)
   Merge Cond: (t1.id = t2.a)
   ->  Sort (actual rows=1 loops=1)
         Sort Key: t1.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop (actual rows=1 loops=1)
               ->  Seq Scan on t1 (actual rows=1 loops=1)
               ->  Function Scan on notice (actual rows=1 loops=1)
   ->  Sort (actual rows=2 loops=1)
         Sort Key: t2.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t2 (actual rows=2 loops=1)
(12 rows)

SET from_collapse_limit = 1;

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.a, LATERAL notice(t1.id);
NOTICE:  1
NOTICE:  1
                        QUERY PLAN
----------------------------------------------------------
 Nested Loop (actual rows=2 loops=1)
   ->  Merge Left Join (actual rows=2 loops=1)
         Merge Cond: (t1.id = t2.a)
         ->  Sort (actual rows=1 loops=1)
               Sort Key: t1.id
               Sort Method: quicksort  Memory: 25kB
               ->  Seq Scan on t1 (actual rows=1 loops=1)
         ->  Sort (actual rows=2 loops=1)
               Sort Key: t2.a
               Sort Method: quicksort  Memory: 25kB
               ->  Seq Scan on t2 (actual rows=2 loops=1)
   ->  Function Scan on notice (actual rows=1 loops=2)
(12 rows)

RESET from_collapse_limit;

In short, I'm just a little confused at our guarantees.  The best I
can see is that we guarantee a volatile function will be evaluated if
it's in the outer query's target list, providing the targetlist does
not have any set-returning functions present, which seems a long way
short of what the documents warn users about.

For now, if we ignore patch 0001, as neither of us is particularly
concerned about that. Patch 0002 might need to have the volatility
checks removed from the GROUP BY clause, but I'm a bit uncertain about
that given the examples I showed above.

Thanks for your feedback so far.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 4 March 2018 at 21:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Please find attached two patches. The first of which is intended to
> resolve the issue mentioned above with consideration that it may need
> to be back-patched to where LEFT JOIN removals where introduced.
>
> Patch two is intended to implement LEFT JOIN removal for cases that
> any duplicates rows that the join causes would be subsequently removed
> again via a GROUP BY or DISTINCT clause.

I've attached an updated version of this patch. This time there's only
1 patch. Per the discussion above about volatile functions, there's no
need to disallow the join removal when there are volatile functions in
the join clauses.

The only changes to the 0002 patch are now that it's named 0001 and
the tests were changed slightly as a plpgsql function was defined in
the old 0001 which we need for the new 0001.

http://commitfest.cputube.org was also complaining about the tests
failing. This was due to a plan changed caused by 1007b0a1. The plan
change was in the old 0001 patch, which I'm now dropping, so
hopefully, a new email with just the 1 patch will let cputube know
about me dropping the other one.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Removing LEFT JOINs in more cases

From
Antonin Houska
Date:
David Rowley <david.rowley@2ndquadrant.com> wrote:

> I've attached an updated version of this patch.

Following are the findings from my review.

On the LATERAL references:

This query (proposed upthread by Tom and adjusted by me so it can be executed
on the your test tables)

select distinct t1.id from t1 left join t2 on t1.id=t2.b, lateral nextval('foo');

is subject to join removal because in rel_distinctified_after_join() you check
brel->lateral_vars, which is NIL in this case.

On the other hand I'm not sure that user should expect any number of
executions of the function in this particular case because the actual join
order can change: it can happen that the function scan is first joined to t1
and t2 is joined to the result. The case that requires more caution is that
the function references both t1 and t2, i.e. *all* tables of the left join
from which you want to remove the inner relation. Maybe that's the reason you
introduced the test for brel->lateral_vars, but you forgot to check the actual
variables in the list.


On the volatile function in the targetlist:

Volatile function in the DISTINCT ON expression isn't necessarily noticed by
your checks:

select distinct on (nextval('foo')) t1.id from t1 left join t2 on t1.id=t2.b;

Perhaps you should simply check the whole targetlist if there's no GROUP BY
clause.


On coding:

  * join_is_removable(): Just like rel_distinctified_after_join() is called
     before rel_is_distinct_for() - obviously because it's cheaper - I suggest
     to call query_distinctifies_rows() before rel_supports_distinctness().

  * header comment of query_distinctifies_rows() mentions rel_tuples_needed,
    but I can't find such a function.

  * rel_distinctified_after_join() does not really use the "rel"
    argument. Besides removing it you probably should rename the function.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


Re: [HACKERS] Removing LEFT JOINs in more cases

From
David Rowley
Date:
On 18 September 2018 at 20:02, Antonin Houska <ah@cybertec.at> wrote:
> Following are the findings from my review.

Thanks for reviewing this.  Time is short in this commitfest to make
any changes, so I'm going to return this with feedback with the
intention of addressing the items from your review for the next 'fest.

Cheers

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services