Re: [HACKERS] Removing LEFT JOINs in more cases - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Removing LEFT JOINs in more cases
Msg-id CAFjFpReA5GJ5s039iDc9EOWEP3j2k_GjrGKJ59XAVot8HFKt7g@mail.gmail.com
In response to [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Removing LEFT JOINs in more cases  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Wed, Nov 1, 2017 at 5:39 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

> In this case, the join *can* cause row duplicates, but the distinct or
> group by would filter these out again anyway, so in these cases, we'd
> not only get the benefit of not joining but also not having to remove
> the duplicate rows caused by the join.

+1.

>
> Given how simple the code is to support this, it seems to me to be
> worth handling.
>

I find this patch very simple and still useful.
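
To convince myself of the semantics, here is a minimal sketch using Python's
sqlite3 module. The tables a and b and their contents are made up for
illustration, and it only compares query results; it says nothing about what
the planner does. It shows that a LEFT JOIN to a table without a unique index
duplicates rows, that DISTINCT or GROUP BY without aggregates hides the
duplication, and that an aggregate exposes it, which I assume is why the patch
checks hasAggs.

```python
import sqlite3

# Hypothetical schema: b has no unique index on a_id, so the join can
# duplicate rows of a.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, x INTEGER);
    CREATE TABLE b (a_id INTEGER, note TEXT);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 20);
    INSERT INTO b VALUES (1, 'p'), (1, 'q'), (2, 'r');
""")

JOIN = "FROM a LEFT JOIN b ON b.a_id = a.id"
BASE = "FROM a"


def rows(sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


# Without DISTINCT/GROUP BY the join changes the result (a.id = 1 matches twice).
plain_joined = rows(f"SELECT a.x {JOIN}")
plain_base = rows(f"SELECT a.x {BASE}")

# With DISTINCT the duplicates are removed again, so the join is invisible.
distinct_joined = rows(f"SELECT DISTINCT a.x {JOIN}")
distinct_base = rows(f"SELECT DISTINCT a.x {BASE}")

# Same for GROUP BY when there are no aggregates.
group_joined = rows(f"SELECT a.x {JOIN} GROUP BY a.x")
group_base = rows(f"SELECT a.x {BASE} GROUP BY a.x")

# An aggregate sees the duplicated rows, so here the join cannot be dropped.
agg_joined = rows(f"SELECT a.x, count(*) {JOIN} GROUP BY a.x")
agg_base = rows(f"SELECT a.x, count(*) {BASE} GROUP BY a.x")

print(plain_joined != plain_base)        # join alone changes the result
print(distinct_joined == distinct_base)  # DISTINCT hides the join
print(group_joined == group_base)        # GROUP BY without aggregates too
print(agg_joined != agg_base)            # count(*) exposes the duplicates
```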

@@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
+        if (root->parse->distinctClause != NIL)
+            return true;
+
+        if (root->parse->groupClause != NIL && !root->parse->hasAggs)
+            return true;
+

The other callers of rel_supports_distinctness() are checking the distinctness
of the given relation, whereas the change here applies to any relation, not
just the given one. I find that confusing. Given the way join_is_removable()
calls rel_supports_distinctness(), this change looks inevitable, but it would
be good to reduce the confusion if we can. It would also be good to avoid
duplicating the comment about the unique index.

DISTINCT ON allows listing only a subset of the selected columns in that
clause. I initially thought that specifying only a subset would be a problem
and that we should check whether the DISTINCT applies to all the columns being
selected. But that's not the case: DISTINCT ON eliminates duplicates in the
columns listed in that clause, which effectively deduplicates the rows being
selected. So we are good there. Maybe you want to add a test case with
DISTINCT ON.
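
The DISTINCT ON reasoning above can be sketched as follows. SQLite has no
DISTINCT ON, so this is a plain-Python emulation under my own assumptions: the
helpers left_join() and distinct_on() and the sample rows are hypothetical,
and the query being modelled is

    SELECT DISTINCT ON (a.x) a.x, a.y
    FROM a LEFT JOIN b ON b.a_id = a.id
    ORDER BY a.x, a.y;

```python
def left_join(a_rows, b_rows):
    """Emulate a LEFT JOIN b ON b.a_id = a.id, keeping only a's columns.

    Each row of a appears once per matching row of b, or once if none match.
    """
    out = []
    for a in a_rows:
        matches = [b for b in b_rows if b["a_id"] == a["id"]]
        out.extend([a] * max(1, len(matches)))
    return out


def distinct_on(rows, on, order_by):
    """Emulate DISTINCT ON (on) ... ORDER BY order_by: first row per key."""
    first_per_key = {}
    for row in sorted(rows, key=lambda r: tuple(r[c] for c in order_by)):
        key = tuple(row[c] for c in on)
        first_per_key.setdefault(key, row)  # keep only the first row seen
    return [(r["x"], r["y"]) for r in first_per_key.values()]


# Made-up data: two rows of a share x = 10, and a.id = 1 matches b twice.
a = [
    {"id": 1, "x": 10, "y": "m"},
    {"id": 2, "x": 10, "y": "n"},
    {"id": 3, "x": 20, "y": "o"},
]
b = [{"a_id": 1}, {"a_id": 1}, {"a_id": 3}]

with_join = distinct_on(left_join(a, b), on=("x",), order_by=("x", "y"))
without_join = distinct_on(a, on=("x",), order_by=("x", "y"))

print(with_join == without_join)
```

The select list contains y, which is not in the DISTINCT ON list, yet the
result is the same with and without the join, because the join only adds
exact copies of rows that DISTINCT ON collapses anyway.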

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

