Re: [HACKERS] Removing LEFT JOINs in more cases - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Removing LEFT JOINs in more cases
Date
Msg-id CAB7nPqRxD4GywWXZ4jRDZm9EZGzOj7mVA=Y9YKgn3_yUUMECbA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Removing LEFT JOINs in more cases  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Wed, Nov 22, 2017 at 10:30 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Nov 1, 2017 at 5:39 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>
>> In this case, the join *can* cause row duplicates, but the distinct or
>> group by would filter these out again anyway, so in these cases, we'd
>> not only get the benefit of not joining but also not having to remove
>> the duplicate rows caused by the join.
>
> +1.
>
>>
>> Given how simple the code is to support this, it seems to me to be
>> worth handling.
>>
>
> I find this patch very simple and still useful.
>
> @@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root,
> RelOptInfo *rel)
> +        if (root->parse->distinctClause != NIL)
> +            return true;
> +
> +        if (root->parse->groupClause != NIL && !root->parse->hasAggs)
> +            return true;
> +
>
> The other callers of rel_supports_distinctness() are looking for distinctness
> of the given relation, whereas the code change here are applicable to any
> relation, not just the given relation. I find that confusing. Looking at the
> way we are calling rel_supports_distinctness() in join_is_removable() this
> change looks inevitable, but still if we could reduce the confusion, that will
> be good. Also if we could avoid duplication of comment about unique index, that
> will be good.
>
> DISTINCT ON allows only a subset of columns being selected to be listed in that
> clause. I initially thought that specifying only a subset would be a problem
> and we should check whether the DISTINCT applies to all columns being selected.
> But that's not true, since DISTINCT ON would eliminate any duplicates in the
> columns listed in that clause, effectively deduplicating the row being
> selected. So, we are good there. May be you want to add a testcase with
> DISTINCT ON.

I am counting that as a review, which got no replies yet. The thing is
somewhat fresh so I am moving it to next CF with waiting on author as
status.
-- 
Michael


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Proposal: generic WAL compression
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] plpgsql - additional extra checks