Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date
Msg-id CAExHW5sG2Q7aPAh4vpk85QhnuFfDBJYc3yFNGb43x6vc498rsA@mail.gmail.com
In response to Re: [PATCH] Erase the distinctClause if the result is unique by definition  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Mon, Feb 10, 2020 at 10:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
>>> On Sat, Feb 8, 2020 at 12:53 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>> Do you mean adding some information into PlannerInfo, and when we create
>>> a node for Unique/HashAggregate/Group, we can just create a dummy node?
>
>> Not so much in PlannerInfo but something along the lines of PathKey. See the
>> PathKey structure and related code. What I envision is that the PathKey class
>> is also annotated with the information whether that PathKey implies uniqueness.
>> E.g. a PathKey derived from a primary key index would imply uniqueness also. A
>> PathKey derived from, say, a Group operation also implies uniqueness. Then just
>> by looking at the underlying Path we would be able to say whether we need a
>> Group/Unique node on top of it or not. I think that would give it a much
>> wider use case and make it a very useful optimization.
>
> FWIW, that doesn't seem like a very prudent approach to me, because it
> confuses sorted-ness with unique-ness.  PathKeys are about sorting,
> but it's possible to have uniqueness guarantees without having sorted
> anything, for instance via hashed grouping.
>
> I haven't looked at this patch, but I'd expect it to use infrastructure
> related to query_is_distinct_for(), and that doesn't deal in PathKeys.
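Tom's point about hashed grouping can be seen in a few lines of C. The sketch below is not PostgreSQL code; the function hash_distinct() and the fixed 64-slot table are assumptions made for illustration. It deduplicates with a hash table, as hashed grouping does, and emits each value the first time it is seen, so the output is guaranteed unique but keeps the arrival order rather than any sort order that a PathKey could describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration, not PostgreSQL code. */
#define HASH_SLOTS 64 /* assumed fixed capacity of the hash table */

/* Copy each distinct value of in[0..n) to out, in first-seen order.
 * Returns the number of values written. */
static size_t hash_distinct(const int *in, size_t n, int *out)
{
    bool   used[HASH_SLOTS] = {false};
    int    keys[HASH_SLOTS] = {0};
    size_t nout = 0;

    /* Fewer inputs than slots guarantees probing always finds a free slot. */
    assert(n < HASH_SLOTS);

    for (size_t i = 0; i < n; i++) {
        /* Linear probing, starting from a simple modular hash. */
        size_t slot = (unsigned) in[i] % HASH_SLOTS;
        while (used[slot] && keys[slot] != in[i])
            slot = (slot + 1) % HASH_SLOTS;

        if (!used[slot]) {
            /* First occurrence: record it and emit it; no sorting anywhere. */
            used[slot] = true;
            keys[slot] = in[i];
            out[nout++] = in[i];
        }
    }
    return nout;
}
```

For the input {3, 1, 3, 2, 1} it returns 3 and writes {3, 1, 2}: every value appears once, yet the result is not sorted.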

Thanks for the pointer. I think there's another problem with my approach. PathKeys are specific to paths, since the order of the result depends on the Path. But uniqueness is a property of the result, i.e. of the relation, and thus should be attached to the RelOptInfo, as query_is_distinct_for() does.

I think uniqueness should bubble up the RelOptInfo tree, annotating each RelOptInfo with the minimum set of TLEs (target list entries) that make the result of that relation unique. We could then eliminate an extra Group/Unique node if the underlying RelOptInfo's unique column set is a subset of the columns on which uniqueness is required.
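The proposal above can be modelled outside the planner. This is a toy sketch under stated assumptions, not PostgreSQL code: UniqueKeys, add_key(), join_unique_keys() and distinct_is_redundant() are hypothetical names standing in for information that would hang off a RelOptInfo; columns are numbered 0..63 across the whole join tree so a key fits in a uint64_t bitmask; every key is assumed NOT NULL, as with a primary key, because DISTINCT treats NULLs as equal; and only inner equi-joins are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not PostgreSQL code. */
#define MAX_KEYS 16
#define COL(n) (UINT64_C(1) << (n)) /* assumed global column numbering */

/* Each key is a set of NOT NULL columns whose values are unique in the rel. */
typedef struct UniqueKeys {
    size_t   nkeys;
    uint64_t keys[MAX_KEYS];
} UniqueKeys;

/* True if some key of rel is a subset of cols. */
static bool covers_some_key(const UniqueKeys *rel, uint64_t cols)
{
    for (size_t i = 0; i < rel->nkeys; i++)
        if ((rel->keys[i] & ~cols) == 0)
            return true;
    return false;
}

/* Record a key unless an already recorded key is a subset of it,
 * which keeps the key list close to minimal. */
static void add_key(UniqueKeys *uk, uint64_t key)
{
    if (covers_some_key(uk, key))
        return;
    assert(uk->nkeys < MAX_KEYS);
    uk->keys[uk->nkeys++] = key;
}

/* A DISTINCT (Unique/Group node) on distinct_cols is unnecessary if the
 * input is already unique on a subset of those columns. */
static bool distinct_is_redundant(const UniqueKeys *rel, uint64_t distinct_cols)
{
    return covers_some_key(rel, distinct_cols);
}

/* Bubble uniqueness up through an inner equi-join. outer_eq_cols and
 * inner_eq_cols are the columns of each side constrained by the join's
 * equality clauses. result must not alias outer or inner. */
static void join_unique_keys(const UniqueKeys *outer, uint64_t outer_eq_cols,
                             const UniqueKeys *inner, uint64_t inner_eq_cols,
                             UniqueKeys *result)
{
    result->nkeys = 0;

    /* Each outer row matches at most one inner row: outer keys survive. */
    if (covers_some_key(inner, inner_eq_cols))
        for (size_t i = 0; i < outer->nkeys; i++)
            add_key(result, outer->keys[i]);

    /* Symmetric case: each inner row matches at most one outer row. */
    if (covers_some_key(outer, outer_eq_cols))
        for (size_t i = 0; i < inner->nkeys; i++)
            add_key(result, inner->keys[i]);

    /* Always valid for an inner join: an outer key plus an inner key. */
    for (size_t i = 0; i < outer->nkeys; i++)
        for (size_t j = 0; j < inner->nkeys; j++)
            add_key(result, outer->keys[i] | inner->keys[j]);
}
```

As an example, take orders(order_id = COL(0), customer_id = COL(1)) keyed on order_id, joined to customers(cust_id = COL(2), name = COL(3)) keyed on cust_id, on customer_id = cust_id. The joined relation keeps {order_id} as its only key, so SELECT DISTINCT order_id, name would need no Unique/Group node, while SELECT DISTINCT cust_id, name still would.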
--
Best Wishes,
Ashutosh Bapat
