On Mon, Feb 10, 2020 at 10:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
>> On Sat, Feb 8, 2020 at 12:53 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> Do you mean adding some information into PlannerInfo, and when we create
>> a node for Unique/HashAggregate/Group, we can just create a dummy node?
> Not so much PlannerInfo, but something along the lines of PathKey. See
> the PathKey structure and related code. What I envision is that the
> PathKey class is also annotated with information about whether that
> PathKey implies uniqueness. E.g. a PathKey derived from a primary index
> would also imply uniqueness. A PathKey derived from, say, a Group
> operation also implies uniqueness. Then just by looking at the
> underlying Path we would be able to say whether we need a Group/Unique
> node on top of it or not. I think that would make for a much wider use
> case and a very useful optimization.
FWIW, that doesn't seem like a very prudent approach to me, because it confuses sorted-ness with unique-ness. PathKeys are about sorting, but it's possible to have uniqueness guarantees without having sorted anything, for instance via hashed grouping.
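To make the sorted-ness vs. unique-ness distinction concrete, here is a toy C sketch of hashed grouping over integer keys. It is not PostgreSQL code; `HashSet`, `hashset_add` and `hashed_distinct` are hypothetical names. Its output contains each input value exactly once, but in first-seen order, so no sort-order annotation would describe the uniqueness it guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not PostgreSQL code): open-addressing hash set. */
#define TABLE_SIZE 64 /* must exceed the number of distinct keys */

typedef struct
{
    bool used[TABLE_SIZE];
    int  keys[TABLE_SIZE];
} HashSet;

static void hashset_init(HashSet *set)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        set->used[i] = false;
}

/* Insert key using linear probing; return true if it was not present yet. */
static bool hashset_add(HashSet *set, int key)
{
    size_t slot = (size_t) ((unsigned) key * 2654435761u) % TABLE_SIZE;

    for (size_t probes = 0; probes < TABLE_SIZE; probes++)
    {
        if (!set->used[slot])
        {
            set->used[slot] = true;
            set->keys[slot] = key;
            return true;
        }
        if (set->keys[slot] == key)
            return false;
        slot = (slot + 1) % TABLE_SIZE;
    }
    assert(!"hash table full");
    return false;
}

/*
 * Hashed grouping: emit each distinct input value once, in arrival order.
 * The result is unique, yet nothing about it is sorted.
 */
size_t hashed_distinct(const int *in, size_t n, int *out)
{
    HashSet set;
    size_t  nout = 0;

    hashset_init(&set);
    for (size_t i = 0; i < n; i++)
        if (hashset_add(&set, in[i]))
            out[nout++] = in[i];
    return nout;
}
```

For the input `{7, 3, 7, 1, 3, 9}` this yields `{7, 3, 1, 9}`: duplicate-free, but not in any sorted order.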
I haven't looked at this patch, but I'd expect it to use infrastructure related to query_is_distinct_for(), and that doesn't deal in PathKeys.
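As a rough illustration of a clause-based distinctness test that never looks at sort order, here is a sketch written against a hypothetical `ToyQuery` struct rather than PostgreSQL's `Query`. `toy_query_is_distinct_for` is only loosely modelled on the DISTINCT, GROUP BY and aggregation-without-GROUP-BY cases of query_is_distinct_for(); it deliberately omits equality-operator compatibility, grouping sets, set operations and set-returning functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed query; output columns are numbered. */
typedef struct
{
    const int *distinct_cols; /* SELECT DISTINCT columns (NULL if none) */
    size_t     n_distinct;
    const int *group_cols;    /* GROUP BY columns (NULL if none) */
    size_t     n_group;
    bool       has_aggs;      /* aggregates present */
} ToyQuery;

/* True if every column in sub[] also appears in super[]. */
static bool cols_subset(const int *sub, size_t nsub,
                        const int *super, size_t nsuper)
{
    for (size_t i = 0; i < nsub; i++)
    {
        bool found = false;

        for (size_t j = 0; j < nsuper && !found; j++)
            found = (sub[i] == super[j]);
        if (!found)
            return false;
    }
    return true;
}

/*
 * Is the output guaranteed distinct over colnos[]?  Decided purely from the
 * query's clauses; no sort order (PathKey-like information) is consulted.
 */
bool toy_query_is_distinct_for(const ToyQuery *q, const int *colnos, size_t ncols)
{
    /* DISTINCT on a subset of colnos makes rows unique on colnos too. */
    if (q->n_distinct > 0 &&
        cols_subset(q->distinct_cols, q->n_distinct, colnos, ncols))
        return true;

    if (q->n_group > 0)
    {
        /* GROUP BY produces one row per group. */
        if (cols_subset(q->group_cols, q->n_group, colnos, ncols))
            return true;
    }
    else if (q->has_aggs)
    {
        /* Aggregation without GROUP BY yields at most one row. */
        return true;
    }
    return false;
}
```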
Thanks for the pointer. I think there's another problem with my approach. PathKeys are specific to paths, since the order of the result depends on the Path. But uniqueness is a property of the result, i.e. of the relation, and thus should be attached to RelOptInfo, as query_is_distinct_for() does. I think uniqueness should bubble up the RelOptInfo tree, annotating each RelOptInfo with the minimum set of TLEs that make the result from that relation unique. Thus we could eliminate a redundant Group/Unique node if the underlying RelOptInfo's unique column set is a subset of the columns on which uniqueness is required.
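The bubbling-up idea can be sketched with a toy model. `ToyRel`, `toy_join_unique_keys` and `toy_unique_is_redundant` are hypothetical names, and column sets are bitmasks rather than lists of TLEs. The sketch assumes inner joins and NOT NULL key columns: each join output row pairs exactly one outer row with one inner row, so the union of an outer unique key and an inner unique key is unique for the join. A Group/Unique node on a set of required columns is then redundant if some unique key of the input is a subset of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not PostgreSQL's RelOptInfo.  Columns are
 * numbered 0..31 across the query; a column set is a bitmask. */
typedef uint32_t ColSet;

#define MAX_UKEYS 8

typedef struct
{
    ColSet ukeys[MAX_UKEYS]; /* each set is known to make rows unique */
    size_t nukeys;
} ToyRel;

static bool colset_is_subset(ColSet sub, ColSet super)
{
    return (sub & ~super) == 0;
}

/*
 * Inner join only, key columns assumed NOT NULL: outer key | inner key
 * identifies a join row.  Keys beyond MAX_UKEYS are dropped, which is
 * safe because knowing fewer unique keys is merely conservative.
 */
ToyRel toy_join_unique_keys(const ToyRel *outer, const ToyRel *inner)
{
    ToyRel joinrel = { {0}, 0 };

    for (size_t i = 0; i < outer->nukeys; i++)
        for (size_t j = 0; j < inner->nukeys; j++)
            if (joinrel.nukeys < MAX_UKEYS)
                joinrel.ukeys[joinrel.nukeys++] =
                    outer->ukeys[i] | inner->ukeys[j];
    return joinrel;
}

/* Group/Unique over 'required' is redundant if some unique key is a subset. */
bool toy_unique_is_redundant(const ToyRel *rel, ColSet required)
{
    for (size_t i = 0; i < rel->nukeys; i++)
        if (colset_is_subset(rel->ukeys[i], required))
            return true;
    return false;
}
```

The join rule here is deliberately conservative: it ignores join clauses (a join on a unique inner column could let the outer key alone stay unique) and does not reduce the sets to the minimal ones suggested above.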