I thought about the patch from the perspective of "grouped relations"
(especially [1]). When looking for the appropriate context within the thread,
I picked this message.
David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > It seems like the major intellectual complexity here is to figure out
> > how to detect inner-side-unique at reasonable cost.  I see that for
> > LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> > fine.  But for INNER joins it looks like you're just doing it over again
> > for every candidate join, and that seems mighty expensive.
> ... I'll look into that.
>
> The other thing I thought of was to add a dedicated list for unique
> indexes in RelOptInfo, this would also allow
> rel_supports_distinctness() to do something a bit smarter than just
> return false if there's no indexes. That might not buy us much though,
> but at least relations tend to have very little unique indexes, even
> when they have lots of indexes.
I'm thinking of a concept of "unique keys", similar to path keys that the
planner already uses. Besides the current evaluation of uniqueness of the
inner side of a join, the planner would (kind of) union the unique keys of the
joined rels, ie compute a list of expressions which generates an unique row
throughout the new join result. (Requirement is that each key must be usable
in join expression, as opposed to filter.)
To figure out whether at most one inner row exists per outer row, each unique
key of the inner relation which references the outer relation needs to match
an unique key of the outer relation (but it's probably wrong if multiple
unique keys of the inner rel reference the same key of the outer rel).
Like path key, the unique key would also point to an equivalence class. Thus
mere equality of the EC pointers could perhaps be used to evaluate the match
of the inner and outer keys.
Given that rel_is_distinct_for() currently does not accept joins, this change
would make the patch more generic. (BTW, with this approach, unique_rels and
non_unique_rels caches would have to be stored per-relation (RelOptInfo), as
opposed to PlannerInfo.)
The reason I'd like the unique keys is that - from the "grouped relation"
point of view - the relation uniqueness also needs to be checked against the
GROUP BY clause. Thus the "unique keys" concept seem to me like an useful
abstraction.
Does this proposal seem to have a serious flaw?
[1]
https://www.postgresql.org/message-id/CAKJS1f_h1CLff92B%3D%2BbdrMK2Nf3EfGWaJu2WbzQUYcSBUi02ag%40mail.gmail.com
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at