Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance improvement for joins where outer side is unique
Date
Msg-id 11488.1460054978@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Performance improvement for joins where outer side is unique  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ unique_joins_2016-04-07.patch ]

Just had a thought about this, which should have crystallized a long
time ago perhaps.  Where I'd originally imagined you were going with
this idea is to do what the thread title actually says, and check for
joins in which the *outer* side is unique.  I can't see that that's
of any value for nestloop or hash joins, but for merge joins, knowing
that the outer side is unique could be extremely valuable because
we could skip doing mark/restore backups on the inner side, hugely
reducing the cost when the inner side has many duplicates.  Now I'm
not asking for the initially-committed version of the patch to do that,
but I think we need to design it to be readily extensible to do so.

The problem with this is that it blows the current factorization around
add_paths_to_joinrel out of the water.  What we'd want is for the caller
(make_join_rel) to determine uniqueness on both sides, and pass that info
down to each of its two calls of add_paths_to_joinrel; otherwise we have
to do double the work because each run of add_paths_to_joinrel will have
to make those same two determinations.

This probably also means that encoding the uniqueness into JoinType is
a lost cause.  Expanding JOIN_INNER into four variants depending on
whether either or both sides are known unique, and ditto for JOIN_LEFT,
doesn't seem attractive at all.  I suspect we want to go back to your
original design with a separate bool flag (well, two bools now, but
anyway separate from JoinType).  Or maybe the variant JoinTypes still
are better, since they'd fit into switch() tests more naturally, but
it's a lot more debatable as to whether that notation is a win.

I apologize for leading you down the wrong path on the notational aspect,
but sometimes the right answer isn't clear till you've tried all the
possibilities.

Anyway, while refactoring the make_join_rel/add_paths_to_joinrel division
of labor wouldn't be such a big deal in itself, I don't want to commit a
change to JoinType only to undo it later; that would be too much churn.
So I think we need to resolve this question before we can move forward.
I don't know if you have time to look at this now --- my clock says it's
already Friday morning in New Zealand.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: Timeline following for logical slots
Next
From: Alvaro Herrera
Date:
Subject: Re: Performance improvement for joins where outer side is unique