Re: [HACKERS] Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Performance improvement for joins where outer side is unique
Date
Msg-id 21765.1489431029@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
[ getting back to this patch finally... ]

David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached a patch which implements this, though only for
> MergeJoin, else I'd imagine we'd also need to ensure all proofs used
> for testing the uniqueness were also hash-able too. I added some XXX
> comments in analyzejoin.c around the mergeopfamilies == NIL tests to
> mention that Merge Join depends on all the unique proof quals having
> mergeopfamilies.  This also assumes we'll never use some subset of
> mergejoin-able quals for a merge join, which could be an interesting
> area in the future, as we might have some btree index on a subset of
> those columns to provide pre-sorted input. In short, it's a great
> optimisation, but I'm a little scared we might break it one day.

Umm ... it's broken already isn't it?  See the comment for
generate_mergejoin_paths:
* We generate mergejoins if mergejoin clauses are available.  We have* two ways to generate the inner path for a
mergejoin:sort the cheapest* inner path, or use an inner path that is already suitably ordered for the* merge.  If we
haveseveral mergeclauses, it could be that there is no inner* path (or only a very expensive one) for the full list of
mergeclauses,but* better paths exist if we truncate the mergeclause list (thereby discarding* some sort key
requirements). So, we consider truncations of the* mergeclause list as well as the full list.  (Ideally we'd consider
all*subsets of the mergeclause list, but that seems way too expensive.)
 

There's another, more subtle, way in which it could fail in
sort_inner_and_outer():
* Each possible ordering of the available mergejoin clauses will generate* a differently-sorted result path at
essentiallythe same cost.  We have* no basis for choosing one over another at this level of joining, but* some sort
ordersmay be more useful than others for higher-level* mergejoins, so it's worth considering multiple orderings.**
Actually,it's not quite true that every mergeclause ordering will* generate a different path order, because some of the
clausesmay be* partially redundant (refer to the same EquivalenceClasses).  Therefore,* what we do is convert the
mergeclauselist to a list of canonical* pathkeys, and then consider different orderings of the pathkeys.
 

I'm fairly sure that it's possible to end up with fewer pathkeys than
there are mergeclauses in this code path.  Now, you might be all right
anyway given that the mergeclauses must refer to the same ECs in such a
case --- maybe they're fully redundant and we can take testing the
included clause as proving the omitted one(s) too.  I'm not certain
right now what I meant by "partially redundant" in this comment.
But in any case, it's moot for the present purpose because
generate_mergejoin_paths certainly breaks your assumption.
        regards, tom lane



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: [HACKERS] PATCH: Configurable file mode mask
Next
From: Kevin Grittner
Date:
Subject: Re: [HACKERS] delta relations in AFTER triggers