Re: Performance issue in foreign-key-aware join estimation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance issue in foreign-key-aware join estimation
Date
Msg-id 22399.1552071950@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance issue in foreign-key-aware join estimation  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Performance issue in foreign-key-aware join estimation
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 21 Feb 2019 at 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, I rebased the POC patch up to HEAD, just in case anyone
>> still wants to play with it.

> Cool. Thanks.

I haven't done any of the performance testing that this patch
clearly needs, but just in a quick look through the code:

* I seriously dislike the usage of root->eq_classes, primarily the
underdocumented way that it means one thing in some phases of the
planner and something else in other phases.  You seem to be doing that
in hopes of saving some memory, but is the index data structure really
large enough to matter?  This scheme is certainly unlikely to continue
to work if we add any additional uses of EquivalenceClassIndexes.
So I think it might be better to pass them around as explicit
arguments and/or attach them someplace else than PlannerInfo.

* I'm also not very excited about having both a fast and slow path
in places like has_relevant_eclass_joinclause() depending on whether
the index exists or not.  IMO, if we're going to do this at all,
we should just go over to requiring the index to exist when needed,
and get rid of the slow paths.  (A possible variant to that is "build
the index structure on-demand", though you'd have to be careful about
GEQO memory management.)  Otherwise we'll forever be fighting hidden
planner-performance bugs of the form "if you call function xyz from
here, it'll be way slower than you expected".

* There's not much point in making EquivalenceClassIndex a Node type
if you're not going to wire it up to any of the Node infrastructure
(particularly outfuncs.c, which might be useful for debug purposes).
But I'm not really sure that it ought to be a distinct data structure
at all --- maybe this requirement should be more tightly integrated
with the ECs themselves?  One idea of what that might look like is to
let each base RelOptInfo have a list of associated EquivalenceClasses,
so that you'd only have to search through directly-relevant ECs when
trying to prove something.  But I'm just handwaving here.

* Spell check please, particularly EQUIVALANCE.

* Documentation of the data structure is pretty weak, eg what is
"this relation" in the comment about ei_index?  And how do you
know how long the arrays are, or what they're indexed by?  And
there's no explicit statement that ei_flags is a bitmask of those
other symbols, much less any real statement of what each flag means.

Setting the CF entry to WOA for now.  I wonder though if we should
just push it out to v13 immediately --- are you intending to do more
with it in the near future?

            regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
Next
From: Magnus Hagander
Date:
Subject: Re: Reaping Temp tables to avoid XID wraparound