Re: Performance issue in foreign-key-aware join estimation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance issue in foreign-key-aware join estimation
Date
Msg-id 15112.1552856892@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance issue in foreign-key-aware join estimation  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Performance issue in foreign-key-aware join estimation
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ eclass_indexes_v3.patch ]

I looked at this for a little bit.  I'm on board with the basic idea
of assigning integer indexes to the ECs and keeping bitmapsets of
relevant EC indexes in RelOptInfos.  However ... man, is that
delete_eclass() thing ugly.  And slow, and fragile-feeling.

I think that maybe we could improve that situation by not trying to
maintain the RelOptInfo.eclass_indexes sets at all during the initial
construction of the ECs, but only setting them up after we've completed
doing that.  We already have an assumption that we know when EC merging
is done and it's safe to make pathkeys that reference the "canonical"
(merged) ECs, so it seems like that would be a point where we could
build the indexing bitmapsets safely.  This hinges on not needing the
bitmapsets till after that point, but it'd be easy to arrange some
Asserts verifying that we don't try to use them before that.

If that doesn't work (because we need the index info sooner), maybe we
could consider never removing ECs from eq_classes, so that their indices
never change, then teaching most/all of the loops over eq_classes to
ignore entries with non-null ec_merged.  But we probably want the indexing
bitmapsets to reflect canonical EC numbers, so we'd still have to do some
updating I fear -- or else be prepared to chase up the ec_merged links
when using the index bitmaps.

Stepping back a little bit, I wonder whether now is the time to rethink
the overall EC data structure, as foreseen in this old comment:

 * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
 * problem, for which there exist better data structures than simple lists.
 * If this code ever proves to be a bottleneck then it could be sped up ---
 * but for now, simple is beautiful.

I'm not very sure what a better data structure would really look like ---
after perusing Sedgewick's section on UNION-FIND, it seems like the
ec_merged links are more or less what he's recommending anyway, and the
expensive part of this is probably all the equal() tests to find whether
two proposed EC member expressions are already in the set of known
expressions.  So I'm just handwaving here.  But my point is that we
needn't feel wedded to the idea of keeping the ECs in a list, if we can
think of some better data structure for them.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: CREATE OR REPLACE AGGREGATE?
Next
From: Chapman Flack
Date:
Subject: Re: Fix XML handling with DOCTYPE