Home > mailing lists

Re: Performance issue in foreign-key-aware join estimation - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Performance issue in foreign-key-aware join estimation
Date	March 18, 2019 01:06:19
Msg-id	CAKJS1f8YLCxx2=XS+VP0niWN_oYw_62KjL0eSL5iOkqnh-Ww2A@mail.gmail.com Whole thread
In response to	Re: Performance issue in foreign-key-aware join estimation (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Performance issue in foreign-key-aware join estimation
List	pgsql-hackers

Tree view

Thanks for looking at this.

On Mon, 18 Mar 2019 at 10:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked at this for a little bit.  I'm on board with the basic idea
> of assigning integer indexes to the ECs and keeping bitmapsets of
> relevant EC indexes in RelOptInfos.  However ... man, is that
> delete_eclass() thing ugly.  And slow, and fragile-feeling.

Yeah. When I discovered we remove eclasses from the list I was a
little annoyed as the code was pretty simple before having to account
for that.  I'll grant you ugly and slow, but I don't quite see the
fragile part.

> I think that maybe we could improve that situation by not trying to
> maintain the RelOptInfo.eclass_indexes sets at all during the initial
> construction of the ECs, but only setting them up after we've completed
> doing that.  We already have an assumption that we know when EC merging
> is done and it's safe to make pathkeys that reference the "canonical"
> (merged) ECs, so it seems like that would be a point where we could
> build the indexing bitmapsets safely.  This hinges on not needing the
> bitmapsets till after that point, but it'd be easy to arrange some
> Asserts verifying that we don't try to use them before that.

I don't think that's really that easily workable.  Consider, for
example, when add_child_rel_equivalences() is called, this is well
after the location I think you have in mind. Probably this function
should be modified to use the indexes, but it must also update the
indexes too.

> If that doesn't work (because we need the index info sooner), maybe we
> could consider never removing ECs from eq_classes, so that their indices
> never change, then teaching most/all of the loops over eq_classes to
> ignore entries with non-null ec_merged.  But we probably want the indexing
> bitmapsets to reflect canonical EC numbers, so we'd still have to do some
> updating I fear -- or else be prepared to chase up the ec_merged links
> when using the index bitmaps.

That's probably a better solution. Perhaps we can continue to nullify
the ec_members, then just skip eclasses with a NIL ec_members.  I
avoided that in the patch because I was worried about what extension
might be doing, but if you think it's okay, then I can change the
patch.

> Stepping back a little bit, I wonder whether now is the time to rethink
> the overall EC data structure, as foreseen in this old comment:
>
>  * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
>  * problem, for which there exist better data structures than simple lists.
>  * If this code ever proves to be a bottleneck then it could be sped up ---
>  * but for now, simple is beautiful.

I've thought for a while now that it wouldn't be too hard to have
equal(), or some version of it return -1, 0, 1 to allow us to binary
search or build a binary search tree of nodes.  I imagine it wouldn't
be too hard to create a compact binary search tree inside an array
with indexes to the left and right sub-tree instead of pointers. That
would likely be a bit more cache friendly and allow simple
non-recursive traversals of the entire tree.  However, that does not
sound like something this patch should be doing.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Peter Geoghegan
Date: 18 March 2019, 00:59:25
Subject: Re: Making all nbtree entries unique by having heap TIDs participatein comparisons

From: "Jamison, Kirk"
Date: 18 March 2019, 01:08:34
Subject: RE: Timeout parameters

Re: Performance issue in foreign-key-aware join estimation - Mailing list pgsql-hackers

Previous

Next