Re: Performance issue in foreign-key-aware join estimation - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Performance issue in foreign-key-aware join estimation |
Date | |
Msg-id | CAKJS1f8YLCxx2=XS+VP0niWN_oYw_62KjL0eSL5iOkqnh-Ww2A@mail.gmail.com Whole thread Raw |
In response to | Re: Performance issue in foreign-key-aware join estimation (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Performance issue in foreign-key-aware join estimation
|
List | pgsql-hackers |
Thanks for looking at this. On Mon, 18 Mar 2019 at 10:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked at this for a little bit. I'm on board with the basic idea > of assigning integer indexes to the ECs and keeping bitmapsets of > relevant EC indexes in RelOptInfos. However ... man, is that > delete_eclass() thing ugly. And slow, and fragile-feeling. Yeah. When I discovered we remove eclasses from the list I was a little annoyed as the code was pretty simple before having to account for that. I'll grant you ugly and slow, but I don't quite see the fragile part. > I think that maybe we could improve that situation by not trying to > maintain the RelOptInfo.eclass_indexes sets at all during the initial > construction of the ECs, but only setting them up after we've completed > doing that. We already have an assumption that we know when EC merging > is done and it's safe to make pathkeys that reference the "canonical" > (merged) ECs, so it seems like that would be a point where we could > build the indexing bitmapsets safely. This hinges on not needing the > bitmapsets till after that point, but it'd be easy to arrange some > Asserts verifying that we don't try to use them before that. I don't think that's really that easily workable. Consider, for example, when add_child_rel_equivalences() is called, this is well after the location I think you have in mind. Probably this function should be modified to use the indexes, but it must also update the indexes too. > If that doesn't work (because we need the index info sooner), maybe we > could consider never removing ECs from eq_classes, so that their indices > never change, then teaching most/all of the loops over eq_classes to > ignore entries with non-null ec_merged. But we probably want the indexing > bitmapsets to reflect canonical EC numbers, so we'd still have to do some > updating I fear -- or else be prepared to chase up the ec_merged links > when using the index bitmaps. That's probably a better solution. Perhaps we can continue to nullify the ec_members, then just skip eclasses with a NIL ec_members. I avoided that in the patch because I was worried about what extension might be doing, but if you think it's okay, then I can change the patch. > Stepping back a little bit, I wonder whether now is the time to rethink > the overall EC data structure, as foreseen in this old comment: > > * Note: constructing merged EquivalenceClasses is a standard UNION-FIND > * problem, for which there exist better data structures than simple lists. > * If this code ever proves to be a bottleneck then it could be sped up --- > * but for now, simple is beautiful. I've thought for a while now that it wouldn't be too hard to have equal(), or some version of it return -1, 0, 1 to allow us to binary search or build a binary search tree of nodes. I imagine it wouldn't be too hard to create a compact binary search tree inside an array with indexes to the left and right sub-tree instead of pointers. That would likely be a bit more cache friendly and allow simple non-recursive traversals of the entire tree. However, that does not sound like something this patch should be doing. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: