Re: Performance issue in foreign-key-aware join estimation - Mailing list pgsql-hackers

From David Rowley
Subject Re: Performance issue in foreign-key-aware join estimation
Date
Msg-id CAKJS1f8w+1a51L1pmiE_-YSUmV5qwQKXsfueQb7=Nnhft4_F0A@mail.gmail.com
Whole thread Raw
In response to Re: Performance issue in foreign-key-aware join estimation  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Performance issue in foreign-key-aware join estimation  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Sat, 22 Dec 2018 at 10:53, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Back in [1], I mentioned that I'd like to start moving away from our
> linked list based implementation of List and start using an array
> based version instead. If we had this we could easily further improve
> this code fkey matching code to not even look near equivalence classes
> that don't contain members for the relations in question with a design
> something like:
>
> 1. Make PlannerInfo->eq_classes an array list instead of an array,
> this will significantly improve the performance of list_nth().
> 2. Have a Bitmapset per relation that indexes which items in
> eq_classes have ec_members for the given relation.
> 3. In match_eclasses_to_foreign_key_col() perform a bms_overlap() on
> the eq_classes index bitmapsets for the relations at either side of
> the foreign key then perform a bms_next_member() type loop over the
> result of that in order to skip over the eq_classes items that can't
> match.

Using the above idea, but rather than going to the trouble of storing
PlannerInfo->eq_classes as an array type list, if we build the array
on the fly inside match_foreign_keys_to_quals(), then build a
Bitmapset type index to mark which of the eclasses contains members
for each relation, then I can get the run-time for the function down
to just 0.89%.  Looking at other functions appearing high on the
profile I also see; have_relevant_eclass_joinclause() (14%),
generate_join_implied_equalities_for_ecs() (23%).

I think if we seriously want to improve planning performance when
there are many stored equivalence classes, then we need to have
indexing along the lines of what I've outlined above.

I've attached the patch I used to test this idea.  It might be
possible to develop this into something committable, perhaps if we
invent a new function in equivclass.c that builds the index into a
single struct and we pass a pointer to that down to the functions that
require the index.  Such a function could also optionally skip
indexing eclasses such as ones with ec_has_volatile or ec_has_const
when the use case for the index requires ignoring such eclasses.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Function to track shmem reinit time
Next
From: Noah Misch
Date:
Subject: Move regression.diffs of pg_upgrade test suite