Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers

From David Rowley
Subject Re: [PoC] Reducing planning time when tables have many partitions
Date
Msg-id CAApHDvqMut8wjN8vgHAot6DoHZ1FcGjtuRGaeCrGVvfWVqcmdQ@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Reducing planning time when tables have many partitions  (Yuya Watari <watari.yuya@gmail.com>)
Responses Re: [PoC] Reducing planning time when tables have many partitions
List pgsql-hackers
On Fri, 26 Aug 2022 at 12:40, Yuya Watari <watari.yuya@gmail.com> wrote:
> This performance degradation is due to the heavy processing of the
> get_ec***_indexes***() functions. These functions are the core part of
> the optimization we are working on in this thread, but they are
> relatively heavy when the number of partitions is small.
>
> I noticed that these functions were called repeatedly with the same
> arguments. During planning Query B with one partition, the
> get_ec_source_indexes_strict() function was called 2087 times with
> exactly the same parameters. Such repeated calls occurred many times
> in a single query.

How about instead of doing this caching like this, why don't we code
up some iterators that we can loop over to fetch the required EMs.

I'll attempt to type out my thoughts here without actually trying to
see if this works:

typedef struct EquivalenceMemberIterator
{
   EquivalenceClass *ec;
   Relids relids;
   Bitmapset *em_matches;
   int   position; /* last found index of em_matches or -1 */
   bool use_index;
   bool with_children;
   bool with_norel_members;
} EquivalenceMemberIterator;

We'd then have functions like:

static void
get_ecmember_indexes_iterator(EquivalenceMemberIterator *it,
PlannerInfo *root, EquivalenceClass *ec, Relids relids, bool
with_children, bool with_norel_members)
{
    it->ec = ec;
    it->relids = relids;
    it->position = -1;

    it->use_index = (root->simple_rel_array_size > 32); /* or whatever
threshold is best */
    it->with_children = with_children;
    it->with_norel_members = with_norel_members;

    if (it->use_index)
        it->em_matches = get_ecmember_indexes(root, ec, relids,
with_children, with_norel_members);
   else
       it->em_matches = NULL;
}

static EquivalenceMember *
get_next_matching_member(PlannerInfo *root, EquivalenceMemberIterator *it)
{
   if (it->use_index)
   {
        it->position = bms_next_member(it->ec_matches, it->position);
        if (it->position >= 0)
             return list_nth(root->eq_members, it->position);
        return NULL;
    }
    else
    {
         int i = it->position;
         while ((i = bms_next_member(it->ec->ec_member_indexes, i) >= 0)
          {
                /* filter out the EMs we don't want here "break" when
we find a match */
          }
          it->position = i;
          if (i >= 0)
             return list_nth(root->eq_members, i);
          return NULL;
    }
}

Then the consuming code will do something like:

EquivalenceMemberIterator iterator;
get_ecmember_indexes_iterator(&iterator, root, ec, relids, true, false);

while ((cur_em = get_next_matching_member(root, &it)) != NULL)
{
 // do stuff
}

David



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: [PATCH] Optimize json_lex_string by batching character copying
Next
From: Nathan Bossart
Date:
Subject: Re: use SSE2 for is_valid_ascii