Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers

From David Rowley
Subject Re: [PoC] Reducing planning time when tables have many partitions
Date
Msg-id CAApHDvpvofFuGLyv0b_HsZXoRxa6rmG6=BSE65VbN5datyhAJQ@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Reducing planning time when tables have many partitions  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: [PoC] Reducing planning time when tables have many partitions
List pgsql-hackers
esOn Tue, 9 Aug 2022 at 19:10, David Rowley <dgrowleyml@gmail.com> wrote:
> I've not had a chance to look at the 0003 patch yet.

I've looked at the 0003 patch now.

The performance numbers look quite impressive, however, there were a
few things about the patch that I struggled to figure what they were
done the way you did them:

+ root->eq_not_children_indexes = bms_add_member(root->eq_not_children_indexes,

Why is that in PlannerInfo rather than in the EquivalenceClass?

  if (bms_equal(rel->relids, em->em_relids))
  {
      rel->eclass_member_indexes =
bms_add_member(rel->eclass_member_indexes, em_index);
  }

Why are you only adding the eclass_member_index to the RelOptInfo when
the em_relids contain a singleton relation?

I ended up going and fixing the patch to be more how I imagined it.

I've ended up with 3 Bitmapset fields in EquivalenceClass;
ec_member_indexes, ec_nonchild_indexes, ec_norel_indexes.  I also
trimmed the number of helper functions down for obtaining the minimal
set of matching EquivalenceMember indexes to just:

Bitmapset *
get_ecmember_indexes(PlannerInfo *root, EquivalenceClass *ec, Relids relids,
bool with_children, bool with_norel_members)

Bitmapset *
get_ecmember_indexes_strict(PlannerInfo *root, EquivalenceClass *ec,
Relids relids, bool with_children,
bool with_norel_members)

I'm not so much a fan of the bool parameters, but it seemed better
than having 8 different functions with each combination of the bool
paramters instead of 2.

The "strict" version of the function takes the intersection of
eclass_member_indexes for each rel mentioned in relids, whereas the
non-strict version does a union of those.  Each then intersect that
with all members in the 'ec', or just the non-child members when
'with_children' is false.  They both then optionally bms_add_members()
the ec_norel_members if with_norel_members is true.  I found it
difficult to figure out the best order to do the intersection. That
really depends on if the particular query has many EquivalenceClasses
with few EquivalenceMembers or few EquivalenceClasses with many
EquivalenceMembers. bms_int_members() always recycles the left input.
Ideally, that would always be the smallest Bitmapset. Maybe it's worth
inventing a new version of bms_int_members() that recycles the input
with the least nwords. That would give the subsequent
bms_next_member() calls an easier time. Right now they'll need to loop
over a bunch of 0 words at the end for many queries.

A few problems I ran into along the way:

1. generate_append_tlist() generates Vars with varno=0.  That causes
problems when we add Exprs from those in add_eq_member() as there is
no element at root->simple_rel_array[0] to add eclass_member_indexes
to.
2. The existing comment for EquivalenceMember.em_relids claims "all
relids appearing in em_expr", but that's just not true when it comes
to em_is_child members.

So far, I fixed #1 by adding a hack to setup_simple_rel_arrays() to do
"root->simple_rel_array[0] = makeNode(RelOptInfo);" I'm not suggesting
that's the correct fix. It might be possible to set the varnos to the
varnos from the first Append child instead.

The fact that #2 is not true adds quite a bit of complexity to the
patch and I think the patch might even misbehave as a result. It seems
there are cases where a child em_relids can contain additional relids
that are not present in the em_expr. For example, when a UNION ALL
child has a Const in the targetlist, as explained in a comment in
add_child_rel_equivalences(). However, there also seem to be cases
where the opposite is true.  I had to add the following code in
add_eq_member() to stop a regression test failing:

if (is_child)
    expr_relids = bms_add_members(expr_relids, relids);

That's to make sure we add eclass_member_indexes to each RelOptInfo
mentioned in the em_expr.

After doing all that, I noticed that your benchmark was showing that
create_join_clause() was the new bottleneck. This was due to having to
loop so many times over the ec_sources to find an already built
RestrictInfo. I went off and added some new code to optimize the
lookup of those in a similar way by adding a new Bitmapset field in
RelOptInfo to index which ec_sources it mentioned, which meant having
to move ec_sources into PlannerInfo. I don't think this part of the
patch is quite right yet as the code I have relies on em_relids being
the same as the ones mentioned in the RestrictInfo. That seems not
true for em_is_child EMs, so I think we probably need to add a new
field to EquivalenceMember that truly is just pull_varnos from
em_expr, or else look into some way to make em_relids mean that (like
the comment claims).

Here are my results from running your benchmark on master (@f6c750d31)
with and without the attached patch.

npart master (ms) patched (ms) speedup
2       0.28             0.29                95.92%
4       0.37             0.38                96.75%
8       0.53             0.56                94.43%
16     0.92             0.91                100.36%
32     1.82             1.70                107.57%
64     4.05             3.26                124.32%
128   10.83           6.69                161.89%
256   42.63           19.46               219.12%
512   194.31          42.60              456.14%
1024 1104.02        98.37              1122.33%

This resulted in some good additional gains in planner performance.
The 1024 partition case is now about 11x faster on my machine instead
of 4x.  The 2 partition does regress slightly. There might be a few
things we can do about that, for example, move ec_collation up 1 to
shrink EquivalenceClass back down closer to the size it was before.
[1] might be enough to make up for the remainder.

I've attached a draft patch with my revisions.

David

[1] https://commitfest.postgresql.org/39/3810/

Attachment

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Patch proposal: New hooks in the connection path
Next
From: Daniel Gustafsson
Date:
Subject: Re: Propose a new function - list_is_empty