Home > mailing lists

Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: [PoC] Reducing planning time when tables have many partitions
Date	August 9, 2023 11:54:26
Msg-id	CAApHDvqp0UrCf9FLGyPWruiuU6BfPPquDxE2Wxw7ZS2S1FNH0A@mail.gmail.com Whole thread Raw
In response to	Re: [PoC] Reducing planning time when tables have many partitions (Yuya Watari <watari.yuya@gmail.com>)
Responses	Re: [PoC] Reducing planning time when tables have many partitions
List	pgsql-hackers

Tree view

On Wed, 9 Aug 2023 at 20:15, Yuya Watari <watari.yuya@gmail.com> wrote:
> I agree with your opinion that my patch lacks some explanations, so I
> will consider adding more comments. However, I received the following
> message from David in March.
>
> On Thu, Mar 9, 2023 at 6:23 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > For the main patch, I've been starting to wonder if it should work
> > completely differently.  Instead of adding members for partitioned and
> > inheritance children, we could just translate the Vars from child to
> > top-level parent and find the member that way. I wondered if this
> > method might be even faster as it would forego
> > add_child_rel_equivalences(). I think we'd still need em_is_child for
> > UNION ALL children.  So far, I've not looked into this in detail. I
> > was hoping to find an idea that would allow some means to have the
> > planner realise that a LIST partition which allows a single Datum
> > could skip pushing base quals which are constantly true. i.e:
> >
> > create table lp (a int) partition by list(a);
> > create table lp1 partition of lp for values in(1);
> > explain select * from lp where a = 1;
> >
> >  Seq Scan on lp1 lp  (cost=0.00..41.88 rows=13 width=4)
> >    Filter: (a = 1)
>
> I am concerned that fixing the current patch will conflict with
> David's idea. Of course, I am now trying to experiment with the above
> idea, but I should avoid the conflict if he is working on this. David,
> what do you think about this? Is it OK to post a new patch to address
> the review comments? I am looking forward to your reply.

So, I have three concerns with this patch.

1) I really dislike the way eclass_member_iterator_next() has to check
bms_overlap() to filter out unwanted EMs.  This is required because of
how add_child_rel_equivalences() does not pass the "relids" parameter
in add_eq_member() as equivalent to pull_varnos(expr).  See this code
in master:

/*
* Transform em_relids to match.  Note we do *not* do
* pull_varnos(child_expr) here, as for example the
* transformation might have substituted a constant, but we
* don't want the child member to be marked as constant.
*/
new_relids = bms_difference(cur_em->em_relids,
top_parent_relids);
new_relids = bms_add_members(new_relids, child_relids);

I understand this is done to support Consts in UNION ALL parents, e.g
the following query prunes the n=2 UNION ALL branch

postgres=# explain select * from (select 1 AS n,* from pg_Class c1
union all select 2 AS n,* from pg_Class c2) where n=1;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on pg_class c1  (cost=0.00..18.13 rows=413 width=277)
(1 row)

... but the following (existing) comment is just a lie:

Relids em_relids; /* all relids appearing in em_expr */

This means that there's some weirdness on which RelOptInfos we set
eclass_member_indexes.  Do we just set the EM in the RelOptInfos
mentioned in the em_expr, or should it be the ones in em_relids?

You can see the following code I wrote in the 0001 patch which tries
to work around this problem:

+ /*
+ * We must determine the exact set of relids in the expr for child
+ * EquivalenceMembers as what is given to us in 'relids' may differ from
+ * the relids mentioned in the expression.  See add_child_rel_equivalences
+ */
+ if (parent != NULL)
+ expr_relids = pull_varnos(root, (Node *) expr);
+ else
+ {
+ expr_relids = relids;
+ /* We expect the relids to match for non-child members */
+ Assert(bms_equal(pull_varnos(root, (Node *) expr), relids));
+ }

So, you can see we go with the relids from the em_expr rather than
what's mentioned in em_relids.  I believe this means we need the
following line:

+ /*
+ * Don't return members which have no common rels with with_relids
+ */
+ if (!bms_overlap(em->em_relids, iter->with_relids))
+ continue;

I don't quite recall if the em_expr can mention relids that are not in
em_relids or not or if em_expr's relids always is a subset of
em_relids.

I'm just concerned this adds complexity and the risk of mixing up the
meaning (even more than it is already in master). I'm not sure I'm
confident that all this is correct, and I wrote the 0001 patch.

Maybe this can be fixed by changing master so that em_relids always
matches pull_varnos(em_expr)? I'm unsure if there are any other
complexities other than having to ensure we don't set em_is_const for
child members.

2) The 2nd reason is what I hinted at that you quoted in the email I
sent you in March.  I think if it wasn't for UNION ALL and perhaps
table inheritance and we only needed child EMs for partitions of
partitioned tables, then I think we might be able to get away with
just translating Exprs child -> parent before looking up the EM and
likewise when asked to get join quals for child rels, we'd translate
the child relids to their top level parents, find the quals then
translate those back to child form again. EquivalenceClasses would
then only contain a few members and there likely wouldn't be a great
need to do any indexing like we are in the 0001 patch.  I'm sure
someone somewhere probably has a query that would go faster with them,
but it's likely going to be rare therefore probably not worth it.

Unfortunately, I'm not smart enough to just tell you this will or will
not work just off hand. The UNION ALL branch pruning adds complexity
that I don't recall the details of.  To know, someone would either
need to tell me, or I'd need to go try to make it work myself and then
discover the reason it can't be made to work. I'm happy for you to try
this, but if you don't I'm not sure when I can do it.  I think it
would need to be at least explored before I'd ever consider thinking
about committing this patch.

3) I just don't like the way the patch switches between methods of
looking up EMs as it means we could return EMs in a different order
depending on something like how many partitions were pruned or after
the DBA does ATTACH PARTITION.  That could start causing weird
problems like plan changes due to a change in which columns were
selected in generate_implied_equalities_for_column().  I don't have
any examples of actual problems, but it's pretty difficult to prove
there aren't any.

Of course, I do recall the complaint about the regression for more
simple queries and that's why I wrote the iterator code to have it use
the linear search when the number of EMs is small, so we can't exactly
just delete the linear search method as we'd end up with that
performance regression again.

I think the best way to move this forward is to explore not putting
partitioned table partitions in EMs and instead see if we can
translate to top-level parent before lookups.  This might just be too
complex to translate the Exprs all the time and it may add overhead
unless we can quickly determine somehow that we don't need to attempt
to translate the Expr when the given Expr is already from the
top-level parent. If that can't be made to work, then maybe that shows
the current patch has merit.

David

pgsql-hackers by date:

From: Christoph Berg
Date: 09 August 2023, 11:38:07
Subject: Re: A failure in 031_recovery_conflict.pl on Debian/s390x

From: Melih Mutlu
Date: 09 August 2023, 12:02:31
Subject: Separate memory contexts for relcache and catcache

Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers

Previous

Next