Re: [bug?] Missed parallel safety checks, and wrong parallel safety - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Date
Msg-id CA+TgmoZmbD+UOSXn91+pdp_zFZhNgHYKTKeZuYNp5R-Vi2DU0w@mail.gmail.com
Whole thread Raw
In response to Re: [bug?] Missed parallel safety checks, and wrong parallel safety  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: [bug?] Missed parallel safety checks, and wrong parallel safety  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Re: [bug?] Missed parallel safety checks, and wrong parallel safety  (Amit Kapila <amit.kapila16@gmail.com>)
RE: [bug?] Missed parallel safety checks, and wrong parallel safety  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, but I think if we go with your suggested model where whenever
> there is a change in parallel-safety of any function, we need to send
> the new invalidation then I think it won't matter whether the function
> is associated with index expression, check constraint in the table, or
> is used in any other way.

No, it will still matter, because I'm proposing that the
parallel-safety of functions that we only access via operator classes
will not even be checked. Also, if we decided to make the system more
fine-grained - e.g. by invalidating on the specific OID of the
function that was changed rather than doing something that is
database-wide or global - then it matters even more.

> Yeah, dealing with partitioned tables is tricky. I think if we don't
> want to check upfront the parallel safety of all the partitions then
> the other option as discussed could be to ask the user to specify the
> parallel safety of partitioned tables.

Just to be clear here, I don't think it really matters what we *want*
to do. I don't think it's reasonably *possible* to check all the
partitions, because we don't hold locks on them. When we're assessing
a bunch of stuff related to an individual relation, we have a lock on
it. I think - though we should double-check tablecmds.c - that this is
enough to prevent all of the dependent objects - triggers,
constraints, etc. - from changing. So the stuff we care about is
stable. But the situation with a partitioned table is different. In
that case, we can't even examine that stuff without locking all the
partitions. And even if we do lock all the partitions, the stuff could
change immediately afterward and we wouldn't know. So I think it would
be difficult to make it correct.

Now, maybe it could be done, and I think that's worth a little more
thought. For example, perhaps whenever we invalidate a relation, we
could also somehow send some new, special kind of invalidation for its
parent saying, essentially, "hey, one of your children has changed in
a way you might care about." But that's not good enough, because it
only goes up one level. The grandparent would still be unaware that a
change it potentially cares about has occurred someplace down in the
partitioning hierarchy. That seems hard to patch up, again because of
the locking rules. The child can know the OID of its parent without
locking the parent, but it can't know the OID of its grandparent
without locking its parent. Walking up the whole partitioning
hierarchy might be an issue for a number of reasons, including
possible deadlocks, and possible race conditions where we don't emit
all of the right invalidations in the face of concurrent changes. So I
don't quite see a way around this part of the problem, but I may well
be missing something. In fact I hope I am missing something, because
solving this problem would be really nice.

> We can additionally check the
> parallel safety of partitions when we are trying to insert into a
> particular partition and error out if we detect any parallel-unsafe
> clause and we are in parallel-mode. So, in this case, we won't be
> completely relying on the users. Users can either change the parallel
> safe option of the table or remove/change the parallel-unsafe clause
> after error. The new invalidation message as we are discussing would
> invalidate the parallel-safety for individual partitions but not the
> root partition (partitioned table itself). For root partition, we will
> rely on information specified by the user.

Yeah, that may be the best we can do. Just to be clear, I think we
would want to check whether the relation is still parallel-safe at the
start of the operation, but not have a run-time check at each function
call.

> I am not sure if we have a simple way to check the parallel safety of
> partitioned tables. In some way, we need to rely on user either (a) by
> providing an option to specify whether parallel Inserts (and/or other
> DMLs) can be performed, or (b) by providing a guc and/or rel option
> which indicate that we can check the parallel-safety of all the
> partitions. Yet another option that I don't like could be to
> parallelize inserts on non-partitioned tables.

If we figure out a way to check the partitions automatically that
actually works, we don't need a switch for it; we can (and should)
just do it that way all the time. But if we can't come up with a
correct algorithm for that, then we'll need to add some kind of option
where the user declares whether it's OK.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: SSL/TLS instead of SSL in docs
Next
From: Julien Rouhaud
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints