Re: ATTACH/DETACH PARTITION CONCURRENTLY - Mailing list pgsql-hackers

From Robert Haas
Subject Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date
Msg-id CA+TgmoZACOKeUf_ve730EEkzsBv3Zxk-Wu0HVdyo1jBkGDkeKQ@mail.gmail.com
Whole thread Raw
In response to Re: ATTACH/DETACH PARTITION CONCURRENTLY  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: ATTACH/DETACH PARTITION CONCURRENTLY  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Tue, Nov 6, 2018 at 5:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
> - get_partition_dispatch_recurse and ExecCreatePartitionPruneState
> both call RelationGetPartitionDesc.  Presumably, this means that if
> the partition descriptor gets updated on the fly, the tuple routing
> and partition dispatch code could end up with different ideas about
> which partitions exist.  I think this should be fixed somehow, so that
> we only call RelationGetPartitionDesc once per query and use the
> result for everything.

I think there is deeper trouble here.
ExecSetupPartitionTupleRouting() calls find_all_inheritors() to
acquire RowExclusiveLock on the whole partitioning hierarchy.  It then
calls RelationGetPartitionDispatchInfo (as a non-relcache function,
this seems poorly named) which calls get_partition_dispatch_recurse,
which does this:

            /*
             * We assume all tables in the partition tree were already locked
             * by the caller.
             */
            Relation    partrel = heap_open(partrelid, NoLock);

That seems OK at present, because no new partitions can have appeared
since ExecSetupPartitionTupleRouting() acquired locks.  But if we
allow new partitions to be added with only ShareUpdateExclusiveLock,
then I think there would be a problem.  If a new partition OID creeps
into the partition descriptor after find_all_inheritors() and before
we fetch its partition descriptor, then we wouldn't have previously
taken a lock on it and would still be attempting to open it without a
lock, which is bad (cf. b04aeb0a053e7cf7faad89f7d47844d8ba0dc839).

Admittedly, it might be a bit hard to provoke a failure here because
I'm not exactly sure how you could trigger a relcache reload in the
critical window, but I don't think we should rely on that.

More generally, it seems a bit strange that we take the approach of
locking the entire partitioning hierarchy here regardless of which
relations the query actually knows about.  If some relations have been
pruned, presumably we don't need to lock them; if/when we permit
concurrent partition, we don't need to lock any new ones that have
materialized.  We're just going to end up ignoring them anyway because
there's nothing to do with the information that they are or are not
excluded from the query when they don't appear in the query plan in
the first place.

Furthermore, this whole thing looks suspiciously like more of the sort
of redundant locking that f2343653f5b2aecfc759f36dbb3fd2a61f36853e
attempted to eliminate.  In light of that commit message, I'm
wondering whether the best approach would be to [1] get rid of the
find_all_inheritors call altogether and [2] somehow ensure that
get_partition_dispatch_recurse() doesn't open any tables that aren't
part of the query's range table.

Thoughts?  Comments?  Ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: plruby: rb_iterate symbol clash with libruby.so
Next
From: Pavel Raiskup
Date:
Subject: Re: plruby: rb_iterate symbol clash with libruby.so