Re: ATTACH/DETACH PARTITION CONCURRENTLY - Mailing list pgsql-hackers

From Andres Freund
Subject Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date
Msg-id 20180807132925.grxgp3mtg4i6mpib@alap3.anarazel.de
In response to Re: ATTACH/DETACH PARTITION CONCURRENTLY  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: ATTACH/DETACH PARTITION CONCURRENTLY
Re: ATTACH/DETACH PARTITION CONCURRENTLY
Re: ATTACH/DETACH PARTITION CONCURRENTLY
Re: ATTACH/DETACH PARTITION CONCURRENTLY
List pgsql-hackers
On 2018-08-08 01:23:51 +1200, David Rowley wrote:
> On 8 August 2018 at 00:47, Andres Freund <andres@anarazel.de> wrote:
> > On 2018-08-08 00:40:12 +1200, David Rowley wrote:
> >> 1. Obtain a ShareUpdateExclusiveLock on the partitioned table rather
> >> than an AccessExclusiveLock.
> >> 2. Do all the normal partition attach partition validation.
> >> 3. Insert pg_partition record with partvalid = true.
> >> 4. Invalidate relcache entry for the partitioned table
> >> 5. Any loops over a partitioned table's PartitionDesc must check
> >> PartitionIsValid(). This will return true if the current snapshot
> >> should see the partition or not. The partition is valid if partisvalid
> >> = true and the xmin precedes or is equal to the current snapshot.
> >
> > How does this protect against other sessions actively using the relcache
> > entry? Currently it is *NOT* safe to receive invalidations for
> > e.g. partitioning contents afaics.
> 
> I'm not proposing that sessions running older snapshots can't see that
> there's a new partition. The code I have uses PartitionIsValid() to
> test if the partition should be visible to the snapshot. The
> PartitionDesc will always contain details for all partitions stored in
> pg_partition whether they're valid to the current snapshot or not.  I
> did it this way as there's no way to invalidate the relcache based on
> a point in transaction, only a point in time.

I don't think that solves the problem that an arriving relcache
invalidation would trigger a rebuild of rd_partdesc, while it actually
is referenced by running code.

You'd need to build infrastructure to prevent that.

One approach would be to make sure that everything relying on
rd_partdesc staying the same stores its value in a local variable, and
then *not* free the old version of rd_partdesc (etc) when the refcount >
0, but delay that to the RelationClose() that makes refcount reach
0. That'd be the start of a framework for more such concurrency
handling.
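
To make that concrete, here is a rough standalone sketch of the
deferred-free idea. It is not relcache code: Rel, PartDesc,
rel_invalidate(), rel_open(), rel_close() and the live_descs counter are
all hypothetical stand-ins, and the real thing would have to deal with
memory contexts, error paths and the rest of the relcache entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the real PostgreSQL structs. */
typedef struct PartDesc
{
    int              nparts;
    struct PartDesc *next_stale;    /* link in the list of retired descriptors */
} PartDesc;

typedef struct Rel
{
    int       refcount;             /* number of current users of the entry */
    PartDesc *partdesc;             /* current descriptor (think rd_partdesc) */
    PartDesc *stale;                /* retired descriptors that may still be in use */
} Rel;

static int live_descs = 0;          /* allocated descriptors, for illustration only */

static PartDesc *
desc_new(int nparts)
{
    PartDesc *d = malloc(sizeof(PartDesc));

    if (d == NULL)
        abort();
    d->nparts = nparts;
    d->next_stale = NULL;
    live_descs++;
    return d;
}

static void
desc_free(PartDesc *d)
{
    free(d);
    live_descs--;
}

static void
rel_open(Rel *rel)
{
    rel->refcount++;
}

/*
 * An invalidation arrived: install a rebuilt descriptor.  The old one is
 * freed right away only if nobody has the relation open; otherwise it is
 * parked on the stale list, since running code may hold a pointer to it.
 */
static void
rel_invalidate(Rel *rel, int new_nparts)
{
    PartDesc *old = rel->partdesc;

    rel->partdesc = desc_new(new_nparts);
    if (old == NULL)
        return;
    if (rel->refcount > 0)
    {
        old->next_stale = rel->stale;
        rel->stale = old;
    }
    else
        desc_free(old);
}

/* The close that brings refcount to 0 releases all parked descriptors. */
static void
rel_close(Rel *rel)
{
    assert(rel->refcount > 0);
    if (--rel->refcount == 0)
    {
        while (rel->stale != NULL)
        {
            PartDesc *d = rel->stale;

            rel->stale = d->next_stale;
            desc_free(d);
        }
    }
}
```

A caller would open the relation, copy rel->partdesc into a local
variable, and keep using that local pointer even if rel_invalidate() runs
in between; the old descriptor only goes away in the final rel_close().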

Regards,

Andres Freund

