Re: ATTACH/DETACH PARTITION CONCURRENTLY - Mailing list pgsql-hackers

From Robert Haas
Subject Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date
Msg-id CA+TgmoaoZS1aRu9tW3k5Wrqsti-gcFF9yBiiGwE2Mhu5Gpkp6g@mail.gmail.com
In response to Re: ATTACH/DETACH PARTITION CONCURRENTLY  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: ATTACH/DETACH PARTITION CONCURRENTLY
List pgsql-hackers
On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Namely: how does this handle the case of partition pruning structure
> being passed from planner to executor, if an attach happens in the
> middle of it and puts a partition in between existing partitions?  Array
> indexes of any partitions that appear later in the partition descriptor
> will change.

I have finally gotten a little more time to work on this.  It took me a
while to understand that a PartitionedRelPruneInfo assumes that the
indexes of partitions in the PartitionDesc don't change between
planning and execution, because subplan_map[] and subpart_map[] are
indexed by PartitionDesc offset.  I suppose the reason for this is so
that we don't have to go to the expense of copying the partition
bounds from the PartitionDesc into the final plan, but it somehow
seems a little scary to me.  Perhaps I am too easily frightened, but
it's certainly a problem from the point of view of this project, which
wants to let the PartitionDesc change concurrently.
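To make the hazard concrete, here's a toy illustration (not PG code;
names are made up): a subplan_map[] built when the PartitionDesc held
two partitions goes stale as soon as a concurrent ATTACH shifts a
partition to a higher offset.

```c
#include <stddef.h>

/*
 * Toy illustration (not PG code) of why offset-based subplan_map[]
 * indexing assumes a stable PartitionDesc.  Returns the subplan index
 * for a partition at the given PartitionDesc offset, or -1 if the
 * offset is outside the map that was built at plan time.
 */
static int
subplan_for_offset(const int *subplan_map, int nmap, int offset)
{
    return (offset >= 0 && offset < nmap) ? subplan_map[offset] : -1;
}
```

If the map was built against partitions {p1, p3} and a concurrent
ATTACH inserts p2 between them, p3 moves to offset 2: the lookup for
p3 now falls off the end of the map, while offset 1 silently hands
p2 the subplan that was meant for p3.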

I wrote a little patch that stores the relation OIDs of the partitions
into the PartitionedRelPruneInfo and then, at execution time, does an
Assert() that what it gets matches what existed at plan time.  I
figured that a good start would be to find a test case where this
fails with concurrent DDL allowed, but I haven't so far succeeded in
devising one.  To make the Assert() fail, I need to come up with a
case where concurrent DDL has caused the PartitionDesc to be rebuilt
but without causing an update to the plan.  If I use prepared queries
inside of a transaction block, I can continue to run old plans after
concurrent DDL has changed things, but I can't actually make the
Assert() fail, because the queries continue to use the old plans
precisely because they haven't processed invalidation messages, and
therefore they also have the old PartitionDesc and everything works.
Maybe if I try it with CLOBBER_CACHE_ALWAYS...
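The cross-check itself is simple in shape; something roughly like this
(a sketch with hypothetical names, not the actual patch):

```c
typedef unsigned int Oid;

/*
 * Hypothetical sketch of the cross-check: relid_map[] is the list of
 * partition OIDs stashed in the pruning info at plan time, and
 * partdesc_oids[] is what the executor's PartitionDesc contains.  If
 * concurrent DDL changed the set or order of partitions without the
 * plan being invalidated, this returns false and the Assert() trips.
 */
static int
partdesc_matches_plan(const Oid *relid_map, int nplan,
                      const Oid *partdesc_oids, int npartdesc)
{
    if (nplan != npartdesc)
        return 0;
    for (int i = 0; i < nplan; i++)
        if (relid_map[i] != partdesc_oids[i])
            return 0;
    return 1;
}
```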

I also had the idea of trying to use a cursor, because if I could
start execution of a query, then force a relcache rebuild, then
continue executing the query, maybe something would blow up somehow.
But that's not so easy, because I don't think we have any way using
SQL to declare a cursor for a prepared query, so I would need to get a
query plan that involves run-time pruning without using parameters,
which I'm pretty sure is possible but which I haven't figured out yet.
And even there the PartitionDirectory concept might protect us from
any damage if the change happens after the executor is initialized,
though I'm not sure if there are any cases where we don't do the first
PartitionDesc lookup for a particular table until mid-execution.

Anyway, I think this idea of passing a list of relation OIDs that we
saw at planning time through to the executor and cross-checking them
might have some legs.  If we only allowed concurrently *adding*
partitions and not concurrently *removing* them, then even if we find
the case(s) where the PartitionDesc can change under us, we can
probably just adjust subplan_map and subpart_map to compensate, since
we can iterate through the old and new arrays of relation OIDs and
just figure out which things have shifted to higher indexes in the
PartitionDesc.  This is all kind of hand-waving at the moment; tips
appreciated.
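The compensation step could look roughly like the following (a minimal
sketch with hypothetical names; the real subplan_map/subpart_map live
elsewhere): walk the old and new OID arrays in lockstep, and wherever
a new OID was inserted, mark it as having no subplan.

```c
#include <assert.h>

typedef unsigned int Oid;

/*
 * Hypothetical sketch: given the relation OIDs the planner saw
 * (old_oids) and the OIDs in the current PartitionDesc (new_oids),
 * rebuild a subplan_map-style array so that entries for pre-existing
 * partitions shift to their new PartitionDesc indexes.  Concurrently
 * added partitions get -1 (no subplan).  Assumes partitions were only
 * added, never removed, so old_oids is a subsequence of new_oids.
 */
static void
remap_subplans(const Oid *old_oids, int nold, const int *old_map,
               const Oid *new_oids, int nnew, int *new_map)
{
    int oldidx = 0;

    for (int newidx = 0; newidx < nnew; newidx++)
    {
        if (oldidx < nold && new_oids[newidx] == old_oids[oldidx])
            new_map[newidx] = old_map[oldidx++];  /* existing partition */
        else
            new_map[newidx] = -1;                 /* concurrently added */
    }
    assert(oldidx == nold);   /* every planned partition must be found */
}
```

For example, if the planner saw OIDs {100, 300} with subplans {0, 1}
and a concurrent ATTACH inserts 200 between them, the rebuilt map for
{100, 200, 300} comes out as {0, -1, 1}.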

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
