ATTACH/DETACH PARTITION CONCURRENTLY - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | ATTACH/DETACH PARTITION CONCURRENTLY |
Date | |
Msg-id | CAKJS1f9QjUwQrio20Pi=yCHmnouf4z3SfN8sqXaAcwREG6k0zQ@mail.gmail.com Whole thread Raw |
Responses |
Re: ATTACH/DETACH PARTITION CONCURRENTLY
Re: ATTACH/DETACH PARTITION CONCURRENTLY |
List | pgsql-hackers |
Hi, One of the downsides of declarative partitioning vs old school inheritance partitioning is that a new partition cannot be added to the partitioned table without taking an AccessExclusiveLock on the partitioned table. We've obviously got a bunch of features for various other things where we work a bit harder to get around that problem, e.g creating indexes concurrently. I've started working on allowing partitions to be attached and detached with just a ShareUpdateExclusiveLock on the table. If I'm correct, then we can do this in a similar, but more simple way as to how CREATE INDEX CONCURRENTLY works. We just need to pencil in that the new partition exists, but not yet valid, then wait for snapshots older than our own to finish before marking the partition is valid. One problem I had with doing this is that there was not really a good place to store that "isvalid" flag for partitions. We have pg_index for indexes, but partition details are just spread over pg_inherits and pg_class. So step 1 was to move all that into a new table called pg_partition. I think this is quite nice as it also gets rid of relpartbound out of pg_class. It probably just a matter of time before someone complains that they can't create some partition with some pretty large Datum due to it not being able to fit on a single heap page (pg_class has no TOAST table). I ended up getting rid of pg_class.relispartition replacing it with relpartitionparernt which is just InvalidOid when the table or index is not a partition. This allows various pieces of code to be more efficient since we can look at the relcache instead of scanning pg_inherits all the time. It's now also much faster to get a partitions ancestors. So, patches 0001 is just one I've already submitted for the July 'fest. Nothing new. It was just required to start this work. 0002 migrates partitions out of pg_inherits into pg_partition. This patch is at a stage where it appears to work, but is very unpolished and requires me to stare at it much longer than I've done so far. There's a bunch of code that gets repeated way too many times in tablecmds.c, for example. 0003 does the same for partitioned indexes. The patch is in a similar, maybe slightly worse state than 0002. Various comments will be out of date. 0004 is the early workings of what I have in mind for the concurrent ATTACH code. It's vastly incomplete. It does pass make check but really only because there are no tests doing any concurrent attaches. There's a mountain of code missing that ignores invalid partitions. I just have a very simple case working. Partition-wise joins will be very much broken by what I have so far, and likely a whole bunch of other stuff. About the extent of my tests so far are the following: --setup create table listp (a int) partition by list(a); create table listp1 partition of listp for values in(1); create table listp2 (a int); insert into listp1 values(1); insert into listp2 values(2); -- example 1. start transaction isolation level repeatable read; -- Session 1 select * from listp; -- Session 1 a --- 1 (1 row) alter table listp attach partition concurrently listp2 for values in (2); -- Session 2 (waits for release of session 1's snapshot) select * from listp; -- Session 1 a --- 1 commit; -- session 1 (session 2's alter table now finishes waiting) select * from listp; -- Session 1 (new partition now valid) a --- 1 2 (2 rows) -- example 2. start transaction isolation level read committed; -- session 1; select * from listp; -- session 1 a --- 1 (1 row) alter table listp attach partition concurrently listp2 for values in (2); -- Session 2 completes without waiting. select * from listp; -- Session 1 (new partition visible while in transaction) a --- 1 2 (2 rows) This basically works by: 1. Do all the normal partition attach partition validation. 2. Insert a record into pg_partition with partisvalid=false 3. Obtain a session-level ShareUpdateExclusiveLock on the partitioned table. 4. Obtain a session-level AccessExclusiveLock on the partition being attached. 5. Commit. 6. Start a new transaction. 7. Wait for snapshots older than our own to be released. 8. Mark the partition as valid 9. Invalidate relcache for the partitioned table. 10. release session-level locks. I've disallowed the feature when the partitioned table has a default partition. I don't see how this can be made to work. At the moment ALTER TABLE ... ATTACH PARTITION commands cannot contain any other sub-commands in the ALTER TABLE, so performing the additional transaction commit and begin inside the single sub-command might be okay. It does mean that 'rel' which is passed down to ATExecAttachPartition() must be closed and reopened again which results in the calling function having a pointer into a closed Relation. I worked around this by changing the code so it passes a pointer to the Relation, and I've got ATExecAttachPartition() updating that pointer before returning. It's not particularly pretty, but I didn't really see how else this can be done. I've not yet done anything about the DETACH CONCURRENTLY case. I think it should just be the same steps in some roughly reverse order. We can skip the waiting part of the partition being detached is still marked as invalid from some failed concurrent ATTACH. I've not thought much about pg_dump beyond just have it ignore invalid partitions. I don't think it's very useful to support some command that attaches an invalid partition since there will be no command to revalidate an invalid partition. It's probably best to resolve that with a DETACH followed by a new ATTACH. So probably pg_dump can just do nothing for invalid partitions. So anyway, my intentions of posting this patch now rather than when it's closer to being finished is for design review. I'm interested in hearing objections, comments, constructive criticism for patches 0002-0004. Patch 0001 comments can go to [1] Are there any blockers on this that I've overlooked? [1] https://www.postgresql.org/message-id/CAKJS1f81TpxZ8twugrWCo%3DVDHEkmagxRx7a%2B1z4aaMeQy%3DnA7w%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: