Re: pg_dump partitions can lead to inconsistent state after restore - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: pg_dump partitions can lead to inconsistent state after restore |
Date | |
Msg-id | CA+HiwqFxACGUPeWLfjx-ySzaiRqkMYoMHxFH9-t2MvfT1TGgsg@mail.gmail.com Whole thread Raw |
In response to | Re: pg_dump partitions can lead to inconsistent state after restore (Alvaro Herrera <alvherre@2ndquadrant.com>) |
List | pgsql-hackers |
On Thu, Apr 25, 2019 at 4:01 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > So, while testing this I noticed that pg_restore fails with deadlocks if > you do a parallel restore if the --load-via-partition-root switch was > given to pg_dump. Is that a known bug? Was investigating --load-via-partition-root with a coworker and came across the following note in the documentation: https://www.postgresql.org/docs/11/app-pgdump.html "It is best not to use parallelism when restoring from an archive made with this option, because pg_restore will not know exactly which partition(s) a given archive data item will load data into. This could result in inefficiency due to lock conflicts between parallel jobs, or perhaps even reload failures due to foreign key constraints being set up before all the relevant data is loaded." Apparently, this note was added as a result of the following discussion: https://www.postgresql.org/message-id/flat/13624.1535486019%40sss.pgh.pa.us So, while the documentation doesn't explicitly list deadlocks as possible risk, Tom hinted in the first email that it's possible. I set out to reproduce one and was able to, although I'm not sure if it's the same deadlock as seen by Alvaro. Steps I used to reproduce: # in the source database create table foo (a int primary key); insert into foo select generate_series(1, 1000000); create table ht (a int) partition by hash (a); select 'create table ht' || i || ' partition of ht for values with (modulus 100, remainder ' || i -1 || ');' from generate_series(1, 100) i; \gexec insert into ht select generate_series(1, 1000000); alter table ht add foreign key (a) references foo (a); # in shell pg_dump --load-via-partition-root -Fd -f /tmp/dump createdb targetdb pg_restore -d targetdb -j 2 /tmp/dump The last step reports deadlocks; in the server log: ERROR: deadlock detected DETAIL: Process 14213 waits for RowExclusiveLock on relation 17447 of database 17443; blocked by process 14212. Process 14212 waits for ShareRowExclusiveLock on relation 17507 of database 17443; blocked by process 14213. Process 14213: COPY public.ht (a) FROM stdin; Process 14212: ALTER TABLE public.ht ADD CONSTRAINT ht_a_fkey FOREIGN KEY (a) REFERENCES public.foo(a); Here, the process adding the foreign key has got the lock on the parent and trying to lock a partition to add the foreign key to it. The process doing COPY (via root) has apperently locked the partition and waiting for the lock on the parent to do actual copying. Looking into why the latter had got a lock on the partition at all if it hasn't started the copying yet, I noticed that it was locked when TRUNCATE was executed on it earlier in the same transaction as part of some WAL-related optimization, which is something that only happens in the parallel restore mode. I was under the impression that the TABLE DATA archive item (its TocEntry) would have no trace of the partition if it was dumped with --load-via-partition-root, but that's not the case. --load-via-partition-root only dictates that the command that will be dumped for the item will use the root parent as COPY target, even though the TocEntry itself is owned by the partition. Maybe, a way to prevent the deadlock would be for the process that will do copy-into-given-partition-via-root-parent to do a `LOCK TABLE root_parent` before `TRUNCATE the_partition`, but we'll need to get hold of the root parent from the TocEntry somehow. Turns out it's only present in the TocEntry.copyStmt, from where it will have to parsed out. Maybe that's the only thing we could do without breaking the archive format though. Thoughts? By the way, I couldn't think of ways to reproduce any of the hazards mentioned in the documentations of using parallel mode to restore an archive written with pg_dump --load-via-root-parent, but maybe I just haven't tried hard enough. Thanks, Amit
pgsql-hackers by date: