Re: patch for parallel pg_dump - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: patch for parallel pg_dump
Msg-id: CA+Tgmoa_F0+QMbj31rMMdqBa2bqf_nXN=i=6rXomuD1F53BSoA@mail.gmail.com
In response to: patch for parallel pg_dump (Joachim Wieland <joe@mcknight.de>)
Responses: Re: patch for parallel pg_dump
List: pgsql-hackers
On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland <joe@mcknight.de> wrote:
> So this is the parallel pg_dump patch, generalizing the existing
> parallel restore and allowing parallel dumps for the directory archive
> format, the patch works on Windows and Unix.

It seems a little unfortunate that we are using threads here on Windows and processes on Linux. Maybe it's too late to revisit that decision, but it seems like a recipe for subtle bugs.

> In parallel restore, the master closes its own connection to the
> database before forking of worker processes, just as it does now. In
> parallel dump however, we need to hold the masters connection open so
> that we can detect deadlocks. The issue is that somebody could have
> requested an exclusive lock after the master has initially requested a
> shared lock on all tables. Therefore, the worker process also requests
> a shared lock on the table with NOWAIT and if this fails, we know that
> there is a conflicting lock in between and that we need to abort the
> dump.

I think this is an acceptable limitation, but the window where it can happen seems awfully wide right now. As things stand, it seems like we don't try to lock the table in the child until we're about to access it, which means that, on a large database, we could dump out 99% of the database and then be forced to abort the dump because of a conflicting lock on the very last table. We could fix that by having every child lock every table right at the beginning, so that all possible failures of this type would happen before we do any work, but that will eat up a lot of lock table space. It would be nice if the children could somehow piggyback on the parent's locks, but I don't see any obvious way to make that work. Maybe we just have to live with it the way it is, but I worry that people whose dumps fail 10 hours into a 12-hour parallel dump are going to be grumpy.

> The connections of the parallel dump use the synchronized snapshot
> feature. However there's also an option --no-synchronized-snapshots
> which can be used to dump from an older PostgreSQL version.

Right now, you have it set up so that --no-synchronized-snapshots is ignored even if synchronized snapshots are supported, which doesn't make much sense to me. I think you should allow --no-synchronized-snapshots with any release, and error out if it's not specified and the version is too old to work without it. It's probably not a good idea to run with --no-synchronized-snapshots ever, and doubly so if they're not available, but I'd rather leave that decision to the user. (Imagine, for example, that we discovered a bug in our synchronized snapshot implementation.) I am tempted to advocate calling the flag --unsynchronized-snapshots, because to me that underscores the danger a little more clearly, but perhaps that is too clever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
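
For concreteness, here is a minimal sketch of the SQL-level sequence discussed above: the master takes shared locks and exports a snapshot, and each worker adopts that snapshot and then re-requests the lock with NOWAIT so a queued conflicting lock is detected instead of waited on. The table name and snapshot id are placeholders taken from the documentation, not the patch's actual code; pg_export_snapshot() and SET TRANSACTION SNAPSHOT require a 9.2-or-later server.

    -- Master connection: lock the tables up front, then export the
    -- snapshot that the workers will adopt.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    LOCK TABLE some_table IN ACCESS SHARE MODE;
    SELECT pg_export_snapshot();   -- returns an id such as '000003A1-1'

    -- Worker connection: adopt the master's snapshot, then re-request
    -- the shared lock with NOWAIT.  An error here means somebody queued
    -- a conflicting lock after the master's, so the dump must abort.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '000003A1-1';
    LOCK TABLE some_table IN ACCESS SHARE MODE NOWAIT;

The window being complained about is the gap between the master's LOCK TABLE and the worker's LOCK ... NOWAIT for any given table: the later the worker issues it, the longer a queued ACCESS EXCLUSIVE request has to sneak in and force the abort.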