Thread: pg_dump lock timeout
Attached is a patch to add a commandline option to pg_dump to limit how long pg_dump will wait for locks during startup. The intent of this patch is to allow pg_dump to fail if a table lock cannot be taken in a reasonable time. This allows the caller of pg_dump to retry or otherwise correct the situation, without having locks held for long periods, and without pg_dump having a long window during which catalog changes can occur. It works by setting statement_timeout to the user specified delay during the startup phase where it is taking access share locks on all the tables. Once all the locks are taken, it sets statement_timeout back to the default. If a lock table statement times out, the dump fails with the statement timed out error. The orginal motivation was a client who runs heavy batch workloads and uses truncate table and other DML in long transactions. This has created some unhappy interaction scenarios with pg_dump: - pg_dump ends up waiting hours on a DML table lock that is part of a long transaction. Once the lock is released, pg_dump runs only to find some table later in the list has been dropped. So pg_dump fails. - pg_dump waits on a lock while holding access share locks on most of the tables. Other processes that want to do DML wait on pg_dump. After a while, large parts of the application are blocked while pg_dump waits on locks. Eventually the operations staff notice that pg_dump is blocking production and kill the dump. Please have a look and consider it for merging. Thanks -dg -- David Gould If simplicity worked, the world would be overrun with insects.
On Sun, May 11, 2008 at 04:30:47AM -0700, daveg wrote: > > Attached is a patch to add a commandline option to pg_dump to limit how long > pg_dump will wait for locks during startup. Ooops, really attached this time. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
Attachment
On Sun, May 11, 2008 at 06:00:35AM -0700, David Gould wrote: > On Sun, May 11, 2008 at 04:30:47AM -0700, daveg wrote: > > > > Attached is a patch to add a commandline option to pg_dump to > > limit how long pg_dump will wait for locks during startup. > > Ooops, really attached this time. Can we see about getting this into the July commitfest? Dave has presented a use case complete with logs where having this could have prevented a failed backup and consequent data loss. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 5/11/08, daveg <daveg@sonic.net> wrote: > Attached is a patch to add a commandline option to pg_dump to limit how long > pg_dump will wait for locks during startup. My quick review: - It does not seem important enough to waste a short option on. Having only long option should be enough. - It would be more polite to do SET LOCAL instead SET. (Eg. it makes safer to use pg_dump through pooler.) - The statement_timeout is set back with "statement_timeout = default" Maybe it would be better to do "= 0" here? Although such decision would go outside the scope of the patch, I see no sense having any other statement_timeout for actual dumping. -- marko
On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote: > On 5/11/08, daveg <daveg@sonic.net> wrote: > > Attached is a patch to add a commandline option to pg_dump to limit how long > > pg_dump will wait for locks during startup. > > My quick review: > > - It does not seem important enough to waste a short option on. > Having only long option should be enough. Agreed. I'll change it. > - It would be more polite to do SET LOCAL instead SET. > (Eg. it makes safer to use pg_dump through pooler.) Also agreed. Thanks. > - The statement_timeout is set back with "statement_timeout = default" > Maybe it would be better to do "= 0" here? Although such decision > would go outside the scope of the patch, I see no sense having > any other statement_timeout for actual dumping. I'd prefer to leave whatever policy is otherwise in place alone. I can see use cases for either having or not having a timeout for pg_dump, but it does seem outside the scope of this patch. Thanks for you review and comments. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes: > On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote: >> - The statement_timeout is set back with "statement_timeout = default" >> Maybe it would be better to do "= 0" here? Although such decision >> would go outside the scope of the patch, I see no sense having >> any other statement_timeout for actual dumping. > I'd prefer to leave whatever policy is otherwise in place alone. The policy in place in CVS HEAD is that pg_dump explicitly sets statement_timeout to 0. Setting it to default would break that, and will certainly not be accepted. regards, tom lane
On Thu, Jul 03, 2008 at 05:55:01AM -0700, daveg wrote: > On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote: > > On 5/11/08, daveg <daveg@sonic.net> wrote: > > > Attached is a patch to add a commandline option to pg_dump to limit how long > > > pg_dump will wait for locks during startup. > > > > My quick review: > > > > - It does not seem important enough to waste a short option on. > > Having only long option should be enough. > > Agreed. I'll change it. > > > - It would be more polite to do SET LOCAL instead SET. > > (Eg. it makes safer to use pg_dump through pooler.) > > Also agreed. Thanks. On second glance, pg_dump sets lots of variables without using SET LOCAL. I think fixing that must be the subject of a separate patch as fixing just one of many will only cause confusion. > > - The statement_timeout is set back with "statement_timeout = default" > > Maybe it would be better to do "= 0" here? Although such decision > > would go outside the scope of the patch, I see no sense having > > any other statement_timeout for actual dumping. > > I'd prefer to leave whatever policy is otherwise in place alone. I can see > use cases for either having or not having a timeout for pg_dump, but it does > seem outside the scope of this patch. As it happens, another patch has set the policy to "statement_timeout = 0", so I will follow that. I'm sending in the revised patch today. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
Here is an updated version of this patch against head. It builds, runs and functions as expected. I did not build the sgml. I've made changes based on various comments as follows: - use WAIT_TIME in description consistantly. Reworded for clarity. (Stephan Frost) - Use a separate query buffer in getTables() (Stephan Frost) - sets statement_timeout=0 afterwards per new policy (Tom Lane, Marko Kreen) - has only --long-option to conserve short option letters (Marko Kreen) Regards -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
Attachment
daveg <daveg@sonic.net> writes: > Here is an updated version of this patch against head. It builds, runs and > functions as expected. I did not build the sgml. Applied with mostly minor cosmetic improvements --- the only actual error I noticed was failing to check whether the server version supports statement_timeout. In most cases our policy has been that pg_dumpall should accept and pass through any pg_dump option for which it's sensible to do so. I did not make that happen but it seems it'd be a reasonable follow-on patch. A minor point is that the syntax "-X lock-wait-timeout=n" or "-X lock-wait-timeout n" won't work, although perhaps people used to -X might expect it to. Since we deprecate -X (and don't even document it anymore), I thought that making this work would be much more trouble than it's worth, but perhaps that's open to argument. regards, tom lane
On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: > daveg <daveg@sonic.net> writes: > > Here is an updated version of this patch against head. It builds, runs and > > functions as expected. I did not build the sgml. > > Applied with mostly minor cosmetic improvements --- the only actual > error I noticed was failing to check whether the server version supports > statement_timeout. I chose not to test backend version on the grounds that getting an explicit failure for an explicitly requested option would be preferable to it being silently ignored. However if the user is trying to use the same scripts for many versions then ignoring unsupported but unessential features may be preferred. One of the cosmetic changes made in response to other reviewers was to not reuse lockquery, instead to have a separate query buffer. You have reversed that and eliminated lockquery too. Which seems better. > In most cases our policy has been that pg_dumpall should accept and pass > through any pg_dump option for which it's sensible to do so. I did not > make that happen but it seems it'd be a reasonable follow-on patch. I'll remember that next time. > A minor point is that the syntax "-X lock-wait-timeout=n" or > "-X lock-wait-timeout n" won't work, although perhaps people used to > -X might expect it to. Since we deprecate -X (and don't even document > it anymore), I thought that making this work would be much more trouble > than it's worth, but perhaps that's open to argument. All the other -X options are flags and supported directly by the getopt machinery. Adding a -X option with an argument would require parsing the argument by hand including dealing with '=' or ' ' as the separator and telling getopt you had eaten an extra argument. This seemed a bit too much code for the value of supporting a deprecated format for a brand new option. Finally, you changed the option value and case from 1 to case 2. getopt_long() only returns the value argument when there is no flag to set, so 1 was unique and could have been read as "the first no-short option with an argument". Thanks for checking this in. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes: > On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: >> In most cases our policy has been that pg_dumpall should accept and pass >> through any pg_dump option for which it's sensible to do so. I did not >> make that happen but it seems it'd be a reasonable follow-on patch. > I'll remember that next time. Er .. actually that was a direct request for you to do it. > Finally, you changed the option value and case from 1 to case 2. getopt_long > only returns the value argument when there is no flag to set, so 1 was unique > and could have been read as "the first no-short option with an argument". Yeah. The code *worked* as you submitted it, but what was bothering me was that the "val = 1" table entries worked in two completely different ways for the different argument types. I first thought that you'd broken the existing long argument options --- you hadn't, but I had to go re-read the getopt_long source to convince myself of that. So it seemed like using a different "val" value might help clarify the difference in behavior for future readers of the code. In particular the next guy who wants to add a long option with parameter value would certainly not be able to use val = 1; but I thought the code as you had it wouldn't give him any clue what to do. If you've got a better idea about how to deal with that, feel free... regards, tom lane
On Mon, Jul 21, 2008 at 03:43:11AM -0400, Tom Lane wrote: > daveg <daveg@sonic.net> writes: > > On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: > >> In most cases our policy has been that pg_dumpall should accept and pass > >> through any pg_dump option for which it's sensible to do so. I did not > >> make that happen but it seems it'd be a reasonable follow-on patch. > > > I'll remember that next time. > > Er .. actually that was a direct request for you to do it. Attached is a the followon patch for pg_dumpall and docs to match pg_dump. On a second topic, is anyone working on a parallel dump/load? I'd be interested in helping. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.