Thread: pg_dump lock timeout

pg_dump lock timeout

From
daveg
Date:
Attached is a patch to add a commandline option to pg_dump to limit how long
pg_dump will wait for locks during startup.

The intent of this patch is to allow pg_dump to fail if a table lock cannot
be taken in a reasonable time. This allows the caller of pg_dump to retry or
otherwise correct the situation, without having locks held for long periods,
and without pg_dump having a long window during which catalog changes can
occur.

It works by setting statement_timeout to the user specified delay during
the startup phase where it is taking access share locks on all the tables.
Once all the locks are taken, it sets statement_timeout back to the default.
If a lock table statement times out, the dump fails with the statement timed
out error.

The orginal motivation was a client who runs heavy batch workloads and uses
truncate table and other DML in long transactions. This has created some
unhappy interaction scenarios with pg_dump:

  - pg_dump ends up waiting hours on a DML table lock that is part of a long
    transaction. Once the lock is released, pg_dump runs only to find
    some table later in the list has been dropped. So pg_dump fails.

  - pg_dump waits on a lock while holding access share locks on most of the
    tables. Other processes that want to do DML wait on pg_dump. After a
    while, large parts of the application are blocked while pg_dump waits
    on locks. Eventually the operations staff notice that pg_dump is
    blocking production and kill the dump.

Please have a look and consider it for merging.

Thanks

-dg

--
David Gould
If simplicity worked, the world would be overrun with insects.

Re: pg_dump lock timeout

From
daveg
Date:
On Sun, May 11, 2008 at 04:30:47AM -0700, daveg wrote:
>
> Attached is a patch to add a commandline option to pg_dump to limit how long
> pg_dump will wait for locks during startup.

Ooops, really attached this time.

-dg


--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Attachment

Re: pg_dump lock timeout

From
David Fetter
Date:
On Sun, May 11, 2008 at 06:00:35AM -0700, David Gould wrote:
> On Sun, May 11, 2008 at 04:30:47AM -0700, daveg wrote:
> >
> > Attached is a patch to add a commandline option to pg_dump to
> > limit how long pg_dump will wait for locks during startup.
>
> Ooops, really attached this time.

Can we see about getting this into the July commitfest?  Dave has
presented a use case complete with logs where having this could have
prevented a failed backup and consequent data loss.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: pg_dump lock timeout

From
"Marko Kreen"
Date:
On 5/11/08, daveg <daveg@sonic.net> wrote:
>  Attached is a patch to add a commandline option to pg_dump to limit how long
>  pg_dump will wait for locks during startup.

My quick review:

- It does not seem important enough to waste a short option on.
  Having only long option should be enough.

- It would be more polite to do SET LOCAL instead SET.
  (Eg. it makes safer to use pg_dump through pooler.)

- The statement_timeout is set back with "statement_timeout = default"
  Maybe it would be better to do "= 0" here?  Although such decision
  would go outside the scope of the patch, I see no sense having
  any other statement_timeout for actual dumping.

--
marko

Re: pg_dump lock timeout

From
daveg
Date:
On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote:
> On 5/11/08, daveg <daveg@sonic.net> wrote:
> >  Attached is a patch to add a commandline option to pg_dump to limit how long
> >  pg_dump will wait for locks during startup.
>
> My quick review:
>
> - It does not seem important enough to waste a short option on.
>   Having only long option should be enough.

Agreed. I'll change it.

> - It would be more polite to do SET LOCAL instead SET.
>   (Eg. it makes safer to use pg_dump through pooler.)

Also agreed. Thanks.

> - The statement_timeout is set back with "statement_timeout = default"
>   Maybe it would be better to do "= 0" here?  Although such decision
>   would go outside the scope of the patch, I see no sense having
>   any other statement_timeout for actual dumping.

I'd prefer to leave whatever policy is otherwise in place alone. I can see
use cases for either having or not having a timeout for pg_dump, but it does
seem  outside the scope of this patch.

Thanks for you review and comments.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Re: pg_dump lock timeout

From
Tom Lane
Date:
daveg <daveg@sonic.net> writes:
> On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote:
>> - The statement_timeout is set back with "statement_timeout = default"
>> Maybe it would be better to do "= 0" here?  Although such decision
>> would go outside the scope of the patch, I see no sense having
>> any other statement_timeout for actual dumping.

> I'd prefer to leave whatever policy is otherwise in place alone.

The policy in place in CVS HEAD is that pg_dump explicitly sets
statement_timeout to 0.  Setting it to default would break that,
and will certainly not be accepted.

            regards, tom lane

Re: pg_dump lock timeout

From
daveg
Date:
On Thu, Jul 03, 2008 at 05:55:01AM -0700, daveg wrote:
> On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote:
> > On 5/11/08, daveg <daveg@sonic.net> wrote:
> > >  Attached is a patch to add a commandline option to pg_dump to limit how long
> > >  pg_dump will wait for locks during startup.
> >
> > My quick review:
> >
> > - It does not seem important enough to waste a short option on.
> >   Having only long option should be enough.
>
> Agreed. I'll change it.
>
> > - It would be more polite to do SET LOCAL instead SET.
> >   (Eg. it makes safer to use pg_dump through pooler.)
>
> Also agreed. Thanks.

On second glance, pg_dump sets lots of variables without using SET LOCAL.
I think fixing that must be the subject of a separate patch as fixing just
one of many will only cause confusion.

> > - The statement_timeout is set back with "statement_timeout = default"
> >   Maybe it would be better to do "= 0" here?  Although such decision
> >   would go outside the scope of the patch, I see no sense having
> >   any other statement_timeout for actual dumping.
>
> I'd prefer to leave whatever policy is otherwise in place alone. I can see
> use cases for either having or not having a timeout for pg_dump, but it does
> seem  outside the scope of this patch.

As it happens, another patch has set the policy to "statement_timeout = 0",
so I will follow that.

I'm sending in the revised patch today.

-dg


--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Re: pg_dump lock timeout

From
daveg
Date:
Here is an updated version of this patch against head. It builds, runs and
functions as expected. I did not build the sgml.

I've made changes based on various comments as follows:

  - use WAIT_TIME in description consistantly. Reworded for clarity.
    (Stephan Frost)

  - Use a separate query buffer in getTables() (Stephan Frost)

  - sets statement_timeout=0 afterwards per new policy (Tom Lane, Marko Kreen)

  - has only --long-option to conserve short option letters (Marko Kreen)

Regards

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Attachment

Re: pg_dump lock timeout

From
Tom Lane
Date:
daveg <daveg@sonic.net> writes:
> Here is an updated version of this patch against head. It builds, runs and
> functions as expected. I did not build the sgml.

Applied with mostly minor cosmetic improvements --- the only actual
error I noticed was failing to check whether the server version supports
statement_timeout.

In most cases our policy has been that pg_dumpall should accept and pass
through any pg_dump option for which it's sensible to do so. I did not
make that happen but it seems it'd be a reasonable follow-on patch.

A minor point is that the syntax "-X lock-wait-timeout=n" or
"-X lock-wait-timeout n" won't work, although perhaps people used to
-X might expect it to.  Since we deprecate -X (and don't even document
it anymore), I thought that making this work would be much more trouble
than it's worth, but perhaps that's open to argument.

            regards, tom lane

Re: pg_dump lock timeout

From
daveg
Date:
On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > Here is an updated version of this patch against head. It builds, runs and
> > functions as expected. I did not build the sgml.
>
> Applied with mostly minor cosmetic improvements --- the only actual
> error I noticed was failing to check whether the server version supports
> statement_timeout.

I chose not to test backend version on the grounds that getting an explicit
failure for an explicitly requested option would be preferable to it being
silently ignored. However if the user is trying to use the same scripts for
many versions then ignoring unsupported but unessential features may be
preferred.

One of the cosmetic changes made in response to other reviewers was to
not reuse lockquery, instead to have a separate query buffer. You have
reversed that and eliminated lockquery too. Which seems better.

> In most cases our policy has been that pg_dumpall should accept and pass
> through any pg_dump option for which it's sensible to do so. I did not
> make that happen but it seems it'd be a reasonable follow-on patch.

I'll remember that next time.

> A minor point is that the syntax "-X lock-wait-timeout=n" or
> "-X lock-wait-timeout n" won't work, although perhaps people used to
> -X might expect it to.  Since we deprecate -X (and don't even document
> it anymore), I thought that making this work would be much more trouble
> than it's worth, but perhaps that's open to argument.

All the other -X options are flags and supported directly by the getopt
machinery. Adding a -X option with an argument would require parsing
the argument by hand including dealing with '=' or ' ' as the separator and
telling getopt you had eaten an extra argument. This seemed a bit too much
code for the value of supporting a deprecated format for a brand new option.

Finally, you changed the option value and case from 1 to case 2. getopt_long()
only returns the value argument when there is no flag to set, so 1 was unique
and could have been read as "the first no-short option with an argument".

Thanks for checking this in.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Re: pg_dump lock timeout

From
Tom Lane
Date:
daveg <daveg@sonic.net> writes:
> On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
>> In most cases our policy has been that pg_dumpall should accept and pass
>> through any pg_dump option for which it's sensible to do so. I did not
>> make that happen but it seems it'd be a reasonable follow-on patch.

> I'll remember that next time.

Er .. actually that was a direct request for you to do it.

> Finally, you changed the option value and case from 1 to case 2. getopt_long
> only returns the value argument when there is no flag to set, so 1 was unique
> and could have been read as "the first no-short option with an argument".

Yeah.  The code *worked* as you submitted it, but what was bothering me
was that the "val = 1" table entries worked in two completely different
ways for the different argument types.  I first thought that you'd
broken the existing long argument options --- you hadn't, but I had to
go re-read the getopt_long source to convince myself of that.  So it
seemed like using a different "val" value might help clarify the
difference in behavior for future readers of the code.  In particular
the next guy who wants to add a long option with parameter value would
certainly not be able to use val = 1; but I thought the code as you
had it wouldn't give him any clue what to do.  If you've got a better
idea about how to deal with that, feel free...

            regards, tom lane

Re: pg_dump lock timeout

From
daveg
Date:
On Mon, Jul 21, 2008 at 03:43:11AM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
> >> In most cases our policy has been that pg_dumpall should accept and pass
> >> through any pg_dump option for which it's sensible to do so. I did not
> >> make that happen but it seems it'd be a reasonable follow-on patch.
>
> > I'll remember that next time.
>
> Er .. actually that was a direct request for you to do it.


Attached is a the followon patch for pg_dumpall and docs to match pg_dump.

On a second topic, is anyone working on a parallel dump/load? I'd be
interested in helping.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Attachment