Thread: Improve shutdown during online backup

Improve shutdown during online backup

From: Albe Laurenz

This follows up on the discussion in
http://archives.postgresql.org/pgsql-hackers/2008-03/msg01033.php

- pg_ctl will refuse a smart shutdown during online backup.
- The postmaster will also refuse to shut down in smart mode
  in that case and log a message to that effect.
- In fast shutdown mode, the server will rename "backup_label"
  after successfully shutting down and log the fact.

Yours,
Laurenz Albe

Attachment

Re: Improve shutdown during online backup

From: Simon Riggs

On Tue, 2008-04-01 at 15:34 +0200, Albe Laurenz wrote:
> This follows up on the discussion in
> http://archives.postgresql.org/pgsql-hackers/2008-03/msg01033.php
>
> - pg_ctl will refuse a smart shutdown during online backup.
> - The postmaster will also refuse to shut down in smart mode
>   in that case and log a message to that effect.
> - In fast shutdown mode, the server will rename "backup_label"
>   after successfully shutting down and log the fact.

Looks good.


A few comments:

* A smart shutdown waits for sessions to complete, yet this patch just
ignores smart shutdowns, which is a little different. I think we should
wait for the backup to complete and then shut down.

* When we say "online backup cancelled", I think we should say something
more like "online backup mode cancelled". All we are doing is removing
the backup label file; we're not actually cancelling the physical backup,
since it is external to the database anyway.

* The #defines at the top of postmaster.c are duplicated from xlog.c.
If we can't agree on a common header file, then we should at least add a
comment (in both locations) noting that they are duplicated.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


Re: Improve shutdown during online backup

From: Simon Riggs

On Tue, 2008-04-01 at 17:42 +0100, Simon Riggs wrote:

> A few comments:
>
> * A smart shutdown waits for sessions to complete, yet this patch just
> ignores smart shutdowns, which is a little different. I think we should
> wait for the backup to complete and then shut down.

> * The #defines at the top of postmaster.c are duplicated from xlog.c.
> If we can't agree on a common header file, then we should at least add a
> comment (in both locations) noting that they are duplicated.

If we add a function called something like BackupInProgress() to xlog.c,
exported via miscadmin.h, then we can use it within the
PostmasterStateMachine() function like this:

    if (pmState == PM_WAIT_BACKENDS)
    {
        if (CountChildren() == 0 &&
            StartupPID == 0 &&
            (BgWriterPID == 0 || !FatalError) &&
            WalWriterPID == 0 &&
            AutoVacPID == 0 &&
            !BackupInProgress())   /* <---- new line */

so that the postmaster doesn't need to know about how we do backups.

That way you don't need any of the special cases in your patch, nor is
there any need to duplicate the #defines.
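
To make the idea concrete, here is a minimal sketch of what such a helper
might look like. It assumes that online backup mode is signalled purely by
the presence of a file named backup_label in the current working directory
(the data directory, for a running server). The function body and the
BACKUP_LABEL_FILE macro are illustrative assumptions, not code taken from
the patch:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Assumption: the label file sits in the current working directory. */
#define BACKUP_LABEL_FILE "backup_label"

/*
 * Hypothetical helper: report whether online backup mode is active,
 * judged only by whether the backup label file exists.
 */
bool
BackupInProgress(void)
{
    struct stat stat_buf;

    return stat(BACKUP_LABEL_FILE, &stat_buf) == 0;
}
```

With something like this in place, the caller only asks a yes/no question
and never touches the file name itself.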

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com



Re: Improve shutdown during online backup

From: Albe Laurenz

Simon Riggs wrote:
>> A few comments:
>>
>> * A smart shutdown waits for sessions to complete, yet this patch just
>> ignores smart shutdowns, which is a little different. I think we should
>> wait for the backup to complete and then shut down.

That would be more consistent, I agree.

I'll undo my changes to pg_ctl as well, since they no longer make sense
in that case.

>> * The #defines at the top of postmaster.c are duplicated from xlog.c.
>> If we can't agree on a common header file, then we should at least add a
>> comment (in both locations) noting that they are duplicated.
>
> If we add a function called something like BackupInProgress() to xlog.c,
> exported via miscadmin.h, then we can use it within the
> PostmasterStateMachine() function like this:
>
>     if (pmState == PM_WAIT_BACKENDS)
>     {
>         if (CountChildren() == 0 &&
>             StartupPID == 0 &&
>             (BgWriterPID == 0 || !FatalError) &&
>             WalWriterPID == 0 &&
>             AutoVacPID == 0 &&
>             !BackupInProgress())   /* <---- new line */
>
> so that the postmaster doesn't need to know about how we do backups.
>
> That way you don't need any of the special cases in your patch, nor is
> there any need to duplicate the #defines.

I realized that duplicating the #defines was ugly; I will do it the way
you suggest.

Thanks for the hints.

Yours,
Laurenz Albe

Re: Improve shutdown during online backup

From: Albe Laurenz

Simon Riggs wrote:
>> A few comments:
>>
>> * A smart shutdown waits for sessions to complete, yet this patch just
>> ignores smart shutdowns, which is a little different. I think we should
>> wait for the backup to complete and then shut down.
>
> If we add a function called something like BackupInProgress() to xlog.c,
> exported via miscadmin.h, then we can use it within the
> PostmasterStateMachine() function like this:
>
>     if (pmState == PM_WAIT_BACKENDS)
>     {
>         if (CountChildren() == 0 &&
>             StartupPID == 0 &&
>             (BgWriterPID == 0 || !FatalError) &&
>             WalWriterPID == 0 &&
>             AutoVacPID == 0 &&
>             !BackupInProgress())   /* <---- new line */
>
> so that the postmaster doesn't need to know about how we do backups.
>
> That way you don't need any of the special cases in your patch, nor is
> there any need to duplicate the #defines.

I looked at that, and it won't work, for these reasons:

PostmasterStateMachine() is called once after a smart shutdown.
If there are children or a backup is in progress, pmState will remain
PM_WAIT_BACKENDS.

Now whenever a child exits, reaper() will be called, which in turn
calls PostmasterStateMachine() again and advances pmState if appropriate.
This won't work for backups, though, because removal of backup_label will
not send a SIGCHLD to the postmaster.
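
The problem can be shown with a small toy model. All names below are
hypothetical stand-ins, not the real postmaster definitions; the only
assumption carried over is that the state machine is re-run when a child
exits and at no other time:

```c
#include <stdbool.h>

/* Toy stand-ins for the postmaster's state; not the real definitions. */
typedef enum { TOY_WAIT_BACKENDS, TOY_SHUTTING_DOWN } ToyPmState;

typedef struct
{
    ToyPmState  state;
    int         children;       /* number of live backends */
    bool        backup_label;   /* does backup_label exist? */
} ToyPostmaster;

/* Advance only if no children remain and no backup is in progress. */
void
toy_state_machine(ToyPostmaster *pm)
{
    if (pm->state == TOY_WAIT_BACKENDS &&
        pm->children == 0 && !pm->backup_label)
        pm->state = TOY_SHUTTING_DOWN;
}

/* A child exit (SIGCHLD) re-runs the state machine. */
void
toy_reaper(ToyPostmaster *pm)
{
    pm->children--;
    toy_state_machine(pm);
}

/* Removing the label file signals nobody, so nothing is re-evaluated. */
void
toy_remove_backup_label(ToyPostmaster *pm)
{
    pm->backup_label = false;
}
```

In this model, once the last child has exited while the label still exists,
a later toy_remove_backup_label() leaves the state stuck in
TOY_WAIT_BACKENDS until something else happens to call toy_state_machine().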

Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
and nobody can connect and call pg_stop_backup().
So even if I added a check for
(pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in
ServerLoop(), it wouldn't do much good, because the only way for somebody
to cancel online backup mode would be to manually remove the file.

So the only reasonable thing to do on smart shutdown during an online
backup is to have the shutdown request fail, right? The only alternative
would be for a smart shutdown request to interrupt online backup mode.

So - unless you point out a flaw in my reasoning - I'll implement it
that way, but will put all code that handles backup_label files into
xlog.c.

Yours,
Laurenz Albe

Re: Improve shutdown during online backup

From: Heikki Linnakangas

Albe Laurenz wrote:
> Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
> and nobody can connect and call pg_stop_backup().
> So even if I added a check for
> (pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in
> ServerLoop(), it wouldn't do much good, because the only way for somebody
> to cancel online backup mode would be to manually remove the file.

Good point.

> So the only reasonable thing to do on smart shutdown during an online
> backup is to have the shutdown request fail, right? The only alternative
> would be for a smart shutdown request to interrupt online backup mode.

Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS,
that allows new connections, and waits until the backup ends.
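
Roughly, the transitions I have in mind look like the sketch below. The
enum and function names are toy stand-ins invented for illustration, and
the periodic poll is an assumption about how the end of the backup would
be noticed:

```c
#include <stdbool.h>

/* Toy state list; only the ordering matters for this sketch. */
typedef enum
{
    TOY_PM_RUN,
    TOY_PM_WAIT_BACKUP,     /* proposed: shutdown requested, backup active */
    TOY_PM_WAIT_BACKENDS    /* waiting for sessions to finish */
} ToyPmState;

/* New connections are accepted while running or waiting for the backup. */
bool
toy_accepts_connections(ToyPmState state)
{
    return state == TOY_PM_RUN || state == TOY_PM_WAIT_BACKUP;
}

/* State entered when a smart shutdown request arrives. */
ToyPmState
toy_on_smart_shutdown(bool backup_in_progress)
{
    return backup_in_progress ? TOY_PM_WAIT_BACKUP : TOY_PM_WAIT_BACKENDS;
}

/* Periodic check: leave TOY_PM_WAIT_BACKUP once the backup has ended. */
ToyPmState
toy_poll(ToyPmState state, bool backup_in_progress)
{
    if (state == TOY_PM_WAIT_BACKUP && !backup_in_progress)
        return TOY_PM_WAIT_BACKENDS;
    return state;
}
```

Because the extra state still accepts connections, somebody can log in and
call pg_stop_backup(), after which the shutdown proceeds as it does today.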

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Improve shutdown during online backup

From: Albe Laurenz

[What should happen if a smart shutdown request is received during online
 backup mode? I'll cc: the hackers list; maybe others have something to say
 about this.]

Heikki Linnakangas wrote:
> Albe Laurenz wrote:
>> Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
>> and nobody can connect and call pg_stop_backup().
>> So even if I added a check for
>> (pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in
>> ServerLoop(), it wouldn't do much good, because the only way for somebody
>> to cancel online backup mode would be to manually remove the file.
>
> Good point.
>
>> So the only reasonable thing to do on smart shutdown during an online
>> backup is to have the shutdown request fail, right? The only alternative
>> would be for a smart shutdown request to interrupt online backup mode.
>
> Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS,
> that allows new connections, and waits until the backup ends.

That's an option. Maybe it is possible to restrict connections to superusers
(who are the only ones who can call pg_stop_backup() anyway).

Or, we could allow superuser connections in state PM_WAIT_BACKENDS...

Opinions?

Yours,
Laurenz Albe

Re: Improve shutdown during online backup

From: Simon Riggs

On Tue, 2008-04-08 at 09:16 +0200, Albe Laurenz wrote:

> Heikki Linnakangas wrote:
> > Albe Laurenz wrote:
> >> Moreover, if Shutdown == SmartShutdown, new connections won't be accepted,
> >> and nobody can connect and call pg_stop_backup().
> >> So even if I added a check for
> >> (pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in
> >> ServerLoop(), it wouldn't do much good, because the only way for somebody
> >> to cancel online backup mode would be to manually remove the file.
> >
> > Good point.
> >
> >> So the only reasonable thing to do on smart shutdown during an online
> >> backup is to have the shutdown request fail, right? The only alternative
> >> would be for a smart shutdown request to interrupt online backup mode.
> >
> > Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS,
> > that allows new connections, and waits until the backup ends.
>
> That's an option. Maybe it is possible to restrict connections to superusers
> (who are the only ones who can call pg_stop_backup() anyway).
>
> Or, we could allow superuser connections in state PM_WAIT_BACKENDS...

That sounds right.

Completely unrelated to backups: if you issue a smart shutdown and the
server doesn't shut down, you would probably like to connect and see what
is happening and why. The reason may not be a backup in progress.

Personally, I think "smart" shutdown could be even smarter. It should
kick off unwanted sessions, such as an idle pgAdmin session - maybe a
rule like "anything that has been idle for >30 seconds".

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


Re: Improve shutdown during online backup

From: Gregory Stark

"Simon Riggs" <simon@2ndquadrant.com> writes:

> Personally, I think "smart" shutdown could be even smarter. It should
> kick off unwanted sessions, such as an idle pgAdmin session - maybe a
> rule like "anything that has been idle for >30 seconds".

That's not a bad idea in itself, but I don't think it's something the server
should be in the business of doing. One big reason is that the server
shouldn't be imposing arbitrary policy. That should be something the person
running the shutdown is in control of.

What you could do is have a separate program (I would write a client, but a
server-side function would work too) to kick off users based on various
criteria you can specify.

Then you can put two commands in your backup scripts: one to kick off idle
users, and then one to do a smart shutdown.
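
As a rough sketch of the policy part of such a tool: the caller supplies
the idle threshold, and the program only picks out the matching sessions.
The ToySession struct and toy_select_idle() are hypothetical; how the
session list is obtained and how the chosen sessions are actually
disconnected is left out entirely:

```c
#include <stddef.h>

/* Hypothetical view of one session: its backend pid and idle time. */
typedef struct
{
    int pid;
    int idle_seconds;
} ToySession;

/*
 * Copy the pids of sessions idle for more than max_idle seconds into
 * out (which must have room for n entries) and return how many matched.
 */
size_t
toy_select_idle(const ToySession *sessions, size_t n, int max_idle, int *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (sessions[i].idle_seconds > max_idle)
            out[count++] = sessions[i].pid;
    }
    return count;
}
```

The threshold stays a parameter of the tool, so the policy is chosen by
whoever runs the shutdown rather than being built into the server.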

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!