Re: 9.2 recovery/startup problems - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: 9.2 recovery/startup problems
Msg-id CAMkU=1zAern2uby+fYveXrO-HY3cfS_uyv8SmBCMNipBXSOiUg@mail.gmail.com
In response to Re: 9.2 recovery/startup problems  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: 9.2 recovery/startup problems  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Dec 2, 2014 at 7:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 26, 2014 at 7:13 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > If I do a pg_ctl stop -mf, then both files go away.  If I do a pg_ctl stop
> > -mi, then neither goes away.  It is only with the /sbin/reboot that I get
> > the fatal combination of _init being gone but the other still present.
>
> Eh?  That sounds wonky.
>
> I mean, reboot normally kills processes with SIGTERM or SIGKILL, in
> which case I'd expect the outcome to match what you get with pg_ctl
> stop -mf or pg_ctl stop -mi.  The only way I can see that you'd get a
> different behavior is if you did a hard reboot (like echo b >
> /proc/sysrq-trigger); if that changes things, then we might have a
> missing-fsync bug.  How is that reboot managing to leave the main fork
> behind while losing the init fork?

During abort processing after getting a SIGTERM, the backend truncates 59288 to zero size and unlinks all the other files (including 59288_init).  The actual removal of 59288 is deferred until the next checkpoint.  So if you SIGTERM the backend and then take the server down uncleanly before that checkpoint completes, you are left with just 59288.

Here is the strace:

open("base/16416/59288", O_RDWR)        = 8
ftruncate(8, 0)                         = 0
close(8)                                = 0
unlink("base/16416/59288.1")            = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_fsm")          = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_vm")           = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_init")         = 0
unlink("base/16416/59288_init.1")       = -1 ENOENT (No such file or directory)
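
To make the resulting on-disk state concrete, here is a minimal sketch that replays the same sequence of calls against a scratch directory. It is only an illustration of the strace above, not PostgreSQL code: the function names (simulate_abort_cleanup, demo) and the 8192-byte file contents are assumptions made up for this example, and "no checkpoint" is modeled simply by never removing the main file.

```python
import os
import tempfile


def simulate_abort_cleanup(dbdir, relfilenode):
    """Replay the syscalls seen in the strace against a scratch directory.

    Hypothetical helper for illustration; it mimics the observed calls only.
    """
    main = os.path.join(dbdir, relfilenode)

    # open / ftruncate(fd, 0) / close: the main fork is emptied but kept.
    fd = os.open(main, os.O_RDWR)
    os.ftruncate(fd, 0)
    os.close(fd)

    # unlink the extra segment and the other forks; ENOENT is ignored,
    # matching the failed unlink calls in the trace.
    for suffix in (".1", "_fsm", "_vm", "_init", "_init.1"):
        try:
            os.unlink(main + suffix)
        except FileNotFoundError:
            pass


def demo():
    """Return {filename: size} left behind if no checkpoint ever runs."""
    with tempfile.TemporaryDirectory() as root:
        dbdir = os.path.join(root, "base", "16416")
        os.makedirs(dbdir)
        # Assumed starting state: a main fork and an init fork, both non-empty.
        for name in ("59288", "59288_init"):
            with open(os.path.join(dbdir, name), "wb") as f:
                f.write(b"\0" * 8192)

        simulate_abort_cleanup(dbdir, "59288")

        # Nothing removes the main file here, which stands in for the
        # server going down before the next checkpoint completes.
        return {
            name: os.path.getsize(os.path.join(dbdir, name))
            for name in sorted(os.listdir(dbdir))
        }


if __name__ == "__main__":
    print(demo())  # {'59288': 0}
```

The only survivor is a zero-length 59288 with no 59288_init next to it, which is the combination described above.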

Cheers,

Jeff
