Re: 9.2 recovery/startup problems - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: 9.2 recovery/startup problems
Date
Msg-id CAMkU=1xE+V=uYYqdO3p2bj=_FGLfJyDPFcDNAabWL7VvpkCGDw@mail.gmail.com
Whole thread Raw
In response to Re: 9.2 recovery/startup problems  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: 9.2 recovery/startup problems  (Andres Freund <andres@2ndquadrant.com>)
Re: 9.2 recovery/startup problems  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Wed, Nov 26, 2014 at 5:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Jeff Janes wrote:

> This is what I get in the log from the attempted restart:
>
> PST LOG:  database system was interrupted; last known up at 2014-11-25
> 15:40:33 PST
> PST LOG:  database system was not properly shut down; automatic recovery in
> progress
> PST LOG:  redo starts at 84/EF000080
> PST LOG:  record with zero length at 84/EF09AE18
> PST LOG:  redo done at 84/EF09AD28
> PST LOG:  last completed transaction was at log time 2014-11-25
> 15:42:09.173599-08
> PST LOG:  checkpoint starting: end-of-recovery immediate
> PST LOG:  checkpoint complete: wrote 103 buffers (0.2%); 0 transaction log
> file(s) added, 246 removed, 7 recycled; write=0.002 s, sync=0.020 s,
> total=0.526 s; sync files=51, longest=0.003 s, average=0.000 s
> PST FATAL:  could not create file "base/16416/59288": File exists

Maybe fire up pg_xlogdump to see what xlog record references that relfilenode.

pg_xlogdump doesn't exist yet in 9.2  (or can I use a newer one against the older files?).

Immediately after the reboot, base/16416/59288 exists but is empty.

After the failed start up attempt, base/16416/59288 and base/16416/59288_init both exist and are empty, with base/16416/59288 having its pre-startup-attempt timestamp.

So it seems like the code that copies the init fork over the main fork at the end of crash recovery is doing something wrong in this case.  But I don't understand why it then succeeds at starting up the next time I try it.


If I use Snaga's xlogdump (which I'm not sure is entirely correct under 9.2), the end of the stream looks like this:

[cur:85/1008BD88, xid:4547590, rmid:10(Heap), len/tot_len:174/206, info:64, prev:85/1008BD58] hot_update: s/d/r:1663/16416/12636 block 16 off 41 to block 16 off 43
[cur:85/1008BE58, xid:4547590, rmid:2(Storage), len/tot_len:16/48, info:16, prev:85/1008BD88] create rel: s/d/r:1663/16416/59288
[cur:85/1008BE88, xid:4547590, rmid:8(Standby), len/tot_len:16/48, info:0, prev:85/1008BE58] standby
[page:70, xlp_info:5, xlp_tli:2, xlp_pageaddr:85/1008C000] XLP_FIRST_IS_CONTRECORD XLP_BKP_REMOVABLE
[cur:85/1008BEB8, xid:4547590, rmid:10(Heap), len/tot_len:28/7160, info:72, prev:85/1008BE88] hot_update: s/d/r:1663/16416/12636 block 30 off 32 to block 30 off 33
[cur:85/1008BEB8, xid:4547590, rmid:10(Heap), len/tot_len:28/7160, info:72, prev:85/1008BE88] bkpblock[1]: s/d/r:1663/16416/12636 blk:30 hole_off/len:156/1116
Bogus page magic number 0000 at offset 8E000

Thanks,

Jeff

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: GSSAPI, SSPI - include_realm default
Next
From: Peter Eisentraut
Date:
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch