Re: 9.2 recovery/startup problems - Mailing list pgsql-hackers

From Andres Freund
Subject Re: 9.2 recovery/startup problems
Date
Msg-id 20141126210328.GA19004@alap3.anarazel.de
Whole thread Raw
In response to Re: 9.2 recovery/startup problems  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Hi,

On 2014-11-26 11:29:09 -0800, Jeff Janes wrote:
> On Wed, Nov 26, 2014 at 5:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > Jeff Janes wrote:
> >
> > > This is what I get in the log from the attempted restart:
> > >
> > > PST LOG:  database system was interrupted; last known up at 2014-11-25
> > > 15:40:33 PST
> > > PST LOG:  database system was not properly shut down; automatic recovery
> > in
> > > progress
> > > PST LOG:  redo starts at 84/EF000080
> > > PST LOG:  record with zero length at 84/EF09AE18
> > > PST LOG:  redo done at 84/EF09AD28
> > > PST LOG:  last completed transaction was at log time 2014-11-25
> > > 15:42:09.173599-08
> > > PST LOG:  checkpoint starting: end-of-recovery immediate
> > > PST LOG:  checkpoint complete: wrote 103 buffers (0.2%); 0 transaction
> > log
> > > file(s) added, 246 removed, 7 recycled; write=0.002 s, sync=0.020 s,
> > > total=0.526 s; sync files=51, longest=0.003 s, average=0.000 s
> > > PST FATAL:  could not create file "base/16416/59288": File exists

Any chance you can reproduce this in a reproducible fashion?

> > Maybe fire up pg_xlogdump to see what xlog record references that
> > relfilenode.
> >
> 
> pg_xlogdump doesn't exist yet in 9.2  (or can I use a newer one against the
> older files?).

Nope, you can't :(

> Immediately after the reboot, base/16416/59288 exists but is empty.
> 
> After the failed start up attempt, base/16416/59288 and
> base/16416/59288_init both exist and are empty, with base/16416/59288
> having its pre-startup-attempt timestamp.
> 
> So it seems like the code that copies the init fork over the main fork at
> the end of crash recovery is doing something wrong in this case.  But I
> don't understand why it then succeeds at starting up the next time I try it.
> 
> 
> If I use Snaga's xlogdump (which I'm not sure is entirely correct under
> 9.2), the end of the stream looks like this:
> 
> [cur:85/1008BD88, xid:4547590, rmid:10(Heap), len/tot_len:174/206, info:64,
> prev:85/1008BD58] hot_update: s/d/r:1663/16416/12636 block 16 off 41 to
> block 16 off 43
> [cur:85/1008BE58, xid:4547590, rmid:2(Storage), len/tot_len:16/48, info:16,
> prev:85/1008BD88] create rel: s/d/r:1663/16416/59288
> [cur:85/1008BE88, xid:4547590, rmid:8(Standby), len/tot_len:16/48, info:0,
> prev:85/1008BE58] standby
> [page:70, xlp_info:5, xlp_tli:2, xlp_pageaddr:85/1008C000]
> XLP_FIRST_IS_CONTRECORD XLP_BKP_REMOVABLE
> [cur:85/1008BEB8, xid:4547590, rmid:10(Heap), len/tot_len:28/7160, info:72,
> prev:85/1008BE88] hot_update: s/d/r:1663/16416/12636 block 30 off 32 to
> block 30 off 33
> [cur:85/1008BEB8, xid:4547590, rmid:10(Heap), len/tot_len:28/7160, info:72,
> prev:85/1008BE88] bkpblock[1]: s/d/r:1663/16416/12636 blk:30
> hole_off/len:156/1116

Hm. It looks like the problem here might be an ordering one. Notice that
there seems to be a record for the main relation, but not the init fork
in the excerpt.

It wonder if ExcecuteTruncate() does things in the wrong order.        /*         * Need the full transaction-safe
pushups.        *         * Create a new empty storage file for the relation, and assign it         * as the
relfilenodevalue. The old storage file is scheduled for         * deletion at commit.         */
RelationSetNewRelfilenode(rel,RecentXmin, minmulti);        if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
          heap_create_init_fork(rel);
 

Arguably that can cause problems if the node is promoted between the
RelationSetNewRelfilenode() and the heap_create_init_fork(). On the
other hand, the fork should essentially be 'invisible' in that case as
the transaction won't commit...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: bug in json_to_record with arrays
Next
From: Andrew Dunstan
Date:
Subject: Re: bug in json_to_record with arrays