Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries. - Mailing list pgsql-hackers

From Marko Tiikkaja
Subject Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Date
Msg-id 505C6BF7.3090808@joh.to
Whole thread Raw
In response to Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 9/20/12 11:55 PM, Andres Freund wrote:
> On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
>> OK, that explains why we've not seen a blizzard of trouble reports.
>> Still seems like a good idea to fix it ASAP, though.
> Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the
> IRC Channel that had symptoms matching this bug.

Another such user reporting in. :-(

Our slave started accumulating WAL files and ran out of disk space 
yesterday.  After investigation from Andres and Andrew, it turns out 
that we were most likely hit by this very same bug.

Here's what they have to say:
"If the db crashes between logging the split and the parent-node insert, 
then in recovery, since relpersistence is not initialized correctly, 
when the recovery process tries to complete the operation, no xlog 
record is written for the insert.  If there's a slave server, then the 
missing xlog record for the insert means that the slave's 
incomplete_actions queue never becomes empty, therefore the slave can no 
longer do recovery restartpoints."

Some relevant information:

[cur:92/314BC870, xid:76872047, rmid:10(Heap), len/tot_len:91/123, 
info:0, prev:92/314BB890] insert: s/d/r:1663/408841/415746 
blk/off:13904/65 header: t_infomask2 8 t_infomask 2050 t_hoff 24
[cur:92/314BC8F0, xid:76872047, rmid:11(Btree), len/tot_len:702/734, 
info:64, prev:92/314BC870] split_r: s/d/r:1663/408841/475676 leftsib 2896
[cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0, 
prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; tli 1; nextxid 
76872048;  nextoid 764990; nextmulti 62062; nextoffset 132044; shutdown 
at 2012-09-11 14:26:26 CEST

2012-09-11 14:26:26.719 CEST,,,44620,,504f2df2.ae4c,5,,2012-09-11 
14:26:26 CEST,,0,LOG,00000,"redo done at 
92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""

And apparently the relpersistence check in RelationNeedsWAL() call in 
_bt_insertonpg had a role in this as well.



Regards,
Marko Tiikkaja



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)
Next
From: Michael Paquier
Date:
Subject: Re: pg_reorg in core?