Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries. - Mailing list pgsql-hackers

From	Marko Tiikkaja
Subject	Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Date	September 21, 2012 16:30:40
Msg-id	505C6BF7.3090808@joh.to
In response to	Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries. (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries. (Andres Freund <andres@2ndquadrant.com>)
List	pgsql-hackers

On 9/20/12 11:55 PM, Andres Freund wrote:
> On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
>> OK, that explains why we've not seen a blizzard of trouble reports.
>> Still seems like a good idea to fix it ASAP, though.
> Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the
> IRC Channel that had symptoms matching this bug.

Another such user reporting in. :-(

Our slave started accumulating WAL files and ran out of disk space 
yesterday.  After investigation from Andres and Andrew, it turns out 
that we were most likely hit by this very same bug.

Here's what they have to say:
"If the db crashes between logging the split and the parent-node insert, 
then in recovery, since relpersistence is not initialized correctly, 
when the recovery process tries to complete the operation, no xlog 
record is written for the insert.  If there's a slave server, then the 
missing xlog record for the insert means that the slave's 
incomplete_actions queue never becomes empty, therefore the slave can no 
longer do recovery restartpoints."
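
To make the failure mode quoted above concrete, here is a toy model in Python. It is only a sketch of the reasoning, not PostgreSQL code: the Standby class, the record tuples, and the needs_wal flag are all hypothetical stand-ins (needs_wal plays the role of what RelationNeedsWAL() returns for the fake relcache entry during recovery).

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy standby that tracks splits whose parent insert was never replayed."""
    incomplete_actions: set = field(default_factory=set)

    def replay(self, record):
        kind, split_id = record
        if kind == "split":
            # A split record opens a multi-record action.
            self.incomplete_actions.add(split_id)
        elif kind == "insert_parent":
            # The matching parent-node insert closes it.
            self.incomplete_actions.discard(split_id)

    def can_restartpoint(self):
        # Assumption of this model: a restartpoint is only allowed
        # while no multi-record action is pending.
        return not self.incomplete_actions


def primary_crash_recovery(pending_splits, needs_wal):
    """Complete splits interrupted by a crash; return the WAL records emitted.

    The parent insert is always performed locally, but it is logged only
    when needs_wal is true.
    """
    wal = []
    for split_id in pending_splits:
        if needs_wal:
            wal.append(("insert_parent", split_id))
    return wal


def simulate(needs_wal):
    standby = Standby()
    standby.replay(("split", 1))  # the split was logged before the crash
    for record in primary_crash_recovery([1], needs_wal):
        standby.replay(record)
    return standby.can_restartpoint()
```

With needs_wal set correctly, the standby sees the parent insert and its queue drains. With the flag wrongly false, the queue stays non-empty forever, which in this model means no restartpoint and therefore no WAL cleanup on the standby.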

Some relevant information:

[cur:92/314BC870, xid:76872047, rmid:10(Heap), len/tot_len:91/123, 
info:0, prev:92/314BB890] insert: s/d/r:1663/408841/415746 
blk/off:13904/65 header: t_infomask2 8 t_infomask 2050 t_hoff 24
[cur:92/314BC8F0, xid:76872047, rmid:11(Btree), len/tot_len:702/734, 
info:64, prev:92/314BC870] split_r: s/d/r:1663/408841/475676 leftsib 2896
[cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0, 
prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; tli 1; nextxid 
76872048;  nextoid 764990; nextmulti 62062; nextoffset 132044; shutdown 
at 2012-09-11 14:26:26 CEST

2012-09-11 14:26:26.719 CEST,,,44620,,504f2df2.ae4c,5,,2012-09-11 
14:26:26 CEST,,0,LOG,00000,"redo done at 
92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""

Apparently the relpersistence check made by the RelationNeedsWAL() call 
in _bt_insertonpg played a role in this as well.

Regards,
Marko Tiikkaja
