WAL recovery is broken by FSM patch - Mailing list pgsql-hackers

From	Tom Lane
Subject	WAL recovery is broken by FSM patch
Date	September 30, 2008 19:52:52
Msg-id	27934.1222815135@sss.pgh.pa.us
Responses	Re: WAL recovery is broken by FSM patch
List	pgsql-hackers

I just managed to make a backend dump core while fooling with the CTE
patch, and found out that the system failed to recover, because the
ensuing startup process *also* dumped core.  Here's the backtrace:

Core was generated by `postgres: startup'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000048df59 in XLogInsert (rmid=2 '\002', info=32 ' ', rdata=0x7fff41713550) at xlog.c:813
813             record->xl_prev = Insert->PrevRecord;
(gdb) bt
#0  0x000000000048df59 in XLogInsert (rmid=2 '\002', info=32 ' ', rdata=0x7fff41713550) at xlog.c:813
#1  0x00000000005ec8d0 in smgrtruncate (reln=0x206a148, forknum=FSM_FORKNUM, nblocks=3, isTemp=0 '\0') at smgr.c:594
#2  0x00000000005dc194 in FreeSpaceMapTruncateRel (rel=0x2072050, nblocks=15) at freespace.c:275
#3  0x00000000005dc2ee in fsm_redo (lsn=<value optimized out>, record=<value optimized out>) at freespace.c:779
#4  0x000000000049003f in StartupXLOG () at xlog.c:5146
#5  0x00000000004a9cd8 in AuxiliaryProcessMain (argc=2, argv=0x7fff41713790) at bootstrap.c:420
#6  0x00000000005bd24d in StartChildProcess (type=StartupProcess) at postmaster.c:4074
#7  0x00000000005c053f in PostmasterStateMachine () at postmaster.c:2737
#8  0x00000000005c0965 in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2325
#9  <signal handler called>
#10 0x0000003f71edcbb3 in __select_nocancel () from /lib64/libc.so.6
#11 0x00000000006ce41a in pg_usleep (microsec=<value optimized out>) at pgsleep.c:43
#12 0x00000000005bed05 in ServerLoop () at postmaster.c:1232
#13 0x00000000005bf99a in PostmasterMain (argc=3, argv=0x203a890) at postmaster.c:1031
#14 0x0000000000568fd8 in main (argc=3, argv=0x203a890) at main.c:188

We should of course not be attempting XLogInsert during WAL replay.
Now smgr_redo knows about that.  I rather wonder why fsm_redo is
attempting to call smgrtruncate at all, seeing that there's presumably
smgr's own redo record to tell it to deal with that.  I think that all
fsm_redo need do is clear out the last untruncated block of FSM.
        regards, tom lane
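
The point above can be illustrated with a small toy model. This is only a sketch under stated assumptions: every name here (toy_in_recovery, toy_wal_insert, toy_truncate, toy_fsm_redo_clear_tail, TOY_SLOTS_PER_PAGE) is hypothetical and none of it is PostgreSQL's actual code. It shows a truncate routine that skips WAL logging while replaying, and a redo routine that only clears the tail of the last kept block instead of truncating anything itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are PostgreSQL's real APIs. */
#define TOY_SLOTS_PER_PAGE 8

static bool toy_in_recovery = false;  /* true while replaying WAL */
static int  toy_wal_records = 0;      /* count of records "written" */

/* Models a WAL insert: only legal during normal operation. */
static int toy_wal_insert(const char *what)
{
    if (toy_in_recovery) {
        fprintf(stderr, "bug: WAL insert (%s) during replay\n", what);
        return -1;
    }
    toy_wal_records++;
    return 0;
}

/* Models a truncate that logs itself, except when it is being replayed. */
static int toy_truncate(int *nblocks, int new_nblocks)
{
    assert(nblocks != NULL);
    if (!toy_in_recovery && toy_wal_insert("truncate") != 0)
        return -1;
    if (new_nblocks < *nblocks)
        *nblocks = new_nblocks;
    return 0;
}

/*
 * Models a redo routine that leaves the physical truncation to another
 * record's replay and only zeroes the slots past the cutoff in the last
 * block that is kept.
 */
static void toy_fsm_redo_clear_tail(unsigned char *page, int first_removed_slot)
{
    assert(page != NULL);
    if (first_removed_slot < 0)
        first_removed_slot = 0;
    for (int i = first_removed_slot; i < TOY_SLOTS_PER_PAGE; i++)
        page[i] = 0;
}
```

In this model, calling toy_truncate with toy_in_recovery set changes the block count without touching toy_wal_records, while a direct toy_wal_insert call in the same state is reported as a bug and returns -1.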