I just managed to make a backend dump core while fooling with the CTE
patch, and found out that the system failed to recover, because the
ensuing startup process *also* dumped core. Here's the backtrace:
Core was generated by `postgres: startup'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000048df59 in XLogInsert (rmid=2 '\002', info=32 ' ', rdata=0x7fff41713550) at xlog.c:813
813 record->xl_prev = Insert->PrevRecord;
(gdb) bt
#0 0x000000000048df59 in XLogInsert (rmid=2 '\002', info=32 ' ', rdata=0x7fff41713550) at xlog.c:813
#1 0x00000000005ec8d0 in smgrtruncate (reln=0x206a148, forknum=FSM_FORKNUM, nblocks=3, isTemp=0 '\0') at smgr.c:594
#2 0x00000000005dc194 in FreeSpaceMapTruncateRel (rel=0x2072050, nblocks=15) at freespace.c:275
#3 0x00000000005dc2ee in fsm_redo (lsn=<value optimized out>, record=<value optimized out>) at freespace.c:779
#4 0x000000000049003f in StartupXLOG () at xlog.c:5146
#5 0x00000000004a9cd8 in AuxiliaryProcessMain (argc=2, argv=0x7fff41713790) at bootstrap.c:420
#6 0x00000000005bd24d in StartChildProcess (type=StartupProcess) at postmaster.c:4074
#7 0x00000000005c053f in PostmasterStateMachine () at postmaster.c:2737
#8 0x00000000005c0965 in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2325
#9 <signal handler called>
#10 0x0000003f71edcbb3 in __select_nocancel () from /lib64/libc.so.6
#11 0x00000000006ce41a in pg_usleep (microsec=<value optimized out>) at pgsleep.c:43
#12 0x00000000005bed05 in ServerLoop () at postmaster.c:1232
#13 0x00000000005bf99a in PostmasterMain (argc=3, argv=0x203a890) at postmaster.c:1031
#14 0x0000000000568fd8 in main (argc=3, argv=0x203a890) at main.c:188
We should of course not be attempting XLogInsert during WAL replay.
Now, smgr_redo knows about that. I rather wonder why fsm_redo is
attempting to call smgrtruncate at all, seeing that there's presumably
smgr's own redo record to take care of the truncation. I think that all
fsm_redo need do is clear out the last untruncated block of FSM.
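
To illustrate the rule that replay code must not emit new WAL, here is a
minimal toy sketch. It is not the PostgreSQL source and not a proposed patch:
every name in it (toy_in_recovery, toy_xlog_insert, toy_truncate,
toy_redo_truncate) is a hypothetical stand-in, and the "relation" is just a
block counter. It only shows the shape of the guard: the truncate path logs
when running normally and skips logging when called from redo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are PostgreSQL's real definitions. */
static bool toy_in_recovery = false;   /* true while replaying WAL */
static int  toy_wal_records = 0;       /* count of records "written" */

/* Toy WAL insert: calling this during replay is a bug, so assert on it. */
static void
toy_xlog_insert(const char *what)
{
    assert(!toy_in_recovery);
    toy_wal_records++;
    printf("WAL: %s\n", what);
}

/* Toy truncate: write a WAL record only when we are not replaying one. */
static void
toy_truncate(int *nblocks, int new_nblocks, bool is_temp)
{
    if (!is_temp && !toy_in_recovery)
        toy_xlog_insert("truncate");

    if (new_nblocks < *nblocks)
        *nblocks = new_nblocks;
}

/* Toy redo routine: reapplies a truncation without generating new WAL. */
static void
toy_redo_truncate(int *nblocks, int new_nblocks)
{
    bool saved = toy_in_recovery;

    toy_in_recovery = true;
    toy_truncate(nblocks, new_nblocks, false);
    toy_in_recovery = saved;
}
```

Calling toy_truncate directly bumps toy_wal_records by one, while going
through toy_redo_truncate leaves it unchanged. The suggestion above is
a different matter: have fsm_redo not call the truncate path in the first
place.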
regards, tom lane