Re: "PANIC: cannot make new WAL entries during recovery" in the wild - Mailing list pgsql-hackers

From Tom Lane
Subject Re: "PANIC: cannot make new WAL entries during recovery" in the wild
Msg-id 24668.1249667513@sss.pgh.pa.us
In response to "PANIC: cannot make new WAL entries during recovery" in the wild  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: "PANIC: cannot make new WAL entries during recovery" in the wild  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Today we got a report in the Spanish list about the message in $subject.
> The server is 8.4 running on Windows.

I accidentally managed to reproduce this in HEAD just now, by kill -9'ing
a backend that was in the midst of a COPY IN operation (I was trying to
reproduce Neil Best's unrelated issue...).  The server log is:

LOG:  server process (PID 23846) was terminated by signal 9
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2009-08-07 11:27:36 EDT
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/1B9D7790
LOG:  unexpected pageaddr 0/1532E000 in log file 0, segment 28, offset 3334144
LOG:  redo done at 0/1C32D200
PANIC:  cannot make new WAL entries during recovery
LOG:  startup process (PID 23883) was terminated by signal 6
LOG:  aborting startup due to startup process failure

and the stack trace of the panic'd startup process looks like

#4  0x4b6e20 in errfinish (dummy=1) at elog.c:503
#5  0x4b86a0 in elog_finish (elevel=1073803952, fmt=0x7b0394b0 "") at elog.c:1142
#6  0x1f722c in XLogInsert (rmid=11 '\013', info=114 'r', rdata=0xc004d07c) at xlog.c:555
#7  0x1df290 in _bt_insertonpg (rel=0x4006cf28, buf=70, stack=0x3, itup=0x4006d150, newitemoff=38, split_only_page=0) at nbtinsert.c:833
#8  0x1e0898 in _bt_insert_parent (rel=0x4006cf28, buf=304, rbuf=854, stack=0x7b03b9d8, is_root=0, is_only=0) at nbtinsert.c:1627
#9  0x1ef098 in btree_xlog_cleanup () at nbtxlog.c:927
#10 0x201c44 in StartupXLOG () at xlog.c:5767
#11 0x206134 in StartupProcessMain () at xlog.c:8034
#12 0x228d0c in AuxiliaryProcessMain (argc=2, argv=0x7b03b6d8) at bootstrap.c:433
#13 0x39bb68 in StartChildProcess (type=StartupProcess) at postmaster.c:4243

So that confirms my speculation that btree index cleanup is the source
of the message.  We have two basic approaches to dealing with it:

1. Decide that the check added to XLogInsert is wrong and take it out.

2. Arrange for some sort of explicit state transition between the
WAL-reading and cleanup phases of recovery, and make sure XLogInsert
knows about it.
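
For concreteness, option 2 might look roughly like the following.  This is
a standalone sketch, not backend code: the RecoveryPhase enum, enter_phase()
and sketch_xlog_insert() are hypothetical names invented for illustration,
and the real XLogInsert would PANIC where this version just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical recovery phases; these names are not from the PostgreSQL source. */
typedef enum RecoveryPhase
{
    PHASE_REDO,     /* replaying WAL records: new WAL must not be written */
    PHASE_CLEANUP,  /* redo done: rmgr cleanup (e.g. btree) may write WAL */
    PHASE_NORMAL    /* recovery finished */
} RecoveryPhase;

static RecoveryPhase recovery_phase = PHASE_REDO;

/* Explicit state transition; phases may only advance one step at a time. */
static void
enter_phase(RecoveryPhase next)
{
    assert((int) next == (int) recovery_phase + 1);
    recovery_phase = next;
}

/*
 * Stand-in for the check in XLogInsert.  Returns false where the real
 * code would raise PANIC; only the redo phase refuses new WAL.
 */
static bool
sketch_xlog_insert(const char *what)
{
    if (recovery_phase == PHASE_REDO)
    {
        fprintf(stderr, "refused during redo: %s\n", what);
        return false;
    }
    printf("WAL record inserted: %s\n", what);
    return true;
}

/* Simulated startup sequence; returns the number of accepted inserts. */
static int
sketch_startup(void)
{
    int         inserted = 0;

    recovery_phase = PHASE_REDO;
    /* An insert attempted while replaying WAL is a bug and is refused. */
    if (sketch_xlog_insert("insert attempted during redo"))
        inserted++;

    enter_phase(PHASE_CLEANUP);
    /* Cleanup such as finishing an incomplete btree split is allowed. */
    if (sketch_xlog_insert("btree parent insertion during cleanup"))
        inserted++;

    enter_phase(PHASE_NORMAL);
    return inserted;
}
```

Calling sketch_startup() accepts only the cleanup-phase insert, which is
the behavior option 2 is after: the sanity check stays in force while WAL
is being replayed but no longer trips over btree_xlog_cleanup.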

Thoughts?
        regards, tom lane
