Re: warning message in standby - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: warning message in standby
Msg-id: AANLkTileghfzSzpHH8BFdCJNBFRCOelutfBQm6OkcarA@mail.gmail.com
In response to: Re: warning message in standby  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: warning message in standby  (Fujii Masao <masao.fujii@gmail.com>)
List: pgsql-hackers

On Mon, Jun 14, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jun 14, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That's a different question altogether ;-).  I assume you're not
>>> satisfied by the change Heikki committed a couple hours ago?
>>> It will at least try to do something to recover.
>
>> Yeah, I'm not satisfied by that.  It's an improvement in the technical
>> sense - it replaces an infinite retry that spins at top speed with a
>> slower retry that won't flog your CPU quite so badly, but the chances
>> that it will actually succeed in correcting the underlying problem
>> seem infinitesimal.
>
> I'm not sure about that.  walreceiver will refetch from the start of the
> current WAL page, so there's at least some chance of getting a good copy
> when we didn't have one before.

The testing that I have been doing while we've been discussing this
reveals that you are correct.  I set up an HS/SR master and slave
(running on the same machine), ran pgbench on the master, and then
started randomly sending SIGSEGV to one of the master's backends.  It
seems that complaints about the WAL are possible on both master and
slave.  Here are a couple from the slave:

LOG:  unexpected pageaddr 0/89B7A000 in log file 0, segment 152, offset 12034048
WARNING:  there is no contrecord flag in log file 0, segment 136, offset 2523136
LOG:  invalid magic number 0000 in log file 0, segment 136, offset 2531328
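
For anyone wanting to see where those messages point, here is a rough Python sketch that pulls the segment and offset out of each line and converts the offset to a WAL page number. It is only an illustration: the 8192-byte page size is an assumption (the default WAL block size), and the `locate` helper is made up for this example.

```python
import re

# Assumption: WAL pages are 8192 bytes, the default block size.
WAL_BLOCK_SIZE = 8192

# The three messages quoted above, copied verbatim.
LOG_LINES = [
    "LOG:  unexpected pageaddr 0/89B7A000 in log file 0, segment 152, offset 12034048",
    "WARNING:  there is no contrecord flag in log file 0, segment 136, offset 2523136",
    "LOG:  invalid magic number 0000 in log file 0, segment 136, offset 2531328",
]

LOCATION = re.compile(r"in log file (\d+), segment (\d+), offset (\d+)")


def locate(line):
    """Return log file, segment, page number, and offset within the page."""
    match = LOCATION.search(line)
    if match is None:
        return None
    log_id, segment, offset = (int(group) for group in match.groups())
    return {
        "log": log_id,
        "segment": segment,
        "page": offset // WAL_BLOCK_SIZE,
        "page_offset": offset % WAL_BLOCK_SIZE,
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        print(locate(line))
```

Under that page-size assumption, all three offsets fall exactly on a page boundary, and the two messages for segment 136 land on consecutive pages (308 and 309).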

The slave reconnects and then things get better.  So I think your idea
of retrying once and then panicking is probably best.
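
To make the "retry once, then panic" policy concrete, here is a minimal Python sketch. It is not PostgreSQL code: `WalCorruptionError`, `StandbyPanic`, `read_record`, and `refetch` are all hypothetical names standing in for whatever the real recovery code would use.

```python
class WalCorruptionError(Exception):
    """Hypothetical error raised when a WAL record fails validation."""


class StandbyPanic(Exception):
    """Hypothetical stand-in for a PANIC on the standby."""


def read_with_one_retry(read_record, refetch):
    """Read a record; on corruption, refetch once, then give up.

    read_record: callable that returns a record or raises WalCorruptionError.
    refetch: callable that asks for the data again, for example by
             reconnecting and restarting from the start of the current page.
    """
    try:
        return read_record()
    except WalCorruptionError:
        # First failure: fetch a fresh copy and try exactly once more.
        refetch()
    try:
        return read_record()
    except WalCorruptionError as err:
        # Second failure: further retries are unlikely to help, so stop.
        raise StandbyPanic("WAL still invalid after one retry") from err
```

The point of the bounded retry is that a transient bad copy gets a chance to be replaced, while a persistent problem surfaces as a hard failure instead of an endless loop.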

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

