Re: warning message in standby - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: warning message in standby
Msg-id 4C110C41.7030607@enterprisedb.com
In response to Re: warning message in standby  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: warning message in standby
List pgsql-hackers
On 10/06/10 17:38, Tom Lane wrote:
> Robert Haas<robertmhaas@gmail.com>  writes:
>> On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>>> When an error is found in the WAL streamed from the master, a warning
>>> message is repeated without interval forever in the standby. This
>>> consumes CPU load very much, and would interfere with read-only queries.
>>> To fix this problem, we should add a sleep into emode_for_corrupt_record()
>>> or somewhere? Or we should stop walreceiver and retry to read WAL from
>>> pg_xlog or the archive?
>
>> I ran into this problem at one point, too, but was in the middle of
>> trying to investigate a different bug and didn't have time to track
>> down what was causing it.
>
>> I think the basic question here is - if there's an error in the WAL,
>> how do we expect to EVER recover?  Even if we can read from the
>> archive or pg_xlog, presumably it's the same WAL - why should we be
>> any more successful the second time?
>
> What "warning message" are we talking about?  All the error cases I can
> think of in WAL-application are ERROR, or likely even PANIC.

We're talking about a corrupt record (incorrect CRC, incorrect backlink 
etc.), not errors within redo functions. During crash recovery, a 
corrupt record means you've reached end of WAL. In standby mode, when 
streaming WAL from the master, that shouldn't happen, and it's not clear
what to do if it does. PANIC is not a good idea, at least if the server
uses hot standby, because that only makes the situation worse from an
availability point of view. So we log the error as a WARNING, and keep
retrying. It's unlikely that the problem will just go away, but we keep 
retrying anyway in the hope that it does. However, it seems that we're 
too aggressive with the retries.
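
To illustrate what a less aggressive retry could look like, here is a minimal
sketch in Python. It is not PostgreSQL code and not a proposed patch: the names
backoff_delays, read_with_retry and read_record are hypothetical, and the delay
values are arbitrary assumptions. The idea is only that each failed read is
followed by a sleep that grows up to a cap, so the WARNING is not emitted in a
tight loop.

```python
from typing import Callable, List, Optional


def backoff_delays(initial: float, factor: float, cap: float, attempts: int) -> List[float]:
    """Return the sleep (in seconds) after each failed attempt.

    The delay grows geometrically by `factor` and never exceeds `cap`.
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


def read_with_retry(
    read_record: Callable[[], Optional[bytes]],
    sleep: Callable[[float], None],
    max_attempts: int = 5,
    initial: float = 1.0,
    factor: float = 2.0,
    cap: float = 5.0,
) -> Optional[bytes]:
    """Call read_record() until it returns a record, sleeping between failures.

    read_record is a hypothetical stand-in for "fetch the next WAL record";
    it returns None when the record is corrupt (bad CRC, bad backlink, ...).
    Returns None if every attempt fails.
    """
    for delay in backoff_delays(initial, factor, cap, max_attempts):
        record = read_record()
        if record is not None:
            return record
        # One warning per failed attempt, followed by a pause instead of
        # an immediate retry.
        print(f"WARNING: invalid record, retrying in {delay:g}s")
        sleep(delay)
    return None
```

In real use, time.sleep would be passed as the sleep argument; it is a
parameter here only so the loop can be exercised without actually waiting.
Whether a fixed interval, a capped backoff, or falling back to the archive is
the right behaviour is exactly the open question in this thread.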

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

