On Tue, Jun 29, 2010 at 3:55 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jun 15, 2010 at 11:35 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On the other hand, I like immediate-panicking. And I don't want the standby
>> to keep retrying to reconnect to the master indefinitely.
>
> On second thought, the peremptory PANIC is not good for an HA system. If the
> master unfortunately has written an invalid record because of its crash,
> the standby would exit with PANIC before performing a failover.
I don't think that should ever happen. The master only streams WAL
that it has fsync'd. Presumably there's no reason for the master to
ever fsync a partial WAL record (which is usually how a corrupt record
gets into the stream).
> So when an invalid record is found in a streamed WAL file, we should keep
> the standby running and leave the decision of whether the standby keeps
> retrying to connect to the master forever or shuts down right away up to
> the user (in practice, this may be clusterware)?
Well, if we want to leave it up to the user/clusterware, the current
code is possibly adequate, although there are many different log
messages that could signal this situation, so coding up that detection
might not be trivial.
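
To illustrate why, here is a rough sketch of the kind of log check a
clusterware script might need. It is only a sketch under stated
assumptions: the pattern strings are hypothetical placeholders, not
actual PostgreSQL log messages, and the names find_trouble, decide, and
max_hits are made up for this example. The point is that the list of
patterns has to cover every message that can signal the situation, and
that list is the hard part.

```python
import re
from typing import Iterable, List, Optional

# HYPOTHETICAL patterns: placeholders only, NOT real PostgreSQL messages.
# A real deployment would have to collect the actual message texts itself.
HYPOTHETICAL_PATTERNS: List[str] = [
    r"example: invalid WAL record",
    r"example: streaming replication terminated",
]


def _matches(line: str, compiled) -> bool:
    """True if the line matches any of the compiled patterns."""
    return any(c.search(line) for c in compiled)


def find_trouble(lines: Iterable[str],
                 patterns: List[str] = HYPOTHETICAL_PATTERNS) -> Optional[str]:
    """Return the first log line matching any pattern, or None."""
    compiled = [re.compile(p) for p in patterns]
    for line in lines:
        if _matches(line, compiled):
            return line.rstrip("\n")
    return None


def decide(lines: Iterable[str],
           max_hits: int = 3,
           patterns: List[str] = HYPOTHETICAL_PATTERNS) -> str:
    """Return 'shutdown' once max_hits matching lines are seen, else 'retry'.

    max_hits is an assumed policy knob: how many suspicious messages the
    operator tolerates before giving up on the master.
    """
    compiled = [re.compile(p) for p in patterns]
    hits = sum(1 for line in lines if _matches(line, compiled))
    return "shutdown" if hits >= max_hits else "retry"
```

A message that is missing from the pattern list is silently ignored, so
the standby would simply keep retrying; that is the risk with this
approach.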
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company