On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I wonder if it would be possible to jigger things so that we send the
>> WAL to the standby as soon as it is generated, but somehow arrange
>> things so that the standby knows the last location that the master has
>> fsync'd and never applies beyond that point.
>
> I can't think of any way which would not require major engineering. And
> you'd be slowing down replication *in general* to deal with a fairly
> unlikely corner case.
>
> I think the panic is the way to go.

I have yet to convince myself of how likely this is to occur. I tried
to reproduce the issue by crashing the database, but I think that in
9.0 you need an actual operating system crash to cause it, and I
haven't yet set up an environment in which I can repeatedly crash the
OS. I believe, though, that in 9.1 we're going to want to stream
from WAL buffers, as proposed in the patch that started this thread,
and at that point I think the issue can be triggered by a plain
database crash.

In 9.0, I think we can fix this problem by (1) only streaming WAL that
has been fsync'd and (2) PANIC-ing if the problem occurs anyway. But
in 9.1, with sync rep and the performance demands that entails, I
think that we're going to need to rethink it.
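
To make the two rules concrete, here is a toy model in Python. It is
only a sketch of the idea, not PostgreSQL code: the class and method
names (Master, Standby, streamable_upto, reconnect) are made up for
illustration, LSNs are plain integers, and the exception stands in for
a PANIC.

```python
from dataclasses import dataclass


class StandbyAheadOfMaster(Exception):
    """Stand-in for a PANIC: the standby holds WAL the master lost."""


@dataclass
class Master:
    write_lsn: int = 0  # WAL generated so far (may not be on disk yet)
    flush_lsn: int = 0  # WAL known to have been fsync'd

    def generate(self, nbytes: int) -> None:
        self.write_lsn += nbytes

    def fsync(self) -> None:
        self.flush_lsn = self.write_lsn

    def streamable_upto(self) -> int:
        # Rule (1): never hand out WAL beyond the fsync'd point.
        return self.flush_lsn

    def crash(self) -> None:
        # Assumption: after a crash only fsync'd WAL survives.
        self.write_lsn = self.flush_lsn


@dataclass
class Standby:
    received_lsn: int = 0

    def receive(self, upto: int) -> None:
        self.received_lsn = max(self.received_lsn, upto)

    def reconnect(self, master: Master) -> None:
        # Rule (2): if the standby got ahead anyway, stop rather than
        # continue from a diverged WAL stream.
        if self.received_lsn > master.write_lsn:
            raise StandbyAheadOfMaster(
                f"standby at {self.received_lsn}, "
                f"master at {master.write_lsn}"
            )


if __name__ == "__main__":
    master, standby = Master(), Standby()
    master.generate(100)
    master.fsync()
    master.generate(50)  # written but not yet fsync'd
    standby.receive(master.streamable_upto())
    master.crash()
    standby.reconnect(master)  # fine: standby never saw unsynced WAL
```

If the standby were instead fed everything up to write_lsn before the
crash, reconnect() would raise, which is the case rule (2) is meant to
catch.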
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company