On Wed, Apr 10, 2013 at 7:44 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 04/10/2013 11:40 AM, Fujii Masao wrote:
>
>> Strange. If this is really true, shared disk failover solution is
>> fundamentally broken because the standby needs to start up with the
>> shared "corrupted" database at the failover.
>
>
> How so? Shared disk doesn't use replication. The point I was trying to make
> is that replication requires synchronization between two disparate servers,
> and verifying they have exactly the same data is a non-trivial exercise.
> Even a single transaction after a failover (effectively) negates the old
> server because there's no easy "catch up" mechanism yet.
>
> Even if this isn't necessarily true, it's the safest approach IMO.

We already rely on WAL-before-data to ensure correct recovery. What is
proposed here is to slightly redefine that rule so that WAL must be
replicated before it is considered flushed. This ensures that no data
page on disk contains changes that are missing from the WAL the slave
has. The machinery to do this is mostly there already: we already wait
for WAL flushes, and we know the write location on the slave. The
second requirement is that a failed node never starts up as master
again and that we don't trust any of its local WAL. This is actually
how Pacemaker clusters work; you would only need to amend the RA
(resource agent) to wipe the WAL and configure PostgreSQL with
restart_after_crash = false.
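
To make the redefined rule concrete, here is a minimal sketch in Python.
It is not PostgreSQL code; WalState, effective_flush_lsn and
may_write_page are hypothetical names used only for illustration, and
LSNs are modeled as plain integers. The idea is that the flush point
used for WAL-before-data becomes the smaller of the local flush
location and the write location reported by the slave.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    """Hypothetical model of the WAL positions the master tracks."""
    local_flush_lsn: int    # WAL durably flushed on the master
    standby_write_lsn: int  # WAL the slave reports as written

    def effective_flush_lsn(self) -> int:
        # WAL counts as flushed only once it is both on local disk
        # and written on the slave.
        return min(self.local_flush_lsn, self.standby_write_lsn)

    def may_write_page(self, page_lsn: int) -> bool:
        # WAL-before-data: a data page may go to disk only if the WAL
        # record that last modified it is already "flushed" under the
        # redefined rule.
        return page_lsn <= self.effective_flush_lsn()


state = WalState(local_flush_lsn=500, standby_write_lsn=300)
print(state.may_write_page(300))  # page covered by the slave's WAL
print(state.may_write_page(400))  # page ahead of the slave's WAL
```

With this rule a page whose LSN is 400 stays in memory until the slave
has written WAL up to at least 400, so the disk never gets ahead of
what the slave can replay.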

It would be very helpful in restoring HA capability after failover if
we didn't have to read through the whole database after a VM goes
down and is migrated with the shared disk onto a new host.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de