Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers

From Dimitri Fontaine
Subject Re: Synchronous Log Shipping Replication
Date
Msg-id 200809101107.52311.dfontaine@hi-media.com
In response to Re: Synchronous Log Shipping Replication  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Synchronous Log Shipping Replication  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Hi,

On Wednesday 10 September 2008, Heikki Linnakangas wrote:
> Sure. That's the fundamental problem with synchronous replication.
> That's why many people choose asynchronous replication instead. Clearly
> at some point you'll want to give up and continue without the slave, or
> kill the master and fail over to the slave. I'm wondering how that's
> different than the lag between master and server in asynchronous
> replication from the client's point of view.

As a future user of these new facilities, the difference from the client's
POV is simple: in normal operation, we want a strong guarantee that any
COMMIT has made it to both the master and the slave at commit time. No lag
whatsoever.

You're considering lag as an option in case of failure, but I don't see this
as acceptable when you need sync commit. In case of a network timeout, the
cluster is down. So you want to either continue servicing in degraded mode or
take the service down while you repair the cluster, but neither of those
choices can be transparent to the admins, I'd argue.

Of course, the main use case is high availability, which tends to mean you do
not have the option to stop service, and seems to dictate continuing in
degraded mode: the slave can't keep up (whatever the error domain), the master
is alone, so it "advertises" this to monitoring solutions and continues
servicing. And provide some way for the slave to "rejoin", maybe, too.
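
To make the degraded-mode idea concrete, here is a rough sketch in Python of
the transitions I have in mind. Every name in it (Mode, Master, slave_lost,
slave_rejoined, alerts) is made up for illustration; nothing like this exists
in PostgreSQL, and how a slave would really catch up before rejoining is not
modelled at all.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SYNC = "sync"          # every COMMIT waits for the slave
    DEGRADED = "degraded"  # master is alone, commits are local only


@dataclass
class Master:
    """Hypothetical model of the master's replication state."""
    mode: Mode = Mode.SYNC
    alerts: list = field(default_factory=list)  # stands in for monitoring

    def slave_lost(self, reason: str) -> None:
        """Slave can't keep up: keep serving, but advertise it."""
        if self.mode is Mode.SYNC:
            self.mode = Mode.DEGRADED
            self.alerts.append(f"replication degraded: {reason}")

    def slave_rejoined(self) -> None:
        """Slave is back in sync: return to synchronous commits."""
        if self.mode is Mode.DEGRADED:
            self.mode = Mode.SYNC
            self.alerts.append("replication restored")
```

The point of the alerts list is only that the switch is never silent: each
change of mode leaves exactly one trace for the admins.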

> I'm not sure I understand that paragraph. Who's the user? Do we need to
> expose some new information to the client so that it can do something?

Maybe with some GUCs to set the acceptable "timeout" for the WAL sync
process, and whether reaching the timeout is a warning or an error. With a
user-settable GUC we could even have replication-error-level transactions
running concurrently with non-critical ones...
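
As a toy illustration of those two settings, here is a Python sketch of how a
COMMIT could be decided. The parameter names (sync_timeout_s, on_timeout) and
the ReplicationTimeout exception are invented here and are not existing GUCs
or APIs; the slave's acknowledgement delay is simply passed in as a number,
and raising in the "error" case is only a placeholder.

```python
import warnings


class ReplicationTimeout(Exception):
    """Hypothetical error: slave did not confirm in time, level is 'error'."""


def commit(ack_delay_s: float, sync_timeout_s: float,
           on_timeout: str = "error") -> str:
    """Decide the outcome of a COMMIT.

    ack_delay_s    -- simulated time until the slave confirms the WAL
    sync_timeout_s -- hypothetical per-session timeout setting
    on_timeout     -- hypothetical per-session level: "warning" or "error"
    """
    if on_timeout not in ("warning", "error"):
        raise ValueError("on_timeout must be 'warning' or 'error'")
    if ack_delay_s <= sync_timeout_s:
        return "committed on master and slave"
    if on_timeout == "warning":
        # Non-critical transaction: proceed, but do not stay silent.
        warnings.warn("slave did not confirm within timeout")
        return "committed on master only"
    # Critical transaction: the client must see the failure.
    raise ReplicationTimeout("slave did not confirm within timeout")
```

Because both values are arguments of each call, a critical transaction
(on_timeout="error") and a non-critical one (on_timeout="warning") can run
side by side, which is what a user-settable GUC would allow.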

Now what to do exactly in case of error remains to be decided...

HTH, Regards,
--
dim
