Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Sync Rep: First Thoughts on Code
Date
Msg-id 1228244886.14591.45.camel@dell.linuxdev.us.dell.com
In response to Re: Sync Rep: First Thoughts on Code  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep: First Thoughts on Code  (Josh Berkus <josh@agliodbs.com>)
Re: Sync Rep: First Thoughts on Code  ("Fujii Masao" <masao.fujii@gmail.com>)
List pgsql-hackers
On Tue, 2008-12-02 at 13:09 +0000, Simon Riggs wrote:
> > Is it dangerous to abort the transaction with replication continued when
> > the timeout occurs? I think that the WAL consistency between two servers
> > might be broken. Because the WAL writing and sending are done concurrently,
> > and the backend might already write the WAL to disk on the primary when
> > waiting for walsender.
> 
> The issue I see is that we might want to keep wal_sender_delay small so
> that transaction times are not increased. But we also want
> wal_sender_delay high so that replication never breaks. It seems better
> to have the action on wal_sender_delay configurable if we have an
> unsteady network (like the internet). Marcus made some comments on line
> dropping that seem relevant here; we should listen to his experience.
> 
> Hmmm, dangerous? Well assuming we're linking commits with replication
> sends then it sounds it. We might end up committing to disk and then
> deciding to abort instead. But remember we don't remove the xid from
> procarray or mark the result in clog until the flush is over, so it is
> possible. But I think we should discuss this in more detail when the
> main patch is committed.
> 

What is the "it" in "it is possible"? It seems like there's still a
problem window in there.

Even if that could be made safe, in the event of a real network failure
you'd just wait the full timeout on every transaction, because the
primary still thinks it's replicating.

If the timeout is exceeded, it seems more reasonable to abandon the
slave until you can re-sync it, and to continue processing as normal in
the meantime. As you pointed out, re-syncing is not necessarily an
expensive operation because you can use something like rsync. The
process of re-syncing might be made easier (or perhaps less costly), of
course.
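
To put rough numbers on the difference, here is a toy model rather than
anything taken from the patch. The function and parameter names are made
up for illustration, and it assumes the standby never acknowledges
anything once the network has failed:

```python
def total_commit_wait(n_commits, timeout_s, abandon_on_timeout):
    """Seconds spent waiting on a dead standby across n_commits commits.

    Hypothetical model: the standby never acks, so every wait runs
    until the timeout expires.
    """
    waited = 0.0
    replicating = True
    for _ in range(n_commits):
        if not replicating:
            # Standby was already abandoned: commit locally, no wait.
            continue
        # The primary still believes it is replicating, so this commit
        # blocks for the full timeout before giving up.
        waited += timeout_s
        if abandon_on_timeout:
            # Stop waiting on the standby until it has been re-synced.
            replicating = False
    return waited


# 100 commits with a 0.5 s timeout
keep_waiting = total_commit_wait(100, 0.5, abandon_on_timeout=False)
abandon_once = total_commit_wait(100, 0.5, abandon_on_timeout=True)
```

In this model the first policy pays the timeout on all 100 commits, while
the second pays it only once.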

If we want to still allow processing to happen after a timeout, it seems
reasonable to have a configurable option to allow/disallow non-read-only
transactions when out of sync. 
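
A minimal sketch of what such an option might look like, purely as an
illustration. The class, the method names, and the
allow_writes_when_out_of_sync setting are all hypothetical; none of them
are existing PostgreSQL settings or part of the patch:

```python
class Primary:
    """Toy model of a primary that tracks whether its standby is in sync."""

    def __init__(self, allow_writes_when_out_of_sync):
        # Hypothetical configuration option, named here for illustration.
        self.allow_writes_when_out_of_sync = allow_writes_when_out_of_sync
        self.in_sync = True

    def on_replication_timeout(self):
        # The standby did not respond in time: treat it as abandoned.
        self.in_sync = False

    def on_resync_complete(self):
        # The standby has been re-synced (e.g. via something like rsync).
        self.in_sync = True

    def may_begin(self, read_only):
        """Return True if a transaction of this kind may start now."""
        if read_only or self.in_sync:
            return True
        # Out of sync and the transaction wants to write: defer to the option.
        return self.allow_writes_when_out_of_sync


strict = Primary(allow_writes_when_out_of_sync=False)
strict.on_replication_timeout()
# strict.may_begin(read_only=True) is still allowed;
# strict.may_begin(read_only=False) is refused until on_resync_complete().
```

With the option off, the primary degrades to read-only when the standby is
lost; with it on, processing continues and the standby has to catch up
later.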

Regards,
	Jeff Davis


