Thread: synchronous replication: blocking commit on the master
I have observed that, when synchronous replication is enabled, a commit on a replication master hangs if there are no slaves to communicate with. I believe I have seen a posting saying that this behavior is deliberate.
In my environment I'd prefer to have the master continue processing transactions if there is a failure at the slave. Questions:
- is there any way to allow the master to proceed if the slave is unavailable (perhaps a configuration parameter I'm missing)?
- if not, is there any ideological objection to allowing the master to continue for users that would prefer that behavior?
Thanks.
I have specific needs for wanting synchronous replication instead of asynchronous replication, notwithstanding my desire to continue processing work on the master if there are no active slaves. I would like to use replication for both HA and for query scaling. I'd like replication to be synchronous to ensure that any slaves are up to date, and I cannot afford even the small potential data loss implied by asynchronous replication. However, should there be a situation where no slaves are alive (e.g. there is a single slave and it fails for whatever reason), I do not want to compromise the availability of the master while the slave is being restored. Instead, I'd like to be able to continue processing transactions on the master unimpeded until a slave can be brought back online. Once a slave has caught back up to the master I'd like to switch back to synchronous replication and again be able to use the slave to scale reads and as a failover target should the master fail.
Does that make sense?
And thanks for the suggestion about switching individual transactions to synchronous_commit = local or off. That could be of use in achieving my goal, albeit clumsily and with intervention on the client. I suppose a client could detect that a slave is no longer available, then send an interrupt to all connections with pending work, and then set synchronous_commit = local on all connections before further work is submitted to the master.
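The client-side workaround described above could look roughly like this in Python. It is a minimal sketch, assuming each open session is an idle DB-API 2.0 connection with autocommit off (psycopg2's default), and that detecting the failed slave and interrupting blocked sessions are handled elsewhere; `relax_sync_commit` is a hypothetical helper name. As Fujii Masao points out later in the thread, changing synchronous_commit only affects commits issued after the change, not ones already waiting.

```python
def relax_sync_commit(connections, level="local"):
    """Switch every given session to a commit level that skips the standby wait.

    Hypothetical helper. Assumes idle DB-API 2.0 connections (e.g. psycopg2)
    with autocommit off. Commits already blocked are NOT released by this.
    """
    if level not in ("local", "off"):
        raise ValueError("level must be 'local' or 'off'")
    for conn in connections:
        cur = conn.cursor()
        try:
            # Session-level SET: applies to all later transactions on this connection.
            cur.execute("SET synchronous_commit TO " + level)
        finally:
            cur.close()
        # Commit so the SET is not undone by a later rollback of this transaction.
        conn.commit()
```

A session-level SET was chosen here because the goal is to change the behavior of every later transaction on those connections; the per-transaction alternative is SET LOCAL.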
Thanks.
From: Adrian Klaver <adrian.klaver@gmail.com>
To: pgsql-general@postgresql.org; Jameison Martin <jameisonb@yahoo.com>
Sent: Monday, February 27, 2012 5:52 PM
Subject: Re: [GENERAL] synchronous replication: blocking commit on the master
On Monday, February 27, 2012 4:36:26 pm Jameison Martin wrote:
> I have observed that a commit on a replication master hangs if there are no
> slaves to communicate with if synchronous replication is enabled. I
> believe I have seen a posting that this behavior is deliberate.
>
> In my environment I'd prefer to have the master continue processing
> transactions if there is a failure at the slave. Questions: * is there any
> way to allow the master to proceed if the slave is unavailable (perhaps a
> configuration parameter I'm missing)?
http://www.postgresql.org/docs/9.1/interactive/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT
"
If synchronous_standby_names is set, this parameter also controls whether or not
transaction commit will wait for the transaction's WAL records to be flushed to
disk and replicated to the standby server. The commit wait will last until a
reply from the current synchronous standby indicates it has written the commit
record of the transaction to durable storage. If synchronous replication is in
use, it will normally be sensible either to wait both for WAL records to reach
both the local and remote disks, or to allow the transaction to commit
asynchronously. However, the special value local is available for transactions
that wish to wait for local flush to disk, but not synchronous replication.
"
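The interaction described in the quoted documentation can be summarized as a small decision function. This is a simplified model, assuming only the three values discussed in this thread (on, local and off) and ignoring every other setting; `commit_waits_for` is a hypothetical name, not a PostgreSQL API.

```python
def commit_waits_for(synchronous_commit, synchronous_standby_names):
    """Return the set of conditions a COMMIT waits for before reporting success.

    Simplified model of the documented behavior:
      off   -> waits for nothing (asynchronous commit)
      local -> waits only for the local WAL flush
      on    -> local flush, plus the synchronous standby's reply when
               synchronous_standby_names is non-empty
    """
    if synchronous_commit == "off":
        return set()
    if synchronous_commit == "local":
        return {"local_flush"}
    if synchronous_commit == "on":
        if synchronous_standby_names.strip():
            return {"local_flush", "standby_flush"}
        return {"local_flush"}
    raise ValueError("unsupported value: %r" % synchronous_commit)
```

With on and a non-empty synchronous_standby_names but no standby connected, the standby_flush condition is never satisfied, which matches the hang described at the start of the thread.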
>
> * if not, is there any ideological objection to allowing the master to
> continue for users that would prefer that behavior? Thanks.
Seems to me to defeat the purpose of sync replication. Though if you want,
besides the above:
"
Even when synchronous replication is enabled, individual transactions can be
configured not to wait for replication by setting the synchronous_commit
parameter to local or off.
"
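For the per-transaction form mentioned in the quoted documentation, a transaction can lower its own commit level with SET LOCAL, which reverts automatically when the transaction ends. A minimal Python sketch, assuming a DB-API 2.0 connection with autocommit off (psycopg2's default); `run_without_sync_wait` is a hypothetical helper name.

```python
def run_without_sync_wait(conn, statements, level="local"):
    """Run (sql, params) pairs in one transaction whose commit skips the standby wait.

    Hypothetical helper. SET LOCAL limits the change to this transaction, so
    later transactions on the same connection use the session's setting again.
    """
    if level not in ("local", "off"):
        raise ValueError("level must be 'local' or 'off'")
    cur = conn.cursor()
    try:
        cur.execute("SET LOCAL synchronous_commit TO " + level)
        for sql, params in statements:
            cur.execute(sql, params)
        conn.commit()
    except Exception:
        # Undo the whole transaction, including the SET LOCAL.
        conn.rollback()
        raise
    finally:
        cur.close()
```

This keeps the relaxed guarantee confined to transactions that explicitly opt in, rather than changing whole sessions.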
Though it looks like you are really looking for async streaming replication.
--
Adrian Klaver
adrian.klaver@gmail.com
On Tuesday, February 28, 2012 10:22:14 am Jameison Martin wrote:
>
> i hope that clears it up.

Yes, but before you roll your own you may want to take a look at what's already
out there:

A survey of what is out there:
http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

Some that I know of:

repmgr
http://repmgr.org/
http://groups.google.com/group/repmgr/browse_thread/thread/1439dfebf890e999?pli=1

pgpool-II
http://www.pgpool.net/mediawiki/index.php/Main_Page

>
> thanks.
>
--
Adrian Klaver
adrian.klaver@gmail.com
i don't think i've explained things very clearly. the implied contradiction is that i'd be using asynchronous replication to catch up a slave after a slave failure and thus i'm losing the transactional consistency that i suggest i need. if a slave fails and is brought back online i am indeed proposing that it catch up with the master asynchronously; however, the slave wouldn't be promoted to a hot standby until it is completely caught up and could be reestablished as a synchronous replica (at least that is what i'd like to do in theory). so i'm proposing that a slave would never be a candidate for an HA failover unless it is completely in sync with a master: if there is no slave that is in sync with the master at the time the master fails, then the master would have to be recovered from the filesystem via traditional recovery. the fact that i envision 'catching up' a slave to a master using asynchronous replication is not particularly relevant to the transactional guarantees of the system as a whole if the slave is effectively unavailable while catching up.
similarly, any slave that isn't caught up to its master would also not be eligible for queries.
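The rule that a slave only becomes a failover or query candidate once it is fully caught up boils down to comparing WAL positions. A minimal Python sketch, assuming positions in the textual X/Y hexadecimal form that PostgreSQL 9.1 reports (for example from pg_current_xlog_location() on the master and the flush_location column of pg_stat_replication); `parse_wal_position` and `is_caught_up` are hypothetical names.

```python
def parse_wal_position(text):
    """Convert a WAL position such as '0/3000060' into a comparable integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_caught_up(master_position, standby_flush_position):
    """True if the standby has flushed at least everything the master has written.

    Hypothetical eligibility check: only a standby that passes it would be
    offered as a failover target or used for read queries.
    """
    return parse_wal_position(standby_flush_position) >= parse_wal_position(master_position)
```

Because the master keeps generating WAL, a positive result is only a snapshot of one moment.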
i can understand why the master might hang when there is no reachable replica during synchronous commit; this is exactly the right thing to do if you want to guarantee that you have at least 2 distinct spheres of durability. but i'd prefer to sacrifice the extra durability guarantee in favor of availability in this case, given that recovery from the file system is still an option should the master subsequently fail. my availability issue is that the master would clearly be hung/unavailable for an unbounded amount of time, with no strong guarantee about how long it might take to bring a replica back up, which is not acceptable in my case.
if the master hangs commits because there is no active slave, i believe that an administrator would have to:
- detect that there are no active slaves
- shut the master down
- disable synchronous replication
- bring the master back up
or, alternatively:
- detect that there are no active slaves
- interrupt any connections that are blocking on commit
- set synchronous_commit = local or off on all connections, effectively disabling synchronous replication going forward
i envision some kind of timeout after which the slave is removed from the master's synchronous replica set. and of course i'd need to work out the mechanics of bringing the slave back up to sync with the master and adding it back to the replica set, which would clearly require some additional machinery.
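The timeout idea could be prototyped as a small policy function that a monitoring process runs periodically. This is a hypothetical sketch, not an existing PostgreSQL feature: it assumes the monitor records when each standby was last seen connected (for instance by polling pg_stat_replication) and only decides which names should remain in synchronous_standby_names; applying the result is left to the caller. `standbys_to_keep` and `format_standby_names` are hypothetical names.

```python
def standbys_to_keep(configured, last_seen, now, timeout_seconds):
    """Return the standby names that should stay in the synchronous set.

    configured: names currently listed in synchronous_standby_names.
    last_seen:  mapping of standby name -> time (in seconds) it was last seen.
    A standby never seen, or not seen within timeout_seconds, is dropped.
    """
    keep = []
    for name in configured:
        seen = last_seen.get(name)
        if seen is not None and now - seen <= timeout_seconds:
            keep.append(name)
    return keep


def format_standby_names(names):
    """Render a list of names as a synchronous_standby_names value."""
    return ", ".join(names)
```

Treating a standby that has never been seen as expired is a deliberate choice in this sketch; a real monitor might give newly configured standbys a grace period instead. An empty result corresponds to emptying synchronous_standby_names.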
i hope that clears it up.
thanks.
From: Adrian Klaver <adrian.klaver@gmail.com>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Tuesday, February 28, 2012 7:32 AM
Subject: Re: [GENERAL] synchronous replication: blocking commit on the master
On Monday, February 27, 2012 10:21:24 pm Jameison Martin wrote:
> I have specific needs for wanting synchronous replication instead of
> asynchronous replication, notwithstanding my desire to continue processing
> work on the master if there are no active slaves. I would like to use
> replication for both HA and for query scaling. I'd like replication to be
> synchronous to ensure that any slaves are up to date, and I cannot afford
> even the small potential data loss implied by asynchronous replication.
> However, should there be a situation where no slaves are alive (e.g.
> there is a single slave and it fails for whatever reason), I do not want
> to compromise the availability of the master while the slave is being
> restored. Instead, I'd like to be able to continue processing transactions
> on the master unimpeded until a slave can be brought back online. Once a
> slave is caught back up to the master I'd like to switch back to
> synchronous replication and again be able to use the slave to scale reads
> and as a failover target should the master fail.
>
> Does that make sense?
No, not really :)
The two statements below seem to be at odds with each other:
"I'd like replication to be synchronous to ensure that any slaves are up to
date, and I cannot afford even the small potential data loss implied by
asynchronous replication."
"Instead, I'd like to be able to continue processing transactions on the master
unimpeded until a slave can be brought back online."
It seems you want async sync replication and, under the observation that a chain
is only as strong as its weakest link, you are really getting async replication.
That being said, it is your setup and you have the options to have it run the
way you want.
--
Adrian Klaver
adrian.klaver@gmail.com
On Wed, Feb 29, 2012 at 3:22 AM, Jameison Martin <jameisonb@yahoo.com> wrote:
> if the master hangs commits because there is no active slave, i believe that
> an administrator would have to
>
> detect that there are no active slaves
> shut the master down
> disable synchronous replication
> bring the master back up

You don't need to restart the server when you disable sync replication.
You can do that by emptying synchronous_standby_names in postgresql.conf
and reloading it (i.e., pg_ctl reload).

BTW, though you can disable sync replication by setting synchronous_commit
to local in postgresql.conf, you should use synchronous_standby_names for
that purpose instead. Setting synchronous_commit to local can prevent new
transactions (which are executed after setting synchronous_commit to local)
from being blocked, but cannot resume the already-blocking transactions.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
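The reload-based procedure Fujii Masao describes can be sketched in Python as a text edit of postgresql.conf followed by a reload. A minimal sketch, assuming the caller reads and writes the file, that only uncommented assignments matter, and that pg_ctl is on the PATH; `disable_sync_standbys` and `reload_command` are hypothetical helper names.

```python
import re

# Matches an uncommented synchronous_standby_names assignment (with any trailing comment).
_SETTING = re.compile(r"^([ \t]*)synchronous_standby_names[ \t]*=.*$", re.MULTILINE)


def disable_sync_standbys(conf_text):
    """Return conf_text with synchronous_standby_names set to ''.

    Uncommented assignments are rewritten in place; if none exists,
    a new assignment is appended at the end of the file.
    """
    new_text, count = _SETTING.subn(r"\1synchronous_standby_names = ''", conf_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += "synchronous_standby_names = ''\n"
    return new_text


def reload_command(data_dir):
    """Argument list asking the server in data_dir to re-read its configuration."""
    return ["pg_ctl", "reload", "-D", data_dir]
```

After writing the edited text back, the command could be run with `subprocess.run(reload_command(data_dir), check=True)`. According to the reply above, this route, unlike changing synchronous_commit, also releases transactions that are already blocked on commit.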