Thread: [GENERAL] Replication slot and pg_rewind
We are using the replication slot and pg_rewind features of PostgreSQL 9.6.
Our cluster consists of 1 master and 1 slave node.
The replication slot feature allows the master to keep as much WAL as is required by the slave.
The pg_rewind command uses WALs to bring the slave in sync with the master.
By using replication slots there is always enough WAL in pg_xlog.
In this case is it safe to use pg_rewind without WAL archiving?
Can there be a situation where pg_rewind fails?
Regards, Subhro
On Tue, Jun 6, 2017 at 12:03 PM, Bhattacharyya, Subhro <s.bhattacharyya@sap.com> wrote:
> We are using the replication slot and pg_rewind feature of postgresql 9.6
> Our cluster consists of 1 master and 1 slave node.
>
> The replication slot feature allows the master to keep as much WAL as is
> required by the slave.
>
> The pg_rewind command uses WALs to bring the slave in sync with the master.
> By using replication slots there are always enough WAL in the pg_xlog.
>
> In this case is it safe to use pg_rewind without WAL archiving?
> Can there be a situation where pg_rewind fails?

When pg_rewind runs it looks at the WAL from the last checkpoint before WAL diverged on the *target* node, not the source. So retaining the WAL data on the primary after the standby has been promoted makes little sense from this point of view.

Even worse, once the promoted standby decides to recycle the past WAL segments you won't be able to do a rewind of the previous primary because there is no way to know what are the blocks modified on the standby since the point of divergence.

--
Michael
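As a rough illustration of the roles described above (the old primary is the *target*, the promoted standby is the *source*), a pg_rewind invocation could be assembled as in the sketch below. The data directory path and connection string are hypothetical placeholders, and the helper name `build_pg_rewind_command` is made up for this example; only the pg_rewind options themselves (`--target-pgdata`, `--source-server`, `--progress`, `--dry-run`) come from the tool.

```python
import shlex


def build_pg_rewind_command(target_pgdata, source_conninfo, dry_run=True):
    """Build the argument list for running pg_rewind on the old primary.

    target_pgdata   -- data directory of the node being rewound (the target)
    source_conninfo -- libpq connection string of the promoted standby (the source)
    dry_run         -- when True, pg_rewind only reports what it would do
    """
    cmd = [
        "pg_rewind",
        "--target-pgdata=" + target_pgdata,
        "--source-server=" + source_conninfo,
        "--progress",
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Hypothetical path and host name, for illustration only.
    cmd = build_pg_rewind_command(
        "/path/to/old_primary/data",
        "host=new-primary port=5432 user=postgres dbname=postgres",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The target server has to be shut down cleanly before the command is run, and pg_rewind only works if the target was running with wal_log_hints enabled or with data checksums, so a dry run first is a cheap way to find out whether the WAL needed on the target is still there.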
Our cluster works as follows:

We do not promote the slave while the primary is up. During an update scenario, when the master goes down, the slave is promoted to master only if there is no replication lag. As a result, there is no data difference between the two nodes up to that point.

Transactions then continue on the newly promoted master, creating a difference in data between the two nodes.

When the original master comes back as a slave after the update, we use pg_rewind instead of taking a pg_basebackup.

Our expectation is that the slave will be able to sync with the new master with the help of whatever WAL is present on the new master due to replication slots.

Can pg_rewind still work without WAL archiving in this scenario?

Thanks,
Subhro

-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Sent: Tuesday, June 6, 2017 8:50 AM
To: Bhattacharyya, Subhro <s.bhattacharyya@sap.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Replication slot and pg_rewind
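The promotion rule in Subhro's message (promote the slave only if there is no replication lag) comes down to comparing two WAL positions. A minimal sketch of that comparison follows, assuming the positions are available as LSN strings in the usual `X/Y` hexadecimal form, for example the last position known for the master and the result of `pg_last_xlog_replay_location()` on the slave in 9.6. The function names are hypothetical, and how the master's last position is obtained once it is down is left open.

```python
def lsn_to_int(lsn):
    """Convert an LSN string such as '0/3000060' to a 64-bit integer.

    The part before the slash is the high 32 bits, the part after it
    the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(master_lsn, slave_replay_lsn):
    """Number of WAL bytes the slave has not replayed yet."""
    return lsn_to_int(master_lsn) - lsn_to_int(slave_replay_lsn)


def safe_to_promote(master_lsn, slave_replay_lsn):
    """Apply the 'no replication lag' rule: promote only at zero lag."""
    return replay_lag_bytes(master_lsn, slave_replay_lsn) == 0


if __name__ == "__main__":
    # Example positions, made up for illustration.
    print(replay_lag_bytes("0/3000060", "0/3000000"))
    print(safe_to_promote("0/3000060", "0/3000060"))
```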
On Tue, Jun 6, 2017 at 1:52 PM, Bhattacharyya, Subhro <s.bhattacharyya@sap.com> wrote:
> Our expectation is that slave will be able to sync with the new master with the help of whatever WALs are present in the new master due to replication slots.
> Can pg_rewind still work without WAL archiving in this scenario.

I see. Yes, the slot on the old primary would keep retaining WAL, and the promoted standby would stop sending feedback once it has switched to a new timeline, so that should work. Don't forget to drop the slot on the old primary after pg_rewind has been run; you don't want to bloat its pg_xlog with useless data.

--
Michael
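For the cleanup step Michael mentions, the statements involved are short. The sketch below only builds the SQL text, so it runs without a database. The slot name `standby_slot` and the helper names are hypothetical, while `pg_replication_slots` and `pg_drop_replication_slot()` are the catalog view and function PostgreSQL provides for this.

```python
import re

# Slot names may only contain lower-case letters, digits and underscores.
_SLOT_NAME_RE = re.compile(r"^[a-z0-9_]+$")


def slot_status_sql():
    """SQL listing each slot, whether it is in use, and the WAL it retains from."""
    return "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"


def drop_slot_sql(slot_name):
    """SQL dropping one replication slot, after validating its name."""
    if not _SLOT_NAME_RE.match(slot_name):
        raise ValueError("invalid replication slot name: %r" % (slot_name,))
    return "SELECT pg_drop_replication_slot('%s');" % slot_name


if __name__ == "__main__":
    print(slot_status_sql())
    # 'standby_slot' is a made-up name for illustration.
    print(drop_slot_sql("standby_slot"))
```

Checking the `active` column first is worthwhile because `pg_drop_replication_slot()` refuses to drop a slot that is still in use by a connection.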