Thread: Streaming replication, master recycling

Streaming replication, master recycling

From

Date:

11 May 2016, 04:31:22

Hi All,

We are currently using streaming replication on multiple node pairs. We are seeing some issues, but I am mainly interested in clarification.

When a failover occurs, we touch the trigger file, promoting the previous slave to master. That works perfectly.

For recycling the previous master, we create a recovery.conf (with recovery_target_timeline = 'latest') and *try* to start up. If PostgreSQL starts up, we accept it as a new slave. If it does not, we proceed with a full base backup.

This approach seems to work, but I have found indications that it can lead to database corruption: http://hlinnaka.iki.fi/presentations/NordicPGDay2015-pg_rewind.pdf

I mainly want to understand whether, and why, this approach is a bad idea.

Thanks,

Fredrik Huitfeldt
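
A minimal sketch of the recovery.conf described in the message above, written as a small Python helper that renders the file contents. The helper name, host, port, user and trigger file path are illustrative assumptions and are not taken from the thread; only recovery_target_timeline = 'latest' is stated by the poster. Note that a successful startup with this file is the very check the thread questions, so the sketch shows the configuration only, not a safety test.

```python
def render_recovery_conf(primary_host, primary_port=5432, user="replicator",
                         trigger_file=None):
    """Return recovery.conf text for a node that should follow primary_host.

    All argument values are placeholders chosen for illustration.
    """
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = 'host={} port={} user={}'".format(
            primary_host, primary_port, user),
        # The only setting the original post mentions explicitly.
        "recovery_target_timeline = 'latest'",
    ]
    if trigger_file:
        # Touching this file later promotes the node, as in the failover step.
        lines.append("trigger_file = '{}'".format(trigger_file))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_recovery_conf("10.0.0.2", trigger_file="/tmp/pg_failover"))
```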

Re: Streaming replication, master recycling

From

Venkata Balaji N

Date:

11 May 2016, 10:47:18

On Wed, May 11, 2016 at 2:31 PM, <fredrik@huitfeldt.com> wrote:

> Hi All,
>
> We are currently using streaming replication on multiple node pairs. We are seeing some issues, but I am mainly interested in clarification.
>
> When a failover occurs, we touch the trigger file, promoting the previous slave to master. That works perfectly.
>
> For recycling the previous master, we create a recovery.conf (with recovery_target_timeline = 'latest') and *try* to start up. If postgresql starts up, we accept it as a new slave. If it does not, we proceed with a full basebackup.

Which version of PostgreSQL are you using?

You need to shut down the master first, then promote the slave, and only then reverse the roles, but this can be clarified only once you let us know the PostgreSQL version. This is quite tricky in 9.2.x and from 9.3.x.

Regards,

Venkata B N

Fujitsu Australia

Re: Streaming replication, master recycling

From

Date:

11 May 2016, 11:04:33

I apologise for the missing data.

We are running 9.1.15 on Debian servers.

When we promote the old slave, it seems to go fine. Are you saying that it will cause issues down the line if the previous master is not shut down before promoting?

I was actually more concerned with the fact that we (sometimes) recycle the old master without doing a full base backup. Again, this seems to work, but this presentation seems to indicate that this can cause problems (while seeming to work): http://hlinnaka.iki.fi/presentations/NordicPGDay2015-pg_rewind.pdf

The note is on page 14, under the headline: "Naive approach".

Thank you for your support,

Fredrik

Re: Streaming replication, master recycling

From

Sameer Kumar

Date:

11 May 2016, 11:37:11

On Wed, May 11, 2016 at 4:35 PM <fredrik@huitfeldt.com> wrote:

> I apologise for the missing data.
>
> We are running 9.1.15 on Debian servers.

I think there was a patch in v9.3 which makes sure that, if the master is shut down properly (smart or fast mode), pending WAL is replicated before it shuts down. Also, timeline switches have been written to the WAL since v9.3.

So I don't see a reason why a proper switchover, with a fast shutdown of the master and promotion of the standby, would cause trouble with v9.3 or later.

Of course I may be wrong (and naive!), and this does not apply to your case.

> When we promote the old slave, it seems to go fine. Are you saying that it will cause issues down the line if the previous master is not shut down before promoting?

You might want to share the recovery.conf from your standby node and the recovery.conf you create on the lost node (old master) when adding it back as a standby.

> I was actually more concerned with the fact that we (sometimes) recycle the old master without doing a full base backup.

I have done this with v9.2 and v9.3 and it seems to work fine, as long as you have not missed any transactions from the master (controlled switchover). If the master went down before it could replicate the last committed transaction, I don't think the lost node (old master) will be able to join the new timeline of the standby, so your replication would not work (even though the node has started up).

Best Regards

Sameer Kumar | DB Solution Architect

ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
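
The condition raised in the reply above (an old master that wrote WAL the standby never received) can be expressed as a comparison of two WAL locations. The following is a sketch under stated assumptions, not a procedure confirmed by the thread: both function names are hypothetical, the locations are assumed to be in the usual 'X/Y' hexadecimal form, and how the two values are obtained from the old and new master is deliberately left open.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/3000090' into a comparable integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def old_master_diverged(old_master_end, switch_point):
    """Return True if the old master wrote WAL past the promotion point.

    old_master_end: last WAL location written by the old master (assumed input).
    switch_point:   location at which the standby was promoted (assumed input).
    """
    return parse_lsn(old_master_end) > parse_lsn(switch_point)


if __name__ == "__main__":
    # Old master stopped exactly at the switch point: no divergence.
    print(old_master_diverged("0/3000090", "0/3000090"))
    # Old master wrote further WAL that the standby never saw: diverged.
    print(old_master_diverged("0/3000100", "0/3000090"))
```

If the function returns True, the old master holds changes that do not exist on the new timeline, which is the case where a node that merely starts up cannot be trusted as a standby.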

Re: Streaming replication, master recycling

From

Venkata Balaji N

Date:

14 May 2016, 07:38:55

On Wed, May 11, 2016 at 9:04 PM, <fredrik@huitfeldt.com> wrote:

> I apologise for the missing data.
>
> We are running 9.1.15 on Debian servers.

It is possible to make the old master a standby if you promoted the standby after cleanly shutting down the master. I tested this in 9.2.x and later versions. This is a manual process in 9.2; I think it is the same in 9.1.x as well.

The process is -

- Ensure master and standby are in sync before switching over

- Promote standby

- build recovery.conf at old master with the parameter recovery_target_timeline set to 'latest'

- When you start the old master, it will ask for timeline history files which you need to manually transfer from new master

- The old master should then come up as a standby

Hope this helps and works in 9.1.x

Regards,

Venkata B N

Fujitsu Australia
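
The manual transfer of timeline history files in the process above can be prepared by comparing directory listings from both nodes. This is a minimal sketch: the function name is hypothetical, and it assumes the two arguments are plain file-name listings of each node's WAL directory (pg_xlog in these versions), where history files carry the .history suffix. Copying the files is left to whatever transfer method is already in use.

```python
def missing_history_files(new_master_files, old_master_files):
    """Return timeline history files present on the new master only.

    Both arguments are assumed to be lists of file names taken from each
    node's WAL directory.
    """
    have = set(old_master_files)
    return sorted(
        name for name in new_master_files
        if name.endswith(".history") and name not in have
    )


if __name__ == "__main__":
    new_master = ["000000010000000000000003", "00000002.history"]
    old_master = ["000000010000000000000003"]
    for name in missing_history_files(new_master, old_master):
        print("copy to old master:", name)
```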

Re: Streaming replication, master recycling

From

Venkata Balaji N

Date:

15 May 2016, 08:33:47

I have tested this, and the procedure works in 9.1.x too. So you will need to shut down the master first to ensure the roles are reversed successfully.

Regards,

Venkata B N

Fujitsu Australia