Thread: Another streaming replication question

Another streaming replication question

From
Mark Steben
Date:
Good morning,

Thank you all for your responses to my questions about streaming/cascading replication.  I have them ironed out and running on three postgres 9.4 sandbox servers. 
My question is not 'how to ' but more 'best practices'.  We currently have hot standby non-streaming replication with one master serving two standbys in parallel. We wish to convert
to streaming replication.  My concern is that the logshipping load can be quite high - up to 40 16MB logs per minute (during off-hours maintenance). In our current hot standby
environment the logs can queue up on the master side during this period - eventually, during times of lower load they do catch up.  So, a couple questions:
   1.  Can streaming replication work in this high-load situation?
   2.  What, if anything, can I do to make it work better?  Perhaps convert to cascade? Master -->standby1 ---> standby2

I have attached the postgres.conf file for our master for reference.

Any insights welcome.   Thank you.



--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com






Attachment

Re: Another streaming replication question

From
Johannes Truschnigg
Date:
Hi Mark,

On Tue, Nov 06, 2018 at 12:13:18PM -0500, Mark Steben wrote:
> Good morning,
>
> Thank you all for your responses to my questions about streaming/cascading
> replication.  I have them ironed out and running on three postgres 9.4
> sandbox servers.
> My question is not 'how to ' but more 'best practices'.  We currently have
> hot standby non-streaming replication with one master serving two standbys
> in parallel. We wish to convert
> to streaming replication.  My concern is that the logshipping load can be
> quite high - up to 40 16MB logs per minute (during off-hours maintenance).

We use streaming replication ("plain" TCP over a 10G Ethernet link) and have
double-digit WAL segments per second at times - it's no problem for moderately
powerful, contemporary hardware. Using TLS would be fine, too, if you use a
fast stream cipher.


> In our current hot standby
> environment the logs can queue up on the master side during this period -
> eventually, during times of lower load they do catch up.  So, a couple
> questions:
>    1.  Can streaming replication work in this high-load situation?

Sure (depending on your hardware/environment and your current load, of
course).

However, even if it doesn't, it won't make things worse in case you still
archive WAL and/or set up a replication slot for your standby(s) - a secondary
will transparently switch back to applying WAL from the archive whenever
streaming replication is interrupted (e.g. due to too much lag having
accumulated, and too little WAL kept around on the master - in the absence of
replication slots, this can happen), and switch back to streaming once it has
done so (and streaming is available).


>    2.  What, if anything, can I do to make it work better?  Perhaps convert
> to cascade? Master -->standby1 ---> standby2

I guess it depends on what your (perceived?) bottleneck lies - is it your
network's bandwith, your nodes' I/O capabilities, or maybe even mere CPU
saturation that makes your setup work "not good enough" (you should qantify
that, and think hard about how, and by how much, you want to improve)?

The latter would be the hardest to solve (i.e. "get more powerful CPUs", or
maybe "review your current CPUs' max. turbo frequency/power envelope"), whilst
you could - if you were to upgrade to 9.5 or better - use WAL compression to
potentially alleviate the former with a trivial config change.


--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.

Attachment

Re: Another streaming replication question

From
Rui DeSousa
Date:


On Nov 6, 2018, at 12:13 PM, Mark Steben <mark.steben@drivedominion.com> wrote:

   1.  Can streaming replication work in this high-load situation?

Yes, given you have low latency and the bandwidth to support it.  

Need to plan for cases when streaming does fall behind too; don’t want streaming to just break.

Some options:

1. Replication slot on the upstream sever
2. Allow streaming to fallback to restoring WALs from archive
3. Both of the above options


   2.  What, if anything, can I do to make it work better?  Perhaps convert to cascade? Master -->standby1 ---> standby2

That would help in managing you bandwidth between nodes.


There is a lot to consider and it really depends on your use cases and your application/queries.

A few rhetorical questions: 

What is the replicas being used for?  
What an acceptable replication lag?
What an acceptable apply lag?
Should queries be killed to apply replica transactions?
What’s your vacuum strategy as that impacts replicas?

These two parameters I find invaluable (their validity depends on your use case). 

hot_standby_feedback = true
max_standby_streaming_delay = 30s

The other big thing is query performance and I/O as each has a large impact on replication lag.  So plan for tuning.

Re: Another streaming replication question

From
Mark Steben
Date:
Thank you Johannes for your feedback.  It will be very useful.


On Tue, Nov 6, 2018 at 4:08 PM Johannes Truschnigg <johannes@truschnigg.info> wrote:
Hi Mark,

On Tue, Nov 06, 2018 at 12:13:18PM -0500, Mark Steben wrote:
> Good morning,
>
> Thank you all for your responses to my questions about streaming/cascading
> replication.  I have them ironed out and running on three postgres 9.4
> sandbox servers.
> My question is not 'how to ' but more 'best practices'.  We currently have
> hot standby non-streaming replication with one master serving two standbys
> in parallel. We wish to convert
> to streaming replication.  My concern is that the logshipping load can be
> quite high - up to 40 16MB logs per minute (during off-hours maintenance).

We use streaming replication ("plain" TCP over a 10G Ethernet link) and have
double-digit WAL segments per second at times - it's no problem for moderately
powerful, contemporary hardware. Using TLS would be fine, too, if you use a
fast stream cipher.


> In our current hot standby
> environment the logs can queue up on the master side during this period -
> eventually, during times of lower load they do catch up.  So, a couple
> questions:
>    1.  Can streaming replication work in this high-load situation?

Sure (depending on your hardware/environment and your current load, of
course).

However, even if it doesn't, it won't make things worse in case you still
archive WAL and/or set up a replication slot for your standby(s) - a secondary
will transparently switch back to applying WAL from the archive whenever
streaming replication is interrupted (e.g. due to too much lag having
accumulated, and too little WAL kept around on the master - in the absence of
replication slots, this can happen), and switch back to streaming once it has
done so (and streaming is available).


>    2.  What, if anything, can I do to make it work better?  Perhaps convert
> to cascade? Master -->standby1 ---> standby2

I guess it depends on what your (perceived?) bottleneck lies - is it your
network's bandwith, your nodes' I/O capabilities, or maybe even mere CPU
saturation that makes your setup work "not good enough" (you should qantify
that, and think hard about how, and by how much, you want to improve)?

The latter would be the hardest to solve (i.e. "get more powerful CPUs", or
maybe "review your current CPUs' max. turbo frequency/power envelope"), whilst
you could - if you were to upgrade to 9.5 or better - use WAL compression to
potentially alleviate the former with a trivial config change.


--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.


--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com






Re: Another streaming replication question

From
Mark Steben
Date:
Thank you Rui for your feedback.  It will be very valuable in my ultimate solution I'm sure. 

On Tue, Nov 6, 2018 at 4:09 PM Rui DeSousa <rui@crazybean.net> wrote:


On Nov 6, 2018, at 12:13 PM, Mark Steben <mark.steben@drivedominion.com> wrote:

   1.  Can streaming replication work in this high-load situation?

Yes, given you have low latency and the bandwidth to support it.  

Need to plan for cases when streaming does fall behind too; don’t want streaming to just break.

Some options:

1. Replication slot on the upstream sever
2. Allow streaming to fallback to restoring WALs from archive
3. Both of the above options


   2.  What, if anything, can I do to make it work better?  Perhaps convert to cascade? Master -->standby1 ---> standby2

That would help in managing you bandwidth between nodes.


There is a lot to consider and it really depends on your use cases and your application/queries.

A few rhetorical questions: 

What is the replicas being used for?  
What an acceptable replication lag?
What an acceptable apply lag?
Should queries be killed to apply replica transactions?
What’s your vacuum strategy as that impacts replicas?

These two parameters I find invaluable (their validity depends on your use case). 

hot_standby_feedback = true
max_standby_streaming_delay = 30s

The other big thing is query performance and I/O as each has a large impact on replication lag.  So plan for tuning.


--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com