Re: Hot standby read slaves exceed max delay on WAL segment. Replication lag. - Mailing list pgsql-general

From Venkata Balaji Nagothi
Subject Re: Hot standby read slaves exceed max delay on WAL segment. Replication lag.
Date
Msg-id CAHBAh5uBN0FUb0nmK8BA2XZUD0mb-v-7e9-vMxUSsA9TF2V=HA@mail.gmail.com
Whole thread Raw
In response to Hot standby read slaves exceed max delay on WAL segment. Replication lag.  (Shaun Duncan <shaun.duncan@gmail.com>)
Responses Re: Hot standby read slaves exceed max delay on WAL segment. Replication lag.  (Shaun Duncan <shaun.duncan@gmail.com>)
List pgsql-general
On Thu, Mar 20, 2014 at 8:19 AM, Shaun Duncan <shaun.duncan@gmail.com> wrote:


On Mar 19, 2014 5:08 PM, "Venkata Balaji Nagothi" <vbnpgc@gmail.com> wrote:
>
>
> On Thu, Mar 20, 2014 at 4:12 AM, Shaun Duncan <shaun.duncan@gmail.com> wrote:
>
>> This is on a production 9.0.15 install with 1 master, 5 hot standby read only slaves.
>>
>> We've been trying to look into a situation where we're seeing that hot standby read slaves are receiving WAL segments, but are exceeding max_standby_archive_delay (60s) and max_standby_streaming_delay (60s) and not applying changes. The slave will get the first segment and hang (we've seen up to 30m before removing the slave from our read pool to catch up) and get further and further behind the master. Furthermore, we have seen that after a slave has caught up, putting it back into the read pool will mean will almost immediately start to see this happen again. It acts as if we had max_standby_* set to -1.
>>
>> I'm just looking for some ideas, hints, or suggestions as to what might be going on here or what we might be doing wrong.
>>
>
> Have you noticed if the read requests on standby database are taking long and loading up the server ? If yes, any idea why the queries on standby are taking long to complete ? 
>
> WAL replay would be paused ( until the read requests are served ) at the time of conflicting queries on the standby site.
>
> Also, Do you see any delay in master sending the WALs to standby ?
>
>
> Venkata Balaji N
>
> Sr. Database Administrator
> Fujitsu Australia
>
>
>
>

Yes we have seen queries against the read slave degrade severely while this happens. We have at least optimized some inefficient queries. But I would think (correct me if I am wrong) that queries would be canceled after WAL replay has exceeded the max delay. This is something we haven't seen. It behaves as if it is waiting for reads to complete.

 
No. Generally queries continue to run on Standby without any issues, except in the below 2 situations -

  • When conflicting queries (like DROP TABLE, DROP DATABASE etc..)  are executed on master whilst standby is serving the read requests. This behavior can be controlled to some extent by setting max_standby_*_delay parameters (60s in your case). If there are any read requests in process at standby database,the conflicting queries would wait for 60s - if the read requests (Generally SELECTs) do not complete, then they are forcibly cancelled
           In-short max_standby_*_delay parameters only effect the replication behavior when conflicting queries are coming to standby. 
  • When master is waiting to apply VACUUM related changes (removing dead rows) at the standby site. This behavior can be controlled to some extent by "vacuum_defer_cleanup_age". This implies number of transactions executed on the master database.

If there are no conflicting queries from primary waiting to be applied at standby database - then queries would continue to run without any issues until 
statement_timeout ( if configured ) threshold is reached.

Do you see any messages like "ERROR: canceling statement due to conflict... " in standby postgres logs ?


Hope this helps !

Venkata Balaji N

Sr.Database Administrator
Fujitsu Australia

pgsql-general by date:

Previous
From: Michael Paquier
Date:
Subject: Re: High Level Committers Wanted
Next
From: Eduardo Morras
Date:
Subject: Re: High Level Committers Wanted