Thread: Queries are failing on standby server

Queries are failing on standby server

From
Wasim Devale
Date:
Hi All

The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.  

What setting do we need to configure to not conflict with queries. Below is the error.

Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'

Thanks,
Wasim


Re: Queries are failing on standby server

From
Keith Fiske
Date:
Two options here, both settings on the replica side:

- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY

- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK

On Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:
Hi All

The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.  

What setting do we need to configure to not conflict with queries. Below is the error.

Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'

Thanks,
Wasim




--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: Queries are failing on standby server

From
Ron Johnson
Date:
How do those config options interact with replication slots?

On Thu, Jul 25, 2024 at 9:54 AM Keith Fiske <keith.fiske@crunchydata.com> wrote:
Two options here, both settings on the replica side:

- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY

- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK

On Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:
Hi All

The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.  

What setting do we need to configure to not conflict with queries. Below is the error.

Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'

 

Re: Queries are failing on standby server

From
Wasim Devale
Date:
This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict.

What does the above statement mean? Primary does not clean up the old rows ? The it will be problematic if database is under high insert delete load.

On Thu, 25 Jul, 2024, 7:24 pm Keith Fiske, <keith.fiske@crunchydata.com> wrote:
Two options here, both settings on the replica side:

- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY

- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK

On Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:
Hi All

The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.  

What setting do we need to configure to not conflict with queries. Below is the error.

Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'

Thanks,
Wasim




--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: Queries are failing on standby server

From
Ron Johnson
Date:
On Thu, Jul 25, 2024 at 12:54 PM Wasim Devale <wasimd60@gmail.com> wrote:
This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict.

What does the above statement mean? Primary does not clean up the old rows ?

If you're executing long-running queries on the replica, then you you don't want the old rows cleaned up until the long-running query is complete.
 
The it will be problematic if database is under high insert delete load.

Which is why it's only recommended in rare circumstances.
 

On Thu, 25 Jul, 2024, 7:24 pm Keith Fiske, <keith.fiske@crunchydata.com> wrote:
Two options here, both settings on the replica side:

- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY

- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK

On Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:
Hi All

The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.  

What setting do we need to configure to not conflict with queries. Below is the error.

Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'

Thanks,
Wasim




--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: Queries are failing on standby server

From
Wasim Devale
Date:
ERROR:  User query might have needed to see row versions that must be removed.canceling statement due to conflict with recovery ERROR:  canceling statement due to conflict with recovery SQL state: 40001 Detail: User query might have needed to see row versions that must be removed.

So how to tackle the above error. PG version is 12.8 and has a replication slot created.

On Thu, Jul 25, 2024 at 10:30 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Thu, Jul 25, 2024 at 12:54 PM Wasim Devale <wasimd60@gmail.com> wrote:
This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict.

What does the above statement mean? Primary does not clean up the old rows ?

If you're executing long-running queries on the replica, then you you don't want the old rows cleaned up until the long-running query is complete.
 
The it will be problematic if database is under high insert delete load.

Which is why it's only recommended in rare circumstances.
 

On Thu, 25 Jul, 2024, 7:24 pm Keith Fiske, <keith.fiske@crunchydata.com> wrote:
Two options here, both settings on the replica side:

- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY

- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.

https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK

On Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:
Hi All

The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.  

What setting do we need to configure to not conflict with queries. Below is the error.

Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'

Thanks,
Wasim




--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: Queries are failing on standby server

From
Scott Ribe
Date:
> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote:
>
> So how to tackle the above error.

???

The two options to mitigate this, and their side effects, were explained to you. What more do you want?


Re: Queries are failing on standby server

From
Wasim Devale
Date:

Our company's production in the evening only and has heavy loading and unloading of data. So I can suggest them not to run the long running queries in bulk for analysis at that peak time and will ask them to use them off peak hours. Correct? Any suggestions from you.

On Fri, 26 Jul, 2024, 12:05 am Scott Ribe, <scott_ribe@elevated-dev.com> wrote:
> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote:
>
> So how to tackle the above error.

???

The two options to mitigate this, and their side effects, were explained to you. What more do you want?

Re: Queries are failing on standby server

From
Fernando Hevia
Date:
Hi Wasim,

I think you might have misinterpreted the explanation given to you. The cancellation of the query on the standby server isn't related to the load on the primary server. It happens that when you run queries on a hot standby, the replication is temporarily paused in order to not modify data the running queries on the standby server need. Once the queries end, replication resumes.
The problem of this behaviour is that the standby server starts to fall behind in relation to the master, a scenario which presents a risky condition: if the master happens to fail while the replica is delayed you end up with data loss. 
To avoid having a standby server lagging too far behind Postgres will cancel long running queries on the replica. The parameter max_standby_streaming_delay defines the maximum replication delay the standby will tolerate. Default is 30 seconds. Increase the value to allow for longer running queries on the standby server bearing in mind that you could end up with data loss if the master fails at the wrong moment. 

A working alternative is to have one standby server exclusively for replication purposes and another standby for reporting/read-only queries where you can increase the max_standby_streaming_delay to accommodate your long running queries. Of course, this will require additional computing and storage resources.

Cheers,
Fernando.






  

El jue, 25 jul 2024 a la(s) 3:41 p.m., Wasim Devale (wasimd60@gmail.com) escribió:

Our company's production in the evening only and has heavy loading and unloading of data. So I can suggest them not to run the long running queries in bulk for analysis at that peak time and will ask them to use them off peak hours. Correct? Any suggestions from you.

On Fri, 26 Jul, 2024, 12:05 am Scott Ribe, <scott_ribe@elevated-dev.com> wrote:
> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote:
>
> So how to tackle the above error.

???

The two options to mitigate this, and their side effects, were explained to you. What more do you want?

Re: Queries are failing on standby server

From
Keith Fiske
Date:


On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
Hi Wasim,

I think you might have misinterpreted the explanation given to you. The cancellation of the query on the standby server isn't related to the load on the primary server. It happens that when you run queries on a hot standby, the replication is temporarily paused in order to not modify data the running queries on the standby server need. Once the queries end, replication resumes.
The problem of this behaviour is that the standby server starts to fall behind in relation to the master, a scenario which presents a risky condition: if the master happens to fail while the replica is delayed you end up with data loss. 
To avoid having a standby server lagging too far behind Postgres will cancel long running queries on the replica. The parameter max_standby_streaming_delay defines the maximum replication delay the standby will tolerate. Default is 30 seconds. Increase the value to allow for longer running queries on the standby server bearing in mind that you could end up with data loss if the master fails at the wrong moment. 

A working alternative is to have one standby server exclusively for replication purposes and another standby for reporting/read-only queries where you can increase the max_standby_streaming_delay to accommodate your long running queries. Of course, this will require additional computing and storage resources.

Cheers,
Fernando.


This is all true, but the hot_standby_feedback option is the way to get around needing to worry about replication delay all together. As far as how it affects VACUUM, it's no different to how running those same queries on the primary would affect it. The reason I mention it is that people think that moving queries to the replica takes away all the effects of running them on the primary. It takes away the load of the query, but there are side effects that still have to be managed. Either of the options mentioned are fine to do as long as you know the consequences of them.  

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: Queries are failing on standby server

From
Laurenz Albe
Date:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.

Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.

> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.

No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.

Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.

> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.

That is good advice.

> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.

No, because there are other kinds of replication conflicts.  The most frequent are:

- lock conflicts

  They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
  a query on the standby.  The most frequent cause is VACUUM truncation (which can
  be disabled for individual tables).

- buffer pin conflicts

  It depends on the workload if you get them, but you cannot get rid of them.

Yours,
Laurenz Albe



Re: Queries are failing on standby server

From
Wasim Devale
Date:

Thanks everyone for your inputs and solutions.

On Fri, 26 Jul, 2024, 10:38 am Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.

Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.

> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.

No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.

Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.

> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.

That is good advice.

> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.

No, because there are other kinds of replication conflicts.  The most frequent are:

- lock conflicts

  They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
  a query on the standby.  The most frequent cause is VACUUM truncation (which can
  be disabled for individual tables).

- buffer pin conflicts

  It depends on the workload if you get them, but you cannot get rid of them.

Yours,
Laurenz Albe

Re: Queries are failing on standby server

From
obi reddy
Date:
Hi ,


Set the below parameters on standby node.

max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s

Thanks 
Obireddy.G


On Fri, 26 Jul 2024, 11:44 Wasim Devale, <wasimd60@gmail.com> wrote:

Thanks everyone for your inputs and solutions.

On Fri, 26 Jul, 2024, 10:38 am Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.

Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.

> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.

No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.

Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.

> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.

That is good advice.

> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.

No, because there are other kinds of replication conflicts.  The most frequent are:

- lock conflicts

  They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
  a query on the standby.  The most frequent cause is VACUUM truncation (which can
  be disabled for individual tables).

- buffer pin conflicts

  It depends on the workload if you get them, but you cannot get rid of them.

Yours,
Laurenz Albe

RE: [EXTERNAL] Re: Queries are failing on standby server

From
"Wetmore, Matthew (CTR)"
Date:
This is an issue at most place.  Don't give out the r/o login to the Replica.  The Replica is not a data research /prod
sandboxbox, it's a streaming replica for HA failover.  If you want a research r/o query box, create one.
 

-----Original Message-----
From: Scott Ribe <scott_ribe@elevated-dev.com> 
Sent: Thursday, July 25, 2024 11:36 AM
To: Wasim Devale <wasimd60@gmail.com>
Cc: Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-admin <pgsql-admin@postgresql.org>
Subject: [EXTERNAL] Re: Queries are failing on standby server

> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote:
> 
> So how to tackle the above error.

???

The two options to mitigate this, and their side effects, were explained to you. What more do you want?


----------------------------------------------------------------------
CONFIDENTIALITY NOTICE: If you have received this email in error, please immediately notify the sender by e-mail at the
addressshown. This email transmission may contain confidential information.  This information is intended only for the
useof the individual(s) or entity to whom it is intended even if addressed incorrectly.  Please delete it from your
filesif you are not the intended recipient.  Thank you for your compliance.  Copyright (c) 2024 Evernorth 

Re: Queries are failing on standby server

From
Fernando Hevia
Date:
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.
 
No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

I definitely messed that one up. Thanks for the correction Laurenz.

Regards,
Fernando.


El vie, 26 jul 2024 a la(s) 2:08 a.m., Laurenz Albe (laurenz.albe@cybertec.at) escribió:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.

Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.

> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.

No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.

Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.

> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.

That is good advice.

> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.

No, because there are other kinds of replication conflicts.  The most frequent are:

- lock conflicts

  They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
  a query on the standby.  The most frequent cause is VACUUM truncation (which can
  be disabled for individual tables).

- buffer pin conflicts

  It depends on the workload if you get them, but you cannot get rid of them.

Yours,
Laurenz Albe