Thread: tx canceled on standby despite infinite max_standby_streaming_delay

tx canceled on standby despite infinite max_standby_streaming_delay

From
Jay Howard
Date:
I'm seeing long-running transactions (pg_dump) canceled on the standby when there are a lot of inserts happening on the master.  This despite my having set max_standby_streaming_delay to -1 on the standby.

Why might that happen?

This is pg 9.3.12.  When it happens I see:

pg_dump: Dumping the contents of table "TABLE" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
pg_dump: The command was: COPY public.TABLE (COLUMNS) TO stdout;

Re: tx canceled on standby despite infinite max_standby_streaming_delay

From
Venkata Balaji N
Date:

On Sat, May 14, 2016 at 12:27 PM, Jay Howard <jhoward@alumni.utexas.net> wrote:
I'm seeing long-running transactions (pg_dump) canceled on the standby when there are a lot of inserts happening on the master.  This despite my having set max_standby_streaming_delay to -1 on the standby.

Do you have hot_standby_feedback set to "on" ? 

What is the parameter  max_standby_archive_delay configured to ? This will pause WAL archives from being applied when queries are executed on the standby database.
 
Why might that happen?

This is pg 9.3.12.  When it happens I see:

pg_dump: Dumping the contents of table "TABLE" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
pg_dump: The command was: COPY public.TABLE (COLUMNS) TO stdout;

I suspect this is due to the clean up by VACUUM on primary.

Regards,
Venkata B N

Fujitsu Australia
 

Do you have hot_standby_feedback set to "on" ? 

It was off.  Will research that.  Thank you!

What is the parameter  max_standby_archive_delay configured to ? This will pause WAL archives from being applied when queries are executed on the standby database.

It's set to the default, which is 30 seconds.  For some reason I thought setting "max_standby_streaming_delay" to -1 would be sufficient.

At a high level, what's the difference between the "archive_delay" and "streaming_delay"?  I will read up on streaming replication in the mean time.
 



On Sat, May 14, 2016 at 8:20 PM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Sat, May 14, 2016 at 12:27 PM, Jay Howard <jhoward@alumni.utexas.net> wrote:
I'm seeing long-running transactions (pg_dump) canceled on the standby when there are a lot of inserts happening on the master.  This despite my having set max_standby_streaming_delay to -1 on the standby.

Do you have hot_standby_feedback set to "on" ? 

What is the parameter  max_standby_archive_delay configured to ? This will pause WAL archives from being applied when queries are executed on the standby database.
 
Why might that happen?

This is pg 9.3.12.  When it happens I see:

pg_dump: Dumping the contents of table "TABLE" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
pg_dump: The command was: COPY public.TABLE (COLUMNS) TO stdout;

I suspect this is due to the clean up by VACUUM on primary.

Regards,
Venkata B N

Fujitsu Australia
 


Re: tx canceled on standby despite infinite max_standby_streaming_delay

From
"David G. Johnston"
Date:
Its customary to bottom-post (or respond inline) on these lists.

On Sun, May 15, 2016 at 7:01 PM, Jay Howard <jhoward@alumni.utexas.net> wrote:
Do you have hot_standby_feedback set to "on" ? 

It was off.  Will research that.  Thank you!

What is the parameter  max_standby_archive_delay configured to ? This will pause WAL archives from being applied when queries are executed on the standby database.

It's set to the default, which is 30 seconds.  For some reason I thought setting "max_standby_streaming_delay" to -1 would be sufficient.


​At minimum I think there is room for improvement in the documentation here since I spent probably a good 15-20 minutes trying to find an answer related to either vacuum or WAL accumulation and could not discover anything that directly permitted your situation to occur.​

At a high level, what's the difference between the "archive_delay" and "streaming_delay"?  I will read up on streaming replication in the mean time.

 
​"""
 When a conflicting query is short, it's typically desirable to allow it to complete by delaying WAL application for a little bit; but a long delay in WAL application is usually not desirable. So the cancel mechanism has parameters, max_standby_archive_delay and max_standby_streaming_delay, that define the maximum allowed delay in WAL application. Conflicting queries will be canceled once it has taken longer than the relevant delay setting to apply any newly-received WAL data. There are two parameters so that different delay values can be specified for the case of reading WAL data from an archive (i.e., initial recovery from a base backup or "catching up" a standby server that has fallen far behind) versus reading WAL data via streaming replication.
​"""

​David J.


On Sun, May 15, 2016 at 6:15 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
Its customary to bottom-post (or respond inline) on these lists.

On Sun, May 15, 2016 at 7:01 PM, Jay Howard <jhoward@alumni.utexas.net> wrote:
Do you have hot_standby_feedback set to "on" ? 

It was off.  Will research that.  Thank you!

What is the parameter  max_standby_archive_delay configured to ? This will pause WAL archives from being applied when queries are executed on the standby database.

It's set to the default, which is 30 seconds.  For some reason I thought setting "max_standby_streaming_delay" to -1 would be sufficient.


​At minimum I think there is room for improvement in the documentation here since I spent probably a good 15-20 minutes trying to find an answer related to either vacuum or WAL accumulation and could not discover anything that directly permitted your situation to occur.​

At a high level, what's the difference between the "archive_delay" and "streaming_delay"?  I will read up on streaming replication in the mean time.

 
​"""
 When a conflicting query is short, it's typically desirable to allow it to complete by delaying WAL application for a little bit; but a long delay in WAL application is usually not desirable. So the cancel mechanism has parameters, max_standby_archive_delay and max_standby_streaming_delay, that define the maximum allowed delay in WAL application. Conflicting queries will be canceled once it has taken longer than the relevant delay setting to apply any newly-received WAL data. There are two parameters so that different delay values can be specified for the case of reading WAL data from an archive (i.e., initial recovery from a base backup or "catching up" a standby server that has fallen far behind) versus reading WAL data via streaming replication.
​"""

​David J.



W.r.t. improving the documentation, what I didn't realize is that WAL files can be placed in the archive (making "max_standby_archive_delay" relevant) in situations other than just "initial recovery from a base backup".

In my case, there was some heavy-duty stuff happening on the master that caused the standby to get sufficiently far behind that WAL files were "archived".  Then, while they were being applied from the archive, my pg_dump ran on the standby.