Thread: Failing backups, canceling statement due to conflict with recovery

Failing backups, canceling statement due to conflict with recovery

From
Stuart Bishop
Date:
Hi.

I'm unable to offload my backups to one of my PG 9.1 hot standbys
using purely streaming replication. After a few hours, usually on the
same large table, pg_dump is failing with 'ERROR:  canceling statement
due to conflict with recovery'.

From my reading from the documentation, this should not be possible as
my hot standby has 'hot_standby_feedback = on' in its postgresql.conf.

I submitted a bug report a while ago, but am still unsure if this
option has never worked as intended or if I'm misinterpreting the
documentation.

My goal is to reliably make pg_dump backups from a hot standby without
the hot standby lagging far behind the master.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Failing backups, canceling statement due to conflict with recovery

From
"René Romero Benavides"
Date:
What does the following query give you on the standby ?
select * from pg_settings where category = 'Replication / Standby Servers';


On Wednesday, February 13, 2013 03:53:33 PM Stuart Bishop wrote:
> 'hot_standby_feedback = on'
--
postgresql.org.mx



Re: Failing backups, canceling statement due to conflict with recovery

From
Sergey Konoplev
Date:
On Wed, Feb 13, 2013 at 12:53 AM, Stuart Bishop <stuart@stuartbishop.net> wrote:
> I'm unable to offload my backups to one of my PG 9.1 hot standbys
> using purely streaming replication. After a few hours, usually on the
> same large table, pg_dump is failing with 'ERROR:  canceling statement
> due to conflict with recovery'.
>
> From my reading from the documentation, this should not be possible as
> my hot standby has 'hot_standby_feedback = on' in its postgresql.conf.

hot_standby_feedback affects VACUUM only to prevent it from removing
dead rows on master that might cause the cleanup conflict. It has no
deal with other hard conflicts like in case of DROP TABLE etc.

Yo could try increasing max_standby_streaming_delay that will "pause"
applying WAL data on replica but it still does not guarantee
successful result and moreover could cause lagging far behind that is
not your goal.

> My goal is to reliably make pg_dump backups from a hot standby without
> the hot standby lagging far behind the master.

Also slave does not guarantee that it is always up-to-date. There
could be issues like network problems, etc.

Personally I recommend to do pg_dump on master at least on <=9.2.

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Failing backups, canceling statement due to conflict with recovery

From
Stuart Bishop
Date:
On Thu, Feb 14, 2013 at 5:20 AM, Ren=C3=A9 Romero Benavides
<ichbinrene@gmail.com> wrote:
> What does the following query give you on the standby ?
> select * from pg_settings where category =3D 'Replication / Standby Serve=
rs';

             name             | setting
------------------------------+---------
 hot_standby                  | on
 hot_standby_feedback         | on
 max_standby_archive_delay    | 30000
 max_standby_streaming_delay  | 30000
 wal_receiver_status_interval | 10


--=20
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Failing backups, canceling statement due to conflict with recovery

From
Stuart Bishop
Date:
On Thu, Feb 14, 2013 at 7:21 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On Wed, Feb 13, 2013 at 12:53 AM, Stuart Bishop <stuart@stuartbishop.net>=
 wrote:
>> I'm unable to offload my backups to one of my PG 9.1 hot standbys
>> using purely streaming replication. After a few hours, usually on the
>> same large table, pg_dump is failing with 'ERROR:  canceling statement
>> due to conflict with recovery'.
>>
>> From my reading from the documentation, this should not be possible as
>> my hot standby has 'hot_standby_feedback =3D on' in its postgresql.conf.
>
> hot_standby_feedback affects VACUUM only to prevent it from removing
> dead rows on master that might cause the cleanup conflict. It has no
> deal with other hard conflicts like in case of DROP TABLE etc.

I can confirm that no DDL is being run (apart from temporary tables).
I can also confirm that the replication connection does not drop out.
I can't think of what else would be causing problems apart from
vacuum, and I used vacuum to trigger the problem in the bug report
test case ( http://postgresql.1045698.n5.nabble.com/BUG-7546-Backups-on-hot=
-standby-cancelled-despite-hot-standby-on-td5724284.html
)

Something that might be interesting that I neglected to mention, the
DETAIL of the error message is random; on production my failures end
up with one of these three:

DETAIL: =C2=A0User query might have needed to see row versions that must be=
 removed.
DETAIL: =C2=A0User was holding a relation lock for too long.
DETAIL:  User was holding shared buffer pin for too long.

> Personally I recommend to do pg_dump on master at least on <=3D9.2.

Anything in particular in 9.2? I've been seeing a lot of replication
related fixes in 9.1 patch releases and had been planning on sticking
with 9.1 for the next 18 months.

I'm still unsure if this is supposed to work, and this is a bug in
PostgreSQL or Ubuntu, or if I'm just misreading the documentation.

--=20
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Failing backups, canceling statement due to conflict with recovery

From
Sergey Konoplev
Date:
On Wed, Feb 13, 2013 at 8:59 PM, Stuart Bishop <stuart@stuartbishop.net> wrote:
> Something that might be interesting that I neglected to mention, the
> DETAIL of the error message is random; on production my failures end
> up with one of these three:
>
> DETAIL:  User query might have needed to see row versions that must be removed.
> DETAIL:  User was holding a relation lock for too long.
> DETAIL:  User was holding shared buffer pin for too long.

I think it can only be solved by increasing
max_standby_streaming_delay or by setting it to -1.

What about VACUUM from your test case. Probably it is not the matter
of it, but the matter of what is happening in the connection. Try
replace the VACUUM with SELECT pg_sleep(<some long period>) or may be
start a transaction without/with query inside, or something else. Try
to simulate different stuff from the activity that happens on your
server to find out what causes which DETAILs.

>> Personally I recommend to do pg_dump on master at least on <=9.2.
>
> Anything in particular in 9.2? I've been seeing a lot of replication
> related fixes in 9.1 patch releases and had been planning on sticking
> with 9.1 for the next 18 months.

Nothing significant AFAIK.

> I'm still unsure if this is supposed to work, and this is a bug in
> PostgreSQL or Ubuntu, or if I'm just misreading the documentation.

I would not say it is a bug. I think it just was not supposed to be a
functionality of standby servers.

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com