Thread: Failing backups, canceling statement due to conflict with recovery
Hi. I'm unable to offload my backups to one of my PG 9.1 hot standbys using purely streaming replication. After a few hours, usually on the same large table, pg_dump is failing with 'ERROR: canceling statement due to conflict with recovery'. From my reading from the documentation, this should not be possible as my hot standby has 'hot_standby_feedback = on' in its postgresql.conf. I submitted a bug report a while ago, but am still unsure if this option has never worked as intended or if I'm misinterpreting the documentation. My goal is to reliably make pg_dump backups from a hot standby without the hot standby lagging far behind the master. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Re: Failing backups, canceling statement due to conflict with recovery
From
"René Romero Benavides"
Date:
What does the following query give you on the standby ? select * from pg_settings where category = 'Replication / Standby Servers'; On Wednesday, February 13, 2013 03:53:33 PM Stuart Bishop wrote: > 'hot_standby_feedback = on' -- postgresql.org.mx
On Wed, Feb 13, 2013 at 12:53 AM, Stuart Bishop <stuart@stuartbishop.net> wrote: > I'm unable to offload my backups to one of my PG 9.1 hot standbys > using purely streaming replication. After a few hours, usually on the > same large table, pg_dump is failing with 'ERROR: canceling statement > due to conflict with recovery'. > > From my reading from the documentation, this should not be possible as > my hot standby has 'hot_standby_feedback = on' in its postgresql.conf. hot_standby_feedback affects VACUUM only to prevent it from removing dead rows on master that might cause the cleanup conflict. It has no deal with other hard conflicts like in case of DROP TABLE etc. Yo could try increasing max_standby_streaming_delay that will "pause" applying WAL data on replica but it still does not guarantee successful result and moreover could cause lagging far behind that is not your goal. > My goal is to reliably make pg_dump backups from a hot standby without > the hot standby lagging far behind the master. Also slave does not guarantee that it is always up-to-date. There could be issues like network problems, etc. Personally I recommend to do pg_dump on master at least on <=9.2. -- Sergey Konoplev Database and Software Architect http://www.linkedin.com/in/grayhemp Phones: USA +1 415 867 9984 Russia, Moscow +7 901 903 0499 Russia, Krasnodar +7 988 888 1979 Skype: gray-hemp Jabber: gray.ru@gmail.com
On Thu, Feb 14, 2013 at 5:20 AM, Ren=C3=A9 Romero Benavides <ichbinrene@gmail.com> wrote: > What does the following query give you on the standby ? > select * from pg_settings where category =3D 'Replication / Standby Serve= rs'; name | setting ------------------------------+--------- hot_standby | on hot_standby_feedback | on max_standby_archive_delay | 30000 max_standby_streaming_delay | 30000 wal_receiver_status_interval | 10 --=20 Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Thu, Feb 14, 2013 at 7:21 AM, Sergey Konoplev <gray.ru@gmail.com> wrote: > On Wed, Feb 13, 2013 at 12:53 AM, Stuart Bishop <stuart@stuartbishop.net>= wrote: >> I'm unable to offload my backups to one of my PG 9.1 hot standbys >> using purely streaming replication. After a few hours, usually on the >> same large table, pg_dump is failing with 'ERROR: canceling statement >> due to conflict with recovery'. >> >> From my reading from the documentation, this should not be possible as >> my hot standby has 'hot_standby_feedback =3D on' in its postgresql.conf. > > hot_standby_feedback affects VACUUM only to prevent it from removing > dead rows on master that might cause the cleanup conflict. It has no > deal with other hard conflicts like in case of DROP TABLE etc. I can confirm that no DDL is being run (apart from temporary tables). I can also confirm that the replication connection does not drop out. I can't think of what else would be causing problems apart from vacuum, and I used vacuum to trigger the problem in the bug report test case ( http://postgresql.1045698.n5.nabble.com/BUG-7546-Backups-on-hot= -standby-cancelled-despite-hot-standby-on-td5724284.html ) Something that might be interesting that I neglected to mention, the DETAIL of the error message is random; on production my failures end up with one of these three: DETAIL: =C2=A0User query might have needed to see row versions that must be= removed. DETAIL: =C2=A0User was holding a relation lock for too long. DETAIL: User was holding shared buffer pin for too long. > Personally I recommend to do pg_dump on master at least on <=3D9.2. Anything in particular in 9.2? I've been seeing a lot of replication related fixes in 9.1 patch releases and had been planning on sticking with 9.1 for the next 18 months. I'm still unsure if this is supposed to work, and this is a bug in PostgreSQL or Ubuntu, or if I'm just misreading the documentation. --=20 Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Wed, Feb 13, 2013 at 8:59 PM, Stuart Bishop <stuart@stuartbishop.net> wrote: > Something that might be interesting that I neglected to mention, the > DETAIL of the error message is random; on production my failures end > up with one of these three: > > DETAIL: User query might have needed to see row versions that must be removed. > DETAIL: User was holding a relation lock for too long. > DETAIL: User was holding shared buffer pin for too long. I think it can only be solved by increasing max_standby_streaming_delay or by setting it to -1. What about VACUUM from your test case. Probably it is not the matter of it, but the matter of what is happening in the connection. Try replace the VACUUM with SELECT pg_sleep(<some long period>) or may be start a transaction without/with query inside, or something else. Try to simulate different stuff from the activity that happens on your server to find out what causes which DETAILs. >> Personally I recommend to do pg_dump on master at least on <=9.2. > > Anything in particular in 9.2? I've been seeing a lot of replication > related fixes in 9.1 patch releases and had been planning on sticking > with 9.1 for the next 18 months. Nothing significant AFAIK. > I'm still unsure if this is supposed to work, and this is a bug in > PostgreSQL or Ubuntu, or if I'm just misreading the documentation. I would not say it is a bug. I think it just was not supposed to be a functionality of standby servers. -- Sergey Konoplev Database and Software Architect http://www.linkedin.com/in/grayhemp Phones: USA +1 415 867 9984 Russia, Moscow +7 901 903 0499 Russia, Krasnodar +7 988 888 1979 Skype: gray-hemp Jabber: gray.ru@gmail.com