Thread: Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
From
Stuart Bishop
Date:
On Wed, Sep 19, 2012 at 5:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Sep 17, 2012 at 7:28 PM, <stuart@stuartbishop.net> wrote: >> The following bug has been logged on the website: >> >> Bug reference: 7546 >> Logged by: Stuart Bishop >> Email address: stuart@stuartbishop.net >> PostgreSQL version: 9.1.5 >> Operating system: Ubuntu 12.10 >> Description: >> >> I have a primary and a hot standby using streaming replication. The hot >> standby specifies 'hot_standby_feedback=on' with other replication settings >> set to default. >> >> If a vacuum occurs on the primary while pg_dump is dumping a large table, >> the pg_dump is cancelled, usually with the following error: >> >> ERROR: canceling statement due to conflict with recovery >> DETAIL: User was holding shared buffer pin for too long. > > Maybe I'm missing something, but ISTM that hot_standby_feedback doesn't > eliminate the query cancels caused by buffer pin lock which you encountered. > It eliminates only the query cancels caused by cleanup of rows. So you might > need to set max_standby_streaming_delay to -1, to avoid query cancels. I have also seen the following (but only on production, not with my test case): ERROR: canceling statement due to conflict with recovery DETAIL: User was holding a relation lock for too long. If you are correct, it seems impossible with 9.1 to have reliable long transactions on a hot standby without allowing the hot standby to lag behind by the length of the longest transaction. I was hoping when the docs state "this parameter can be used to eliminate query cancels caused by cleanup records" that it would cover all the background maintenance. Disabling autovacuum during the backup window would be one work around. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
From
Andres Freund
Date:
On Wednesday, September 19, 2012 10:06:03 AM Stuart Bishop wrote: > On Wed, Sep 19, 2012 at 5:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > > On Mon, Sep 17, 2012 at 7:28 PM, <stuart@stuartbishop.net> wrote: > >> The following bug has been logged on the website: > >> > >> Bug reference: 7546 > >> Logged by: Stuart Bishop > >> Email address: stuart@stuartbishop.net > >> PostgreSQL version: 9.1.5 > >> Operating system: Ubuntu 12.10 > >> Description: > >> > >> I have a primary and a hot standby using streaming replication. The hot > >> standby specifies 'hot_standby_feedback=on' with other replication > >> settings set to default. > >> > >> If a vacuum occurs on the primary while pg_dump is dumping a large > >> table, the pg_dump is cancelled, usually with the following error: > >> > >> ERROR: canceling statement due to conflict with recovery > >> DETAIL: User was holding shared buffer pin for too long. > > > > Maybe I'm missing something, but ISTM that hot_standby_feedback doesn't > > eliminate the query cancels caused by buffer pin lock which you > > encountered. It eliminates only the query cancels caused by cleanup of > > rows. So you might need to set max_standby_streaming_delay to -1, to > > avoid query cancels. I think the above is just a case of the backend waiting a long time to send data out due to your rate limiting. Why it does that holding a buffer pin may be worth investigating... > I have also seen the following (but only on production, not with my test > case): > > ERROR: canceling statement due to conflict with recovery > DETAIL: User was holding a relation lock for too long. > > If you are correct, it seems impossible with 9.1 to have reliable long > transactions on a hot standby without allowing the hot standby to lag > behind by the length of the longest transaction. Uh. I don't think thats a valid chain of thought, especially when it comes due to conflicts on relation locks. Those primarily (only?) happen if you do perform an action on the primary that requires an exlusive lock. Like ALTER TABLE, TRUNCATE, CLUSTER, LOCK TABLE, ... There is not really much the standby can do if gets a request for an exlusive lock other than either cancelling everything that blocks acquiring such a lock or waiting. Is it possible that you perform such commands on tables used on the standby? > I was hoping when the docs state "this parameter can be used to > eliminate query cancels caused by cleanup records" that it would cover > all the background maintenance. Relation lock conflicts shouldn't be caused by background maintenance. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
From
Stuart Bishop
Date:
On Wed, Sep 19, 2012 at 3:38 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> >> ERROR: canceling statement due to conflict with recovery >> >> DETAIL: User was holding shared buffer pin for too long. >> > Maybe I'm missing something, but ISTM that hot_standby_feedback doesn't >> > eliminate the query cancels caused by buffer pin lock which you >> > encountered. It eliminates only the query cancels caused by cleanup of >> > rows. So you might need to set max_standby_streaming_delay to -1, to >> > avoid query cancels. > I think the above is just a case of the backend waiting a long time to send > data out due to your rate limiting. Why it does that holding a buffer pin may > be worth investigating... I put the rate limiting in there to trigger the problem consistently with a small data set. On the main production server I see this problem, it normally fails dumping a table with nearly 2 billion rows to local disk (pg_dump -Fc). The trick seems to be that vacuum needs to start running on the primary after the dump starts on the hot standby, and the vacuum complete before the dump finishes. >> I have also seen the following (but only on production, not with my test >> case): >> >> ERROR: canceling statement due to conflict with recovery >> DETAIL: User was holding a relation lock for too long. >> >> If you are correct, it seems impossible with 9.1 to have reliable long >> transactions on a hot standby without allowing the hot standby to lag >> behind by the length of the longest transaction. > Is it possible that you perform such commands on tables used on the standby? Yes, it is quite possible CREATE INDEX CONCURRENTLY was run on the primary during the backup window. I can't discount other commands either, but if that is the case one of our ops needs a spanking. I've only been able to reproduce the buffer pin error in isolation. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
From
Andres Freund
Date:
On Wednesday, September 19, 2012 11:33:29 AM Stuart Bishop wrote: > On Wed, Sep 19, 2012 at 3:38 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> >> ERROR: canceling statement due to conflict with recovery > >> >> DETAIL: User was holding shared buffer pin for too long. > >> > > >> > Maybe I'm missing something, but ISTM that hot_standby_feedback > >> > doesn't eliminate the query cancels caused by buffer pin lock which > >> > you encountered. It eliminates only the query cancels caused by > >> > cleanup of rows. So you might need to set max_standby_streaming_delay > >> > to -1, to avoid query cancels. > > > > I think the above is just a case of the backend waiting a long time to > > send data out due to your rate limiting. Why it does that holding a > > buffer pin may be worth investigating... > > I put the rate limiting in there to trigger the problem consistently > with a small data set. On the main production server I see this > problem, it normally fails dumping a table with nearly 2 billion rows > to local disk (pg_dump -Fc). The trick seems to be that vacuum needs > to start running on the primary after the dump starts on the hot > standby, and the vacuum complete before the dump finishes. You see buffer pin errors or you see the relation errors? Those arent the same. So if you see no buffer pin errors but relation lock ones in production your way to reproduce the issue isn't telling us very much ;) > >> I have also seen the following (but only on production, not with my test > >> case): > >> > >> ERROR: canceling statement due to conflict with recovery > >> DETAIL: User was holding a relation lock for too long. > >> > >> If you are correct, it seems impossible with 9.1 to have reliable long > >> transactions on a hot standby without allowing the hot standby to lag > >> behind by the length of the longest transaction. > > > > Is it possible that you perform such commands on tables used on the > > standby? > > Yes, it is quite possible CREATE INDEX CONCURRENTLY was run on the > primary during the backup window. I can't discount other commands > either, but if that is the case one of our ops needs a spanking. I've > only been able to reproduce the buffer pin error in isolation. Do you have DDL logging enabled so you could investigate that? I really wonder why we don't bother logging the dbOid/relOid pair before we cancel transactions due to lock conflicts. That should make investigating such issues considerably easier... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
From
Stuart Bishop
Date:
On Wed, Sep 19, 2012 at 6:56 PM, Andres Freund <andres@2ndquadrant.com> wrote: > You see buffer pin errors or you see the relation errors? Those arent the same. > So if you see no buffer pin errors but relation lock ones in production your > way to reproduce the issue isn't telling us very much ;) On production I have seen both. I have only been able to confirm and isolate the trigger of the buffer pin errors, and thus this bug report. I mentioned the rarer relation lock error just in case it was relevant. > Do you have DDL logging enabled so you could investigate that? > > I really wonder why we don't bother logging the dbOid/relOid pair before we > cancel transactions due to lock conflicts. That should make investigating such > issues considerably easier... I do have DDL logging enabled. I've waded through a two hour period on the primary where the problem must have occurred and can see no DDL except for the creation and dropping of temporary tables. My parser is unfortunately grep and my eyeballs to filter out the noise. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
From
Stuart Bishop
Date:
Yesterday's dump failed with a new error. pg_stat_replication tells me replication has been connected for nearly a month, so no network issues. I've confirmed hot_standby_feedback is still set to 'on': ERROR: canceling statement due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed. As before, the only error I've been able to reproduce in isolation is the buffer pin error I originally reported. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/