Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
Date
Msg-id 201209191356.12176.andres@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on  (Stuart Bishop <stuart@stuartbishop.net>)
Responses Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on  (Stuart Bishop <stuart@stuartbishop.net>)
List pgsql-bugs
On Wednesday, September 19, 2012 11:33:29 AM Stuart Bishop wrote:
> On Wed, Sep 19, 2012 at 3:38 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
> >> >> ERROR:  canceling statement due to conflict with recovery
> >> >> DETAIL:  User was holding shared buffer pin for too long.
> >> >
> >> > Maybe I'm missing something, but ISTM that hot_standby_feedback
> >> > doesn't eliminate the query cancels caused by buffer pin lock which
> >> > you encountered. It eliminates only the query cancels caused by
> >> > cleanup of rows. So you might need to set max_standby_streaming_delay
> >> > to -1, to avoid query cancels.
> >
> > I think the above is just a case of the backend waiting a long time to
> > send data out due to your rate limiting. Why it does that holding a
> > buffer pin may be worth investigating...
>
> I put the rate limiting in there to trigger the problem consistently
> with a small data set. On the main production server I see this
> problem, it normally fails dumping a table with nearly 2 billion rows
> to local disk (pg_dump -Fc). The trick seems to be that vacuum needs
> to start running on the primary after the dump starts on the hot
> standby, and the vacuum complete before the dump finishes.
You see buffer pin errors or you see the relation errors? Those arent the same.
So if you see no buffer pin errors but relation lock ones in production your
way to reproduce the issue isn't telling us very much ;)

> >> I have also seen the following (but only on production, not with my test
> >> case):
> >>
> >> ERROR:  canceling statement due to conflict with recovery
> >> DETAIL:  User was holding a relation lock for too long.
> >>
> >> If you are correct, it seems impossible with 9.1 to have reliable long
> >> transactions on a hot standby without allowing the hot standby to lag
> >> behind by the length of the longest transaction.
> >
> > Is it possible that you perform such commands on tables used on the
> > standby?
>
> Yes, it is quite possible CREATE INDEX CONCURRENTLY was run on the
> primary during the backup window. I can't discount other commands
> either, but if that is the case one of our ops needs a spanking. I've
> only been able to reproduce the buffer pin error in isolation.
Do you have DDL logging enabled so you could investigate that?

I really wonder why we don't bother logging the dbOid/relOid pair before we
cancel transactions due to lock conflicts. That should make investigating such
issues considerably easier...

Greetings,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-bugs by date:

Previous
From: Stuart Bishop
Date:
Subject: Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
Next
From: Stuart Bishop
Date:
Subject: Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on