Thread: hot_standby_feedback
Hi,
I am in the process of reviewing our configs for a number of 9.3 databases and found a replica with hot_standby_feedback=on. I remember when we set it long ago we were fighting cancelled queries. I also remember that it never really worked for us. In the end we set up 2 replicas, one suitable for short queries where we prefer low replication lag, and another one where we allow for long running queries but sacrifice timeliness (max_standby_*_delay=-1).
I have a hunch why hot_standby_feedback=on didn't work. But I never verified it. So, here it is. The key is this sentence:
"Feedback messages will not be sent more frequently than once per wal_receiver_status_interval."
That interval is 10 sec. So assume a transaction on the replica uses a row right after the message has been sent. Then there is a 10 sec window in which the master cannot know that the row is needed on the replica and can vacuum it. If the transaction on the replica then takes longer than max_standby_*_delay, the only option is to cancel it.
Is that explanation correct?
What is the correct way to use hot_standby_feedback to prevent cancellations reliably? (and accepting the bloat)
Thanks,
Torsten
On 2016-11-28 22:14:55 +0100, Torsten Förtsch wrote:
> Hi,
>
> I am in the process of reviewing our configs for a number of 9.3 databases
> and found a replica with hot_standby_feedback=on. I remember when we set it
> long ago we were fighting cancelled queries. I also remember that it never
> really worked for us. In the end we set up 2 replicas, one suitable for
> short queries where we prefer low replication lag, and another one where we
> allow for long running queries but sacrifice timeliness
> (max_standby_*_delay=-1).

There are a few kinds of conflicts against which hs_feedback doesn't
protect, e.g. exclusive locks on tables that are in use (taken, for
example, by vacuum truncating a table or by an explicit drop table).
There's a view with some information about the causes of cancellations,
pg_stat_database_conflicts - did you check that?

> I have a hunch why hot_standby_feedback=on didn't work. But I never
> verified it. So, here it is. The key is this sentence:
>
> "Feedback messages will not be sent more frequently than once per
> wal_receiver_status_interval."
>
> That interval is 10 sec. So, assuming a transaction on the replica uses a
> row right after the message has been sent. Then there is a 10 sec window in
> which the master cannot know that the row is needed on the replica and can
> vacuum it. If then the transaction on the replica takes longer than
> max_standby_*_delay, the only option is to cancel it.
>
> Is that explanation correct?

No. That just means that we don't update the value more frequently. The
value reported is a "horizon", meaning that nothing older than the
reported value can be accessed.

Greetings,

Andres Freund
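
Editorial note: the "horizon" semantics described in the reply can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not how PostgreSQL implements feedback: transaction ids are plain integers, and the function names feedback_xmin and can_vacuum are hypothetical, chosen for this example only.

```python
# Toy model of the feedback "horizon"; NOT PostgreSQL's implementation.
# All names here (feedback_xmin, can_vacuum) are hypothetical.

def feedback_xmin(active_snapshot_xmins, next_xid):
    """Horizon a standby would report: the oldest xmin among its open
    snapshots, or the xmin a new snapshot would get if none are open."""
    return min(active_snapshot_xmins, default=next_xid)


def can_vacuum(deleted_by_xid, horizon):
    """The primary may remove a dead row version only if the transaction
    that deleted it is older than the reported horizon."""
    return deleted_by_xid < horizon


# t=0: the standby has no open snapshots and a new snapshot would get xmin 100.
horizon = feedback_xmin([], next_xid=100)

# t=1: a standby query starts. Replay only moves forward, so its snapshot
# xmin (105 in this example) cannot be older than the reported horizon.
query_xmin = 105
assert query_xmin >= horizon

# t=2: the primary deletes a row in xid 103 and vacuum runs before the next
# feedback message. The stale horizon (100) still protects that row version.
row_protected = not can_vacuum(deleted_by_xid=103, horizon=horizon)

# t=10: the next feedback message advances the horizon to the query's xmin.
new_horizon = feedback_xmin([query_xmin], next_xid=110)
```

In this model a horizon that is up to 10 seconds stale only makes the primary keep more row versions than strictly necessary, never fewer, so the reporting interval by itself would not explain the cancellations.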