Thread: hot_standby_feedback

From: Torsten Förtsch
Date: 2016-11-28 22:14:55 +0100
Hi,

I am in the process of reviewing our configs for a number of 9.3 databases and found a replica with hot_standby_feedback=on. I remember when we set it long ago we were fighting cancelled queries. I also remember that it never really worked for us. In the end we set up 2 replicas, one suitable for short queries where we prefer low replication lag, and another one where we allow for long running queries but sacrifice timeliness (max_standby_*_delay=-1).

I have a hunch why hot_standby_feedback=on didn't work. But I never verified it. So, here it is. The key is this sentence:

"Feedback messages will not be sent more frequently than once per wal_receiver_status_interval."

That interval is 10 sec. So, assume a transaction on the replica starts using a row right after a feedback message has been sent. Then there is a 10 sec window in which the master cannot know that the row is needed on the replica and can vacuum it. If the transaction on the replica then runs longer than max_standby_*_delay, the only option is to cancel it.

Is that explanation correct?

What is the correct way to use hot_standby_feedback to prevent cancellations reliably (while accepting the bloat)?

Thanks,
Torsten

Re: hot_standby_feedback

From: Andres Freund
On 2016-11-28 22:14:55 +0100, Torsten Förtsch wrote:
> Hi,
>
> I am in the process of reviewing our configs for a number of 9.3 databases
> and found a replica with hot_standby_feedback=on. I remember when we set it
> long ago we were fighting cancelled queries. I also remember that it never
> really worked for us. In the end we set up 2 replicas, one suitable for
> short queries where we prefer low replication lag, and another one where we
> allow for long running queries but sacrifice timeliness
> (max_standby_*_delay=-1).

There are a few kinds of conflicts against which hs_feedback doesn't
protect, e.g. exclusive locks on tables that are in use (taken, for
example, by vacuum truncating a table or by an explicit DROP TABLE).

There's a view with some information about the causes of cancellations,
pg_stat_database_conflicts - did you check that?
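
(A minimal sketch of how that check could look, not part of the original
message. The column names are those documented for
pg_stat_database_conflicts in PostgreSQL 9.x. The function name
summarize_conflicts and the set FEEDBACK_HELPS are hypothetical, and
treating only confl_snapshot as addressed by hot_standby_feedback is a
simplifying assumption. The row is passed in as a plain dict, so no
database driver is assumed.)

```python
# Query to run on the standby; columns as documented for PostgreSQL 9.x.
CONFLICT_QUERY = """
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts
"""

# Assumption of this sketch: only snapshot conflicts (row versions removed
# while a standby query still needs them) are what feedback is meant to prevent.
FEEDBACK_HELPS = {"confl_snapshot"}


def summarize_conflicts(row):
    """Split the cancellation counters of one result row by cause.

    `row` is a dict keyed by column name, e.g. built from a cursor row.
    """
    counts = {k: v for k, v in row.items() if k.startswith("confl_")}
    total = sum(counts.values())
    addressed = sum(v for k, v in counts.items() if k in FEEDBACK_HELPS)
    return {
        "total": total,
        "addressed_by_feedback": addressed,
        "not_addressed": total - addressed,
        # Counter with the highest value, or None if nothing was cancelled.
        "top_cause": max(counts, key=counts.get) if total else None,
    }
```

If most cancellations show up under a counter other than confl_snapshot,
for instance confl_lock, turning on hot_standby_feedback alone would not
be expected to make them go away.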

> I have a hunch why hot_standby_feedback=on didn't work. But I never
> verified it. So, here it is. The key is this sentence:
>
> "Feedback messages will not be sent more frequently than once per
> wal_receiver_status_interval."
>
> That interval is 10 sec. So, assuming a transaction on the replica uses a
> row right after the message has been sent. Then there is a 10 sec window in
> which the master cannot know that the row is needed on the replica and can
> vacuum it. If then the transaction on the replica takes longer than
> max_standby_*_delay, the only option is to cancel it.
>
> Is that explanation correct?

No. That setting only limits how often the value is updated. The
value reported is a "horizon", meaning that nothing older than the
reported value can be accessed by queries on the standby.
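
(A toy model of the horizon argument, not part of the original message
and not PostgreSQL's actual code. It assumes a simplified rule: a dead
row version may be vacuumed only if the transaction that deleted it is
older than the horizon, and a snapshot may still need any row version
deleted by a transaction at or after its own xmin. The function names
and transaction ids are made up for illustration.)

```python
def can_vacuum(row_xmax, horizon):
    """Simplified rule: removable only if the deleting xid precedes the horizon."""
    return row_xmax < horizon


def simulate_feedback_gap(reported_horizon, new_snapshot_xmin, deleted_by):
    """Model a standby query that starts between two feedback messages.

    reported_horizon:  xmin sent in the last feedback message
    new_snapshot_xmin: xmin of a snapshot taken on the standby afterwards
    deleted_by:        xids of transactions that deleted rows on the primary
    """
    # Assumption of the model: a later snapshot never reaches back
    # behind a horizon that was already reported.
    if new_snapshot_xmin < reported_horizon:
        raise ValueError("snapshot xmin cannot precede the reported horizon")

    # Until the next message arrives, the primary keeps using the old value.
    primary_horizon = reported_horizon
    result = {}
    for xid in deleted_by:
        result[xid] = {
            "needed_by_standby": xid >= new_snapshot_xmin,
            "vacuumable": can_vacuum(xid, primary_horizon),
        }
    return result
```

In this model, with a reported horizon of 100 and a new snapshot at xmin
105, no row version the new snapshot needs is vacuumable, regardless of
how long the next feedback message takes. A row deleted by xid 102 is not
needed but is still retained, which is the bloat mentioned earlier in the
thread.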

Greetings,

Andres Freund