Thread: Why is hot_standby_feedback off by default?

Why is hot_standby_feedback off by default?

From
sirisha chamarthi
Date:
Hi Hackers,

Is there any specific reason hot_standby_feedback default is set to off? I see some explanation in the thread [1] about a recovery_min_apply_delay value > 0 causing table bloat. However, recovery_min_apply_delay is set to 0 by default. So, if a server admin wants to change this value, they can change hot_standby_feedback as well if needed, right?

Thanks!


Re: Why is hot_standby_feedback off by default?

From
Vik Fearing
Date:
On 10/22/23 09:50, sirisha chamarthi wrote:
> Is there any specific reason hot_standby_feedback default is set to off?


Yes.  No one wants a rogue standby to ruin production.
-- 
Vik Fearing




Re: Why is hot_standby_feedback off by default?

From
Andres Freund
Date:
Hi,

On October 22, 2023 4:56:15 AM PDT, Vik Fearing <vik@postgresfriends.org> wrote:
>On 10/22/23 09:50, sirisha chamarthi wrote:
>> Is there any specific reason hot_standby_feedback default is set to off?
>
>
>Yes.  No one wants a rogue standby to ruin production.

Medium term, I think we need an approximate xid->"time of assignment" mapping that's continually maintained on the primary. One of the things that'd allow us to do is introduce a GUC to control the maximum effect of hs_feedback on the primary, in a useful unit. Numbers of xids are not a useful unit (100k xids is forever on some systems, a few minutes at best on others, the rate is not necessarily that steady when plpgsql exception handlers are used, ...)
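
As an illustration only (this is not code from the thread or from PostgreSQL), the mapping and the time-based limit could be sketched as below. The names XidTimeMap, oldest_xid_within and clamp_feedback_xmin are hypothetical, and the sketch assumes 64-bit xids that never wrap around and a monotonic clock.

```python
import bisect
from collections import deque


class XidTimeMap:
    """Approximate xid -> assignment-time map built from periodic samples."""

    def __init__(self, max_samples=1024):
        # Each sample is (xid, unix_time); old samples fall off the front.
        self.samples = deque(maxlen=max_samples)

    def record(self, xid, now):
        # Skip samples that do not advance the xid; they add no information.
        if self.samples and xid <= self.samples[-1][0]:
            return
        self.samples.append((xid, now))

    def oldest_xid_within(self, max_age_seconds, now):
        """Return the oldest sampled xid taken at or after now - max_age_seconds."""
        cutoff = now - max_age_seconds
        times = [t for _, t in self.samples]
        i = bisect.bisect_left(times, cutoff)
        if i == len(self.samples):
            return None  # no sample is recent enough
        return self.samples[i][0]


def clamp_feedback_xmin(feedback_xmin, xid_map, max_age_seconds, now):
    """Limit how far back a standby's reported xmin may hold the horizon."""
    floor = xid_map.oldest_xid_within(max_age_seconds, now)
    if floor is None:
        # This sketch leaves the value alone when it has no recent sample.
        return feedback_xmin
    return max(feedback_xmin, floor)
```

With samples taken once a minute, a limit of 90 seconds would let a standby hold back cleanup only as far as the oldest xid sampled in the last 90 seconds, regardless of how many xids were consumed in that time.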

It'd be useful to have such a mapping for other features too. E.g.

- making it visible in pg_stat_activity how problematic a long-running xact is - a 3 day old xact that doesn't have an xid assigned and has a recent xmin is fine, it won't prevent vacuum from doing things. But a somewhat recent xact that still has a snapshot from before an old xact was cancelled could be problematic.

- turn pg_class.relfrozenxid into an understandable timeframe. It's a fair bit of mental effort to classify "370M xids old" into problem/fine (it's e.g. not a problem on a system with a high xid rate, on a big table that takes a bit to vacuum).

- using the mapping to compute an xid consumption rate IMO would be one building block for smarter AV scheduling. Together with historical vacuum runtimes it'd allow us to start vacuuming early enough to prevent hitting thresholds, adapt pacing, prioritize between tables etc.
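
As a rough sketch of the last two points (again an illustration, not code from the thread or from PostgreSQL), a consumption rate derived from two (xid, time) samples is enough to turn an age counted in xids into an approximate duration. The function names are hypothetical, and the sketch assumes a steady rate between samples, which the thread notes is not always true.

```python
def xid_rate(samples):
    """Average xids consumed per second between the first and last sample."""
    (x0, t0), (x1, t1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (x1 - x0) / (t1 - t0)


def xid_age_seconds(age_in_xids, rate):
    """Convert an age counted in xids into approximate seconds."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return age_in_xids / rate


def seconds_until(threshold_age, current_age, rate):
    """Approximate time until current_age (in xids) reaches threshold_age."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return max(0.0, (threshold_age - current_age) / rate)
```

At 1000 xids per second, for example, "370M xids old" corresponds to roughly 370,000 seconds, a little over four days. A scheduler could compare seconds_until() for a table against its historical vacuum runtime to decide when to start.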

Greetings,

Andres

Re: Why is hot_standby_feedback off by default?

From
sirisha chamarthi
Date:
Hi Andres,

On Sun, Oct 22, 2023 at 12:08 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On October 22, 2023 4:56:15 AM PDT, Vik Fearing <vik@postgresfriends.org> wrote:
>> On 10/22/23 09:50, sirisha chamarthi wrote:
>>> Is there any specific reason hot_standby_feedback default is set to off?
>>
>>
>> Yes.  No one wants a rogue standby to ruin production.
>
> Medium term, I think we need an approximate xid->"time of assignment" mapping that's continually maintained on the primary. One of the things that'd allow us to do is introduce a GUC to control the maximum effect of hs_feedback on the primary, in a useful unit. Numbers of xids are not a useful unit (100k xids is forever on some systems, a few minutes at best on others, the rate is not necessarily that steady when plpgsql exception handlers are used, ...)

+1 on this idea. Please let me give this a try.
 
Thanks,
Sirisha

Re: Why is hot_standby_feedback off by default?

From
sirisha chamarthi
Date:
On Sun, Oct 22, 2023 at 4:56 AM Vik Fearing <vik@postgresfriends.org> wrote:
> On 10/22/23 09:50, sirisha chamarthi wrote:
>> Is there any specific reason hot_standby_feedback default is set to off?
>
>
> Yes.  No one wants a rogue standby to ruin production.

Agreed. I believe that any reasonable use of a standby server for queries requires hot_standby_feedback to be turned on. Otherwise, we can see query cancellations and increased replication lag caused by conflicts while replaying vacuum cleanup records on the standby, which means longer failover times if the server is configured for both disaster recovery and read scaling. The recently added logical decoding on standby also requires hot_standby_feedback to be turned on to avoid slot invalidation [1]. If there is no requirement to query the standby, admins can always set hot_standby to off. My goal here is to minimize the amount of configuration tuning required to use these features.

[1]: https://www.postgresql.org/docs/current/logicaldecoding-explanation.html


Thanks,
Sirisha

Re: Why is hot_standby_feedback off by default?

From
Vik Fearing
Date:
On 10/23/23 04:02, sirisha chamarthi wrote:
> On Sun, Oct 22, 2023 at 4:56 AM Vik Fearing <vik@postgresfriends.org> wrote:
> 
>> On 10/22/23 09:50, sirisha chamarthi wrote:
>>> Is there any specific reason hot_standby_feedback default is set to off?
>>
>>
>> Yes.  No one wants a rogue standby to ruin production.
>>
> 
> Agreed.


Okay...


> I believe that any reasonable use of a standby server for queries
> requires hot_standby_feedback to be turned on. Otherwise, we can
> potentially see query cancellations, increased replication lag because of
> conflicts (while replaying vacuum cleanup records) on standby (resulting in
> longer failover times if the server is configured for disaster recovery +
> read scaling). Recent logical decoding on standby as well requires
> hot_standby_feedback to be turned on to avoid slot invalidation [1]. If
> there is no requirement to query the standby, admins can always  set
> hot_standby to off. My goal here is to minimize the amount of configuration
> tuning required to use these features.
> 
> [1]:
> https://www.postgresql.org/docs/current/logicaldecoding-explanation.html


This does not sound like you agree.
-- 
Vik Fearing




Re: Why is hot_standby_feedback off by default?

From
Tom Lane
Date:
sirisha chamarthi <sirichamarthi22@gmail.com> writes:
> I believe that any reasonable use of a standby server for queries
> requires hot_standby_feedback to be turned on.

The fact that it's not the default should suggest to you that that's
not the majority opinion.

            regards, tom lane



Re: Why is hot_standby_feedback off by default?

From
Nathan Bossart
Date:
On Sun, Oct 22, 2023 at 12:07:59PM -0700, Andres Freund wrote:
> Medium term, I think we need an approximate xid->"time of assignment" mapping that's continually maintained on the primary. One of the things that'd allow us to do is introduce a GUC to control the maximum effect of hs_feedback on the primary, in a useful unit. Numbers of xids are not a useful unit (100k xids is forever on some systems, a few minutes at best on others, the rate is not necessarily that steady when plpgsql exception handlers are used, ...)
>
> It'd be useful to have such a mapping for other features too. E.g.
>
> - making it visible in pg_stat_activity how problematic a long-running xact is - a 3 day old xact that doesn't have an xid assigned and has a recent xmin is fine, it won't prevent vacuum from doing things. But a somewhat recent xact that still has a snapshot from before an old xact was cancelled could be problematic.
>
> - turn pg_class.relfrozenxid into an understandable timeframe. It's a fair bit of mental effort to classify "370M xids old" into problem/fine (it's e.g. not a problem on a system with a high xid rate, on a big table that takes a bit to vacuum).
>
> - using the mapping to compute an xid consumption rate IMO would be one building block for smarter AV scheduling. Together with historical vacuum runtimes it'd allow us to start vacuuming early enough to prevent hitting thresholds, adapt pacing, prioritize between tables etc.

Big +1 to all of this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Why is hot_standby_feedback off by default?

From
John Naylor
Date:
On Tue, Oct 24, 2023 at 3:42 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Sun, Oct 22, 2023 at 12:07:59PM -0700, Andres Freund wrote:
> > Medium term, I think we need an approximate xid->"time of assignment" mapping that's continually maintained on the primary. One of the things that'd allow us to do is introduce a GUC to control the maximum effect of hs_feedback on the primary, in a useful unit. Numbers of xids are not a useful unit (100k xids is forever on some systems, a few minutes at best on others, the rate is not necessarily that steady when plpgsql exception handlers are used, ...)
> >
> > It'd be useful to have such a mapping for other features too. E.g.
> >
> > - making it visible in pg_stat_activity how problematic a long-running xact is - a 3 day old xact that doesn't have an xid assigned and has a recent xmin is fine, it won't prevent vacuum from doing things. But a somewhat recent xact that still has a snapshot from before an old xact was cancelled could be problematic.
> >
> > - turn pg_class.relfrozenxid into an understandable timeframe. It's a fair bit of mental effort to classify "370M xids old" into problem/fine (it's e.g. not a problem on a system with a high xid rate, on a big table that takes a bit to vacuum).
> >
> > - using the mapping to compute an xid consumption rate IMO would be one building block for smarter AV scheduling. Together with historical vacuum runtimes it'd allow us to start vacuuming early enough to prevent hitting thresholds, adapt pacing, prioritize between tables etc.
>
> Big +1 to all of this.

Sounds like a TODO?



Re: Why is hot_standby_feedback off by default?

From
Andres Freund
Date:
On 2023-11-20 16:34:47 +0700, John Naylor wrote:
> Sounds like a TODO?

WFM. I don't personally use or update TODO, as I have my doubts about its
usefulness or state of maintenance. But please feel free to add this as a TODO
from my end...



Re: Why is hot_standby_feedback off by default?

From
John Naylor
Date:
On Tue, Nov 21, 2023 at 6:49 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-11-20 16:34:47 +0700, John Naylor wrote:
> > Sounds like a TODO?
>
> WFM. I don't personally use or update TODO, as I have my doubts about its
> usefulness or state of maintenance. But please feel free to add this as a TODO
> from my end...

Yeah, I was hoping to change that, but it's been a long row to hoe.
Anyway, the above idea was added under "administration".