Thread: The suggestion of reducing autovacuum_vacuum_cost_delay should be documented

The suggestion of reducing autovacuum_vacuum_cost_delay should be documented

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/runtime-config-autovacuum.html
Description:

The `autovacuum_vacuum_cost_delay` setting changes to 2ms in PostgreSQL 12
but in the old Postgresql version, the default setting is still 20ms. I
would suggest adding a suggestion in the old document
to lower the autovacuum_vacuum_cost_delay as:

> The default value of autovacuum_vacuum_cost_delay is reduced to 2ms in
PostgreSQL 12. Reducing the autovacuum_vacuum_cost_delay will make the
autovacuum more aggressive and might reduce the vacuum cost for
write-intensive workload on big table.

Detail:

I run the PostgreSQL 11 on a table with 1 million record under a
write-instensive workload by sysbench. The result shows that when the
autovacuum_vacuum_cost_delay is 20ms, the automatic vacuum takes 15 seconds.
If I reduce the autovacuum_vacuum_cost_delay to 2ms, the automatic vacuum
just uses 3 seconds. 

When autovacuum_vacuum_cost_delay = 20ms
LOG:  automatic vacuum of table "test.public.sbtest1": index scans: 1
        pages: 0 removed, 27172 remain, 0 skipped due to pins, 0 skipped
frozen
        tuples: 2787 removed, 993634 remain, 387 are dead but not yet
removable, oldest xmin: 1358042
        buffer usage: 21557 hits, 9160 misses, 1262 dirtied
        avg read rate: 4.783 MB/s, avg write rate: 0.659 MB/s
        system usage: CPU: user: 1.21 s, system: 0.04 s, elapsed: 14.96 s

When autovacuum_vacuum_cost_delay = 2ms
LOG:  automatic vacuum of table "test.public.sbtest1": index scans: 1
        pages: 0 removed, 27172 remain, 0 skipped due to pins, 0 skipped
frozen
        tuples: 2817 removed, 994303 remain, 85 are dead but not yet
removable, oldest xmin: 1329367
        buffer usage: 23129 hits, 6987 misses, 3873 dirtied
        avg read rate: 20.532 MB/s, avg write rate: 11.381 MB/s
        system usage: CPU: user: 0.92 s, system: 0.02 s, elapsed: 2.65 s

On Tue, May 12, 2020 at 10:26:27AM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/11/runtime-config-autovacuum.html
> Description:
> 
> The `autovacuum_vacuum_cost_delay` setting changes to 2ms in PostgreSQL 12
> but in the old Postgresql version, the default setting is still 20ms. I
> would suggest adding a suggestion in the old document
> to lower the autovacuum_vacuum_cost_delay as:
> 
> > The default value of autovacuum_vacuum_cost_delay is reduced to 2ms in
> PostgreSQL 12. Reducing the autovacuum_vacuum_cost_delay will make the
> autovacuum more aggressive and might reduce the vacuum cost for
> write-intensive workload on big table.

Uh, we usually don't suggest new defaults in back branches.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On Tue, May 12, 2020 at 05:40:14PM -0400, Bruce Momjian wrote:
> On Tue, May 12, 2020 at 10:26:27AM +0000, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> > 
> > Page: https://www.postgresql.org/docs/11/runtime-config-autovacuum.html
> > Description:
> > 
> > The `autovacuum_vacuum_cost_delay` setting changes to 2ms in PostgreSQL 12
> > but in the old Postgresql version, the default setting is still 20ms. I
> > would suggest adding a suggestion in the old document
> > to lower the autovacuum_vacuum_cost_delay as:
> > 
> > > The default value of autovacuum_vacuum_cost_delay is reduced to 2ms in
> > PostgreSQL 12. Reducing the autovacuum_vacuum_cost_delay will make the
> > autovacuum more aggressive and might reduce the vacuum cost for
> > write-intensive workload on big table.
> 
> Uh, we usually don't suggest new defaults in back branches.

Basically, what I am saying is that if you want this, it would be a new
behavior that would need general discussion.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: The suggestion of reducing autovacuum_vacuum_cost_delay should be documented

From
"David G. Johnston"
Date:
On Tue, May 12, 2020 at 2:41 PM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, May 12, 2020 at 05:40:14PM -0400, Bruce Momjian wrote:
> On Tue, May 12, 2020 at 10:26:27AM +0000, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> >
> > Page: https://www.postgresql.org/docs/11/runtime-config-autovacuum.html
> > Description:
> >
> > The `autovacuum_vacuum_cost_delay` setting changes to 2ms in PostgreSQL 12
> > but in the old Postgresql version, the default setting is still 20ms. I
> > would suggest adding a suggestion in the old document
> > to lower the autovacuum_vacuum_cost_delay as:
> >
> > > The default value of autovacuum_vacuum_cost_delay is reduced to 2ms in
> > PostgreSQL 12. Reducing the autovacuum_vacuum_cost_delay will make the
> > autovacuum more aggressive and might reduce the vacuum cost for
> > write-intensive workload on big table.
>
> Uh, we usually don't suggest new defaults in back branches.

Basically, what I am saying is that if you want this, it would be a new
behavior that would need general discussion.


The proposal is to document in versions 9.4 to 11 that the recommended value for the setting is 2ms while for reasons of continuity the default in these versions is 20ms.

I don't really see any harm in it.  Its not like the choice to reduce the value was made because of new features introduced in 12 - it was a re-evaluation of a 15 year old default.

David J.

On Tue, May 12, 2020 at 03:38:16PM -0700, David G. Johnston wrote:
> On Tue, May 12, 2020 at 2:41 PM Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Tue, May 12, 2020 at 05:40:14PM -0400, Bruce Momjian wrote:
>     > On Tue, May 12, 2020 at 10:26:27AM +0000, PG Doc comments form wrote:
>     > > The following documentation comment has been logged on the website:
>     > >
>     > > Page: https://www.postgresql.org/docs/11/runtime-config-autovacuum.html
>     > > Description:
>     > >
>     > > The `autovacuum_vacuum_cost_delay` setting changes to 2ms in PostgreSQL
>     12
>     > > but in the old Postgresql version, the default setting is still 20ms. I
>     > > would suggest adding a suggestion in the old document
>     > > to lower the autovacuum_vacuum_cost_delay as:
>     > >
>     > > > The default value of autovacuum_vacuum_cost_delay is reduced to 2ms
>     in
>     > > PostgreSQL 12. Reducing the autovacuum_vacuum_cost_delay will make the
>     > > autovacuum more aggressive and might reduce the vacuum cost for
>     > > write-intensive workload on big table.
>     >
>     > Uh, we usually don't suggest new defaults in back branches.
> 
>     Basically, what I am saying is that if you want this, it would be a new
>     behavior that would need general discussion.
> 
> 
> 
> The proposal is to document in versions 9.4 to 11 that the recommended value
> for the setting is 2ms while for reasons of continuity the default in these
> versions is 20ms.
> 
> I don't really see any harm in it.  Its not like the choice to reduce the value
> was made because of new features introduced in 12 - it was a re-evaluation of a
> 15 year old default.

Well, we really need to have some general discussion about whether
changing defaults in major releases should trigger a mention to change
the defaults in back branches.  This is something that would have to be
discussed on the hackers list.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Bruce Momjian <bruce@momjian.us> writes:
> On Tue, May 12, 2020 at 03:38:16PM -0700, David G. Johnston wrote:
>> The proposal is to document in versions 9.4 to 11 that the recommended value
>> for the setting is 2ms while for reasons of continuity the default in these
>> versions is 20ms.
>> I don't really see any harm in it.  Its not like the choice to reduce the value
>> was made because of new features introduced in 12 - it was a re-evaluation of a
>> 15 year old default.

> Well, we really need to have some general discussion about whether
> changing defaults in major releases should trigger a mention to change
> the defaults in back branches.  This is something that would have to be
> discussed on the hackers list.

It's not immediately obvious that the new default value established in
version N is appropriate for version N-minus-several.  Certainly, whatever
testing was done to justify the new default wouldn't have been done on old
versions; and there might have been relevant changes.

In short: nope, I'm not on board with blindly back-patching such
recommendations.

            regards, tom lane



On Tue, May 12, 2020 at 07:26:07PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Tue, May 12, 2020 at 03:38:16PM -0700, David G. Johnston wrote:
> >> The proposal is to document in versions 9.4 to 11 that the recommended value
> >> for the setting is 2ms while for reasons of continuity the default in these
> >> versions is 20ms.
> >> I don't really see any harm in it.  Its not like the choice to reduce the value
> >> was made because of new features introduced in 12 - it was a re-evaluation of a
> >> 15 year old default.
> 
> > Well, we really need to have some general discussion about whether
> > changing defaults in major releases should trigger a mention to change
> > the defaults in back branches.  This is something that would have to be
> > discussed on the hackers list.
> 
> It's not immediately obvious that the new default value established in
> version N is appropriate for version N-minus-several.  Certainly, whatever
> testing was done to justify the new default wouldn't have been done on old
> versions; and there might have been relevant changes.
> 
> In short: nope, I'm not on board with blindly back-patching such
> recommendations.

That was my analysis too.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: The suggestion of reducing autovacuum_vacuum_cost_delay should be documented

From
"David G. Johnston"
Date:
On Tue, May 12, 2020 at 4:30 PM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, May 12, 2020 at 07:26:07PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Tue, May 12, 2020 at 03:38:16PM -0700, David G. Johnston wrote:
> >> The proposal is to document in versions 9.4 to 11 that the recommended value
> >> for the setting is 2ms while for reasons of continuity the default in these
> >> versions is 20ms.
> >> I don't really see any harm in it.  Its not like the choice to reduce the value
> >> was made because of new features introduced in 12 - it was a re-evaluation of a
> >> 15 year old default.
>
> > Well, we really need to have some general discussion about whether
> > changing defaults in major releases should trigger a mention to change
> > the defaults in back branches.  This is something that would have to be
> > discussed on the hackers list.
>
> It's not immediately obvious that the new default value established in
> version N is appropriate for version N-minus-several.  Certainly, whatever
> testing was done to justify the new default wouldn't have been done on old
> versions; and there might have been relevant changes.
>
> In short: nope, I'm not on board with blindly back-patching such
> recommendations.

That was my analysis too.

To be clear, because my cursory reading of the thread that was linked from the commit suggested that this specific situation was more "lets catch up to modern times", my position isn't that such documentation changes should be done as a rule, I am suggesting that we give a yes/no decision on this specific change (in advance of bike-shedding the wording).  IMO neither a blanket rule allowing or prohibiting such a change to the documentation makes sense given the rarity of the event.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> To be clear, because my cursory reading of the thread that was linked from
> the commit suggested that this specific situation was more "lets catch up
> to modern times", my position isn't that such documentation changes should
> be done as a rule, I am suggesting that we give a yes/no decision on this
> specific change (in advance of bike-shedding the wording).  IMO neither a
> blanket rule allowing or prohibiting such a change to the documentation
> makes sense given the rarity of the event.

Sure.  My point was just that changing the back-branch documentation would
require doing additional testing to verify that the proposed value is
an improvement in those branches.

            regards, tom lane



Re: The suggestion of reducing autovacuum_vacuum_cost_delay should be documented

From
"David G. Johnston"
Date:
On Tue, May 12, 2020 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> To be clear, because my cursory reading of the thread that was linked from
> the commit suggested that this specific situation was more "lets catch up
> to modern times", my position isn't that such documentation changes should
> be done as a rule, I am suggesting that we give a yes/no decision on this
> specific change (in advance of bike-shedding the wording).  IMO neither a
> blanket rule allowing or prohibiting such a change to the documentation
> makes sense given the rarity of the event.

Sure.  My point was just that changing the back-branch documentation would
require doing additional testing to verify that the proposed value is
an improvement in those branches.


Yeah, details of any testing during v12 didn't make it into the thread, the main conclusion that did was:

"""
However, the current settings are predicated on the assumption that
you can't get the kernel to give you a sleep of less than circa 10ms.
That assumption is way outdated, I believe; poking around on systems
I have here, the minimum delay time using pg_usleep(1) seems to be
generally less than 100us, and frequently less than 10us, on anything
released in the last decade.
"""

Hence assuming this was more of a "modernization".  v12 has been out for a year (yeah, not a super long time, I know) and I'm not recalling any complaints about the new default.  I don't have a feel, though, for what informal or other measurements that all you hackers do to get a feel for how safe and useful the change from 20ms to 2ms was.

David's original post:

"""
I've had to do quite a bit of performance investigation work this year
and it seems that I only too often discover that the same problem is
repeating itself...
"""

I'd be inclined to assume many of those investigations were not on v12 which lends itself to at least wanting to give admins of older versions a heads-up in the documentation.  There seems to be consensus that this is a win - and people have a couple of months now to demonstrate that it isn't on older versions. (IOW, I'm not going to prove it one way or another but everything that went into having this committed in v12 leads me to at least tell people of older versions we changed things in v12 for reasons - even if we cover our rear ends and say we simply didn't test older versions and mileage may vary).

David J.


Thank you all for your fast response. I agree that back patching the default change in general cases should be probably discussed in the hacker mailing list to establish good practice. However, as a user, I would typically just refer to the documentation of the specific version v11 in this case instead of the latest v12. If I follow the v11 doc, I would not be aware of the performance caveat. Since v11 is still a relatively recent release and my testing demonstrates the same significant perf difference exists in v11, I think it might be worth adding the note in the doc or wiki to let the people of older versions know that the default value is changing for catching up modern times.

On Wed, May 13, 2020 at 8:17 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, May 12, 2020 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> To be clear, because my cursory reading of the thread that was linked from
> the commit suggested that this specific situation was more "lets catch up
> to modern times", my position isn't that such documentation changes should
> be done as a rule, I am suggesting that we give a yes/no decision on this
> specific change (in advance of bike-shedding the wording).  IMO neither a
> blanket rule allowing or prohibiting such a change to the documentation
> makes sense given the rarity of the event.

Sure.  My point was just that changing the back-branch documentation would
require doing additional testing to verify that the proposed value is
an improvement in those branches.


Yeah, details of any testing during v12 didn't make it into the thread, the main conclusion that did was:

"""
However, the current settings are predicated on the assumption that
you can't get the kernel to give you a sleep of less than circa 10ms.
That assumption is way outdated, I believe; poking around on systems
I have here, the minimum delay time using pg_usleep(1) seems to be
generally less than 100us, and frequently less than 10us, on anything
released in the last decade.
"""

Hence assuming this was more of a "modernization".  v12 has been out for a year (yeah, not a super long time, I know) and I'm not recalling any complaints about the new default.  I don't have a feel, though, for what informal or other measurements that all you hackers do to get a feel for how safe and useful the change from 20ms to 2ms was.

David's original post:

"""
I've had to do quite a bit of performance investigation work this year
and it seems that I only too often discover that the same problem is
repeating itself...
"""

I'd be inclined to assume many of those investigations were not on v12 which lends itself to at least wanting to give admins of older versions a heads-up in the documentation.  There seems to be consensus that this is a win - and people have a couple of months now to demonstrate that it isn't on older versions. (IOW, I'm not going to prove it one way or another but everything that went into having this committed in v12 leads me to at least tell people of older versions we changed things in v12 for reasons - even if we cover our rear ends and say we simply didn't test older versions and mileage may vary).

David J.