Thread: allow changing autovacuum_max_workers without restarting

allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
I frequently hear about scenarios where users with thousands upon thousands
of tables realize that autovacuum is struggling to keep up.  When they
inevitably go to bump up autovacuum_max_workers, they discover that it
requires a server restart (i.e., downtime) to take effect, causing further
frustration.  For this reason, I think $SUBJECT is a desirable improvement.
I spent some time looking for past discussions about this, and I was
surprised to not find any, so I thought I'd give it a try.

The attached proof-of-concept patch demonstrates what I have in mind.
Instead of trying to dynamically change the global process table, etc., I'm
proposing that we introduce a new GUC that sets the effective maximum
number of autovacuum workers that can be started at any time.  This means
there would be two GUCs for the number of autovacuum workers: one for the
number of slots reserved for autovacuum workers, and another that restricts
the number of those slots that can be used.  The former would continue to
require a restart to change its value, and users would typically want to
set it relatively high.  The latter could be changed at any time and would
allow for raising or lowering the maximum number of active autovacuum
workers, up to the limit set by the other parameter.

The proof-of-concept patch keeps autovacuum_max_workers as the maximum
number of slots to reserve for workers, but I think we should instead
rename this parameter to something else and then reintroduce
autovacuum_max_workers as the new parameter that can be adjusted without
restarting.  That way, autovacuum_max_workers continues to work much the
same way as in previous versions.

There are a couple of weird cases with this approach.  One is when the
restart-only limit is set lower than the PGC_SIGHUP limit.  In that case, I
think we should just use the restart-only limit.  The other is when there
are already N active autovacuum workers and the PGC_SIGHUP parameter is
changed to something less than N.  For that case, I think we should just
block starting additional workers until the number of workers drops below
the new parameter's value.  I don't think we should kill existing workers,
or anything else like that.
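Both cases boil down to a clamp plus a gate in the launcher's start-a-worker path. A minimal standalone sketch of that logic (names like av_max_worker_slots are hypothetical; this is not the actual patch code):

```c
#include <stdbool.h>

/* Like the server's Min() macro. */
#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Effective cap on active autovacuum workers: the PGC_SIGHUP setting
 * silently clamped to the restart-only number of reserved slots.
 */
static inline int
av_effective_max_workers(int av_max_worker_slots, int autovacuum_max_workers)
{
    return Min(av_max_worker_slots, autovacuum_max_workers);
}

/*
 * The launcher starts a new worker only while the number of active
 * workers is below the effective cap.  If the cap is lowered below the
 * current number of active workers, nothing is killed; we simply refuse
 * to start more until attrition brings the count back under the cap.
 */
static inline bool
av_can_start_worker(int active_workers, int av_max_worker_slots,
                    int autovacuum_max_workers)
{
    return active_workers <
        av_effective_max_workers(av_max_worker_slots, autovacuum_max_workers);
}
```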

TBH I've been sitting on this idea for a while now, only because I think it
has a slim chance of acceptance, but IMHO this is a simple change that
could help many users.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> I frequently hear about scenarios where users with thousands upon thousands
> of tables realize that autovacuum is struggling to keep up.  When they
> inevitably go to bump up autovacuum_max_workers, they discover that it
> requires a server restart (i.e., downtime) to take effect, causing further
> frustration.  For this reason, I think $SUBJECT is a desirable improvement.
> I spent some time looking for past discussions about this, and I was
> surprised to not find any, so I thought I'd give it a try.

I did not review the patch in detail yet, but +1 to the idea. 
It's not just thousands of tables that suffer from this.
If a user has a few large tables hogging the autovac workers, then other
tables don't get the autovac cycles they require. Users are then forced
to run manual vacuums, which adds complexity to their operations.

> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.

max_worker_processes defines a pool capping the number of background workers allowed.
Parallel workers and extensions that spin up background workers all utilize from
this pool.

Should autovacuum_max_workers be able to utilize from max_worker_processes also?

This will allow autovacuum_max_workers to be dynamic while the user only has
to deal with an already existing GUC. We may want to increase the default value
for max_worker_processes as part of this.


Regards,

Sami
Amazon Web Services (AWS)


Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Thu, Apr 11, 2024 at 02:24:18PM +0000, Imseih (AWS), Sami wrote:
> max_worker_processes defines a pool of max # of background workers allowed.
> parallel workers and extensions that spin  up background workers all utilize from 
> this pool. 
> 
> Should autovacuum_max_workers be able to utilize from max_worker_processes also?
> 
> This will allow autovacuum_max_workers to be dynamic while the user only has
> to deal with an already existing GUC. We may want to increase the default value
> for max_worker_processes as part of this.

My concern with this approach is that other background workers could use up
all the slots and prevent autovacuum workers from starting, unless of
course we reserve autovacuum_max_workers slots for _only_ autovacuum
workers.  I'm not sure if we want to get these parameters tangled up like
this, though...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Thu, Apr 11, 2024 at 09:42:40AM -0500, Nathan Bossart wrote:
> On Thu, Apr 11, 2024 at 02:24:18PM +0000, Imseih (AWS), Sami wrote:
>> max_worker_processes defines a pool of max # of background workers allowed.
>> parallel workers and extensions that spin  up background workers all utilize from 
>> this pool. 
>> 
>> Should autovacuum_max_workers be able to utilize from max_worker_processes also?
>> 
>> This will allow autovacuum_max_workers to be dynamic while the user only has
>> to deal with an already existing GUC. We may want to increase the default value
>> for max_worker_processes as part of this.
> 
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting, unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers.  I'm not sure if we want to get these parameters tangled up like
> this, though...

I see that the logical replication launcher process uses this pool, but we
take special care to make sure it gets a slot:

    /*
     * Register the apply launcher.  It's probably a good idea to call this
     * before any modules had a chance to take the background worker slots.
     */
    ApplyLauncherRegister();

I'm not sure there's another way to effectively reserve slots that would
work for the autovacuum workers (which need to restart to connect to
different databases), so that would need to be invented.  We'd probably
also want to fail startup if autovacuum_max_workers > max_worker_processes,
which seems like it has the potential to cause problems when folks first
upgrade to v18.

Furthermore, we might have to convert autovacuum workers to background
worker processes for this to work.  I've admittedly wondered about whether
we should do that eventually, anyway, but it'd expand the scope of this
work quite a bit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting

That's a good point; the current settings do not guarantee that you
get a worker when none are available. For example, with
max_parallel_workers_per_gather, you may have 2 workers planned
and 0 launched.

> unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers. I'm not sure if we want to get these parameters tangled up like
> this, though...

This will be confusing to describe and we will be reserving autovac workers
implicitly, rather than explicitly with a new GUC.

Regards,

Sami  




Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Thu, Apr 11, 2024 at 03:37:23PM +0000, Imseih (AWS), Sami wrote:
>> My concern with this approach is that other background workers could use up
>> all the slots and prevent autovacuum workers from starting
> 
> That's a good point; the current settings do not guarantee that you
> get a worker when none are available. For example, with
> max_parallel_workers_per_gather, you may have 2 workers planned
> and 0 launched.
> 
>> unless of
>> course we reserve autovacuum_max_workers slots for _only_ autovacuum
>> workers. I'm not sure if we want to get these parameters tangled up like
>> this, though...
> 
> This will be confusing to describe and we will be reserving autovac workers
> implicitly, rather than explicitly with a new GUC.

Yeah, that's probably a good reason to give autovacuum its own worker pool.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
I spent some time reviewing/testing the POC. It is relatively simple, with a lot
of obvious value.

I tested with 16 tables that constantly reach the autovacuum threshold, and the
patch did the right thing. I observed concurrent autovacuum workers matching
the setting as I was adjusting it dynamically.

As you mention above, if there are more autovacuums in progress than a newly
applied lower setting allows, we should not take any special action on those
autovacuums, and eventually the number of autovacuum workers will match the
setting.

I also tested by allowing user connections to reach max_connections, and observed
the expected number of autovacuums spinning up and adjusting correctly.

Having autovacuum tests (unless I missed something) like the above would be a good
general improvement, but it should not be tied to this patch.

A few comments on the POC patch:

1/ We should emit a log when autovacuum_max_workers is set higher than the max.

2/ should the name of the restart limit be "reserved_autovacuum_workers"?

Regards,

Sami Imseih
AWS (Amazon Web Services)




Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Fri, Apr 12, 2024 at 05:27:40PM +0000, Imseih (AWS), Sami wrote:
> A few comments on  the POC patch:

Thanks for reviewing.

> 1/ We should emit a log when autovacuum_max_workers is set higher than the max.

Hm.  Maybe the autovacuum launcher could do that.

> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?

That's kind-of what I had in mind, although I think we might want to avoid
the word "reserved" because it sounds a bit like reserved_connections and
superuser_reserved_connections.  "autovacuum_max_slots" or
"autovacuum_max_worker_slots" might be worth considering, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
>> 1/ We should emit a log when autovacuum_max_workers is set higher than the max.
>
> Hm.  Maybe the autovacuum launcher could do that.

Would it be better to use a GUC check_hook that compares the
new value with the max allowed value and emits a WARNING?

autovacuum_max_workers already has a check_autovacuum_max_workers
check_hook, which can be repurposed for this.

In the POC patch, this check_hook is kept as-is, which will no longer make sense.
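For concreteness, a standalone simplification of the shape such a hook might take. The real server hook has signature `bool (*)(int *newval, void **extra, GucSource source)` and uses GUC_check_* reporting; everything below (names, globals, fprintf) is a hypothetical stand-in for illustration only:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical restart-only cap, standing in for the reserved-slots GUC. */
static int av_max_worker_slots = 32;

/*
 * Simplified stand-in for check_autovacuum_max_workers: warn when the new
 * value exceeds the restart-only cap, but accept it anyway, since the
 * value is clamped at use time rather than rejected.
 */
static bool
check_autovacuum_max_workers(int *newval)
{
    if (*newval > av_max_worker_slots)
        fprintf(stderr,
                "WARNING: autovacuum_max_workers (%d) exceeds "
                "av_max_worker_slots (%d); the lower value will be used\n",
                *newval, av_max_worker_slots);

    return true;
}
```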

>> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?
>
> That's kind-of what I had in mind, although I think we might want to avoid
> the word "reserved" because it sounds a bit like reserved_connections
> and superuser_reserved_connections

Yes, I agree. This can be confusing.

> "autovacuum_max_slots" or
> "autovacuum_max_worker_slots" might be worth considering, too.

"autovacuum_max_worker_slots" is probably the best option because
we should have "worker" in the name of the GUC.


Regards,

Sami


Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Fri, Apr 12, 2024 at 10:17:44PM +0000, Imseih (AWS), Sami wrote:
>>> Hm. Maybe the autovacuum launcher could do that.
> 
> Would it be better to use a GUC check_hook that compares the
> new value with the max allowed value and emits a WARNING?
> 
> autovacuum_max_workers already has a check_autovacuum_max_workers
> check_hook, which can be repurposed for this.
> 
> In the POC patch, this check_hook is kept as-is, which will no longer make sense.

IIRC using GUC hooks to handle dependencies like this is generally frowned
upon because it tends to not work very well [0].  We could probably get it
to work for this particular case, but IMHO we should still try to avoid
this approach.  I didn't find any similar warnings for other GUCs like
max_parallel_workers_per_gather, so it might not be crucial to emit a
WARNING here.

[0] https://postgr.es/m/27574.1581015893%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> IIRC using GUC hooks to handle dependencies like this is generally frowned
> upon because it tends to not work very well [0]. We could probably get it
> to work for this particular case, but IMHO we should still try to avoid
> this approach. 

Thanks for pointing this out. I agree, this could lead to spurious logs being
emitted.

> so it might not be crucial to emit a
> WARNING here.

As mentioned earlier in the thread, we can let the autovacuum launcher emit the
log, but it will need to be careful not to flood the logs when this condition
persists (i.e., log only the first time the condition is detected, or log every
once in a while).

The additional complexity is not worth it.

Regards,

Sami





Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
Here is a first attempt at a proper patch set based on the discussion thus
far.  I've split it up into several small patches for ease of review, which
is probably a bit excessive.  If this ever makes it to commit, they could
likely be combined.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Justin Pryzby
Date:
On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.  This means
> there would be two GUCs for the number of autovacuum workers: one for the
> number of slots reserved for autovacuum workers, and another that restricts
> the number of those slots that can be used.  The former would continue to
> require a restart to change its value, and users would typically want to
> set it relatively high.  The latter could be changed at any time and would
> allow for raising or lowering the maximum number of active autovacuum
> workers, up to the limit set by the other parameter.
> 
> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
> number of slots to reserve for workers, but I think we should instead
> rename this parameter to something else and then reintroduce
> autovacuum_max_workers as the new parameter that can be adjusted without
> restarting.  That way, autovacuum_max_workers continues to work much the
> same way as in previous versions.

When I thought about this, I considered proposing to add a new GUC for
"autovacuum_policy_workers".

autovacuum_max_workers would be the same as before, requiring a restart
to change.  The policy GUC would be the soft limit, changeable at runtime
up to the hard limit of autovacuum_max_workers (or maybe any policy
value exceeding autovacuum_max_workers would be ignored).

We'd probably change autovacuum_max_workers to default to a higher value
(8, or 32 as in your patch), and have autovacuum_policy_workers default to
3, for consistency with historic behavior.  Maybe
autovacuum_policy_workers=-1 would mean to use all workers.

There's the existing idea to change autovacuum thresholds during the
busy period of the day vs. off hours.  This would allow something
similar with nworkers rather than thresholds: if the goal were to reduce
the resource use of vacuum, the admin could set max_workers=8, with
policy_workers=2 during the busy period.
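The soft/hard interplay described here, including the -1 sentinel, can be sketched as a standalone function (hypothetical illustration of the proposal, not patch code):

```c
#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Sketch of the proposed scheme: autovacuum_max_workers is the
 * restart-only hard limit; autovacuum_policy_workers is the runtime soft
 * limit, where -1 means "use all workers".  A policy value exceeding the
 * hard limit is ignored rather than honored.
 */
static inline int
av_policy_limit(int autovacuum_max_workers, int autovacuum_policy_workers)
{
    if (autovacuum_policy_workers < 0)  /* -1: no soft limit */
        return autovacuum_max_workers;
    return Min(autovacuum_max_workers, autovacuum_policy_workers);
}
```

So with max_workers=8 and policy_workers=2 during the busy period, at most 2 workers run until the policy is raised again.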

-- 
Justin



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Mon, Apr 15, 2024 at 08:33:33AM -0500, Justin Pryzby wrote:
> On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
>> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
>> number of slots to reserve for workers, but I think we should instead
>> rename this parameter to something else and then reintroduce
>> autovacuum_max_workers as the new parameter that can be adjusted without
>> restarting.  That way, autovacuum_max_workers continues to work much the
>> same way as in previous versions.
> 
> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".
> 
> autovacuum_max_workers would be the same as before, requiring a restart
> to change.  The policy GUC would be the soft limit, changeable at runtime
> up to the hard limit of autovacuum_max_workers (or maybe any policy
> value exceeding autovacuum_max_workers would be ignored).
> 
> We'd probably change autovacuum_max_workers to default to a higher value
> (8, or 32 as in your patch), and have autovacuum_policy_workers default to
> 3, for consistency with historic behavior.  Maybe
> autovacuum_policy_workers=-1 would mean to use all workers.

This sounds like roughly the same idea, although it is backwards from what
I'm proposing in the v1 patch set.  My thinking is that by making a new
restart-only GUC that would by default be set higher than the vast majority
of systems should ever need, we could simplify migrating to these
parameters.  The autovacuum_max_workers parameter would effectively retain
its original meaning, and existing settings would continue to work
normally on v18, but users could now adjust it without restarting.  If we
did it the other way, users would need to bump up autovacuum_max_workers
and restart prior to being able to raise autovacuum_policy_workers beyond
what they previously had set for autovacuum_max_workers.  That being said,
I'm open to doing it this way if folks prefer this approach, as I think it
is still an improvement.

> There's the existing idea to change autovacuum thresholds during the
> busy period of the day vs. off hours.  This would allow something
> similar with nworkers rather than thresholds: if the goal were to reduce
> the resource use of vacuum, the admin could set max_workers=8, with
> policy_workers=2 during the busy period.

Precisely.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Mon, Apr 15, 2024 at 11:28:33AM -0500, Nathan Bossart wrote:
> On Mon, Apr 15, 2024 at 08:33:33AM -0500, Justin Pryzby wrote:
>> On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
>>> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
>>> number of slots to reserve for workers, but I think we should instead
>>> rename this parameter to something else and then reintroduce
>>> autovacuum_max_workers as the new parameter that can be adjusted without
>>> restarting.  That way, autovacuum_max_workers continues to work much the
>>> same way as in previous versions.
>> 
>> When I thought about this, I considered proposing to add a new GUC for
>> "autovacuum_policy_workers".
>> 
>> autovacuum_max_workers would be the same as before, requiring a restart
>> to change.  The policy GUC would be the soft limit, changeable at runtime
>> up to the hard limit of autovacuum_max_workers (or maybe any policy
>> value exceeding autovacuum_max_workers would be ignored).
>> 
>> We'd probably change autovacuum_max_workers to default to a higher value
>> (8, or 32 as in your patch), and have autovacuum_policy_workers default to
>> 3, for consistency with historic behavior.  Maybe
>> autovacuum_policy_workers=-1 would mean to use all workers.
> 
> This sounds like roughly the same idea, although it is backwards from what
> I'm proposing in the v1 patch set.  My thinking is that by making a new
> restart-only GUC that would by default be set higher than the vast majority
> of systems should ever need, we could simplify migrating to these
> parameters.  The autovacuum_max_workers parameter would effectively retain
> its original meaning, and existing settings would continue to work
> normally on v18, but users could now adjust it without restarting.  If we
> did it the other way, users would need to bump up autovacuum_max_workers
> and restart prior to being able to raise autovacuum_policy_workers beyond
> what they previously had set for autovacuum_max_workers.  That being said,
> I'm open to doing it this way if folks prefer this approach, as I think it
> is still an improvement.

Another option could be to just remove the restart-only GUC and hard-code
the upper limit of autovacuum_max_workers to 64 or 128 or something.  While
that would simplify matters, I suspect it would be hard to choose an
appropriate limit that won't quickly become outdated.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> Another option could be to just remove the restart-only GUC and hard-code
> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
> that would simplify matters, I suspect it would be hard to choose an
> appropriate limit that won't quickly become outdated.

Hardcoded values are usually hard to deal with because they are hidden either
in code or in docs.

> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".

> autovacuum_max_workers would be the same as before, requiring a restart
> to change.  The policy GUC would be the soft limit, changeable at runtime

I think autovacuum_max_workers should still be the GUC that controls
the number of concurrent autovacuums. This parameter is already well 
established and changing the meaning now will be confusing. 

I suspect most users will be glad it's now dynamic, but will probably 
be annoyed if it's no longer doing what it's supposed to.

Regards,

Sami 


Re: allow changing autovacuum_max_workers without restarting

From
wenhui qiu
Date:
Agree, +1. From a DBA perspective, I would prefer that this parameter can be
dynamically modified rather than adding a new parameter. What is more difficult
is how to smoothly reach the target value when the setting is considered too
large and needs to be lowered.


Regards 

On Tue, 16 Apr 2024 at 01:41, Imseih (AWS), Sami <simseih@amazon.com> wrote:
>> Another option could be to just remove the restart-only GUC and hard-code
>> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
>> that would simplify matters, I suspect it would be hard to choose an
>> appropriate limit that won't quickly become outdated.
>
> Hardcoded values are usually hard to deal with because they are hidden either
> in code or in docs.
>
>> When I thought about this, I considered proposing to add a new GUC for
>> "autovacuum_policy_workers".
>>
>> autovacuum_max_workers would be the same as before, requiring a restart
>> to change.  The policy GUC would be the soft limit, changeable at runtime
>
> I think autovacuum_max_workers should still be the GUC that controls
> the number of concurrent autovacuums. This parameter is already well
> established and changing the meaning now will be confusing.
>
> I suspect most users will be glad it's now dynamic, but will probably
> be annoyed if it's no longer doing what it's supposed to.
>
> Regards,
>
> Sami

Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> Here is a first attempt at a proper patch set based on the discussion thus
> far. I've split it up into several small patches for ease of review, which
> is probably a bit excessive. If this ever makes it to commit, they could
> likely be combined.

I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
that the autovacuum_cost_limit balance adjusts correctly when the
autovacuum_max_workers value increases/decreases. I did not think the
patch would break this behavior, but it's important to verify this.

Some comments on the patch:

1. A nit. There should be a tab here.

-       dlist_head      av_freeWorkers;
+       dclist_head av_freeWorkers;

2. autovacuum_max_worker_slots documentation:

+       <para>
+        Note that the value of <xref linkend="guc-autovacuum-max-workers"/> is
+        silently capped to this value.
+       </para>

This comment looks redundant in the docs, since the entry 
for autovacuum_max_workers that follows mentions the
same.


3. The docs for autovacuum_max_workers should mention that when
the value changes, users should consider adjusting autovacuum_cost_limit/cost_delay.

This is not something new. Even in the current state, users should think about
these settings. However, it seems even more important if this value is to be
dynamically adjusted.
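The reason these settings interact: the cost limit acts as a shared budget that is rebalanced across running workers, so raising the worker count shrinks each worker's share. A deliberately simplified standalone model of that balancing (the server's real algorithm also honors per-table cost settings, which this sketch ignores):

```c
/*
 * Simplified model of autovacuum cost balancing: the configured
 * autovacuum_vacuum_cost_limit is split evenly among the workers that
 * are currently running, so each worker's I/O budget shrinks as more
 * workers start, keeping total autovacuum I/O roughly constant.
 */
static inline int
av_worker_cost_limit(int autovacuum_vacuum_cost_limit, int active_workers)
{
    if (active_workers <= 0)
        return autovacuum_vacuum_cost_limit;
    return autovacuum_vacuum_cost_limit / active_workers;
}
```

Under this model, bumping the worker count without also raising the cost limit makes each worker slower, which is why the two settings should be considered together.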


Regards,

Sami







Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Thu, Apr 18, 2024 at 05:05:03AM +0000, Imseih (AWS), Sami wrote:
> I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
> that the autovacuum_cost_limit balance adjusts correctly when the
> autovacuum_max_workers value increases/decreases. I did not think the
> patch would break this behavior, but it's important to verify this.

Great.

> 1. A nit. There should be a tab here.
> 
> -       dlist_head      av_freeWorkers;
> +       dclist_head av_freeWorkers;

I dare not argue with pgindent.

> 2. autovacuum_max_worker_slots documentation:
> 
> +       <para>
> +        Note that the value of <xref linkend="guc-autovacuum-max-workers"/> is
> +        silently capped to this value.
> +       </para>
> 
> This comment looks redundant in the docs, since the entry 
> for autovacuum_max_workers that follows mentions the
> same.

Removed in v2.  I also noticed that I forgot to update the part about when
autovacuum_max_workers can be changed.  *facepalm*

> 3. The docs for autovacuum_max_workers should mention that when
> the value changes, users should consider adjusting autovacuum_cost_limit/cost_delay.
> 
> This is not something new. Even in the current state, users should think about
> these settings. However, it seems even more important if this value is to be
> dynamically adjusted.

I don't necessarily disagree that it might be worth mentioning these
parameters, but I would argue that this should be proposed in a separate
thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Robert Haas
Date:
On Fri, Apr 19, 2024 at 11:43 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> Removed in v2.  I also noticed that I forgot to update the part about when
> autovacuum_max_workers can be changed.  *facepalm*

I think this could help a bunch of users, but I'd still like to
complain, not so much with the desire to kill this patch as with the
desire to broaden the conversation.

Part of the underlying problem here is that, AFAIK, neither PostgreSQL
as a piece of software nor we as human beings who operate PostgreSQL
databases have much understanding of how autovacuum_max_workers should
be set. It's relatively easy to hose yourself by raising
autovacuum_max_workers to try to make things go faster, but produce
the exact opposite effect due to how the cost balancing stuff works.
But, even if you have the correct use case for autovacuum_max_workers,
something like a few large tables that take a long time to vacuum plus
a bunch of smaller ones that can't get starved just because the big
tables are in the midst of being processed, you might well ask
yourself why it's your job to figure out the correct number of
workers.

Now, before this patch, there is a fairly good reason for that, which
is that we need to reserve shared memory resources for each autovacuum
worker that might potentially run, and the system can't know how much
shared memory you'd like to reserve for that purpose. But if that were
the only problem, then this patch would probably just be proposing to
crank up the default value of that parameter rather than introducing a
second one. I bet Nathan isn't proposing that because his intuition is
that it will work out badly, and I think he's right. I bet that
cranking up the number of allowed workers will often result in running
more workers than we really should. One possible negative consequence
is that we'll end up with multiple processes fighting over the disk in
a situation where they should just take turns. I suspect there are
also ways that we can be harmed - in broadly similar fashion - by cost
balancing.

So I feel like what this proposal reveals is that we know that our
algorithm for ramping up the number of running workers doesn't really
work. And maybe that's just a consequence of the general problem that
we have no global information about how much vacuuming work there is
to be done at any given time, and therefore we cannot take any kind of
sensible guess about whether 1 more worker will help or hurt. Or,
maybe there's some way to do better than what we do today without a
big rewrite. I'm not sure. I don't think this patch should be burdened
with solving the general problem here. But I do think the general
problem is worth some discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Fri, Apr 19, 2024 at 02:42:13PM -0400, Robert Haas wrote:
> I think this could help a bunch of users, but I'd still like to
> complain, not so much with the desire to kill this patch as with the
> desire to broaden the conversation.

I think I subconsciously hoped this would spark a bigger discussion...

> Now, before this patch, there is a fairly good reason for that, which
> is that we need to reserve shared memory resources for each autovacuum
> worker that might potentially run, and the system can't know how much
> shared memory you'd like to reserve for that purpose. But if that were
> the only problem, then this patch would probably just be proposing to
> crank up the default value of that parameter rather than introducing a
> second one. I bet Nathan isn't proposing that because his intuition is
> that it will work out badly, and I think he's right. I bet that
> cranking up the number of allowed workers will often result in running
> more workers than we really should. One possible negative consequence
> is that we'll end up with multiple processes fighting over the disk in
> a situation where they should just take turns. I suspect there are
> also ways that we can be harmed - in broadly similar fashion - by cost
> balancing.

Even if we were content to bump up the default value of
autovacuum_max_workers and tell folks to just mess with the cost settings,
there are still probably many cases where bumping up the number of workers
further would be necessary.  If you have a zillion tables, turning
cost-based vacuuming off completely may be insufficient to keep up, at
which point your options become limited.  It can be difficult to tell
whether you might end up in this situation over time as your workload
evolves.  In any case, it's not clear to me that bumping up the default
value of autovacuum_max_workers would do more good than harm.  I get the
idea that the default of 3 is sufficient for a lot of clusters, so there'd
really be little upside to changing it AFAICT.  (I guess this proves your
point about my intuition.)

> So I feel like what this proposal reveals is that we know that our
> algorithm for ramping up the number of running workers doesn't really
> work. And maybe that's just a consequence of the general problem that
> we have no global information about how much vacuuming work there is
> to be done at any given time, and therefore we cannot take any kind of
> sensible guess about whether 1 more worker will help or hurt. Or,
> maybe there's some way to do better than what we do today without a
> big rewrite. I'm not sure. I don't think this patch should be burdened
> with solving the general problem here. But I do think the general
> problem is worth some discussion.

I certainly don't want to hold up $SUBJECT for a larger rewrite of
autovacuum scheduling, but I also don't want to shy away from a larger
rewrite if it's an idea whose time has come.  I'm looking forward to
hearing your ideas in your pgconf.dev talk.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> Part of the underlying problem here is that, AFAIK, neither PostgreSQL
> as a piece of software nor we as human beings who operate PostgreSQL
> databases have much understanding of how autovacuum_max_workers should
> be set. It's relatively easy to hose yourself by raising
> autovacuum_max_workers to try to make things go faster, but produce
> the exact opposite effect due to how the cost balancing stuff works.

Yeah, this patch will not fix this problem. Anyone who raises av_max_workers
should think about adjusting av_cost_delay. This was discussed upthread
[4], and even without this patch, I think it's necessary to add more
documentation on the relationship between workers and cost.


> So I feel like what this proposal reveals is that we know that our
> algorithm for ramping up the number of running workers doesn't really
> work. And maybe that's just a consequence of the general problem that
> we have no global information about how much vacuuming work there is
> to be done at any given time, and therefore we cannot take any kind of
> sensible guess about whether 1 more worker will help or hurt. Or,
> maybe there's some way to do better than what we do today without a
> big rewrite. I'm not sure. I don't think this patch should be burdened
> with solving the general problem here. But I do think the general
> problem is worth some discussion.

This patch is only solving the operational problem of adjusting
autovacuum_max_workers, and it does so without introducing complexity.

A proposal that relieves users of the burden of having to think about
autovacuum_max_workers, cost_delay, and cost_limit settings would be great.
This patch may be the basis for such dynamic "auto-tuning" of autovacuum
workers.


Regards,

Sami

[4] https://www.postgresql.org/message-id/20240419154322.GA3988554%40nathanxps13


Re: allow changing autovacuum_max_workers without restarting

From
Robert Haas
Date:
On Fri, Apr 19, 2024 at 4:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I certainly don't want to hold up $SUBJECT for a larger rewrite of
> autovacuum scheduling, but I also don't want to shy away from a larger
> rewrite if it's an idea whose time has come.  I'm looking forward to
> hearing your ideas in your pgconf.dev talk.

Yeah, I suppose I was hoping you were going to tell me the all the
answers and thus make the talk a lot easier to write, but I guess life
isn't that simple. :-)

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Mon, Apr 15, 2024 at 05:41:04PM +0000, Imseih (AWS), Sami wrote:
>> Another option could be to just remove the restart-only GUC and hard-code
>> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
>> that would simplify matters, I suspect it would be hard to choose an
>> appropriate limit that won't quickly become outdated.
> 
Hardcoded values are usually hard to deal with because they are hidden
either in code or in docs.

That's true, but using a hard-coded limit means we no longer need to add a
new GUC.  Always allocating, say, 256 slots might require a few additional
kilobytes of shared memory, most of which will go unused, but that seems
unlikely to be a problem for the systems that will run Postgres v18.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
> That's true, but using a hard-coded limit means we no longer need to add a
> new GUC. Always allocating, say, 256 slots might require a few additional
> kilobytes of shared memory, most of which will go unused, but that seems
> unlikely to be a problem for the systems that will run Postgres v18.

I agree with this.


Regards,

Sami


Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Fri, May 03, 2024 at 12:57:18PM +0000, Imseih (AWS), Sami wrote:
>> That's true, but using a hard-coded limit means we no longer need to add a
>> new GUC. Always allocating, say, 256 slots might require a few additional
>> kilobytes of shared memory, most of which will go unused, but that seems
>> unlikely to be a problem for the systems that will run Postgres v18.
> 
> I agree with this.

Here's what this might look like.  I chose an upper limit of 1024, which
seems like it "ought to be enough for anybody," at least for now.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
"Imseih (AWS), Sami"
Date:
>>> That's true, but using a hard-coded limit means we no longer need to add a
>>> new GUC. Always allocating, say, 256 slots might require a few additional
>>> kilobytes of shared memory, most of which will go unused, but that seems
>>> unlikely to be a problem for the systems that will run Postgres v18.
>>
>> I agree with this.


> Here's what this might look like. I chose an upper limit of 1024, which
> seems like it "ought to be enough for anybody," at least for now.

I thought 256 was a good enough limit. In practice, I doubt anyone will 
benefit from more than a few dozen autovacuum workers. 
I think 1024 is way too high to even allow.

Besides that the overall patch looks good to me, but I have
some comments on the documentation.

I don't think combining 1024 + 5 = 1029 is a good idea in docs.
Breaking down the allotment and using the name of the constant 
is much more clear.

I suggest 
" max_connections + max_wal_senders + max_worker_processes + AUTOVAC_MAX_WORKER_SLOTS + 5"

and in other places in the docs, we should mention the actual 
value of AUTOVAC_MAX_WORKER_SLOTS. Maybe in the 
below section?

Instead of:
-    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
+    (1024) and allowed background

do something like:
-    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
+   AUTOVAC_MAX_WORKER_SLOTS  (1024) and allowed background

Also,  replace the 1024 here with AUTOVAC_MAX_WORKER_SLOTS.

+    <varname>max_wal_senders</varname>,
+    plus <varname>max_worker_processes</varname>, plus 1024 for autovacuum
+    worker processes, plus one extra for each 16


Also, not sure if I am mistaken here, but the "+ 5" in the existing docs
seems wrong.

If it refers to NUM_AUXILIARY_PROCS defined in
include/storage/proc.h, it should be a "6":

#define NUM_AUXILIARY_PROCS             6

This is not a consequence of this patch and can be dealt with
in a separate thread if my understanding is correct.


Regards,

Sami 



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Thu, May 16, 2024 at 04:37:10PM +0000, Imseih (AWS), Sami wrote:
> I thought 256 was a good enough limit. In practice, I doubt anyone will 
> benefit from more than a few dozen autovacuum workers. 
> I think 1024 is way too high to even allow.

WFM

> I don't think combining 1024 + 5 = 1029 is a good idea in docs.
> Breaking down the allotment and using the name of the constant 
> is much more clear.
> 
> I suggest 
> " max_connections + max_wal_senders + max_worker_processes + AUTOVAC_MAX_WORKER_SLOTS + 5"
> 
> and in other places in the docs, we should mention the actual 
> value of AUTOVAC_MAX_WORKER_SLOTS. Maybe in the 
> below section?
> 
> Instead of:
> -    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
> +    (1024) and allowed background
> 
> do something like:
> -    (<xref linkend="guc-autovacuum-max-workers"/>) and allowed background
> +   AUTOVAC_MAX_WORKER_SLOTS  (1024) and allowed background
> 
> Also,  replace the 1024 here with AUTOVAC_MAX_WORKER_SLOTS.
> 
> +    <varname>max_wal_senders</varname>,
> +    plus <varname>max_worker_processes</varname>, plus 1024 for autovacuum
> +    worker processes, plus one extra for each 16

Part of me wonders whether documenting the exact formula is worthwhile.
This portion of the docs is rather complicated, and I can't recall ever
having to do the arithmetic it describes.  Plus, see below...

> Also, not sure if I am mistaken here, but the "+ 5" in the existing docs
> seems wrong.
>  
> If it refers to NUM_AUXILIARY_PROCS defined in
> include/storage/proc.h, it should be a "6":
> 
> #define NUM_AUXILIARY_PROCS             6
> 
> This is not a consequence of this patch and can be dealt with
> in a separate thread if my understanding is correct.

Ha, I think it should actually be "+ 7"!  The value is calculated as

    MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders + 6

Looking at the history, this documentation tends to be wrong quite often.
In v9.2, the checkpointer was introduced, and these formulas were not
updated.  In v9.3, background worker processes were introduced, and the
formulas were still not updated.  Finally, in v9.6, it was fixed in commit
597f7e3.  Then, in v14, the archiver process was made an auxiliary process
(commit d75288f), making the formulas out-of-date again.  And in v17, the
WAL summarizer was added.

On top of this, IIUC you actually need even more semaphores if your system
doesn't support atomics, and from a quick skim this doesn't seem to be
covered in this documentation.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Thu, May 16, 2024 at 09:16:46PM -0500, Nathan Bossart wrote:
> On Thu, May 16, 2024 at 04:37:10PM +0000, Imseih (AWS), Sami wrote:
>> I thought 256 was a good enough limit. In practice, I doubt anyone will 
>> benefit from more than a few dozen autovacuum workers. 
>> I think 1024 is way too high to even allow.
> 
> WFM

Here is an updated patch that uses 256 as the upper limit.

>> I don't think combining 1024 + 5 = 1029 is a good idea in docs.
>> Breaking down the allotment and using the name of the constant 
>> is much more clear.

I plan to further improve this section of the documentation in v18, so I've
left the constant unexplained for now.

-- 
nathan

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Andres Freund
Date:
Hi,

On 2024-06-03 13:52:29 -0500, Nathan Bossart wrote:
> Here is an updated patch that uses 256 as the upper limit.

I don't have time to read through the entire thread right now - it'd be good
for the commit message of a patch like this to include justification for why
it's ok to make such a change. Even before actually committing it, so
reviewers have an easier time catching up.

Why do we think that increasing the number of PGPROC slots, heavyweight locks
etc by 256 isn't going to cause issues?  That's not an insubstantial amount of
memory to dedicate to something that will practically never be used.

ISTM that at the very least we ought to exclude the reserved slots from the
computation of things like the number of locks resulting from
max_locks_per_transaction.  It's very common to increase
max_locks_per_transaction substantially, adding ~250 to the multiplier can be
a good amount of memory. And AV workers should never need a meaningful number.

Increasing e.g. the size of the heavyweight lock table has consequences
besides the increase in memory usage, the size increase can make it less
likely for the table to fit largely into L3, thus decreasing performance.

Greetings,

Andres Freund



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Mon, Jun 03, 2024 at 12:08:52PM -0700, Andres Freund wrote:
> I don't have time to read through the entire thread right now - it'd be good
> for the commit message of a patch like this to include justification for why
> it's ok to make such a change. Even before actually committing it, so
> reviewers have an easier time catching up.

Sorry about that.  I think the main question (besides "should we do this?")
is whether we ought to make the upper limit configurable.  My initial idea
was to split autovacuum_max_workers into two GUCs: one for the upper limit
that can only be changed at server start and another for the effective limit
that can be changed up to the upper limit without restarting the server.
If we can just set a sufficiently high upper limit and avoid the extra GUC
without causing problems, that might be preferable, but I sense that you
are about to tell me that it will indeed cause problems.  :)

> Why do we think that increasing the number of PGPROC slots, heavyweight locks
> etc by 256 isn't going to cause issues?  That's not an insubstantial amount of
> memory to dedicate to something that will practically never be used.

I personally have not observed problems with these kinds of bumps in
resource usage, although I may be biased towards larger systems where it
doesn't matter as much.

> ISTM that at the very least we ought to exclude the reserved slots from the
> computation of things like the number of locks resulting from
> max_locks_per_transaction.  It's very common to increase
> max_locks_per_transaction substantially, adding ~250 to the multiplier can be
> a good amount of memory. And AV workers should never need a meaningful number.

This is an interesting idea.

> Increasing e.g. the size of the heavyweight lock table has consequences
> besides the increase in memory usage, the size increase can make it less
> likely for the table to fit largely into L3, thus decreasing performance.

IMHO this might be a good argument for making the upper limit configurable
and setting it relatively low by default.  That's not quite as nice from a
user experience perspective, but weird, hard-to-diagnose performance issues
are certainly not nice, either.

-- 
nathan



Re: allow changing autovacuum_max_workers without restarting

From
Andres Freund
Date:
Hi,

On 2024-06-03 14:28:13 -0500, Nathan Bossart wrote:
> On Mon, Jun 03, 2024 at 12:08:52PM -0700, Andres Freund wrote:
> > Why do we think that increasing the number of PGPROC slots, heavyweight locks
> > etc by 256 isn't going to cause issues?  That's not an insubstantial amount of
> > memory to dedicate to something that will practically never be used.
> 
> I personally have not observed problems with these kinds of bumps in
> resource usage, although I may be biased towards larger systems where it
> doesn't matter as much.

IME it matters *more* on larger systems. Or at least used to, I haven't
experimented with this in quite a while.

It's possible that we improved a bunch of things sufficiently for this to not
matter anymore.

Greetings,

Andres Freund



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Mon, Jun 03, 2024 at 04:24:27PM -0700, Andres Freund wrote:
> On 2024-06-03 14:28:13 -0500, Nathan Bossart wrote:
>> On Mon, Jun 03, 2024 at 12:08:52PM -0700, Andres Freund wrote:
>> > Why do we think that increasing the number of PGPROC slots, heavyweight locks
>> > etc by 256 isn't going to cause issues?  That's not an insubstantial amount of
>> > memory to dedicate to something that will practically never be used.
>> 
>> I personally have not observed problems with these kinds of bumps in
>> resource usage, although I may be biased towards larger systems where it
>> doesn't matter as much.
> 
> IME it matters *more* on larger systems. Or at least used to, I haven't
> experimented with this in quite a while.
> 
> It's possible that we improved a bunch of things sufficiently for this to not
> matter anymore.

I'm curious if there is something specific you would look into to verify
this.  IIUC one concern is the lock table not fitting into L3.  Is there
anything else?  Any particular workloads you have in mind?

-- 
nathan



Re: allow changing autovacuum_max_workers without restarting

From
Andres Freund
Date:
Hi,

On 2024-06-18 14:00:00 -0500, Nathan Bossart wrote:
> On Mon, Jun 03, 2024 at 04:24:27PM -0700, Andres Freund wrote:
> > On 2024-06-03 14:28:13 -0500, Nathan Bossart wrote:
> >> On Mon, Jun 03, 2024 at 12:08:52PM -0700, Andres Freund wrote:
> >> > Why do we think that increasing the number of PGPROC slots, heavyweight locks
> >> > etc by 256 isn't going to cause issues?  That's not an insubstantial amount of
> >> > memory to dedicate to something that will practically never be used.
> >> 
> >> I personally have not observed problems with these kinds of bumps in
> >> resource usage, although I may be biased towards larger systems where it
> >> doesn't matter as much.
> > 
> > IME it matters *more* on larger systems. Or at least used to, I haven't
> > experimented with this in quite a while.
> > 
> > It's possible that we improved a bunch of things sufficiently for this to not
> > matter anymore.
> 
> I'm curious if there is something specific you would look into to verify
> this.  IIUC one concern is the lock table not fitting into L3.  Is there
> anything else?  Any particular workloads you have in mind?

That was the main thing I was thinking of.


But I think I just thought of one more: It's going to *substantially* increase
the resource usage for tap tests.  Right now Cluster.pm has
        # conservative settings to ensure we can run multiple postmasters:
        print $conf "shared_buffers = 1MB\n";
        print $conf "max_connections = 10\n";

for nodes that allow streaming.

Adding 256 extra backend slots increases the shared memory usage from ~5MB to
~18MB.


I just don't see much point in reserving 256 worker "possibilities", tbh. I
can't think of any practical system where it makes sense to use this much (nor
do I think it's going to be reasonable in the next 10 years) and it's just
going to waste memory and startup time for everyone.

Nor does it make sense to me to have the max autovac workers be independent of
max_connections.

Greetings,

Andres Freund



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Tue, Jun 18, 2024 at 01:43:34PM -0700, Andres Freund wrote:
> I just don't see much point in reserving 256 worker "possibilities", tbh. I
> can't think of any practical system where it makes sense to use this much (nor
> do I think it's going to be reasonable in the next 10 years) and it's just
> going to waste memory and startup time for everyone.

Given this, here are some options I see for moving this forward:

* lower the cap to, say, 64 or 32
* exclude autovacuum worker slots from computing number of locks, etc.
* make the cap configurable and default it to something low (e.g., 8)

My intent with a reserved set of 256 slots was to prevent users from
needing to deal with two GUCs.  For all practical purposes, it would be
possible to change autovacuum_max_workers whenever you want.  But if the
extra resource requirements are too much of a tax, I'm content to change
course.

-- 
nathan



Re: allow changing autovacuum_max_workers without restarting

From
Andres Freund
Date:
Hi,

On 2024-06-18 16:09:09 -0500, Nathan Bossart wrote:
> On Tue, Jun 18, 2024 at 01:43:34PM -0700, Andres Freund wrote:
> > I just don't see much point in reserving 256 worker "possibilities", tbh. I
> > can't think of any practical system where it makes sense to use this much (nor
> > do I think it's going to be reasonable in the next 10 years) and it's just
> > going to waste memory and startup time for everyone.
> 
> Given this, here are some options I see for moving this forward:
> 
> * lower the cap to, say, 64 or 32
> * exclude autovacuum worker slots from computing number of locks, etc.

That seems good regardless.

> * make the cap configurable and default it to something low (e.g., 8)


Another one:

Have a general cap of 64, but additionally limit it to something like
     max(1, min(WORKER_CAP, max_connections / 4))

so that cases like tap tests don't end up allocating vastly more worker slots
than actual connection slots.


> My intent with a reserved set of 256 slots was to prevent users from
> needing to deal with two GUCs.  For all practical purposes, it would be
> possible to change autovacuum_max_workers whenever you want.  But if the
> extra resource requirements are too much of a tax, I'm content to change
> course.

Approximately tripling shared memory usage for tap test instances does seem
too much to me.

Greetings,

Andres Freund



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Tue, Jun 18, 2024 at 02:33:31PM -0700, Andres Freund wrote:
> Another one:
> 
> Have a general cap of 64, but additionally limit it to something like
>      max(1, min(WORKER_CAP, max_connections / 4))
> 
> so that cases like tap tests don't end up allocating vastly more worker slots
> than actual connection slots.

That's a clever idea.  My only concern would be that we are tethering two
parameters that aren't super closely related, but I'm unsure whether it
would cause any problems in practice.

-- 
nathan



Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Tue, Jun 18, 2024 at 07:43:36PM -0500, Nathan Bossart wrote:
> On Tue, Jun 18, 2024 at 02:33:31PM -0700, Andres Freund wrote:
>> Another one:
>> 
>> Have a general cap of 64, but additionally limit it to something like
>>      max(1, min(WORKER_CAP, max_connections / 4))
>> 
>> so that cases like tap tests don't end up allocating vastly more worker slots
>> than actual connection slots.
> 
> That's a clever idea.  My only concern would be that we are tethering two
> parameters that aren't super closely related, but I'm unsure whether it
> would cause any problems in practice.

Here is an attempt at doing this.  I've added 0001 [0] and 0002 [1] as
prerequisite patches, which helps simplify 0003 a bit.  It probably doesn't
work correctly for EXEC_BACKEND builds yet.

I'm still not sure about this approach.  At the moment, I'm leaning towards
something more like v2 [2] where the upper limit is a PGC_POSTMASTER GUC
(that we would set very low for TAP tests).

[0] https://commitfest.postgresql.org/48/4998/
[1] https://commitfest.postgresql.org/48/5059/
[2] https://postgr.es/m/20240419154322.GA3988554%40nathanxps13

-- 
nathan

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Fri, Jun 21, 2024 at 03:44:07PM -0500, Nathan Bossart wrote:
> I'm still not sure about this approach.  At the moment, I'm leaning towards
> something more like v2 [2] where the upper limit is a PGC_POSTMASTER GUC
> (that we would set very low for TAP tests).

Like so.

-- 
nathan

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
Here is a rebased patch.

One thing that still bugs me is that there is no feedback sent to the user
when autovacuum_max_workers is set higher than autovacuum_worker_slots.  I
think we should at least emit a WARNING, perhaps from the autovacuum
launcher, i.e., once when the launcher starts and then again as needed via
HandleAutoVacLauncherInterrupts().  Or we could fail to start in
PostmasterMain() and then ignore later misconfigurations via a GUC check
hook.  I'm not too thrilled about adding more GUC check hooks that depend
on the value of other GUCs, but I do like the idea of failing instead of
silently proceeding with a different value than the user configured.  Any
thoughts?

-- 
nathan

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
On Mon, Jul 08, 2024 at 02:29:16PM -0500, Nathan Bossart wrote:
> One thing that still bugs me is that there is no feedback sent to the user
> when autovacuum_max_workers is set higher than autovacuum_worker_slots.  I
> think we should at least emit a WARNING, perhaps from the autovacuum
> launcher, i.e., once when the launcher starts and then again as needed via
> HandleAutoVacLauncherInterrupts().  Or we could fail to start in
> PostmasterMain() and then ignore later misconfigurations via a GUC check
> hook.  I'm not too thrilled about adding more GUC check hooks that depend
> on the value of other GUCs, but I do like the idea of failing instead of
> silently proceeding with a different value than the user configured.  Any
> thoughts?

From recent discussions, it sounds like there isn't much appetite for GUC
check hooks that depend on the values of other GUCs.  Here is a new version
of the patch that adds the WARNING described above.

-- 
nathan

Attachment

Re: allow changing autovacuum_max_workers without restarting

From
Nathan Bossart
Date:
If there are no remaining concerns, I'd like to move forward with
committing v9 in September's commitfest.

-- 
nathan



Re: allow changing autovacuum_max_workers without restarting

From
Yogesh Sharma
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

Hi,
- Tested patch with check-world.
- Verified CheckAutovacuumWorkerGUCs functionality; the correct WARNING was reported.
- For feature-specific testing, I created multiple tables and generated bloat. The expected behavior was observed.
A lower autovacuum_worker_slots = 16 setting is better suited to start with.

Thanks
Yogesh