Re: Should vacuum process config file reload more often - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Should vacuum process config file reload more often
Date
Msg-id CAAKRu_a-y1eTNLJA3U-SwcjU4SWZ5w1QSXAN6uee3wivcjR6BQ@mail.gmail.com
Whole thread Raw
In response to Re: Should vacuum process config file reload more often  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Should vacuum process config file reload more often  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On Thu, Mar 23, 2023 at 8:27 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Thu, Mar 23, 2023 at 2:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > And, I was wondering if it was worth trying to split up the part that
> > > reloads the config file and all of the autovacuum stuff. The reloading
> > > of the config file by itself won't actually result in autovacuum workers
> > > having updated cost delays because of them overwriting it with
> > > wi_cost_delay, but it will allow VACUUM to have those updated values.
> >
> > It makes sense to me to have changes for overhauling the rebalance
> > mechanism in a separate patch.
> >
> > Looking back at the original concern you mentioned[1]:
> >
> > speed up long-running vacuum of a large table by
> > decreasing autovacuum_vacuum_cost_delay/vacuum_cost_delay, however the
> > config file is only reloaded between tables (for autovacuum) or after
> > the statement (for explicit vacuum).
> >
> > does it make sense to have autovac_balance_cost() update workers'
> > wi_cost_delay too? Autovacuum launcher already reloads the config file
> > and does the rebalance. So I thought autovac_balance_cost() can update
> > the cost_delay as well, and this might be a minimal change to deal
> > with your concern. This doesn't have the effect for manual VACUUM but
> > since vacuum delay is disabled by default it won't be a big problem.
> > As for manual VACUUMs, we would need to reload the config file in
> > vacuum_delay_point() as the part of your patch does. Overhauling the
> > rebalance mechanism would be another patch to improve it further.
>
> So, we can't do this without acquiring an access shared lock on every
> call to vacuum_delay_point() because cost delay is a double.
>
> I will work on a patchset with separate commits for reloading the config
> file, though (with autovac not benefitting in the first commit).

So, I realized we could actually do as you say and have autovac workers
update their wi_cost_delay and keep the balance changes in a separate
commit. I've done this in attached v8.

Workers take the exclusive lock to update their wi_cost_delay and
wi_cost_limit only when there is a config reload. So, there is one
commit that implements this behavior and a separate commit to revise the
worker rebalancing.

Note that we must have the workers also update wi_cost_limit_base and
then call autovac_balance_cost() when they reload the config file
(instead of waiting for launcher to call autovac_balance_cost()) to
avoid potentially calculating the sleep with a new value of cost delay
and an old value of cost limit.

In the commit which revises the worker rebalancing, I'm still wondering
if wi_dobalance should be an atomic flag -- probably not worth it,
right?

On Fri, Mar 24, 2023 at 1:27 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I realized nworkers_for_balance should be initialized to 0 and not 1 --
> 1 is misleading since there are often 0 autovac workers. We just never
> want to use nworkers_for_balance when it is 0. But, workers put a floor
> of 1 on the number when they divide limit/nworkers_for_balance (since
> they know there must be at least one worker right now since they are a
> worker). I thought about whether or not they should call
> autovac_balance_cost() if they find that nworkers_for_balance is 0 when
> updating their own limit, but I'm not sure.

I've gone ahead and updated this. I haven't made the workers call
autovac_balance_cost() if they find that nworkers_for_balance is 0 when
they try and use it when updating their limit because I'm not sure if
this can happen. I would be interested in input here.

I'm also still interested in feedback on the variable name
av_nworkers_for_balance.

> On Fri, Mar 24, 2023 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > I need another few readthroughs to figure out of VacuumFailsafeActive does what
> > > > I think it does, and should be doing, but in general I think this is a good
> > > > idea and a patch in good condition close to being committable.
> >
> > Another approach would be to make VacuumCostActive a ternary value:
> > on, off, and never. When we trigger the failsafe mode we switch it to
> > never, meaning that it never becomes active even after reloading the
> > config file. A good point is that we don't need to add a new global
> > variable, but I'm not sure it's better than the current approach.
>
> Hmm, this is interesting. I don't love the word "never" since it kind of
> implies a duration longer than the current table being vacuumed. But we
> could find a different word or just document it well. For clarity, we
> might want to call it failsafe_mode or something.
>
> I wonder if the primary drawback to converting
> LVRelState->failsafe_active to a global VacuumFailsafeActive is just the
> general rule of limiting scope to the minimum needed.

Okay, so I've changed my mind about this. I like having a ternary for
VacuumCostActive and keeping failsafe_active in LVRelState. What I
didn't like was having non-vacuum code have to care about the
distinction between failsafe + inactive and just inactive. To handle
this, I converted VacuumCostActive to VacuumCostInactive since there are
two inactive cases (inactive and failsafe and plain inactive) and only
one active case. Then, I defined VacuumCostInactive as an int but use
enum values for it in vacuum code to distinguish between failsafe +
inactive and just inactive (I call it VACUUM_COST_INACTIVE_AND_LOCKED
and VACUUM_COST_INACTIVE_AND_UNLOCKED). Non-vacuum code only needs to
check if VacuumCostInactive is 0 like if (!VacuumCostInactive). I'm
happy with the result, and I think it employs only well-defined C
behavior.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Next
From: Jeff Davis
Date:
Subject: Re: Non-superuser subscription owners