Re: Should vacuum process config file reload more often - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Should vacuum process config file reload more often |
Date | |
Msg-id | CAAKRu_YraQ1mtqRJSqv+rvp0Z3VSWhy19gEkAoYqRkmOFRbYXw@mail.gmail.com Whole thread Raw |
In response to | Re: Should vacuum process config file reload more often (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: Should vacuum process config file reload more often
(Masahiko Sawada <sawada.mshk@gmail.com>)
|
List | pgsql-hackers |
On Thu, Mar 23, 2023 at 2:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sun, Mar 19, 2023 at 7:47 AM Melanie Plageman <melanieplageman@gmail.com> wrote: > Do we need to calculate the number of workers running with > nworkers_for_balance by iterating over the running worker list? I > guess autovacuum workers can increment/decrement it at the beginning > and end of vacuum. I don't think we can do that because if a worker crashes, we have no way of knowing if it had incremented or decremented the number, so we can't adjust for it. > > > > > > Also not sure how the patch interacts with failsafe autovac and parallel > > > > > > vacuum. > > > > > > > > > > Good point. > > > > > > > > > > When entering the failsafe mode, we disable the vacuum delays (see > > > > > lazy_check_wraparound_failsafe()). We need to keep disabling the > > > > > vacuum delays even after reloading the config file. One idea is to > > > > > have another global variable indicating we're in the failsafe mode. > > > > > vacuum_delay_point() doesn't update VacuumCostActive if the flag is > > > > > true. > > > > > > > > I think we might not need to do this. Other than in > > > > lazy_check_wraparound_failsafe(), VacuumCostActive is only updated in > > > > two places: > > > > > > > > 1) in vacuum() which autovacuum will call per table. And failsafe is > > > > reset per table as well. > > > > > > > > 2) in vacuum_delay_point(), but, since VacuumCostActive will already be > > > > false when we enter vacuum_delay_point() the next time after > > > > lazy_check_wraparound_failsafe(), we won't set VacuumCostActive there. > > > > > > Indeed. But does it mean that there is no code path to turn > > > vacuum-delay on, even when vacuum_cost_delay is updated from 0 to > > > non-0? > > > > Ah yes! Good point. This is true. > > I'm not sure how to cheaply allow for re-enabling delays after disabling > > them in the middle of a table vacuum. > > > > I don't see a way around checking if we need to reload the config file > > on every call to vacuum_delay_point() (currently, we are only doing this > > when we have to wait anyway). It seems expensive to do this check every > > time. If we do do this, we would update VacuumCostActive when updating > > VacuumCostDelay, and we would need a global variable keeping the > > failsafe status, as you mentioned. > > > > It could be okay to say that you can only disable cost-based delays in > > the middle of vacuuming a table (i.e. you cannot enable them if they are > > already disabled until you start vacuuming the next table). Though maybe > > it is weird that you can increase the delay but not re-enable it... > > On Mon, Mar 20, 2023 at 1:48 AM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > So, I thought about it some more, and I think it is a bit odd that you > > can increase the delay and limit but not re-enable them if they were > > disabled. And, perhaps it would be okay to check ConfigReloadPending at > > the top of vacuum_delay_point() instead of only after sleeping. It is > > just one more branch. We can check if VacuumCostActive is false after > > checking if we should reload and doing so if needed and return early. > > I've implemented that in attached v6. > > > > I added in the global we discussed for VacuumFailsafeActive. If we keep > > it, we can probably remove the one in LVRelState -- as it seems > > redundant. Let me know what you think. > > I think the following change is related: > > - if (!VacuumCostActive || InterruptPending) > + if (InterruptPending || VacuumFailsafeActive || > + (!VacuumCostActive && !ConfigReloadPending)) > return; > > + /* > + * Reload the configuration file if requested. This allows changes to > + * [autovacuum_]vacuum_cost_limit and [autovacuum_]vacuum_cost_delay to > + * take effect while a table is being vacuumed or analyzed. > + */ > + if (ConfigReloadPending && !analyze_in_outer_xact) > + { > + ConfigReloadPending = false; > + ProcessConfigFile(PGC_SIGHUP); > + AutoVacuumUpdateDelay(); > + AutoVacuumUpdateLimit(); > + } > > It makes sense to me that we need to reload the config file even when > vacuum-delay is disabled. But I think it's not convenient for users > that we don't reload the configuration file once the failsafe is > triggered. I think users might want to change some GUCs such as > log_autovacuum_min_duration. Ah, okay. Attached v7 has this change (it reloads even if failsafe is active). > > And, I was wondering if it was worth trying to split up the part that > > reloads the config file and all of the autovacuum stuff. The reloading > > of the config file by itself won't actually result in autovacuum workers > > having updated cost delays because of them overwriting it with > > wi_cost_delay, but it will allow VACUUM to have those updated values. > > It makes sense to me to have changes for overhauling the rebalance > mechanism in a separate patch. > > Looking back at the original concern you mentioned[1]: > > speed up long-running vacuum of a large table by > decreasing autovacuum_vacuum_cost_delay/vacuum_cost_delay, however the > config file is only reloaded between tables (for autovacuum) or after > the statement (for explicit vacuum). > > does it make sense to have autovac_balance_cost() update workers' > wi_cost_delay too? Autovacuum launcher already reloads the config file > and does the rebalance. So I thought autovac_balance_cost() can update > the cost_delay as well, and this might be a minimal change to deal > with your concern. This doesn't have the effect for manual VACUUM but > since vacuum delay is disabled by default it won't be a big problem. > As for manual VACUUMs, we would need to reload the config file in > vacuum_delay_point() as the part of your patch does. Overhauling the > rebalance mechanism would be another patch to improve it further. So, we can't do this without acquiring an access shared lock on every call to vacuum_delay_point() because cost delay is a double. I will work on a patchset with separate commits for reloading the config file, though (with autovac not benefitting in the first commit). On Thu, Mar 23, 2023 at 12:24 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > +bool VacuumFailsafeActive = false; > This needs documentation, how it's used and how it relates to failsafe_active > in LVRelState (which it might replace(?), but until then). Thanks! I've removed LVRelState->failsafe_active. I've also separated the VacuumFailsafeActive change into its own commit. I will say that that commit message needs some work. > + pg_atomic_uint32 nworkers_for_balance; > This needs a short oneline documentation update to the struct comment. Done. I also prefixed with av to match the other members. I am thinking that this variable name could be better. I want to convey that it is the number of workers sharing a cost limit, so I considered av_limit_sharers or something like that. I am looking to convey that it is the number of workers amongst whom we must split the cost limit. > > - double wi_cost_delay; > - int wi_cost_limit; > - int wi_cost_limit_base; > This change makes the below comment in do_autovacuum in need of an update: > /* > * Remove my info from shared memory. We could, but intentionally. > * don't, clear wi_cost_limit and friends --- this is on the > * assumption that we probably have more to do with similar cost > * settings, so we don't want to give up our share of I/O for a very > * short interval and thereby thrash the global balance. > */ Updated to mention wi_dobalance instead. On the topic of wi_dobalance, should we bother making it an atomic flag instead? We would avoid taking a lock a few times, though probably not frequently enough to matter. I was wondering if making it atomically accessible would be less confusing than acquiring a lock only to set one member in do_autovacuum() (and otherwise it is only read). I think if I had to make it an atomic flag, I would reverse the logic and make it wi_skip_balance or something like that. > + if (av_table_option_cost_delay >= 0) > + VacuumCostDelay = av_table_option_cost_delay; > + else > + VacuumCostDelay = autovacuum_vac_cost_delay >= 0 ? > + autovacuum_vac_cost_delay : VacuumCostDelay; > While it's a matter of personal preference, I for one would like if we reduced > the number of ternary operators in the vacuum code, especially those mixed into > if statements. The vacuum code is full of this already though so this isn't > less of an objection (as it follows style) than an observation. I agree. This one was better served as an "else if" anyway -- updated! > > + * note: in cost_limit, zero also means use value from elsewhere, because > + * zero is not a valid value. > ... > + int vac_cost_limit = autovacuum_vac_cost_limit > 0 ? > + autovacuum_vac_cost_limit : VacuumCostLimit; > Not mentioning the fact that a magic value in a GUC means it's using the value > from another GUC (which is not great IMHO), it seems we are using zero as well > as -1 as that magic value here? (not introduced in this patch.) The docs does > AFAICT only specify -1 as that value though. Am I missing something or is the > code and documentation slightly out of sync? I copied that comment from elsewhere, but, yes it is a weird situation. So, you can set autovacuum_vacuum_cost_limit to 0, -1 or a positive number. You can only set vacuum_cost_limit to a positive value. The documentation mentions that setting autovacuum_vacuum_cost_limit to -1, the default, will have it use vacuum_cost_limit. However, it says nothing about what setting it to 0 does. In the code, everywhere assumes if autovacuum_vacuum_cost_limit is 0 OR -1, use vacuum_cost_limit. This is in contrast to autovacuum_vacuum_cost_delay, for which 0 means to disable it -- so setting autovacuum_vacuum_cost_delay to 0 will specifically not fall back to vacuum_cost_limit. I think the problem is that 0 is not a valid cost limit (i.e. it has no meaning like infinity/no limit), so we basically don't want to allow the cost limit to be set to 0, but GUC values have to be a range with a max and a min, so we can't just exclude 0 if we want to allow -1 (as far as I know). I think it would be nice to be able to specify multiple valid ranges for GUCs to the GUC machinery. So, to answer your question, yes, the code and docs are a bit out-of-sync. > I need another few readthroughs to figure out of VacuumFailsafeActive does what > I think it does, and should be doing, but in general I think this is a good > idea and a patch in good condition close to being committable. I will take a pass at splitting up the main commit into two. However, I have attached a new version with the other specific updates discussed in this thread. Feel free to provide review on this version in the meantime. - Melanie
Attachment
pgsql-hackers by date: