Re: Should vacuum process config file reload more often - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Should vacuum process config file reload more often |
Date | |
Msg-id | CAAKRu_bkg+0WAGTxsy0HP75RzFbPwqz3ctYq1kThjFhSL69upw@mail.gmail.com Whole thread Raw |
In response to | Re: Should vacuum process config file reload more often (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Should vacuum process config file reload more often
Re: Should vacuum process config file reload more often |
List | pgsql-hackers |
On Sun, Apr 2, 2023 at 10:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for updating the patches. Here are comments for 0001, 0002, > and 0003 patches: Thanks for the review! v13 attached with requested updates. > 0001: > > @@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, > Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED); > Assert(params->truncate != VACOPTVALUE_UNSPECIFIED && > params->truncate != VACOPTVALUE_AUTO); > - vacrel->failsafe_active = false; > + VacuumFailsafeActive = false; > > If we go with the idea of using VacuumCostActive + > VacuumFailsafeActive, we need to make sure that both are cleared at > the end of the vacuum per table. Since the patch clears it only here, > it remains true even after vacuum() if we trigger the failsafe mode > for the last table in the table list. > > In addition to that, to ensure that also in an error case, I think we > need to clear it also in PG_FINALLY() block in vacuum(). So, in 0001, I tried to keep it exactly the same as LVRelState->failsafe_active except for it being a global. We don't actually use VacuumFailsafeActive in this commit except in vacuumlazy.c, which does its own management of the value (it resets it to false at the top of heap_vacuum_rel()). In the later commit which references VacuumFailsafeActive outside of vacuumlazy.c, I had reset it in PG_FINALLY(). I hadn't reset it in the relation list loop in vacuum(). Autovacuum calls vacuum() for each relation. However, you are right that for VACUUM with a list of relations for a table access method other than heap, once set to true, if the table AM forgets to reset the value to false at the end of vacuuming the relation, it would stay true. I've set it to false now at the bottom of the loop through relations in vacuum(). > --- > @@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32 > *VacuumSharedCostBalance; > extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers; > extern PGDLLIMPORT int VacuumCostBalanceLocal; > > +extern bool VacuumFailsafeActive; > > Do we need PGDLLIMPORT for VacuumFailSafeActive? I didn't add one because I thought extensions and other code probably shouldn't access this variable. I thought PGDLLIMPORT was only needed for extensions built on windows to access variables. > 0002: > > @@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items) > return offsetof(VacDeadItems, items) + > sizeof(ItemPointerData) * max_items; > } > > + > /* > * vac_tid_reaped() -- is a particular tid deletable? > * > > Unnecessary new line. There are some other unnecessary new lines in this patch. Thanks! I think I got them all. > --- > @@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers; > extern PGDLLIMPORT int VacuumCostBalanceLocal; > > extern bool VacuumFailsafeActive; > +extern int VacuumCostLimit; > +extern double VacuumCostDelay; > > and > > @@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers; > extern PGDLLIMPORT int VacuumCostPageHit; > extern PGDLLIMPORT int VacuumCostPageMiss; > extern PGDLLIMPORT int VacuumCostPageDirty; > -extern PGDLLIMPORT int VacuumCostLimit; > -extern PGDLLIMPORT double VacuumCostDelay; > > Do we need PGDLLIMPORT too? I was on the fence about this. I annotated the new guc variables vacuum_cost_delay and vacuum_cost_limit with PGDLLIMPORT, but I did not annotate the variables used in vacuum code (VacuumCostLimit/Delay). I think whether or not this is the right choice depends on two things: whether or not my understanding of PGDLLIMPORT is correct and, if it is, whether or not we want extensions to be able to access VacuumCostLimit/Delay or if just access to the guc variables is sufficient/desirable. > --- > @@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg) > } > } > > + > /* > - * Update the cost-based delay parameters, so that multiple workers consume > - * each a fraction of the total available I/O. > + * Update vacuum cost-based delay-related parameters for autovacuum workers and > + * backends executing VACUUM or ANALYZE using the value of relevant gucs and > + * global state. This must be called during setup for vacuum and after every > + * config reload to ensure up-to-date values. > */ > void > -AutoVacuumUpdateDelay(void) > +VacuumUpdateCosts(void > > Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than > autovacuum.c as this is now a common code for both vacuum and > autovacuum? We can't access members of WorkerInfoData from inside vacuum.c > 0003: > > @@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params, > { > ListCell *cur; > > - VacuumUpdateCosts(); > in_vacuum = true; > - VacuumCostActive = (VacuumCostDelay > 0); > + VacuumFailsafeActive = false; > + VacuumUpdateCosts(); > > Hmm, if we initialize VacuumFailsafeActive here, should it be included > in 0001 patch? See comment above. This is the first patch where we use or reference it outside of vacuumlazy.c > --- > + if (VacuumCostDelay > 0) > + VacuumCostActive = true; > + else > + { > + VacuumCostActive = false; > + VacuumCostBalance = 0; > + } > > I agree to update VacuumCostActive in VacuumUpdateCosts(). But if we > do that I think this change should be included in 0002 patch. I'm a bit hesitant to do this because in 0002 VacuumCostActive cannot change status while vacuuming a table or even between tables for VACUUM when a list of relations is specified (except for being disabled by failsafe mode) Adding it to VacuumUpdateCosts() in 0003 makes it clear that it could change while vacuuming a table, so we must update it. I previously had 0002 introduce AutoVacuumUpdateLimit(), which only updated VacuumCostLimit with wi_cost_limit for autovacuum workers and then called that in vacuum_delay_point() (instead of AutoVacuumUpdateDelay() or VacuumUpdateCosts()). I abandoned that idea in favor of the simplicity of having VacuumUpdateCosts() just update those variables for everyone, since it could be reused in 0003. Now, I'm thinking the previous method might be more clear? Or is what I have okay? > --- > + if (ConfigReloadPending && !analyze_in_outer_xact) > + { > + ConfigReloadPending = false; > + ProcessConfigFile(PGC_SIGHUP); > + VacuumUpdateCosts(); > + } > > Since analyze_in_outer_xact is false by default, we reload the config > file in vacuum_delay_point() by default. We need to note that > vacuum_delay_point() could be called via other paths, for example > gin_cleanup_pending_list() and ambulkdelete() called by > validate_index(). So it seems to me that we should do the opposite; we > have another global variable, say vacuum_can_reload_config, which is > false by default, and is set to true only when vacuum() allows it. In > vacuum_delay_point(), we reload the config file iff > (ConfigReloadPending && vacuum_can_reload_config). Wow, great point. Thanks for catching this. I've made the update you suggested. I also set vacuum_can_reload_config to false in PG_FINALLY() in vacuum(). - Melanie
Attachment
pgsql-hackers by date: