Re: Should vacuum process config file reload more often - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Should vacuum process config file reload more often
Date
Msg-id CAD21AoDi4i4tj1Cf9BG0q2iyVhHisQ2xxVBZajEqExX6t6_b3g@mail.gmail.com
Whole thread Raw
In response to Re: Should vacuum process config file reload more often  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Should vacuum process config file reload more often
List pgsql-hackers
On Sat, Apr 1, 2023 at 4:09 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Mar 31, 2023 at 10:31 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > On Thu, Mar 30, 2023 at 3:26 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > >
> > > > On 30 Mar 2023, at 04:57, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > As another idea, why don't we use macros for that? For example,
> > > > suppose VacuumCostStatus is like:
> > > >
> > > > typedef enum VacuumCostStatus
> > > > {
> > > >    VACUUM_COST_INACTIVE_LOCKED = 0,
> > > >    VACUUM_COST_INACTIVE,
> > > >    VACUUM_COST_ACTIVE,
> > > > } VacuumCostStatus;
> > > > VacuumCostStatus VacuumCost;
> > > >
> > > > non-vacuum code can use the following macros:
> > > >
> > > > #define VacuumCostActive() (VacuumCost == VACUUM_COST_ACTIVE)
> > > > #define VacuumCostInactive() (VacuumCost <= VACUUM_COST_INACTIVE) //
> > > > or we can use !VacuumCostActive() instead.
> > >
> > > I'm in favor of something along these lines.  A variable with a name that
> > > implies a boolean value (active/inactive) but actually contains a tri-value is
> > > easily misunderstood.  A VacuumCostState tri-value variable (or a better name)
> > > with a set of convenient macros for extracting the boolean active/inactive that
> > > most of the code needs to be concerned with would more for more readable code I
> > > think.
> >
> > The macros are very error-prone. I was just implementing this idea and
> > mistakenly tried to set the macro instead of the variable in multiple
> > places. Avoiding this involves another set of macros, and, in the end, I
> > think the complexity is much worse. Given the reviewers' uniform dislike
> > of VacuumCostInactive, I favor going back to two variables
> > (VacuumCostActive + VacuumFailsafeActive) and moving
> > LVRelState->failsafe_active to the global VacuumFailsafeActive.
> >
> > I will reimplement this in the next version.

Thank you for updating the patches. Here are comments for 0001, 0002,
and 0003 patches:

 0001:

@@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
         Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
         Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
                    params->truncate != VACOPTVALUE_AUTO);
-        vacrel->failsafe_active = false;
+        VacuumFailsafeActive = false;

If we go with the idea of using VacuumCostActive +
VacuumFailsafeActive, we need to make sure that both are cleared at
the end of the vacuum per table. Since the patch clears it only here,
it remains true even after vacuum() if we trigger the failsafe mode
for the last table in the table list.

In addition to that,  to ensure that also in an error case, I think we
need to clear it also in PG_FINALLY() block in vacuum().

---
@@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32
*VacuumSharedCostBalance;
 extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
 extern PGDLLIMPORT int VacuumCostBalanceLocal;

+extern bool VacuumFailsafeActive;

Do we need PGDLLIMPORT for VacuumFailSafeActive?

0002:

@@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items)
         return offsetof(VacDeadItems, items) +
sizeof(ItemPointerData) * max_items;
 }

+
 /*
  *     vac_tid_reaped() -- is a particular tid deletable?
  *

Unnecessary new line. There are some other unnecessary new lines in this patch.

---
@@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
 extern PGDLLIMPORT int VacuumCostBalanceLocal;

 extern bool VacuumFailsafeActive;
+extern int     VacuumCostLimit;
+extern double VacuumCostDelay;

and

@@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers;
 extern PGDLLIMPORT int VacuumCostPageHit;
 extern PGDLLIMPORT int VacuumCostPageMiss;
 extern PGDLLIMPORT int VacuumCostPageDirty;
-extern PGDLLIMPORT int VacuumCostLimit;
-extern PGDLLIMPORT double VacuumCostDelay;

Do we need PGDLLIMPORT too?

---
@@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg)
        }
 }

+
 /*
- * Update the cost-based delay parameters, so that multiple workers consume
- * each a fraction of the total available I/O.
+ * Update vacuum cost-based delay-related parameters for autovacuum workers and
+ * backends executing VACUUM or ANALYZE using the value of relevant gucs and
+ * global state. This must be called during setup for vacuum and after every
+ * config reload to ensure up-to-date values.
  */
 void
-AutoVacuumUpdateDelay(void)
+VacuumUpdateCosts(void)
 {

Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than
autovacuum.c as this is now a common code for both vacuum and
autovacuum?

0003:

@@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params,
         {
                 ListCell   *cur;

-                VacuumUpdateCosts();
                 in_vacuum = true;
-                VacuumCostActive = (VacuumCostDelay > 0);
+                VacuumFailsafeActive = false;
+                VacuumUpdateCosts();

Hmm, if we initialize VacuumFailsafeActive here, should it be included
in 0001 patch?

---
+        if (VacuumCostDelay > 0)
+                VacuumCostActive = true;
+        else
+        {
+                VacuumCostActive = false;
+                VacuumCostBalance = 0;
+        }

I agree to update VacuumCostActive in VacuumUpdateCosts(). But if we
do that I think this change should be included in 0002 patch.

---
+        if (ConfigReloadPending && !analyze_in_outer_xact)
+        {
+                ConfigReloadPending = false;
+                ProcessConfigFile(PGC_SIGHUP);
+                VacuumUpdateCosts();
+        }

Since analyze_in_outer_xact is false by default, we reload the config
file in vacuum_delay_point() by default. We need to note that
vacuum_delay_point() could be called via other paths, for example
gin_cleanup_pending_list() and ambulkdelete() called by
validate_index(). So it seems to me that we should do the opposite; we
have another global variable, say vacuum_can_reload_config, which is
false by default, and is set to true only when vacuum() allows it. In
vacuum_delay_point(), we reload the config file iff
(ConfigReloadPending && vacuum_can_reload_config).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Yurii Rashkovskii
Date:
Subject: Re: [PATCH] Support % wildcard in extension upgrade filenames
Next
From: Noah Misch
Date:
Subject: Re: running logical replication as the subscription owner