Re: do only critical work during single-user vacuum? - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: do only critical work during single-user vacuum?
Date
Msg-id CAD21AoD06hKYbx_vYj8WGA=ccPHKZ30R8dgX1aFXHbndyyu+pw@mail.gmail.com
Whole thread Raw
In response to Re: do only critical work during single-user vacuum?  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: do only critical work during single-user vacuum?  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Wed, Feb 2, 2022 at 6:50 AM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Thu, Jan 27, 2022 at 8:28 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> > I'm sure you meant "&" here (fixed in attached patch to appease the cfbot):
> > +               if (options | VACOPT_MINIMAL)
>
> Thanks for catching that! That copy-pasto was also masking my failure
> to process the option properly -- fixed in the attached as v5.
>
> > It should either refuse to run if a list of tables is specified with MINIMAL,
> > or it should filter that list by XID condition.
>
> I went with the former for simplicity. As a single-purpose option, it
> makes sense.
>
> > As for the name, it could be MINIMAL or FAILSAFE or EMERGENCY or ??
> > I think the name should actually be a bit more descriptive, and maybe say XID,
> > like MINIMAL_XID or XID_EMERGENCY...
>
> I went with EMERGENCY in this version to reinforce its purpose in the
> mind of the user (and reader of this code).
>
> > Normally, options are independent, but VACUUM (MINIMAL) is a "shortcut" to a
> > hardcoded set of options: freeze on, truncate off, cleanup off.  So it refuses
> > to be combined with other options - good.
> >
> > This is effectively a shortcut to hypothetical parameters for selecting tables
> > by XID/MXID age.  In the future, someone could debate adding user-facing knobs
> > for table selection by age.
>
> I used the params struct in v5 for the emergency cutoff ages. Even
> with the values hard-coded, it seems cleaner to keep them here.
>
> > I still wonder if the relations should be processed in order of decreasing age.
> > An admin might have increased autovacuum_freeze_max_age up to 2e9, and your
> > query might return thousands of tables, with a wide range of sizes and ages.
> >
> > Processing them in order of decreasing age would allow the admin to quickly
> > vacuum the oldest tables, and optionally interrupt vacuum to get out of single
> > user mode ASAP - even if their just want to run VACUUM(MINIMAL) in a normal
> > backend when services aren't offline.  Processing them out of order might be
> > pretty surprising - they might run vacuum for an hour (or overnight), cancel
> > it, attempt to start the DB in normal mode, and conclude that it made no
> > visible progress.
>
> While that seems like a nice property to have, it does complicate
> things, so can be left for follow-on work.
>
> Also in v5:
>
> - It mentions the new command in the error hint in
> GetNewTransactionId(). I'm not sure if multi-word commands should be
> quoted like this.
> - A first draft of documentation

Thank you for updating the patch.

I have a few questions and comments:

+      The only other option that may be combined with
<literal>VERBOSE</literal>, although in single-user mode no client
messages are
+      output.

Given VERBOSE with EMERGENCY can work only in multi-user mode, why
only VERBOSE can be specified with EMERGENCY? I think the same is true
for other options like PARALLEL; PARALLEL can work only in multi-user
mode.

---
+      It performs a database-wide vacuum on tables, toast tables, and
materialized views whose
+      xid age or mxid age is older than 1 billion.

Do we need to allow the user to specify the threshold or need a higher
value (at least larger than 1.6 billion, default value of
vacuum_failsafe_age)? I imagined a case where there are a few very-old
tables (say 2 billion old) and many tables that are older than 1
billion. In this case, VACUUM (EMERGENCY) would take a long time to
complete. But to minimize the downtime, we might want to run VACUUM
(EMERGENCY) on only the very-old tables, start the cluster in
multi-user mode, and run vacuum on multiple tables in parallel.

---
+       if (params->options & VACOPT_EMERGENCY)
+       {
+           /*
+           * Only consider relations able to hold unfrozen XIDs (anything else
+           * should have InvalidTransactionId in relfrozenxid anyway).
+           */
+           if (classForm->relkind != RELKIND_RELATION &&
+               classForm->relkind != RELKIND_MATVIEW &&
+               classForm->relkind != RELKIND_TOASTVALUE)
+           {
+               Assert(!TransactionIdIsValid(classForm->relfrozenxid));
+               Assert(!MultiXactIdIsValid(classForm->relminmxid));
+               continue;
+           }
+
+           table_xid_age = DirectFunctionCall1(xid_age,
classForm->relfrozenxid);
+           table_mxid_age = DirectFunctionCall1(mxid_age,
classForm->relminmxid);
+

I think that instead of calling xid_age and mxid_age for each
relation, we can compute the thresholds for xid and mxid once, and
then compare them to relation's relfrozenxid and relminmxid.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: RFC: Logging plan of the running query
Next
From: Peter Smith
Date:
Subject: Re: row filtering for logical replication