Thread: Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
[ redirecting to -hackers because patch attached ] Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That is a really good point. How about teaching VACUUM to track >> the oldest original relfrozenxid and relminmxid among the table(s) >> it processed, and skip vac_update_datfrozenxid unless at least one >> of those matches the database's values? For extra credit, also >> skip if we didn't successfully advance the source rel's value. > Hmm. I think that that would probably work. I poked into that idea some more and concluded that getting VACUUM to manage it behind the user's back is not going to work very reliably. The key problem is explained by this existing comment in autovacuum.c: * Even if we didn't vacuum anything, it may still be important to do * this, because one indirect effect of vac_update_datfrozenxid() is to * update ShmemVariableCache->xidVacLimit. That might need to be done * even if we haven't vacuumed anything, because relations with older * relfrozenxid values or other databases with older datfrozenxid values * might have been dropped, allowing xidVacLimit to advance. That is, if the table that's holding back datfrozenxid gets dropped between VACUUM runs, VACUUM would never think that it might have advanced the global minimum. I'm forced to the conclusion that we have to expose some VACUUM options if we want this to work well. Attached is a draft patch that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options (name bikeshedding welcome) and teaches vacuumdb to use them. Light testing says that this is a win: even on the regression database, which isn't all that big, I see a drop in vacuumdb's runtime from ~260 ms to ~175 ms. Of course this is a case where VACUUM doesn't really have anything to do, so it's a best-case scenario ... but still, I was expecting the effect to be barely above noise with this many tables, yet it's a good bit more. regards, tom lane diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml index e14ead8826..443928f286 100644 --- a/doc/src/sgml/ref/vacuum.sgml +++ b/doc/src/sgml/ref/vacuum.sgml @@ -36,6 +36,8 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet PROCESS_TOAST [ <replaceable class="parameter">boolean</replaceable> ] TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ] PARALLEL <replaceable class="parameter">integer</replaceable> + SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ] + ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ] <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase> @@ -295,6 +297,40 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet </listitem> </varlistentry> + <varlistentry> + <term><literal>SKIP_DATABASE_STATS</literal></term> + <listitem> + <para> + Specifies that <command>VACUUM</command> should skip updating the + database-wide statistics about oldest unfrozen XIDs. Normally + <command>VACUUM</command> will update these statistics once at the + end of the command. However, this can take awhile in a database + with a very large number of tables, and it will accomplish nothing + unless the table that had contained the oldest unfrozen XID was + among those vacuumed. Moreover, if multiple <command>VACUUM</command> + commands are issued in parallel, only one of them can update the + database-wide statistics at a time. Therefore, if an application + intends to issue a series of many <command>VACUUM</command> + commands, it can be helpful to set this option in all but the last + such command; or set it in all the commands and separately + issue <literal>VACUUM (ONLY_DATABASE_STATS)</literal> afterwards. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>ONLY_DATABASE_STATS</literal></term> + <listitem> + <para> + Specifies that <command>VACUUM</command> should do nothing except + update the database-wide statistics about oldest unfrozen XIDs. + When this option is specified, + the <replaceable class="parameter">table_and_columns</replaceable> + list must be empty, and no other option may be enabled. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><replaceable class="parameter">boolean</replaceable></term> <listitem> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index ba965b8c7b..055c1a146b 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -114,6 +114,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) bool full = false; bool disable_page_skipping = false; bool process_toast = true; + bool skip_database_stats = false; + bool only_database_stats = false; ListCell *lc; /* index_cleanup and truncate values unspecified for now */ @@ -200,6 +202,10 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) params.nworkers = nworkers; } } + else if (strcmp(opt->defname, "skip_database_stats") == 0) + skip_database_stats = defGetBoolean(opt); + else if (strcmp(opt->defname, "only_database_stats") == 0) + only_database_stats = defGetBoolean(opt); else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), @@ -216,7 +222,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) (freeze ? VACOPT_FREEZE : 0) | (full ? VACOPT_FULL : 0) | (disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0) | - (process_toast ? VACOPT_PROCESS_TOAST : 0); + (process_toast ? VACOPT_PROCESS_TOAST : 0) | + (skip_database_stats ? VACOPT_SKIP_DATABASE_STATS : 0) | + (only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0); /* sanity checks on options */ Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE)); @@ -349,6 +357,23 @@ vacuum(List *relations, VacuumParams *params, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("PROCESS_TOAST required with VACUUM FULL"))); + /* sanity check for ONLY_DATABASE_STATS */ + if (params->options & VACOPT_ONLY_DATABASE_STATS) + { + Assert(params->options & VACOPT_VACUUM); + if (relations != NIL) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ONLY_DATABASE_STATS cannot be specified with a list of tables"))); + /* don't require people to turn off PROCESS_TOAST explicitly */ + if (params->options & ~(VACOPT_VACUUM | + VACOPT_PROCESS_TOAST | + VACOPT_ONLY_DATABASE_STATS)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ONLY_DATABASE_STATS cannot be specified with other VACUUM options"))); + } + /* * Create special memory context for cross-transaction storage. * @@ -376,7 +401,12 @@ vacuum(List *relations, VacuumParams *params, * Build list of relation(s) to process, putting any new data in * vac_context for safekeeping. */ - if (relations != NIL) + if (params->options & VACOPT_ONLY_DATABASE_STATS) + { + /* We don't process any tables in this case */ + Assert(relations == NIL); + } + else if (relations != NIL) { List *newrels = NIL; ListCell *lc; @@ -528,11 +558,11 @@ vacuum(List *relations, VacuumParams *params, StartTransactionCommand(); } - if ((params->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess()) + if ((params->options & VACOPT_VACUUM) && + !(params->options & VACOPT_SKIP_DATABASE_STATS)) { /* * Update pg_database.datfrozenxid, and truncate pg_xact if possible. - * (autovacuum.c does this for itself.) */ vac_update_datfrozenxid(); } @@ -560,13 +590,14 @@ vacuum_is_permitted_for_relation(Oid relid, Form_pg_class reltuple, Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0); - /* + /*---------- * A role has privileges to vacuum or analyze the relation if any of the * following are true: * - the role is a superuser * - the role owns the relation * - the role owns the current database and the relation is not shared * - the role has been granted the MAINTAIN privilege on the relation + *---------- */ if (object_ownercheck(RelationRelationId, relid, GetUserId()) || (object_ownercheck(DatabaseRelationId, MyDatabaseId, GetUserId()) && !reltuple->relisshared) || diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 0746d80224..b8472b4c21 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2854,8 +2854,13 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, tab->at_relid = relid; tab->at_sharedrel = classForm->relisshared; - /* Note that this skips toast relations */ - tab->at_params.options = (dovacuum ? VACOPT_VACUUM : 0) | + /* + * Select VACUUM options. Note we don't say VACOPT_PROCESS_TOAST, so + * that vacuum() skips toast relations. Also note we tell vacuum() to + * skip vac_update_datfrozenxid(); we'll do that separately. + */ + tab->at_params.options = + (dovacuum ? (VACOPT_VACUUM | VACOPT_SKIP_DATABASE_STATS) : 0) | (doanalyze ? VACOPT_ANALYZE : 0) | (!wraparound ? VACOPT_SKIP_LOCKED : 0); diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 2a3921937c..84e25e1552 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -4552,8 +4552,9 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("FULL", "FREEZE", "ANALYZE", "VERBOSE", "DISABLE_PAGE_SKIPPING", "SKIP_LOCKED", "INDEX_CLEANUP", "PROCESS_TOAST", - "TRUNCATE", "PARALLEL"); - else if (TailMatches("FULL|FREEZE|ANALYZE|VERBOSE|DISABLE_PAGE_SKIPPING|SKIP_LOCKED|PROCESS_TOAST|TRUNCATE")) + "TRUNCATE", "PARALLEL", "SKIP_DATABASE_STATS", + "ONLY_DATABASE_STATS"); + else if (TailMatches("FULL|FREEZE|ANALYZE|VERBOSE|DISABLE_PAGE_SKIPPING|SKIP_LOCKED|PROCESS_TOAST|TRUNCATE|SKIP_DATABASE_STATS|ONLY_DATABASE_STATS")) COMPLETE_WITH("ON", "OFF"); else if (TailMatches("INDEX_CLEANUP")) COMPLETE_WITH("AUTO", "ON", "OFF"); diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl index e5343774fe..6154bb46b7 100644 --- a/src/bin/scripts/t/100_vacuumdb.pl +++ b/src/bin/scripts/t/100_vacuumdb.pl @@ -22,15 +22,15 @@ $node->issues_sql_like( 'SQL VACUUM run'); $node->issues_sql_like( [ 'vacuumdb', '-f', 'postgres' ], - qr/statement: VACUUM \(FULL\).*;/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, FULL\).*;/, 'vacuumdb -f'); $node->issues_sql_like( [ 'vacuumdb', '-F', 'postgres' ], - qr/statement: VACUUM \(FREEZE\).*;/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, FREEZE\).*;/, 'vacuumdb -F'); $node->issues_sql_like( [ 'vacuumdb', '-zj2', 'postgres' ], - qr/statement: VACUUM \(ANALYZE\).*;/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, ANALYZE\).*;/, 'vacuumdb -zj2'); $node->issues_sql_like( [ 'vacuumdb', '-Z', 'postgres' ], @@ -38,11 +38,11 @@ $node->issues_sql_like( 'vacuumdb -Z'); $node->issues_sql_like( [ 'vacuumdb', '--disable-page-skipping', 'postgres' ], - qr/statement: VACUUM \(DISABLE_PAGE_SKIPPING\).*;/, + qr/statement: VACUUM \(DISABLE_PAGE_SKIPPING, SKIP_DATABASE_STATS\).*;/, 'vacuumdb --disable-page-skipping'); $node->issues_sql_like( [ 'vacuumdb', '--skip-locked', 'postgres' ], - qr/statement: VACUUM \(SKIP_LOCKED\).*;/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, SKIP_LOCKED\).*;/, 'vacuumdb --skip-locked'); $node->issues_sql_like( [ 'vacuumdb', '--skip-locked', '--analyze-only', 'postgres' ], @@ -53,32 +53,32 @@ $node->command_fails( '--analyze-only and --disable-page-skipping specified together'); $node->issues_sql_like( [ 'vacuumdb', '--no-index-cleanup', 'postgres' ], - qr/statement: VACUUM \(INDEX_CLEANUP FALSE\).*;/, + qr/statement: VACUUM \(INDEX_CLEANUP FALSE, SKIP_DATABASE_STATS\).*;/, 'vacuumdb --no-index-cleanup'); $node->command_fails( [ 'vacuumdb', '--analyze-only', '--no-index-cleanup', 'postgres' ], '--analyze-only and --no-index-cleanup specified together'); $node->issues_sql_like( [ 'vacuumdb', '--no-truncate', 'postgres' ], - qr/statement: VACUUM \(TRUNCATE FALSE\).*;/, + qr/statement: VACUUM \(TRUNCATE FALSE, SKIP_DATABASE_STATS\).*;/, 'vacuumdb --no-truncate'); $node->command_fails( [ 'vacuumdb', '--analyze-only', '--no-truncate', 'postgres' ], '--analyze-only and --no-truncate specified together'); $node->issues_sql_like( [ 'vacuumdb', '--no-process-toast', 'postgres' ], - qr/statement: VACUUM \(PROCESS_TOAST FALSE\).*;/, + qr/statement: VACUUM \(PROCESS_TOAST FALSE, SKIP_DATABASE_STATS\).*;/, 'vacuumdb --no-process-toast'); $node->command_fails( [ 'vacuumdb', '--analyze-only', '--no-process-toast', 'postgres' ], '--analyze-only and --no-process-toast specified together'); $node->issues_sql_like( [ 'vacuumdb', '-P', 2, 'postgres' ], - qr/statement: VACUUM \(PARALLEL 2\).*;/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, PARALLEL 2\).*;/, 'vacuumdb -P 2'); $node->issues_sql_like( [ 'vacuumdb', '-P', 0, 'postgres' ], - qr/statement: VACUUM \(PARALLEL 0\).*;/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, PARALLEL 0\).*;/, 'vacuumdb -P 0'); $node->command_ok([qw(vacuumdb -Z --table=pg_am dbname=template1)], 'vacuumdb with connection string'); @@ -119,7 +119,7 @@ $node->command_fails([ 'vacuumdb', '-P', -1, 'postgres' ], 'negative parallel degree'); $node->issues_sql_like( [ 'vacuumdb', '--analyze', '--table', 'vactable(a, b)', 'postgres' ], - qr/statement: VACUUM \(ANALYZE\) public.vactable\(a, b\);/, + qr/statement: VACUUM \(SKIP_DATABASE_STATS, ANALYZE\) public.vactable\(a, b\);/, 'vacuumdb --analyze with complete column list'); $node->issues_sql_like( [ 'vacuumdb', '--analyze-only', '--table', 'vactable(b)', 'postgres' ], @@ -150,7 +150,7 @@ $node->issues_sql_like( 'vacuumdb --table --min-xid-age'); $node->issues_sql_like( [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ], - qr/VACUUM "Foo".bar/, + qr/VACUUM \(SKIP_DATABASE_STATS\) "Foo".bar/, 'vacuumdb --schema'); $node->issues_sql_like( [ 'vacuumdb', '--exclude-schema', '"Foo"', 'postgres' ], diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 272e37d290..698ca8791a 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -44,6 +44,7 @@ typedef struct vacuumingOptions bool force_index_cleanup; bool do_truncate; bool process_toast; + bool skip_database_stats; } vacuumingOptions; /* object filter options */ @@ -533,6 +534,9 @@ vacuum_one_database(ConnParams *cparams, pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s", "--parallel", "13"); + /* skip_database_stats is used automatically if server supports it */ + vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000); + if (!quiet) { if (stage != ANALYZE_NO_STAGE) @@ -792,6 +796,12 @@ vacuum_one_database(ConnParams *cparams, if (!ParallelSlotsWaitCompletion(sa)) failed = true; + /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ + if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE && !failed) + { + executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo); + } + finish: ParallelSlotsTerminate(sa); pg_free(sa); @@ -957,6 +967,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion, appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep); sep = comma; } + if (vacopts->skip_database_stats) + { + /* SKIP_DATABASE_STATS is supported since v16 */ + Assert(serverVersion >= 160000); + appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep); + sep = comma; + } if (vacopts->skip_locked) { /* SKIP_LOCKED is supported since v12 */ diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h index 2f274f2bec..7c3ca749fd 100644 --- a/src/include/commands/vacuum.h +++ b/src/include/commands/vacuum.h @@ -188,6 +188,8 @@ typedef struct VacAttrStats #define VACOPT_SKIP_LOCKED 0x20 /* skip if cannot get lock */ #define VACOPT_PROCESS_TOAST 0x40 /* process the TOAST table, if any */ #define VACOPT_DISABLE_PAGE_SKIPPING 0x80 /* don't skip any pages */ +#define VACOPT_SKIP_DATABASE_STATS 0x100 /* skip vac_update_datfrozenxid() */ +#define VACOPT_ONLY_DATABASE_STATS 0x200 /* only vac_update_datfrozenxid() */ /* * Values used by index_cleanup and truncate params. diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index 0035d158b7..d860be0e20 100644 --- a/src/test/regress/expected/vacuum.out +++ b/src/test/regress/expected/vacuum.out @@ -282,6 +282,12 @@ ALTER TABLE vactst ALTER COLUMN t SET STORAGE EXTERNAL; VACUUM (PROCESS_TOAST FALSE) vactst; VACUUM (PROCESS_TOAST FALSE, FULL) vactst; ERROR: PROCESS_TOAST required with VACUUM FULL +-- SKIP_DATABASE_STATS option +VACUUM (SKIP_DATABASE_STATS) vactst; +-- ONLY_DATABASE_STATS option +VACUUM (ONLY_DATABASE_STATS); +VACUUM (ONLY_DATABASE_STATS) vactst; -- error +ERROR: ONLY_DATABASE_STATS cannot be specified with a list of tables DROP TABLE vaccluster; DROP TABLE vactst; DROP TABLE vacparted; diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql index 9faa8a34a6..9da8f3e830 100644 --- a/src/test/regress/sql/vacuum.sql +++ b/src/test/regress/sql/vacuum.sql @@ -237,6 +237,13 @@ ALTER TABLE vactst ALTER COLUMN t SET STORAGE EXTERNAL; VACUUM (PROCESS_TOAST FALSE) vactst; VACUUM (PROCESS_TOAST FALSE, FULL) vactst; +-- SKIP_DATABASE_STATS option +VACUUM (SKIP_DATABASE_STATS) vactst; + +-- ONLY_DATABASE_STATS option +VACUUM (ONLY_DATABASE_STATS); +VACUUM (ONLY_DATABASE_STATS) vactst; -- error + DROP TABLE vaccluster; DROP TABLE vactst; DROP TABLE vacparted;
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote: > I'm forced to the conclusion that we have to expose some VACUUM > options if we want this to work well. Attached is a draft patch > that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options > (name bikeshedding welcome) and teaches vacuumdb to use them. This is the conclusion I arrived at, too. In fact, I was just about to post a similar patch set. I'm attaching it here anyway, but I'm fine with proceeding with your version. I think the main difference between your patch and mine is that I've exposed vac_update_datfrozenxid() via a function instead of a VACUUM option. IMHO that feels a little more natural, but I can't say I feel too strongly about it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes: > I think the main difference between your patch and mine is that I've > exposed vac_update_datfrozenxid() via a function instead of a VACUUM > option. IMHO that feels a little more natural, but I can't say I feel too > strongly about it. I thought about that but it seems fairly unsafe, because that means that vac_update_datfrozenxid is executing inside a user-controlled transaction. I don't think it will hurt us if the user does a ROLLBACK afterward --- but if he sits on the open transaction, that would be bad, if only because we're still holding the LockDatabaseFrozenIds lock which will block other VACUUMs. There might be more hazards besides that; certainly no one has ever tried to run vac_update_datfrozenxid that way before. regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote: > + /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ > + if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE && !failed) > + { > + executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo); > + } When I looked at this, I thought it would be better to send the command through the parallel slot machinery so that failures would use the same code path as the rest of the VACUUM commands. However, you also need to adjust ParallelSlotsWaitCompletion() to mark the slots as idle so that the slot array can be reused after it is called. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Wed, Dec 28, 2022 at 04:20:19PM -0500, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> I think the main difference between your patch and mine is that I've >> exposed vac_update_datfrozenxid() via a function instead of a VACUUM >> option. IMHO that feels a little more natural, but I can't say I feel too >> strongly about it. > > I thought about that but it seems fairly unsafe, because that means > that vac_update_datfrozenxid is executing inside a user-controlled > transaction. I don't think it will hurt us if the user does a > ROLLBACK afterward --- but if he sits on the open transaction, > that would be bad, if only because we're still holding the > LockDatabaseFrozenIds lock which will block other VACUUMs. > There might be more hazards besides that; certainly no one has ever > tried to run vac_update_datfrozenxid that way before. That's a good point. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes: > On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote: >> + executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo); > When I looked at this, I thought it would be better to send the command > through the parallel slot machinery so that failures would use the same > code path as the rest of the VACUUM commands. However, you also need to > adjust ParallelSlotsWaitCompletion() to mark the slots as idle so that the > slot array can be reused after it is called. Hm. I was just copying the way commands are issued further up in the same function. But I think you're right: once we've done ParallelSlotsAdoptConn(sa, conn); it's probably not entirely kosher to use the conn directly. regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Justin Pryzby
Date:
On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> That is a really good point. How about teaching VACUUM to track > >> the oldest original relfrozenxid and relminmxid among the table(s) > >> it processed, and skip vac_update_datfrozenxid unless at least one > >> of those matches the database's values? For extra credit, also > >> skip if we didn't successfully advance the source rel's value. > > > Hmm. I think that that would probably work. > > I'm forced to the conclusion that we have to expose some VACUUM > options if we want this to work well. Attached is a draft patch > that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options > (name bikeshedding welcome) and teaches vacuumdb to use them. I was surprised to hear that this added *two* options. I assumed it would look like: VACUUM (UPDATE_DATABASE_STATS {yes,no,only}) -- Justin
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote: >> I'm forced to the conclusion that we have to expose some VACUUM >> options if we want this to work well. Attached is a draft patch >> that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options >> (name bikeshedding welcome) and teaches vacuumdb to use them. > I assumed it would look like: > VACUUM (UPDATE_DATABASE_STATS {yes,no,only}) Meh. We could do it like that, but I think options that look like booleans but aren't are messy. regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Andres Freund
Date:
Hi, On 2022-12-28 15:13:23 -0500, Tom Lane wrote: > [ redirecting to -hackers because patch attached ] > > Peter Geoghegan <pg@bowt.ie> writes: > > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> That is a really good point. How about teaching VACUUM to track > >> the oldest original relfrozenxid and relminmxid among the table(s) > >> it processed, and skip vac_update_datfrozenxid unless at least one > >> of those matches the database's values? For extra credit, also > >> skip if we didn't successfully advance the source rel's value. > > > Hmm. I think that that would probably work. > > I poked into that idea some more and concluded that getting VACUUM to > manage it behind the user's back is not going to work very reliably. > The key problem is explained by this existing comment in autovacuum.c: > > * Even if we didn't vacuum anything, it may still be important to do > * this, because one indirect effect of vac_update_datfrozenxid() is to > * update ShmemVariableCache->xidVacLimit. That might need to be done > * even if we haven't vacuumed anything, because relations with older > * relfrozenxid values or other databases with older datfrozenxid values > * might have been dropped, allowing xidVacLimit to advance. > > That is, if the table that's holding back datfrozenxid gets dropped > between VACUUM runs, VACUUM would never think that it might have > advanced the global minimum. I wonder if a less aggressive version of this idea might still work. Perhaps we could use ShmemVariableCache->latestCompletedXid or ShmemVariableCache->nextXid to skip at least some updates? Obviously this isn't going to help if there's a lot of concurrent activity, but the case of just running vacuumdb -a might be substantially improved. Separately I wonder if it's worth micro-optimizing vac_update_datfrozenxid() a bit. I e.g. see a noticable speedup bypassing systable_getnext() and using heap_getnext(). It's really too bad that we want to check for "in the future" xids, otherwise we could use a ScanKey to filter at a lower level. Greetings, Andres Freund
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
I wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: >> I assumed it would look like: >> VACUUM (UPDATE_DATABASE_STATS {yes,no,only}) > Meh. We could do it like that, but I think options that look like > booleans but aren't are messy. Note that I'm not necessarily objecting to there being just one option, only to its values being a superset-of-boolean. Perhaps this'd work: VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY}) regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote: >> Justin Pryzby <pryzby@telsasoft.com> writes: >>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only}) > VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY}) +1 for only introducing one option. IMHO UPDATE_DATABASE_STATS fits a little better since it states the action like most of the other options, but I think both choices are sufficiently clear. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes: > +1 for only introducing one option. IMHO UPDATE_DATABASE_STATS fits a > little better since it states the action like most of the other options, > but I think both choices are sufficiently clear. Consistency of VACUUM's options seems like a lost cause already :-(. Between them, DISABLE_PAGE_SKIPPING, SKIP_LOCKED, and PROCESS_TOAST cover just about the entire set of possibilities for describing a boolean option --- we couldn't even manage to be consistent about whether ON or OFF is the default, let alone where the verb is. And it's hard to argue that FULL, VERBOSE, or PARALLEL is a verb. regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Wed, Dec 28, 2022 at 07:03:29PM -0800, Andres Freund wrote: > Separately I wonder if it's worth micro-optimizing vac_update_datfrozenxid() a > bit. I e.g. see a noticable speedup bypassing systable_getnext() and using > heap_getnext(). It's really too bad that we want to check for "in the future" > xids, otherwise we could use a ScanKey to filter at a lower level. Another thing I'm exploring is looking up the datfrozenxid/datminmxid before starting the pg_class scan so that the scan can be stopped early if it sees that we cannot possibly advance the values. The overwrite-corrupt-values logic might make this a little more complicated, but I think it'd be sufficient to force the pg_class scan to complete if we ever see a value "in the future." Overwriting the corrupt value might be delayed, but it would eventually happen once the table ages advance. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes: > On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote: >> Justin Pryzby <pryzby@telsasoft.com> writes: >>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only}) >>>> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY}) > +1 for only introducing one option. IMHO UPDATE_DATABASE_STATS fits a > little better since it states the action like most of the other options, > but I think both choices are sufficiently clear. I tried to make a patch along these lines, and soon hit a stumbling block: ONLY is a fully-reserved SQL keyword. I don't think this syntax is attractive enough to justify requiring people to double-quote the option, so we are back to square one. Anybody have a different suggestion? regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote: >>> Justin Pryzby <pryzby@telsasoft.com> writes: >>>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only}) >>>>> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY}) > >> +1 for only introducing one option. IMHO UPDATE_DATABASE_STATS fits a >> little better since it states the action like most of the other options, >> but I think both choices are sufficiently clear. > > I tried to make a patch along these lines, and soon hit a stumbling > block: ONLY is a fully-reserved SQL keyword. I don't think this > syntax is attractive enough to justify requiring people to > double-quote the option, so we are back to square one. Anybody > have a different suggestion? Hm. I thought about using PreventInTransactionBlock() for the function, but that probably won't work for a few reasons. AFAICT we'd need to restrict it to only be callable via "SELECT update_database_stats()", which feels a bit unnatural. There was some discussion elsewhere [0] about adding a PROCESS_MAIN_RELATION option or expanding PROCESS_TOAST to simplify vacuuming the TOAST table directly. If such an option existed, you could call VACUUM (PROCESS_MAIN_RELATION FALSE, PROCESS_TOAST FALSE, UPDATE_DATABASE_STATES TRUE) pg_class; to achieve roughly what we need. I'll admit this is hacky, though. So, adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the best bet. [0] https://postgr.es/m/20221215191246.GA252861%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes: > On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote: >> I tried to make a patch along these lines, and soon hit a stumbling >> block: ONLY is a fully-reserved SQL keyword. I don't think this >> syntax is attractive enough to justify requiring people to >> double-quote the option, so we are back to square one. Anybody >> have a different suggestion? > ... adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the > best bet. Nobody has proposed a different bikeshed color, so I'm going to proceed with that syntax. I'll incorporate the parallel-machinery fix from your patch and push to HEAD only (since it's hard to argue this isn't a new feature). This needn't foreclose pursuing the various ideas about making vac_update_datfrozenxid faster; but none of those would eliminate the fundamental O(N^2) issue AFAICS. regards, tom lane