Re: pg_upgrade failing for 200+ million Large Objects - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_upgrade failing for 200+ million Large Objects
Date
Msg-id 2422717.1722201869@sss.pgh.pa.us
In response to Re: pg_upgrade failing for 200+ million Large Objects  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_upgrade failing for 200+ million Large Objects
List pgsql-hackers
I wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
>> J4F, I have an idea to count the number of ';' signs and use it for
>> the transaction size counter, since it is an upper bound estimate of
>> the number of SQL commands :-)

> Hmm ... that's not a completely silly idea.  Let's keep it in
> the back pocket in case we can't easily reduce the number of
> SQL commands in some cases.
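For reference, the heuristic being floated here can be sketched in a few
lines of C (this is only an illustration of the idea, not code from any
of the attached patches): count ';' characters as an upper bound on the
number of SQL commands in a string.  Semicolons inside string literals
cause overcounting, but that errs in the conservative direction, i.e.
transactions just turn over a bit sooner than strictly necessary.

```c
#include <string.h>

/*
 * Upper-bound estimate of the number of SQL commands in "defn",
 * obtained by counting semicolons.  A ';' inside a string literal
 * inflates the count, which is harmless for transaction-size purposes.
 */
static int
count_semicolons(const char *defn)
{
	int			nsemis = 0;
	const char *p = defn;

	while ((p = strchr(p, ';')) != NULL)
	{
		nsemis++;
		p++;
	}
	return nsemis;
}
```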

After poking at this for a while, we can fix Justin's example
case by avoiding repeated UPDATEs on pg_attribute, so I think
we should do that.  It seems clearly a win, with no downside
other than a small increment of complexity in pg_dump.

However, that's probably not sufficient to mark this issue
as closed.  It seems likely that there are other patterns
that would cause backend memory bloat.  One case that I found
is tables with a lot of inherited constraints (not partitions,
but old-style inheritance).  For example, load the output of
this Perl script into a database:

-----
for (my $i = 0; $i < 100; $i++)
{
    print "CREATE TABLE test_inh_check$i (\n";
    for (my $j = 0; $j < 1000; $j++)
    {
        print "a$j float check (a$j > 10.2),\n";
    }
    print "b float);\n";
    print "CREATE TABLE test_inh_check_child$i() INHERITS(test_inh_check$i);\n";
}
-----

pg_dump is horrendously slow on this, thanks to O(N^2) behavior in
ruleutils.c, and pg_upgrade is worse --- and leaks memory too in
HEAD/v17.  The slowness was there before, so I think the lack of
field complaints indicates that this isn't a real-world use case.
Still, it's bad if pg_upgrade fails when it would not have before,
and there may be other similar issues.

So I'm forced to the conclusion that we'd better make the transaction
size adaptive as per Alexander's suggestion.

In addition to the patches attached, I experimented with making
dumpTableSchema fold all the ALTER TABLE commands for a single table
into one command.  That's do-able without too much effort, but I'm now
convinced that we shouldn't.  It would break the semicolon-counting
hack for detecting that tables like these involve extra work.
I'm also not very confident that the backend won't have trouble with
ALTER TABLE commands containing hundreds of subcommands.  That's
something we ought to work on probably, but it's not a project that
I want to condition v17 pg_upgrade's stability on.

Anyway, proposed patches attached.  0001 is some trivial cleanup
that I noticed while working on the failed single-ALTER-TABLE idea.
0002 merges the catalog-UPDATE commands that dumpTableSchema issues,
and 0003 is Alexander's suggestion.

            regards, tom lane

From 34ebed72e9029f690e5d3f0cb7464670e83caa5c Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 28 Jul 2024 13:02:27 -0400
Subject: [PATCH v1 1/3] Fix missing ONLY in one dumpTableSchema command.

The various ALTER [FOREIGN] TABLE commands issued by dumpTableSchema
all use ONLY, except for this one.  I think it's not a live bug
because we don't permit foreign tables to have children, but
it's still inconsistent.  I happened across this while refactoring
dumpTableSchema to merge all its ALTERs into one; while I'm not
certain we should actually do that, this seems like good cleanup.
---
 src/bin/pg_dump/pg_dump.c        | 2 +-
 src/bin/pg_dump/t/002_pg_dump.pl | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index b8b1888bd3..7cd6a7fb97 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16344,7 +16344,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
             if (tbinfo->relkind == RELKIND_FOREIGN_TABLE &&
                 tbinfo->attfdwoptions[j][0] != '\0')
                 appendPQExpBuffer(q,
-                                  "ALTER FOREIGN TABLE %s ALTER COLUMN %s OPTIONS (\n"
+                                  "ALTER FOREIGN TABLE ONLY %s ALTER COLUMN %s OPTIONS (\n"
                                   "    %s\n"
                                   ");\n",
                                   qualrelname,
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index d3dd8784d6..5bcc2244d5 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -1154,7 +1154,7 @@ my %tests = (

     'ALTER FOREIGN TABLE foreign_table ALTER COLUMN c1 OPTIONS' => {
         regexp => qr/^
-            \QALTER FOREIGN TABLE dump_test.foreign_table ALTER COLUMN c1 OPTIONS (\E\n
+            \QALTER FOREIGN TABLE ONLY dump_test.foreign_table ALTER COLUMN c1 OPTIONS (\E\n
             \s+\Qcolumn_name 'col1'\E\n
             \Q);\E\n
             /xm,
--
2.43.5

From c8f0d0252e292f276fe9631ae31e6aea11d294d2 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 28 Jul 2024 16:19:48 -0400
Subject: [PATCH v1 2/3] Reduce number of commands dumpTableSchema emits for
 binary upgrade.

Avoid issuing a separate SQL UPDATE command for each column when
directly manipulating pg_attribute contents in binary upgrade mode.
With the separate updates, we triggered a relcache invalidation with
each update.  For a table with N columns, that causes O(N^2) relcache
bloat in txn_size mode because the table's newly-created relcache
entry can't be flushed till end of transaction.  Reducing the number
of commands is marginally faster as well as avoiding that problem.

While at it, likewise avoid issuing a separate UPDATE on pg_constraint
for each inherited constraint.  This is less exciting, first because
inherited (non-partitioned) constraints are relatively rare, and
second because the backend has a good deal of trouble anyway with
restoring tables containing many such constraints, due to
MergeConstraintsIntoExisting being horribly inefficient.  But it seems
more consistent to do it this way here too, and it surely can't hurt.

Per report from Justin Pryzby.  Back-patch to v17 where txn_size mode
was introduced.

Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023
---
 src/bin/pg_dump/pg_dump.c | 132 ++++++++++++++++++++++++++++----------
 1 file changed, 97 insertions(+), 35 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7cd6a7fb97..2b02148559 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15670,6 +15670,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
     DumpOptions *dopt = fout->dopt;
     PQExpBuffer q = createPQExpBuffer();
     PQExpBuffer delq = createPQExpBuffer();
+    PQExpBuffer extra = createPQExpBuffer();
     char       *qrelname;
     char       *qualrelname;
     int            numParents;
@@ -15736,7 +15737,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
         char       *partkeydef = NULL;
         char       *ftoptions = NULL;
         char       *srvname = NULL;
-        char       *foreign = "";
+        const char *foreign = "";

         /*
          * Set reltypename, and collect any relkind-specific data that we
@@ -16094,51 +16095,98 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
              tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
              tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
         {
+            bool        firstitem;
+
+            /*
+             * Drop any dropped columns.  Merge the pg_attribute manipulations
+             * into a single SQL command, so that we don't cause repeated
+             * relcache flushes on the target table.  Otherwise we risk O(N^2)
+             * relcache bloat while dropping N columns.
+             */
+            resetPQExpBuffer(extra);
+            firstitem = true;
             for (j = 0; j < tbinfo->numatts; j++)
             {
                 if (tbinfo->attisdropped[j])
                 {
-                    appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate dropped column.\n");
-                    appendPQExpBuffer(q, "UPDATE pg_catalog.pg_attribute\n"
-                                      "SET attlen = %d, "
-                                      "attalign = '%c', attbyval = false\n"
-                                      "WHERE attname = ",
+                    if (firstitem)
+                    {
+                        appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate dropped columns.\n"
+                                             "UPDATE pg_catalog.pg_attribute\n"
+                                             "SET attlen = v.dlen, "
+                                             "attalign = v.dalign, "
+                                             "attbyval = false\n"
+                                             "FROM (VALUES ");
+                        firstitem = false;
+                    }
+                    else
+                        appendPQExpBufferStr(q, ",\n             ");
+                    appendPQExpBufferChar(q, '(');
+                    appendStringLiteralAH(q, tbinfo->attnames[j], fout);
+                    appendPQExpBuffer(q, ", %d, '%c')",
                                       tbinfo->attlen[j],
                                       tbinfo->attalign[j]);
-                    appendStringLiteralAH(q, tbinfo->attnames[j], fout);
-                    appendPQExpBufferStr(q, "\n  AND attrelid = ");
-                    appendStringLiteralAH(q, qualrelname, fout);
-                    appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
-
-                    if (tbinfo->relkind == RELKIND_RELATION ||
-                        tbinfo->relkind == RELKIND_PARTITIONED_TABLE)
-                        appendPQExpBuffer(q, "ALTER TABLE ONLY %s ",
-                                          qualrelname);
-                    else
-                        appendPQExpBuffer(q, "ALTER FOREIGN TABLE ONLY %s ",
-                                          qualrelname);
-                    appendPQExpBuffer(q, "DROP COLUMN %s;\n",
+                    /* The ALTER ... DROP COLUMN commands must come after */
+                    appendPQExpBuffer(extra, "ALTER %sTABLE ONLY %s ",
+                                      foreign, qualrelname);
+                    appendPQExpBuffer(extra, "DROP COLUMN %s;\n",
                                       fmtId(tbinfo->attnames[j]));
                 }
-                else if (!tbinfo->attislocal[j])
+            }
+            if (!firstitem)
+            {
+                appendPQExpBufferStr(q, ") v(dname, dlen, dalign)\n"
+                                     "WHERE attrelid = ");
+                appendStringLiteralAH(q, qualrelname, fout);
+                appendPQExpBufferStr(q, "::pg_catalog.regclass\n"
+                                     "  AND attname = v.dname;\n");
+                /* Now we can issue the actual DROP COLUMN commands */
+                appendBinaryPQExpBuffer(q, extra->data, extra->len);
+            }
+
+            /*
+             * Fix up inherited columns.  As above, do the pg_attribute
+             * manipulations in a single SQL command.
+             */
+            firstitem = true;
+            for (j = 0; j < tbinfo->numatts; j++)
+            {
+                if (!tbinfo->attisdropped[j] &&
+                    !tbinfo->attislocal[j])
                 {
-                    appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited column.\n");
-                    appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n"
-                                         "SET attislocal = false\n"
-                                         "WHERE attname = ");
+                    if (firstitem)
+                    {
+                        appendPQExpBufferStr(q, "\n-- For binary upgrade, recreate inherited columns.\n");
+                        appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_attribute\n"
+                                             "SET attislocal = false\n"
+                                             "WHERE attrelid = ");
+                        appendStringLiteralAH(q, qualrelname, fout);
+                        appendPQExpBufferStr(q, "::pg_catalog.regclass\n"
+                                             "  AND attname IN (");
+                        firstitem = false;
+                    }
+                    else
+                        appendPQExpBufferStr(q, ", ");
                     appendStringLiteralAH(q, tbinfo->attnames[j], fout);
-                    appendPQExpBufferStr(q, "\n  AND attrelid = ");
-                    appendStringLiteralAH(q, qualrelname, fout);
-                    appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
                 }
             }
+            if (!firstitem)
+                appendPQExpBufferStr(q, ");\n");

             /*
              * Add inherited CHECK constraints, if any.
              *
              * For partitions, they were already dumped, and conislocal
              * doesn't need fixing.
+             *
+             * As above, issue only one direct manipulation of pg_constraint.
+             * Although it is tempting to merge the ALTER ADD CONSTRAINT
+             * commands into one as well, refrain for now due to concern about
+             * possible backend memory bloat if there are many such
+             * constraints.
              */
+            resetPQExpBuffer(extra);
+            firstitem = true;
             for (k = 0; k < tbinfo->ncheck; k++)
             {
                 ConstraintInfo *constr = &(tbinfo->checkexprs[k]);
@@ -16146,18 +16194,31 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
                 if (constr->separate || constr->conislocal || tbinfo->ispartition)
                     continue;

-                appendPQExpBufferStr(q, "\n-- For binary upgrade, set up inherited constraint.\n");
+                if (firstitem)
+                    appendPQExpBufferStr(q, "\n-- For binary upgrade, set up inherited constraints.\n");
                 appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ADD CONSTRAINT %s %s;\n",
                                   foreign, qualrelname,
                                   fmtId(constr->dobj.name),
                                   constr->condef);
-                appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_constraint\n"
-                                     "SET conislocal = false\n"
-                                     "WHERE contype = 'c' AND conname = ");
-                appendStringLiteralAH(q, constr->dobj.name, fout);
-                appendPQExpBufferStr(q, "\n  AND conrelid = ");
-                appendStringLiteralAH(q, qualrelname, fout);
-                appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
+                /* Update pg_constraint after all the ALTER TABLEs */
+                if (firstitem)
+                {
+                    appendPQExpBufferStr(extra, "UPDATE pg_catalog.pg_constraint\n"
+                                         "SET conislocal = false\n"
+                                         "WHERE contype = 'c' AND conrelid = ");
+                    appendStringLiteralAH(extra, qualrelname, fout);
+                    appendPQExpBufferStr(extra, "::pg_catalog.regclass\n");
+                    appendPQExpBufferStr(extra, "  AND conname IN (");
+                    firstitem = false;
+                }
+                else
+                    appendPQExpBufferStr(extra, ", ");
+                appendStringLiteralAH(extra, constr->dobj.name, fout);
+            }
+            if (!firstitem)
+            {
+                appendPQExpBufferStr(extra, ");\n");
+                appendBinaryPQExpBuffer(q, extra->data, extra->len);
             }

             if (numParents > 0 && !tbinfo->ispartition)
@@ -16445,6 +16506,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)

     destroyPQExpBuffer(q);
     destroyPQExpBuffer(delq);
+    destroyPQExpBuffer(extra);
     free(qrelname);
     free(qualrelname);
 }
--
2.43.5

From 7cfea69d3f5df4c15681f4902a86b366f0ed4292 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 28 Jul 2024 16:40:35 -0400
Subject: [PATCH v1 3/3] Count individual SQL commands in pg_restore's
 --transaction-size mode.

The initial implementation counted one action per TOC entry (except
for some special cases for multi-blob BLOBS entries).  This assumes
that TOC entries are all about equally complex, but it turns out
that that assumption doesn't hold up very well in binary-upgrade
mode.  For example, even after the previous patch I was able to
cause backend bloat with tables having many inherited constraints.

To fix, count multi-command TOC entries as N actions, allowing the
transaction size to be scaled down when we hit a complex TOC entry.
Rather than add a SQL parser to pg_restore, approximate "multi
command" by counting semicolons in the TOC entry's defn string.
This will be fooled by semicolons appearing in string literals ---
but the error is in the conservative direction, so it doesn't seem
worth working harder.  The biggest risk is with function/procedure
TOC entries, but we can just explicitly skip those.

(This is undoubtedly a hack, and maybe someday we'll be able to
revert it after fixing the backend's bloat issues or rethinking
what pg_dump emits in binary upgrade mode.  But that surely isn't
a project for v17.)

Thanks to Alexander Korotkov for the let's-count-semicolons idea.

Per report from Justin Pryzby.  Back-patch to v17 where txn_size mode
was introduced.

Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023
---
 src/bin/pg_dump/pg_backup_archiver.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 68e321212d..8c20c263c4 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3827,10 +3827,32 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
     {
         IssueACLPerBlob(AH, te);
     }
-    else
+    else if (te->defn && strlen(te->defn) > 0)
     {
-        if (te->defn && strlen(te->defn) > 0)
-            ahprintf(AH, "%s\n\n", te->defn);
+        ahprintf(AH, "%s\n\n", te->defn);
+
+        /*
+         * If the defn string contains multiple SQL commands, txn_size mode
+         * should count it as N actions not one.  But rather than build a full
+         * SQL parser, approximate this by counting semicolons.  One case
+         * where that tends to be badly fooled is function definitions, so
+         * ignore them.  (restore_toc_entry will count one action anyway.)
+         */
+        if (ropt->txn_size > 0 &&
+            strcmp(te->desc, "FUNCTION") != 0 &&
+            strcmp(te->desc, "PROCEDURE") != 0)
+        {
+            const char *p = te->defn;
+            int            nsemis = 0;
+
+            while ((p = strchr(p, ';')) != NULL)
+            {
+                nsemis++;
+                p++;
+            }
+            if (nsemis > 1)
+                AH->txnCount += nsemis - 1;
+        }
     }

     /*
--
2.43.5

