Thread: BUG #18618: pg_upgrade from 14 to 15+ fails for unlogged table with identity column

BUG #18618: pg_upgrade from 14 to 15+ fails for unlogged table with identity column

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18618
Logged by:          Anthony Hsu
Email address:      erwaman@gmail.com
PostgreSQL version: 15.7
Operating system:   Linux
Description:

If I create an unlogged table with an identity column as follows

CREATE UNLOGGED TABLE test (a INT GENERATED BY DEFAULT AS IDENTITY);

in PG14 and then run pg_upgrade to PG15+, it fails due to

```
pg_restore: error: could not execute query: ERROR:  unexpected request for
new relfilenode in binary upgrade mode
Command was:
-- For binary upgrade, must preserve pg_class oids and relfilenodes
SELECT
pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('16384'::pg_catalog.oid);
SELECT
pg_catalog.binary_upgrade_set_next_heap_relfilenode('16384'::pg_catalog.oid);

ALTER TABLE "public"."test" ALTER COLUMN "a" ADD GENERATED BY DEFAULT AS
IDENTITY (
    SEQUENCE NAME "public"."test_a_seq"
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1
);
ALTER SEQUENCE "public"."test_a_seq" SET LOGGED;
```

If I use pg_upgrade from 15.6, it succeeds; if I use pg_upgrade from 15.7 or
15.8, it fails with the above. Upon further testing, I found that if I
revert
https://github.com/postgres/postgres/commit/d17a3a4c6a34f61a3d4d9faa7a70c14d8d0c0ffb,
then pg_upgrade succeeds. This bug also does not happen if I try upgrading
from 15 -> 15 (same version), as the "ALTER SEQUENCE "public"."test_a_seq"
SET LOGGED;" line is not generated by pg_dump in this case.


PG Bug reporting form <noreply@postgresql.org> writes:
> If I create an unlogged table with an identity column as follows

> CREATE UNLOGGED TABLE test (a INT GENERATED BY DEFAULT AS IDENTITY);

> in PG14 and then run pg_upgrade to PG15+, it fails due to

> ```
> pg_restore: error: could not execute query: ERROR:  unexpected request for
> new relfilenode in binary upgrade mode
> Command was:
> -- For binary upgrade, must preserve pg_class oids and relfilenodes
> SELECT
> pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('16384'::pg_catalog.oid);
> SELECT
> pg_catalog.binary_upgrade_set_next_heap_relfilenode('16384'::pg_catalog.oid);

> ALTER TABLE "public"."test" ALTER COLUMN "a" ADD GENERATED BY DEFAULT AS
> IDENTITY (
>     SEQUENCE NAME "public"."test_a_seq"
>     START WITH 1
>     INCREMENT BY 1
>     NO MINVALUE
>     NO MAXVALUE
>     CACHE 1
> );
> ALTER SEQUENCE "public"."test_a_seq" SET LOGGED;
> ```

Yeah.  pg_dump is trying to recreate the state of affairs in the
source database, where the table is unlogged but its owned sequence
is logged (because v14 didn't have unlogged sequences).  And
ALTER SEQUENCE SET LOGGED works by assigning the sequence a new
relfilenode and then writing its data into that.  That doesn't work
in binary-upgrade mode, and if it did work it'd be the wrong thing
because we have to preserve the sequence's relfilenode.

However, in binary-upgrade mode there seems little need to be
concerned about whether we produce a nice WAL trace of the SET LOGGED
operation.  I think we can just summarily change the sequence's
relpersistence field and be done with it, more or less as attached.
Anybody see a flaw in that reasoning?

The only alternative I can see is to extend the ALTER TABLE ADD
GENERATED syntax to allow correct specification of the sequence's
LOGGED/UNLOGGED status to begin with.  While cleaner, that would be
a lot more work and probably result in compatibility problems for
normal uses of pg_dump (where the output might get loaded into a
server version that lacks the syntax extension).

            regards, tom lane

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index b37fd688d3..73ab609153 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -545,6 +545,46 @@ SequenceChangePersistence(Oid relid, char newrelpersistence)
     Buffer        buf;
     HeapTupleData seqdatatuple;

+    /*
+     * In binary-upgrade mode, we cannot assign a new relfilenumber to the
+     * sequence, so the normal code path below doesn't work.  However, we
+     * don't really need to go through all the pushups of assigning a new
+     * relfilenumber in this mode, since there's no concern for concurrent
+     * operations, rollback-ability, or replication.  It seems sufficient to
+     * just summarily change the sequence's relpersistence value.
+     */
+    if (IsBinaryUpgrade)
+    {
+        Relation    pg_class;
+        HeapTuple    tuple;
+        Form_pg_class classform;
+
+        /* Get a writable copy of the pg_class tuple for the sequence. */
+        pg_class = table_open(RelationRelationId, RowExclusiveLock);
+
+        tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+        if (!HeapTupleIsValid(tuple))
+            elog(ERROR, "could not find tuple for relation %u", relid);
+        classform = (Form_pg_class) GETSTRUCT(tuple);
+
+        /* Apply the update. */
+        classform->relpersistence = newrelpersistence;
+
+        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+
+        heap_freetuple(tuple);
+
+        table_close(pg_class, RowExclusiveLock);
+
+        /*
+         * Make the pg_class row change visible.  This will cause the
+         * sequence's relcache entry to get updated, too.
+         */
+        CommandCounterIncrement();
+
+        return;
+    }
+
     /*
      * ALTER SEQUENCE acquires this lock earlier.  If we're processing an
      * owned sequence for ALTER TABLE, lock now.  Without the lock, we'd

I wrote:
> However, in binary-upgrade mode there seems little need to be
> concerned about whether we produce a nice WAL trace of the SET LOGGED
> operation.  I think we can just summarily change the sequence's
> relpersistence field and be done with it, more or less as attached.

Nope, after further testing that doesn't work at all.  In the first
place, the SET UNLOGGED case falls foul of some assertions in
the relcache code having to do with WAL-skipping; and that code looks
complex and fragile enough that I don't want to try to poke holes
in it.  In the second place, I'd forgotten the need to add or remove
an init fork.

> The only alternative I can see is to extend the ALTER TABLE ADD
> GENERATED syntax to allow correct specification of the sequence's
> LOGGED/UNLOGGED status to begin with.  While cleaner, that would be
> a lot more work and probably result in compatibility problems for
> normal uses of pg_dump (where the output might get loaded into a
> server version that lacks the syntax extension).

So we have to do it like that, and it seems not that bad, especially
if we follow the lead of the SEQUENCE NAME option and don't bother
to document this stuff.  (I don't think that's a great precedent,
but I didn't change it here.)

Looking at this draft patch a second time, one way that we could
reduce the surface area for compatibility problems is to emit these
new ADD GENERATED options only in binary-upgrade mode, and continue to
use ALTER SEQUENCE SET in standard dumps.  But I kind of think that
it's not worth the complication: sequences that don't match their
table's persistence must be quite rare, else we'd have heard about
this problem long ago.

            regards, tom lane

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index b37fd688d3..d49d53f35a 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1347,7 +1347,9 @@ init_params(ParseState *pstate, List *options, bool for_identity,
             /*
              * The parser allows this, but it is only for identity columns, in
              * which case it is filtered out in parse_utilcmd.c.  We only get
-             * here if someone puts it into a CREATE SEQUENCE.
+             * here if someone puts it into a CREATE SEQUENCE.  (The same is
+             * true for the equally-undocumented LOGGED and UNLOGGED options,
+             * but for those, the default error below seems sufficient.)
              */
             ereport(ERROR,
                     (errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 84cef57a70..4fcab9e3f1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4908,6 +4908,11 @@ SeqOptElem: AS SimpleTypename
                 {
                     $$ = makeDefElem("increment", (Node *) $3, @1);
                 }
+            | LOGGED
+                {
+                    /* not documented, only used by pg_dump */
+                    $$ = makeDefElem("logged", NULL, @1);
+                }
             | MAXVALUE NumericOnly
                 {
                     $$ = makeDefElem("maxvalue", (Node *) $2, @1);
@@ -4945,6 +4950,11 @@ SeqOptElem: AS SimpleTypename
                 {
                     $$ = makeDefElem("restart", (Node *) $3, @1);
                 }
+            | UNLOGGED
+                {
+                    /* not documented, only used by pg_dump */
+                    $$ = makeDefElem("unlogged", NULL, @1);
+                }
         ;

 opt_by:        BY
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 79cad4ab30..2cae834c2c 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -365,30 +365,20 @@ generateSerialExtraStmts(CreateStmtContext *cxt, ColumnDef *column,
 {
     ListCell   *option;
     DefElem    *nameEl = NULL;
+    DefElem    *loggedEl = NULL;
     Oid            snamespaceid;
     char       *snamespace;
     char       *sname;
     CreateSeqStmt *seqstmt;
     AlterSeqStmt *altseqstmt;
     List       *attnamelist;
-    int            nameEl_idx = -1;

     /* Make a copy of this as we may end up modifying it in the code below */
     seqoptions = list_copy(seqoptions);

     /*
-     * Determine namespace and name to use for the sequence.
-     *
-     * First, check if a sequence name was passed in as an option.  This is
-     * used by pg_dump.  Else, generate a name.
-     *
-     * Although we use ChooseRelationName, it's not guaranteed that the
-     * selected sequence name won't conflict; given sufficiently long field
-     * names, two different serial columns in the same table could be assigned
-     * the same sequence name, and we'd not notice since we aren't creating
-     * the sequence quite yet.  In practice this seems quite unlikely to be a
-     * problem, especially since few people would need two serial columns in
-     * one table.
+     * Check for special (undocumented) options used by pg_dump, and remove
+     * them from the seqoptions list if found.
      */
     foreach(option, seqoptions)
     {
@@ -399,12 +389,24 @@ generateSerialExtraStmts(CreateStmtContext *cxt, ColumnDef *column,
             if (nameEl)
                 errorConflictingDefElem(defel, cxt->pstate);
             nameEl = defel;
-            nameEl_idx = foreach_current_index(option);
+            seqoptions = foreach_delete_current(seqoptions, option);
+        }
+        else if (strcmp(defel->defname, "logged") == 0 ||
+                 strcmp(defel->defname, "unlogged") == 0)
+        {
+            if (loggedEl)
+                errorConflictingDefElem(defel, cxt->pstate);
+            loggedEl = defel;
+            seqoptions = foreach_delete_current(seqoptions, option);
         }
     }

+    /*
+     * Determine namespace and name to use for the sequence.
+     */
     if (nameEl)
     {
+        /* Use specified name */
         RangeVar   *rv = makeRangeVarFromNameList(castNode(List, nameEl->arg));

         snamespace = rv->schemaname;
@@ -418,11 +420,20 @@ generateSerialExtraStmts(CreateStmtContext *cxt, ColumnDef *column,
             snamespace = get_namespace_name(snamespaceid);
         }
         sname = rv->relname;
-        /* Remove the SEQUENCE NAME item from seqoptions */
-        seqoptions = list_delete_nth_cell(seqoptions, nameEl_idx);
     }
     else
     {
+        /*
+         * Generate a name.
+         *
+         * Although we use ChooseRelationName, it's not guaranteed that the
+         * selected sequence name won't conflict; given sufficiently long
+         * field names, two different serial columns in the same table could
+         * be assigned the same sequence name, and we'd not notice since we
+         * aren't creating the sequence quite yet.  In practice this seems
+         * quite unlikely to be a problem, especially since few people would
+         * need two serial columns in one table.
+         */
         if (cxt->rel)
             snamespaceid = RelationGetNamespace(cxt->rel);
         else
@@ -452,13 +463,22 @@ generateSerialExtraStmts(CreateStmtContext *cxt, ColumnDef *column,
     seqstmt->sequence = makeRangeVar(snamespace, sname, -1);

     /*
-     * Copy the persistence of the table.  For CREATE TABLE, we get the
-     * persistence from cxt->relation, which comes from the CreateStmt in
-     * progress.  For ALTER TABLE, the parser won't set
+     * Determine the persistence of the sequence.  If LOGGED or UNLOGGED was
+     * specified, use that; otherwise copy the persistence of the table.  For
+     * CREATE TABLE, we get the persistence from cxt->relation, which comes
+     * from the CreateStmt in progress.  For ALTER TABLE, the parser won't set
      * cxt->relation->relpersistence, but we have cxt->rel as the existing
      * table, so we copy the persistence from there.
      */
-    seqstmt->sequence->relpersistence = cxt->rel ? cxt->rel->rd_rel->relpersistence : cxt->relation->relpersistence;
+    if (loggedEl)
+    {
+        if (strcmp(loggedEl->defname, "logged") == 0)
+            seqstmt->sequence->relpersistence = RELPERSISTENCE_PERMANENT;
+        else
+            seqstmt->sequence->relpersistence = RELPERSISTENCE_UNLOGGED;
+    }
+    else
+        seqstmt->sequence->relpersistence = cxt->rel ? cxt->rel->rd_rel->relpersistence :
cxt->relation->relpersistence;

     seqstmt->options = seqoptions;

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 546e7e4ce1..d140d6067b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -17556,6 +17556,15 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
             appendPQExpBufferStr(query, "BY DEFAULT");
         appendPQExpBuffer(query, " AS IDENTITY (\n    SEQUENCE NAME %s\n",
                           fmtQualifiedDumpable(tbinfo));
+
+        /*
+         * Emit persistence option only if it's different from the owning
+         * table's.  This avoids using this new syntax unnecessarily.
+         */
+        if (tbinfo->relpersistence != owning_tab->relpersistence)
+            appendPQExpBuffer(query, "    %s\n",
+                              tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED ?
+                              "UNLOGGED" : "LOGGED");
     }
     else
     {
@@ -17588,15 +17597,7 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
                       seq->cache, (seq->cycled ? "\n    CYCLE" : ""));

     if (tbinfo->is_identity_sequence)
-    {
         appendPQExpBufferStr(query, "\n);\n");
-        if (tbinfo->relpersistence != owning_tab->relpersistence)
-            appendPQExpBuffer(query,
-                              "ALTER SEQUENCE %s SET %s;\n",
-                              fmtQualifiedDumpable(tbinfo),
-                              tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED ?
-                              "UNLOGGED" : "LOGGED");
-    }
     else
         appendPQExpBufferStr(query, ";\n");

diff --git a/src/test/regress/expected/identity.out b/src/test/regress/expected/identity.out
index 3d554fe327..f14bfccfb1 100644
--- a/src/test/regress/expected/identity.out
+++ b/src/test/regress/expected/identity.out
@@ -905,3 +905,19 @@ SELECT * FROM itest16;

 DROP TABLE itest15;
 DROP TABLE itest16;
+-- For testing of pg_dump and pg_upgrade, leave behind some identity
+-- sequences whose logged-ness doesn't match their owning table's.
+CREATE TABLE identity_dump_logged (a INT GENERATED ALWAYS AS IDENTITY);
+ALTER SEQUENCE identity_dump_logged_a_seq SET UNLOGGED;
+CREATE UNLOGGED TABLE identity_dump_unlogged (a INT GENERATED ALWAYS AS IDENTITY);
+ALTER SEQUENCE identity_dump_unlogged_a_seq SET LOGGED;
+SELECT relname, relpersistence FROM pg_class
+  WHERE relname ~ '^identity_dump_' ORDER BY 1;
+           relname            | relpersistence
+------------------------------+----------------
+ identity_dump_logged         | p
+ identity_dump_logged_a_seq   | u
+ identity_dump_unlogged       | u
+ identity_dump_unlogged_a_seq | p
+(4 rows)
+
diff --git a/src/test/regress/sql/identity.sql b/src/test/regress/sql/identity.sql
index 84c43a19a3..cb0e05a2f1 100644
--- a/src/test/regress/sql/identity.sql
+++ b/src/test/regress/sql/identity.sql
@@ -528,3 +528,12 @@ SELECT * FROM itest15;
 SELECT * FROM itest16;
 DROP TABLE itest15;
 DROP TABLE itest16;
+
+-- For testing of pg_dump and pg_upgrade, leave behind some identity
+-- sequences whose logged-ness doesn't match their owning table's.
+CREATE TABLE identity_dump_logged (a INT GENERATED ALWAYS AS IDENTITY);
+ALTER SEQUENCE identity_dump_logged_a_seq SET UNLOGGED;
+CREATE UNLOGGED TABLE identity_dump_unlogged (a INT GENERATED ALWAYS AS IDENTITY);
+ALTER SEQUENCE identity_dump_unlogged_a_seq SET LOGGED;
+SELECT relname, relpersistence FROM pg_class
+  WHERE relname ~ '^identity_dump_' ORDER BY 1;

I wrote:
>> The only alternative I can see is to extend the ALTER TABLE ADD
>> GENERATED syntax to allow correct specification of the sequence's
>> LOGGED/UNLOGGED status to begin with.  While cleaner, that would be
>> a lot more work and probably result in compatibility problems for
>> normal uses of pg_dump (where the output might get loaded into a
>> server version that lacks the syntax extension).

> So we have to do it like that, and it seems not that bad, especially
> if we follow the lead of the SEQUENCE NAME option and don't bother
> to document this stuff.  (I don't think that's a great precedent,
> but I didn't change it here.)

Pushed with a bit of further polishing, including adding that
missing documentation.

            regards, tom lane