Thread: ALTER TABLE ALTER COLUMN SET TYPE crash

ALTER TABLE ALTER COLUMN SET TYPE crash

From
Robins Tharakan
Date:
Hi,

Unlike a recently reported similar issue, executing the following ALTER TABLE on the regression database crashes Postgres (master).

Admittedly it doesn't do anything constructive (and am new to the tool), but do let me know if such reports are interesting and / or if you need more details for reproduction.

Steps to reproduce:
==================

ubuntu@laptop:~$ dropdb --if-exists tempdel && createdb tempdel
ubuntu@laptop:~$ ./pg_regress --use-existing --schedule=serial_schedule --dbname=tempdel >/dev/null
ubuntu@laptop:~$ sleep 5 && psql -c "SELECT version();" tempdel
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)

ubuntu@laptop:~$ psql -c "begin; ALTER TABLE ONLY public.xacttest ALTER COLUMN a SET DATA TYPE  TEXT  USING public.max_xacttest(); ROLLBACK;" tempdel
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
ubuntu@laptop:~$ sleep 5 && psql -c "SELECT 1;" tempdel
 ?column?
----------
        1
(1 row)

-
robins tharakan

Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Michael Paquier
Date:
On Tue, Aug 25, 2020 at 02:14:06PM +1000, Robins Tharakan wrote:
> Unlike a recently reported similar issue, executing the following ALTER
> TABLE on the regression database crashes Postgres (master).
>
> Admittedly it doesn't do anything constructive (and am new to the tool),
> but do let me know if such reports are interesting and / or if you need
> more details for reproduction.

Such reports are constructive!  I can reproduce the crash here down to
9.5.  From what I can see, the problem comes from ATRewriteTable() ->
ExecEvalExpr() when we evaluate expressions with inputs coming from
the old tuple.  It looks like a memory corruption issue or a context
issue at quick glance, and I cannot get a clean backtrace.
--
Michael

Attachment

Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Bruce Momjian
Date:
On Tue, Aug 25, 2020 at 03:05:11PM +0900, Michael Paquier wrote:
> On Tue, Aug 25, 2020 at 02:14:06PM +1000, Robins Tharakan wrote:
> > Unlike a recently reported similar issue, executing the following ALTER
> > TABLE on the regression database crashes Postgres (master).
> > 
> > Admittedly it doesn't do anything constructive (and am new to the tool),
> > but do let me know if such reports are interesting and / or if you need
> > more details for reproduction.
> 
> Such reports are constructive!  I can reproduce the crash here down to
> 9.5.  From what I can see, the problem comes from ATRewriteTable() ->
> ExecEvalExpr() when we evaluate expressions with inputs coming from
> the old tuple.  It looks like a memory corruption issue or a context
> issue at quick glance, and I cannot get a clean backtrace.

OK, I have generated a attached backtrace by manually walking down
through the stack using break points.  Notice len2 on the top line:

    #0  varstr_cmp (arg1=0x55e0ef64262c "\252\061\306BT\026", len1=21,
    arg2=0x7f274db17ebc "-\317\303=T\026", len2=-4, collid=100) at
    varlena.c:1633                         -------

-4 is not a good len.  I will do more research to see why that -4 len is
happening.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Tue, Aug 25, 2020 at 03:05:11PM +0900, Michael Paquier wrote:
>> On Tue, Aug 25, 2020 at 02:14:06PM +1000, Robins Tharakan wrote:
>>> Unlike a recently reported similar issue, executing the following ALTER
>>> TABLE on the regression database crashes Postgres (master).

>> Such reports are constructive!  I can reproduce the crash here down to
>> 9.5.  From what I can see, the problem comes from ATRewriteTable() ->
>> ExecEvalExpr() when we evaluate expressions with inputs coming from
>> the old tuple.  It looks like a memory corruption issue or a context
>> issue at quick glance, and I cannot get a clean backtrace.

I think the nature of the problem (and Robins' other report too) is pretty
clear.  We have a SQL or plpgsql function that's trying to access a table
that is inconsistent during an ALTER TABLE operation.  The function would
be locked out from seeing that transient state if it were in another
session, thanks to normal locking rules; but the lock acquisition rules
don't prevent same-session accesses.

ALTER TABLE and related utility commands contain guards against the
reverse form of this problem: CheckTableNotInUse will bitch if there's
some already-active outer query referencing the table.  But we haven't
thought about the possibility of one of these commands executing
user-defined code midway through.

            regards, tom lane



Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Bruce Momjian
Date:
On Tue, Aug 25, 2020 at 12:14:20PM -0400, Tom Lane wrote:
> I think the nature of the problem (and Robins' other report too) is pretty
> clear.  We have a SQL or plpgsql function that's trying to access a table
> that is inconsistent during an ALTER TABLE operation.  The function would
> be locked out from seeing that transient state if it were in another
> session, thanks to normal locking rules; but the lock acquisition rules
> don't prevent same-session accesses.
> 
> ALTER TABLE and related utility commands contain guards against the
> reverse form of this problem: CheckTableNotInUse will bitch if there's
> some already-active outer query referencing the table.  But we haven't
> thought about the possibility of one of these commands executing
> user-defined code midway through.

That does explain why it fails on the _second_ time through those
lower-level functions.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Robins Tharakan
Date:

On Tue, 25 Aug 2020 at 16:05, Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Aug 25, 2020 at 02:14:06PM +1000, Robins Tharakan wrote:
> Admittedly it doesn't do anything constructive (and am new to the tool),
> but do let me know if such reports are interesting and / or if you need
> more details for reproduction.

Such reports are constructive!   
--
Michael

Thanks all for picking this up promptly and Michael for confirming the need. I'll try to add support for more DDLs and see if it can throw up similar corner-cases.

For now though, I need to see how to exclude some functions. The above use-case has pretty much rendered my test-setup useless - since it easily triggers engine restarts every few minutes.
-
robins

Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Aug 25, 2020 at 01:35:11PM -0400, Bruce Momjian wrote:
>> On Tue, Aug 25, 2020 at 12:14:20PM -0400, Tom Lane wrote:
>>> I think the nature of the problem (and Robins' other report too) is pretty
>>> clear.  We have a SQL or plpgsql function that's trying to access a table
>>> that is inconsistent during an ALTER TABLE operation.  The function would
>>> be locked out from seeing that transient state if it were in another
>>> session, thanks to normal locking rules; but the lock acquisition rules
>>> don't prevent same-session accesses.

> There are already some safeguards to prevent directly the use of
> aggregates in USING, and here we have a function that itself calls an
> aggregate on the table.

That's an independent issue though.  It stems mostly from not wanting
to use the full-scale planner or executor for subexpressions of utility
commands.  (Of course, the PL function handler does so internally, but
that's its problem.)

The core issue here is that the table's catalog entries, as visible within
this transaction, don't match what's in its disk file.  ALTER TABLE knows
that and is careful not to touch the old table using the new rowtype ---
but nothing else knows that.  So I think we need to block other code
from touching the table-under-modification.  As you say, there's not
any existing infrastructure that will serve for that directly.  We might
be able to invent something comparable to the existing relcache entry
refcount, but counting exclusive opens ("there can be only one").

            regards, tom lane



Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Tom Lane
Date:
I wrote:
> The core issue here is that the table's catalog entries, as visible within
> this transaction, don't match what's in its disk file.  ALTER TABLE knows
> that and is careful not to touch the old table using the new rowtype ---
> but nothing else knows that.  So I think we need to block other code
> from touching the table-under-modification.  As you say, there's not
> any existing infrastructure that will serve for that directly.  We might
> be able to invent something comparable to the existing relcache entry
> refcount, but counting exclusive opens ("there can be only one").

I initially tried to fix this with a hard restriction that you can't open
a relation that's marked as being exclusively accessed, as in the 0001
patch below.  That crashed and burned most spectacularly.  To understand
the second attempt, it's helpful to enumerate some of the problems:

1. We have felt free to allow ALTER TABLE and similar commands to call
subroutines that re-open the target relation.  Refactoring to the point
where that wouldn't happen (by means of always passing down an open
Relation) seems entirely impractical.

2. Keeping the exclusive-access state in relcache entries, as I first
tried to do, simply doesn't work, because ALTER TABLE doesn't
hold the relcache entry open throughout --- for instance, the initial
pin is released just after Phase 1 in ATController.  I did try not
doing that, but it breaks other things.  Besides I'm pretty sure that
that's designed that way intentionally, to reduce the number of forced
rebuilds of the relcache entry.

3. We can't have CheckTableNotInUse enforcing this either, because trying
to do so fails in numerous situation.  ALTER TABLE itself is recursive to
some extent, and it turns out that most of the other callers neither need
nor want any such restriction.  For instance there's actually a regression
test that fails if TRUNCATE tries to lock out accesses by called code.

In view of point 1, I have set this up so that we only check for
disallowed re-entrancy in the parse/rewrite/plan/execute code path.
(The checks added there are in the places where we first acquire lock
on any given relation.)  Perhaps there are other places where we need
to check, but this at least plugs the primary hole.

In view of point 2, there's a new separate hashtable that holds the
exclusive-marking state.  The extra cycles to probe it are slightly
annoying, but in the big scheme of things I doubt it's measurable.
(In any case, in a typical session that has never done ALTER TABLE,
no lookups are required.)

As for point 3, right now only ALTER TABLE is applying such marking
at all, and it's careful to release it as soon as consistency has
been restored.  I looked through the other CheckTableNotInUse
callers and tentatively concluded that none of the other ones have
a problem; but more eyeballs on that would be a good thing.

Anyway, 0001 is just for amusement's sake; 0002 is the proposed patch.

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d2b15a3387..9f3f200e7b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3594,6 +3594,10 @@ CheckTableNotInUse(Relation rel, const char *stmt)
         /* translator: first %s is a SQL command, eg ALTER TABLE */
                  errmsg("cannot %s \"%s\" because it has pending trigger events",
                         stmt, RelationGetRelationName(rel))));
+
+    /* Mark the relation to disallow re-entrant access */
+    Assert(rel->rd_exclusive == NULL);
+    rel->rd_exclusive = stmt;
 }

 /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index a2453cf1f4..beff48accf 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1095,6 +1095,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
      * new relations, it won't be new.  It could be temp though.
      */
     relation->rd_refcnt = 0;
+    relation->rd_exclusive = NULL;
     relation->rd_isnailed = false;
     relation->rd_createSubid = InvalidSubTransactionId;
     relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
@@ -2069,6 +2070,9 @@ RelationIdGetRelation(Oid relationId)
 void
 RelationIncrementReferenceCount(Relation rel)
 {
+    if (rel->rd_exclusive)
+        elog(ERROR, "relation %s already in use by %s",
+             RelationGetRelationName(rel), rel->rd_exclusive);
     ResourceOwnerEnlargeRelationRefs(CurrentResourceOwner);
     rel->rd_refcnt += 1;
     if (!IsBootstrapProcessingMode())
@@ -2086,6 +2090,8 @@ RelationDecrementReferenceCount(Relation rel)
     rel->rd_refcnt -= 1;
     if (!IsBootstrapProcessingMode())
         ResourceOwnerForgetRelationRef(CurrentResourceOwner, rel);
+    if (rel->rd_refcnt == 0)
+        rel->rd_exclusive = NULL;
 }

 /*
@@ -2617,8 +2623,9 @@ RelationClearRelation(Relation relation, bool rebuild)

         /* rd_smgr must not be swapped, due to back-links from smgr level */
         SWAPFIELD(SMgrRelation, rd_smgr);
-        /* rd_refcnt must be preserved */
+        /* rd_refcnt and rd_exclusive must be preserved */
         SWAPFIELD(int, rd_refcnt);
+        SWAPFIELD(const char *, rd_exclusive);
         /* isnailed shouldn't change */
         Assert(newrel->rd_isnailed == relation->rd_isnailed);
         /* creation sub-XIDs must be preserved */
@@ -5917,6 +5924,7 @@ load_relcache_init_file(bool shared)
             rel->rd_refcnt = 1;
         else
             rel->rd_refcnt = 0;
+        rel->rd_exclusive = NULL;
         rel->rd_indexvalid = false;
         rel->rd_indexlist = NIL;
         rel->rd_pkindex = InvalidOid;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 0b5957ba02..22cf7d4e70 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -57,6 +57,7 @@ typedef struct RelationData
     struct SMgrRelationData *rd_smgr;    /* cached file handle, or NULL */
     int            rd_refcnt;        /* reference count */
     BackendId    rd_backend;        /* owning backend id, if temporary relation */
+    const char *rd_exclusive;    /* if not NULL, disallow re-entrant access */
     bool        rd_islocaltemp; /* rel is a temp rel of this session */
     bool        rd_isnailed;    /* rel is nailed in cache */
     bool        rd_isvalid;        /* relcache entry is valid */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d2b15a3387..ddcbf88c38 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3571,6 +3571,9 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
  * we are worried about active indexscans on the index.  The trigger-event
  * check can be skipped, since we are doing no damage to the parent table.
  *
+ * Callers should consider also calling RelationMarkExclusiveAccess.
+ * We don't force that, because for many it's safe not to require it.
+ *
  * The statement name (eg, "ALTER TABLE") is passed for use in error messages.
  */
 void
@@ -3594,6 +3597,11 @@ CheckTableNotInUse(Relation rel, const char *stmt)
         /* translator: first %s is a SQL command, eg ALTER TABLE */
                  errmsg("cannot %s \"%s\" because it has pending trigger events",
                         stmt, RelationGetRelationName(rel))));
+
+    /*
+     * Ideally, we'd also do RelationCheckExclusiveAccess(rel) here, but
+     * currently ALTER TABLE can recurse in ways that would trigger an error.
+     */
 }

 /*
@@ -4001,22 +4009,45 @@ ATController(AlterTableStmt *parsetree,
     List       *wqueue = NIL;
     ListCell   *lcmd;

-    /* Phase 1: preliminary examination of commands, create work queue */
-    foreach(lcmd, cmds)
+    /*
+     * Tables are marked for exclusive access when they are added to the work
+     * queue.  We must be certain that the marks are released on exit, so use
+     * a PG_TRY block.  (If we get through without error, ATRewriteTables will
+     * have released the marks, so we needn't do so again.)
+     */
+    PG_TRY();
     {
-        AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
+        /* Phase 1: preliminary examination of commands, create work queue */
+        foreach(lcmd, cmds)
+        {
+            AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);

-        ATPrepCmd(&wqueue, rel, cmd, recurse, false, lockmode, context);
-    }
+            ATPrepCmd(&wqueue, rel, cmd, recurse, false, lockmode, context);
+        }

-    /* Close the relation, but keep lock until commit */
-    relation_close(rel, NoLock);
+        /* Close the relation, but keep lock until commit */
+        relation_close(rel, NoLock);
+
+        /* Phase 2: update system catalogs */
+        ATRewriteCatalogs(&wqueue, lockmode, context);
+
+        /* Phase 3: scan/rewrite tables as needed, and run afterStmts */
+        ATRewriteTables(parsetree, &wqueue, lockmode, context);
+    }
+    PG_CATCH();
+    {
+        /* Release exclusive-access marks after an error */
+        ListCell   *ltab;

-    /* Phase 2: update system catalogs */
-    ATRewriteCatalogs(&wqueue, lockmode, context);
+        foreach(ltab, wqueue)
+        {
+            AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);

-    /* Phase 3: scan/rewrite tables as needed, and run afterStmts */
-    ATRewriteTables(parsetree, &wqueue, lockmode, context);
+            RelationReleaseExclusiveAccess(tab->relid);
+        }
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
 }

 /*
@@ -5064,6 +5095,19 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
         }
     }

+    /*
+     * At this point, the table(s) are self-consistent again, so it's safe to
+     * release our exclusive-access marks.  We must do so, in fact, because
+     * foreign key revalidation will use the executor which would fail;
+     * likewise, any afterStmts could fail.
+     */
+    foreach(ltab, *wqueue)
+    {
+        AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
+
+        RelationReleaseExclusiveAccess(tab->relid);
+    }
+
     /*
      * Foreign key constraints are checked in a final pass, since (a) it's
      * generally best to examine each one separately, and (b) it's at least
@@ -5536,6 +5580,14 @@ ATGetQueueEntry(List **wqueue, Relation rel)

     *wqueue = lappend(*wqueue, tab);

+    /*
+     * Mark the table to disallow re-entrant access by any user-defined
+     * queries that might be called from ALTER TABLE actions.  Now that the
+     * table is entered in the work queue, we can be sure that the marking
+     * will be released on error exit.
+     */
+    RelationMarkExclusiveAccess(relid, "ALTER TABLE");
+
     return tab;
 }

diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 3132fd35a5..0bb4c50c96 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -181,9 +181,15 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
             RangeTblEntry *childrte;
             Index        childRTindex;

-            /* Open rel if needed; we already have required locks */
+            /*
+             * Open rel if needed.  We already have required locks, but for
+             * child tables, we must also check for exclusive-access marks.
+             */
             if (childOID != parentOID)
+            {
                 newrelation = table_open(childOID, NoLock);
+                RelationCheckExclusiveAccess(newrelation);
+            }
             else
                 newrelation = oldrelation;

@@ -360,6 +366,9 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
         /* Open rel, acquiring required locks */
         childrel = table_open(childOID, lockmode);

+        /* Check for exclusive-access marks, too */
+        RelationCheckExclusiveAccess(childrel);
+
         /*
          * Temporary partitions belonging to other sessions should have been
          * disallowed at definition, but for paranoia's sake, let's double
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index b875a50646..7e6416bc79 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1336,6 +1336,8 @@ buildNSItemFromLists(RangeTblEntry *rte, Index rtindex,
  * This is essentially just the same as table_openrv(), except that it caters
  * to some parser-specific error reporting needs, notably that it arranges
  * to include the RangeVar's parse location in any resulting error.
+ * Also, we check to see if we're inside some command that has marked the
+ * table for exclusive access.
  *
  * Note: properly, lockmode should be declared LOCKMODE not int, but that
  * would require importing storage/lock.h into parse_relation.h.  Since
@@ -1379,6 +1381,14 @@ parserOpenTable(ParseState *pstate, const RangeVar *relation, int lockmode)
                                 relation->relname)));
         }
     }
+
+    /*
+     * Verify that table isn't being munged by, eg, ALTER TABLE.  Our lock
+     * ensures that that's not happening in other sessions, but it doesn't
+     * protect against an ALTER in our own session.
+     */
+    RelationCheckExclusiveAccess(rel);
+
     cancel_parser_errposition_callback(&pcbstate);
     return rel;
 }
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index fe777c3103..356c1c9e1a 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -182,6 +182,14 @@ AcquireRewriteLocks(Query *parsetree,

                 rel = table_open(rte->relid, lockmode);

+                /*
+                 * Verify that table isn't being munged by, eg, ALTER TABLE.
+                 * Our lock ensures that that's not happening in other
+                 * sessions, but it doesn't protect against an ALTER in our
+                 * own session.
+                 */
+                RelationCheckExclusiveAccess(rel);
+
                 /*
                  * While we have the relation open, update the RTE's relkind,
                  * just in case it changed since this rule was made.
@@ -3051,6 +3059,11 @@ rewriteTargetView(Query *parsetree, Relation view)
      */
     base_rel = table_open(base_rte->relid, RowExclusiveLock);

+    /*
+     * Verify that table isn't being munged by, eg, ALTER TABLE.
+     */
+    RelationCheckExclusiveAccess(base_rel);
+
     /*
      * While we have the relation open, update the RTE's relkind, just in case
      * it changed since this view was made (cf. AcquireRewriteLocks).
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 50d6ad28b4..fe47bab8f3 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1779,7 +1779,11 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
              * acquire a non-conflicting lock.
              */
             if (acquire)
+            {
                 LockRelationOid(rte->relid, rte->rellockmode);
+                /* ... and check for exclusive-access marks */
+                RelidCheckExclusiveAccess(rte->relid);
+            }
             else
                 UnlockRelationOid(rte->relid, rte->rellockmode);
         }
@@ -1839,7 +1843,11 @@ ScanQueryForLocks(Query *parsetree, bool acquire)
             case RTE_RELATION:
                 /* Acquire or release the appropriate type of lock */
                 if (acquire)
+                {
                     LockRelationOid(rte->relid, rte->rellockmode);
+                    /* ... and check for exclusive-access marks */
+                    RelidCheckExclusiveAccess(rte->relid);
+                }
                 else
                     UnlockRelationOid(rte->relid, rte->rellockmode);
                 break;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index a2453cf1f4..6fada587f7 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -182,6 +182,17 @@ static TupleDesc *EOXactTupleDescArray;
 static int    NextEOXactTupleDescNum = 0;
 static int    EOXactTupleDescArrayLen = 0;

+/*
+ * Table of relation OIDs that are marked for exclusive access.
+ */
+typedef struct
+{
+    Oid            reloid;            /* lookup key: OID of marked relation */
+    const char *stmt;            /* statement name of marking code */
+} ExclusiveMarkEnt;
+
+static HTAB *ExclusiveMarkTab;
+
 /*
  *        macros to manipulate the lookup hashtable
  */
@@ -6397,3 +6408,130 @@ unlink_initfile(const char *initfilename, int elevel)
                             initfilename)));
     }
 }
+
+
+/*
+ * RelationMarkExclusiveAccess
+ *        Mark relation as not being available for re-entrant access.
+ *
+ * This function and its siblings allow us to prevent access to a table
+ * that's being modified by other code in the same session.  Taking
+ * AccessExclusiveLock prevents *other* sessions from touching the table,
+ * but not our own session.  Therefore, logic that might call arbitrary code
+ * while making critical changes to a table needs to use this to ensure that
+ * the table is not accessed by user code while it's in an inconsistent state.
+ *
+ * The marking persists until RelationReleaseExclusiveAccess is called.
+ * It is caller's responsibility to be sure that happens even after an
+ * error; there is no transaction-related cleanup logic!
+ *
+ * "stmt" should be a literal constant string, or otherwise guaranteed to
+ * remain stable until exclusive access is released.
+ */
+void
+RelationMarkExclusiveAccess(Oid relid, const char *stmt)
+{
+    ExclusiveMarkEnt *entry;
+
+    if (ExclusiveMarkTab == NULL)
+    {
+        /* First time through: initialize the hash table */
+        HASHCTL        ctl;
+
+        MemSet(&ctl, 0, sizeof(ctl));
+        ctl.keysize = sizeof(Oid);
+        ctl.entrysize = sizeof(ExclusiveMarkEnt);
+        ExclusiveMarkTab = hash_create("Exclusive access table", 16,
+                                       &ctl, HASH_ELEM | HASH_BLOBS);
+    }
+
+    /*
+     * Create an entry.  We don't worry about whether there already was one.
+     */
+    entry = (ExclusiveMarkEnt *) hash_search(ExclusiveMarkTab,
+                                             (void *) &relid,
+                                             HASH_ENTER, NULL);
+    entry->stmt = stmt;
+}
+
+/*
+ * RelationReleaseExclusiveAccess
+ *        Release exclusive-access marking of relation.
+ *
+ * It's not an error for the relation not to be marked.
+ */
+void
+RelationReleaseExclusiveAccess(Oid relid)
+{
+    /* Nothing to do if no table */
+    if (ExclusiveMarkTab == NULL)
+        return;
+    /* Delete entry, if any */
+    (void) hash_search(ExclusiveMarkTab,
+                       (void *) &relid,
+                       HASH_REMOVE, NULL);
+
+}
+
+/*
+ * RelationCheckExclusiveAccess
+ *        Throws error if relation is marked for exclusive access.
+ */
+void
+RelationCheckExclusiveAccess(Relation rel)
+{
+    ExclusiveMarkEnt *entry;
+
+    /* Nothing to do if no table */
+    if (ExclusiveMarkTab == NULL)
+        return;
+    /* Is rel in the table? */
+    entry = (ExclusiveMarkEnt *) hash_search(ExclusiveMarkTab,
+                                             (void *) &(rel->rd_id),
+                                             HASH_FIND, NULL);
+    if (entry)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_IN_USE),
+        /* translator: second %s is a SQL command, eg ALTER TABLE */
+                 errmsg("relation \"%s\" is already in use by %s",
+                        RelationGetRelationName(rel), entry->stmt)));
+}
+
+/*
+ * RelidCheckExclusiveAccess
+ *        As above, but we only have a relation OID.
+ */
+void
+RelidCheckExclusiveAccess(Oid relid)
+{
+    ExclusiveMarkEnt *entry;
+
+    /* Nothing to do if no table */
+    if (ExclusiveMarkTab == NULL)
+        return;
+    /* Is rel in the table? */
+    entry = (ExclusiveMarkEnt *) hash_search(ExclusiveMarkTab,
+                                             (void *) &relid,
+                                             HASH_FIND, NULL);
+    if (entry)
+    {
+        /*
+         * If the OID is in the mark table, it seems reasonably safe to assume
+         * that get_rel_name will succeed.  But just in case ...
+         */
+        char       *relname = get_rel_name(relid);
+
+        if (relname)
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_IN_USE),
+            /* translator: second %s is a SQL command, eg ALTER TABLE */
+                     errmsg("relation \"%s\" is already in use by %s",
+                            relname, entry->stmt)));
+        else
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_IN_USE),
+            /* translator: %s is a SQL command, eg ALTER TABLE */
+                     errmsg("relation with OID %u is already in use by %s",
+                            relid, entry->stmt)));
+    }
+}
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 0b5957ba02..0887b53a38 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -640,5 +640,9 @@ typedef struct ViewOptions
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);
 extern void RelationDecrementReferenceCount(Relation rel);
+extern void RelationMarkExclusiveAccess(Oid relid, const char *stmt);
+extern void RelationReleaseExclusiveAccess(Oid relid);
+extern void RelationCheckExclusiveAccess(Relation rel);
+extern void RelidCheckExclusiveAccess(Oid relid);

 #endif                            /* REL_H */
diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 1b03310029..9ef09c3a15 100644
--- a/src/test/regress/expected/transactions.out
+++ b/src/test/regress/expected/transactions.out
@@ -534,6 +534,14 @@ select * from xacttest;
 (5 rows)

 rollback;
+-- test case for trying to access a table within an ALTER TABLE on it
+alter table xacttest alter column a set data type text
+  using max_xacttest()::text;
+ERROR:  relation "xacttest" is already in use by ALTER TABLE
+LINE 1: SELECT max(a) from xacttest
+                           ^
+QUERY:  SELECT max(a) from xacttest
+CONTEXT:  PL/pgSQL function max_xacttest() line 1 at RETURN
 -- test case for problems with dropping an open relation during abort
 BEGIN;
     savepoint x;
diff --git a/src/test/regress/sql/transactions.sql b/src/test/regress/sql/transactions.sql
index bf1016489d..1ca9b335aa 100644
--- a/src/test/regress/sql/transactions.sql
+++ b/src/test/regress/sql/transactions.sql
@@ -316,6 +316,10 @@ select * from xacttest;
 rollback;


+-- test case for trying to access a table within an ALTER TABLE on it
+alter table xacttest alter column a set data type text
+  using max_xacttest()::text;
+
 -- test case for problems with dropping an open relation during abort
 BEGIN;
     savepoint x;

Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From
Alvaro Herrera
Date:
On 2020-Aug-26, Robins Tharakan wrote:

> For now though, I need to see how to exclude some functions. The above
> use-case has pretty much rendered my test-setup useless - since it easily
> triggers engine restarts every few minutes.

An easy workaround from Postgres POV might be to patch it to return an
error on ATExecAlterColumnType or so.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services