Relations being opened without any lock whatever - Mailing list pgsql-hackers

From Tom Lane
Subject Relations being opened without any lock whatever
Date
Msg-id 2038.1538335244@sss.pgh.pa.us
Whole thread Raw
Responses Re: Relations being opened without any lock whatever  (Michael Paquier <michael@paquier.xyz>)
Re: Relations being opened without any lock whatever  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
Running the regression tests with the patch I showed in
https://www.postgresql.org/message-id/16565.1538327894@sss.pgh.pa.us
exposes two places where HEAD is opening relations without having
any lock at all on them:

1. ALTER TABLE ... ALTER COLUMN TYPE, on a column that is part of
a foreign key constraint, opens the rel that has the other end of
the constraint before it's acquired a lock on said rel.

The comment in ATPostAlterTypeCleanup claims this is "safe because the
parser won't actually look at the catalogs to detect the existing entry",
but I think that's largely horsepucky.  The parser absolutely does do
relation_open, and it expects the caller to have gotten a lock sufficient
to protect that (cf transformAlterTableStmt).

It's possible that this doesn't have any real effect.  Since we're
already holding AccessExclusiveLock on our own end of the FK constraint,
it'd be impossible for another session to drop the FK constraint, or
by extension the other table.  Still, running a relcache load on a
table we have no lock on seems pretty unsafe, especially so in branches
before we used MVCC for catalog reads.  So I'm inclined to apply the
attached patch all the way back.  (The mentioned comment also needs
rewritten; this is just the minimal code change to get rid of the test
failure.)

2. pageinspect's tuple_data_split_internal supposes that it needs no
lock on the referenced table.  Perhaps there was an expectation that
some earlier function would have taken a lock and not released it,
but this is demonstrably not happening in the module's own regression
test.  I think we should just take AccessShareLock there and not try
to be cute.  Again, this seems to be back-patch material.

Thoughts, objections?

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9e60bb3..9e7ae73 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9905,6 +9905,7 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
         Form_pg_constraint con;
         Oid            relid;
         Oid            confrelid;
+        char        contype;
         bool        conislocal;

         tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(oldId));
@@ -9921,6 +9922,7 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
                 elog(ERROR, "could not identify relation associated with constraint %u", oldId);
         }
         confrelid = con->confrelid;
+        contype = con->contype;
         conislocal = con->conislocal;
         ReleaseSysCache(tup);

@@ -9936,6 +9938,15 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
         if (!conislocal)
             continue;

+        /*
+         * When rebuilding an FK constraint that references the table we're
+         * modifying, we might not yet have any lock on the FK's table, so get
+         * one now.  We'll need AccessExclusiveLock for the DROP CONSTRAINT
+         * step, so there's no value in asking for anything weaker.
+         */
+        if (relid != tab->relid && contype == CONSTRAINT_FOREIGN)
+            LockRelationOid(relid, AccessExclusiveLock);
+
         ATPostAlterTypeParse(oldId, relid, confrelid,
                              (char *) lfirst(def_item),
                              wqueue, lockmode, tab->rewrite);
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 7438257..513db4b 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -298,9 +298,8 @@ tuple_data_split_internal(Oid relid, char *tupdata,
     TupleDesc    tupdesc;
 
     /* Get tuple descriptor from relation OID */
-    rel = relation_open(relid, NoLock);
-    tupdesc = CreateTupleDescCopyConstr(rel->rd_att);
-    relation_close(rel, NoLock);
+    rel = relation_open(relid, AccessShareLock);
+    tupdesc = RelationGetDescr(rel);
 
     raw_attrs = initArrayResult(BYTEAOID, CurrentMemoryContext, false);
     nattrs = tupdesc->natts;
@@ -317,7 +316,6 @@ tuple_data_split_internal(Oid relid, char *tupdata,
         bytea       *attr_data = NULL;
 
         attr = TupleDescAttr(tupdesc, i);
-        is_null = (t_infomask & HEAP_HASNULL) && att_isnull(i, t_bits);
 
         /*
          * Tuple header can specify less attributes than tuple descriptor as
@@ -327,6 +325,8 @@ tuple_data_split_internal(Oid relid, char *tupdata,
          */
         if (i >= (t_infomask2 & HEAP_NATTS_MASK))
             is_null = true;
+        else
+            is_null = (t_infomask & HEAP_HASNULL) && att_isnull(i, t_bits);
 
         if (!is_null)
         {
@@ -385,6 +385,7 @@ tuple_data_split_internal(Oid relid, char *tupdata,
         ereport(ERROR,
                 (errcode(ERRCODE_DATA_CORRUPTED),
                  errmsg("end of tuple reached without looking at all its data")));
+    relation_close(rel, AccessShareLock);
 
     return makeArrayResult(raw_attrs, CurrentMemoryContext);
 }

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: executor relation handling
Next
From: Christoph Moench-Tegeder
Date:
Subject: Function for listing archive_status directory