Thread: Relations being opened without any lock whatever

Relations being opened without any lock whatever

From
Tom Lane
Date:
Running the regression tests with the patch I showed in
https://www.postgresql.org/message-id/16565.1538327894@sss.pgh.pa.us
exposes two places where HEAD is opening relations without having
any lock at all on them:

1. ALTER TABLE ... ALTER COLUMN TYPE, on a column that is part of
a foreign key constraint, opens the rel that has the other end of
the constraint before it's acquired a lock on said rel.

The comment in ATPostAlterTypeCleanup claims this is "safe because the
parser won't actually look at the catalogs to detect the existing entry",
but I think that's largely horsepucky.  The parser absolutely does do
relation_open, and it expects the caller to have gotten a lock sufficient
to protect that (cf transformAlterTableStmt).

It's possible that this doesn't have any real effect.  Since we're
already holding AccessExclusiveLock on our own end of the FK constraint,
it'd be impossible for another session to drop the FK constraint, or
by extension the other table.  Still, running a relcache load on a
table we have no lock on seems pretty unsafe, especially so in branches
before we used MVCC for catalog reads.  So I'm inclined to apply the
attached patch all the way back.  (The mentioned comment also needs
rewritten; this is just the minimal code change to get rid of the test
failure.)

2. pageinspect's tuple_data_split_internal supposes that it needs no
lock on the referenced table.  Perhaps there was an expectation that
some earlier function would have taken a lock and not released it,
but this is demonstrably not happening in the module's own regression
test.  I think we should just take AccessShareLock there and not try
to be cute.  Again, this seems to be back-patch material.

Thoughts, objections?

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9e60bb3..9e7ae73 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9905,6 +9905,7 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
         Form_pg_constraint con;
         Oid            relid;
         Oid            confrelid;
+        char        contype;
         bool        conislocal;

         tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(oldId));
@@ -9921,6 +9922,7 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
                 elog(ERROR, "could not identify relation associated with constraint %u", oldId);
         }
         confrelid = con->confrelid;
+        contype = con->contype;
         conislocal = con->conislocal;
         ReleaseSysCache(tup);

@@ -9936,6 +9938,15 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
         if (!conislocal)
             continue;

+        /*
+         * When rebuilding an FK constraint that references the table we're
+         * modifying, we might not yet have any lock on the FK's table, so get
+         * one now.  We'll need AccessExclusiveLock for the DROP CONSTRAINT
+         * step, so there's no value in asking for anything weaker.
+         */
+        if (relid != tab->relid && contype == CONSTRAINT_FOREIGN)
+            LockRelationOid(relid, AccessExclusiveLock);
+
         ATPostAlterTypeParse(oldId, relid, confrelid,
                              (char *) lfirst(def_item),
                              wqueue, lockmode, tab->rewrite);
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 7438257..513db4b 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -298,9 +298,8 @@ tuple_data_split_internal(Oid relid, char *tupdata,
     TupleDesc    tupdesc;
 
     /* Get tuple descriptor from relation OID */
-    rel = relation_open(relid, NoLock);
-    tupdesc = CreateTupleDescCopyConstr(rel->rd_att);
-    relation_close(rel, NoLock);
+    rel = relation_open(relid, AccessShareLock);
+    tupdesc = RelationGetDescr(rel);
 
     raw_attrs = initArrayResult(BYTEAOID, CurrentMemoryContext, false);
     nattrs = tupdesc->natts;
@@ -317,7 +316,6 @@ tuple_data_split_internal(Oid relid, char *tupdata,
         bytea       *attr_data = NULL;
 
         attr = TupleDescAttr(tupdesc, i);
-        is_null = (t_infomask & HEAP_HASNULL) && att_isnull(i, t_bits);
 
         /*
          * Tuple header can specify less attributes than tuple descriptor as
@@ -327,6 +325,8 @@ tuple_data_split_internal(Oid relid, char *tupdata,
          */
         if (i >= (t_infomask2 & HEAP_NATTS_MASK))
             is_null = true;
+        else
+            is_null = (t_infomask & HEAP_HASNULL) && att_isnull(i, t_bits);
 
         if (!is_null)
         {
@@ -385,6 +385,7 @@ tuple_data_split_internal(Oid relid, char *tupdata,
         ereport(ERROR,
                 (errcode(ERRCODE_DATA_CORRUPTED),
                  errmsg("end of tuple reached without looking at all its data")));
+    relation_close(rel, AccessShareLock);
 
     return makeArrayResult(raw_attrs, CurrentMemoryContext);
 }

Re: Relations being opened without any lock whatever

From
Michael Paquier
Date:
On Sun, Sep 30, 2018 at 03:20:44PM -0400, Tom Lane wrote:
> 1. ALTER TABLE ... ALTER COLUMN TYPE, on a column that is part of
> a foreign key constraint, opens the rel that has the other end of
> the constraint before it's acquired a lock on said rel.
>
> The comment in ATPostAlterTypeCleanup claims this is "safe because the
> parser won't actually look at the catalogs to detect the existing entry",
> but I think that's largely horsepucky.  The parser absolutely does do
> relation_open, and it expects the caller to have gotten a lock sufficient
> to protect that (cf transformAlterTableStmt).
>
> It's possible that this doesn't have any real effect.  Since we're
> already holding AccessExclusiveLock on our own end of the FK constraint,
> it'd be impossible for another session to drop the FK constraint, or
> by extension the other table.  Still, running a relcache load on a
> table we have no lock on seems pretty unsafe, especially so in branches
> before we used MVCC for catalog reads.  So I'm inclined to apply the
> attached patch all the way back.  (The mentioned comment also needs
> rewritten; this is just the minimal code change to get rid of the test
> failure.)

Okay, that's bad.  Wouldn't it be sufficient to use what the caller
passes out as lockmode instead of enforcing AEL though?

> 2. pageinspect's tuple_data_split_internal supposes that it needs no
> lock on the referenced table.  Perhaps there was an expectation that
> some earlier function would have taken a lock and not released it,
> but this is demonstrably not happening in the module's own regression
> test.  I think we should just take AccessShareLock there and not try
> to be cute.  Again, this seems to be back-patch material.

Yes, that's incorrect.  So +1 on this one.
--
Michael

Attachment

Re: Relations being opened without any lock whatever

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Okay, that's bad.  Wouldn't it be sufficient to use what the caller
> passes out as lockmode instead of enforcing AEL though?

No, because at the bottom of that function we're going to do a DROP
CONSTRAINT on the old FK constraint, and that needs AEL anyway.
If we tried to take a lesser lock first we'd just be creating a
lock-upgrade deadlock risk.

            regards, tom lane


Re: Relations being opened without any lock whatever

From
Amit Langote
Date:
On 2018/10/01 4:20, Tom Lane wrote:
> Running the regression tests with the patch I showed in
> https://www.postgresql.org/message-id/16565.1538327894@sss.pgh.pa.us
> exposes two places where HEAD is opening relations without having
> any lock at all on them:

Maybe you've noticed but the relation_open calls coming from bootstrap.c
all pass NoLock which trigger the WARNING:

$ initdb -D /tmp/foo
<snip>
WARNING:  relation_open: no lock held on pg_type
WARNING:  relation_open: no lock held on pg_attrdef
WARNING:  relation_open: no lock held on pg_constraint
WARNING:  relation_open: no lock held on pg_inherits
WARNING:  relation_open: no lock held on pg_index
WARNING:  relation_open: no lock held on pg_operator
WARNING:  relation_open: no lock held on pg_opfamily
<so on>

Do we need to do something about that, like teaching boot_openrel() and
gettype() in bootstrap.c to pass AccessShareLock instead of NoLock?

Thanks,
Amit



Re: Relations being opened without any lock whatever

From
Tom Lane
Date:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2018/10/01 4:20, Tom Lane wrote:
>> Running the regression tests with the patch I showed in
>> https://www.postgresql.org/message-id/16565.1538327894@sss.pgh.pa.us

> Maybe you've noticed but the relation_open calls coming from bootstrap.c
> all pass NoLock which trigger the WARNING:

Yeah, I'd missed noticing that at the time I posted that patch, but I
sure noticed after changing the WARNING to an Assert ;-)

> Do we need to do something about that, like teaching boot_openrel() and
> gettype() in bootstrap.c to pass AccessShareLock instead of NoLock?

No, bootstrap mode has no need for locking.  I think the right fix is
just to skip the check:
 
+    /*
+     * If we didn't get the lock ourselves, assert that caller holds one,
+     * except in bootstrap mode where no locks are used.
+     */
+    Assert(lockmode != NoLock ||
+           IsBootstrapProcessingMode() ||
+           CheckRelationLockedByMe(r, AccessShareLock, true));

It's possible that at some point we'd decide to make bootstrap mode
do locking the same as normal mode, but that's not a change I want
to make as part of this patch.

            regards, tom lane