Thread: Re: Dangling operator family after DROP TYPE

Re: Dangling operator family after DROP TYPE

From
Tom Lane
Date:
Yoran Heling <contact@yorhel.nl> writes:
> pg_upgrade was failing on one of my databases and, while digging a bit,
> I found that the database in question contained a dangling operator
> family for a type that didn't exist anymore.

> I've attached a script to recreate this situation: creating a new type,
> defining an operator class for it and then dropping the type causes the
> implicitly created operator family to remain.

Thanks for the report.  I don't think it's wrong for the now-empty
operator family to stick around: it has no direct dependency on the
dropped type.  Also, trying to make it go away would cause problems
if another operator class for another type had been added to the
family meanwhile.  However, these things are bad:

> Attempting to drop this operator family results in an error. Attempting
> to do a dump/restore results in a syntax error on restore.

The problem seems to be that the pg_amproc entry for the opclass'
function 4 doesn't get dropped.  Examining the pre-drop pg_depend
entries for the opclass and opfamily, we find

# select objid, pg_describe_object(classid,objid,objsubid) as obj, pg_describe_object(refclassid,refobjid,refobjsubid)
asref, deptype from pg_depend where ... 
 objid |                                             obj                                             |
      ref                         | deptype  

-------+---------------------------------------------------------------------------------------------+-----------------------------------------------------+---------
 ...
 45105 | operator family t_btree_ops for access method btree                                         | schema public
                                  | n 
 45107 | operator 1 (t, t) of operator family t_btree_ops for access method btree: <(t,t)            | operator <(t,t)
                                  | n 
 45107 | operator 1 (t, t) of operator family t_btree_ops for access method btree: <(t,t)            | operator class
t_btree_opsfor access method btree  | i 
 45108 | operator 2 (t, t) of operator family t_btree_ops for access method btree: <=(t,t)           | operator <=(t,t)
                                  | n 
 45108 | operator 2 (t, t) of operator family t_btree_ops for access method btree: <=(t,t)           | operator class
t_btree_opsfor access method btree  | i 
 45109 | operator 3 (t, t) of operator family t_btree_ops for access method btree: =(t,t)            | operator =(t,t)
                                  | n 
 45109 | operator 3 (t, t) of operator family t_btree_ops for access method btree: =(t,t)            | operator class
t_btree_opsfor access method btree  | i 
 45110 | operator 4 (t, t) of operator family t_btree_ops for access method btree: >=(t,t)           | operator >=(t,t)
                                  | n 
 45110 | operator 4 (t, t) of operator family t_btree_ops for access method btree: >=(t,t)           | operator class
t_btree_opsfor access method btree  | i 
 45111 | operator 5 (t, t) of operator family t_btree_ops for access method btree: >(t,t)            | operator >(t,t)
                                  | n 
 45111 | operator 5 (t, t) of operator family t_btree_ops for access method btree: >(t,t)            | operator class
t_btree_opsfor access method btree  | i 
 45112 | function 1 (t, t) of operator family t_btree_ops for access method btree: t_cmp(t,t)        | function
t_cmp(t,t)                                | n 
 45112 | function 1 (t, t) of operator family t_btree_ops for access method btree: t_cmp(t,t)        | operator class
t_btree_opsfor access method btree  | i 
 45113 | function 4 (t, t) of operator family t_btree_ops for access method btree: btequalimage(oid) | operator family
t_btree_opsfor access method btree | a 
 45106 | operator class t_btree_ops for access method btree                                          | schema public
                                  | n 
 45106 | operator class t_btree_ops for access method btree                                          | operator family
t_btree_opsfor access method btree | a 
 45106 | operator class t_btree_ops for access method btree                                          | type t
                                  | n 
 ...

pg_amproc OID 45112 (function 1) has a dependency on t_cmp(t,t), which
of course depends in turn on type t.  It also has a dependency on
the operator class, which also depends on t, so for sure it's going
away during "DROP TYPE t".  But look at 45113 (function 4).  It would
have a dependency on btequalimage(), but we don't record that because
btequalimage() is a pinned built-in function.  Its other dependency
is on the operator family not the operator class.  This seems like the
wrong thing.  It's intentional according to the code: in nbtvalidate.c
we have

        if (op->is_func && op->number != BTORDER_PROC)
        {
            /* Optional support proc, so always a soft family dependency */
            op->ref_is_hard = false;
            op->ref_is_family = true;
            op->refobjid = opfamilyoid;
        }

But I think we copied that pattern from other index AMs without
thinking too hard about it.  In AMs like GiST, the argument is

     * Operator members of a GiST opfamily should never have hard
     * dependencies, since their connection to the opfamily depends only on
     * what the support functions think, and that can be altered.  For
     * consistency, we make all soft dependencies point to the opfamily,
     * though a soft dependency on the opclass would work as well in the
     * CREATE OPERATOR CLASS case.

It seems like maybe btree should be using soft dependencies on the
opclass for optional support functions?  Not quite sure about that.
There were a lot of moving parts in these choices IIRC.

Now the big reason that the leftover pg_amproc entry causes problems
is that its amproclefttype/amprocrighttype entries are still
referencing the deleted type "t".  That wouldn't really stop us from
deleting the opfamily, except that during DROP CASCADE we try to print
descriptions of all the dropped objects, and getObjectDescription
calls format_type_extended which fails.  The leftover entry also
causes issues for pg_dump, which will emit something like

CREATE OPERATOR FAMILY public.t_btree_ops USING btree;
ALTER OPERATOR FAMILY public.t_btree_ops USING btree ADD
    FUNCTION 4 (45086, 45086) btequalimage(oid);

which of course doesn't parse during restore.

So we really need to fix things so that the pg_amproc entry goes away.
Switching its dependency to be on the opclass would do.  A different
approach that might solve more problems is to be careful to record
a dependency from a pg_amproc entry to the type(s) mentioned in it.
This would be redundant in the case where the referenced function
has those types as input, but we can't really assume that for
support functions.

At least in the back branches, I'm inclined to also fix
getObjectDescription to use FORMAT_TYPE_ALLOW_INVALID when
printing the types of a pg_amproc entry, so that you're not
quite so thoroughly hosed if you already have this situation
in your catalog.

Peter, any thoughts about this?

            regards, tom lane



Re: Dangling operator family after DROP TYPE

From
Peter Geoghegan
Date:
On Fri, Dec 6, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thanks for the report.  I don't think it's wrong for the now-empty
> operator family to stick around: it has no direct dependency on the
> dropped type.  Also, trying to make it go away would cause problems
> if another operator class for another type had been added to the
> family meanwhile.  However, these things are bad:
>
> > Attempting to drop this operator family results in an error. Attempting
> > to do a dump/restore results in a syntax error on restore.

Agreed.

> It's intentional according to the code: in nbtvalidate.c
> we have
>
>         if (op->is_func && op->number != BTORDER_PROC)
>         {
>             /* Optional support proc, so always a soft family dependency */
>             op->ref_is_hard = false;
>             op->ref_is_family = true;
>             op->refobjid = opfamilyoid;
>         }
>
> But I think we copied that pattern from other index AMs without
> thinking too hard about it.

That is accurate.

> Peter, any thoughts about this?

Nothing much to say about it.

I would just point out that using the built-in allequalimage function
is specifically documented as bad practice. After all, you as an
individual non-core opclass author don't have any control over its
behavior. At the same time, I do understand the temptation to use the
built-in allequalimage function. In practice most individual B-Tree
opclasses are *obviously* deduplication-safe, and it's convenient to
have a trivial function for that.

--
Peter Geoghegan



Re: Dangling operator family after DROP TYPE

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Dec 6, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Peter, any thoughts about this?

> Nothing much to say about it.

> I would just point out that using the built-in allequalimage function
> is specifically documented as bad practice.

Perhaps, but there are surely plenty of other ways to get into this
situation, when you have support functions whose signatures don't
involve the data type of the indexed column.

Here's a couple of proposed patches.  The first just makes
getObjectDescription robust against dangling
amproclefttype/amprocrighttype links.  (I did the same for pg_amop
entries, though that may be dead code, per comments below.)  I checked
that this allows dropping the busted opfamily.

The second one solves the problem more permanently by adding
dependencies on the types whenever we don't have an indirect
dependency through the operator or function.  Coverage checking shows
that the function case is actually hit in our regression tests (during
creation of contrib GiST opclasses), but the operator case isn't.
I think that the check for operators may be dead code, because AFAICS
from a quick look through opclasscmds.c, assignOperTypes will always
fill lefttype/righttype from the operator's input types and there's
nothing to override that.  But it's at least conceivable that the
index AM's amadjustmembers function would modify the
lefttype/righttype settings.  So I'm inclined to include that code
even if it does nothing today.

I looked at whether we could add a regression test for this, but
all of the cases that presently hit it are contrib extensions.
So there's no way to drop the data type without also dropping the
opfamily (which'd be likewise a member of the extension).  That
probably explains the lack of field reports of this old problem.
We could devise something no doubt, but it doesn't quite seem
worth the trouble and test cycles.

            regards, tom lane

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 7520a98215..be7e4a5dd0 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -3235,6 +3235,12 @@ getObjectDescription(const ObjectAddress *object, bool missing_ok)
                 initStringInfo(&opfam);
                 getOpFamilyDescription(&opfam, amopForm->amopfamily, false);

+                /*
+                 * We use FORMAT_TYPE_ALLOW_INVALID here so as not to fail
+                 * completely if the type links are dangling, which is a form
+                 * of catalog corruption that could occur due to old bugs.
+                 */
+
                 /*------
                    translator: %d is the operator strategy (a number), the
                    first two %s's are data type names, the third %s is the
@@ -3242,8 +3248,10 @@ getObjectDescription(const ObjectAddress *object, bool missing_ok)
                    textual form of the operator with arguments.  */
                 appendStringInfo(&buffer, _("operator %d (%s, %s) of %s: %s"),
                                  amopForm->amopstrategy,
-                                 format_type_be(amopForm->amoplefttype),
-                                 format_type_be(amopForm->amoprighttype),
+                                 format_type_extended(amopForm->amoplefttype,
+                                                      -1, FORMAT_TYPE_ALLOW_INVALID),
+                                 format_type_extended(amopForm->amoprighttype,
+                                                      -1, FORMAT_TYPE_ALLOW_INVALID),
                                  opfam.data,
                                  format_operator(amopForm->amopopr));

@@ -3292,6 +3300,12 @@ getObjectDescription(const ObjectAddress *object, bool missing_ok)
                 initStringInfo(&opfam);
                 getOpFamilyDescription(&opfam, amprocForm->amprocfamily, false);

+                /*
+                 * We use FORMAT_TYPE_ALLOW_INVALID here so as not to fail
+                 * completely if the type links are dangling, which is a form
+                 * of catalog corruption that could occur due to old bugs.
+                 */
+
                 /*------
                    translator: %d is the function number, the first two %s's
                    are data type names, the third %s is the description of the
@@ -3299,8 +3313,10 @@ getObjectDescription(const ObjectAddress *object, bool missing_ok)
                    function with arguments.  */
                 appendStringInfo(&buffer, _("function %d (%s, %s) of %s: %s"),
                                  amprocForm->amprocnum,
-                                 format_type_be(amprocForm->amproclefttype),
-                                 format_type_be(amprocForm->amprocrighttype),
+                                 format_type_extended(amprocForm->amproclefttype,
+                                                      -1, FORMAT_TYPE_ALLOW_INVALID),
+                                 format_type_extended(amprocForm->amprocrighttype,
+                                                      -1, FORMAT_TYPE_ALLOW_INVALID),
                                  opfam.data,
                                  format_procedure(amprocForm->amproc));

diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index b8b5c147c5..d024c547cc 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -65,6 +65,7 @@ static void storeOperators(List *opfamilyname, Oid amoid, Oid opfamilyoid,
                            List *operators, bool isAdd);
 static void storeProcedures(List *opfamilyname, Oid amoid, Oid opfamilyoid,
                             List *procedures, bool isAdd);
+static bool typeDepNeeded(Oid typid, OpFamilyMember *member);
 static void dropOperators(List *opfamilyname, Oid amoid, Oid opfamilyoid,
                           List *operators);
 static void dropProcedures(List *opfamilyname, Oid amoid, Oid opfamilyoid,
@@ -1507,6 +1508,29 @@ storeOperators(List *opfamilyname, Oid amoid, Oid opfamilyoid,
         recordDependencyOn(&myself, &referenced,
                            op->ref_is_hard ? DEPENDENCY_INTERNAL : DEPENDENCY_AUTO);

+        if (typeDepNeeded(op->lefttype, op))
+        {
+            referenced.classId = TypeRelationId;
+            referenced.objectId = op->lefttype;
+            referenced.objectSubId = 0;
+
+            /* see comments in amapi.h about dependency strength */
+            recordDependencyOn(&myself, &referenced,
+                               op->ref_is_hard ? DEPENDENCY_NORMAL : DEPENDENCY_AUTO);
+        }
+
+        if (op->lefttype != op->righttype &&
+            typeDepNeeded(op->righttype, op))
+        {
+            referenced.classId = TypeRelationId;
+            referenced.objectId = op->righttype;
+            referenced.objectSubId = 0;
+
+            /* see comments in amapi.h about dependency strength */
+            recordDependencyOn(&myself, &referenced,
+                               op->ref_is_hard ? DEPENDENCY_NORMAL : DEPENDENCY_AUTO);
+        }
+
         /* A search operator also needs a dep on the referenced opfamily */
         if (OidIsValid(op->sortfamily))
         {
@@ -1608,6 +1632,29 @@ storeProcedures(List *opfamilyname, Oid amoid, Oid opfamilyoid,
         recordDependencyOn(&myself, &referenced,
                            proc->ref_is_hard ? DEPENDENCY_INTERNAL : DEPENDENCY_AUTO);

+        if (typeDepNeeded(proc->lefttype, proc))
+        {
+            referenced.classId = TypeRelationId;
+            referenced.objectId = proc->lefttype;
+            referenced.objectSubId = 0;
+
+            /* see comments in amapi.h about dependency strength */
+            recordDependencyOn(&myself, &referenced,
+                               proc->ref_is_hard ? DEPENDENCY_NORMAL : DEPENDENCY_AUTO);
+        }
+
+        if (proc->lefttype != proc->righttype &&
+            typeDepNeeded(proc->righttype, proc))
+        {
+            referenced.classId = TypeRelationId;
+            referenced.objectId = proc->righttype;
+            referenced.objectSubId = 0;
+
+            /* see comments in amapi.h about dependency strength */
+            recordDependencyOn(&myself, &referenced,
+                               proc->ref_is_hard ? DEPENDENCY_NORMAL : DEPENDENCY_AUTO);
+        }
+
         /* Post create hook of access method procedure */
         InvokeObjectPostCreateHook(AccessMethodProcedureRelationId,
                                    entryoid, 0);
@@ -1616,6 +1663,57 @@ storeProcedures(List *opfamilyname, Oid amoid, Oid opfamilyoid,
     table_close(rel, RowExclusiveLock);
 }

+/*
+ * Detect whether a pg_amop or pg_amproc entry needs an explicit dependency
+ * on its lefttype or righttype.
+ *
+ * We make such a dependency unless the entry has an indirect dependency
+ * via its referenced operator or function.  That's nearly always true
+ * for operators, but might well not be true for support functions.
+ */
+static bool
+typeDepNeeded(Oid typid, OpFamilyMember *member)
+{
+    bool        result = true;
+
+    /*
+     * If the type is pinned, we don't need a dependency.  This is a bit of a
+     * layering violation perhaps (recordDependencyOn would ignore the request
+     * anyway), but it's a cheap test and will frequently save a syscache
+     * lookup here.
+     */
+    if (IsPinnedObject(TypeRelationId, typid))
+        return false;
+
+    /* Nope, so check the input types of the function or operator. */
+    if (member->is_func)
+    {
+        Oid           *argtypes;
+        int            nargs;
+
+        (void) get_func_signature(member->object, &argtypes, &nargs);
+        for (int i = 0; i < nargs; i++)
+        {
+            if (typid == argtypes[i])
+            {
+                result = false; /* match, no dependency needed */
+                break;
+            }
+        }
+        pfree(argtypes);
+    }
+    else
+    {
+        Oid            lefttype,
+                    righttype;
+
+        op_input_types(member->object, &lefttype, &righttype);
+        if (typid == lefttype || typid == righttype)
+            result = false;        /* match, no dependency needed */
+    }
+    return result;
+}
+

 /*
  * Remove operator entries from an opfamily.
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index c51de742ea..a4c0b43aa9 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -76,6 +76,10 @@ typedef enum IndexAMProperty
  * opfamily.  This allows ALTER OPERATOR FAMILY DROP, and causes that to
  * happen automatically if the operator or support func is dropped.  This
  * is the right behavior for inessential ("loose") objects.
+ *
+ * We also make dependencies on lefttype/righttype, of the same strength as
+ * the dependency on the operator or support func, unless these dependencies
+ * are redundant with the dependency on the operator or support func.
  */
 typedef struct OpFamilyMember
 {