Re: ALTER TABLE ADD COLUMN fast default - Mailing list pgsql-hackers

From Tom Lane
Subject Re: ALTER TABLE ADD COLUMN fast default
Date
Msg-id 171843.1617552324@sss.pgh.pa.us
Whole thread Raw
In response to Re: ALTER TABLE ADD COLUMN fast default  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: ALTER TABLE ADD COLUMN fast default  (Andrew Dunstan <andrew@dunslane.net>)
Re: ALTER TABLE ADD COLUMN fast default  (Zhihong Yu <zyu@yugabyte.com>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> On 4/3/21 10:09 PM, Tom Lane wrote:
>> Looking around at the other touches of AttrDefault.adbin in the backend
>> (of which there are not that many), some of them are prepared for it to be
>> NULL and some are not.  I don't immediately have a strong opinion whether
>> that should be an allowed state; but if it is not allowed then it's not
>> okay for AttrDefaultFetch to leave such a state behind.  Alternatively,
>> if it is allowed then equalTupleDesc is in the wrong.  We should choose
>> one definition and make all the relevant code match it.

> There's already a warning if it sets an explicit NULL value, so I'm
> inclined to say your suggestion "it's not okay for AttrDefaultFetch to
> leave such a state behind" is what we should go with.

Yeah, I came to the same conclusion after looking around a bit more.
There are two places in tupdesc.c that defend against adbin being NULL,
which seems a bit silly when their sibling equalTupleDesc does not.
Every other touch in the backend will segfault on a NULL value,
if it doesn't Assert first :-(

Another thing that annoyed me while I was looking at this is the
potentially O(N^2) behavior in equalTupleDesc due to not wanting
to assume that the AttrDefault arrays are in the same order.
It seems far smarter to have AttrDefaultFetch sort the arrays.

So attached is a proposed patch to clean all this up.

* Don't link the AttrDefault array into the relcache entry until
it's fully built and valid.

* elog, don't just Assert, if we fail to find an expected default
value.  (Perhaps there's a case for ereport instead.)

* Remove now-useless null checks in tupdesc.c.

* Sort the AttrDefault array, remove double loops in equalTupleDesc.

I made CheckConstraintFetch likewise not install its array until
it's done.  I notice that it is throwing elog(ERROR) not WARNING
for the equivalent cases of not finding the right number of
entries.  I wonder if we ought to back that off to a WARNING too.
elog(ERROR) during relcache entry load is kinda nasty, because
it prevents essentially *all* use of the relation.  On the other
hand, failing to enforce check constraints isn't lovely either.
Anybody have opinions about that?

            regards, tom lane

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index cb76465050..d81f6b8a60 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -174,10 +174,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
             cpy->defval = (AttrDefault *) palloc(cpy->num_defval * sizeof(AttrDefault));
             memcpy(cpy->defval, constr->defval, cpy->num_defval * sizeof(AttrDefault));
             for (i = cpy->num_defval - 1; i >= 0; i--)
-            {
-                if (constr->defval[i].adbin)
-                    cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
-            }
+                cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
         }

         if (constr->missing)
@@ -328,10 +325,7 @@ FreeTupleDesc(TupleDesc tupdesc)
             AttrDefault *attrdef = tupdesc->constr->defval;

             for (i = tupdesc->constr->num_defval - 1; i >= 0; i--)
-            {
-                if (attrdef[i].adbin)
-                    pfree(attrdef[i].adbin);
-            }
+                pfree(attrdef[i].adbin);
             pfree(attrdef);
         }
         if (tupdesc->constr->missing)
@@ -412,7 +406,6 @@ bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 {
     int            i,
-                j,
                 n;

     if (tupdesc1->natts != tupdesc2->natts)
@@ -488,22 +481,13 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_defval;
         if (n != (int) constr2->num_defval)
             return false;
+        /* We assume here that both AttrDefault arrays are in adnum order */
         for (i = 0; i < n; i++)
         {
             AttrDefault *defval1 = constr1->defval + i;
-            AttrDefault *defval2 = constr2->defval;
-
-            /*
-             * We can't assume that the items are always read from the system
-             * catalogs in the same order; so use the adnum field to identify
-             * the matching item to compare.
-             */
-            for (j = 0; j < n; defval2++, j++)
-            {
-                if (defval1->adnum == defval2->adnum)
-                    break;
-            }
-            if (j >= n)
+            AttrDefault *defval2 = constr2->defval + i;
+
+            if (defval1->adnum != defval2->adnum)
                 return false;
             if (strcmp(defval1->adbin, defval2->adbin) != 0)
                 return false;
@@ -534,25 +518,21 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_check;
         if (n != (int) constr2->num_check)
             return false;
+
+        /*
+         * Similarly, we rely here on the ConstrCheck entries being sorted by
+         * name.  If there are duplicate names, the outcome of the comparison
+         * is uncertain, but that should not happen.
+         */
         for (i = 0; i < n; i++)
         {
             ConstrCheck *check1 = constr1->check + i;
-            ConstrCheck *check2 = constr2->check;
-
-            /*
-             * Similarly, don't assume that the checks are always read in the
-             * same order; match them up by name and contents. (The name
-             * *should* be unique, but...)
-             */
-            for (j = 0; j < n; check2++, j++)
-            {
-                if (strcmp(check1->ccname, check2->ccname) == 0 &&
-                    strcmp(check1->ccbin, check2->ccbin) == 0 &&
-                    check1->ccvalid == check2->ccvalid &&
-                    check1->ccnoinherit == check2->ccnoinherit)
-                    break;
-            }
-            if (j >= n)
+            ConstrCheck *check2 = constr2->check + i;
+
+            if (!(strcmp(check1->ccname, check2->ccname) == 0 &&
+                  strcmp(check1->ccbin, check2->ccbin) == 0 &&
+                  check1->ccvalid == check2->ccvalid &&
+                  check1->ccnoinherit == check2->ccnoinherit))
                 return false;
         }
     }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 88a68a4697..87f9bdaef0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2501,21 +2501,24 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
             if (attribute->atthasdef)
             {
                 Node       *this_default = NULL;
-                AttrDefault *attrdef;
-                int            i;

                 /* Find default in constraint structure */
-                Assert(constr != NULL);
-                attrdef = constr->defval;
-                for (i = 0; i < constr->num_defval; i++)
+                if (constr != NULL)
                 {
-                    if (attrdef[i].adnum == parent_attno)
+                    AttrDefault *attrdef = constr->defval;
+
+                    for (int i = 0; i < constr->num_defval; i++)
                     {
-                        this_default = stringToNode(attrdef[i].adbin);
-                        break;
+                        if (attrdef[i].adnum == parent_attno)
+                        {
+                            this_default = stringToNode(attrdef[i].adbin);
+                            break;
+                        }
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));

                 /*
                  * If it's a GENERATED default, it might contain Vars that
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b968c25dd6..2c57b200f8 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1276,7 +1276,9 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                         break;
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));

                 atsubcmd = makeNode(AlterTableCmd);
                 atsubcmd->subtype = AT_CookedColumnDefault;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 92661abae2..531edaacdb 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1247,6 +1247,9 @@ build_column_default(Relation rel, int attrno)
                 break;
             }
         }
+        if (expr == NULL)
+            elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                 attrno, RelationGetRelationName(rel));
     }

     /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index ff7395c85b..b05026f1c5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -282,7 +282,8 @@ static void RelationInitPhysicalAddr(Relation relation);
 static void load_critical_index(Oid indexoid, Oid heapoid);
 static TupleDesc GetPgClassDescriptor(void);
 static TupleDesc GetPgIndexDescriptor(void);
-static void AttrDefaultFetch(Relation relation);
+static void AttrDefaultFetch(Relation relation, int ndef);
+static int    AttrDefaultCmp(const void *a, const void *b);
 static void CheckConstraintFetch(Relation relation);
 static int    CheckConstraintCmp(const void *a, const void *b);
 static void InitIndexAmRoutine(Relation relation);
@@ -503,7 +504,6 @@ RelationBuildTupleDesc(Relation relation)
     ScanKeyData skey[2];
     int            need;
     TupleConstr *constr;
-    AttrDefault *attrdef = NULL;
     AttrMissing *attrmiss = NULL;
     int            ndef = 0;

@@ -512,8 +512,8 @@ RelationBuildTupleDesc(Relation relation)
         relation->rd_rel->reltype ? relation->rd_rel->reltype : RECORDOID;
     relation->rd_att->tdtypmod = -1;    /* just to be sure */

-    constr = (TupleConstr *) MemoryContextAlloc(CacheMemoryContext,
-                                                sizeof(TupleConstr));
+    constr = (TupleConstr *) MemoryContextAllocZero(CacheMemoryContext,
+                                                    sizeof(TupleConstr));
     constr->has_not_null = false;
     constr->has_generated_stored = false;

@@ -560,7 +560,6 @@ RelationBuildTupleDesc(Relation relation)
             elog(ERROR, "invalid attribute number %d for %s",
                  attp->attnum, RelationGetRelationName(relation));

-
         memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
                attp,
                ATTRIBUTE_FIXED_PART_SIZE);
@@ -570,22 +569,10 @@ RelationBuildTupleDesc(Relation relation)
             constr->has_not_null = true;
         if (attp->attgenerated == ATTRIBUTE_GENERATED_STORED)
             constr->has_generated_stored = true;
-
-        /* If the column has a default, fill it into the attrdef array */
         if (attp->atthasdef)
-        {
-            if (attrdef == NULL)
-                attrdef = (AttrDefault *)
-                    MemoryContextAllocZero(CacheMemoryContext,
-                                           RelationGetNumberOfAttributes(relation) *
-                                           sizeof(AttrDefault));
-            attrdef[ndef].adnum = attnum;
-            attrdef[ndef].adbin = NULL;
-
             ndef++;
-        }

-        /* Likewise for a missing value */
+        /* If the column has a "missing" value, put it in the attrmiss array */
         if (attp->atthasmissing)
         {
             Datum        missingval;
@@ -680,33 +667,19 @@ RelationBuildTupleDesc(Relation relation)
         constr->has_generated_stored ||
         ndef > 0 ||
         attrmiss ||
-        relation->rd_rel->relchecks)
+        relation->rd_rel->relchecks > 0)
     {
         relation->rd_att->constr = constr;

         if (ndef > 0)            /* DEFAULTs */
-        {
-            if (ndef < RelationGetNumberOfAttributes(relation))
-                constr->defval = (AttrDefault *)
-                    repalloc(attrdef, ndef * sizeof(AttrDefault));
-            else
-                constr->defval = attrdef;
-            constr->num_defval = ndef;
-            AttrDefaultFetch(relation);
-        }
+            AttrDefaultFetch(relation, ndef);
         else
             constr->num_defval = 0;

         constr->missing = attrmiss;

         if (relation->rd_rel->relchecks > 0)    /* CHECKs */
-        {
-            constr->num_check = relation->rd_rel->relchecks;
-            constr->check = (ConstrCheck *)
-                MemoryContextAllocZero(CacheMemoryContext,
-                                       constr->num_check * sizeof(ConstrCheck));
             CheckConstraintFetch(relation);
-        }
         else
             constr->num_check = 0;
     }
@@ -4251,21 +4224,28 @@ GetPgIndexDescriptor(void)

 /*
  * Load any default attribute value definitions for the relation.
+ *
+ * ndef is the number of attributes that were marked atthasdef.
+ *
+ * Note: we don't make it a hard error to be missing some pg_attrdef records.
+ * We can limp along as long as no INSERT or UPDATE needs to use the entry.
  */
 static void
-AttrDefaultFetch(Relation relation)
+AttrDefaultFetch(Relation relation, int ndef)
 {
-    AttrDefault *attrdef = relation->rd_att->constr->defval;
-    int            ndef = relation->rd_att->constr->num_defval;
+    AttrDefault *attrdef;
     Relation    adrel;
     SysScanDesc adscan;
     ScanKeyData skey;
     HeapTuple    htup;
-    Datum        val;
-    bool        isnull;
-    int            found;
-    int            i;
+    int            found = 0;
+
+    /* Allocate array with room for as many entries as expected */
+    attrdef = (AttrDefault *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ndef * sizeof(AttrDefault));

+    /* Search pg_attrdef for relevant entries */
     ScanKeyInit(&skey,
                 Anum_pg_attrdef_adrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4274,49 +4254,69 @@ AttrDefaultFetch(Relation relation)
     adrel = table_open(AttrDefaultRelationId, AccessShareLock);
     adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
                                 NULL, 1, &skey);
-    found = 0;

     while (HeapTupleIsValid(htup = systable_getnext(adscan)))
     {
         Form_pg_attrdef adform = (Form_pg_attrdef) GETSTRUCT(htup);
-        Form_pg_attribute attr = TupleDescAttr(relation->rd_att, adform->adnum - 1);
+        Datum        val;
+        bool        isnull;

-        for (i = 0; i < ndef; i++)
+        /* protect limited size of array */
+        if (found >= ndef)
         {
-            if (adform->adnum != attrdef[i].adnum)
-                continue;
-            if (attrdef[i].adbin != NULL)
-                elog(WARNING, "multiple attrdef records found for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-                found++;
-
-            val = fastgetattr(htup,
-                              Anum_pg_attrdef_adbin,
-                              adrel->rd_att, &isnull);
-            if (isnull)
-                elog(WARNING, "null adbin for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-            {
-                /* detoast and convert to cstring in caller's context */
-                char       *s = TextDatumGetCString(val);
-
-                attrdef[i].adbin = MemoryContextStrdup(CacheMemoryContext, s);
-                pfree(s);
-            }
+            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+                 adform->adnum, RelationGetRelationName(relation));
             break;
         }

-        if (i >= ndef)
-            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+        val = fastgetattr(htup,
+                          Anum_pg_attrdef_adbin,
+                          adrel->rd_att, &isnull);
+        if (isnull)
+            elog(WARNING, "null adbin for attr %d of rel %s",
                  adform->adnum, RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);
+
+            attrdef[found].adnum = adform->adnum;
+            attrdef[found].adbin = MemoryContextStrdup(CacheMemoryContext, s);
+            found++;
+            pfree(s);
+        }
     }

     systable_endscan(adscan);
     table_close(adrel, AccessShareLock);
+
+    if (found != ndef)
+        elog(WARNING, "%d attrdef record(s) missing for rel %s",
+             ndef - found, RelationGetRelationName(relation));
+
+    /*
+     * Sort the AttrDefault entries by adnum, for the convenience of
+     * equalTupleDescs().  (Usually, they already will be in order, but this
+     * might not be so if systable_getnext isn't using an index.)
+     */
+    if (found > 1)
+        qsort(attrdef, found, sizeof(AttrDefault), AttrDefaultCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->defval = attrdef;
+    relation->rd_att->constr->num_defval = found;
+}
+
+/*
+ * qsort comparator to sort AttrDefault entries by adnum
+ */
+static int
+AttrDefaultCmp(const void *a, const void *b)
+{
+    const AttrDefault *ada = (const AttrDefault *) a;
+    const AttrDefault *adb = (const AttrDefault *) b;
+
+    return ada->adnum - adb->adnum;
 }

 /*
@@ -4325,14 +4325,20 @@ AttrDefaultFetch(Relation relation)
 static void
 CheckConstraintFetch(Relation relation)
 {
-    ConstrCheck *check = relation->rd_att->constr->check;
-    int            ncheck = relation->rd_att->constr->num_check;
+    ConstrCheck *check;
+    int            ncheck = relation->rd_rel->relchecks;
     Relation    conrel;
     SysScanDesc conscan;
     ScanKeyData skey[1];
     HeapTuple    htup;
     int            found = 0;

+    /* Allocate array with room for as many entries as expected */
+    check = (ConstrCheck *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ncheck * sizeof(ConstrCheck));
+
+    /* Search pg_constraint for relevant entries */
     ScanKeyInit(&skey[0],
                 Anum_pg_constraint_conrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4353,6 +4359,7 @@ CheckConstraintFetch(Relation relation)
         if (conform->contype != CONSTRAINT_CHECK)
             continue;

+        /* protect limited size of array */
         if (found >= ncheck)
             elog(ERROR, "unexpected constraint record found for rel %s",
                  RelationGetRelationName(relation));
@@ -4385,9 +4392,16 @@ CheckConstraintFetch(Relation relation)
         elog(ERROR, "%d constraint record(s) missing for rel %s",
              ncheck - found, RelationGetRelationName(relation));

-    /* Sort the records so that CHECKs are applied in a deterministic order */
-    if (ncheck > 1)
-        qsort(check, ncheck, sizeof(ConstrCheck), CheckConstraintCmp);
+    /*
+     * Sort the records by name.  This ensures that CHECKs are applied in a
+     * deterministic order, and it also makes equalTupleDescs() faster.
+     */
+    if (found > 1)
+        qsort(check, found, sizeof(ConstrCheck), CheckConstraintCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->check = check;
+    relation->rd_att->constr->num_check = found;
 }

 /*

pgsql-hackers by date:

Previous
From: Zhihong Yu
Date:
Subject: Re: Unused variable found in AttrDefaultFetch
Next
From: Tom Lane
Date:
Subject: Re: Unused variable found in AttrDefaultFetch