Thread: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Hadi Moshayedi

It seems that sometimes when DELETE cascades to referencing tables we fail to acquire locks on the replica identity index.

To reproduce, set wal_level to logical, and run 1.sql.

I can look into this, but I thought first I should send it here in case someone who is more familiar with these related functions can solve it quickly.
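
A minimal sketch of the kind of setup 1.sql presumably contains (the
attachment itself is not reproduced here, so this is a hypothetical
reconstruction from the backtrace below: a parent table and a referencing
table with ON DELETE CASCADE, both with primary keys, run under
wal_level = logical):

    -- hypothetical reconstruction, not the attached 1.sql itself
    CREATE TABLE t1 (id int PRIMARY KEY);
    CREATE TABLE t2 (id int PRIMARY KEY,
                     t1_id int REFERENCES t1 (id) ON DELETE CASCADE);
    INSERT INTO t1 SELECT g FROM generate_series(0, 11) g;
    INSERT INTO t2 SELECT g, g FROM generate_series(0, 11) g;
    -- prime the cascade's cached RI plan in its own transaction ...
    DELETE FROM t1 WHERE id = 0;
    -- ... so that this execution presumably reuses it without the
    -- planner having locked t2's primary key (its replica identity
    -- index), tripping the Assert during the cascaded delete on t2
    DELETE FROM t1 RETURNING id;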

I get the following backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f301154b801 in __GI_abort () at abort.c:79
#2  0x000055df8858a923 in ExceptionalCondition (
    conditionName=conditionName@entry=0x55df885fd138 "!(CheckRelationLockedByMe(idx_rel, 1, 1))", errorType=errorType@entry=0x55df885de8fd "FailedAssertion",
    fileName=fileName@entry=0x55df885fca32 "heapam.c", lineNumber=lineNumber@entry=7646)
    at assert.c:54
#3  0x000055df88165e53 in ExtractReplicaIdentity (relation=relation@entry=0x7f3012b54db0,
    tp=tp@entry=0x7ffcf47d53f0, key_changed=key_changed@entry=true,
    copy=copy@entry=0x7ffcf47d53d3) at heapam.c:7646
#4  0x000055df8816c22b in heap_delete (relation=0x7f3012b54db0, tid=<optimized out>,
    cid=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffcf47d54b0, changingPart=false)
    at heapam.c:2676
#5  0x000055df88318b62 in table_tuple_delete (changingPart=false, tmfd=0x7ffcf47d54b0,
    wait=true, crosscheck=<optimized out>, snapshot=<optimized out>, cid=<optimized out>,
    tid=0x7ffcf47d558a, rel=0x7f3012b54db0) at ../../../src/include/access/tableam.h:1216
#6  ExecDelete (mtstate=mtstate@entry=0x55df8a8196a0, tupleid=0x7ffcf47d558a, oldtuple=0x0,
    planSlot=planSlot@entry=0x55df8a81a8e8, epqstate=epqstate@entry=0x55df8a819798,
    estate=estate@entry=0x55df8a819058, processReturning=true, canSetTag=true,
    changingPart=false, tupleDeleted=0x0, epqreturnslot=0x0) at nodeModifyTable.c:769
#7  0x000055df8831aa25 in ExecModifyTable (pstate=0x55df8a8196a0) at nodeModifyTable.c:2230
#8  0x000055df882efa9a in ExecProcNode (node=0x55df8a8196a0)
    at ../../../src/include/executor/executor.h:239
#9  ExecutePlan (execute_once=<optimized out>, dest=0x55df88a89a00 <spi_printtupDR>,
    direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
    operation=CMD_DELETE, use_parallel_mode=<optimized out>, planstate=0x55df8a8196a0,
    estate=0x55df8a819058) at execMain.c:1648
#10 standard_ExecutorRun (queryDesc=0x55df8a7de4b0, direction=<optimized out>, count=0,
    execute_once=<optimized out>) at execMain.c:365
#11 0x000055df8832b90c in _SPI_pquery (tcount=0, fire_triggers=false,
    queryDesc=0x55df8a7de4b0) at spi.c:2521
#12 _SPI_execute_plan (plan=plan@entry=0x55df8a812828, paramLI=<optimized out>,
    snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0,
    read_only=read_only@entry=false, fire_triggers=fire_triggers@entry=false,
    tcount=<optimized out>) at spi.c:2296
#13 0x000055df8832c15c in SPI_execute_snapshot (plan=plan@entry=0x55df8a812828,
    Values=Values@entry=0x7ffcf47d5820, Nulls=Nulls@entry=0x7ffcf47d5a20 " ",
    snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0,
    read_only=read_only@entry=false, fire_triggers=false, tcount=0) at spi.c:616
#14 0x000055df88522f32 in ri_PerformCheck (riinfo=riinfo@entry=0x55df8a7f8050,
    qkey=qkey@entry=0x7ffcf47d5b28, qplan=0x55df8a812828,
    fk_rel=fk_rel@entry=0x7f3012b54db0, pk_rel=pk_rel@entry=0x7f3012b44a28,
    oldslot=oldslot@entry=0x55df8a826f88, newslot=0x0, detectNewRows=true, expect_OK=8)
    at ri_triggers.c:2276
#15 0x000055df88524653 in RI_FKey_cascade_del (fcinfo=<optimized out>) at ri_triggers.c:819
#16 0x000055df882c9996 in ExecCallTriggerFunc (trigdata=trigdata@entry=0x7ffcf47d5ff0,
    tgindx=tgindx@entry=0, finfo=finfo@entry=0x55df8a825710, instr=instr@entry=0x0,
    per_tuple_context=per_tuple_context@entry=0x55df8a812f10) at trigger.c:2432
#17 0x000055df882cb459 in AfterTriggerExecute (trigdesc=0x55df8a825530,
    trigdesc=0x55df8a825530, trig_tuple_slot2=0x0, trig_tuple_slot1=0x0,
    per_tuple_context=0x55df8a812f10, instr=0x0, finfo=0x55df8a825710,
    relInfo=0x55df8a825418, event=0x55df8a81f0a8, estate=0x55df8a825188) at trigger.c:4342
#18 afterTriggerInvokeEvents (events=events@entry=0x55df8a7c3e40, firing_id=1,
    estate=estate@entry=0x55df8a825188, delete_ok=delete_ok@entry=false) at trigger.c:4539
#19 0x000055df882d1408 in AfterTriggerEndQuery (estate=estate@entry=0x55df8a825188)
    at trigger.c:4850
#20 0x000055df882efd99 in standard_ExecutorFinish (queryDesc=0x55df8a722ab8)
    at execMain.c:440
#21 0x000055df88464bdd in ProcessQuery (plan=<optimized out>,
    sourceText=0x55df8a702f78 "DELETE FROM t1 RETURNING id;", params=0x0, queryEnv=0x0,
    dest=0x55df8a722a20, completionTag=0x7ffcf47d6180 "DELETE 11") at pquery.c:203
#22 0x000055df88464e0b in PortalRunMulti (portal=portal@entry=0x55df8a7692f8,
    isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=true,
    dest=dest@entry=0x55df8a722a20, altdest=0x55df88a81040 <donothingDR>,
    completionTag=completionTag@entry=0x7ffcf47d6180 "DELETE 11") at pquery.c:1283
#23 0x000055df88465119 in FillPortalStore (portal=portal@entry=0x55df8a7692f8,
    isTopLevel=isTopLevel@entry=true) at pquery.c:1030
#24 0x000055df88465d1d in PortalRun (portal=portal@entry=0x55df8a7692f8,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
    run_once=run_once@entry=true, dest=dest@entry=0x55df8a7ddb08,
    altdest=altdest@entry=0x55df8a7ddb08, completionTag=0x7ffcf47d63b0 "") at pquery.c:765
#25 0x000055df88461512 in exec_simple_query (
    query_string=0x55df8a702f78 "DELETE FROM t1 RETURNING id;") at postgres.c:1215
#26 0x000055df8846344e in PostgresMain (argc=<optimized out>,
    argv=argv@entry=0x55df8a72d4b0, dbname=<optimized out>, username=<optimized out>)
    at postgres.c:4236
#27 0x000055df8811906d in BackendRun (port=0x55df8a7234d0, port=0x55df8a7234d0)
    at postmaster.c:4431
#28 BackendStartup (port=0x55df8a7234d0) at postmaster.c:4122
#29 ServerLoop () at postmaster.c:1704
#30 0x000055df883dc53e in PostmasterMain (argc=3, argv=0x55df8a6fb7c0) at postmaster.c:1377
#31 0x000055df8811ad5f in main (argc=3, argv=0x55df8a6fb7c0) at main.c:228

Attachment: 1.sql

Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Andres Freund

Hi,

On 2019-08-16 09:44:15 -0700, Hadi Moshayedi wrote:
> It seems that sometimes when DELETE cascades to referencing tables we fail
> to acquire locks on the replica identity index.
> 
> To reproduce, set wal_level to logical, and run 1.sql.
> 
> I can look into this, but I thought first I should send it here in case
> someone who is more familiar with these related functions can solve it
> quickly.

I suspect this "always" has been broken, it's just that we previously
didn't have checks in place that detect these cases. I don't think it's
likely to cause actual harm, due to the locking on the table itself when
dropping indexes etc.  But we still should fix it.

The relevant code is:

        /*
         * If there are indices on the result relation, open them and save
         * descriptors in the result relation info, so that we can add new
         * index entries for the tuples we add/update.  We need not do this
         * for a DELETE, however, since deletion doesn't affect indexes. Also,
         * inside an EvalPlanQual operation, the indexes might be open
         * already, since we share the resultrel state with the original
         * query.
         */
        if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
            operation != CMD_DELETE &&
            resultRelInfo->ri_IndexRelationDescs == NULL)
            ExecOpenIndices(resultRelInfo,
                            node->onConflictAction != ONCONFLICT_NONE);


I'm not quite sure what the best way to fix this would be, however. It
seems like a bad idea to make locking dependent on wal_level, but I'm
also not sure we want to incur the price of locking one more index for
every delete on a table with a primary key?


Greetings,

Andres Freund



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Tom Lane

Andres Freund <andres@anarazel.de> writes:
> On 2019-08-16 09:44:15 -0700, Hadi Moshayedi wrote:
>> It seems that sometimes when DELETE cascades to referencing tables we fail
>> to acquire locks on the replica identity index.

> I suspect this "always" has been broken, it's just that we previously
> didn't have checks in place that detect these cases. I don't think it's
> likely to cause actual harm, due to the locking on the table itself when
> dropping indexes etc.  But we still should fix it.

Yeah ... see the discussion leading up to 9c703c169,

https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us

We didn't pull the trigger on removing the CMD_DELETE exception here,
but I think the handwriting has been on the wall for some time.

            regards, tom lane



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Andres Freund

Hi,

On 2019-08-17 01:43:45 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-08-16 09:44:15 -0700, Hadi Moshayedi wrote:
> >> It seems that sometimes when DELETE cascades to referencing tables we fail
> >> to acquire locks on the replica identity index.
>
> > I suspect this "always" has been broken, it's just that we previously
> > didn't have checks in place that detect these cases. I don't think it's
> > likely to cause actual harm, due to the locking on the table itself when
> > dropping indexes etc.  But we still should fix it.
>
> Yeah ... see the discussion leading up to 9c703c169,
>
> https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us
>
> We didn't pull the trigger on removing the CMD_DELETE exception here,
> but I think the handwriting has been on the wall for some time.

ISTM there's a few different options here:

1a) We build all index infos, unconditionally. As argued in the thread
    you reference, future tableams may eventually require that anyway,
    by doing more proactive index maintenance somehow. Currently there's
    however no support for such AMs via tableam (mostly because I wasn't
    sure how exactly that'd look, and none of the already in-development
    AMs needed it).

2a) We separate acquisition of index locks from ExecOpenIndices(), and
    acquire index locks even for CMD_DELETE. Do so either during
    executor startup, or as part of AcquireExecutorLocks() (the latter
    on the basis that parsing/planning would have required the locks
    already).

There are also corresponding *b) options, where we only do additional work
for CMD_DELETE if wal_level = logical, and the table has a replica
identity requiring use of the index during deletions. But I think
that's clearly enough a bad idea that we can just dismiss it out of
hand.


3)  Remove the CheckRelationLockedByMe() assert from
    ExtractReplicaIdentity(), at least for 12. I don't think this is all
    that convincing an option, but it'd reduce churn relatively late in
    beta.

4)  Add an index_open(RowExclusiveLock) to ExtractReplicaIdentity(). That
    seems very unconvincing to me, because we'd do so for every row.


I think there's some appeal in going towards 2), because batching lock
acquisition into a more central place has the chance to yield some
speedups on its own, but more importantly would allow for batched
operations one day. Centralizing lock acquisition also seems like it
might make things easier to understand than today, where a lot of
different parts of the system acquire the locks, even just for
execution.  But it also seems likely to be too invasive for 12 - making
me think that 1a) is the way to go for now.

Comments?

Greetings,

Andres Freund



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Tom Lane

Andres Freund <andres@anarazel.de> writes:
> On 2019-08-17 01:43:45 -0400, Tom Lane wrote:
>> Yeah ... see the discussion leading up to 9c703c169,
>> https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us
>> We didn't pull the trigger on removing the CMD_DELETE exception here,
>> but I think the handwriting has been on the wall for some time.

> ISTM there's a few different options here:

> 1a) We build all index infos, unconditionally. As argued in the thread
>     you reference, future tableams may eventually require that anyway,
>     by doing more proactive index maintenance somehow. Currently there's
>     however no support for such AMs via tableam (mostly because I wasn't
>     sure how exactly that'd look, and none of the already in-development
>     AMs needed it).
> 2a) We separate acquisition of index locks from ExecOpenIndices(), and
>     acquire index locks even for CMD_DELETE. Do so either during
>     executor startup, or as part of AcquireExecutorLocks() (the latter
>     on the basis that parsing/planning would have required the locks
>     already).
> There are also corresponding *b) options, where we only do additional work
> for CMD_DELETE if wal_level = logical, and the table has a replica
> identity requiring use of the index during deletions. But I think
> that's clearly enough a bad idea that we can just dismiss it out of
> hand.
> 3)  Remove the CheckRelationLockedByMe() assert from
>     ExtractReplicaIdentity(), at least for 12. I don't think this is all
>     that convincing an option, but it'd reduce churn relatively late in
>     beta.
> 4)  Add an index_open(RowExclusiveLock) to ExtractReplicaIdentity(). That
>     seems very unconvincing to me, because we'd do so for every row.

As far as 4) goes, I think the code in ExtractReplicaIdentity is pretty
duff anyway, because it doesn't bother to check for the defined failure
return for RelationIdGetRelation.  But if we're concerned about the
cost of recalculating this stuff per-row, couldn't we cache it a little
better?  It should be safe to assume the set of index columns isn't
changing intra-query.

... in fact, isn't all the infrastructure for that present already?
Why is this code looking directly at the index at all, rather than
using the relcache's rd_idattr bitmap?

I suspect we'll have to do 1a) eventually anyway, but this particular
problem seems like it has a better solution.  Will try to produce a
patch in a bit.

            regards, tom lane



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Tom Lane

I wrote:
> As far as 4) goes, I think the code in ExtractReplicaIdentity is pretty
> duff anyway, because it doesn't bother to check for the defined failure
> return for RelationIdGetRelation.  But if we're concerned about the
> cost of recalculating this stuff per-row, couldn't we cache it a little
> better?  It should be safe to assume the set of index columns isn't
> changing intra-query.
> ... in fact, isn't all the infrastructure for that present already?
> Why is this code looking directly at the index at all, rather than
> using the relcache's rd_idattr bitmap?

Here's a proposed patch along those lines.  It fixes Hadi's original
crash case and passes check-world.

I'm a bit suspicious of the exclusion for idattrs being empty, but
if I remove that, some of the contrib/test_decoding test results
change.  Anybody want to comment on that?  If that's actually an
expected situation, why is there an elog(DEBUG) in that path?

            regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb811d3..8a3c7dc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7594,18 +7594,20 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
  *
  * Returns NULL if there's no need to log an identity or if there's no suitable
  * key in the Relation relation.
+ *
+ * *copy is set to true if the returned tuple is a modified copy rather than
+ * the same tuple that was passed in.
  */
 static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+                       bool *copy)
 {
     TupleDesc    desc = RelationGetDescr(relation);
-    Oid            replidindex;
-    Relation    idx_rel;
     char        replident = relation->rd_rel->relreplident;
-    HeapTuple    key_tuple = NULL;
+    Bitmapset  *idattrs;
+    HeapTuple    key_tuple;
     bool        nulls[MaxHeapAttributeNumber];
     Datum        values[MaxHeapAttributeNumber];
-    int            natt;

     *copy = false;

@@ -7624,7 +7626,7 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
         if (HeapTupleHasExternal(tp))
         {
             *copy = true;
-            tp = toast_flatten_tuple(tp, RelationGetDescr(relation));
+            tp = toast_flatten_tuple(tp, desc);
         }
         return tp;
     }
@@ -7633,41 +7635,37 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
     if (!key_changed)
         return NULL;

-    /* find the replica identity index */
-    replidindex = RelationGetReplicaIndex(relation);
-    if (!OidIsValid(replidindex))
+    /* find out the replica identity columns */
+    idattrs = RelationGetIndexAttrBitmap(relation,
+                                         INDEX_ATTR_BITMAP_IDENTITY_KEY);
+
+    if (bms_is_empty(idattrs))
     {
         elog(DEBUG4, "could not find configured replica identity for table \"%s\"",
              RelationGetRelationName(relation));
         return NULL;
     }

-    idx_rel = RelationIdGetRelation(replidindex);
-
-    Assert(CheckRelationLockedByMe(idx_rel, AccessShareLock, true));
-
-    /* deform tuple, so we have fast access to columns */
-    heap_deform_tuple(tp, desc, values, nulls);
-
-    /* set all columns to NULL, regardless of whether they actually are */
-    memset(nulls, 1, sizeof(nulls));
-
     /*
-     * Now set all columns contained in the index to NOT NULL, they cannot
-     * currently be NULL.
+     * Construct a new tuple containing only the replica identity columns,
+     * with nulls elsewhere.  While we're at it, assert that the replica
+     * identity columns aren't null.
      */
-    for (natt = 0; natt < IndexRelationGetNumberOfKeyAttributes(idx_rel); natt++)
-    {
-        int            attno = idx_rel->rd_index->indkey.values[natt];
+    heap_deform_tuple(tp, desc, values, nulls);

-        if (attno < 0)
-            elog(ERROR, "system column in index");
-        nulls[attno - 1] = false;
+    for (int i = 0; i < desc->natts; i++)
+    {
+        if (bms_is_member(i + 1 - FirstLowInvalidHeapAttributeNumber,
+                          idattrs))
+            Assert(!nulls[i]);
+        else
+            nulls[i] = true;
     }

     key_tuple = heap_form_tuple(desc, values, nulls);
     *copy = true;
-    RelationClose(idx_rel);
+
+    bms_free(idattrs);

     /*
      * If the tuple, which by here only contains indexed columns, still has
@@ -7680,7 +7678,7 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
     {
         HeapTuple    oldtup = key_tuple;

-        key_tuple = toast_flatten_tuple(oldtup, RelationGetDescr(relation));
+        key_tuple = toast_flatten_tuple(oldtup, desc);
         heap_freetuple(oldtup);
     }


Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Amit Langote

On Mon, Sep 2, 2019 at 6:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > As far as 4) goes, I think the code in ExtractReplicaIdentity is pretty
> > duff anyway, because it doesn't bother to check for the defined failure
> > return for RelationIdGetRelation.  But if we're concerned about the
> > cost of recalculating this stuff per-row, couldn't we cache it a little
> > better?  It should be safe to assume the set of index columns isn't
> > changing intra-query.
> > ... in fact, isn't all the infrastructure for that present already?
> > Why is this code looking directly at the index at all, rather than
> > using the relcache's rd_idattr bitmap?
>
> Here's a proposed patch along those lines.  It fixes Hadi's original
> crash case and passes check-world.

Agree that this patch would be a better solution for Hadi's report,
although I also agree that the situation with index locking for DELETE
isn't perfect.

> I'm a bit suspicious of the exclusion for idattrs being empty, but
> if I remove that, some of the contrib/test_decoding test results
> change.  Anybody want to comment on that?  If that's actually an
> expected situation, why is there an elog(DEBUG) in that path?

ISTM that the exclusion case may occur with the table's replica
identity being REPLICA_IDENTITY_DEFAULT and there being no primary key
index defined, in which case nothing needs to get logged.
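
For instance (a hypothetical illustration of that case, assuming
wal_level = logical):

    -- with no primary key, the default replica identity has no columns,
    -- so the identity-key attribute set comes back empty
    CREATE TABLE no_pk (val int);
    INSERT INTO no_pk VALUES (1);
    -- heap_delete still calls ExtractReplicaIdentity, but there is no
    -- identity to extract, so it returns NULL and nothing gets logged
    DELETE FROM no_pk;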

The elog(DEBUG) may just be a remnant from the days when this was
being developed.  I couldn't find any notes on it in the archives [1],
though.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/flat/20131204155510.GO24801%40awork2.anarazel.de



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Tom Lane

Amit Langote <amitlangote09@gmail.com> writes:
> On Mon, Sep 2, 2019 at 6:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a proposed patch along those lines.  It fixes Hadi's original
>> crash case and passes check-world.

> Agree that this patch would be a better solution for Hadi's report,
> although I also agree that the situation with index locking for DELETE
> isn't perfect.

Thanks for checking!

>> I'm a bit suspicious of the exclusion for idattrs being empty, but
>> if I remove that, some of the contrib/test_decoding test results
>> change.  Anybody want to comment on that?  If that's actually an
>> expected situation, why is there an elog(DEBUG) in that path?

> ISTM that the exclusion case may occur with the table's replica
> identity being REPLICA_IDENTITY_DEFAULT and there being no primary key
> index defined, in which case nothing needs to get logged.

Looking more closely, the case is unreachable in the heap_update
path because key_changed will necessarily be false if the idattrs
set is empty.  But it is reachable in heap_delete because that
just passes key_changed = constant true, whether or not there's
any defined replica identity.  In view of that, I think
we should just remove the elog(DEBUG) ... and maybe add a comment
explaining this.
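
Concretely (a hypothetical illustration, assuming wal_level = logical):

    -- a table with no replica identity: REPLICA IDENTITY DEFAULT and no
    -- primary key, so the idattrs set is empty
    CREATE TABLE no_ident (val int);
    INSERT INTO no_ident VALUES (1);
    -- heap_update computes key_changed from the (empty) identity-column
    -- set, so it is false and ExtractReplicaIdentity bails out early
    UPDATE no_ident SET val = 2;
    -- heap_delete passes key_changed = true unconditionally, so the
    -- elog(DEBUG4) path is reached and NULL is returned
    DELETE FROM no_ident;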

            regards, tom lane



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Andres Freund

Hi,

On 2019-09-01 16:50:00 -0400, Tom Lane wrote:
> As far as 4) goes, I think the code in ExtractReplicaIdentity is pretty
> duff anyway, because it doesn't bother to check for the defined failure
> return for RelationIdGetRelation.  But if we're concerned about the
> cost of recalculating this stuff per-row, couldn't we cache it a little
> better?  It should be safe to assume the set of index columns isn't
> changing intra-query.

I agree that it ought to be more efficient - but also about equally
safe? I.e. if the previous code wasn't safe, the new code wouldn't be
safe either? As in, we're "just" avoiding the assert, but not increasing
safety?


> ... in fact, isn't all the infrastructure for that present already?
> Why is this code looking directly at the index at all, rather than
> using the relcache's rd_idattr bitmap?

No idea, that's too long ago :(


> I'm a bit suspicious of the exclusion for idattrs being empty, but
> if I remove that, some of the contrib/test_decoding test results
> change.  Anybody want to comment on that?  If that's actually an
> expected situation, why is there an elog(DEBUG) in that path?

I think Amit's explanation here is probably accurate.

Greetings,

Andres Freund



Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Tom Lane

Andres Freund <andres@anarazel.de> writes:
> I agree that it ought to be more efficient - but also about equally
> safe? I.e. if the previous code wasn't safe, the new code wouldn't be
> safe either? As in, we're "just" avoiding the assert, but not increasing
> safety?

Well, the point is that the old code risks performing a relcache load
without holding any lock on the relation.  In practice, given that
we do hold a lock on the parent table, it's probably safe ... but
it's at best bad practice.  It's not too hard to imagine future
optimizations that would allow this to result in a corrupt relcache entry.

I don't believe that there's any equivalent risk in the modified code.

            regards, tom lane