Re: BUG #18815: Logical replication worker Segmentation fault - Mailing list pgsql-bugs

From Sergey Belyashov
Subject Re: BUG #18815: Logical replication worker Segmentation fault
Msg-id CAOe0RDwUeZduRUcD1N=BcAk5z3ANPpdyZtr4qNjiY6fPQu=sDw@mail.gmail.com
In response to Re: BUG #18815: Logical replication worker Segmentation fault  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18815: Logical replication worker Segmentation fault
List pgsql-bugs
Hi,

Do I need to apply this patch for debugging purposes?

I want to remove the BRIN indexes from the active partitions and start
replication. When the issue is fixed, I will add the BRIN indexes back.

Best regards,
Sergey Belyashov

On Tue, 18 Feb 2025 at 02:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > Further to this ... I'd still really like to have a reproducer.
> > While brininsertcleanup is clearly being less robust than it should
> > be, I now suspect that there is another bug somewhere further down
> > the call stack.  We're getting to this point via ExecCloseIndices,
> > and that should be paired with ExecOpenIndices, and that would have
> > created a fresh IndexInfo.  So it looks a lot like some path in a
> > logrep worker is able to call ExecCloseIndices twice on the same
> > working data.  That would probably lead to a "releasing a lock you
> > don't own" error if we weren't hitting this crash first.
>
> Hmm ... I tried modifying ExecCloseIndices to blow up if called
> twice, as in the attached.  This gets through core regression
> just fine, but it blows up in three different subscription TAP
> tests, all with a stack trace matching Sergey's:
>
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f064bfe3e65 in __GI_abort () at abort.c:79
> #2  0x00000000009e9253 in ExceptionalCondition (
>     conditionName=conditionName@entry=0xb8717b "indexDescs[i] != NULL",
>     fileName=fileName@entry=0xb87139 "execIndexing.c",
>     lineNumber=lineNumber@entry=249) at assert.c:66
> #3  0x00000000006f0b13 in ExecCloseIndices (
>     resultRelInfo=resultRelInfo@entry=0x2f11c18) at execIndexing.c:249
> #4  0x00000000006f86d8 in ExecCleanupTupleRouting (mtstate=0x2ef92d8,
>     proute=0x2ef94e8) at execPartition.c:1273
> #5  0x0000000000848cb6 in finish_edata (edata=0x2ef8f50) at worker.c:717
> #6  0x000000000084d0a0 in apply_handle_insert (s=<optimized out>)
>     at worker.c:2460
> #7  apply_dispatch (s=<optimized out>) at worker.c:3389
> #8  0x000000000084e494 in LogicalRepApplyLoop (last_received=25066600)
>     at worker.c:3680
> #9  start_apply (origin_startpos=0) at worker.c:4507
> #10 0x000000000084e711 in run_apply_worker () at worker.c:4629
> #11 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:4798
> #12 0x00000000008138f9 in BackgroundWorkerMain (startup_data=<optimized out>,
>     startup_data_len=<optimized out>) at bgworker.c:842
>
> The problem seems to be that apply_handle_insert_internal does
> ExecOpenIndices and then ExecCloseIndices, and then
> ExecCleanupTupleRouting does ExecCloseIndices again, which nicely
> explains why brininsertcleanup blows up if you happen to have a BRIN
> index involved.  What it doesn't explain is how come we don't see
> other symptoms from the duplicate index_close calls, regardless of
> index type.  I'd have expected an assertion failure from
> RelationDecrementReferenceCount, and/or an assertion failure for
> nonzero rd_refcnt at transaction end, and/or a "you don't own a lock
> of type X" gripe from LockRelease.  We aren't getting any of those,
> but why not, if this code is as broken as I think it is?
>
> (On closer inspection, we seem to have about 99% broken relcache.c's
> ability to notice rd_refcnt being nonzero at transaction end, but
> the other two things should still be happening.)
>
>                         regards, tom lane
>


