On Sat, Sep 10, 2022 at 12:07:30PM +0800, Zhang Mingli wrote:
> That’s interesting, dig into it for a while but not too much progress.
>
> Maybe we could add some logs to print MultiXactMembers’ xid and status if xid is 0.
>
> Inside MultiXactIdGetUpdateXid()
>
> ```
> nmembers = GetMultiXactIdMembers(xmax, &members, false, false);
>
> if (nmembers > 0)
> {
> int i;
>
> for (i = 0; i < nmembers; i++)
> {
> /* Ignore lockers */
> if (!ISUPDATE_from_mxstatus(members[i].status))
> continue;
>
> /* there can be at most one updater */
> Assert(update_xact == InvalidTransactionId);
> update_xact = members[i].xid;
>
> // log here if xid is invalid
> #ifndef USE_ASSERT_CHECKING
>
> /*
> * in an assert-enabled build, walk the whole array to ensure
> * there's no other updater.
> */
> break;
> #endif
> }
>
> pfree(members);
> }
> // and here if didn’t update update_xact at all (it shouldn’t happen as designed)

Yeah.  I added assertions for the case inside the loop and for this one.
The second one is what fails, right before "return":

TRAP: FailedAssertion("update_xact != InvalidTransactionId", File: "src/backend/access/heap/heapam.c", Line: 6939, PID: 4743)

It looks like nmembers==2, and both members are lockers, so both are ignored.

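To make that concrete, here is a standalone model of the loop above, not the backend code itself. The Model* types and names are hypothetical stand-ins for MultiXactMember and friends, the "is update" test assumes the same ordering as ISUPDATE_from_mxstatus (anything above "for update" counts as an updater), and the xids and lock modes in the usage note are made up, since I haven't looked at the actual member statuses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backend types; this is only a model. */
typedef uint32_t ModelXid;
#define MODEL_INVALID_XID ((ModelXid) 0)

/* Same ordering as the multixact status enum: lockers first, updaters last. */
typedef enum
{
    MODEL_FOR_KEY_SHARE = 0,
    MODEL_FOR_SHARE,
    MODEL_FOR_NO_KEY_UPDATE,
    MODEL_FOR_UPDATE,
    MODEL_NO_KEY_UPDATE,
    MODEL_UPDATE
} ModelStatus;

typedef struct
{
    ModelXid    xid;
    ModelStatus status;
} ModelMember;

/* Assumed to mirror ISUPDATE_from_mxstatus(): only real updates qualify. */
static int
model_is_update(ModelStatus status)
{
    return status > MODEL_FOR_UPDATE;
}

/* Return the updater's xid, or MODEL_INVALID_XID if every member is a locker. */
static ModelXid
model_get_update_xid(const ModelMember *members, int nmembers)
{
    ModelXid    update_xact = MODEL_INVALID_XID;

    for (int i = 0; i < nmembers; i++)
    {
        /* Ignore lockers */
        if (!model_is_update(members[i].status))
            continue;

        /* there can be at most one updater */
        assert(update_xact == MODEL_INVALID_XID);
        update_xact = members[i].xid;
    }

    return update_xact;
}
```

With two lockers, e.g. {100, MODEL_FOR_KEY_SHARE} and {101, MODEL_FOR_SHARE}, the loop skips both and returns MODEL_INVALID_XID, which is the state the new assertion trips on.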
> And could we see multixact reply in logs if db does recover?

Do you mean pg_waldump output, or something else?

BTW, after a number of SIGABRTs, I started seeing these during
recovery:

< 2022-09-09 19:44:04.180 CDT >LOG: unexpected pageaddr 1214/AF0FE000 in log segment 0000000100001214000000B4, offset 1040384
< 2022-09-09 23:20:50.830 CDT >LOG: unexpected pageaddr 1214/CF65C000 in log segment 0000000100001214000000D8, offset 6668288

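For reference, a small sketch of how to work out which page address those messages were expecting, from the segment file name and the offset. It assumes the default 16MB wal_segment_size (I'm not stating what this cluster uses), and the function name is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16MB WAL segments. */
#define MODEL_WAL_SEG_SIZE (16u * 1024 * 1024)

/*
 * Split a 24-hex-digit segment file name into timeline, high word and
 * segment number, then compute the LSN of the page at "offset".
 * Returns 0 on success, -1 if the input doesn't look valid.
 */
static int
model_expected_pageaddr(const char *segname, uint32_t offset,
                        uint32_t *hi, uint32_t *lo)
{
    unsigned int tli, log, seg;

    if (sscanf(segname, "%8X%8X%8X", &tli, &log, &seg) != 3)
        return -1;

    /* With 16MB segments there are 256 segments per high word. */
    if (seg > 0xFF || offset >= MODEL_WAL_SEG_SIZE)
        return -1;

    *hi = log;
    *lo = seg * MODEL_WAL_SEG_SIZE + offset;
    return 0;
}
```

Under that assumption, the first message would have expected 1214/B40FE000 and the second 1214/D865C000. In both cases the address actually found has the same offset within its segment but an older segment number (AF rather than B4, CF rather than D8).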
--
Justin