Re: Assertion failure in SnapBuildInitialSnapshot() - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Assertion failure in SnapBuildInitialSnapshot()
Date
Msg-id CAA4eK1LBJm48515uAoSqiD-qHxXQO9-nAVzps5U73abrrtdFVw@mail.gmail.com
In response to Re: Assertion failure in SnapBuildInitialSnapshot()  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: Assertion failure in SnapBuildInitialSnapshot()  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Thu, Dec 8, 2022 at 8:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> The same assertion failure has been reported on another thread[1].
> Since I could reproduce this issue several times in my environment
> I've investigated the root cause.
>
> I think there is a race condition of updating
> procArray->replication_slot_xmin by CreateInitDecodingContext() and
> LogicalConfirmReceivedLocation().
>
> What I observed in the test was that a walsender process called:
> SnapBuildProcessRunningXacts()
>   LogicalIncreaseXminForSlot()
>     LogicalConfirmReceivedLocation()
>       ReplicationSlotsComputeRequiredXmin(false).
>
> In ReplicationSlotsComputeRequiredXmin() it acquired the
> ReplicationSlotControlLock and got 0 as the minimum xmin since there
> was no wal sender having effective_xmin.
>

What about the walsender process that is currently processing
running_xacts via SnapBuildProcessRunningXacts()? Shouldn't that
walsender's slot have a non-zero effective_xmin? If not, why not?
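
To make the question concrete, here is a minimal sketch in C of how a
minimum over the slots' effective_xmin values behaves. It is a simplified
model, not the PostgreSQL source: ModelSlot and
model_compute_required_xmin() are hypothetical names, locking is omitted,
and xids are compared with a plain "<" rather than wraparound-aware
comparison. The point is only that the result can be 0 (invalid) solely
when no in-use slot carries a valid effective_xmin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, simplified stand-in for a replication slot. */
typedef struct ModelSlot
{
    bool          in_use;          /* slot is allocated */
    TransactionId effective_xmin;  /* 0 means "no xmin requirement" */
} ModelSlot;

/*
 * Return the smallest valid effective_xmin among in-use slots, or
 * InvalidTransactionId if no in-use slot has one.
 *
 * Assumption: xid wraparound is ignored, so "<" is enough here.
 */
static TransactionId
model_compute_required_xmin(const ModelSlot *slots, size_t nslots)
{
    TransactionId agg_xmin = InvalidTransactionId;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        if (!slots[i].in_use)
            continue;           /* unused slots contribute nothing */
        if (slots[i].effective_xmin == InvalidTransactionId)
            continue;           /* slot holds no xmin */
        if (agg_xmin == InvalidTransactionId ||
            slots[i].effective_xmin < agg_xmin)
            agg_xmin = slots[i].effective_xmin;
    }
    return agg_xmin;
}
```

In this model, a single in-use slot with a non-zero effective_xmin is
enough to keep the result non-zero, which is why I am asking about the
current walsender's slot.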

-- 
With Regards,
Amit Kapila.


