Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus() - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()
Date
Msg-id CAA4eK1J62R6HuTDE+WDisc2Me_o0WO0ND84ixyfYekbM8su47w@mail.gmail.com
In response to BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()  (Alexander Lakhin <exclusion@gmail.com>)
List pgsql-bugs
On Fri, Jun 25, 2021 at 12:20 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The offending process (the one that left a "valid" clogGroupNext) is
> 60d48c2d.ea21. It looks like it got from
> pg_atomic_compare_exchange_u32() the nextidx value that was written to
> clogGroupFirst by process 60d48c2e.ebc5, and exited just after that.
>

Your analysis seems to be heading in the right direction. Can you try
setting clogGroupNext to INVALID_PGPROCNO
(pg_atomic_write_u32(&proc->clogGroupNext, INVALID_PGPROCNO);) before
we return false in the first while(true) loop in
TransactionGroupUpdateXidStatus()?
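
To make the suggestion concrete, here is a rough, single-threaded model of
that first loop, written with C11 atomics rather than PostgreSQL's pg_atomic_*
API. It is only a sketch under assumptions: ModelProc, ModelGroup,
model_try_join_group(), model_stale_head_scenario(), and INVALID_PROCNO are
hypothetical stand-ins, not the actual clog.c code, and the concurrent change
to the list head is simulated by passing in an already outdated head value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_PROCNO UINT32_MAX	/* stand-in for INVALID_PGPROCNO */

/* Hypothetical, stripped-down stand-in for PGPROC. */
typedef struct ModelProc
{
	int			memberPage;		/* clog page this proc wants to update */
	_Atomic uint32_t groupNext;	/* models clogGroupNext */
} ModelProc;

/* Hypothetical stand-in for the shared list head (clogGroupFirst). */
typedef struct ModelGroup
{
	_Atomic uint32_t groupFirst;
} ModelGroup;

/*
 * Try to push proc "me" onto the group list. "observed_first" is the head
 * value the proc read earlier; it may be outdated by the time of the CAS.
 * Returns false if the current head is working on a different page.
 */
static bool
model_try_join_group(ModelGroup *group, ModelProc *procs, uint32_t me,
					 uint32_t observed_first)
{
	uint32_t	nextidx = observed_first;

	while (true)
	{
		if (nextidx != INVALID_PROCNO &&
			procs[nextidx].memberPage != procs[me].memberPage)
		{
			/*
			 * The proposed change: a previous iteration may have stored a
			 * valid index in groupNext, so reset it before bailing out.
			 */
			atomic_store(&procs[me].groupNext, INVALID_PROCNO);
			return false;
		}

		atomic_store(&procs[me].groupNext, nextidx);

		/* On failure, nextidx is overwritten with the current head. */
		if (atomic_compare_exchange_strong(&group->groupFirst, &nextidx, me))
			return true;
	}
}

/*
 * Proc 1 (page 7) saw proc 0 (page 7) at the head, but by the time of its
 * CAS the head has become proc 2 (page 9). Returns proc 1's groupNext.
 */
static uint32_t
model_stale_head_scenario(void)
{
	ModelProc	procs[3];
	ModelGroup	group;
	bool		joined;

	procs[0].memberPage = 7;
	procs[1].memberPage = 7;
	procs[2].memberPage = 9;
	for (int i = 0; i < 3; i++)
		atomic_init(&procs[i].groupNext, INVALID_PROCNO);
	atomic_init(&group.groupFirst, 2);	/* head changed underneath proc 1 */

	joined = model_try_join_group(&group, procs, 1, 0);
	assert(!joined);
	return atomic_load(&procs[1].groupNext);
}
```

In this model, proc 1 first stores index 0 in groupNext, fails the
compare-and-exchange, and then sees a head with a different page. Without the
reset in the early-return branch it would leave with groupNext still pointing
at proc 0, which is the kind of leftover "valid" value the Assert complains
about.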

I think this should be reproducible on all branches from HEAD back to
v11. Have you tried any other branch? I'll also try to reproduce it.

-- 
With Regards,
Amit Kapila.


