Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Adding REPACK [concurrently]
Msg-id 46846.1774267234@localhost
In response to Re: Adding REPACK [concurrently]  (Antonin Houska <ah@cybertec.at>)
Responses Re: Adding REPACK [concurrently]
List pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote:

> Antonin Houska <ah@cybertec.at> wrote:
> 
> > Antonin Houska <ah@cybertec.at> wrote:
> > 
> > > Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
> > > 
> > > > The concurrency test failed once. I tried to reproduce the scenario below,
> > > > but no luck. I think the assert failure happened because, after a
> > > > speculative insert, there might be no spec CONFIRM or ABORT. Thoughts?
> > > 
> > > Perhaps, I'll try. I'm not sure the REPACK decoding worker does anything
> > > special regarding decoding. If you happen to see the problem again, please try
> > > to preserve the related WAL segments - if this is a bug in PG executor,
> > > pg_waldump might reveal that.
> > 
> > I could not reproduce the failure, and I have no idea how a speculative insert
> > can remain without a CONFIRM / ABORT record. The only problem I can imagine is
> > that change_useless_for_repack() filters out the CONFIRM / ABORT record
> > accidentally, but neither code review nor the debugger supports that
> > theory. (Actually, if this were the problem, the test failure probably wouldn't
> > be that rare.)
> 
> I confirm that I was able to reproduce the crash using a debugger and your more
> recent diagnosis [1]. Indeed, filtering was the problem.
> 
> Unfortunately, I wasn't able to make the crash easily reproducible using the
> isolation tester. The problem is that logical decoding is performed by a
> background worker, and when the backend executing REPACK waits for the
> background worker, which in turn waits on an injection point, the isolation
> tester does not recognize that it's effectively the backend that is waiting on
> the injection point. Therefore the isolation tester does not proceed to the
> next step.

I could not resist digging into it deeper :-) Attached is a test that reproduces
the crash - it includes the isolation tester enhancement that I posted
separately [1]. It crashes reliably with v43 [2] if your fix v43-0005 is
omitted.

[1] https://www.postgresql.org/message-id/4703.1774250534%40localhost
[2] https://www.postgresql.org/message-id/202603191855.fzsgsnyzfvpt%40alvherre.pgsql

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

