Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I realized that I hadn't posted the patch version I described somewhere
> downthread. Here it is, as 0001.
>
> While thinking about it for posting just now, I wondered if it would
> work to consider that any transaction whose commit record has been
> decoded but which nevertheless gets a true return from
> TransactionIdIsInProgress(), just is not committed yet and so should be
> omitted from the xip array of the snapshot. That is, if it's
> still-running, then we just don't copy it into the output snapshot.
> That's implemented as 0002 here. This seems somehow less controversial,
> as we don't have to test TransactionIdDidCommit() for a transaction that
> we haven't seen as not-running per PGPROC; and it should also be faster,
> because we don't have to wait for anybody to commit.However it gives
> me pause that perhaps the snapshot would not be fully correct. (Indeed
> there are a few failing tests in the subscription suite).
I recall that in some of the patches for REPACK enhancements (snapshot
switching, MVCC-safety, etc.) for v20 I had a problem with
TransactionIdIsInProgress(). In particular, I tried to add a flag like
PROC_IN_VACUUM, to limit the REPACK's impact on VACUUM xmin horizon.
The problem was that TransactionIdIsInProgress() compares the xid to
RecentXmin before accessing CLOG. IIRC VACUUM's RecentXmin skipped the xid of
REPACK and considered its xid not in progress anymore, but since the
transaction wasn't committed yet per CLOG, VACCUM incorrectly considered it
aborted (per HeapTupleSatisfiesVacuumHorizon).
Thus I imagine that with 0002, transaction having PROC_IN_SAFE_IC set might be
added to the snapshot's list of committed transactions although its still in
progress.
> Failing other ideas, I think we should just go with 0001. We'd need more
> commentary on why is TransactionIdDidCommit() OK, when we haven't
> scanned PGPROC for that xid, though.
Mihail already told me that I should consider adding this patch to the CF. I
said that I'm aware of its importance (because REPACK probably exposes the bug
more than logical replication does) and that I'll remind you if you happen to
forget about it. However I thought that fixes of existing bugs are not subject
to feature freeze, so I did not bring it up yet.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com