Dear Alexander,
> Unfortunately, the testing procedure I shared above still produces failures
> with the patched 009_twophase.pl.
Hmm, I ran the test for hours, but I could nor reproduce the failure. But let me analyze
based on your log.
> With wal_debug = on
> in /tmp/temp.config, I can see:
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: statement: PREPARE TRANSACTION
'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: INSERT @ 0/030227F8: - Transaction/PREPARE:
gidxact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write
0/00000000;flush 0/00000000
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid
791latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
> 2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG: statement: SELECT
pg_current_wal_flush_lsn()
> 2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG: statement: SELECT '0/030227F8' <= replay_lsn
ANDstate = 'streaming'
> FROM pg_catalog.pg_stat_replication
> WHERE application_name IN ('paris', 'walreceiver')
> 2026-02-17 07:06:44.337 EET postmaster[754306] LOG: received immediate shutdown request
> 2026-02-17 07:06:44.339 EET postmaster[754306] LOG: database system is shut down
I have few experience to see the wal_debug output, but background writer seems to
generate the RUNNING_XACTS record. It's different from my expectation. To confirm,
did you really enable the injection point? For now 009_twophase can work without
the `-Dinjection_points=true` but it should be set to avoid random failures.
Another possibility is that bgwriter has already been passed `IS_INJECTION_POINT_ATTACHED("skip-log-running-xacts")`
before attaching the injection point. In this case background writer would
generate the RUNNING_XACTS records at the bad timing. One possible workaround
is to attach the injeciton_point bit earlier, it does not solve the root cause
though. Attached patch applies the band-aid.
> Regarding the approach with "skip-log-running-xacts", I wonder if there is
> a strong guarantee that no other WAL write can happen in the same window
> (e.g., from autovacuum)?
Good point, autovacuum and checkpoint should be also avoided. IIUC it can be done
by setting GUCs.
Best regards,
Hayato Kuroda
FUJITSU LIMITED