RE: BUG: Former primary node might stuck when started as a standby - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: BUG: Former primary node might stuck when started as a standby
Date
Msg-id OS9PR01MB12149D4F1A2BC23637688CE4DF56BA@OS9PR01MB12149.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: BUG: Former primary node might stuck when started as a standby  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: BUG: Former primary node might stuck when started as a standby
List pgsql-hackers
Dear Alexander,

> Unfortunately, the testing procedure I shared above still produces failures
> with the patched 009_twophase.pl.

Hmm, I ran the test for hours, but I could nor reproduce the failure. But let me analyze
based on your log.

> With wal_debug = on
> in /tmp/temp.config, I can see:

> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  statement: PREPARE TRANSACTION
'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  INSERT @ 0/030227F8:  - Transaction/PREPARE:
gidxact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
 
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  xlog flush request 0/030227F8; write
0/00000000;flush 0/00000000
 
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET background writer[754333] LOG:  INSERT @ 0/03022838:  - Standby/RUNNING_XACTS: nextXid
791latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
 
> 2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG:  statement: SELECT
pg_current_wal_flush_lsn()
> 2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG:  statement: SELECT '0/030227F8' <= replay_lsn
ANDstate = 'streaming'
 
>             FROM pg_catalog.pg_stat_replication
>             WHERE application_name IN ('paris', 'walreceiver')
> 2026-02-17 07:06:44.337 EET postmaster[754306] LOG:  received immediate shutdown request 
> 2026-02-17 07:06:44.339 EET postmaster[754306] LOG:  database system is shut down

I have few experience to see the wal_debug output, but background writer seems to
generate the RUNNING_XACTS record. It's different from my expectation. To confirm,
did you really enable the injection point? For now 009_twophase can work without
the `-Dinjection_points=true` but it should be set to avoid random failures.

Another possibility is that bgwriter has already been passed `IS_INJECTION_POINT_ATTACHED("skip-log-running-xacts")`
before attaching the injection point. In this case background writer would
generate the RUNNING_XACTS records at the bad timing. One possible workaround
is to attach the injeciton_point bit earlier, it does not solve the root cause
though. Attached patch applies the band-aid.

> Regarding the approach with "skip-log-running-xacts", I wonder if there is
> a strong guarantee that no other WAL write can happen in the same window
> (e.g., from autovacuum)?

Good point, autovacuum and checkpoint should be also avoided. IIUC it can be done
by setting GUCs.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment

pgsql-hackers by date:

Previous
From: Andreas Karlsson
Date:
Subject: Re: DOCS - Add an "Options" section for pg_controldata
Next
From: Michael Paquier
Date:
Subject: Re: Flush some statistics within running transactions