Thread: Visibility bug with prepared transaction with subtransactions on standby

Visibility bug with prepared transaction with subtransactions on standby

From
Heikki Linnakangas
Date:
Hi,

Konstantin and I found an MVCC bug with:

- a prepared transaction,
- which has a subtransaction,
- on a hot standby,
- after starting the standby from a shutdown checkpoint.

See the test case in the attached patch to demonstrate this. The last 
query in the new test returns incorrect result on master, causing the 
test to fail.

The problem
-----------

When you shut down a primary with a prepared transaction, and start a 
hot standby server from the shutdown checkpoint, the hot standby server 
goes through this code at startup:

>             if (wasShutdown)
>                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
>             else
>                 oldestActiveXID = checkPoint.oldestActiveXid;
>             Assert(TransactionIdIsValid(oldestActiveXID));
> 
>             /* Tell procarray about the range of xids it has to deal with */
>             ProcArrayInitRecovery(XidFromFullTransactionId(TransamVariables->nextXid));
> 
>             /*
>              * Startup subtrans only.  CLOG, MultiXact and commit timestamp
>              * have already been started up and other SLRUs are not maintained
>              * during recovery and need not be started yet.
>              */
>             StartupSUBTRANS(oldestActiveXID);
> 
>             /*
>              * If we're beginning at a shutdown checkpoint, we know that
>              * nothing was running on the primary at this point. So fake-up an
>              * empty running-xacts record and use that here and now. Recover
>              * additional standby state for prepared transactions.
>              */
>             if (wasShutdown)
>             {
>                 RunningTransactionsData running;
>                 TransactionId latestCompletedXid;
> 
>                 /*
>                  * Construct a RunningTransactions snapshot representing a
>                  * shut down server, with only prepared transactions still
>                  * alive. We're never overflowed at this point because all
>                  * subxids are listed with their parent prepared transactions.
>                  */
>                 running.xcnt = nxids;
>                 running.subxcnt = 0;
>                 running.subxid_overflow = false;
>                 running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
>                 running.oldestRunningXid = oldestActiveXID;
>                 latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
>                 TransactionIdRetreat(latestCompletedXid);
>                 Assert(TransactionIdIsNormal(latestCompletedXid));
>                 running.latestCompletedXid = latestCompletedXid;
>                 running.xids = xids;
> 
>                 ProcArrayApplyRecoveryInfo(&running);
> 
>                 StandbyRecoverPreparedTransactions();
>             }

The problem is that the RunningTransactions snapshot constructed here 
does not include subtransaction XIDs of the prepared transactions, only 
the main XIDs. Because of that, snapshots taken in the standby will 
consider the sub-XIDs as aborted rather than in-progress. That leads to 
two problems if the prepared transaction is later committed:

- We will incorrectly set hint bits on tuples inserted/deleted by the 
subtransactions, which leads to incorrect query results later if the 
prepared transaction is committed.

- If you acquire an MVCC snapshot and hold to it while the prepared 
transaction commits, the subtransactions will suddenly become visible to 
the old snapshot.

History
-------

StandbyRecoverPreparedTransactions has this comment:

>  * The lack of calls to SubTransSetParent() calls here is by design;
>  * those calls are made by RecoverPreparedTransactions() at the end of recovery
>  * for those xacts that need this.

I think that's wrong; it really should update pg_subtrans. It used to, a 
long time ago, but commit 49e92815497 changed it. Reading the 
discussions that led to that change, seems that we somehow didn't 
realize that it's important to distinguish between in-progress and 
aborted transactions in a standby. On that thread, Nikhil posted [1] a 
test case that is almost exactly the same test case that I used to find 
this, but apparently the visibility in standby in that scenario was not 
tested thoroughly back then.

[1] 
https://www.postgresql.org/message-id/CAMGcDxde4XjDyTjGvZCPVQROpXw1opfpC0vjpCkzc1pcQBqvrg%40mail.gmail.com

Fix
---

Attached is a patch to fix this, with a test case. It should be 
backpatched to all supported versions.

The patch changes a field in RunningTransactionsData from bool to an 
enum. Could that break extensions on back branches? I think it's OK, I'm 
not aware of any extensions touching RunningTransactionsData. I did not 
change the xl_running_xacts WAL record, only the in-memory struct.

Alternatively, we could add a new argument to 
ProcArrayApplyRecoveryInfo() to indicate the new case that the xids 
array in RunningTransactionsData does not include all the subxids but 
they have all been marked in pg_subtrans already. But I think the 
attached is better, as the enum makes the three different states more clear.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Attachment
On 20/06/2024 16:41, Heikki Linnakangas wrote:
> Attached is a patch to fix this, with a test case.

The patch did not compile, thanks to a last-minute change in a field 
name. Here's a fixed version.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment
On 20/06/2024 17:10, Heikki Linnakangas wrote:
> On 20/06/2024 16:41, Heikki Linnakangas wrote:
>> Attached is a patch to fix this, with a test case.
> 
> The patch did not compile, thanks to a last-minute change in a field
> name. Here's a fixed version.

All I heard is crickets, so committed and backported to all supported 
versions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Hi,
I'm trying to run REL_13_STABLE recovey tests for windows and I get the 
error

waiting for server to shut down.............. failed
pg_ctl: server does not shut down
# pg_ctl stop failed: 256
# Stale postmaster.pid file for node "paris": PID 1868 no longer exists
Bail out! pg_ctl stop failed

I noticed that on buildfarm recovey tests are disabled for windows, was 
this done intentionally?

'invocation_args' => [
'--config',
'./bowerbird.conf',
'--skip-steps',
'recovery-check',
'--verbose',
'REL_13_STABLE'
],

REL_13_STABLE (071e19a36) - test passed
REL_13_STABLE(e9c8747ee) - test failed

28.06.2024 1:35, Heikki Linnakangas пишет:
> On 20/06/2024 17:10, Heikki Linnakangas wrote:
>> On 20/06/2024 16:41, Heikki Linnakangas wrote:
>>> Attached is a patch to fix this, with a test case.
>>
>> The patch did not compile, thanks to a last-minute change in a field
>> name. Here's a fixed version.
>
> All I heard is crickets, so committed and backported to all supported 
> versions.
>



Re: Visibility bug with prepared transaction with subtransactions on standby

From
Alexander Lakhin
Date:
Hello Heikki,

27.06.2024 21:35, Heikki Linnakangas wrote:
> All I heard is crickets, so committed and backported to all supported versions.
>

Recently hornet made some noise too: [1], by failing on the test
modification added with e9c8747ee (in REL_13_STABLE):
# issuing query via background psql: SELECT count(*) FROM t_009_tbl_standby_mvcc
# pump_until: process terminated unexpectedly when searching for "(?^:background_psql: QUERY_SEPARATOR)" with stream:
""
query failed: psql:<stdin>:6: ERROR:  relation "t_009_tbl_standby_mvcc" does not exist

It looks like t_009_tbl_standby_mvcc had not arrived to standby yet when
it was queried.

I can reproduce the same with TEMP_CONFIG containing:
recovery_min_apply_delay = '500ms'

All the other tests (I ran check-world) pass with this setting. So maybe
this test lacks waiting for standby synchronization.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-12-13%2022%3A14%3A10

Best regards,
Alexander



Re: Visibility bug with prepared transaction with subtransactions on standby

From
Heikki Linnakangas
Date:
On 16/12/2024 10:00, Alexander Lakhin wrote:
> Recently hornet made some noise too: [1], by failing on the test
> modification added with e9c8747ee (in REL_13_STABLE):
> # issuing query via background psql: SELECT count(*) FROM 
> t_009_tbl_standby_mvcc
> # pump_until: process terminated unexpectedly when searching for "(? 
> ^:background_psql: QUERY_SEPARATOR)" with stream: ""
> query failed: psql:<stdin>:6: ERROR:  relation "t_009_tbl_standby_mvcc" 
> does not exist
> 
> It looks like t_009_tbl_standby_mvcc had not arrived to standby yet when
> it was queried.
> 
> I can reproduce the same with TEMP_CONFIG containing:
> recovery_min_apply_delay = '500ms'
> 
> All the other tests (I ran check-world) pass with this setting. So maybe
> this test lacks waiting for standby synchronization.

Fixed, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)