RE: Fix 035_standby_logical_decoding.pl race conditions - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Fix 035_standby_logical_decoding.pl race conditions
Date
Msg-id OSCPR01MB14966755BC3C534A0058EA07FF5AF2@OSCPR01MB14966.jpnprd01.prod.outlook.com
In response to Re: Fix 035_standby_logical_decoding.pl race conditions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Fix 035_standby_logical_decoding.pl race conditions
List pgsql-hackers
Dear Amit, Bertrand,

> You have not added any injection point for the above case. Isn't it
> possible that if running_xact record is logged concurrently to the
> pruning record, it should move the active slot on standby, and the
> same failure should occur in this case as well?

I agree that this timing failure can happen. Here is a reproducer:

```
 $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]);
+$node_primary->safe_psql('testdb', 'CHECKPOINT');
+sleep(20);
 $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]);
```

And here is my theory...

First, a new table is created with a small fill factor, so the page becomes full
after three UPDATEs. At the fourth UPDATE command (let's say txn4), the backend
process prunes the page and a PRUNE_ON_ACCESS record is generated. That record
requests standbys to discard tuples older than the third UPDATE (say txn3), so
the slot can be invalidated.
However, if a RUNNING_XACTS record is generated between txn3 and txn4, its
oldestRunningXid would be the same xid as txn4, and the catalog_xmin of the
standby slot would be advanced up to that xid. The upcoming PRUNE_ON_ACCESS record
still points to txn3, which is now older than the slot's catalog_xmin, so slot
invalidation won't happen in this case.
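
To make the ordering argument concrete, here is a toy model of the two replay
orders. It is only a sketch under assumptions: the xid values are made up, xid
wraparound is ignored, and the conflict rule (the slot is invalidated when its
catalog_xmin is less than or equal to the horizon carried by the pruning record)
is a simplification of what the standby actually does. The names Slot,
replay_running_xacts, replay_prune and slot_invalidated are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical xids for the third and fourth UPDATE.
TXN3 = 743
TXN4 = 744


@dataclass
class Slot:
    """Toy stand-in for an active logical slot on the standby."""
    catalog_xmin: int
    invalidated: bool = False


def replay_running_xacts(slot: Slot, oldest_running_xid: int) -> None:
    # Assumption: decoding a RUNNING_XACTS record lets an active slot
    # move its catalog_xmin forward, never backward.
    if oldest_running_xid > slot.catalog_xmin:
        slot.catalog_xmin = oldest_running_xid


def replay_prune(slot: Slot, conflict_horizon: int) -> None:
    # Assumption: the slot conflicts when it may still need rows that
    # the pruning record says are removed (catalog_xmin <= horizon).
    if slot.catalog_xmin <= conflict_horizon:
        slot.invalidated = True


def slot_invalidated(running_xacts_between: bool) -> bool:
    slot = Slot(catalog_xmin=740)  # slot starts behind txn3
    if running_xacts_between:
        # RUNNING_XACTS logged between txn3 and txn4: the oldest
        # running xid is the one txn4 will use.
        replay_running_xacts(slot, TXN4)
    # The pruning record generated by txn4 points at txn3.
    replay_prune(slot, TXN3)
    return slot.invalidated


if __name__ == "__main__":
    print("no RUNNING_XACTS in between:", slot_invalidated(False))
    print("RUNNING_XACTS in between:   ", slot_invalidated(True))
```

In this model the slot is invalidated only when no RUNNING_XACTS record slips in
between the two UPDATEs, which matches the failure seen with the reproducer.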

Based on this, I've updated the patch to use injection_points for scenario 5 as
well. Of course, the PG16/17 patches won't use the active slot for that scenario.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment
