BUG #18213: Standby's repeatable read isolation level transaction encountered a "nonrepeatable read" problem - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18213: Standby's repeatable read isolation level transaction encountered a "nonrepeatable read" problem
Date
Msg-id 18213-dbcf8d67b0e687f5@postgresql.org
Whole thread Raw
Responses Re: BUG #18213: Standby's repeatable read isolation level transaction encountered a "nonrepeatable read" problem
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18213
Logged by:          Fei Changhong
Email address:      feichanghong@qq.com
PostgreSQL version: 16.1
Operating system:   Operating system: centos 7,Kernel version: 5.10.13
Description:

We found that transactions with repeatable read isolation level on Standby
have a "nonrepeatable read" problem, that is, the results of two select are
different, and the hot_standby_feedback guc has been set to on. This problem
occurs when "create index concurrently" is executed on RW, and can be
reproduced through the following steps:
RW initialization data:
```
create table t(a int, b int, c char(1024));
create index on t(b);
insert into t select i, i, i from generate_series(1, 6)i;
```
Start a repeatable read transaction on Standby:
```
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
select txid_current_snapshot();
select a, b from t where a = 5;
```
RW execute update and create index concurrently:
```
update t set b = -1 where a = 5;
create index CONCURRENTLY t1_idx_btree_id on t(a);
```
Standby executes select again (still within the existing repeatable read
transaction). We expect that the results of the two select should be the
same, but the second select did not get any data.
```
postgres=*# select a, b from t where a = 5;
  a | b
---+---
  5 | 5
(1 row)

postgres=*# set enable_seqscan to off;
SET
postgres=*# select a, b from t where a = 5;
  a | b
---+---
(0 rows)
```

The environment in which the problem occurs:
Operating system: centos 7
Kernel version: 5.10.134
PostgreSQL code branch/commit:
master/d053a879bb360fb72c46de2a013e741d3f7c9e8d
Client: psql

We have also analyzed the causes of this problem, as follows:
1. RW uses an MVCC snapshot when executing "create index concurrently", so
the updated old tuple cannot be accessed through the index if the update is
not a HOT.
2. RW did not consider the oldest xmin of the snapshot on Standby when
creating the index concurrently.
3. Even in a repeatable read transaction, catlog snapshot will get new one,
so the tuple (5, 5) cannot be accessed when selecting through the newly
created index, but seqscan can still access it.

We have several ideas to fix this problem, but they all have obvious
flaws:
1. In the WaitForOlderSnapshots function in the final stage of "create index
concurrently", wait for replication_slot_xmin to exceed limitXmin. In this
solution, when there are long transactions on Standby, "create index
concurrently" operations may be blocked.
2. When selecting an index path in Standby, snapshot->xmin must be greater
than the transaction ID that creates the index. This solution will make the
invalidate of the plancache very complicated, otherwise the process will not
be able to use the newly created index in the future.
3. Consider replication_slot_xmin when creating snapshots for scaning the
heap table, and add all the tuple that Standby may access to the index. This
solution is relatively hacky and will increase the space occupied by the
index.


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
Next
From: Alvaro Herrera
Date:
Subject: Re: libpq: pipeline mode might desynchronize client and server