RE: Potential data loss due to race condition during logical replication slot creation - Mailing list pgsql-bugs

From Hayato Kuroda (Fujitsu)
Subject RE: Potential data loss due to race condition during logical replication slot creation
Date
Msg-id TYCPR01MB1207719C811F580A8774C79B7F52A2@TYCPR01MB12077.jpnprd01.prod.outlook.com
In response to Re: Potential data loss due to race condition during logical replication slot creation  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: Potential data loss due to race condition during logical replication slot creation  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-bugs
Dear hackers,

While analyzing another failure [1], I found this thread. I think both failures
have the same cause.

The reported failure occurs when a replication slot is created while another
transaction is in progress, and the new slot reuses a snapshot serialized by
another slot. The reproducer is:

```
-- Session0
SELECT pg_create_logical_replication_slot('slot0', 'test_decoding');
BEGIN;
INSERT INTO foo ...

-- Session1
SELECT pg_create_logical_replication_slot('slot1', 'test_decoding');

-- Session2
CHECKPOINT;
SELECT pg_logical_slot_get_changes('slot0', NULL, NULL);

-- Session0
INSERT INTO var ...  -- var is defined WITH (user_catalog_table = true)
COMMIT;

-- Session1
SELECT pg_logical_slot_get_changes('slot1', NULL, NULL);
-- -> Assertion failure.
```

> Here is the summary of several proposals we've discussed:
> a) Have CreateInitDecodingContext() always pass need_full_snapshot =
> true to AllocateSnapshotBuilder().

> b) Have snapbuild.c being able to handle multiple SnapBuildOnDisk versions.

> c) Add a global variable, say in_create, to snapbuild.c

Of the three options raised by Sawada-san, I prefer approach a). Since the
issue can occur on all supported branches, we should choose the most
conservative fix. Also, it would be quite painful to maintain multiple code
paths that handle the same issue.

The attached patch implements approach a), since no one had posted one yet. I
also added a test that reproduces the assertion failure, but I am not sure
whether it should be included.
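To illustrate why approach a) closes the race, here is a rough toy model in Python (not PostgreSQL code) of the choice a new slot makes when it decodes an xl_running_xacts record. It assumes the branch order of SnapBuildFindSnapshot() in snapbuild.c as I understand it: with no running transactions the snapshot is immediately consistent; otherwise a snapshot serialized by another slot is reused only when a full snapshot is not being built. The function, enum, and parameter names are hypothetical, and the real builder tracks far more state (including an intermediate FULL_SNAPSHOT state).

```python
from enum import Enum


class State(Enum):
    # Simplified stand-ins for the snapshot builder states in snapbuild.c
    # (hypothetical; the real FULL_SNAPSHOT state is omitted).
    BUILDING_SNAPSHOT = "building"
    CONSISTENT = "consistent"


def find_snapshot(running_xacts: int,
                  serialized_snapshot_on_disk: bool,
                  building_full_snapshot: bool) -> State:
    """Hypothetical model of how a slot being created picks its start state.

    running_xacts: transactions in progress at the xl_running_xacts record.
    serialized_snapshot_on_disk: another slot already serialized a snapshot
        at this record (slot0 after the CHECKPOINT in the reproducer).
    building_full_snapshot: the need_full_snapshot flag given when the
        decoding context is created.
    """
    if running_xacts == 0:
        # Nothing in progress: the snapshot is trivially consistent.
        return State.CONSISTENT
    if not building_full_snapshot and serialized_snapshot_on_disk:
        # Reuse the other slot's serialized snapshot and jump straight to
        # CONSISTENT while a transaction is still in progress; this is the
        # path the reproducer exercises.
        return State.CONSISTENT
    # Otherwise keep building incrementally until the running
    # transactions have finished.
    return State.BUILDING_SNAPSHOT


# Reproducer timeline: Session0's transaction is open while slot1 is being
# created, and slot0 has serialized a snapshot.
before_fix = find_snapshot(running_xacts=1,
                           serialized_snapshot_on_disk=True,
                           building_full_snapshot=False)
# Approach a): CreateInitDecodingContext() always passes need_full_snapshot = true.
with_fix = find_snapshot(running_xacts=1,
                         serialized_snapshot_on_disk=True,
                         building_full_snapshot=True)

print(before_fix.name, with_fix.name)  # prints: CONSISTENT BUILDING_SNAPSHOT
```

In this model, forcing the full-snapshot path simply removes the "reuse another slot's snapshot" shortcut during slot creation, which is why approach a) needs no change to the on-disk snapshot format.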

[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207717063D701F597EF98A0CF5272%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment
