Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Fix slot synchronization with two_phase decoding enabled |
Date | |
Msg-id | CAD21AoA99fyUcbgGpxzhqWh+MxC5iTVdm0_GDKCV5SHxaEEfng@mail.gmail.com Whole thread Raw |
In response to | RE: Fix slot synchronization with two_phase decoding enabled ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
Re: Fix slot synchronization with two_phase decoding enabled
|
List | pgsql-hackers |
On Mon, Apr 28, 2025 at 4:33 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Mon, Apr 28, 2025 at 5:33 PM shveta malik wrote: > > > > On Mon, Apr 28, 2025 at 2:27 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Fri, Apr 18, 2025 at 12:29 PM Amit Kapila wrote: > > > > > > > > On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> > > > > > > > > > > ----- > > > > > Fix > > > > > ----- > > > > > > > > > > I think we should keep the confirmed_flush even if the previous > > > > > synced restart_lsn/catalog_xmin is newer. Attachments include a > > > > > patch for the > > > > same. > > > > > > > > > > > > > This will fix the case we are facing but adds a new rule for slot > > synchronization. > > > > Can we think of a simpler way to fix this by avoiding updating other > > > > slot fields (like two_phase, two_phase_at) if restart_lsn or > > > > catalog_xmin of the local slot is ahead of the remote slot? > > > > > > Since the fix for xmin advancement during fast-forward decoding has > > > been pushed (commit aaf9e95), I'm attaching the V2 patch to address > > > the assert failure by avoiding updating other slot fields (like > > > two_phase, two_phase_at) if restart_lsn or catalog_xmin of the local slot is > > ahead. > > > > > > > The patch looks good to me. We can have minor comment tweak: > > > > + * Syncing only two_phase_at, without also syncing the latest > > + * confirmed_lsn, might lead to transactions between the old > > + * confirmed_lsn and two_phase_at being unexpectedly decoded and sent > > + * to the subscriber. > > > > We can append "following a promotion". > > Thanks for reviewing. Here is V3 patch that addressed it. > > BTW, I also did some tests to confirm the catalog_xmin could > still be ahead in some case, and here is an example: > > 1. Create a failover replication slot named 'logicalslot' on primary > and acquire it in the walsender. > > 2. Log two standby snapshots on primary. Before logging, call txid_current() > To assign a xid, so that each standby snapshot will hold a new xid in its oldestrunningXid field: > - txid_current(); > - `0/3000420` - `running_xacts` (no running transactions, oldestrunningXid = 755) > - txid_current(); > - `0/3000488` - `running_xacts` (no running transactions, oldestrunningXid = 756) > > 3. The walsender sets `0/3000420` as the `candidate_restart_lsn`, 755 as > `candidate_catalog_xmin`, skipping the second `running_xacts` because > `candidate_restart_lsn`/`candidate_catalog_xmin` is already valid. > > 4. After receiving a reply from the apply worker, the walsender assigns > `0/3000420` to `restart_lsn`, `755` to `catalog_xmin`. At this point, the > replication slot 'logicalslot' has `restart_lsn` set to `0/3000420`, > `catalog_xmin` set to `755`. > > 5. On the standby, execute `pg_sync_replication_slots()` to synchronize > 'logicalslot'. > > 6. During synchronization, with the initial `restart_lsn` at `0/3000420`, the > sync slot reaches a consistent point at this position. As a result, it does > not update `candidate_restart_lsn` and `candidate_catalog_xmin` at > consistent point (refer to `SnapBuildProcessRunningXacts()`). > > 7. The sync process identifies the second standby snapshot at `0/3000488` and > uses its LSN as `candidate_restart_lsn`, and use the oldestrunningXid `756` > as `candidate_catalog_xmin`, eventually updating it to `restart_lsn` and > `catalog_xmin`. > > 8. Now, the synced slot holds `restart_lsn` at `0/3000488`, `catalog_xmin` at > `756`, which are all ahead of the remote slot on the primary server. > > Attaching a script to reproduce the same. > > Note that, to reproduce this stably, we'd better modify the value of > LOG_SNAPSHOT_INTERVAL_MS in bgwriter.c to a bigger number to avoid > unexpected xl_running_xacts logging. Thank you for sharing the patch and reproducible steps. I agree that this issue still happens on HEAD and should be fixed. If I understand the problem correctly, we need to avoid setting the values such that two_phase_at > confirmed_lsn. Because otherwise if a transaction is prepared between them, we end up decoding the prepared transaction twice or with an assertion failure if it's enabled. It never happens on the primary since we always set the same value to two_phase_at as confirmed_lsn. Your fix looks good to me. Is it worth considering putting an assertion to verify if new two_phase_at is equal to or greater than confirmed_lsn (or at least it doesn't go backward), when syncing two_phase_at? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: