Re: Replication slot is not able to sync up - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Replication slot is not able to sync up
Date
Msg-id CAA4eK1K=uB=4i8f+6QdtjmRC3KY7Rv9O4fh5OvgaSmbHL-tkrA@mail.gmail.com
In response to Re: Replication slot is not able to sync up  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, May 29, 2025 at 6:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, May 28, 2025 at 12:15 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> > I think the SQL API was mainly intended for testing and debugging purposes
> > where controlled sync operations are useful. For production use, the slotsync
> > worker (with sync_replication_slots=on) is recommended because it automatically
> > handles this problem and requires minimal manual intervention. But to avoid
> > confusion, I think we should clearly document this distinction.
>
> If this analysis is correct, this should never have been committed, at
> least not in this form. When we ship something, it needs to work.
> Testing and debugging facilities are best placed in src/test/modules
> or in contrib; if for some reason they really need to be in
> src/backend, then they had better be clearly documented as such.
>
> What really annoys me about this is that the function gives every
> superficial impression of being something you could actually use. Why
> wouldn't a user believe that if they periodically connect and run
> pg_sync_replication_slots(), things will be OK? I can certainly
> imagine a user *wanting* that to work. I'd like that to work. But it
> seems like either it's impossible for some reason that isn't clear to
> me, and we just went ahead and shipped it in a non-working state
> anyway, or it is possible to make it work and we didn't do the
> necessary engineering before something got committed. Either way,
> that's really disappointing.
>
> > I think the issue occurs because unlike the slotsync worker, the SQL API
> > removes temporary slots when the function ends, so it cannot hold back the
> > standby's catalog_xmin. If transactions on the primary keep advancing xids, the
> > source slot's catalog_xmin on the primary fails to catch up with the standby's
> > nextXid, causing sync failure.
>
> I still don't understand how this problem arises in the first place.
> It seems like you're describing a situation where we need to prevent
> the standby from getting ahead of the primary, but that should be
> impossible by definition.
>

The reason is that we do not allow a synced slot to be created if the
WAL or catalog rows it requires have been removed or are at risk of
removal. To achieve this, the first sync_slot call, whether made by the
slotsync worker or by the API, creates a temporary slot on the standby.
Its xmin is set to the safest possible xmin (catalog_xmin) on the
standby, as computed by GetOldestSafeDecodingTransactionId(), and its
restart_lsn is set to the oldest WAL present on the standby. Now, if
the source slot's (the slot on the primary) location/xmin is prior to
the corresponding location/xmin on the standby, we can't sync the slot
immediately, because there is no guarantee that the required resources
(WAL/catalog rows) will be available when we try to use the synced
slot after promotion. The slotsync worker keeps retrying and
eventually succeeds once the source slot's values are safe to sync to
the standby. With the API, we didn't implement this retry logic, which
is why we see the behaviour currently reported. Note that once the
first sync succeeds, subsequent syncs, even via the API, should work
the same way as the worker.
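
As a rough illustration only, the check and the retry difference
described above can be modelled in a few lines of Python. This is a
simplified sketch, not the code in the backend: every name in it
(SlotPosition, can_persist, sync_once, sync_with_retry) is
hypothetical, and LSNs and xids are treated as plain integers, so xid
wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class SlotPosition:
    """Hypothetical stand-in for the two values compared during slot sync."""
    restart_lsn: int    # WAL position, modelled as a plain integer
    catalog_xmin: int   # oldest catalog xid needed; wraparound is ignored


def can_persist(remote: SlotPosition, local: SlotPosition) -> bool:
    """True once the primary's slot needs nothing older than what the
    standby's temporary slot is holding back."""
    return (remote.restart_lsn >= local.restart_lsn
            and remote.catalog_xmin >= local.catalog_xmin)


def sync_once(remote: SlotPosition, local: SlotPosition) -> bool:
    """API-like behaviour as described above: one attempt, no retry."""
    return can_persist(remote, local)


def sync_with_retry(remote_positions, local: SlotPosition):
    """Worker-like behaviour: keep the temporary slot and retry with each
    newly fetched remote position. Returns the 1-based attempt on which
    the slot could be persisted, or None if it never became safe."""
    for attempt, remote in enumerate(remote_positions, start=1):
        if can_persist(remote, local):
            return attempt
    return None


if __name__ == "__main__":
    standby = SlotPosition(restart_lsn=1000, catalog_xmin=500)
    primary_over_time = [
        SlotPosition(900, 480),    # behind on both values: not safe
        SlotPosition(1000, 490),   # catalog_xmin still behind: not safe
        SlotPosition(1100, 510),   # caught up: safe to persist
    ]
    print(sync_once(primary_over_time[0], standby))
    print(sync_with_retry(primary_over_time, standby))
```

In this model the standby's values stay fixed across retries, which
corresponds to the worker keeping its temporary slot while it waits
for the source slot to advance.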

I agree that the API's current usefulness is limited: one can use it
in a controlled environment (e.g., where the first sync happens before
other operations on the primary), to debug this functionality, or to
write tests. It is not clear to me why someone would prefer this API
over the built-in functionality to sync slots. But going forward (as
we see that people would like to use this API to sync slots), it would
not be that difficult to improve the API so that its behaviour matches
the built-in worker's for the initial sync.

I see that we separately document functions [1] used for
development/debug, and this API could be documented in that way.

[1]: https://www.postgresql.org/docs/current/functions-textsearch.html#TEXTSEARCH-FUNCTIONS-DEBUG-TABLE

--
With Regards,
Amit Kapila.


