Hi,
On Fri, Jan 24, 2025 at 02:44:21PM -0500, Andres Freund wrote:
> Hm, maybe I'm missing something, but isn't it possible for the active slot to
> actually progress decoding past the conflict point? It's an active slot, with
> the consumer running in the background, so all that needs to happen for that
> is that logical decoding progresses past the conflict point. That requires
> there be some reference to a newer xid to be in the WAL, but there's nothing
> preventing that afaict?
>
>
> In fact, I now saw this comment:
>
> # Note that pg_current_snapshot() is used to get the horizon. It does
> # not generate a Transaction/COMMIT WAL record, decreasing the risk of
> # seeing a xl_running_xacts that would advance an active replication slot's
> # catalog_xmin. Advancing the active replication slot's catalog_xmin
> # would break some tests that expect the active slot to conflict with
> # the catalog xmin horizon.
Yeah, that comes from 46d8587b504 (where we tried to reduce, as much as
possible, the risk of an unwanted xl_running_xacts record being generated).
> Which seems precisely what's happening here?
Most probably, yes.
> If that's the issue, I think we need to find a way to block logical decoding
> from making forward progress during the test.
>
> The easiest way would be to stop pg_recvlogical and emit a bunch of changes,
> so that the backend is stalled sending out data. But that'd require a hard to
> predict amount of data to be emitted, which isn't great.
What about using an injection point instead to block pg_recvlogical until
we want it to resume?
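To make the idea concrete, here is a rough sketch of what the test could do,
assuming a build with --enable-injection-points and the injection_points test
module installed. The point name 'logical-decoding-stall' is purely
hypothetical: a matching INJECTION_POINT() call would first have to be added
somewhere in the decoding code path, and where exactly is an open question.

```sql
-- Sketch only, not a tested patch.
-- 'logical-decoding-stall' is a hypothetical point name; no such
-- injection point exists in the tree today.
CREATE EXTENSION injection_points;

-- Make any backend that reaches the point wait there.
SELECT injection_points_attach('logical-decoding-stall', 'wait');

-- ... trigger the conflict and check that the active slot is invalidated ...

-- Let the blocked backend continue, then remove the point.
SELECT injection_points_wakeup('logical-decoding-stall');
SELECT injection_points_detach('logical-decoding-stall');
```

That way the stall would not depend on emitting a hard to predict amount of
data.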
> But perhaps we could do something smarter, by starting a session on the
> primary that acquires an access exclusive lock on a relation that logical
> decoding will need to access? The tricky bit likely would be that it'd
> somehow need to *not* prevent VACUUM on the primary.
Hm, I'm not sure how we could do that.
> If we could trigger VACUUM in a transaction on the primary this would be
> easy, but we can't.
Another idea that I had ([1]) was to make use of injection points
around places where RUNNING_XACTS is emitted. IIRC I tried to work on this, but
it was not as simple as it sounds, as we need the startup process not to be
blocked.
[1]: https://www.postgresql.org/message-id/ZmadPZlEecJNbhvI%40ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com