Re: BF member drongo doesn't like 035_standby_logical_decoding.pl - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: BF member drongo doesn't like 035_standby_logical_decoding.pl
Date
Msg-id Z5cx/aExSSutUK8E@ip-10-97-1-34.eu-west-3.compute.internal
List pgsql-hackers
Hi,

On Fri, Jan 24, 2025 at 02:44:21PM -0500, Andres Freund wrote:
> Hm, maybe I'm missing something, but isn't it possible for the active slot to
> actually progress decoding past the conflict point? It's an active slot, with
> the consumer running in the background, so all that needs to happen for that
> is that logical decoding progresses past the conflict point. That requires
> there be some reference to a newer xid to be in the WAL, but there's nothing
> preventing that afaict?
> 
> 
> In fact, I now saw this comment:
> 
> # Note that pg_current_snapshot() is used to get the horizon.  It does
> # not generate a Transaction/COMMIT WAL record, decreasing the risk of
> # seeing a xl_running_xacts that would advance an active replication slot's
> # catalog_xmin.  Advancing the active replication slot's catalog_xmin
> # would break some tests that expect the active slot to conflict with
> # the catalog xmin horizon.

Yeah, that comes from 46d8587b504, where we tried to reduce, as much as possible,
the risk of an unwanted xl_running_xacts record being generated.

> Which seems precisely what's happening here?

Most probably, yes.
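A toy Python model of the race may help. It rests on simplified, assumed rules: xids are plain integers (no wraparound), the values are hypothetical, an active slot that decodes an xl_running_xacts record can move its catalog_xmin forward, and the standby invalidates the slot only when its catalog_xmin is at or before the conflict horizon of the replayed record. It is a sketch of the reasoning, not PostgreSQL's actual code.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    catalog_xmin: int
    active: bool  # True while a consumer (e.g. pg_recvlogical) is decoding


def decode_running_xacts(slot: Slot, oldest_running_xid: int) -> None:
    # Simplified: an active slot decoding a running-xacts record may advance
    # its catalog_xmin to the oldest xid still running.
    if slot.active and oldest_running_xid > slot.catalog_xmin:
        slot.catalog_xmin = oldest_running_xid


def conflicts(slot: Slot, conflict_horizon: int) -> bool:
    # Simplified horizon check (no wraparound): the slot is invalidated only
    # if its catalog_xmin is at or before the horizon of the replayed record.
    return slot.catalog_xmin <= conflict_horizon


def run_scenario(unwanted_running_xacts: bool) -> bool:
    slot = Slot(catalog_xmin=740, active=True)  # hypothetical xids
    horizon = 745  # hypothetical conflict horizon from the replayed VACUUM
    if unwanted_running_xacts:
        # A newer xl_running_xacts is decoded before the conflict is replayed.
        decode_running_xacts(slot, oldest_running_xid=750)
    return conflicts(slot, horizon)


print(run_scenario(False))  # True: the test sees the expected conflict
print(run_scenario(True))   # False: the slot moved past the horizon
```

Under these assumptions, a single decoded record with a newer xid is enough to make the expected conflict disappear, so lowering the odds of such a record reduces the flakiness but cannot remove it.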

> If that's the issue, I think we need to find a way to block logical decoding
> from making forward progress during the test.
> 
> The easiest way would be to stop pg_recvlogical and emit a bunch of changes,
> so that the backend is stalled sending out data. But that'd require a hard to
> predict amount of data to be emitted, which isn't great.

What about using an injection point instead to block pg_recvlogical until
we want it to resume?
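The injection point idea can be sketched as a toy Python model. It assumes the point sits in the server-side decoding path just before catalog_xmin would be advanced, and it is loosely modelled on the "wait" action of PostgreSQL's injection_points test module. The WaitPoint and Decoder classes, the placement of the point, and the xid values are all hypothetical. While the point is attached, the decoder parks there, so the test can check for the conflict before letting decoding resume.

```python
import threading
from queue import Queue


class WaitPoint:
    """Toy stand-in for a 'wait' injection point (hypothetical)."""

    def __init__(self) -> None:
        self.attached = False
        self.reached = threading.Event()  # set once the decoder is parked
        self._resume = threading.Event()

    def attach(self) -> None:
        self.attached = True
        self._resume.clear()

    def run(self) -> None:
        # Called by the decoder; blocks here while the point is attached.
        if self.attached:
            self.reached.set()
            self._resume.wait()

    def wakeup(self) -> None:
        self.attached = False
        self._resume.set()


class Decoder(threading.Thread):
    """Toy decoder: consumes running-xacts records (their oldest running xid)."""

    def __init__(self, catalog_xmin: int, point: WaitPoint) -> None:
        super().__init__(daemon=True)
        self.catalog_xmin = catalog_xmin
        self.point = point
        self.records: Queue = Queue()

    def run(self) -> None:
        while True:
            oldest_running_xid = self.records.get()
            if oldest_running_xid is None:  # shutdown sentinel
                return
            self.point.run()  # hypothetical point before advancing catalog_xmin
            self.catalog_xmin = max(self.catalog_xmin, oldest_running_xid)


point = WaitPoint()
point.attach()
decoder = Decoder(catalog_xmin=740, point=point)
decoder.start()
decoder.records.put(750)       # an unwanted running-xacts record arrives
point.reached.wait(timeout=5)  # the decoder is now parked at the point
horizon = 745                  # hypothetical conflict horizon
conflict_seen = decoder.catalog_xmin <= horizon  # catalog_xmin is still 740
point.wakeup()                 # let decoding resume
decoder.records.put(None)
decoder.join(timeout=5)
print(conflict_seen, decoder.catalog_xmin)  # True 750
```

The appeal over emitting a large volume of changes is that the blocking is deterministic: the test decides when decoding resumes instead of depending on how much data fills the send buffers.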

> But perhaps we could do something smarter, by starting a session on the
> primary that acquires an access exclusive lock on a relation that logical
> decoding will need to access?  The tricky bit likely would be that it'd
> somehow need to *not* prevent VACUUM on the primary.

Hm, I'm not sure how we could do that.

> If we could trigger VACUUM in a transaction on the primary this would be
> easy, but we can't.

Another idea that I had ([1]) was to make use of injection points
around the places where RUNNING_XACTS is emitted. IIRC, I tried to work on this, but
it was not as simple as it sounds, because we need the startup process not to be
blocked.

[1]: https://www.postgresql.org/message-id/ZmadPZlEecJNbhvI%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
