Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALj2ACXKKohg7+35Tv2wrynat+TdzVU-dqp4qQAMjgTh03yRGA@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
Hi, Thanks for reviewing. On Mon, Sep 9, 2024 at 10:54 AM shveta malik <shveta.malik@gmail.com> wrote: > > 2) > src/sgml/config.sgml: > > + disables the inactive timeout invalidation mechanism > > + Slot invalidation due to inactivity timeout occurs during checkpoint. > > Either have 'inactive' at both the places or 'inactivity'. Used "inactive timeout". > 3) > slot.c: > +static bool InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause > cause, > + ReplicationSlot *s, > + XLogRecPtr oldestLSN, > + Oid dboid, > + TransactionId snapshotConflictHorizon, > + bool *invalidated); > +static inline bool SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s); > > I think, we do not need above 2 declarations. The code compile fine > without these as the usage is later than the definition. Hm, it's a usual practice that I follow irrespective of the placement of function declarations. Since it was brought up, I removed the declarations. > 4) > + /* > + * An error is raised if error_if_invalid is true and the slot has been > + * invalidated previously. > + */ > + if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) > > The comment is generic while the 'if condition' is specific to one > invalidation cause. Even though I feel it can be made generic test for > all invalidation causes but that is not under scope of this thread and > needs more testing/analysis. Right. > For the time being, we can make comment > specific to the concerned invalidation cause. The header of function > will also need the same change. Adjusted the comment, but left the variable name error_if_invalid as is. Didn't want to make it long, one can look at the code to understand what it is used for. > 5) > SlotInactiveTimeoutCheckAllowed(): > > + * Check if inactive timeout invalidation mechanism is disabled or slot is > + * currently being used or server is in recovery mode or slot on standby is > + * currently being synced from the primary. > + * > > These comments say exact opposite of what we are checking in code. > Since the function name has 'Allowed' in it, we should be putting > comments which say what allows it instead of what disallows it. Modified. > 1) > src/sgml/config.sgml: > > + Synced slots are always considered to be inactive because they > don't perform logical decoding to produce changes. > > It is better we avoid such a statement, as internally we use logical > decoding to advance restart-lsn, see > 'LogicalSlotAdvanceAndCheckSnapState' called form slotsync.c. > <Also see related comment 6 below> > > 6) > > + * Synced slots are always considered to be inactive because they don't > + * perform logical decoding to produce changes. > + */ > +static inline bool > +SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s) > > Perhaps we should avoid mentioning logical decoding here. When slots > are synced, they are performing decoding and their inactive_since is > changing continuously. A better way to make this statement will be: > > We want to ensure that the slots being synchronized are not > invalidated, as they need to be preserved for future use when the > standby server is promoted to the primary. This is necessary for > resuming logical replication from the new primary server. > <Rephrase if needed> They are performing logical decoding, but not producing the changes for the clients to consume. So, IMO, the accompanying "to produce changes" next to the "logical decoding" is good here. > 7) > > InvalidatePossiblyObsoleteSlot() > > we are calling SlotInactiveTimeoutCheckAllowed() twice in this > function. We shall optimize. > > At the first usage place, shall we simply get timestamp when cause is > RS_INVAL_INACTIVE_TIMEOUT without checking > SlotInactiveTimeoutCheckAllowed() as IMO it does not seem a > performance critical section. Or if we retain check at first place, > then at the second place we can avoid calling it again based on > whether 'now' is NULL or not. Getting a current timestamp can get costlier on platforms that use various clock sources, so assigning 'now' unconditionally isn't the way IMO. Using the inline function in two places improves the readability. Can optimize it if there's any performance impact of calling the inline function in two places. Will post the new patch version soon. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: