Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | Zg2DgfXorAXrC8lw@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
Hi, On Wed, Apr 03, 2024 at 08:28:04PM +0530, Bharath Rupireddy wrote: > On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Just one comment on v32-0001: > > > > +# Synced slot on the standby must get its own inactive_since. > > +is( $standby1->safe_psql( > > + 'postgres', > > + "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND > > + '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;" > > + ), > > + "t", > > + 'synchronized slot has got its own inactive_since'); > > + > > > > By using <= we are not testing that it must get its own inactive_since (as we > > allow them to be equal in the test). I think we should just add some usleep() > > where appropriate and deny equality during the tests on inactive_since. > > > Except for the above, v32-0001 LGTM. > > Thanks. Please see the attached v33-0001 patch after removing equality > on inactive_since TAP tests. Thanks! v33-0001 LGTM. > On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Some comments regarding v31-0002: > > > > T2 === > > > > In case the slot is invalidated on the primary, > > > > primary: > > > > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1'; > > slot_name | inactive_since | invalidation_reason > > -----------+-------------------------------+--------------------- > > s1 | 2024-04-03 06:56:28.075637+00 | inactive_timeout > > > > then on the standby we get: > > > > standby: > > > > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1'; > > slot_name | inactive_since | invalidation_reason > > -----------+------------------------------+--------------------- > > s1 | 2024-04-03 07:06:43.37486+00 | inactive_timeout > > > > shouldn't the slot be dropped/recreated instead of updating inactive_since? > > The sync slots that are invalidated on the primary aren't dropped and > recreated on the standby. Yeah, right (I was confused with synced slots that are invalidated locally). > However, I > found that the synced slot is acquired and released unnecessarily > after the invalidation_reason is synced from the primary. I added a > skip check in synchronize_one_slot to skip acquiring and releasing the > slot if it's locally found inactive. With this, inactive_since won't > get updated for invalidated sync slots on the standby as we don't > acquire and release the slot. CR1 === Yeah, I can see: @@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) " name slot \"%s\" already exists on the standby", remote_slot->name)); + /* + * Skip the sync if the local slot is already invalidated. We do this + * beforehand to save on slot acquire and release. + */ + if (slot->data.invalidated != RS_INVAL_NONE) + return false; Thanks to the drop_local_obsolete_slots() call I think we are not missing the case where the slot has been invalidated on the primary, invalidation reason has been synced on the standby and later the slot is dropped/ recreated manually on the primary (then it should be dropped/recreated on the standby too). Also it seems we are not missing the case where a sync slot is invalidated locally due to wal removal (it should be dropped/recreated). > > > CR5 === > > > > + /* > > + * This function isn't expected to be called for inactive timeout based > > + * invalidation. A separate function InvalidateInactiveReplicationSlot is > > + * to be used for that. > > > > Do you think it's worth to explain why? > > Hm, I just wanted to point out the actual function here. I modified it > to something like the following, if others feel we don't need that, I > can remove it. Sorry If I was not clear but I meant to say "Do you think it's worth to explain why we decided to create a dedicated function"? (currently we "just" explain that we created one). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: