Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CAA4eK1+Oefb-dxBfi178YrW3wvmBZA2ymz5ctAGo=82pxG74Wg@mail.gmail.com |
In response to | RE: Introduce XID age and inactive timeout based replication slot invalidation ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
On Wed, Feb 12, 2025 at 1:16 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, February 12, 2025 11:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Feb 11, 2025 at 9:39 PM Nathan Bossart
> > <nathandbossart@gmail.com> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 03:22:49PM +0100, Álvaro Herrera wrote:
> > > > I find this proposed patch a bit strange and I feel it needs more
> > > > explanation.
> > > >
> > > > When this thread started, Bharath justified his patches saying that
> > > > a slot that's inactive for a very long time could be problematic
> > > > because of XID wraparound. Fine, that sounds a reasonable feature.
> > > > If you wanted to invalidate slots whose xmins were too old, I would
> > > > support that. He submitted that as his 0004 patch then.
> > > >
> > > > However, he also chose to submit 0003 with invalidation based on a
> > > > timeout. This is far less convincing a feature to me. The
> > > > justification for the time out seems to be that ... it's difficult
> > > > to have a one-size-fits-all value because size of disks vary. (???)
> > > > Or something like that. Really? I mean -- yes, this will prevent
> > > > problems in toy databases when run in developer's laptops. It will
> > > > not prevent any problems in production databases. Do we really want
> > > > a setting that is only useful for toy situations rather than production?
> > > >
> > >
> > ...
> > > >
> > > > I'm baffled.
> > >
> > > I agree, and I am also baffled because I think this discussion has
> > > happened at least once already on this thread.
> > >
> >
> > Yes, we previously discussed this topic and Robert seems to prefer a
> > time-based parameter for invalidating the slot (1)(2) as it is easier to reason in
> > terms of time. The other points discussed previously were that there are tools
> > that create a lot of slots and sometimes forget to clean up slots. Bharath has
> > seen this in production and we now have the tool pg_createsubscriber that
> > creates a slot-per-database, so if for some reason, such slots are not cleaned
> > on the tool's exit, such a parameter could save the cluster. See (3)(4).
> >
> > Also, we previously didn't have a good experience with XID-based threshold
> > parameters like vacuum_defer_cleanup_age as mentioned by Robert (1).
> > AFAICU from the previous discussion we need a time-based parameter and we
> > didn't rule out xid_age based parameter as another parameter.
>
> Yeah, I think the primary purpose of this time-based option is to invalidate dormant
> replication slots that have been inactive for a long period, in which case the
> slots are no longer useful.
>
> Such slots can remain if a subscriber is down due to a system error or
> inaccessible because of network issues. If this situation persists, it might be
> more practical to recreate the subscriber rather than attempt to recover the
> node and wait for it to catch up, which could be time-consuming.
>
> Parameters like max_slot_wal_keep_size and max_slot_xid_id_age do not
> differentiate between active and inactive replication slots. Some customers I
> met are hesitant about using these settings, as they can sometimes invalidate
> a slot unnecessarily and break the replication.
>

Alvaro, Nathan, do let us know if you would like to discuss more on
the use case for this new GUC idle_replication_slot_timeout?
Otherwise, we can proceed with this patch.

--
With Regards,
Amit Kapila.