Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | ZcCuTqik6hT+HQKS@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses | Re: Introduce XID age and inactive timeout based replication slot invalidation |
List | pgsql-hackers |
Hi,

On Thu, Jan 11, 2024 at 10:48:13AM +0530, Bharath Rupireddy wrote:
> Hi,
>
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
>
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
>
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
>
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using inactive_at
> metric.

Thanks for the patch, and +1 for the idea; I think adding those new
"invalidation reasons" makes sense.

> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.

I think it's better to have the XID one discussed/implemented before the
inactive_timeout one: what about changing the 0002, 0003 and 0004
ordering?

0004 -> 0002
0002 -> 0003
0003 -> 0004

As far as 0001 is concerned:

"
This commit renames conflict_reason to invalidation_reason, and adds the
support to show invalidation reasons for both physical and logical slots.
"

I'm not sure I like the fact that "invalidations" and "conflicts" are
merged into a single field. I'd vote to keep conflict_reason as it is and
add a new invalidation_reason (and report "conflict" as its value when
that is the case). The reason is that I think they are two different
concepts (though they could be linked) and that it would make it easier
to check for conflicts (i.e., conflict_reason is not NULL).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
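For illustration, here is a minimal monitoring sketch along the lines of the quoted proposal. The first query only uses what already exists in pg_replication_slots (xmin, catalog_xmin, active) together with age(); the second one assumes the inactive_at and inactive_count columns proposed by the patch set, which are not in core.

```sql
-- Slots whose xmin / catalog_xmin age is approaching a 1.5 billion XID budget
-- (xmin, catalog_xmin and age() already exist today).
SELECT slot_name,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
WHERE age(xmin) > 1500000000
   OR age(catalog_xmin) > 1500000000;

-- Slots lying inactive for more than a day; inactive_at and inactive_count are
-- the columns proposed in this thread, so this only works with the patch applied.
SELECT slot_name, inactive_at, inactive_count
FROM pg_replication_slots
WHERE NOT active
  AND inactive_at < now() - interval '1 day';
```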
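And to illustrate the argument for keeping the two concepts separate, here is what the two checks could look like, assuming conflict_reason is kept as it is and a separate invalidation_reason column is added (the latter is hypothetical at this point).

```sql
-- Checking for recovery conflicts on logical slots: conflict_reason as it
-- exists on master today (the column 0001 proposes to rename).
SELECT slot_name, conflict_reason
FROM pg_replication_slots
WHERE conflict_reason IS NOT NULL;

-- Checking for any invalidated slot, assuming the separate invalidation_reason
-- column suggested above (hypothetical, not in core).
SELECT slot_name, invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;
```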