Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | ZdXrtXLkjvIJMYvB@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses | Re: Introduce XID age and inactive timeout based replication slot invalidation |
List | pgsql-hackers |
Hi,

On Wed, Feb 21, 2024 at 10:55:00AM +0530, Bharath Rupireddy wrote:
> On Tue, Feb 20, 2024 at 12:05 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> >> [...] and was able to produce something like:
> > >
> > > postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
> > >   slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
> > > -------------+-----------+--------+------------+------------+---------------------
> > >  rep1        | physical  | f      |            | reserved   |
> > >  master_slot | physical  | t      |    1482441 | unreserved | wal_removed
> > > (2 rows)
> > >
> > > does that make sense to have an "active/working" slot "invalidated"?
> >
> > Thanks. Can you please provide the steps to generate this error? Are
> > you setting max_slot_wal_keep_size on primary to generate
> > "wal_removed"?
>
> I'm able to reproduce [1] the state [2] where the slot got invalidated
> first, then its wal_status became unreserved, but still the slot is
> serving after the standby comes up online after it catches up with the
> primary getting the WAL files from the archive. There's a good reason
> for this state -
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/slotfuncs.c;h=d2fa5e669a32f19989b0d987d3c7329851a1272e;hb=ff9e1e764fcce9a34467d614611a34d4d2a91b50#l351.
> This intermittent state can only happen for physical slots, not for
> logical slots because logical subscribers can't get the missing
> changes from the WAL stored in the archive.
>
> And, the fact looks to be that an invalidated slot can never become
> normal but still can serve a standby if the standby is able to catch
> up by fetching required WAL (this is the WAL the slot couldn't keep
> for the standby) from elsewhere (archive via restore_command).
>
> As far as the 0001 patch is concerned, it reports the
> invalidation_reason as long as slot_contents.data.invalidated !=
> RS_INVAL_NONE. I think this is okay.
>
> Thoughts?

Yeah, looking at the code, I agree that it looks OK. OTOH, it is confusing, so maybe we should add a few words about it to the doc?

Looking at v5-0001:

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>invalidation_reason</structfield> <type>text</type>
+      </para>
+      <para>

My initial thought was to put a "conflict" value in this new field in case of conflict, without repeating the conflict reason in it. With the current proposal, invalidation_reason could report the same value as conflict_reason, which sounds weird to me.

Does it make sense to you to use "conflict" as the value of "invalidation_reason" when the slot's "conflict_reason" is not NULL?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
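To make the two reporting options concrete, here is a minimal C sketch. It is not code from the v5 patch or from PostgreSQL.

- The enum is a hypothetical stand-in mirroring the ReplicationSlotInvalidationCause values in slot.h.
- The strings are the ones conflict_reason reports.
- All function names are made up for illustration.
- It assumes, as in the development tree at the time, that conflict_reason is only set for invalidated logical slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in mirroring ReplicationSlotInvalidationCause (slot.h). */
typedef enum
{
    RS_INVAL_NONE,          /* slot is valid */
    RS_INVAL_WAL_REMOVED,   /* required WAL was removed */
    RS_INVAL_HORIZON,       /* required rows were removed */
    RS_INVAL_WAL_LEVEL      /* wal_level too low on the primary */
} InvalCause;

/* Text reported for each cause; the index matches the enum above. */
static const char *const cause_names[] = {
    NULL, "wal_removed", "rows_removed", "wal_level_insufficient"
};

/* Map a cause to its text; returns NULL for a valid slot. */
static const char *
cause_name(InvalCause cause)
{
    assert((size_t) cause < sizeof(cause_names) / sizeof(cause_names[0]));
    return cause_names[cause];
}

/* Assumption: conflict_reason is only set for invalidated logical slots. */
static const char *
conflict_reason(bool is_logical, InvalCause cause)
{
    return is_logical ? cause_name(cause) : NULL;
}

/* Option A, as v5-0001 is described: report the cause whenever invalidated. */
static const char *
invalidation_reason_v5(bool is_logical, InvalCause cause)
{
    (void) is_logical;          /* slot type does not matter here */
    return cause_name(cause);
}

/* Option B, as suggested in this reply: say "conflict" when there is one. */
static const char *
invalidation_reason_proposed(bool is_logical, InvalCause cause)
{
    if (conflict_reason(is_logical, cause) != NULL)
        return "conflict";
    return cause_name(cause);
}

/* True when both columns would carry the same non-NULL text. */
static bool
reports_duplicate(const char *conflict, const char *invalidation)
{
    return conflict != NULL && invalidation != NULL &&
        strcmp(conflict, invalidation) == 0;
}
```

Under option A, a logical slot invalidated for rows_removed shows the same text in both columns. That is the duplication questioned above. Under option B, invalidation_reason reads "conflict" and the detail stays in conflict_reason. Physical slots never have a conflict_reason, so they keep reporting the cause itself (for example wal_removed).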