Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From: Bharath Rupireddy
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Msg-id: CALj2ACWE9asmvN1B18LqfHE8uBuWGsCEP7OO5trRCxPtTPeHVA@mail.gmail.com
In response to: Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses: Re: Introduce XID age and inactive timeout based replication slot invalidation
List: pgsql-hackers
On Tue, Feb 20, 2024 at 12:05 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > [...] and was able to produce something like:
> >
> > postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
> >   slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
> > -------------+-----------+--------+------------+------------+---------------------
> >  rep1        | physical  | f      |            | reserved   |
> >  master_slot | physical  | t      |    1482441 | unreserved | wal_removed
> > (2 rows)
> >
> > does that make sense to have an "active/working" slot "invalidated"?
>
> Thanks. Can you please provide the steps to generate this error? Are
> you setting max_slot_wal_keep_size on primary to generate
> "wal_removed"?

I'm able to reproduce [1] the state [2] where the slot is invalidated
first and its wal_status becomes "unreserved", yet the slot still serves
the standby after the standby comes back online and catches up with the
primary by fetching the missing WAL files from the archive. There's a
good reason for this state -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/slotfuncs.c;h=d2fa5e669a32f19989b0d987d3c7329851a1272e;hb=ff9e1e764fcce9a34467d614611a34d4d2a91b50#l351.

This intermittent state can only happen for physical slots, not for
logical slots, because logical subscribers can't get the missing changes
from the WAL stored in the archive. In effect, an invalidated slot can
never become valid again, but it can still serve a standby if the
standby manages to catch up by fetching the required WAL (the WAL the
slot couldn't retain for it) from elsewhere (the archive, via
restore_command).

As far as the 0001 patch is concerned, it reports the
invalidation_reason as long as slot_contents.data.invalidated !=
RS_INVAL_NONE. I think this is okay. Thoughts? (A monitoring query along
these lines is sketched at [3] below.)

[1]
./initdb -D db17
echo "max_wal_size = 128MB
max_slot_wal_keep_size = 64MB
archive_mode = on
archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'" | tee -a db17/postgresql.conf

./pg_ctl -D db17 -l logfile17 start
./psql -d postgres -p 5432 -c "SELECT pg_create_physical_replication_slot('sb_repl_slot', true, false);"

rm -rf sbdata logfilesbdata
./pg_basebackup -D sbdata
echo "port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
primary_slot_name='sb_repl_slot'
restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'" | tee -a sbdata/postgresql.conf

touch sbdata/standby.signal

./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

./pg_ctl -D sbdata -l logfilesbdata stop

./psql -d postgres -p 5432 -c "SELECT pg_logical_emit_message(true, 'mymessage', repeat('aaaa', 10000000));"
./psql -d postgres -p 5432 -c "CHECKPOINT;"
./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5432 -xc "SELECT * FROM pg_replication_slots;"

[2]
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+-------------
slot_name           | sb_repl_slot
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 710667
xmin                |
catalog_xmin        |
restart_lsn         | 0/115D21A0
confirmed_flush_lsn |
wal_status          | unreserved
safe_wal_size       | 77782624
two_phase           | f
conflict_reason     |
failover            | f
synced              | f
invalidation_reason | wal_removed
last_inactive_at    |
inactive_count      | 1

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
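[3] A sketch of a monitoring query for the situation described above,
assuming the invalidation_reason column added by the 0001 patch; note
that such a slot can still show active = t even though it has already
been invalidated:

-- list slots that have been invalidated but may still be serving a
-- standby that fetches the missing WAL from the archive
SELECT slot_name, slot_type, active, active_pid, wal_status,
       invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;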