Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers
From: Jehan-Guillaume de Rorthais
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Msg-id: 20190627162256.4f4872b8@firost
In response to: Re: [HACKERS] Restricting maximum keep segments by repslots (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses: Re: [HACKERS] Restricting maximum keep segments by repslots
List: pgsql-hackers
Hi all,

Being interested by this feature, I did a patch review.

This feature adds the GUC "max_slot_wal_keep_size". This is the maximum amount
of WAL that can be kept in "pg_wal" by active slots. If the amount of kept WAL
exceeds this limit, the slot is deactivated and its status (a new field in
pg_replication_slots) is set to "lost".

Patching
========

The patch v13-0003 does not apply on HEAD anymore.
The patch v13-0005 applies using "git am --ignore-space-change".
Other patches apply correctly.

Please find attached the v14 set of patches rebased on master.

Documentation
=============

The documentation explains the GUC and the related columns in
"pg_replication_slots". It correctly reflects the current behavior of the
patch.

Usability
=========

The patch implements what is described. It is easy to enable and disable. The
GUC name describes its purpose correctly. This feature is useful in some HA
scenarios where slots are required (e.g. no archiving possible), but where
primary availability is more important than the standbys.

In the "pg_replication_slots" view, the new "wal_status" field is misleading.
Consider this sentence and the related behavior from the documentation
(catalogs.sgml):

  <literal>keeping</literal> means that some of them are to be removed by the
  next checkpoint.

"keeping" appears when the current checkpoint will delete some WAL beyond
"current_lsn - max_slot_wal_keep_size" that is still required by at least one
slot. As some WAL required by some slots will be deleted quite soon, probably
before anyone can react, the "keeping" status is misleading here. We are
already in the red zone.

I would expect this "wal_status" to be (a rough SQL sketch of these thresholds
follows at the end of this section):

- streaming: slot lag between 0 and "max_wal_size"
- keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". The
  slot actually protects some WALs from being deleted.
- lost: slot lag greater than "max_slot_wal_keep_size". The slot couldn't
  protect some WALs from deletion.

The documentation follows with:

  The last two states are seen only when max_slot_wal_keep_size is
  non-negative

This is true with the current behavior. However, if "keeping" were set as soon
as the slot lag is greater than "max_wal_size", this status would be useful
even with "max_slot_wal_keep_size = -1". As soon as a slot is stacking WALs
that should have been removed by the previous checkpoint, it "keeps" them.
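As an illustration only, here is a rough sketch of the thresholds proposed
above. This is not part of the patch; it assumes the patch is applied (so that
"max_slot_wal_keep_size" exists) and that it is set to a non-negative value,
on the test instance described below (port 5432):

  # Sketch of the proposed classification, computed from the slot lag.
  # Assumes max_slot_wal_keep_size >= 0; otherwise the last comparison is wrong.
  psql -p 5432 -AXc "
    SELECT slot_name,
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes,
           CASE
             WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
                  <= pg_size_bytes(current_setting('max_wal_size'))
               THEN 'streaming'  -- lag still below max_wal_size
             WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
                  <= pg_size_bytes(current_setting('max_slot_wal_keep_size'))
               THEN 'keeping'    -- the slot is what keeps these WALs around
             ELSE 'lost'         -- the slot could not protect some WALs
           END AS proposed_status
      FROM pg_replication_slots"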
Feature tests
=============

I have played with various traffic shaping setups between the nodes, to
observe how the columns "active", "wal_status" and "remain" behave with regard
to each other, using:

  while true; do
    sleep 0.3;
    psql -p 5432 -AXtc "
      select now(), active, restart_lsn, wal_status, pg_size_pretty(remain)
      from pg_replication_slots
      where slot_name='slot_limit_st'"
  done

The primary is created using:

  initdb -U postgres -D slot_limit_pr --wal-segsize=1

  cat <<EOF >>slot_limit_pr/postgresql.conf
  port=5432
  max_wal_size = 3MB
  min_wal_size = 2MB
  max_slot_wal_keep_size = 4MB
  logging_collector = on
  synchronous_commit = off
  EOF

WAL activity is generated using a simple pgbench workload. Then, during this
activity, packets on loopback are delayed using:

  tc qdisc add dev lo root handle 1:0 netem delay 140msec

Here is how wal_status behaves. I removed the timestamps, but the record order
is the original one:

  t|1/7B116898|streaming|1872 kB
  t|1/7B1A0000|lost|0 bytes
  t|1/7B320000|keeping|0 bytes
  t|1/7B780000|lost|0 bytes
  t|1/7BB00000|keeping|0 bytes
  t|1/7BE00000|keeping|0 bytes
  t|1/7C100000|lost|0 bytes
  t|1/7C400000|keeping|0 bytes
  t|1/7C700000|lost|0 bytes
  t|1/7CA40000|keeping|0 bytes
  t|1/7CDE0000|lost|0 bytes
  t|1/7D100000|keeping|0 bytes
  t|1/7D400000|keeping|0 bytes
  t|1/7D7C0000|keeping|0 bytes
  t|1/7DB40000|keeping|0 bytes
  t|1/7DE60000|lost|0 bytes
  t|1/7E180000|keeping|0 bytes
  t|1/7E500000|keeping|0 bytes
  t|1/7E860000|lost|0 bytes
  t|1/7EB80000|keeping|0 bytes
  [...x15]
  t|1/80800000|keeping|0 bytes
  t|1/80900000|streaming|940 kB
  t|1/80A00000|streaming|1964 kB

When increasing the network delay to 145ms, the slot was lost for real. Note
that it was shown as "lost" but still active twice (during approx. 0.6s)
before being deactivated:

  t|1/85700000|streaming|2048 kB
  t|1/85800000|keeping|0 bytes
  t|1/85940000|lost|0 bytes
  t|1/85AC0000|lost|0 bytes
  f|1/85C40000|lost|0 bytes

Finally, at least once, the following message appeared in the primary logs
**before** the "wal_status" changed from "keeping" to "streaming":

  WARNING: some replication slots have lost required WAL segments

So the slot lost one WAL, but the standby was able to catch up anyway.

My humble opinion about these results:

* after many different tests, the "keeping" status appears only when "remain"
  equals 0. In the current implementation, "keeping" really adds no value...
* "remain" should be NULL if "max_slot_wal_keep_size" = -1 or if the slot
  isn't active
* the "lost" status should be a definitive status
* it seems related, but maybe "wal_status" should be set to "lost" only when
  the slot has been deactivated?
* logs should warn about a failing slot as soon as it is effectively
  deactivated, not before.
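If "lost" became a definitive status, set only once the slot is effectively
deactivated as suggested above, monitoring would become straightforward. As a
sketch only, assuming those proposed semantics (not the current behavior of
the patch) and the same test instance on port 5432:

  # Sketch: report slots that are gone for good under the proposed semantics,
  # i.e. deactivated and flagged "lost" at the same time.
  psql -p 5432 -AXtc "
    SELECT slot_name, restart_lsn
      FROM pg_replication_slots
     WHERE NOT active AND wal_status = 'lost'"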
Attachment
- v14-0001-Add-WAL-relief-vent-for-replication-slots.patch
- v14-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch
- v14-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch
- v14-0004-TAP-test-for-the-slot-limit-feature.patch
- v14-0005-Documentation-for-slot-limit-feature.patch
- v14-0006-Check-removal-of-in-reading-segment-file.patch