Thread: Patroni, slots, and expiring WALs
I think I'm missing something basic here. We have set up a postgresql cluster with Patroni (3.0.1-1.pgdg22.04+1) and PostgreSQL (15+248.pgdg22.04+1) from the PGDG repo fur Ubuntu. The patroni configuration was created via the pg_createconfig_patroni script, basically using all the defaults. The configuration includes `use_slots: true` and I can see a slot in pg_replication_slots on the leader. I was under the impression that this would be sufficient to prevent WALs from being deleted on the leader before they are used on the replica. However, when we took down one node for about two hours for some tests recently (with some moderate traffic on the remaining node), the replica didn't catch up after being restarted and inspection of the logs showed that it was trying to get WALs which had already been deleted. So apparently, `use_slots: true` isn't enough. What else do I have to configure? (I know about wal_keep_size, but it was my understanding that this isn't needed when slots are used) hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On Tue, Mar 28, 2023 at 10:55 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
The configuration includes `use_slots: true` and I can see a slot in
pg_replication_slots on the leader.
I was under the impression that this would be sufficient to prevent WALs
from being deleted on the leader before they are used on the replica.
Is max_slot_wal_keep_size set to something other than -1 on the leader?
Hi,
On Tue, 28 Mar 2023 at 16:55, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
However, when we took down one node for about two hours for some tests
recently (with some moderate traffic on the remaining node), the replica
didn't catch up after being restarted and inspection of the logs showed
that it was trying to get WALs which had already been deleted.
It is impossible to know whether it is down temporarily or gone forever, therefore the slot is removed as soon as the member key expires (after TTL).
So apparently, `use_slots: true` isn't enough. What else do I have to
configure? (I know about wal_keep_size, but it was my understanding that
this isn't needed when slots are used)
The best option is to configure continuous archiving and PITR. Backups are always important.
The second option - you can put all member names into permanent slots configuration (using patronictl edit-config):
slots:
nodename1:
type: physical
nodename2:
type: physical
nodename3:
type: physical
This way slots representing these members will not be removed.
Regards,
--
Alexander Kukushkin
On 2023-03-28 11:07:04 -0400, Jeremy Smith wrote: > On Tue, Mar 28, 2023 at 10:55 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > The configuration includes `use_slots: true` and I can see a slot in > pg_replication_slots on the leader. > > I was under the impression that this would be sufficient to prevent WALs > from being deleted on the leader before they are used on the replica. > > > > Is max_slot_wal_keep_size set to something other than -1 on the leader? Nope, it's -1: postgres=# show max_slot_wal_keep_size ; ╔════════════════════════╗ ║ max_slot_wal_keep_size ║ ╟────────────────────────╢ ║ -1 ║ ╚════════════════════════╝ (1 row) (Sorry, forgot to mention this) hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2023-03-28 17:08:38 +0200, Alexander Kukushkin wrote: > On Tue, 28 Mar 2023 at 16:55, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > > However, when we took down one node for about two hours for some tests > recently (with some moderate traffic on the remaining node), the replica > didn't catch up after being restarted and inspection of the logs showed > that it was trying to get WALs which had already been deleted. > > > > It is impossible to know whether it is down temporarily or gone forever, > therefore the slot is removed as soon as the member key expires (after TTL). > TTL is 30 seconds by default. So that doesn't seem very useful. > So apparently, `use_slots: true` isn't enough. What else do I have to > configure? (I know about wal_keep_size, but it was my understanding that > this isn't needed when slots are used) > > > The best option is to configure continuous archiving and PITR. Backups are > always important. Yeah, but for the given application a daily dump is sufficient for backups, so I'd like to keep that simple. > The second option - you can put all member names into permanent slots > configuration (using patronictl edit-config): > slots: > nodename1: > type: physical > nodename2: > type: physical > nodename3: > type: physical > > This way slots representing these members will not be removed. That seems to work (at least the slot didn't disappear within a few minutes). I'll do some more testing. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2023-03-28 17:27:27 +0200, Peter J. Holzer wrote: > On 2023-03-28 17:08:38 +0200, Alexander Kukushkin wrote: > > The second option - you can put all member names into permanent slots > > configuration (using patronictl edit-config): > > slots: > > nodename1: > > type: physical > > nodename2: > > type: physical > > nodename3: > > type: physical > > > > This way slots representing these members will not be removed. > > That seems to work (at least the slot didn't disappear within a few > minutes). I'll do some more testing. Works nicely. Thanks! hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"