Thread: Patroni, slots, and expiring WALs

Patroni, slots, and expiring WALs

From
"Peter J. Holzer"
Date:
I think I'm missing something basic here.

We have set up a postgresql cluster with Patroni (3.0.1-1.pgdg22.04+1)
and PostgreSQL (15+248.pgdg22.04+1) from the PGDG repo fur Ubuntu.

The patroni configuration was created via the pg_createconfig_patroni
script, basically using all the defaults.

The configuration includes `use_slots: true` and I can see a slot in
pg_replication_slots on the leader.

I was under the impression that this would be sufficient to prevent WALs
from being deleted on the leader before they are used on the replica.

However, when we took down one node for about two hours for some tests
recently (with some moderate traffic on the remaining node), the replica
didn't catch up after being restarted and inspection of the logs showed
that it was trying to get WALs which had already been deleted.

So apparently, `use_slots: true` isn't enough. What else do I have to
configure? (I know about wal_keep_size, but it was my understanding that
this isn't needed when slots are used)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: Patroni, slots, and expiring WALs

From
Jeremy Smith
Date:


On Tue, Mar 28, 2023 at 10:55 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:


The configuration includes `use_slots: true` and I can see a slot in
pg_replication_slots on the leader.

I was under the impression that this would be sufficient to prevent WALs
from being deleted on the leader before they are used on the replica.


Is max_slot_wal_keep_size set to something other than -1 on the leader? 

Re: Patroni, slots, and expiring WALs

From
Alexander Kukushkin
Date:
Hi,


On Tue, 28 Mar 2023 at 16:55, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

However, when we took down one node for about two hours for some tests
recently (with some moderate traffic on the remaining node), the replica
didn't catch up after being restarted and inspection of the logs showed
that it was trying to get WALs which had already been deleted.


It is impossible to know whether it is down temporarily or gone forever, therefore the slot is removed as soon as the member key expires (after TTL).
 
So apparently, `use_slots: true` isn't enough. What else do I have to
configure? (I know about wal_keep_size, but it was my understanding that
this isn't needed when slots are used)

The best option is to configure continuous archiving and PITR. Backups are always important.
The second option - you can put all member names into permanent slots configuration (using patronictl edit-config):
slots:
  nodename1:
    type: physical
  nodename2:
    type: physical
  nodename3:
    type: physical

This way slots representing these members will not be removed.

Regards,
--
Alexander Kukushkin

Re: Patroni, slots, and expiring WALs

From
"Peter J. Holzer"
Date:
On 2023-03-28 11:07:04 -0400, Jeremy Smith wrote:
> On Tue, Mar 28, 2023 at 10:55 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
>     The configuration includes `use_slots: true` and I can see a slot in
>     pg_replication_slots on the leader.
>
>     I was under the impression that this would be sufficient to prevent WALs
>     from being deleted on the leader before they are used on the replica.
>
>
>
> Is max_slot_wal_keep_size set to something other than -1 on the leader? 

Nope, it's -1:

postgres=# show max_slot_wal_keep_size ;
╔════════════════════════╗
║ max_slot_wal_keep_size ║
╟────────────────────────╢
║ -1                     ║
╚════════════════════════╝
(1 row)

(Sorry, forgot to mention this)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: Patroni, slots, and expiring WALs

From
"Peter J. Holzer"
Date:
On 2023-03-28 17:08:38 +0200, Alexander Kukushkin wrote:
> On Tue, 28 Mar 2023 at 16:55, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
>
>     However, when we took down one node for about two hours for some tests
>     recently (with some moderate traffic on the remaining node), the replica
>     didn't catch up after being restarted and inspection of the logs showed
>     that it was trying to get WALs which had already been deleted.
>
>
>
> It is impossible to know whether it is down temporarily or gone forever,
> therefore the slot is removed as soon as the member key expires (after TTL).
>  

TTL is 30 seconds by default. So that doesn't seem very useful.


>     So apparently, `use_slots: true` isn't enough. What else do I have to
>     configure? (I know about wal_keep_size, but it was my understanding that
>     this isn't needed when slots are used)
>
>
> The best option is to configure continuous archiving and PITR. Backups are
> always important.

Yeah, but for the given application a daily dump is sufficient for
backups, so I'd like to keep that simple.

> The second option - you can put all member names into permanent slots
> configuration (using patronictl edit-config):
> slots:
>   nodename1:
>     type: physical
>   nodename2:
>     type: physical
>   nodename3:
>     type: physical
>
> This way slots representing these members will not be removed.

That seems to work (at least the slot didn't disappear within a few
minutes). I'll do some more testing.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: Patroni, slots, and expiring WALs

From
"Peter J. Holzer"
Date:
On 2023-03-28 17:27:27 +0200, Peter J. Holzer wrote:
> On 2023-03-28 17:08:38 +0200, Alexander Kukushkin wrote:
> > The second option - you can put all member names into permanent slots
> > configuration (using patronictl edit-config):
> > slots:
> >   nodename1:
> >     type: physical
> >   nodename2:
> >     type: physical
> >   nodename3:
> >     type: physical
> >
> > This way slots representing these members will not be removed.
>
> That seems to work (at least the slot didn't disappear within a few
> minutes). I'll do some more testing.

Works nicely. Thanks!

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment