Thread: Estimating wal_keep_size

Estimating wal_keep_size

From
"Dean Gibson (DB Administrator)"
Date:
Is this reasonable thinking?

I'd think that one would want a wal_keep_size to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.

In my case, I get a complete replacement (in the form of "|"- delimited ASCII files) of one of the SCHEMAs every Sunday.  The size of that ASCII data is about 2GB, so I'm thinking of doubling that to 4GB (256 WAL files) to protect me in the case of the standby being unavailable during the update.  Note that a complete loss of both servers is not catastrophic (I have backups);  it would just be annoying.

Re: Estimating wal_keep_size

From
Julien Rouhaud
Date:
On Wed, Jun 16, 2021 at 05:36:24PM -0700, Dean Gibson (DB Administrator) wrote:
> Is this reasonable thinking?
> 
> I'd think that one would want a *wal_keep_size* to cover the pending updates
> while the standby server might be unavailable, however long one might
> anticipate that would be.

It's usually a better approach to use a replication slot, to keep all the
required WAL files, and only when needed.  See
https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS
for more details.

Note that a replication slot will keep all WAL files, which might eventually
lead to an outage if the standby doesn't come back before the filesystem
containing the logs get full.  You can cap the maximum amount of retained WAL
filed since pg 13 using max_slot_wal_keep_size, see
https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE.



Re: Estimating wal_keep_size

From
"Dean Gibson (DB Administrator)"
Date:
On 2021-06-16 18:02, Julien Rouhaud wrote:
On Wed, Jun 16, 2021 at 05:36:24PM -0700, Dean Gibson (DB Administrator) wrote:
Is this reasonable thinking?

I'd think that one would want a *wal_keep_size* to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.
It's usually a better approach to use a replication slot, to keep all the required WAL files, and only when needed.  See https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS for more details.

Note that a replication slot will keep all WAL files, which might eventually lead to an outage if the standby doesn't come back before the filesystem containing the logs get full.  You can cap the maximum amount of retained WAL files since pg 13 using max_slot_wal_keep_size, see
https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE.

Granted, but the same question arises about the value for max_slot_wal_keep_size.  Setting either too low risks data loss, & setting either too high results in unnecessary disk space used.  The question was, is the estimated VALUE reasonable under the circumstances?

Re: Estimating wal_keep_size

From
Julien Rouhaud
Date:
Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) <postgresql@mailpen.com> a écrit :

Granted, but the same question arises about the value for max_slot_wal_keep_size.  Setting either too low risks data loss, & setting either too high results in unnecessary disk space used.  The question was, is the estimated VALUE reasonable under the circumstances?

it may be, until one day it won't be. and that day usually happens. when you set up this kind of limit you choose service availability over your data, so you have to accept that it may not be enough. if this is a problem don't setup a limit. 

Re: Estimating wal_keep_size

From
Vijaykumar Jain
Date:


On Fri, 18 Jun 2021 at 23:58, Julien Rouhaud <rjuju123@gmail.com> wrote:
Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) <postgresql@mailpen.com> a écrit :

Granted, but the same question arises about the value for max_slot_wal_keep_size.  Setting either too low risks data loss, & setting either too high results in unnecessary disk space used.  The question was, is the estimated VALUE reasonable under the circumstances?

it may be, until one day it won't be. and that day usually happens. when you set up this kind of limit you choose service availability over your data, so you have to accept that it may not be enough. if this is a problem don't setup a limit. 

yep, that day does come :) and in that case, i used to drop slot (primary is high priority) and rebuild the repica. We already had multiple replicas under load balancer, so it was feasible.

Anyways, I used to emit pg_current_wal_lsn to graphite (or any other telemetry monitoring) every minute or so, ot be able to calculate wal growth over period of time.
Then used it to estimate how much of the disk would be required for a PITR setup like barman if we use a 7 day WAL post backup.
one can use other math expressions like  <time>moving avg to estimate wal growth  over the time duration of an incident with the replica etc. ofcourse one needs to also calculate how fast the wals would be played back, if one has hot_standby_feedback for long running queries on replicas etc but I think I put my point.
 

--
Thanks,
Vijay
Mumbai, India

Re: Estimating wal_keep_size

From
"Dean Gibson (DB Administrator)"
Date:
On 2021-06-16 17:36, Dean Gibson (DB Administrator) wrote:
Is this reasonable thinking?

I'd think that one would want a wal_keep_size to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.

In my case, I get a complete replacement (in the form of "|"- delimited ASCII files) of one of the SCHEMAs every Sunday.  The size of that ASCII data is about 2GB, so I'm thinking of doubling that to 4GB (256 WAL files) to protect me in the case of the standby being unavailable during the update.  Note that a complete loss of both servers is not catastrophic (I have backups);  it would just be annoying.


In the absence of any clear guidance, I temporarily set wal_keep_size to 16GB & waited for the Sunday update.  That update today created just over 6GB of WAL files during the update, so I've set wal_keep_size to 8GB (512 WAL files).

Oh, and wal_keep_size is NOT an upper limit restricting further WAL files. It's more like a minimum.