Thread: Estimating wal_keep_size
Is this reasonable thinking?
I'd think that one would want a wal_keep_size to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.
In my case, I get a complete replacement (in the form of "|"- delimited ASCII files) of one of the SCHEMAs every Sunday. The size of that ASCII data is about 2GB, so I'm thinking of doubling that to 4GB (256 WAL files) to protect me in the case of the standby being unavailable during the update. Note that a complete loss of both servers is not catastrophic (I have backups); it would just be annoying.
I'd think that one would want a wal_keep_size to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.
In my case, I get a complete replacement (in the form of "|"- delimited ASCII files) of one of the SCHEMAs every Sunday. The size of that ASCII data is about 2GB, so I'm thinking of doubling that to 4GB (256 WAL files) to protect me in the case of the standby being unavailable during the update. Note that a complete loss of both servers is not catastrophic (I have backups); it would just be annoying.
On Wed, Jun 16, 2021 at 05:36:24PM -0700, Dean Gibson (DB Administrator) wrote: > Is this reasonable thinking? > > I'd think that one would want a *wal_keep_size* to cover the pending updates > while the standby server might be unavailable, however long one might > anticipate that would be. It's usually a better approach to use a replication slot, to keep all the required WAL files, and only when needed. See https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS for more details. Note that a replication slot will keep all WAL files, which might eventually lead to an outage if the standby doesn't come back before the filesystem containing the logs get full. You can cap the maximum amount of retained WAL filed since pg 13 using max_slot_wal_keep_size, see https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE.
On 2021-06-16 18:02, Julien Rouhaud wrote:
On Wed, Jun 16, 2021 at 05:36:24PM -0700, Dean Gibson (DB Administrator) wrote:Is this reasonable thinking? I'd think that one would want a *wal_keep_size* to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.It's usually a better approach to use a replication slot, to keep all the required WAL files, and only when needed. See https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS for more details. Note that a replication slot will keep all WAL files, which might eventually lead to an outage if the standby doesn't come back before the filesystem containing the logs get full. You can cap the maximum amount of retained WAL files since pg 13 using max_slot_wal_keep_size, see https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE.
Granted, but the same question arises about the value for max_slot_wal_keep_size. Setting either too low risks data loss, & setting either too high results in unnecessary disk space used. The question was, is the estimated VALUE reasonable under the circumstances?
Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) <postgresql@mailpen.com> a écrit :
Granted, but the same question arises about the value for max_slot_wal_keep_size. Setting either too low risks data loss, & setting either too high results in unnecessary disk space used. The question was, is the estimated VALUE reasonable under the circumstances?
it may be, until one day it won't be. and that day usually happens. when you set up this kind of limit you choose service availability over your data, so you have to accept that it may not be enough. if this is a problem don't setup a limit.
On Fri, 18 Jun 2021 at 23:58, Julien Rouhaud <rjuju123@gmail.com> wrote:
Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) <postgresql@mailpen.com> a écrit :
Granted, but the same question arises about the value for max_slot_wal_keep_size. Setting either too low risks data loss, & setting either too high results in unnecessary disk space used. The question was, is the estimated VALUE reasonable under the circumstances?it may be, until one day it won't be. and that day usually happens. when you set up this kind of limit you choose service availability over your data, so you have to accept that it may not be enough. if this is a problem don't setup a limit.
Anyways, I used to emit pg_current_wal_lsn to graphite (or any other telemetry monitoring) every minute or so, ot be able to calculate wal growth over period of time.
Then used it to estimate how much of the disk would be required for a PITR setup like barman if we use a 7 day WAL post backup.
one can use other math expressions like <time>moving avg to estimate wal growth over the time duration of an incident with the replica etc. ofcourse one needs to also calculate how fast the wals would be played back, if one has hot_standby_feedback for long running queries on replicas etc but I think I put my point.
Thanks,
Vijay
Mumbai, India
On 2021-06-16 17:36, Dean Gibson (DB Administrator) wrote:
Is this reasonable thinking?
I'd think that one would want a wal_keep_size to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be.
In my case, I get a complete replacement (in the form of "|"- delimited ASCII files) of one of the SCHEMAs every Sunday. The size of that ASCII data is about 2GB, so I'm thinking of doubling that to 4GB (256 WAL files) to protect me in the case of the standby being unavailable during the update. Note that a complete loss of both servers is not catastrophic (I have backups); it would just be annoying.
In the absence of any clear guidance, I temporarily set wal_keep_size to 16GB & waited for the Sunday update. That update today created just over 6GB of WAL files during the update, so I've set wal_keep_size to 8GB (512 WAL files).
Oh, and wal_keep_size is NOT an upper limit restricting further WAL files. It's more like a minimum.