On Tue, Apr 17, 2018 at 02:57:03PM -0400, Tom Lane wrote:
> I think possibly the OP doesn't understand why it's designed that way.
> The point is not really to "recycle old WAL files", it's to avoid having
> disk space allocation occur during the critical section where we must
> PANIC on failure. Now, of course, that doesn't really work if the
> filesystem is COW underneath, because it's allocating fresh disk space
> anyway even though semantically we're overwriting existing data.
> But what I'd like to see is a fix that deals with that somehow, rather
> than continue to accept the possibility of ENOSPC occurring inside WAL
> writes on these file systems. I have no idea what such a fix would
> look like :-(
That looks like a rather difficult problem to solve in PostgreSQL
itself, as the operator running the cluster is in charge of setting up
the FS options which would control the COW behavior, so it seems to me
that there is room as well for an in-core option to tell the
checkpointer to enforce the removal of past files instead of simple
recycling them, because this actually breaks max_wal_size.
max_wal_size is of course a soft limit, and it has been discussed a
couple of times that it would be nice to get that to a hard limit, but
it is really a hard problem to avoid the system to not slow down or even
stop its I/O if the hard is close by or reached..
--
Michael