Happened again, even with max_wal_size=32GB on the 128GB WAL partition. At
this point I'm fairly sure of what happens:
+ a checkpoint is going on for several minutes (because of I/O contention
on the archival drive)
+ meanwhile data keeps coming in through several processes of pg_restore
+ data is coming in much faster than the checkpoint can flush it, because
  the WAL partition is high-performance
+ The checkpoint finishes before the partition is full, freeing up 32GB
  (max_wal_size) on the WAL drive. Let's say the WAL partition is now at
  80/128GB full.
+ Immediately another checkpoint starts. This one has to checkpoint 80GB
of data.
+ This doesn't make it in time and the WAL partition fills up.
+ Painful recovery follows; the database server doesn't even manage to
  complete recovery when starting up...
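For illustration, plugging made-up throughput numbers into the timeline
above (the rates are assumptions, not measurements; they just show the
shape of the race):

    -- Say pg_restore generates 10 GB/min of WAL on the fast drive while
    -- the checkpoint drains 4 GB/min to the slow archival tablespace:
    SELECT (128 - 80) / (10 - 4.0) AS minutes_until_enospc,   -- 8 minutes
           80 / 4.0                AS minutes_to_finish_ckpt; -- 20 minutes
    -- The partition fills long before the second checkpoint can complete.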
I'm surprised that there is no mechanism for the backends to block while
the WAL is overflowing.
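In case it helps anyone watch this happening during a restore, something
like the following (untested sketch; pg_ls_waldir() and the
pg_stat_checkpointer view are from PostgreSQL 17) shows the backlog
growing:

    -- WAL actually on disk, to compare against max_wal_size:
    SELECT count(*) AS segments, pg_size_pretty(sum(size)) AS wal_on_disk
    FROM pg_ls_waldir();

    -- Checkpoint activity since the stats were last reset:
    SELECT num_timed, num_requested, write_time, sync_time
    FROM pg_stat_checkpointer;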
Dimitris
On Thu, 10 Apr 2025, Dimitrios Apostolou wrote:
> Hello list,
>
> Last night I got ENOSPC on the WAL partition while running a huge
> pg_restore on an empty and idle database.
> The checkpoint that started 7 minutes earlier never finished:
>
> 04:31:09 LOG: checkpoint starting: wal
> ...
> 04:38:04 PANIC: could not write to file "pg_wal/xlogtemp.546204": No
> space left on device
> : CONTEXT: COPY table_partition_123, line 120872534
>
> This is most likely because the bulk of the data is written to a very slow
> "archival" tablespace. There are no I/O errors in the system logs; I know
> things go slowly on that device (iSCSI drive over a shared spinning-disks
> pool and a busy network).
>
> This happened even though I'm keeping the WAL in its own partition (128GB
> partition size), with a lot of free space above max_wal_size (64GB). Somehow
> it managed to grow above max_wal_size and fill 100% of the partition.
>
> I'm running the latest PostgreSQL 17, and the settings have been
> temporarily tweaked for the fastest pg_restore:
>
> max_wal_size=64GB
> max_replication_slots = 0
> max_logical_replication_workers = 0
> max_wal_senders = 0
> wal_level = minimal
> autovacuum = off
>
> Several things seem to have gone wrong here. Questions:
>
> + The WAL partition is much faster than the archival tablespace. Am I in
> constant danger of overrunning max_wal_size? How can I make 100% sure this
> never happens again?
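>
> A sketch of the only mitigation I can think of (values are assumptions to
> illustrate, not recommendations), since there seems to be no hard WAL cap
> to set: shrink each checkpoint's backlog so even the slow tablespace can
> drain it in time.
>
> max_wal_size = 8GB                  # smaller ceiling => smaller backlog
> checkpoint_timeout = 5min
> checkpoint_completion_target = 0.9  # spread writes across the interval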
>
> + After recovery, with the database idling, I notice that WAL space usage
> is constant at 64GB. Why doesn't it free up space down to min_wal_size
> (1GB)?
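>
> (That seems to be expected behavior, if I read the docs right: after a
> checkpoint, old segments are renamed and recycled for reuse rather than
> deleted, and the pool only shrinks gradually toward min_wal_size. Quick
> checks, as a sketch:)
>
> SHOW wal_recycle;    -- on by default: old segments are renamed, not removed
> SHOW min_wal_size;   -- the floor the recycled pool eventually shrinks to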
>
> + I just created a 13GB zstd-compressed tarball of those 64GB of WAL
> files. This indicates that the files are compressible despite the
> wal_compression=zstd setting. Could it be that Postgres ignores the flag
> and does not compress the WAL? How can I check?
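>
> (One way to check, as an untested sketch: wal_compression only applies to
> full-page images, so the row payloads written by COPY stay uncompressed
> and the files remain compressible either way. pg_waldump can show how much
> of the volume is FPIs; the segment name below is just an example:)
>
> SHOW wal_compression;  -- confirm the setting is actually active
> -- from the shell:
> --   pg_waldump --stats pg_wal/000000010000000A00000001
> -- then compare the "FPI size" column against the total record size.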
>
> + I'm using parallel pg_restore --data-only, can't avoid that for now.
> Even though all the tables are empty (I truncated everything before
> starting pg_restore), I can't find a way to avoid going through the WAL.
> Ideas?
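>
> (One thing that might work with wal_level=minimal, sketched and untested:
> COPY can skip WAL entirely when the table was created or truncated in the
> same transaction, so loading a table "by hand" instead of letting
> pg_restore drive it; the path is a placeholder:)
>
> BEGIN;
> TRUNCATE table_partition_123;  -- same transaction as the load
> COPY table_partition_123 FROM '/path/to/data.copy';
> COMMIT;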
>
>
> Thanks in advance,
> Dimitris