I provisioned an RDS instance with 2500 GB of storage and began restoring a database I know to be about 1750 GB, using 16 parallel jobs.
Unfortunately, the restore died very near the end when the instance ran out of disk space due to WAL usage. The log had lots of:
2024-11-17 00:07:09 UTC::@:[19861]:PANIC: could not write to file "pg_wal/xlogtemp.19861": No space left on device
And then kaboom.
I'm wondering what my course of action should be. Can I disable or reduce WAL during a restore? wal_level is set to replica; can it temporarily be set to minimal? Should I just eat the extra cost to add headroom for the WAL? Would using fewer jobs during the restore reduce the amount of WAL created?
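
For context on the headroom question, here is a rough sketch of the space that was left over for WAL. It uses only the two figures above and assumes the restored database lands close to the 1750 GB I expect; the variable names are just for illustration.

```python
# Figures from this post, in GB.
provisioned_gb = 2500
database_gb = 1750  # assumption: restored size is close to the known size

# Space left for WAL and anything else once the data is fully restored.
headroom_gb = provisioned_gb - database_gb
headroom_pct = 100 * headroom_gb / provisioned_gb

print(f"headroom: {headroom_gb} GB ({headroom_pct:.0f}% of provisioned storage)")
```

So the WAL apparently grew to something on the order of that headroom before the restore finished, which is why I'm unsure whether simply adding more storage is a reliable fix.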