Thread: pg_wal folder high disk usage
Good morning,
On one of our postgres instances, the pg_wal/data folder has grown to 196GB, filling up the 200GB disk.
This stopped postgresql.service this morning, causing two applications to crash.
Unfortunately, our database admin is on leave today, and we are trying to figure out how to free up disk space.
Any ideas or suggestions are more than welcome.
Thank you in advance.
Hi,
You might want to check whether WAL archiving (if enabled) is succeeding and, if you have replicas, whether replication is lagging or broken.
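One way to see whether archiving is falling behind is to look in pg_wal/archive_status: PostgreSQL leaves a `<segment>.ready` marker for each WAL file still waiting to be archived and renames it to `.done` once archive_command succeeds. Below is a minimal Python sketch that summarizes such a directory listing. The function name is hypothetical, and it assumes the default 16MB WAL segment size, so the byte figure is only an estimate.

```python
from collections import Counter

# Assumption: the default wal_segment_size of 16MB (it can differ if set at initdb).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def archive_backlog(status_filenames):
    """Summarize a listing of pg_wal/archive_status.

    Returns (ready_count, done_count, approx_backlog_bytes). Each '.ready' marker
    is counted as one full segment still waiting for archive_command.
    """
    counts = Counter()
    for name in status_filenames:
        if name.endswith(".ready"):
            counts["ready"] += 1  # not yet archived
        elif name.endswith(".done"):
            counts["done"] += 1  # archived successfully
    return counts["ready"], counts["done"], counts["ready"] * WAL_SEGMENT_BYTES
```

For example, call `archive_backlog(os.listdir(path))` with `path` pointing at the archive_status directory on your server. A `.ready` count that keeps growing usually means archive_command is failing or is too slow.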
On Thu, 2024-10-31 at 10:36 +0000, Paul Brindusa wrote:
> On one of our postgres instances we have the pg_wal/data folder up to 196GB, out of 200GB disk filled up.
Check why pg_wal is growing:
https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/
Yours,
Laurenz Albe
First of all, check whether Postgres is unable to archive or remove old WAL files. For immediate space, move older files from pg_wal to other storage, but don't delete them.
Restart Postgres in recovery mode and, if archiving is not working, try temporarily disabling it so that PostgreSQL can automatically clear older WAL files:
archive_mode = off
On Fri, Nov 1, 2024 at 2:40 AM Muhammad Usman Khan <usman.k@bitnine.net> wrote:
For immediate space, move older files from pg_Wal to another storage but don't delete them.
No, do not do this! Figure out why WAL is not getting removed by Postgres and let it do its job once that is fixed. Remember that the original poster is asking because they are not the database admin, so having them work out which WAL files are "older" and safe to move is not good advice.
Resizing the disk is a better option. You could also check whether there are other large files on that volume, especially large log files, that can be removed or moved elsewhere.
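To find those other large files, a read-only scan of the volume is enough. This is a minimal Python sketch (the function name is hypothetical) that lists the biggest regular files under a directory without crossing into other mounted filesystems. It deletes nothing, and anything inside the PostgreSQL data directory, pg_wal in particular, should be left alone.

```python
import heapq
import os
import stat


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) tuples, biggest first.

    Only regular files on the same filesystem as `root` are considered;
    nothing is modified or deleted.
    """
    root_dev = os.stat(root).st_dev
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that live on a different mount (different st_dev).
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass  # vanished or unreadable; skip it
        dirnames[:] = kept
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_dev == root_dev:
                found.append((info.st_size, path))
    return heapq.nlargest(limit, found)
```

For example, `largest_files("/var/log", limit=20)` (an example path) would show the biggest log files on that volume, which are often the safest things to compress or move.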
Hopefully all of this is moot because their DBA is back from leave. :)
Cheers,
Greg
One possible reason for pg_wal buildup is that some form of replication (logical or physical) is set up and the receiving side has stopped for some reason.
That is, a different server connects to yours and expects to receive data, and your server in turn expects to send it (this is the important bit). There may be several such connections.
If even one of these receiving servers is down, the network is out, or for some other reason it is no longer requesting data from your server, your server will notice that it isn't getting confirmation from the other side that the data was received. Your Postgres server will therefore keep this data locally, expecting the situation to be resolved later, at which point it will send everything the other side hasn't received yet.
This is one possibility. As long as your server is configured to expect that other server to be present and receiving, the buildup will continue. Taking the other server offline won't help; in fact, that is likely the cause of the issue. The official documentation explains how to drop replication slots, but ideally your DBA should handle this.
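To see which slots are holding WAL back, the server-side query is typically along the lines of `SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots;`. The arithmetic behind it is simple: an LSN such as `16/B374D848` is a 64-bit byte position written as two hexadecimal 32-bit halves, and the WAL a slot retains is roughly the current LSN minus the slot's restart_lsn. A minimal Python sketch of that calculation, assuming you copy the values out of pg_replication_slots by hand (the function names and tuple layout are hypothetical):

```python
def parse_lsn(text):
    """Convert an LSN such as '16/B374D848' into an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn, slot_restart_lsn):
    """Approximate WAL bytes a slot holds back (current position minus restart_lsn).

    Clamped at zero in case the two values were copied at different moments.
    """
    return max(0, parse_lsn(current_lsn) - parse_lsn(slot_restart_lsn))


def slots_by_retention(current_lsn, slots):
    """Return (slot_name, active, retained_bytes) rows, largest retention first.

    `slots` is a list of (slot_name, active, restart_lsn) tuples; restart_lsn may
    be None for a slot that has not reserved any WAL.
    """
    rows = []
    for name, active, restart_lsn in slots:
        held = 0 if restart_lsn is None else retained_bytes(current_lsn, restart_lsn)
        rows.append((name, active, held))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

An inactive slot near the top of this list is the usual suspect; whether it can safely be dropped is a decision for whoever owns the receiving side.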
Laurenz's blog post lays out all the possibilities. For instance, your system may be generating data so fast that writing out the WAL files cannot keep up, or your setup may also do WAL archiving and the compression step in it may be slow.
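To judge whether WAL is simply being generated faster than it can be dealt with, you can run `SELECT pg_current_wal_lsn();` twice, some seconds apart, and turn the difference into a rate. A minimal Python sketch of that arithmetic, with hypothetical function names; it also estimates how long the remaining free space would last if no WAL were removed:

```python
import math


def parse_lsn(text):
    """Convert an LSN such as '16/B374D848' into an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate(lsn_start, lsn_end, seconds):
    """Average WAL generation in bytes per second between two LSN samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (parse_lsn(lsn_end) - parse_lsn(lsn_start)) / seconds


def hours_until_full(free_bytes, bytes_per_second):
    """Hours until free_bytes is used up at the given rate (inf if no WAL is written)."""
    if bytes_per_second <= 0:
        return math.inf
    return free_bytes / bytes_per_second / 3600
```

For example, samples of `3/1000000` and `3/5000000` taken 60 seconds apart mean 64MiB of WAL in a minute, or about 1.1 million bytes per second. If that rate is higher than what archiving manages to process, pg_wal will keep growing.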
The post offers some ways to verify each of these; I suggest checking them out.
And of course, if your DBA is back, have them look at it too.
Regards,
Koen De Groote
On Fri, Nov 1, 2024 at 2:10 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> No, do not do this! Figure out why WAL is not getting removed by Postgres and let it do its job once fixed.
Good morning Koen,
I highly appreciate your response.
It has clarified things about the WAL files for me; your insights made the whole picture a bit clearer.
Kind Regards,
Paul B.
On 03/11/2024 13:59, Koen De Groote wrote:
> A possible reason for pg_wal buildup is that there is a sort of replication going on (logical or physical replication) and the receiving side of the replication has stopped somehow.