Thread: How to prevent master server crash if hot standby stops
Hi! Streaming asynchronous binary replication is used with hot standby slave. If slave stops responing, master server will create files in pg_wal directory. If disk becomes full, master server crashes also. How to avoid this ? If disk is nearly full, master should stop additional files creation (and maybe stop or disable replication slot). Postgres 12 in Debian 10 is used. Andrus.
On Mon, 2020-04-06 at 11:03 +0300, Andrus wrote: > Streaming asynchronous binary replication is used with hot standby slave. > > If slave stops responing, master server will create files in pg_wal directory. > If disk becomes full, master server crashes also. > > How to avoid this ? > > If disk is nearly full, master should stop additional files creation (and maybe stop or disable replication slot). > Postgres 12 in Debian 10 is used. That's why you use monitoring. Never use replication slots without monitoring replication (or at least the disk space on the primary). Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On Mon, Apr 06, 2020 at 11:03:20AM +0300, Andrus wrote: > If slave stops responing, master server will create files in pg_wal directory. > If disk becomes full, master server crashes also. > > How to avoid this ? > > If disk is nearly full, master should stop additional files creation > (and maybe stop or disable replication slot). > Postgres 12 in Debian 10 is used. When you use replication slots, it is very important to put in place a monitoring solution to check if too much WAL is retained, and note that there is nothing able to do that natively in core Postgres. There are however multiple ways to solve this problem, like a background worker (for the slot monitoring as well as optionally killing and/or dropping), a simple cron job or even check_postgres. -- Michael
Attachment
Hi! Thank you. >That's why you use monitoring. >Never use replication slots without monitoring replication (or at least the >disk space on the primary). How to implement this automatically, without human interaction required ? "superuser_reserved_connections" setting exists. How about implementing "data_reserved_space" setting ? How to create procedure in server or maybe cron scipt which stops replication if disk becomes nearly full ? How to force to stop replication slot in master if it is in use ? Andrus.
Hi! Thank you. >When you use replication slots, it is very important to put in place a >monitoring solution to check if too much WAL is retained, and note >that there is nothing able to do that natively in core Postgres. >There are however multiple ways to solve this problem, like a >background worker (for the slot monitoring as well as optionally >killing and/or dropping), a simple cron job or even check_postgres. Where to find some sample how to implement this ? I read from docs that slot cannot dropped if it is in use. How to stop replication in this case. Andrus.
On Mon, 2020-04-06 at 11:47 +0300, Andrus wrote: > > That's why you use monitoring. > > Never use replication slots without monitoring replication (or at least the > > disk space on the primary). > > How to implement this automatically, without human interaction required ? > > "superuser_reserved_connections" setting exists. > How about implementing "data_reserved_space" setting ? > > How to create procedure in server or maybe cron scipt which stops replication if disk becomes nearly full ? > > How to force to stop replication slot in master if it is in use ? You don't. You reserve enough disk space that WAL can grow for a while before filling the disk. Any you need to trigger human intervention with monitoring. If you prefer replication to fail silently, don't use replication slots. Use "wal_keep_segments" instead. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Hi! >If you prefer replication to fail silently, don't use replication >slots. Use "wal_keep_segments" instead. I desided to give 1 GB to wal. So I added wal_keep_segments=60 After some time Postgres created 80 files with total size 1.3GB. How to fix this so that no more than 1 GB of disk space is used ? How to get information how may wal files are yet not processed by slave ? How to delete processed wal files so that 1 GB of disk space can used for some other purposes ? /var/lib/postgresql/12/main/pg_wal# ls 00000001000002A200000072 00000001000002A200000083 00000001000002A200000094 00000001000002A2000000A5 00000001000002A2000000B6 00000001000002A200000073 00000001000002A200000084 00000001000002A200000095 00000001000002A2000000A6 00000001000002A2000000B7 00000001000002A200000074 00000001000002A200000085 00000001000002A200000096 00000001000002A2000000A7 00000001000002A2000000B8 00000001000002A200000075 00000001000002A200000086 00000001000002A200000097 00000001000002A2000000A8 00000001000002A2000000B9 00000001000002A200000076 00000001000002A200000087 00000001000002A200000098 00000001000002A2000000A9 00000001000002A2000000BA 00000001000002A200000077 00000001000002A200000088 00000001000002A200000099 00000001000002A2000000AA 00000001000002A2000000BB 00000001000002A200000078 00000001000002A200000089 00000001000002A20000009A 00000001000002A2000000AB 00000001000002A2000000BC 00000001000002A200000079 00000001000002A20000008A 00000001000002A20000009B 00000001000002A2000000AC 00000001000002A2000000BD 00000001000002A20000007A 00000001000002A20000008B 00000001000002A20000009C 00000001000002A2000000AD 00000001000002A2000000BE 00000001000002A20000007B 00000001000002A20000008C 00000001000002A20000009D 00000001000002A2000000AE 00000001000002A2000000BF 00000001000002A20000007C 00000001000002A20000008D 00000001000002A20000009E 00000001000002A2000000AF 00000001000002A2000000C0 00000001000002A20000007D 00000001000002A20000008E 00000001000002A20000009F 00000001000002A2000000B0 00000001000002A2000000C1 00000001000002A20000007E 00000001000002A20000008F 00000001000002A2000000A0 00000001000002A2000000B1 archive_status 00000001000002A20000007F 00000001000002A200000090 00000001000002A2000000A1 00000001000002A2000000B2 00000001000002A200000080 00000001000002A200000091 00000001000002A2000000A2 00000001000002A2000000B3 00000001000002A200000081 00000001000002A200000092 00000001000002A2000000A3 00000001000002A2000000B4 00000001000002A200000082 00000001000002A200000093 00000001000002A2000000A4 00000001000002A2000000B5 Andrus.
On Tue, 2020-04-07 at 00:50 +0300, Andrus wrote: > > If you prefer replication to fail silently, don't use replication > > slots. Use "wal_keep_segments" instead. > > I desided to give 1 GB to wal. So I added > > wal_keep_segments=60 > > After some time Postgres created 80 files with total size 1.3GB. > > How to fix this so that no more than 1 GB of disk space is used ? > How to get information how may wal files are yet not processed by slave ? > How to delete processed wal files so that 1 GB of disk space can used for some other purposes ? "wal_keep_segments" is the number of old WAL segments the server keeps around for the standby. But there are also some segments that are created for future use (the minimum is governed by "min_wal_size"). All these limits are not hard limits, the server will try to keep them more or less. Consider that WAL segments are created as needed, but only removed during checkpoints. So, about your first question, you cannot. Always make sure that there is more disk space available. About your second question, you also cannot do that. The primary server has no idea which standby server needs which WAL information. All you can tell is where the currently connected standby servers are: SELECT pg_walfile_name(flush_lsn) FROM pg_stat_replication; About your third question, you *never* manually mess with the files in pg_wal. The server does that. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Hi! >About your third question, you *never* manually mess with the files in pg_wal. >The server does that. Is it OK to stop server, delete all files in pg_wal directory and re-start server ? Or should default value put back and wait until server frees 1 GB disk space ? Andrus.
On Tue, 2020-04-07 at 12:05 +0300, Andrus wrote: > > About your third question, you *never* manually mess with the files in pg_wal. > > The server does that. > > Is it OK to stop server, delete all files in pg_wal directory and re-start server ? No. > Or should default value put back and wait until server frees 1 GB disk space ? You'd have to set "wal_keep_segments" and "max_wal_size" lower, then have normal database activity and wait until a couple of checkpoints have passed. But frankly, 1 GB is ridiculously little. If you have to worry about that much disk space, you're must be running PostgreSQL on a toaster. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com