Thread: How to prevent master server crash if hot standby stops

How to prevent master server crash if hot standby stops

From
"Andrus"
Date:
Hi!

Streaming asynchronous binary replication is used with hot standby slave.

If slave stops responing, master server will create files in pg_wal directory.
If disk becomes full, master server crashes also.

How to avoid this ?

If disk is nearly full, master should stop additional files creation (and maybe stop or disable replication slot).
Postgres 12 in Debian 10 is used.

Andrus.



Re: How to prevent master server crash if hot standby stops

From
Laurenz Albe
Date:
On Mon, 2020-04-06 at 11:03 +0300, Andrus wrote:
> Streaming asynchronous binary replication is used with hot standby slave.
> 
> If slave stops responing, master server will create files in pg_wal directory.
> If disk becomes full, master server crashes also.
> 
> How to avoid this ?
> 
> If disk is nearly full, master should stop additional files creation (and maybe stop or disable replication slot).
> Postgres 12 in Debian 10 is used.

That's why you use monitoring.

Never use replication slots without monitoring replication (or at least the
disk space on the primary).

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: How to prevent master server crash if hot standby stops

From
Michael Paquier
Date:
On Mon, Apr 06, 2020 at 11:03:20AM +0300, Andrus wrote:
> If slave stops responing, master server will create files in pg_wal directory.
> If disk becomes full, master server crashes also.
>
> How to avoid this ?
>
> If disk is nearly full, master should stop additional files creation
> (and maybe stop or disable replication slot).
> Postgres 12 in Debian 10 is used.

When you use replication slots, it is very important to put in place a
monitoring solution to check if too much WAL is retained, and note
that there is nothing able to do that natively in core Postgres.
There are however multiple ways to solve this problem, like a
background worker (for the slot monitoring as well as optionally
killing and/or dropping), a simple cron job or even check_postgres.
--
Michael

Attachment

Re: How to prevent master server crash if hot standby stops

From
"Andrus"
Date:
Hi!

Thank you.

>That's why you use monitoring.
>Never use replication slots without monitoring replication (or at least the
>disk space on the primary).

How to implement this automatically, without human interaction required ?

"superuser_reserved_connections" setting exists.
How about implementing    "data_reserved_space" setting ?

How to create procedure in server or maybe cron scipt which stops replication if disk becomes nearly full ?

How to force to stop replication slot in master if it is in use ?

Andrus.



Re: How to prevent master server crash if hot standby stops

From
"Andrus"
Date:
Hi!

Thank you.

>When you use replication slots, it is very important to put in place a
>monitoring solution to check if too much WAL is retained, and note
>that there is nothing able to do that natively in core Postgres.
>There are however multiple ways to solve this problem, like a
>background worker (for the slot monitoring as well as optionally
>killing and/or dropping), a simple cron job or even check_postgres.

Where to find some sample how to implement this ?

I read from docs that slot cannot dropped if it is in use.
How to stop replication in this case.

Andrus.



Re: How to prevent master server crash if hot standby stops

From
Laurenz Albe
Date:
On Mon, 2020-04-06 at 11:47 +0300, Andrus wrote:
> > That's why you use monitoring.
> > Never use replication slots without monitoring replication (or at least the
> > disk space on the primary).
> 
> How to implement this automatically, without human interaction required ?
> 
> "superuser_reserved_connections" setting exists.
> How about implementing    "data_reserved_space" setting ?
> 
> How to create procedure in server or maybe cron scipt which stops replication if disk becomes nearly full ?
> 
> How to force to stop replication slot in master if it is in use ?

You don't.

You reserve enough disk space that WAL can grow for a while before
filling the disk.  Any you need to trigger human intervention with
monitoring.

If you prefer replication to fail silently, don't use replication
slots.  Use "wal_keep_segments" instead.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: How to prevent master server crash if hot standby stops

From
"Andrus"
Date:
Hi!

>If you prefer replication to fail silently, don't use replication
>slots.  Use "wal_keep_segments" instead.

I desided to give 1 GB to wal. So I added

wal_keep_segments=60

After some time Postgres created 80 files with total size 1.3GB. 

How to fix this so that no more than 1 GB of disk space is used ?
How to get information how may wal files are yet not processed by slave ?
How to delete processed wal files so that 1 GB of disk space can used for some other purposes ?

/var/lib/postgresql/12/main/pg_wal# ls
00000001000002A200000072  00000001000002A200000083  00000001000002A200000094  00000001000002A2000000A5
00000001000002A2000000B6
00000001000002A200000073  00000001000002A200000084  00000001000002A200000095  00000001000002A2000000A6
00000001000002A2000000B7
00000001000002A200000074  00000001000002A200000085  00000001000002A200000096  00000001000002A2000000A7
00000001000002A2000000B8
00000001000002A200000075  00000001000002A200000086  00000001000002A200000097  00000001000002A2000000A8
00000001000002A2000000B9
00000001000002A200000076  00000001000002A200000087  00000001000002A200000098  00000001000002A2000000A9
00000001000002A2000000BA
00000001000002A200000077  00000001000002A200000088  00000001000002A200000099  00000001000002A2000000AA
00000001000002A2000000BB
00000001000002A200000078  00000001000002A200000089  00000001000002A20000009A  00000001000002A2000000AB
00000001000002A2000000BC
00000001000002A200000079  00000001000002A20000008A  00000001000002A20000009B  00000001000002A2000000AC
00000001000002A2000000BD
00000001000002A20000007A  00000001000002A20000008B  00000001000002A20000009C  00000001000002A2000000AD
00000001000002A2000000BE
00000001000002A20000007B  00000001000002A20000008C  00000001000002A20000009D  00000001000002A2000000AE
00000001000002A2000000BF
00000001000002A20000007C  00000001000002A20000008D  00000001000002A20000009E  00000001000002A2000000AF
00000001000002A2000000C0
00000001000002A20000007D  00000001000002A20000008E  00000001000002A20000009F  00000001000002A2000000B0
00000001000002A2000000C1
00000001000002A20000007E  00000001000002A20000008F  00000001000002A2000000A0  00000001000002A2000000B1  archive_status
00000001000002A20000007F  00000001000002A200000090  00000001000002A2000000A1  00000001000002A2000000B2
00000001000002A200000080  00000001000002A200000091  00000001000002A2000000A2  00000001000002A2000000B3
00000001000002A200000081  00000001000002A200000092  00000001000002A2000000A3  00000001000002A2000000B4
00000001000002A200000082  00000001000002A200000093  00000001000002A2000000A4  00000001000002A2000000B5


Andrus.




Re: How to prevent master server crash if hot standby stops

From
Laurenz Albe
Date:
On Tue, 2020-04-07 at 00:50 +0300, Andrus wrote:
> > If you prefer replication to fail silently, don't use replication
> > slots.  Use "wal_keep_segments" instead.
> 
> I desided to give 1 GB to wal. So I added
> 
> wal_keep_segments=60
> 
> After some time Postgres created 80 files with total size 1.3GB. 
> 
> How to fix this so that no more than 1 GB of disk space is used ?
> How to get information how may wal files are yet not processed by slave ?
> How to delete processed wal files so that 1 GB of disk space can used for some other purposes ?

"wal_keep_segments" is the number of old WAL segments the server keeps around
for the standby.  But there are also some segments that are created for future
use (the minimum is governed by "min_wal_size").

All these limits are not hard limits, the server will try to keep them
more or less.  Consider that WAL segments are created as needed, but only
removed during checkpoints.

So, about your first question, you cannot.
Always make sure that there is more disk space available.

About your second question, you also cannot do that.
The primary server has no idea which standby server needs which WAL information.
All you can tell is where the currently connected standby servers are:

  SELECT pg_walfile_name(flush_lsn) FROM pg_stat_replication;

About your third question, you *never* manually mess with the files in pg_wal.
The server does that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: How to prevent master server crash if hot standby stops

From
"Andrus"
Date:
Hi!

>About your third question, you *never* manually mess with the files in pg_wal.
>The server does that.

Is it OK  to stop server, delete all files in pg_wal directory and re-start server ?

Or should default value put back and wait until server frees 1 GB disk space ?

Andrus.



Re: How to prevent master server crash if hot standby stops

From
Laurenz Albe
Date:
On Tue, 2020-04-07 at 12:05 +0300, Andrus wrote:
> > About your third question, you *never* manually mess with the files in pg_wal.
> > The server does that.
> 
> Is it OK  to stop server, delete all files in pg_wal directory and re-start server ?

No.

> Or should default value put back and wait until server frees 1 GB disk space ?

You'd have to set "wal_keep_segments" and "max_wal_size" lower, then have
normal database activity and wait until a couple of checkpoints have passed.

But frankly, 1 GB is ridiculously little.  If you have to worry about that
much disk space, you're must be running PostgreSQL on a toaster.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com