Thread: recovery_target_time and WAL fetch with streaming replication

recovery_target_time and WAL fetch with streaming replication

From
Hannes Erven
Date: 2018-05-13 01:39:48 +0200
Hi,


what is PostgreSQL's strategy for when to fetch WAL from the master
while in streaming replication, and could it be tweaked?

I'm using a physical streaming replication slave to keep a database
lagging about one month behind the primary, by setting
"recovery_target_time" to the desired point in time.
This setting is periodically advanced by a cronjob to allow the
replica to roll forward. It's a 10.3-1 install on Debian.
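
For illustration, this is roughly what the setup looks like (host
names, paths and the exact timestamp handling are simplified):

    # recovery.conf on the replica (sketch)
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replicator'
    primary_slot_name = 'lagging_replica'
    recovery_target_time = '2018-04-13 00:00:00+02'

    # /etc/cron.d entry: advance the target daily and restart the
    # standby, since recovery.conf is only read at server start
    0 3 * * * postgres /usr/local/bin/advance_recovery_target.sh

    # advance_recovery_target.sh (hypothetical helper)
    #!/bin/sh
    TARGET=$(date -d '1 month ago' '+%F %T%z')
    sed -i "s|^recovery_target_time.*|recovery_target_time = '$TARGET'|" \
        /var/lib/postgresql/10/main/recovery.conf
    pg_ctlcluster 10 main restart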


It seems that as soon as the slave needs new WAL data to reach the 
requested target time, it will connect to the primary and fetch /all/ 
new WAL the primary has available for the slave's slot - up to "now", 
ignoring the recovery_target_time.

So in my setup, the slave will connect to the primary once per month, 
and download the whole next month's WAL data at once.

I do not care on which instance the WAL is kept until needed, but I'd
really prefer the transfer (and disk space requirement!) to be spread
more evenly over time.


One option of course would be to use some transfer mechanism external
to PostgreSQL... but so far I'm thinking there must be an easier way?



Thanks & best regards,

    -hannes


Re: recovery_target_time and WAL fetch with streaming replication

From
Michael Paquier
Date: 2018-05-13 08:23 +0200
On Sun, May 13, 2018 at 01:39:48AM +0200, Hannes Erven wrote:
> what is PostgreSQL's strategy for when to fetch WAL from the master
> while in streaming replication, and could it be tweaked?
>
> I'm using a physical streaming replication slave to keep a database
> lagging about one month behind the primary, by setting
> "recovery_target_time" to the desired point in time.
> This setting is periodically advanced by a cronjob to allow the
> replica to roll forward. It's a 10.3-1 install on Debian.

I think that you are complicating your life here.

Fetching WAL from a primary (or another standby) cannot be directly
controlled per se.  By definition, any WAL present will be
automatically fetched and synced on the standby.  Now, it is usually
not the moment WAL is fetched and synced on a standby that matters,
it is the moment it is replayed.  For example, what people usually
want to be protected from is an unfortunate DROP TABLE on the primary
run by an administrator being immediately replayed on the standby,
losing the data.  Hence delaying when WAL is replayed can offer some
protection, and this can be achieved by setting
recovery_min_apply_delay in recovery.conf.  This will cause WAL
records replaying transaction commits to wait for the amount of time
specified by this parameter, giving you the time to recover from any
failures with a standby which has a controlled sync delta.
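
For example, a minimal recovery.conf for such a delayed standby could
look like this (the connection settings are only placeholders):

    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replicator'
    recovery_min_apply_delay = '2d'

Note that WAL is still streamed and flushed to disk on the standby
immediately; only the replay of commit records waits for the
configured delay.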

> One option of course would be to use some transfer mechanism external
> to PostgreSQL... but so far I'm thinking there must be an easier way?

Another option I can think of here is to use a specific restore_command
instead of streaming replication.  Simply archive a WAL segment on the
primary with some meta-data like the time it was archived, and then
allow the standby to recover the segment only after a delta has passed.
This can allow a more even distribution of segments.
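
For example, a restore_command script along these lines would hand a
segment to the standby only once it has been in the archive for a
given amount of time (the script name, paths and the use of the file's
mtime as the archival time are only an illustration):

    #!/bin/sh
    # delayed_restore.sh %f %p -- restore a segment only after a delay
    ARCHIVE=/mnt/wal_archive
    DELAY=$((30 * 24 * 3600))          # roughly one month, in seconds

    f="$ARCHIVE/$1"
    [ -f "$f" ] || exit 1              # not archived yet, retry later
    age=$(( $(date +%s) - $(stat -c %Y "$f") ))
    [ "$age" -ge "$DELAY" ] || exit 1  # too recent, retry later
    exec cp "$f" "$2"

with restore_command = '/usr/local/bin/delayed_restore.sh %f %p' in
recovery.conf.  A non-zero exit status simply makes the startup
process retry later.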
--
Michael


Re: recovery_target_time and WAL fetch with streaming replication

From
Hannes Erven
Date: 2018-05-13 09:42:42 +0200
Michael,


Am 2018-05-13 um 08:23 schrieb Michael Paquier:
> On Sun, May 13, 2018 at 01:39:48AM +0200, Hannes Erven wrote:
>> what is PostgreSQL's strategy for when to fetch WAL from the master
>> while in streaming replication, and could it be tweaked?
> 
> Fetching WAL from a primary (or another standby) cannot be directly
> controlled per se.  By definition, any WAL present will be
> automatically fetched and synced on the standby.

I also thought so, but this is not the case (and that's my whole point).

When the standby finds sufficient local WAL with respect to its
recovery_target_time, it seems it does not even attempt to connect to
the primary.

But when new WAL is needed, the standby will fetch /all/ WAL present on 
the master.


I'd say, the standby should either:
- always connect to the primary and fetch any WAL present
- stop fetching/streaming WAL when it is not needed for the current 
recovery_target

So ideally there would be an option to select the desired behaviour?



> For example, what people usually want to be protected from is an
> unfortunate DROP TABLE on the primary run by an administrator being
> immediately replayed on the standby, losing the data.  Hence delaying
> when WAL is replayed can offer some protection, and this can be
> achieved by setting recovery_min_apply_delay in recovery.conf.  This
> will cause WAL records replaying transaction commits to wait for the
> amount of time specified by this parameter, giving you the time to
> recover from any failures with a standby which has a controlled sync
> delta.

recovery_min_apply_delay is an integer number of milliseconds, so the
maximum value (2^31-1 ms) is approx. 25 days - which is not enough for
my requirements.

Also, transaction/MVCC visibility will not cover all cases; most
importantly, it cannot protect against TRUNCATE
(https://wiki.postgresql.org/wiki/MVCC_violations).



>> One option of course would be to use some transfer mechanism external
>> to PostgreSQL... but so far I'm thinking there must be an easier way?
> 
> Another option I can think of here is to use a specific restore_command
> instead of streaming replication.  Simply archive a WAL segment on the
> primary with some meta-data like the time it was archived, and then
> allow the standby to recover the segment only after a delta has passed.

Yes, but that's far less simple than just setting recovery_target_time.


Thanks again and best regards,

    -hannes



Re: recovery_target_time and WAL fetch with streaming replication

From
Michael Paquier
Date: 2018-05-13
On Sun, May 13, 2018 at 09:42:42AM +0200, Hannes Erven wrote:
> But when new WAL is needed, the standby will fetch /all/ WAL present on the
> master.

Fetching as much WAL as possible when recovery happens is wanted by
design, so that recovery can go as far as possible.  And that's
documented.

> I'd say, the standby should either:
> - always connect to the primary and fetch any WAL present

This is what a hot standby does.  It keeps waiting for WAL to become
available from whichever source is used (archive, local pg_xlog or
stream) and switches from one to the other.  You can look at
WaitForWALToBecomeAvailable to get an idea of the details.

> - stop fetching/streaming WAL when it is not needed for the current
> recovery_target

The startup process is in charge of recovery (WAL replay and the
decision of where to get WAL from), and is the one which decides if
using streaming is necessary or not.  If streaming is used, then it
starts a WAL receiver.  If a switch from streaming to another WAL
source (local pg_xlog or archives) is done, then it shuts down the
WAL receiver, which consists of sending SIGTERM to the WAL receiver
and stopping it immediately with a FATAL message.  The key point is
that the WAL receiver is designed to be a light-weight transport
layer for WAL data.  In short, it receives a set of WAL bytes and
writes them.  It does not include any logic to decode WAL records, so
it cannot know whether a stop point has been reached.  It also has no
idea of the configuration within recovery.conf, which is loaded by
the startup process.

> Yes, but that's far less simple than just setting recovery_target_time.

It seems to me that archiving provides the control you are looking for.
--
Michael


Re: recovery_target_time and WAL fetch with streaming replication

From
Alexandre Arruda
Date:
Sorry to revive this post, but I have the same problem.

I set up a streaming replication standby with these steps (a sketch of
the commands follows below):
1) create a physical replication slot on the master
2) pg_basebackup -S slot (...)
3) create a recovery.conf with primary_conninfo, primary_slot_name and
recovery_min_apply_delay = '4d'
4) start the replica; the WAL receiver starts and writes the WAL files
5) after 4 days, I have a delayed replica and ALL the WAL files needed
to recover from 4 days ago until now
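
Roughly like this (host names, the slot name and paths are just
examples):

    # 1) on the master: create the physical slot
    psql -h master -U postgres \
        -c "SELECT pg_create_physical_replication_slot('delayed_replica');"

    # 2) base backup through the slot
    pg_basebackup -h master -U replicator -S delayed_replica \
        -D /var/lib/postgresql/10/replica -X stream

    # 3) recovery.conf on the replica then contains:
    standby_mode = 'on'
    primary_conninfo = 'host=master user=replicator'
    primary_slot_name = 'delayed_replica'
    recovery_min_apply_delay = '4d'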

The problem is: if I restart the master or the replica, the WAL
receiver will not start and the replica will not receive the new WAL
files, because all files already in pg_wal satisfy the
recovery_min_apply_delay criterion.
The master will retain the WAL files (because of the inactive slot and
its last LSN), and the space needed to keep these WALs keeps growing.

But the real problem is if the master dies: I am left with a replica
that only has the WAL files received up to the restart.

To work around this, I need to receive the WAL with pg_receivewal and
use a traditional cp as the restore_command in recovery.conf to apply
it.  But this is extra work to monitor and clean up these WALs.
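
For illustration (the slot name and paths are just examples):

    # receive WAL continuously into a spool directory, using its own
    # slot (created beforehand with pg_create_physical_replication_slot)
    pg_receivewal -h master -U replicator -S receivewal_slot \
        -D /var/lib/postgresql/wal_spool

    # recovery.conf on the replica then replays from the spool:
    restore_command = 'cp /var/lib/postgresql/wal_spool/%f "%p"'
    recovery_min_apply_delay = '4d'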

Is there a way to start the WAL receiver independently of the
recovery_min_apply_delay setting?

Best regards,

Alexandre

On Sun, May 13, 2018 at 08:45, Michael Paquier
<michael@paquier.xyz> wrote:
>
> On Sun, May 13, 2018 at 09:42:42AM +0200, Hannes Erven wrote:
> > But when new WAL is needed, the standby will fetch /all/ WAL present on the
> > master.
>
> Fetching as much WAL as possible when recovery happens is wanted by
> design, so that recovery can go as far as possible.  And that's
> documented.
>
> > I'd say, the standby should either:
> > - always connect to the primary and fetch any WAL present
>
> This is what a hot standby does.  It keeps waiting for WAL to become
> available from whichever source is used (archive, local pg_xlog or
> stream) and switches from one to the other.  You can look at
> WaitForWALToBecomeAvailable to get an idea of the details.
>
> > - stop fetching/streaming WAL when it is not needed for the current
> > recovery_target
>
> The startup process is in charge of recovery (WAL replay and the
> decision of where to get WAL from), and is the one which decides if
> using streaming is necessary or not.  If streaming is used, then it
> starts a WAL receiver.  If a switch from streaming to another WAL
> source (local pg_xlog or archives) is done, then it shuts down the
> WAL receiver, which consists of sending SIGTERM to the WAL receiver
> and stopping it immediately with a FATAL message.  The key point is
> that the WAL receiver is designed to be a light-weight transport
> layer for WAL data.  In short, it receives a set of WAL bytes and
> writes them.  It does not include any logic to decode WAL records, so
> it cannot know whether a stop point has been reached.  It also has no
> idea of the configuration within recovery.conf, which is loaded by
> the startup process.
>
> > Yes, but that's far less simple than just setting recovery_target_time.
>
> It seems to me that archiving provides the control you are looking for.
> --
> Michael