Thread: Trigger file behavior with the standby

From
Keiko Oda
Date: Mon, Mar 19, 2018 at 01:27:21 PM -0700
Hello,

I'm seeing the following behavior with a trigger file, which is very confusing to me. I'd like some advice on the expected behavior of the trigger file with a standby.
(I'm cross-posting this from pgsql-general as I didn't get a response there.)

1. Set up replication, with the standby using the following recovery.conf:

  # we use wal-e
  restore_command = 'wal-e wal-fetch  "%f" "%p"'
  standby_mode = 'true'
  trigger_file = '/my/path/to/trigger-file/STANDBY_OFF'
  recovery_target_timeline = 'latest'
  primary_conninfo = 'host=myhost port=5432 user=foo password=verysecurepassword'

2. Create the trigger file while the standby is lagging (replication is not streaming at this point; the standby is doing file-based log shipping).
3. Postgres does not seem to notice the trigger file at all; the standby keeps replaying/recovering WAL.
  * I tried DEBUG5 logging to see whether Postgres does anything with the trigger file, but the log says nothing about it.
  * I also tried restarting Postgres, sending SIGUSR1, etc., but it just keeps replaying WAL.
4. Once the standby has caught up with the leader (replayed all WAL and is about to switch, or has switched, to streaming replication), Postgres finally notices the trigger file and fails over.


According to the documentation:

> To trigger failover of a log-shipping standby server, run pg_ctl promote or create a trigger file with the file name and path specified by the trigger_file setting in recovery.conf.

So I'd expect the standby to fail over as soon as the trigger file is created in step 2. However, the failover doesn't happen until step 4 above, and the time between step 2 and step 4 can sometimes be many hours.

I've reproduced this with Postgres 9.4 and 9.5. I haven't tried to reproduce it with 10 yet (not because I couldn't; the preparation takes time and I've postponed it), but I'm happy to try if that helps.
Please let me know if there is any other information I could provide. 

Thanks!
Keiko

Re: Trigger file behavior with the standby

From
Michael Paquier
Date: Mon, Mar 19, 2018 at 9:28 PM
On Mon, Mar 19, 2018 at 01:27:21PM -0700, Keiko Oda wrote:
> I'm seeing the following behavior with a trigger file, which is very
> confusing to me. I'd like some advice on the expected behavior of the
> trigger file with a standby.

This portion from the docs includes your answer:
https://www.postgresql.org/docs/devel/static/warm-standby.html#STANDBY-SERVER-OPERATION
"Standby mode is exited and the server switches to normal operation when
pg_ctl promote is run or a trigger file is found (trigger_file). Before
failover, any WAL immediately available in the archive or in pg_wal will
be restored, but no attempt is made to connect to the master."

So when creating a trigger file or signaling for promotion, any WAL
files available are first fetched, and then promotion happens.  In your
case all the WAL segments from the archives are retrieved first.
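A toy model of the ordering described above may make the multi-hour delay easier to see. It is a sketch under simplifying assumptions, not PostgreSQL's startup-process code: `run_standby` and the `restore` callback are hypothetical stand-ins for the recovery loop and for restore_command (which succeeds or fails per segment), and "wait for more WAL or streaming" is reduced to returning False.

```python
from typing import Callable, List, Optional, Tuple


def run_standby(restore: Callable[[int], Optional[str]],
                trigger_file_exists: Callable[[], bool]) -> Tuple[List[str], bool]:
    """Toy model of the documented promotion rule (hypothetical, simplified).

    `restore(n)` stands in for restore_command: it returns the name of the
    n-th WAL segment, or None when nothing more is immediately available.
    Returns the replayed segments and whether the standby was promoted.
    """
    replayed: List[str] = []
    segno = 0
    while True:
        segment = restore(segno)
        if segment is not None:
            # WAL was available, so it is replayed; the trigger file does
            # not interrupt this part of the loop.
            replayed.append(segment)
            segno += 1
            continue
        # Archive (and pg_wal) exhausted: only now does the trigger file
        # decide between promotion and waiting for more WAL (modelled here
        # as returning False instead of retrying or streaming).
        return replayed, trigger_file_exists()
```

With three segments still in the archive and the trigger file present from the very start, the model replays all three before reporting a promotion, which is the behavior reported in steps 2 to 4.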
--
Michael

Re: Trigger file behavior with the standby

From
Keiko Oda
Date: Wed, Mar 28, 2018 at 12:23:31 PM -0700
Thanks a lot for the answer, Michael (and sorry for the slow response)!

So, if I understand you correctly, I'm seeing this behavior because wal-e keeps fetching WAL files from S3 regardless of the trigger_file, and these fetched WAL files are in pg_wal (or pg_xlog), so Postgres restores whatever is available in pg_wal before the failover. Or, even if there is no file in pg_wal, it still tries to fetch from the "archive" (S3).
In other words, if I want an "immediate failover" (and don't care about WAL files available in the archive or in pg_wal), I should tweak restore_command so that no further fetching/restoring happens.
Is that... accurate?

Thanks,
Keiko

On Mon, Mar 19, 2018 at 9:28 PM, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Mar 19, 2018 at 01:27:21PM -0700, Keiko Oda wrote:
> I'm seeing the following behavior with a trigger file, which is very
> confusing to me. I'd like some advice on the expected behavior of the
> trigger file with a standby.

This portion from the docs includes your answer:
https://www.postgresql.org/docs/devel/static/warm-standby.html#STANDBY-SERVER-OPERATION
"Standby mode is exited and the server switches to normal operation when
pg_ctl promote is run or a trigger file is found (trigger_file). Before
failover, any WAL immediately available in the archive or in pg_wal will
be restored, but no attempt is made to connect to the master."

So when creating a trigger file or signaling for promotion, any WAL
files available are first fetched, and then promotion happens.  In your
case all the WAL segments from the archives are retrieved first.
--
Michael

Re: Trigger file behavior with the standby

From
Michael Paquier
Date:
On Wed, Mar 28, 2018 at 12:23:31PM -0700, Keiko Oda wrote:
> Thanks a lot for the answer, Michael (and sorry for the slow response)!

No problem.

> So, if I understand you correctly, I'm seeing this behavior
> because wal-e keeps fetching WAL files from S3 regardless of the
> trigger_file, and these fetched WAL files are in pg_wal (or pg_xlog),
> so Postgres restores whatever is available in pg_wal
> before the failover. Or, even if there is no file in pg_wal, it still tries
> to fetch from the "archive" (S3).
> In other words, if I want an "immediate failover" (and don't care
> about WAL files available in the archive or in pg_wal), I should tweak
> restore_command so that no further fetching/restoring happens.
> Is that... accurate?

Per the code and the documentation, the current behavior is clearly
intentional.  If you think about it, it can be quite important,
especially in the case of a base backup taken without WAL segments in
pg_wal while relying on a separate archive: this gives more guarantees
that the consistency point will be reached.  It also partly covers what
some people look for with recovery_target = 'immediate'.
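To illustrate why draining the archive matters for a base backup, here is a toy consistency check. It assumes, for simplicity, that the consistency point can be identified by the WAL segment file that contains it (the real server tracks LSNs, such as the end location of the base backup); `reached_consistency` and `promotion_outcome` are hypothetical names, not PostgreSQL functions.

```python
from typing import List


def reached_consistency(last_replayed: str, consistency_segment: str) -> bool:
    """Hypothetical helper: has replay covered the segment holding the
    consistency point (for a base backup, roughly where the backup ended)?

    WAL segment file names are fixed-width uppercase hex (timeline, log,
    segment), so on a single timeline string order matches replay order.
    """
    return last_replayed >= consistency_segment


def promotion_outcome(archive: List[str], fetched: int,
                      consistency_segment: str) -> str:
    """Toy model: restore_command delivers only the first `fetched`
    segments of `archive`, then promotion is requested."""
    replayed = archive[:fetched]
    if replayed and reached_consistency(replayed[-1], consistency_segment):
        return "promote"
    # The real server refuses to come up here (the "complaint from the
    # startup process" mentioned in this thread); modelled as a string.
    return "refuse: consistent recovery point not reached"
```

Cutting off restore after the first segment, when the backup ended in the second, yields a refusal; letting restore reach the second segment allows promotion.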

You could indeed tweak the restore command.  If a failure happens while
attempting to fetch a WAL segment, then the recovery would immediately
stop.  If you try to trigger a promotion without reaching a consistency
point, then you would get a complaint from the startup process.  There
are some safeguards for this purpose.
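One way to do such a tweak is a restore_command wrapper that reports "segment not available" (a non-zero exit status) once the trigger file exists, and otherwise runs the same wal-e command as before. This is a minimal sketch under assumptions: the script name and install path are hypothetical, TRIGGER_FILE must match trigger_file in recovery.conf, and any environment wal-e needs (credentials, prefix) is assumed to be set up the same way as for the current restore_command. As noted above, stopping restore before the standby is consistent leads to a refused promotion, not a faster one.

```python
#!/usr/bin/env python3
"""Hypothetical restore_command wrapper: stop fetching WAL once the trigger
file exists, so the standby can promote without draining the archive.

recovery.conf (script path is an assumption):
  restore_command = '/usr/local/bin/restore_unless_triggered.py "%f" "%p"'
"""
import os
import subprocess
import sys
from typing import List

# Must match trigger_file in recovery.conf.
TRIGGER_FILE = "/my/path/to/trigger-file/STANDBY_OFF"


def fetch_command(wal_file: str, dest_path: str) -> List[str]:
    # Same command as the original restore_command, as an argv list.
    return ["wal-e", "wal-fetch", wal_file, dest_path]


def restore(wal_file: str, dest_path: str,
            trigger_file: str = TRIGGER_FILE) -> int:
    if os.path.exists(trigger_file):
        # Exit status 1 tells the server this segment is not available.
        return 1
    # Otherwise pass wal-e's own exit status through to the server.
    return subprocess.call(fetch_command(wal_file, dest_path))


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        sys.stderr.write("usage: restore_unless_triggered.py WALFILE DESTPATH\n")
        return 2
    return restore(argv[0], argv[1])


# The server always passes %f and %p; guarding on arguments also keeps the
# module importable for testing without side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Keeping the check in a wrapper, rather than editing restore_command at failover time, means the configuration does not change: the trigger file alone decides when fetching stops.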

Please don't get me wrong.  There is room for a feature that does what
you are looking for more efficiently, but that would be a separate
problem.

(Sakura season here, by the way; they are blooming this week.)
--
Michael
