Re: Critical failure of standby - Mailing list pgsql-general

From	Jeff Janes
Subject	Re: Critical failure of standby
Date	August 15, 2016 16:09:54
Msg-id	CAMkU=1w388oBdz4PATSzCuzaND3C0JLPw2qv1pQjtPN729FnVw@mail.gmail.com
In response to	Critical failure of standby (James Sewell <james.sewell@jirotech.com>)
Responses	Re: Critical failure of standby
List	pgsql-general

On Thu, Aug 11, 2016 at 10:39 PM, James Sewell <james.sewell@jirotech.com> wrote:

Hello,

We recently experienced a critical failure when failing over to a DR environment.

This is in the following environment:

3 x PostgreSQL machines in Prod in a sync replication cluster
3 x PostgreSQL machines in DR, with a single machine async and the other two cascading from the first machine.
There was a network failure which isolated Production from everything else. Production had no errors during this time (and has now come back OK).

DR did not tolerate the break. The following appeared in the logs, and none of the DR machines can start postgres. There were no queries coming into DR at the time of the break.

Please note that the "Host Key verification failed" messages are due to the scp command not functioning. This means restore_command is not working to restore from the XLOG archive, but should not affect anything else.

In my experience, PostgreSQL issues its own error messages when restore_command fails. So I see both the error from the command itself, and an error from PostgreSQL. Why don't you see that? Is the restore_command failing, but then reporting that it succeeded?
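To illustrate the last question: PostgreSQL decides whether restore_command worked from its exit status alone, so a wrapper that swallows scp's failure would hide the problem from the server. Below is a minimal sketch of a wrapper that passes the failure through. The script name, the function names, and the archive location are all hypothetical and are not taken from the original poster's setup.

```python
import subprocess
import sys

# Placeholder archive location; an assumption for illustration only.
ARCHIVE = "archivehost:/path/to/wal_archive"


def fetch_wal(fetch_cmd):
    """Run the fetch command; return 0 only if it really succeeded."""
    try:
        result = subprocess.run(fetch_cmd)
    except OSError as exc:
        # The program could not be started at all (e.g. not installed).
        print(f"restore wrapper: cannot run {fetch_cmd[0]}: {exc}", file=sys.stderr)
        return 1
    if result.returncode != 0:
        # Report the failure instead of masking it, so PostgreSQL sees it too.
        print(f"restore wrapper: {fetch_cmd[0]} exited with {result.returncode}",
              file=sys.stderr)
        return 1
    return 0


def main(argv):
    """Expect the WAL file name (%f) and the destination path (%p)."""
    if len(argv) != 3:
        print("usage: restore_wal.py <wal file name> <destination path>",
              file=sys.stderr)
        return 2
    wal_name, dest_path = argv[1], argv[2]
    return fetch_wal(["scp", f"{ARCHIVE}/{wal_name}", dest_path])

# When saved as a script, finish with: sys.exit(main(sys.argv))
```

Assuming it is saved with that final line as /path/to/restore_wal.py, it could be wired in with something like restore_command = 'python3 /path/to/restore_wal.py %f %p'. A wrapper that instead ends in an unconditional exit 0 would produce exactly the pattern described here: the scp error shows up in the log, but PostgreSQL reports no failure of its own.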

And if you can't get files from the XLOG archive, why do you think that that is OK?

Cheers,

Jeff
