Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs

From	Alexander Kukushkin
Subject	Re: BUG #15346: Replica fails to start after the crash
Date	August 28, 2018 12:21:57
Msg-id	CAFh8B=m0Bht-BfKmyzfxcivzjcqRd7BbNHeWthDveWwZ+DrV2A@mail.gmail.com
In response to	Re: BUG #15346: Replica fails to start after the crash (Michael Paquier <michael@paquier.xyz>)
List	pgsql-bugs

Hi Michael,

> That's annoying.  Because that means that the control file of your
> server maps to a consistent point which is older than some of the
> relation pages.  How was the base backup of this node created?  Please
> remember that when taking a base backup from a standby, you should
> backup the control file last, as there is no control of end backup with
> records available.  So it seems to me that the origin of your problem
> comes from an incorrect base backup expectation?

We are running a cluster of 3 nodes (m4.large + EBS volume for
PGDATA) on AWS. The replicas were initialized about a year ago with
pg_basebackup and have been working absolutely fine. In the past year
I did a few minor upgrades with switchover (upgrade the replicas
first, switch over, then upgrade the former primary). The last
switchover was done on August 19th. This instance had been working as
a replica for about three days when the EC2 instance suddenly crashed.
On the new instance we attached the existing EBS volume with the
existing PGDATA and tried to start postgres. You can see the
consequences in the very first email.


> One idea I have would be to copy all the WAL segments up to the point
> where the pages to-be-updated are, and let Postgres replay all the local
> WALs first.  However it is hard to say if that would be enough, as you
> could have more references to pages even newer than the btree one you
> just found.

Well, I did some experiments, including the approach you suggest:
I commented out restore_command in recovery.conf and copied quite a
few WAL segments to pg_xlog. The results are the same. It aborts as
long as there are connections open :(


Regards,
--
Alexander Kukushkin
