Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-bugs

From Alexander Kukushkin
Subject Re: BUG #15346: Replica fails to start after the crash
Msg-id CAFh8B=m0Bht-BfKmyzfxcivzjcqRd7BbNHeWthDveWwZ+DrV2A@mail.gmail.com
In response to Re: BUG #15346: Replica fails to start after the crash  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs

Hi Michael,

> That's annoying.  Because that means that the control file of your
> server maps to a consistent point which is older than some of the
> relation pages.  How was the base backup of this node created?  Please
> remember that when taking a base backup from a standby, you should
> backup the control file last, as there is no control of end backup with
> records available.  So it seems to me that the origin of your problem
> comes from an incorrect base backup expectation?

We are running a cluster of 3 nodes (m4.large + EBS volume for
PGDATA) on AWS. The replicas were initialized about a year ago with
pg_basebackup and have been working absolutely fine. In the past year
I did a few minor upgrades with switchover (first upgrade the
replicas, switch over, then upgrade the former primary). The last
switchover was done on August 19th. This instance had been working as
a replica for about three days until the sudden crash of the EC2
instance. On the new instance we attached the existing EBS volume with
the existing PGDATA and tried to start postgres. You can see the
consequences in the very first email.


> One idea I have would be to copy all the WAL segments up to the point
> where the pages to-be-updated are, and let Postgres replay all the local
> WALs first.  However it is hard to say if that would be enough, as you
> could have more references to pages even newer than the btree one you
> just found.

Well, I did some experiments, and among them was the approach you
suggest, i.e. I commented out restore_command in recovery.conf and
copied quite a few WAL segments into pg_xlog. The results are the
same: it aborts as long as there are connections open :(
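
For readers who want to repeat that experiment, here is a minimal sketch
of the two steps (comment out restore_command, then copy WAL segments
into pg_xlog). It is not the exact procedure used on this cluster: the
helper names and the directory layout are hypothetical, and it assumes
the server is stopped and that the segments sit in a plain directory
under their usual 24-hex-character file names.

```python
import re
import shutil
from pathlib import Path
from typing import List

# WAL segment file names are 24 uppercase hexadecimal characters.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def disable_restore_command(recovery_conf: Path) -> bool:
    """Comment out active restore_command lines; return True if any changed."""
    lines = recovery_conf.read_text().splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        # Lines that are already commented start with '#' and do not match.
        if re.match(r"\s*restore_command\b", line):
            lines[i] = "#" + line
            changed = True
    if changed:
        recovery_conf.write_text("".join(lines))
    return changed


def copy_wal_segments(source_dir: Path, pg_xlog: Path) -> List[str]:
    """Copy WAL segments missing from pg_xlog; return the copied file names."""
    copied = []
    for src in sorted(source_dir.iterdir()):
        # Skip anything that is not a WAL segment (history files, etc.).
        if not (src.is_file() and WAL_SEGMENT_RE.match(src.name)):
            continue
        dst = pg_xlog / src.name
        # Never overwrite a segment that is already in pg_xlog.
        if not dst.exists():
            shutil.copy2(src, dst)
            copied.append(src.name)
    return copied
```

Segments already present in pg_xlog are left untouched, so the sketch
only adds files and can be rerun safely.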


Regards,
--
Alexander Kukushkin

