At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in
> On Tue, 2020-05-12 at 09:40 +0000, Mariya Rampurawala wrote:
> > > but if the target cluster ran for a long time after the divergence,
> > > the old WAL files might no longer be present. In that case, they can
> > > be manually copied from the WAL archive to the pg_wal directory, or
> > > fetched on startup by configuring primary_conninfo or restore_command.
> >
> > I hit this issue every time I follow the aforementioned steps, manually as well as with scripts.
> > How long is "long time after divergence"? Is there a way I can make some
> > configuration changes so that I don’t hit this issue?
> > Is there anything I must change in my restore command?
As mentioned in the documentation, pg_rewind uses the WAL records
startng from the last checkpoint just before the divergence point. The
divergence point is shown as the follows in your log messages.
> pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53
pg_rewind scans backward starting from that location to find a
checkpoint record, which is the oldest WAL record pg_rewind needs. As
you see it is not a matter of calculation. There's no other way than
actually restoring WAL segment files to read and try finding.
> What you can do is to use a higher value for "wal_keep_segments".
> Then PostgreSQL will keep around that number of old WAL segments,
> which increases the chance for "pg_rewind" to succeed.
So this is one effective way to reduce the chance to lose required WAL
(segment) files.
On PG12, an easy way to automatically restore all required WAL files
would be restoring the WAL file every time pg_rewind complains that it
is missing.
Or, you could use pg_waldump to find a checkpoint record.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center