On Wed, Feb 4, 2015 at 09:23:56AM +0100, Alexey Klyukin wrote:
> On Tue, Feb 3, 2015 at 10:41 PM, Alexey Klyukin <alexk@hintbits.com> wrote:
> > The path to reproduce the problem is quite simple:
> >
> > - set up a master and a replica, then promote the replica,
> > creating a new timeline (timeline 2 by default)
> > - pg_upgrade the promoted replica. This would create a new cluster
> > with the same timeline, but without the history file.
> > - create a streaming-only replica. Streaming replication will
> > never start, since the timeline history file is not there. The
> > error messages are:
> >
> > ERROR: could not open file "pg_xlog/00000002.history": No such file or directory
> > FATAL: could not receive timeline history file from the primary server: ERROR: could not open file "pg_xlog/00000002.history": No such file or directory
> >
> > Note that this problem does not occur for the streaming-only replica
> > of the newly created cluster (with a timeline 1), even when there is
> > no timeline history file in the original datadir.
>
>
> In our case I solved the problem by manually creating a 00000002.history
> file on the master. It consists of the previous timeline (obviously 1),
> the location where the new timeline was created (lost), and a text
> reason, which can be arbitrary. I don't think the location matters much,
> since we don't have branched timelines and are not going to switch to
> timeline 1 again. I've put in the value of 'Prior checkpoint location'
> from pg_controldata on the replica, but I think I could just put all
> zeros with the same effect.
>
> Nevertheless, the question is: should pg_upgrade move the timeline
> history file from the old server to the new one and, more specifically,
> are there any cases when moving such a file is not recommended?

Sorry I am just getting to this --- you are right, it is a bug. I was
unaware that pg_upgrade passed the timeline from the old cluster, but it
does so when setting the WAL starting address with pg_resetxlog -l,
whose argument includes the timeline as the first eight hex digits.
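
To illustrate the encoding, here is a minimal Python sketch (not part of
the patch) that splits a WAL segment file name into its three
eight-hex-digit fields and rewrites the timeline. The function names and
the sample file name are made up for illustration; the sketch assumes the
usual 24-hex-digit name layout of timeline, log, and segment.

```python
import string


def split_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, seg)."""
    if len(name) != 24 or not all(c in string.hexdigits for c in name):
        raise ValueError("expected 24 hex digits, got %r" % (name,))
    # The first eight hex digits are the timeline ID.
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def with_timeline(name, tli=1):
    """Return the same WAL position, but on timeline `tli` (default 1)."""
    _, log, seg = split_wal_name(name)
    return "%08X%08X%08X" % (tli, log, seg)


if __name__ == "__main__":
    old = "000000020000000A000000FF"  # hypothetical name on timeline 2
    print(split_wal_name(old))
    print(with_timeline(old))
```

Forcing the first field to 1 while keeping the other two unchanged is the
idea behind option 1 below.
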

I think there are two options:

1) force pg_resetxlog -l to pass timeline 1
2) create a WAL history file to match the old cluster's timeline (!= 1)

I don't think we want to be doing #2, so I have developed the attached
patch for #1, which I should backpatch to 9.4. Do we store the timeline
in any user tables that might be transferred? Anywhere else?
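
For reference, option 2 (and the manual workaround quoted above) amounts
to writing a one-line file. Below is a minimal Python sketch, assuming
the history file holds one tab-separated line per ancestor timeline:
parent timeline ID, switch location, and a free-text reason. The helper
names and the sample location are hypothetical, and this is not what the
attached patch does.

```python
def history_file_name(tli):
    """File name of the history file for timeline `tli`."""
    return "%08X.history" % tli


def history_line(parent_tli, switch_lsn, reason):
    """Build one history entry: parent timeline, switch location, reason.

    `switch_lsn` is given in the usual "hi/lo" hexadecimal form.
    """
    hi, lo = switch_lsn.split("/")
    return "%d\t%X/%X\t%s\n" % (parent_tli, int(hi, 16), int(lo, 16), reason)


if __name__ == "__main__":
    # Hypothetical location; the workaround above used the value of
    # 'Prior checkpoint location' from pg_controldata.
    print(history_file_name(2))
    print(history_line(1, "0/3000028", "created manually after pg_upgrade"), end="")
```
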
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +