Hi Vladimir,
On Thu, Apr 2, 2015 at 2:07 PM, Vladimir Borodin <root@simply.name> wrote:
> Hi, Alexey.
>
> The new replica did start and was restoring WAL files for a while,
> but eventually we came across the PANIC message:
>
> 2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
> 15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
> pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""
>
> We did check the disk on that system (and now rechecking the memory),
> but so far the hardware itself looks ok, which makes me wonder if the
> procedure above is flawed? What would be the proper way to produce a
> base backup from the standby without using pg_basebackup?
>
>
> If you still want to use your own solution,
> you could look at how barman actually does it. It has the ability to take
> backups from replicas and uses the pgespresso [1] extension for it.
Thank you. pgespresso wraps the start/stop backup functionality
designed for streaming replication into user-callable functions
(with a timeline hack for the replica).

While it's a good solution on its own, I'm wondering whether start/stop
backup on the master, together with archiving WAL segments and copying
the data from the replica, should also produce a valid base backup (and
a valid replica built from it).

Intuitively, it looks like the delay between the master and the replica
might leave them in different 'states' (say, atomic snapshots of the
data/base files) at the point P when the base backup is started: the
master at state B, the replica at an earlier state A. Since P is
determined on the master, the changes that take the replica from state
A to state B might not be included in the sequence of WALs replayed on
the new replica.
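
To make the A/B concern concrete, here is a minimal sketch (not part of
the procedure described above) of a sanity check one could run before
copying files from the replica. It assumes the start location comes from
pg_start_backup() on the master and the replay location from
pg_last_xlog_replay_location() on the replica (the 9.x function names);
the helper names and the LSN values are hypothetical. Passing the check
only shows that the replica has replayed past P. It does not show that
those changes have been flushed to the replica's data files, so it is
not a proof that the backup is consistent.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '2A/1F000028' to a 64-bit integer.

    The text form is two hexadecimal numbers: high 32 bits / low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replica_reached(start_lsn: str, replay_lsn: str) -> bool:
    """Return True if the replica has replayed WAL at least up to start_lsn."""
    return parse_lsn(replay_lsn) >= parse_lsn(start_lsn)


# Hypothetical values, for illustration only.
start = "2A/1F000028"  # point P, as reported by the master
print(replica_reached(start, "2A/1E000000"))  # False: replica still at state A
print(replica_reached(start, "2A/20000000"))  # True: replica at or past P
```

If the check returns False, the replica is still behind P, which is
exactly the situation described above.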
Alexey