Re: base backup from the standby without pg_basebackup - Mailing list pgsql-admin

From Vladimir Borodin
Subject Re: base backup from the standby without pg_basebackup
Date
Msg-id A321E593-E53B-4807-9BA8-A87C217715A9@simply.name
In response to Re: base backup from the standby without pg_basebackup  (Alexey Klyukin <alexk@hintbits.com>)
Responses Re: base backup from the standby without pg_basebackup
List pgsql-admin

On 2 Apr 2015, at 15:50, Alexey Klyukin <alexk@hintbits.com> wrote:

Hi Vladimir,

On Thu, Apr 2, 2015 at 2:07 PM, Vladimir Borodin <root@simply.name> wrote:
Hi, Alexey.

The new replica did start and was restoring WAL files for a while,
but eventually we came across the PANIC message:

2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""

We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?


If you still want to use your own solution,
you could look at how barman actually does it. It can take
backups from replicas and uses the pgespresso [1] extension for it.


Thank you. pgespresso wraps the start/stop backup functionality
designed for streaming replication into user-callable
functions (with a timeline hack for the replica).
While it's a good solution on its own, I'm wondering whether start/stop
backup on the master, together with archiving WAL segments and copying
data from the replica, should produce a valid base backup (and a
valid replica built from it) as well.

Well, I have never tried this myself, but I think the replica starts applying WAL from too late a location because you do not copy the backup_label file from the master after issuing pg_start_backup. Does your tool copy it from the master?

According to the docs [0]:

It's also worth noting that the pg_start_backup function makes a file named backup_label in the database cluster directory, which is removed by pg_stop_backup. This file will of course be archived as a part of your backup dump file. The backup label file includes the label string you gave to pg_start_backup, as well as the time at which pg_start_backup was run, and the name of the starting WAL file. In case of confusion it is therefore possible to look inside a backup dump file and determine exactly which backup session the dump file came from. However, this file is not merely for your information; its presence and contents are critical to the proper operation of the system's recovery process.
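
To make the quoted passage concrete, here is a minimal Python sketch that reads the fields of a backup_label file. The sample contents are illustrative and not taken from this thread; they assume the layout written by 9.x-era servers, and parse_backup_label is a hypothetical helper name.

```python
import re

# Illustrative backup_label contents (an assumption, not from this thread).
SAMPLE_LABEL = """\
START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2015-01-01 00:00:00 UTC
LABEL: nightly
"""

def parse_backup_label(text):
    """Return a dict of backup_label fields keyed by their names."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    # Split "0/9000028 (file 0000...)" into the location and the WAL file name.
    m = re.match(r"(\S+) \(file (\w+)\)", fields.get("START WAL LOCATION", ""))
    if m:
        fields["start_location"], fields["start_wal_file"] = m.groups()
    return fields

info = parse_backup_label(SAMPLE_LABEL)
print(info["LABEL"], info["START TIME"], info["start_wal_file"])
```

If the file copied into the new replica's data directory is missing, or its start location does not match what pg_start_backup reported on the master, that would be consistent with recovery starting from the wrong WAL location.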



Intuitively, it looks like a delay between the master and the replica
might result in them having different 'states' (say, atomic snapshots
of data/base files) of the database at the point P when the base
backup is started (say, master at state B, replica at earlier state
A), and since P is determined from the master, the changes to
transform the replica from state A to state B might not be included in
the sequence of WALs to replay on the new replica.
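
One way to check whether the situation described above applies is to compare two WAL locations numerically: the backup start location from the master and the replica's replay position at the moment the copy begins. The sketch below is only an illustration; the function names and sample values are hypothetical, and it assumes the locations were obtained elsewhere (on 9.x, for example, from the return value of pg_start_backup and from pg_last_xlog_replay_location()).

```python
def lsn_to_int(lsn):
    """Convert an 'X/Y' WAL location (two hex numbers) to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def replica_behind_backup_start(replica_replay_lsn, backup_start_lsn):
    """True if the replica has replayed less WAL than the backup start point."""
    return lsn_to_int(replica_replay_lsn) < lsn_to_int(backup_start_lsn)

# Hypothetical values: master started the backup at B, replica is still at A.
print(replica_behind_backup_start("2A/1F000000", "2A/3C000028"))
```

The locations are converted to integers because comparing them as strings gives wrong answers once the high part grows by a digit ("9/FFFFFFFF" sorts after "10/0" as text).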

Alexey


--
May the force be with you…
