Thread: base backup from the standby without pg_basebackup

base backup from the standby without pg_basebackup

From

Alexey Klyukin

Date:

02 April 2015, 10:13:36

Hello,

We had an issue with backing up one of our master servers, since a
base backup increased the load on the master, resulting in slowdowns
in the application working with that database. While there are ways to
overcome this problem by using nice on the backend process doing the
backup or throttling the network connection, we've decided to use a
less complicated route of producing a base backup from an almost idle
standby server.

We don't use pg_basebackup, but our own tool, which does the rsync +
compression and also symlinks some directories from pg_data (such as
pg_log or pg_xlog) to the locations on different partitions. So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup. Our steps were the following:

- make sure WAL files from the master are archived and reachable on
the new replica host (via NFS).
- run pg_start_backup('label') on the master
- initialise a new cluster, rsync  files from the replica, create a
recovery.conf and so on (made by our tool)
- run pg_stop_backup() on the master.

The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:

2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""

We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?

Both master and replicas are running PostgreSQL 9.3.5.

Kind regards,
--
Alexey Klyukin

Re: base backup from the standby without pg_basebackup

From

Vladimir Borodin

Date:

02 April 2015, 12:07:48

Hi, Alexey.

2 апр. 2015 г., в 13:13, Alexey Klyukin <alexk@hintbits.com> написал(а):

Hello,

We had an issue with backing up one of our master servers, since a
base backup increased the load on the master, resulting in slowdowns
in the application working with that database. While there are ways to
overcome this problem by using nice on the backend process doing the
backup or throttling the network connection, we've decided to use a
less complicated route of producing a base backup from an almost idle
standby server.

We don't use pg_basebackup, but our own tool, which does the rsync +
compression and also symlinks some directories from pg_data (such as
pg_log or pg_xlog) to the locations on different partitions. So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup. Our steps were the following:

- make sure WAL files from the master are archived and reachable on
the new replica host (via NFS).
- run pg_start_backup('label') on the master
- initialise a new cluster, rsync files from the replica, create a
recovery.conf and so on (made by our tool)
- run pg_stop_backup() on the master.

The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:

2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""

We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?

I would recommend using barman. If you still want to use your own solution, you could look at how barman actually does it. It has an ability to take backups from replics and uses pgespresso [1] extension for it.

[0] http://docs.pgbarman.org/#main_features

[1] https://github.com/2ndquadrant-it/pgespresso

Both master and replicas are running PostgreSQL 9.3.5.

Kind regards,
--
Alexey Klyukin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

May the force be with you…

https://simply.name

Re: base backup from the standby without pg_basebackup

From

Alexey Klyukin

Date:

02 April 2015, 12:51:17

Hi Vladimir,

On Thu, Apr 2, 2015 at 2:07 PM, Vladimir Borodin <root@simply.name> wrote:
> Hi, Alexey.
>
> The new replica did start and were restoring WAL files for a while,
> but eventually we came across the PANIC message:
>
> 2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
> 15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
> pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""
>
> We did check the disk on that system (and now rechecking the memory),
> but so far the hardware itself looks ok, which makes me wonder if the
> procedure above is flawed? What would be the proper way to produce a
> base backup from the standby without using pg_basebackup?
>
>
 If you still want to use your own solution,
> you could look at how barman actually does it. It has an ability to take
> backups from replics and uses pgespresso [1] extension for it.

Thank you, pgespresso wraps the start/stop backup functionality
designed for the streaming replication into the user-callable
functions (with a timeline hack for the replica).
While it's a good solution on its own, I'm wondering if the start/stop
backup on master, together with archiving WAL segments and copying
data from the replica should produce a valid base-backup (and the
replica produced from it) as well.

Intuitively, it looks like a delay between the master and the replica
might result in them having  different 'states' (say, atomic snapshots
of data/base files) of the database at the point P when the base
backup is started (say, master at state B, replica at earlier state
A), and since P is determined from the master, the changes to
transform the replica from state A to state B might not be included in
the sequence of WALs to replay on the new replica.

Alexey

Re: base backup from the standby without pg_basebackup

From

Vladimir Borodin

Date:

02 April 2015, 13:28:02

2 апр. 2015 г., в 15:50, Alexey Klyukin <alexk@hintbits.com> написал(а):

Hi Vladimir,

On Thu, Apr 2, 2015 at 2:07 PM, Vladimir Borodin <root@simply.name> wrote:
Hi, Alexey.

The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:

2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""

We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?

If you still want to use your own solution,
you could look at how barman actually does it. It has an ability to take
backups from replics and uses pgespresso [1] extension for it.

Thank you, pgespresso wraps the start/stop backup functionality
designed for the streaming replication into the user-callable
functions (with a timeline hack for the replica).
While it's a good solution on its own, I'm wondering if the start/stop
backup on master, together with archiving WAL segments and copying
data from the replica should produce a valid base-backup (and the
replica produced from it) as well.

Well, I haven’t ever tried to do so, but I think the reason that replica starts applying WALs from too late location is that you do not copy backup label file from master after issuing pg_start_backup. Does your tool copy it from master?

According to doc [0]:

It's also worth noting that the pg_start_backup function makes a file named backup_label in the database cluster directory, which is removed by pg_stop_backup. This file will of course be archived as a part of your backup dump file. The backup label file includes the label string you gave to pg_start_backup, as well as the time at which pg_start_backup was run, and the name of the starting WAL file. In case of confusion it is therefore possible to look inside a backup dump file and determine exactly which backup session the dump file came from. However, this file is not merely for your information; its presence and contents are critical to the proper operation of the system's recovery process.

[0] http://www.postgresql.org/docs/current/static/continuous-archiving.html

Intuitively, it looks like a delay between the master and the replica
might result in them having different 'states' (say, atomic snapshots
of data/base files) of the database at the point P when the base
backup is started (say, master at state B, replica at earlier state
A), and since P is determined from the master, the changes to
transform the replica from state A to state B might not be included in
the sequence of WALs to replay on the new replica.

Alexey

May the force be with you…

https://simply.name

Re: base backup from the standby without pg_basebackup

From

Alexey Klyukin

Date:

02 April 2015, 14:19:50

On Thu, Apr 2, 2015 at 3:27 PM, Vladimir Borodin <root@simply.name> wrote:
>
> 2 апр. 2015 г., в 15:50, Alexey Klyukin <alexk@hintbits.com> написал(а):
>
> Hi Vladimir,
>

> Well, I haven’t ever tried to do so, but I think the reason that replica
> starts applying WALs from too late location is that you do not copy backup
> label file from master after issuing pg_start_backup. Does your tool copy it
> from master?

Yes, it doesn't exclude it explicitly ,hence, it's archived together
with other files.

Of course, things are getting funny when the archive is taken from the
replica and the label is created on the master, but in fact the
recovery will try to auto detect the initial position even without a
label, so at the end it's likely that the lack of label file is not
the culprit.

--
Regards,
Alexey Klyukin

Re: base backup from the standby without pg_basebackup

From

Matheus de Oliveira

Date:

02 April 2015, 14:25:23

On Thu, Apr 2, 2015 at 7:13 AM, Alexey Klyukin <alexk@hintbits.com> wrote:

So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup.

Check the following link, it provides the steps required to do exactly this:

https://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

Best regards,

Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

Re: base backup from the standby without pg_basebackup

From

Denish Patel

Date:

02 April 2015, 14:33:16

Alexey,

Instead of managing it manually, you should be able to take backup from slave using OmniPITR: https://github.com/omniti-labs/omnipitr/blob/master/doc/omnipitr-backup-slave.pod

On Thu, Apr 2, 2015 at 6:13 AM, Alexey Klyukin <alexk@hintbits.com> wrote:

Hello,

We had an issue with backing up one of our master servers, since a
base backup increased the load on the master, resulting in slowdowns
in the application working with that database. While there are ways to
overcome this problem by using nice on the backend process doing the
backup or throttling the network connection, we've decided to use a
less complicated route of producing a base backup from an almost idle
standby server.

We don't use pg_basebackup, but our own tool, which does the rsync +
compression and also symlinks some directories from pg_data (such as
pg_log or pg_xlog) to the locations on different partitions. So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup. Our steps were the following:

- make sure WAL files from the master are archived and reachable on
the new replica host (via NFS).
- run pg_start_backup('label') on the master
- initialise a new cluster, rsync files from the replica, create a
recovery.conf and so on (made by our tool)
- run pg_stop_backup() on the master.

The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:

2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""

We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?

Both master and replicas are running PostgreSQL 9.3.5.

Kind regards,
--
Alexey Klyukin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Denish Patel,
OmniTI Computer Consulting Inc.
Database Architect,
http://omniti.com/does/data-management

http://www.pateldenish.com

Re: base backup from the standby without pg_basebackup

From

Alexey Klyukin

Date:

02 April 2015, 14:42:20

On Thu, Apr 2, 2015 at 4:24 PM, Matheus de Oliveira
<matioli.matheus@gmail.com> wrote:
>
> On Thu, Apr 2, 2015 at 7:13 AM, Alexey Klyukin <alexk@hintbits.com> wrote:
>>
>> So, the
>> question is, how would one make a base backup from the [async] standby
>> without using pg_basebackup.
>
>
> Check the following link, it provides the steps required to do exactly this:
>
> https://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

Thanks, a useful link, and it actually proves my concerns:

3. Run pg_controldata on the standby and wait for "Latest checkpoint's
REDO location" to equal or exceed the WAL information reported by
pg_start_backup(). You might need to call pg_switch_xlog() on the
master to force the required WAL file to be sent to the standby.

So this step is designed to wait until the replica is up-to-date with
the master's state at the time of pg_start_backup.

--
Regards,
Alexey Klyukin

Re: base backup from the standby without pg_basebackup

From

Alexey Klyukin

Date:

02 April 2015, 15:01:37

Hi,

On Thu, Apr 2, 2015 at 4:32 PM, Denish Patel <denish@omniti.com> wrote:
> Alexey,
>
> Instead of managing it manually, you should be able to take backup from
> slave using OmniPITR:
> https://github.com/omniti-labs/omnipitr/blob/master/doc/omnipitr-backup-slave.pod

Thanks for the suggestion. We don't manage replica creation manually,
we have a tool that is just not designed to get base backups from the
standby, because we never needed it until recently.

Instead of adopting another tool I'd rather understand the process and
implement the necessary changes in the tool of our own.

So far, there are 2 solutions (assuming archiving from the master is
turned on and archives are available to the new replica):

- run pg_start_backup from the master, wait until the replica reaches
the same position as a master during the backup, copy the data files
from the replica, copy the backup label from the master, run
pg_stop_backup and start the new replica.
- rely on pgespresso extension and run
pgespresso_start_backup/pgespresso_stop_backup on the replica (and
grab a backup label in between), and do the rest of the process the
same way like it's done when taking base backups from the master.

And, just for completeness, a set of wonderful tools to help:

- pg_basebackup (once has to turn on full_page_writes on the master beforehand)
- barman
- omnipitr

Thank you,
Alexey.