Thread: base backup from the standby without pg_basebackup
Hello, We had an issue with backing up one of our master servers, since a base backup increased the load on the master, resulting in slowdowns in the application working with that database. While there are ways to overcome this problem by using nice on the backend process doing the backup or throttling the network connection, we've decided to use a less complicated route of producing a base backup from an almost idle standby server. We don't use pg_basebackup, but our own tool, which does the rsync + compression and also symlinks some directories from pg_data (such as pg_log or pg_xlog) to the locations on different partitions. So, the question is, how would one make a base backup from the [async] standby without using pg_basebackup. Our steps were the following: - make sure WAL files from the master are archived and reachable on the new replica host (via NFS). - run pg_start_backup('label') on the master - initialise a new cluster, rsync files from the replica, create a recovery.conf and so on (made by our tool) - run pg_stop_backup() on the master. The new replica did start and were restoring WAL files for a while, but eventually we came across the PANIC message: 2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17 15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,"" We did check the disk on that system (and now rechecking the memory), but so far the hardware itself looks ok, which makes me wonder if the procedure above is flawed? What would be the proper way to produce a base backup from the standby without using pg_basebackup? Both master and replicas are running PostgreSQL 9.3.5. Kind regards, -- Alexey Klyukin
Hi, Alexey.
2 апр. 2015 г., в 13:13, Alexey Klyukin <alexk@hintbits.com> написал(а):Hello,
We had an issue with backing up one of our master servers, since a
base backup increased the load on the master, resulting in slowdowns
in the application working with that database. While there are ways to
overcome this problem by using nice on the backend process doing the
backup or throttling the network connection, we've decided to use a
less complicated route of producing a base backup from an almost idle
standby server.
We don't use pg_basebackup, but our own tool, which does the rsync +
compression and also symlinks some directories from pg_data (such as
pg_log or pg_xlog) to the locations on different partitions. So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup. Our steps were the following:
- make sure WAL files from the master are archived and reachable on
the new replica host (via NFS).
- run pg_start_backup('label') on the master
- initialise a new cluster, rsync files from the replica, create a
recovery.conf and so on (made by our tool)
- run pg_stop_backup() on the master.
The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:
2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""
We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?
I would recommend using barman. If you still want to use your own solution, you could look at how barman actually does it. It has an ability to take backups from replics and uses pgespresso [1] extension for it.
Both master and replicas are running PostgreSQL 9.3.5.
Kind regards,
--
Alexey Klyukin
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Hi Vladimir, On Thu, Apr 2, 2015 at 2:07 PM, Vladimir Borodin <root@simply.name> wrote: > Hi, Alexey. > > The new replica did start and were restoring WAL files for a while, > but eventually we came across the PANIC message: > > 2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17 > 15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid > pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,"" > > We did check the disk on that system (and now rechecking the memory), > but so far the hardware itself looks ok, which makes me wonder if the > procedure above is flawed? What would be the proper way to produce a > base backup from the standby without using pg_basebackup? > > If you still want to use your own solution, > you could look at how barman actually does it. It has an ability to take > backups from replics and uses pgespresso [1] extension for it. Thank you, pgespresso wraps the start/stop backup functionality designed for the streaming replication into the user-callable functions (with a timeline hack for the replica). While it's a good solution on its own, I'm wondering if the start/stop backup on master, together with archiving WAL segments and copying data from the replica should produce a valid base-backup (and the replica produced from it) as well. Intuitively, it looks like a delay between the master and the replica might result in them having different 'states' (say, atomic snapshots of data/base files) of the database at the point P when the base backup is started (say, master at state B, replica at earlier state A), and since P is determined from the master, the changes to transform the replica from state A to state B might not be included in the sequence of WALs to replay on the new replica. Alexey
2 апр. 2015 г., в 15:50, Alexey Klyukin <alexk@hintbits.com> написал(а):Hi Vladimir,
On Thu, Apr 2, 2015 at 2:07 PM, Vladimir Borodin <root@simply.name> wrote:Hi, Alexey.If you still want to use your own solution,
The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:
2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""
We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?you could look at how barman actually does it. It has an ability to take
backups from replics and uses pgespresso [1] extension for it.
Thank you, pgespresso wraps the start/stop backup functionality
designed for the streaming replication into the user-callable
functions (with a timeline hack for the replica).
While it's a good solution on its own, I'm wondering if the start/stop
backup on master, together with archiving WAL segments and copying
data from the replica should produce a valid base-backup (and the
replica produced from it) as well.
Well, I haven’t ever tried to do so, but I think the reason that replica starts applying WALs from too late location is that you do not copy backup label file from master after issuing pg_start_backup. Does your tool copy it from master?
According to doc [0]:
It's also worth noting that the
pg_start_backup
function makes a file named backup_label in the database cluster directory, which is removed by pg_stop_backup
. This file will of course be archived as a part of your backup dump file. The backup label file includes the label string you gave to pg_start_backup
, as well as the time at which pg_start_backup
was run, and the name of the starting WAL file. In case of confusion it is therefore possible to look inside a backup dump file and determine exactly which backup session the dump file came from. However, this file is not merely for your information; its presence and contents are critical to the proper operation of the system's recovery process.
Intuitively, it looks like a delay between the master and the replica
might result in them having different 'states' (say, atomic snapshots
of data/base files) of the database at the point P when the base
backup is started (say, master at state B, replica at earlier state
A), and since P is determined from the master, the changes to
transform the replica from state A to state B might not be included in
the sequence of WALs to replay on the new replica.
Alexey
On Thu, Apr 2, 2015 at 3:27 PM, Vladimir Borodin <root@simply.name> wrote: > > 2 апр. 2015 г., в 15:50, Alexey Klyukin <alexk@hintbits.com> написал(а): > > Hi Vladimir, > > Well, I haven’t ever tried to do so, but I think the reason that replica > starts applying WALs from too late location is that you do not copy backup > label file from master after issuing pg_start_backup. Does your tool copy it > from master? Yes, it doesn't exclude it explicitly ,hence, it's archived together with other files. Of course, things are getting funny when the archive is taken from the replica and the label is created on the master, but in fact the recovery will try to auto detect the initial position even without a label, so at the end it's likely that the lack of label file is not the culprit. -- Regards, Alexey Klyukin
On Thu, Apr 2, 2015 at 7:13 AM, Alexey Klyukin <alexk@hintbits.com> wrote:
So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup.
Check the following link, it provides the steps required to do exactly this:
https://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
https://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
Alexey,
--
Instead of managing it manually, you should be able to take backup from slave using OmniPITR: https://github.com/omniti-labs/omnipitr/blob/master/doc/omnipitr-backup-slave.pod
On Thu, Apr 2, 2015 at 6:13 AM, Alexey Klyukin <alexk@hintbits.com> wrote:
Hello,
We had an issue with backing up one of our master servers, since a
base backup increased the load on the master, resulting in slowdowns
in the application working with that database. While there are ways to
overcome this problem by using nice on the backend process doing the
backup or throttling the network connection, we've decided to use a
less complicated route of producing a base backup from an almost idle
standby server.
We don't use pg_basebackup, but our own tool, which does the rsync +
compression and also symlinks some directories from pg_data (such as
pg_log or pg_xlog) to the locations on different partitions. So, the
question is, how would one make a base backup from the [async] standby
without using pg_basebackup. Our steps were the following:
- make sure WAL files from the master are archived and reachable on
the new replica host (via NFS).
- run pg_start_backup('label') on the master
- initialise a new cluster, rsync files from the replica, create a
recovery.conf and so on (made by our tool)
- run pg_stop_backup() on the master.
The new replica did start and were restoring WAL files for a while,
but eventually we came across the PANIC message:
2015-03-18 19:10:52.943 CET,,,17293,,55083494.438d,922,,2015-03-17
15:05:08 CET,1/0,0,PANIC,XX000,"WAL contains references to invalid
pages",,,,,"xlog redo visible: rel 1663/16414/24453; blk 26569",,,,""
We did check the disk on that system (and now rechecking the memory),
but so far the hardware itself looks ok, which makes me wonder if the
procedure above is flawed? What would be the proper way to produce a
base backup from the standby without using pg_basebackup?
Both master and replicas are running PostgreSQL 9.3.5.
Kind regards,
--
Alexey Klyukin
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Denish Patel,
OmniTI Computer Consulting Inc.
Database Architect,
http://omniti.com/does/data-management
OmniTI Computer Consulting Inc.
Database Architect,
http://omniti.com/does/data-management
On Thu, Apr 2, 2015 at 4:24 PM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote: > > On Thu, Apr 2, 2015 at 7:13 AM, Alexey Klyukin <alexk@hintbits.com> wrote: >> >> So, the >> question is, how would one make a base backup from the [async] standby >> without using pg_basebackup. > > > Check the following link, it provides the steps required to do exactly this: > > https://wiki.postgresql.org/wiki/Incrementally_Updated_Backups Thanks, a useful link, and it actually proves my concerns: 3. Run pg_controldata on the standby and wait for "Latest checkpoint's REDO location" to equal or exceed the WAL information reported by pg_start_backup(). You might need to call pg_switch_xlog() on the master to force the required WAL file to be sent to the standby. So this step is designed to wait until the replica is up-to-date with the master's state at the time of pg_start_backup. -- Regards, Alexey Klyukin
Hi, On Thu, Apr 2, 2015 at 4:32 PM, Denish Patel <denish@omniti.com> wrote: > Alexey, > > Instead of managing it manually, you should be able to take backup from > slave using OmniPITR: > https://github.com/omniti-labs/omnipitr/blob/master/doc/omnipitr-backup-slave.pod Thanks for the suggestion. We don't manage replica creation manually, we have a tool that is just not designed to get base backups from the standby, because we never needed it until recently. Instead of adopting another tool I'd rather understand the process and implement the necessary changes in the tool of our own. So far, there are 2 solutions (assuming archiving from the master is turned on and archives are available to the new replica): - run pg_start_backup from the master, wait until the replica reaches the same position as a master during the backup, copy the data files from the replica, copy the backup label from the master, run pg_stop_backup and start the new replica. - rely on pgespresso extension and run pgespresso_start_backup/pgespresso_stop_backup on the replica (and grab a backup label in between), and do the rest of the process the same way like it's done when taking base backups from the master. And, just for completeness, a set of wonderful tools to help: - pg_basebackup (once has to turn on full_page_writes on the master beforehand) - barman - omnipitr Thank you, Alexey.