Thread: BDR and Backup and Recovery

BDR and Backup and Recovery

From
Will McCormick
Date:
What viable options exist for Backup & Recovery in a BDR environment? From the reading I have done PITR recovery is not an option with BDR. It's important to preface this that I have almost no exposure to postgres backup and recovery. Is PITR not an option with BDR?

If a user fat fingers something and deletes records from a table without a where clause what is the correct course of action is to recover as much data as possible. What type of backup do I require to restore as much data as possible before the incident in a BDR environment.

Sorry for such an open ended question. :D I'm continuing to read as I solicit feedback.

Is there a document outlining recovery with BDR?

Re: BDR and Backup and Recovery

From
Jim Nasby
Date:
On 11/18/15 9:46 AM, Will McCormick wrote:
> What viable options exist for Backup & Recovery in a BDR environment?
>  From the reading I have done PITR recovery is not an option with BDR.
> It's important to preface this that I have almost no exposure to
> postgres backup and recovery. Is PITR not an option with BDR?
>
> If a user fat fingers something and deletes records from a table without
> a where clause what is the correct course of action is to recover as
> much data as possible. What type of backup do I require to restore as
> much data as possible before the incident in a BDR environment.
>
> Sorry for such an open ended question. :D I'm continuing to read as I
> solicit feedback.
>
> Is there a document outlining recovery with BDR?

I don't know why PITR wouldn't work with BDR, other than you can't use
binary backups across incompatible versions and BDR might be considered
incompatible with community Postgres. I would think it should still work
fine if you try to restore to a BDR server.

That said, remember that if you are not regularly (preferably
automatically) testing your backups by doing a restore and testing the
restore, then you don't have a backup. You have a hope and a prayer. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: BDR and Backup and Recovery

From
Will McCormick
Date:
Re-sending to group as well Jim :D

Regarding testing backups, Well said Jim. Thanks for taking the time to respond. I will test regularly whatever we decide to put in place. 

The below is from the 0.9.3 BDR documentation:

"Because logical replication is only supported in streaming mode (rather than WAL archiving) it isn't suitable for point-in-time recovery. Logical replication may be used in conjunction with streaming physical replication and/or PITR, though; it is not necessary to choose one or the other." 

Am I misinterpreting that BDR uses Logical Decoding and as such I cannot perform PITR?

On Wed, Nov 18, 2015 at 11:19 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 11/18/15 9:46 AM, Will McCormick wrote:
What viable options exist for Backup & Recovery in a BDR environment?
 From the reading I have done PITR recovery is not an option with BDR.
It's important to preface this that I have almost no exposure to
postgres backup and recovery. Is PITR not an option with BDR?

If a user fat fingers something and deletes records from a table without
a where clause what is the correct course of action is to recover as
much data as possible. What type of backup do I require to restore as
much data as possible before the incident in a BDR environment.

Sorry for such an open ended question. :D I'm continuing to read as I
solicit feedback.

Is there a document outlining recovery with BDR?

I don't know why PITR wouldn't work with BDR, other than you can't use binary backups across incompatible versions and BDR might be considered incompatible with community Postgres. I would think it should still work fine if you try to restore to a BDR server.

That said, remember that if you are not regularly (preferably automatically) testing your backups by doing a restore and testing the restore, then you don't have a backup. You have a hope and a prayer. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: BDR and Backup and Recovery

From
Adrian Klaver
Date:
On 11/18/2015 08:54 AM, Will McCormick wrote:
> Re-sending to group as well Jim :D
>
> Regarding testing backups, Well said Jim. Thanks for taking the time to
> respond. I will test regularly whatever we decide to put in place.
>
> The below is from the 0.9.3 BDR documentation:
>
> "Because logical replication is only supported in streaming mode (rather
> than WAL archiving) it isn't suitable for point-in-time recovery.
> Logical replication may be used in conjunction with streaming physical
> replication and/or PITR, though; it is not necessary to choose one or
> the other."
>
> Am I misinterpreting that BDR uses Logical Decoding and as such I cannot
> perform PITR?

As I read it as, you can not use the BDR stream to do PITR, if for no
other reason then that it can be a subset of a database or database
cluster. Further reason, it does not transfer WAL files that have the
entire picture of the database cluster. As the above says though, there
is nothing stopping you from doing WAL archiving/PITR in parallel to the
BDR stream.

>
> On Wed, Nov 18, 2015 at 11:19 AM, Jim Nasby <Jim.Nasby@bluetreble.com
> <mailto:Jim.Nasby@bluetreble.com>> wrote:
>
>     On 11/18/15 9:46 AM, Will McCormick wrote:
>
>         What viable options exist for Backup & Recovery in a BDR
>         environment?
>           From the reading I have done PITR recovery is not an option
>         with BDR.
>         It's important to preface this that I have almost no exposure to
>         postgres backup and recovery. Is PITR not an option with BDR?
>
>         If a user fat fingers something and deletes records from a table
>         without
>         a where clause what is the correct course of action is to recover as
>         much data as possible. What type of backup do I require to
>         restore as
>         much data as possible before the incident in a BDR environment.
>
>         Sorry for such an open ended question. :D I'm continuing to read
>         as I
>         solicit feedback.
>
>         Is there a document outlining recovery with BDR?
>
>
>     I don't know why PITR wouldn't work with BDR, other than you can't
>     use binary backups across incompatible versions and BDR might be
>     considered incompatible with community Postgres. I would think it
>     should still work fine if you try to restore to a BDR server.
>
>     That said, remember that if you are not regularly (preferably
>     automatically) testing your backups by doing a restore and testing
>     the restore, then you don't have a backup. You have a hope and a
>     prayer. :)
>     --
>     Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
>     Experts in Analytics, Data Architecture and PostgreSQL
>     Data in Trouble? Get it in Treble! http://BlueTreble.com
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: BDR and Backup and Recovery

From
Jim Nasby
Date:
On 11/18/15 10:53 AM, Will McCormick wrote:
> Regarding testing backups, Well said Jim. Thanks for taking the time to
> respond. I will test regularly whatever we decide to put in place.
>
> The below is from the 0.9.3 BDR documentation:
>
> "Because logical replication is only supported in streaming mode (rather
> than WAL archiving) it isn't suitable for point-in-time recovery.
> Logical replication may be used in conjunction with streaming physical
> replication and/or PITR, though; it is not necessary to choose one or
> the other."
>
> Am I misinterpreting that BDR uses Logical Decoding and as such I cannot
> perform PITR?

Please keep replies on-list, and don't top-post. :)

What that's saying is that you can't use logical decoding (which BDR
uses) as a backup mechanism. That doesn't mean you can't use PITR. The
only thing PITR really has in common with Logical Decoding is that they
both use WAL.

So my expectation is (I'm not a BDR expert) that you can backup a BDR
database just like any other.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: BDR and Backup and Recovery

From
Adrian Klaver
Date:
On 11/18/2015 09:31 AM, Will McCormick wrote:

Ccing list
> Thanks Adrian. I think I have it
>
> Lets say we have 2 nodes:
>
> Node A
> Node B
>
>
> GOOD
>
> Application Writes only occurring against Node A
>
> 1) Node A Base Backup taken
> 2) User Error occurs that replicates
>
> Can restore and Recover Node A to PITR before 2)
>
>
> BAD
>
> 1) Writes at Node A
> 2) Backups of Node A and Node B taken
> 3) Hardware Failure on Node A
> 4) Traffic now on Node B
> 5) Node B user error
> 6) Restore of Node B from 2) possible
>
> As logs not shipped from Node A to Node B, PITR would only have a
> partial view?
>
> Is this right?

Someone more versed in BDR than I will need to comment on the above.
Though it seems to me a possible solution would be to have a third
machine that has WAL file archive directories for each node. This could
get complicated though. First keeping the WAL files from each server
going to the correct directory. Second, determining which node in the
universe of nodes you want do PITR on.

>
> On Wed, Nov 18, 2015 at 12:12 PM, Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 11/18/2015 08:54 AM, Will McCormick wrote:
>
>         Re-sending to group as well Jim :D
>
>         Regarding testing backups, Well said Jim. Thanks for taking the
>         time to
>         respond. I will test regularly whatever we decide to put in place.
>
>         The below is from the 0.9.3 BDR documentation:
>
>         "Because logical replication is only supported in streaming mode
>         (rather
>         than WAL archiving) it isn't suitable for point-in-time recovery.
>         Logical replication may be used in conjunction with streaming
>         physical
>         replication and/or PITR, though; it is not necessary to choose
>         one or
>         the other."
>
>         Am I misinterpreting that BDR uses Logical Decoding and as such
>         I cannot
>         perform PITR?
>
>
>     As I read it as, you can not use the BDR stream to do PITR, if for
>     no other reason then that it can be a subset of a database or
>     database cluster. Further reason, it does not transfer WAL files
>     that have the entire picture of the database cluster. As the above
>     says though, there is nothing stopping you from doing WAL
>     archiving/PITR in parallel to the BDR stream.
>
>
>         On Wed, Nov 18, 2015 at 11:19 AM, Jim Nasby
>         <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>
>         <mailto:Jim.Nasby@bluetreble.com
>         <mailto:Jim.Nasby@bluetreble.com>>> wrote:
>
>              On 11/18/15 9:46 AM, Will McCormick wrote:
>
>                  What viable options exist for Backup & Recovery in a BDR
>                  environment?
>                    From the reading I have done PITR recovery is not an
>         option
>                  with BDR.
>                  It's important to preface this that I have almost no
>         exposure to
>                  postgres backup and recovery. Is PITR not an option
>         with BDR?
>
>                  If a user fat fingers something and deletes records
>         from a table
>                  without
>                  a where clause what is the correct course of action is
>         to recover as
>                  much data as possible. What type of backup do I require to
>                  restore as
>                  much data as possible before the incident in a BDR
>         environment.
>
>                  Sorry for such an open ended question. :D I'm
>         continuing to read
>                  as I
>                  solicit feedback.
>
>                  Is there a document outlining recovery with BDR?
>
>
>              I don't know why PITR wouldn't work with BDR, other than
>         you can't
>              use binary backups across incompatible versions and BDR
>         might be
>              considered incompatible with community Postgres. I would
>         think it
>              should still work fine if you try to restore to a BDR server.
>
>              That said, remember that if you are not regularly (preferably
>              automatically) testing your backups by doing a restore and
>         testing
>              the restore, then you don't have a backup. You have a hope
>         and a
>              prayer. :)
>              --
>              Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
>              Experts in Analytics, Data Architecture and PostgreSQL
>              Data in Trouble? Get it in Treble! http://BlueTreble.com
>
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: BDR and Backup and Recovery

From
Craig Ringer
Date:


On 18 November 2015 at 23:46, Will McCormick <wmccormick@gmail.com> wrote:
What viable options exist for Backup & Recovery in a BDR environment? From the reading I have done PITR recovery is not an option with BDR. It's important to preface this that I have almost no exposure to postgres backup and recovery. Is PITR not an option with BDR?

If a user fat fingers something and deletes records from a table without a where clause what is the correct course of action is to recover as much data as possible. What type of backup do I require to restore as much data as possible before the incident in a BDR environment.

Sorry for such an open ended question. :D I'm continuing to read as I solicit feedback.

Is there a document outlining recovery with BDR?

Not yet, and that's a signficant oversight.

There are really two options:

* Periodic dumps with pg_dump; or
* PITR, like with PgBarman

Both have associated challenges.

Importantly, you *cannot* run a physical streaming replica of a node and promote it to replace a failed node. I'll explain why below.


PG_DUMP
---

The simplest option is to take periodic dumps of one of the nodes. If you have to restore, you tear down the whole system, create a new standalone node, restore the dump, strip out the old bdr.bdr_nodes and bdr.bdr_connections data, then bring up a new cluster with the restored node as the first node.

0.10.0 will include a built-in function to strip all BDR state from a node and turn it back into a standalone Pg database, which will make this easier. For now, see https://github.com/2ndQuadrant/bdr/issues/127 for steps to de-BDR-ize a database.



PITR
---

Alternately you can use the usual physical WAL archiving and base backup method. 

If you have to restore to recover a node it cannot just rejoin the cluster. Its connections will be rejected because its timeline has changed. This is because the other peers might've replayed data to the old node that has been discarded, and will not be replayed again to the restored node. So it'd create a gap in history and as a result, divergence between nodes.

Instead you must re-clone the node from another still-alive node.

If the whole cluster is lost you you can do a PITR restore, strip all BDR state from the restored database, then bring it up as the first node in a new cluster, exactly as if you'd restored a dump.


Why can't you have local physical replicas for node failover?
---

It'd be nice to be able to have local streaming replicas for each node in a distributed setup. That way if you lose a node, instead of having to re-clone it over a possibly slow/expensive WAN link, you can just promote the standby.

This isn't currently possible.  The main reason is that PostgreSQL does not replay replication state (and in 9.5, replication identifier) state and replay it to standbys. The standby node that gets promoted has no idea what the replay position of the BDR peer nodes is or what position it had replayed to from its peers. It could replay data twice, or miss data, and the same could happen to its peers. Divergence would result.

To fix this we need PostgreSQL to replicate slot and replication identifier state to physical streaming replicas. It'd be usable for PITR too, that way. There's work afoot to make that possible in 9.6, but it's never going to be possible in 9.4-based BDR, so you can't use a streaming replica standby to replace a failed node without a whole-cluster rebuild.

(There are more complexities here, too, regarding async replicas, slots that get advanced past the point the replica has data for, etc. We need a way to delay advancing a peer's slot until we've confirmed the local streaming replica has committed.)




REPLICATION SETS
----

If you have replication sets where no single node has complete information, it's harder.

Neither pg_dump or WAL archiving for PITR can capture non-replicated tables that aren't on the local node. So if you have a complicated arrangement of replication sets you have to do some hoop-jumping with pg_dump and scripting to make sure you get a complete set of dumps of all your tables in different replication sets on different nodes. Recovery in this case consists of creating a new cluster, restoring the dump from one node to it, then configuring the replication sets on each other node and restoring the separately-dumped node-local tables to those node(s). The gentler DDL lock in 0.10.0 should make it possible to quiesce writes and force global consistency for long enough to acquire a snapshot on each node so you can get consistent dumps even with replication sets, but there's still a fair bit of manual work involved.

If you're using PITR the concept is similar. You PITR-restore one node, strip all BDR configuration from it to make it back into a standalone DB, then use it to set up a new BDR cluster with all new nodes. On the other nodes you have to restore them temporarily, dump the tables that aren't in the first node's replication sets, and restore them.

I'd really like to bring together a more complete picture here, but the development time currently available has to focus on robustness work and on progress toward 9.6. As always, contribution would be greatly valued, whether in terms of docs or code.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR and Backup and Recovery

From
Craig Ringer
Date:


On 19 November 2015 at 00:54, Will McCormick <wmccormick@gmail.com> wrote:
 
The below is from the 0.9.3 BDR documentation:

"Because logical replication is only supported in streaming mode (rather than WAL archiving) it isn't suitable for point-in-time recovery. Logical replication may be used in conjunction with streaming physical replication and/or PITR, though; it is not necessary to choose one or the other." 

Am I misinterpreting that BDR uses Logical Decoding and as such I cannot perform PITR?

The point is that you cannot use the logical decoding data stream for point-in-time recovery. Nothing stops you archiving WAL like normal from a node that's participating in logical replication as an upstream and/or downstream. You just can't use the logical replication data stream its self for that purpose. Sounds like I need to clarify that part of the docs.

Note the caveats in my prior mail re PITR and BDR, though; you can't just PITR-restore a replacement for a failed node and have it catch up and rejoin replication.

Regarding logical PITR: Theoretically we could actually save a base pg_dump and a change stream as logical changes from pg_recvlogical, then use that for transaction-level logical PITR. It's not impossible, but it'd require new tools and require changes to BDR/UDR to allow the stream to be applied. Nobody's written them yet. I don't have any plans to do this in the near to mid term.

It'd be an interesting project to build with pglogical. Its protocol is better suited to this than BDR's. You could do selective PITR of just a subset of tables you were interested in. If anyone's keen to tackle that, get in touch and I'll see if I can offer any help.


I don't know why PITR wouldn't work with BDR, other than you can't use binary backups across incompatible versions and BDR might be considered incompatible with community Postgres. I would think it should still work fine if you try to restore to a BDR server.

It does, with the caveat that it can't be a drop-in replacement for a failed node due to the timeline increment. The data is there, but it won't participate in replication. See the steps outlined in my prior mail for details.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services