Re: BDR global sequences in two machine failover - Mailing list pgsql-general

From Giovanni Maruzzelli
Subject Re: BDR global sequences in two machine failover
Date
Msg-id CALXCt0oq7O4vzg_GDo=_F_Ex-7WEQhRkRSMg6cddERL5BQ0kbQ@mail.gmail.com
Whole thread Raw
In response to Re: BDR global sequences in two machine failover  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: BDR global sequences in two machine failover
List pgsql-general


On Sep 7, 2015 5:05 AM, "Craig Ringer" <craig@2ndquadrant.com> wrote:
>
> On 7 September 2015 at 00:18, Giovanni Maruzzelli <gmaruzz@gmail.com> wrote:
> > Hello,
> >
> > Typical HA situation.
> >
> > I have master-master, two only machines, one active and one passive
> > (standby) with floating IP.
> > I write to only one machine at time, the one with the floating IP.
>
> This is a deployment that is better suited to the typical approach
> with an active node, a standby streaming replica, and failover. Tools
> like repmgr help with this.
>

Craig, thanks a lot for your answer!

My use case is keeping internal state of some load balanced servers that need to act as one (eg cluster of voip servers).

Last update wins is ok.

If I do not use global sequences, and I use uuid as primary keys, would BDR be a correct choice?

BDR is appealing not only because of new toy coolness, but also because of possible geodistribution and the seemingly sheer simplicity of installation/management.

Btw, congratulation for the feat!

-giovanni




> > When one machine is down I can no more refill sequence allocated chunk (eg:
> > next pool of values)...
>
> Global sequence allocation requires a quorum of half the nodes plus
> one. So in a 2-node system that means both nodes.
>
> > How do you deal with this?
>
> Don't use a 2-node multi-master asynchronous replication system as an
> active/standby failover system.
>
> (BTW, newer BDR versions allow you to increase the preallocated chunk
> size, but that's just kicking the ball down the road a bit).
>
> > Seems that BDR global sequences will not be good for master-master failover.
>
> It's fine with more nodes. You have bigger worries, though, due to the
> *asynchronous* nature of the replication. You don't know if the peer
> node(s) have received all the changes from the master that failed. Not
> only that, but if it comes back online later, it'll replay those
> changes, and they might get discarded if more recent updates have
> since been applied to those rows, resulting in lost updates. See the
> documentation on multi-master conflicts and last-update-wins.
>
> This is very good behaviour for append-mostly applications, apps that
> are designed to work well with last-update-wins resolution, etc. It's
> really not what you want for some apps, though, and is extremely bad
> for a few workloads like apps that try to generate gapless sequences
> using counter tables. You *must* review the application if you're
> going to deploy it against a BDR system ... or any other asynchronous
> replication based solution.
>
> You can't just deploy a multi-master system like this and treat it as
> a single node. The very design choices that make it tolerant of
> latency and network partitions also means you have to think much more
> about how the application interacts with the system.
>
> With normal streaming replication you can make it synchronous, so
> there's no such concern. Or you can use it asynchronously, and accept
> that you'll lose some transactions, but you'll at least know (if you
> monitor replica lag) how big a time window you lose, and on failover
> you'll be making the decision to discard those transactions.  There
> are no multi-master conflicts to be concerned with, and failover
> becomes a simple (albeit painful) known quantity.
>
> > So, when you consumed the preallocated chunk (default to 15000 values), your
> > surviving machine will no more be able to insert into a table with a serial
> > column with underlying BDR global sequence.
> Correct.
>
> If you don't mind being tied to a fixed limit on the number of nodes
> you can instead use step/offset local sequences.
>
> > We're back to changing the start and increment of each sequence that underly
> > the "serial" field in each table.
> > And must do so differently for each node (only two in a master-master
> > failover).
>
> Correct.
>
> > Is there any workaround?
>
> Keep it simple. Use streaming replication and a hot standby.
>
> > For "traditional" (non BDR) serial, there is a way to set into configuration
> > what will be START and INCREMENT of all sequences?
>
> No.
>
> > Or each serial sequence must be individually ALTERed for each serial column
> > in each table?
>
> Yes.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

pgsql-general by date:

Previous
From: Jayadevan M
Date:
Subject: Partitioning and constraint exclusion
Next
From: Melvin Davidson
Date:
Subject: Re: table dependencies