Thread: BDR global sequences in two machine failover

BDR global sequences in two machine failover

From
Giovanni Maruzzelli
Date:
Hello,

This is a typical HA situation.

I have a master-master setup with only two machines, one active and one passive (standby), with a floating IP.
I write to only one machine at a time, the one that holds the floating IP.

I have one column that is a serial backed by a BDR global sequence.

When one machine is down, I can no longer refill the allocated sequence chunk (i.e., the next pool of values)...

How do you deal with this? In a failover situation you want the single surviving machine to keep working for an arbitrary period (e.g., long enough that the number of inserts exceeds the number of preallocated sequence values).

It seems that BDR global sequences are not a good fit for master-master failover.

They *need* half the nodes + 1 to be running and connected in order to vote for the next chunk.

So once you have consumed the preallocated chunk (15000 values by default), the surviving machine can no longer insert into a table that has a serial column backed by a BDR global sequence.

We're back to changing the start and increment of each sequence that underlies the "serial" field in each table,
and we must do so differently on each node (only two in a master-master failover).

Is there any workaround?

For "traditional" (non-BDR) serial columns, is there a way to set in the configuration what the START and INCREMENT of all sequences will be?
Or must each serial sequence be individually ALTERed for each serial column in each table?

Thank you all in advance,

-giovanni


--
Sincerely,

Giovanni Maruzzelli
Cell : +39-347-2665618

Re: BDR global sequences in two machine failover

From
Craig Ringer
Date:
On 7 September 2015 at 00:18, Giovanni Maruzzelli <gmaruzz@gmail.com> wrote:
> Hello,
>
> This is a typical HA situation.
>
> I have a master-master setup with only two machines, one active and one
> passive (standby), with a floating IP.
> I write to only one machine at a time, the one that holds the floating IP.

This is a deployment that is better suited to the typical approach
with an active node, a standby streaming replica, and failover. Tools
like repmgr help with this.

> When one machine is down, I can no longer refill the allocated sequence
> chunk (i.e., the next pool of values)...

Global sequence allocation requires a quorum of half the nodes plus
one. So in a 2-node system that means both nodes.
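
As a rough illustration of that quorum rule, here is a small Python sketch. It is not BDR code: the function names are made up for this example, and it assumes "half the nodes" rounds down for odd node counts (the usual strict-majority reading).

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest strict majority: half the nodes (rounded down) plus one."""
    return total_nodes // 2 + 1


def can_allocate_chunk(total_nodes: int, reachable_nodes: int) -> bool:
    """True if enough nodes (counting the local one) are up to vote."""
    return reachable_nodes >= quorum_size(total_nodes)


if __name__ == "__main__":
    for total in (2, 3, 5):
        survivors = total - 1  # exactly one node has failed
        print(total, quorum_size(total), can_allocate_chunk(total, survivors))
```

With two nodes the quorum is two, so losing either node blocks new chunk allocation; with three or five nodes, a single failure still leaves a majority.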

> How do you deal with this?

Don't use a 2-node multi-master asynchronous replication system as an
active/standby failover system.

(BTW, newer BDR versions allow you to increase the preallocated chunk
size, but that's just kicking the ball down the road a bit).

> It seems that BDR global sequences are not a good fit for master-master
> failover.

It's fine with more nodes. You have bigger worries, though, due to the
*asynchronous* nature of the replication. You don't know if the peer
node(s) have received all the changes from the master that failed. Not
only that, but if it comes back online later, it'll replay those
changes, and they might get discarded if more recent updates have
since been applied to those rows, resulting in lost updates. See the
documentation on multi-master conflicts and last-update-wins.

This is very good behaviour for append-mostly applications, apps that
are designed to work well with last-update-wins resolution, etc. It's
really not what you want for some apps, though, and is extremely bad
for a few workloads like apps that try to generate gapless sequences
using counter tables. You *must* review the application if you're
going to deploy it against a BDR system ... or any other asynchronous
replication based solution.

You can't just deploy a multi-master system like this and treat it as
a single node. The very design choices that make it tolerant of
latency and network partitions also mean you have to think much more
about how the application interacts with the system.

With normal streaming replication you can make it synchronous, so
there's no such concern. Or you can use it asynchronously, and accept
that you'll lose some transactions, but you'll at least know (if you
monitor replica lag) how big a time window you lose, and on failover
you'll be making the decision to discard those transactions.  There
are no multi-master conflicts to be concerned with, and failover
becomes a simple (albeit painful) known quantity.

> So once you have consumed the preallocated chunk (15000 values by default),
> the surviving machine can no longer insert into a table that has a serial
> column backed by a BDR global sequence.

Correct.

If you don't mind being tied to a fixed limit on the number of nodes
you can instead use step/offset local sequences.
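
To show why step/offset sequences never collide, here is a minimal Python simulation. It is only a sketch: `node_sequence` is a hypothetical helper, nodes are assumed to be numbered from 1, and each node is assumed to use its own number as the offset and the maximum node count as the step.

```python
from itertools import count, islice


def node_sequence(node_index: int, max_nodes: int):
    """Return an iterator over the values one node's local sequence hands out.

    Mirrors a sequence that starts at node_index and increments by max_nodes.
    """
    if not 1 <= node_index <= max_nodes:
        raise ValueError("node_index must be in 1..max_nodes")
    return count(start=node_index, step=max_nodes)


if __name__ == "__main__":
    max_nodes = 2  # the fixed upper limit on the number of nodes
    node_a = list(islice(node_sequence(1, max_nodes), 5))
    node_b = list(islice(node_sequence(2, max_nodes), 5))
    print(node_a)  # odd values
    print(node_b)  # even values
```

Each node draws from its own residue class, so no coordination or quorum is needed. The cost is that the step fixes the maximum number of nodes up front.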

> We're back to changing the start and increment of each sequence that
> underlies the "serial" field in each table, and we must do so differently
> on each node (only two in a master-master failover).

Correct.

> Is there any workaround?

Keep it simple. Use streaming replication and a hot standby.

> For "traditional" (non-BDR) serial columns, is there a way to set in the
> configuration what the START and INCREMENT of all sequences will be?

No.

> Or must each serial sequence be individually ALTERed for each serial column
> in each table?

Yes.
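
Since every sequence has to be altered individually, the statements can at least be generated. The following Python sketch builds one `ALTER SEQUENCE` statement per sequence for a given node. The helper name, the example sequence names, and the `first_free` parameter are assumptions made for illustration; the names are assumed to be plain, safe identifiers, and on a table that already holds rows `first_free` should be above the highest value in use.

```python
def alter_statements(sequence_names, node_index, max_nodes, first_free=1):
    """Build one ALTER SEQUENCE statement per sequence for one node.

    node_index is 1-based; max_nodes becomes the increment on every node.
    """
    if not 1 <= node_index <= max_nodes:
        raise ValueError("node_index must be in 1..max_nodes")
    restart = first_free + node_index - 1  # each node gets a distinct offset
    return [
        f"ALTER SEQUENCE {name} INCREMENT BY {max_nodes} RESTART WITH {restart};"
        for name in sequence_names
    ]


if __name__ == "__main__":
    # Hypothetical sequence names; serial columns default to <table>_<column>_seq.
    names = ["calls_id_seq", "registrations_id_seq"]
    for node in (1, 2):
        print(f"-- run on node {node}")
        for statement in alter_statements(names, node, max_nodes=2):
            print(statement)
```

The output for each node is meant to be run on that node only, so that the two nodes end up with the same increment but different offsets.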

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BDR global sequences in two machine failover

From
Giovanni Maruzzelli
Date:


On Sep 7, 2015 5:05 AM, "Craig Ringer" <craig@2ndquadrant.com> wrote:
>
> On 7 September 2015 at 00:18, Giovanni Maruzzelli <gmaruzz@gmail.com> wrote:
> > Hello,
> >
> > This is a typical HA situation.
> >
> > I have a master-master setup with only two machines, one active and one
> > passive (standby), with a floating IP.
> > I write to only one machine at a time, the one that holds the floating IP.
>
> This is a deployment that is better suited to the typical approach
> with an active node, a standby streaming replica, and failover. Tools
> like repmgr help with this.
>

Craig, thanks a lot for your answer!

My use case is keeping the internal state of some load-balanced servers that need to act as one (e.g., a cluster of VoIP servers).

Last-update-wins is OK.

If I do not use global sequences and use UUIDs as primary keys instead, would BDR be a suitable choice?

BDR is appealing not only for its new-toy coolness, but also for the possibility of geo-distribution and the seemingly sheer simplicity of installation and management.

BTW, congratulations on the feat!

-giovanni





Re: BDR global sequences in two machine failover

From
Craig Ringer
Date:
On 7 September 2015 at 20:56, Giovanni Maruzzelli <gmaruzz@gmail.com> wrote:

> If I do not use global sequences and use UUIDs as primary keys instead, would
> BDR be a suitable choice?

For something like a VoIP service where eventual consistency is
usually OK and geographic redundancy with latency tolerance and
partition tolerance is needed, yes, it could make a lot of sense.

You could use UUID keys or use normal sequences with different offsets
on the nodes. UUID will probably be easier to manage.
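
For the UUID option, a minimal Python sketch of application-side key generation could look like this. `new_row_key` is a hypothetical helper, and random (version 4) UUIDs are an assumption, since the thread does not say which UUID variant to use.

```python
import uuid


def new_row_key() -> str:
    """Return a random (version 4) UUID as text, usable as a primary key."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # Two "nodes" generate keys independently, with no shared state.
    node_a_keys = {new_row_key() for _ in range(1000)}
    node_b_keys = {new_row_key() for _ in range(1000)}
    print("overlapping keys:", len(node_a_keys & node_b_keys))
```

No node has to ask any other node before inserting, which is why this approach keeps working when a peer is down and does not depend on a fixed node count.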



Re: BDR global sequences in two machine failover

From
Giovanni Maruzzelli
Date:
Thanks again, and more, Craig.

-giovanni


On Tue, Sep 8, 2015 at 8:31 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> You could use UUID keys or use normal sequences with different offsets
> on the nodes. UUID will probably be easier to manage.


