Thread: a failover scenario

a failover scenario

From
"Tomi N/A"
Date:
I am considering pgsql as the RDBMS in a project with the following constraints:
- there's a master and reserve instance of the RDBMS on every remote location
- there's a master and reserve instance of the RDBMS on a central location
- the connections are DSL connections and therefore unreliable
- all changes have to be propagated to all servers (multimaster replication)
- if the connection between a remote location and the central location
fails, the local server continues working and resynchronizes with the
central server when the connection is restored
- if any master fails, the reserve instance takes over and the rest of
the system acts as though nothing happened

The master/reserve instance is, from what I read, standard
functionality, but I'm not so sure about the resynchronization part of
a failed link...I imagine something like WAL shipping might be of use
here, but it's just an uneducated guess.
Does code exist to support this on pgsql or is it considered
application specific functionality? Do other RDBMSs support similar
functionality?

TIA,
t.n.a.

Re: a failover scenario

From
Bill Moran
Date:
In response to "Tomi N/A" <hefest@gmail.com>:

> I am considering pgsql as the RDBMS in a project with the following constraints:
> - there's a master and reserve instance of the RDBMS on every remote location
> - there's a master and reserve instance of the RDBMS on a central location
> - the connections are DSL connections and therefore unreliable
> - all changes have to be propagated to all servers (multimaster replication)
> - if the connection between a remote location and the central location
> fails, the local server continues working and resynchronizes with the
> central server when the connection is restored
> - if any master fails, the reserve instance takes over and the rest of
> the system acts as though nothing happened
>
> The master/reserve instance is, from what I read, standard
> functionality, but I'm not so sure about the resynchronization part of
> a failed link...I imagine something like WAL shipping might be of use
> here, but it's just an uneducated guess.
> Does code exist to support this on pgsql or is it considered
> application specific functionality? Do other RDBMSs support similar
> functionality?

I don't know of any system that will just hand you those capabilities.

Every multi-master system I've ever heard of requires high-speed links
between the masters, otherwise the synchronization is far too slow to
be usable.

I believe you could do what you want in the application.  PostgreSQL
8.3 will have a native UUID type, which will help with managing conflicts
between multiple masters.  If you can define clear rules on how to manage
conflicts, that can be done automatically.  If the rules aren't so clear,
you'll need an interface where a human can manage conflicts.

With triggers and LISTEN/NOTIFY, you can put together an app that
handles replicating data when tables experience changes.  From there,
you'll need to structure your schema so such an app can detect conflicts,
(create "last_updated" timestamps on all tables, and ensure that primary
keys include a UUID or other mechanism to guarantee uniqueness) and design
some sort of queue mechanism to ensure updates can wait while network
problems are resolved.

How much effort such a thing requires is dependent on how complex the
data is.  If it's a sales database (for example) it's not all that hard,
since there aren't typical cases where two people are simultaneously
updating the same record.

I know, for example, that the PA gaming commission is putting something
like this together for the race tracks.  Each track has handheld devices
that are used to record bets/payouts, etc.  These devices can't be
connected all the time, but a sync system is pretty easy because all they
ever do is _add_ new records.  Thus, you assign each handheld a unique
device ID, and that's part of the primary key for each table, so there's
no chance of of conflict.

Sounds like a fun and challenging project.  I'm jealous.

--
Bill Moran
http://www.potentialtech.com

Re: a failover scenario

From
"Tomi N/A"
Date:
2007/10/18, Bill Moran <wmoran@potentialtech.com>:
> I don't know of any system that will just hand you those capabilities.
>
> Every multi-master system I've ever heard of requires high-speed links
> between the masters, otherwise the synchronization is far too slow to
> be usable.

I supposed so. However, I have the advantage of not having a massive
amount of data change anywhere at once...transactions are
human-generated (with as little as 20-30 users) and typically affect
just a handful of records at once. This means that an hour can pass
between two transactions, but at peak times, a transaction every
second or two might not be unusual.

> I believe you could do what you want in the application.  PostgreSQL
> 8.3 will have a native UUID type, which will help with managing conflicts
> between multiple masters.  If you can define clear rules on how to manage
> conflicts, that can be done automatically.  If the rules aren't so clear,
> you'll need an interface where a human can manage conflicts.

I don't intend to let conflicts occur...ever. Not in the sense that
two databases allocate the same unique resource for different
purposes.

> With triggers and LISTEN/NOTIFY, you can put together an app that
> handles replicating data when tables experience changes.  From there,
> you'll need to structure your schema so such an app can detect conflicts,
> (create "last_updated" timestamps on all tables, and ensure that primary

I was thinking something along those lines. It might make things
easier if nothing was ever erased, only invalidated.

> keys include a UUID or other mechanism to guarantee uniqueness) and design
> some sort of queue mechanism to ensure updates can wait while network
> problems are resolved.

UUIDs didn't come to mind, thanks for pointing them out.

> How much effort such a thing requires is dependent on how complex the
> data is.  If it's a sales database (for example) it's not all that hard,
> since there aren't typical cases where two people are simultaneously
> updating the same record.

Well...it's a sales database...and the risk of simultaneous updates is huge. :)

> I know, for example, that the PA gaming commission is putting something
> like this together for the race tracks.  Each track has handheld devices
> that are used to record bets/payouts, etc.  These devices can't be
> connected all the time, but a sync system is pretty easy because all they
> ever do is _add_ new records.  Thus, you assign each handheld a unique
> device ID, and that's part of the primary key for each table, so there's
> no chance of of conflict.

Sounds pretty simple...probably the tip of the iceberg. :)

> Sounds like a fun and challenging project.  I'm jealous.

It's not a project yet, but the tender is out...all we've got to do is
win it. It's a shame the tender is fixed (for someone else), but
blowing the whistle might help.

Thanks for sharing. Cheers,
t.n.a.