Thread: Replication

Replication

From
Craig James
Date:
Looking for replication solutions, I find:

Slony-I
 Seems good, single master only, master is a single point of failure,
 no good failover system for electing a new master or having a failed
 master rejoin the cluster.  Slave databases are mostly for safety or
 for parallelizing queries for performance.  Suffers from O(N^2)
 communications (N = cluster size).

Slony-II
 Seems brilliant, a solid theoretical foundation, at the forefront of
 computer science.  But can't find project status -- when will it be
 available?  Is it a pipe dream, or a nearly-ready reality?

PGReplication
 Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster
 Seems pretty good, but web site is not current, there are releases in use
 that are not on the web site, and also seems to always be a couple steps
 behind the current release of Postgres.  Two single-points failure spots,
 load balancer and the data replicator.
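
To put a number on the O(N^2) remark under Slony-I above, here is a minimal sketch. It assumes that every node keeps a link to every other node (a full mesh); the function name is made up for illustration and is not part of Slony.

```python
def full_mesh_links(n: int) -> int:
    """Directed node-to-node links if every node talks to every other node."""
    return n * (n - 1)


# Assumption: full mesh. Doubling the cluster roughly quadruples the links.
for n in (2, 4, 8):
    print(f"{n} nodes -> {full_mesh_links(n)} links")
# 2 nodes -> 2 links
# 4 nodes -> 12 links
# 8 nodes -> 56 links
```

For a handful of nodes this is negligible; it only starts to matter as the cluster grows.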

Is this a good summary of the status of replication?  Have I missed any important solutions or mischaracterized
anything?

Thanks!
Craig


Re: Replication

From
"Joshua D. Drake"
Date:
Craig James wrote:
> Looking for replication solutions, I find:
>
> Slony-I
> Seems good, single master only, master is a single point of failure,
> no good failover system for electing a new master or having a failed
> master rejoin the cluster.  Slave databases are mostly for safety or
> for parallelizing queries for performance.  Suffers from O(N^2)
> communications (N = cluster size).

Yep

>
> Slony-II
> Seems brilliant, a solid theoretical foundation, at the forefront of
> computer science.  But can't find project status -- when will it be
> available?  Is it a pipe dream, or a nearly-ready reality?
>

Dead


> PGReplication
> Appears to be a page that someone forgot to erase from the old GBorg site.
>

Dead


> PGCluster
> Seems pretty good, but web site is not current, there are releases in use
> that are not on the web site, and also seems to always be a couple steps
> behind the current release of Postgres.  Two single-points failure spots,
> load balancer and the data replicator.
>

Slow as all get out for writes but cool idea

> Is this a good summary of the status of replication?  Have I missed any
> important solutions or mischaracterized anything?
>

log shipping, closed source solutions


> Thanks!
> Craig
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


Re: Replication

From
Ben
Date:
Which replication problem are you trying to solve?

On Thu, 14 Jun 2007, Craig James wrote:

> Looking for replication solutions, I find:
>
> Slony-I
> Seems good, single master only, master is a single point of failure,
> no good failover system for electing a new master or having a failed
> master rejoin the cluster.  Slave databases are mostly for safety or
> for parallelizing queries for performance.  Suffers from O(N^2)
> communications (N = cluster size).
>
> Slony-II
> Seems brilliant, a solid theoretical foundation, at the forefront of
> computer science.  But can't find project status -- when will it be
> available?  Is it a pipe dream, or a nearly-ready reality?
>
> PGReplication
> Appears to be a page that someone forgot to erase from the old GBorg site.
>
> PGCluster
> Seems pretty good, but web site is not current, there are releases in use
> that are not on the web site, and also seems to always be a couple steps
> behind the current release of Postgres.  Two single-points failure spots,
> load balancer and the data replicator.
>
> Is this a good summary of the status of replication?  Have I missed any
> important solutions or mischaracterized anything?
>
> Thanks!
> Craig

Re: Replication

From
"Alexander Staubo"
Date:
On 6/15/07, Craig James <craig_james@emolecules.com> wrote:
[snip]
> Is this a good summary of the status of replication?  Have I missed any
> important solutions or mischaracterized anything?

* Mammoth Replicator, commercial.

* Continuent uni/cluster, commercial
(http://www.continuent.com/index.php?option=com_content&task=view&id=212&Itemid=169).

* pgpool-II. Supports load-balancing and replication by implementing a
proxy that duplicates all updates to all slaves. It can partition data
by doing this, and it can semi-intelligently route queries to the
appropriate servers.

* Cybertec. This is a commercial packaging of PGCluster-II from an
Austrian company.

* Greenplum Database (formerly Bizgres MPP), commercial. Not so much a
replication solution as a way to parallelize queries, and targeted at
the data warehousing crowd. Similar to ExtenDB, but tightly integrated
with PostgreSQL.

* DRBD (http://www.drbd.org/), a device driver that replicates disk
blocks to other nodes. This works for failover only, not for scaling
reads. Easy migration of devices if combined with an NFS export.

* Skytools (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools),
a collection of replication tools from the Skype people. Purports to
be simpler to use than Slony.

Lastly, and perhaps most promisingly, there's the Google Summer of
Code effort by Florian Pflug
(http://code.google.com/soc/postgres/appinfo.html?csaid=6545828A8197EBC6)
to implement true log-based replication, where PostgreSQL's
transaction logs are used to keep live slave servers up to date with a
master. In theory, such a system would be extremely simple to set up
and use, especially since it should, as far as I can see, also
transparently replicate the schema for you.

Alexander.

Re: Replication

From
"Kevin Grittner"
Date:
>>> On Thu, Jun 14, 2007 at  6:14 PM, in message <4671CBBA.6010104@emolecules.com>,
Craig James <craig_james@emolecules.com> wrote:
> Looking for replication solutions, I find:
>
> Slony-I
> Slony-II
> PGReplication
> PGCluster

You wouldn't guess it from the name, but pgpool actually supports replication:

http://pgpool.projects.postgresql.org/




Re: Replication

From
Craig James
Date:
Thanks to all who replied and filled in the blanks.  The problem with the web is you never know if you've missed
something.

Joshua D. Drake wrote:
>> Looking for replication solutions, I find...
>> Slony-II
> Dead

Wow, I'm surprised.  Is it dead for lack of need, lack of resources, too complex, or all of the above?  It sounded like
such a promising theoretical foundation.

Ben wrote:
> Which replication problem are you trying to solve?

Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount
of "global" information, such as user signups, system configuration, advertisements, and such, that go into a single
small (~5-10 MB) "global database" used by all servers.

We need "nearly-real-time replication," and instant failover.  That is, it's far more important for the system to keep
working than it is to lose a little data.  Transactional integrity is not important.  Actual hardware failures are rare,
and if a user just happens to sign up, or do "save preferences", at the instant the global-database server goes down,
it's not a tragedy.  But it's not OK for the entire web site to go down when the one global-database server fails.

Slony-I can keep several slave databases up to date, which is nice.  And I think I can combine it with a PGPool
instance on each server, with the master as primary and a few Slony copies as secondary.  That way, if the master goes
down, the PGPool servers all switch to their secondary Slony slaves, and read-only access can continue.  If the master
crashes, users will be able to do most activities, but new users can't sign up, and existing users can't change their
preferences, until either the master server comes back, or one of the slaves is promoted to master.

The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original
master comes back online, it has no way to know that a new master has been elected.  So I'd have to write a bunch of
SOAP services or something to do all of this.

I would consider PGCluster, but it seems to be a patch to Postgres itself.  I'm reluctant to introduce such a major
piece of technology into our entire system, when only one tiny part of it needs the replication service.

Thanks,
Craig

Re: Replication

From
Andreas Kostyrka
Date:
> Most of our data is replicated offline using custom tools tailored to
> our loading pattern, but we have a small amount of "global" information,
> such as user signups, system configuration, advertisements, and such,
> that go into a single small (~5-10 MB) "global database" used by all
> servers.

Slony provides near instantaneous failovers (in the single digit seconds
 range). You can script an automatic failover if the master server
becomes unreachable. That leaves you the problem of restarting your app
(or making it reconnect) to the new master.

5-10MB data implies such a fast initial replication, that making the
server rejoin the cluster by setting it up from scratch is not an issue.


> The problem is, there don't seem to be any "vote a new master" type of
> tools for Slony-I, and also, if the original master comes back online,
> it has no way to know that a new master has been elected.  So I'd have
> to write a bunch of SOAP services or something to do all of this.

You don't need SOAP services, and you do not need to elect a new master.
if dbX goes down, dbY takes over, you should be able to decide on a
static takeover pattern easily enough.
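
A minimal sketch of what such a static takeover pattern could look like, assuming a fixed, pre-agreed node order and some external health check that reports which nodes are reachable. The node names and the `pick_master` helper are hypothetical and not part of Slony.

```python
from typing import Iterable, Optional, Sequence


def pick_master(takeover_order: Sequence[str],
                reachable: Iterable[str]) -> Optional[str]:
    """Return the first node in the fixed takeover order that is reachable.

    No voting is involved: every observer that sees the same set of
    reachable nodes arrives at the same answer.
    """
    alive = set(reachable)
    for node in takeover_order:
        if node in alive:
            return node
    return None  # nothing reachable, so nobody takes over


# Hypothetical cluster: dbX is the normal master, dbY the first standby.
ORDER = ["dbX", "dbY", "dbZ"]
print(pick_master(ORDER, {"dbX", "dbY", "dbZ"}))  # dbX
print(pick_master(ORDER, {"dbY", "dbZ"}))         # dbY
```

The sketch only decides who should take over; actually promoting that node and repointing the applications are separate steps.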

The point here is, that the servers need to react to a problem, but you
probably want to get the admin on duty to look at the situation as
quickly as possible anyway. With 5-10MB of data in the database, a
complete rejoin from scratch to the cluster is measured in minutes.

Furthermore, you need to checkout pgpool, I seem to remember that it has
some bad habits in routing queries. (E.g. it wants to apply write
queries to all nodes, but slony makes the other nodes readonly.
Furthermore, anything inside a BEGIN is sent to the master node, which
is bad with some ORMs, that by default wrap any access into a transaction)

Andreas

Re: Replication

From
Craig James
Date:
Andreas Kostyrka wrote:
> Slony provides near instantaneous failovers (in the single digit seconds
>  range). You can script an automatic failover if the master server
> becomes unreachable.

But Slony slaves are read-only, correct?  So the system isn't fully functional once the master goes down.

> That leaves you the problem of restarting your app
> (or making it reconnect) to the new master.

Don't you have to run a Slony app to convert one of the slaves into the master?

> 5-10MB data implies such a fast initial replication, that making the
> server rejoin the cluster by setting it up from scratch is not an issue.

The problem is to PREVENT it from rejoining the cluster.  If you have some semi-automatic process that detects the dead
server and converts a slave to the master, and in the meantime the dead server manages to reboot itself (or its network
gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed.

>> The problem is, there don't seem to be any "vote a new master" type of
>> tools for Slony-I, and also, if the original master comes back online,
>> it has no way to know that a new master has been elected.  So I'd have
>> to write a bunch of SOAP services or something to do all of this.
>
> You don't need SOAP services, and you do not need to elect a new master.
> if dbX goes down, dbY takes over, you should be able to decide on a
> static takeover pattern easily enough.

I can't see how that is true.  Any self-healing distributed system needs something like the following:

  - A distributed system of nodes that check each other's health
  - A way to detect that a node is down and to transmit that
    information across the nodes
  - An election mechanism that nominates a new master if the
    master fails
  - A way for a node coming online to determine if it is a master
    or a slave

Any solution less than this can cause corruption because you can have two nodes that both think they're master, or end
up with no master and no process for electing a master.  As far as I can tell, Slony doesn't do any of this.  Is there a
simpler solution?  I've never heard of one.
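
To make the "two masters" hazard concrete, here is a minimal sketch of a majority-quorum rule, one common way to keep two partitions from both electing a master. It is an illustration only and not something Slony provides: the node names and helper functions are hypothetical, and it assumes every node in a partition sees the same set of peers.

```python
from typing import AbstractSet, Optional


def has_quorum(visible: AbstractSet[str], cluster: AbstractSet[str]) -> bool:
    """True if the visible nodes form a strict majority of the cluster."""
    return len(visible & cluster) * 2 > len(cluster)


def elect_master(visible: AbstractSet[str],
                 cluster: AbstractSet[str]) -> Optional[str]:
    """Pick a master only when a majority is visible.

    A minority partition returns None and stays read-only, so two
    partitions can never both believe they hold the master.
    """
    if not has_quorum(visible, cluster):
        return None
    return min(visible & cluster)  # deterministic choice: lowest name wins


CLUSTER = frozenset({"a", "b", "c", "d", "e"})
# A network split leaves {a, b} on one side and {c, d, e} on the other.
print(elect_master({"a", "b"}, CLUSTER))       # None
print(elect_master({"c", "d", "e"}, CLUSTER))  # c
```

A node that reboots into the minority side sees no quorum and therefore cannot come back as a second master.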

> The point here is, that the servers need to react to a problem, but you
> probably want to get the admin on duty to look at the situation as
> quickly as possible anyway.

No, our requirement is no administrator interaction.  We need instant, automatic recovery from failure so that the
system stays online.

> Furthermore, you need to checkout pgpool, I seem to remember that it has
> some bad habits in routing queries. (E.g. it wants to apply write
> queries to all nodes, but slony makes the other nodes readonly.
> Furthermore, anything inside a BEGIN is sent to the master node, which
> is bad with some ORMs, that by default wrap any access into a transaction)

I should have been more clear about this.  I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2
features that allow replication).  So it would only be acting as a failover mechanism.  Slony would be used as the
replication mechanism.

I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could
bring the whole system down.  If you're using it for INSERT/UPDATE, then there can only be one PGPool server.

I was thinking I'd put a PGPool server on every machine in failover mode only.  It would have the Slony master as the
primary connection, and a Slony slave as the failover connection.  The applications would route all INSERT/UPDATE
statements directly to the Slony master, and all SELECT statements to the PGPool on localhost.  When the master failed,
all of the PGPool servers would automatically switch to one of the Slony slaves.

This way, the system would keep running on the Slony slaves (so it would be read-only), until a sysadmin could get the
master Slony back online.  And when the master came online, the PGPool servers would automatically reconnect and
write-access would be restored.
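
The application-side routing described above could be sketched roughly like this. The connection strings, host name, and port are placeholders chosen for illustration, and classifying a statement by its first keyword is a deliberate simplification.

```python
# Hypothetical connection strings; host name and port are placeholders.
MASTER_DSN = "host=slony-master.example dbname=global"
LOCAL_POOL_DSN = "host=localhost port=9999 dbname=global"


def route(sql: str) -> str:
    """Send SELECTs to the local PGPool, everything else to the Slony master.

    Only the first keyword is inspected, which is a simplification:
    anything that is not a plain SELECT is treated as a write.
    """
    words = sql.lstrip().split(None, 1)
    first = words[0].upper() if words else ""
    return LOCAL_POOL_DSN if first == "SELECT" else MASTER_DSN


print(route("SELECT * FROM ads") == LOCAL_POOL_DSN)                  # True
print(route("UPDATE prefs SET lang = 'en' WHERE uid = 1") == MASTER_DSN)  # True
```

With this split, reads keep working through the local PGPool when the master is down, while writes fail until the master is reachable again.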

Does this make sense?

Craig

Re: Replication

From
"Joshua D. Drake"
Date:
Craig James wrote:
> Andreas Kostyrka wrote:
>> Slony provides near instantaneous failovers (in the single digit seconds
>>  range). You can script an automatic failover if the master server
>> becomes unreachable.
>
> But Slony slaves are read-only, correct?  So the system isn't fully
> functional once the master goes down.

That is what promotion is for.

Joshua D. Drake




--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Replication

From
"Alexander Staubo"
Date:
On 6/15/07, Craig James <craig_james@emolecules.com> wrote:
> I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could
> bring the whole system down.  If you're using it for INSERT/UPDATE, then there can only be one PGPool server.

Are you sure? I have been considering this possibility, too, but I
didn't find anything in the documentation. The main mechanism of the
proxy is taking received updates and playing them on multiple servers
with 2PC, and the proxies should not need to keep any state about
this, so why couldn't you install multiple proxies?

Alexander.

Re: Replication

From
Devrim GÜNDÜZ
Date:
Hello,

On Thu, 2007-06-14 at 16:14 -0700, Craig James wrote:
> PGCluster
>  Seems pretty good, but web site is not current,

http://www.pgcluster.org is a bit up2date, also
http://pgfoundry.org/projects/pgcluster is up2date (at least downloads
page :) )

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/



Re: Replication

From
Gábriel Ákos
Date:
On Thu, 14 Jun 2007 17:38:01 -0700
Craig James <craig_james@emolecules.com> wrote:

> I would consider PGCluster, but it seems to be a patch to Postgres
> itself.  I'm reluctant to introduce such a major piece of technology

Yes it is. Most of the time it is not far behind the current
version of PostgreSQL. The project's biggest drawbacks, as I see them:

- horrible documentation
- changing configuration without any warning/help to the "user"
(as far as there are only "rc"-s, I can't really blame the
developers for that... :) )

- there are only "rc" -s, no "stable" version available for current
postgresql releases.

I think this project needs someone who speaks English very well and
has the time and will to coordinate and document all the code that
is written. Otherwise the idea and the solution seem to be very good.
If someone manages, with a lot of luck and trial and error, to set up a
working system, then it will be stable and keep working for a long time.

> into our entire system, when only one tiny part of it needs the
> replication service.
>
> Thanks,
> Craig

Rgds,
Akos

--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web:  http://www.i-logic.hu =-
-=Tel/fax:+3612367353            |Mobil:+36209278894          =-

Re: Replication

From
Markus Schiltknecht
Date:
Hi,

Joshua D. Drake wrote:
>> Slony-II
>> Seems brilliant, a solid theoretical foundation, at the forefront of
>> computer science.  But can't find project status -- when will it be
>> available?  Is it a pipe dream, or a nearly-ready reality?
>>
>
> Dead

Not quite... there's still Postgres-R, see www.postgres-r.org  And I'm
continuously working on it, despite not having updated the website for
almost a year now...

I planned on releasing the next development snapshot together with 8.3,
as that seems to be delayed, that seems realistic ;-)

Regards

Markus


Re: Replication

From
Craig James
Date:
Markus Schiltknecht wrote:
> Not quite... there's still Postgres-R, see www.postgres-r.org  And I'm
> continuously working on it, despite not having updated the website for
> almost a year now...
>
> I planned on releasing the next development snapshot together with 8.3,
> as that seems to be delayed, that seems realistic ;-)

Is Postgres-R the same thing as Slony-II?  There's a lot of info and news around about Slony-II, but your web page
doesn't seem to mention it.

While researching replication solutions, I had a heck of a time sorting out the dead or outdated web pages (like the
stuff on gborg) from the active projects.

Either way, it's great to know you're working on it.

Craig

Re: Replication

From
Markus Schiltknecht
Date:
Hi,

Craig James wrote:
> Is Postgres-R the same thing as Slony-II?  There's a lot of info and
> news around about Slony-II, but your web page doesn't seem to mention it.

Hm... true. Good point. Maybe I should add a FAQ:

Postgres-R has been the name of the research project by Bettina Kemme et
al. Slony-II was the name Neil and Gavin gave their attempt to continue
that project.

I've based my work on the old (6.4.2) Postgres-R source code - and I'm
still calling it Postgres-R, probably Postgres-R (8) to distinguish it
from the original one. But I'm thinking about changing the name
completely... however, I'm a developer, not a marketing guru.

> While researching replication solutions, I had a heck of a time sorting
> out the dead or outdated web pages (like the stuff on gborg) from the
> active projects.

Yeah, that's one of the main problems with replication for PostgreSQL. I
hope Postgres-R (or whatever name I'll come up with in the future) can
change that.

> Either way, it's great to know you're working on it.

Maybe you want to join its mailing list [1]? I'll try to get some
discussion going there in the near future.

Regards

Markus

[1]: Postgres-R on pgFoundry:
http://pgfoundry.org/projects/postgres-r/

Re: Replication

From
Jeff Davis
Date:
On Thu, 2007-06-14 at 16:14 -0700, Craig James wrote:
> Looking for replication solutions, I find:
>
> Slony-I
>  Seems good, single master only, master is a single point of failure,
>  no good failover system for electing a new master or having a failed
>  master rejoin the cluster.  Slave databases are mostly for safety or
>  for parallelizing queries for performance.  Suffers from O(N^2)
>  communications (N = cluster size).
>

There's MOVE SET which transfers the origin (master) from one node to
another without losing any committed transactions.

There's also FAILOVER, which can set a new origin even if the old origin
is completely gone, however you will lose the transactions that haven't
been replicated yet.

To have a new node join the cluster, you SUBSCRIBE SET, and you can MOVE
SET to it later if you want that to be the master.
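
For readers who have not used slonik, here is a rough sketch of what those three operations look like as slonik statements, generated from a small Python helper. The set and node ids are made up, the usual preamble (cluster name and admin conninfo lines) is omitted, and the statement syntax is written from memory of the Slony-I slonik reference, so check it against the documentation for your version before relying on it.

```python
def move_set(set_id: int, old_origin: int, new_origin: int) -> str:
    """Planned switchover: lock the set on the old origin, then move it."""
    return (
        f"lock set (id = {set_id}, origin = {old_origin});\n"
        f"move set (id = {set_id}, old origin = {old_origin}, "
        f"new origin = {new_origin});\n"
    )


def failover(failed_node: int, backup_node: int) -> str:
    """Emergency promotion when the old origin is gone for good."""
    return f"failover (id = {failed_node}, backup node = {backup_node});\n"


def subscribe_set(set_id: int, provider: int, receiver: int) -> str:
    """Have a (re)joining node subscribe to the set from a provider."""
    return (
        f"subscribe set (id = {set_id}, provider = {provider}, "
        f"receiver = {receiver}, forward = yes);\n"
    )


# Hypothetical ids: set 1, node 1 is the old origin, node 2 the new one.
print(move_set(1, 1, 2), end="")
print(failover(1, 2), end="")
print(subscribe_set(1, 2, 3), end="")
```

MOVE SET is the one to use for a planned handover; FAILOVER is the last resort, because unreplicated transactions on the old origin are lost.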

Regards,
    Jeff Davis



Re: Replication

From
Andrew Sullivan
Date:
On Mon, Jun 18, 2007 at 08:54:46PM +0200, Markus Schiltknecht wrote:
> Postgres-R has been the name of the research project by Bettina Kemme et
> al. Slony-II was the name Neil and Gavin gave their attempt to continue
> that project.

This isn't quite true.  Slony-II was originally conceived by Jan as
an attempt to implement some of the Postgres-R ideas.  For our uses,
however, Postgres-R had built into it a rather knotty design problem:
under high-contention workloads, it will automatically increase the
number of ROLLBACKs users experience.  Jan had some ideas on how to
solve this by moving around the GC events and doing slightly
different things with them.

To that end, Afilias sponsored a small workshop in Toronto during one
of the coldest weeks the city has ever seen.  This should have been a
clue, perhaps. ;-)  Anyway, the upshot of this was that two or three
different approaches were attempted in prototypes.  AFAIK, Neil and
Gavin got the farthest, but just about everyone who was involved in
the original workshop all independently concluded that the approach
we were attempting to get to work was doomed -- it might go, but
the overhead was great enough that it wouldn't be any benefit.

Part of the problem, as near as I could tell, was that we had no
group communication protocol that would really work.  Spread needed a
_lot_ of work (where "lot of work" may mean "rewrite"), and I just
didn't have the humans to put on that problem.  Another part of the
problem was that, for high-contention workloads like the ones we
happened to be working on, an optimistic approach like Postgres-R is
probably always going to be a loser.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                --Brad Holland

Re: Replication

From
Markus Schiltknecht
Date:
Hi,

Andrew Sullivan wrote:
> This isn't quite true.  Slony-II was originally conceived by Jan as
> an attempt to implement some of the Postgres-R ideas.

Oh, right, thanks for that correction.

> Part of the problem, as near as I could tell, was that we had no
> group communication protocol that would really work.  Spread needed a
> _lot_ of work (where "lot of work" may mean "rewrite"), and I just
> didn't have the humans to put on that problem.  Another part of the
> problem was that, for high-contention workloads like the ones we
> happened to be working on, an optimistic approach like Postgres-R is
> probably always going to be a loser.

Hm.. for high-contention on single rows, sure, yes - you would mostly
get rollbacks for conflicting transactions. But the optimism there is
justified, as I think most real world transactions don't conflict (or
else you can work around such high single row contention).
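
A toy illustration of the point about optimistic handling: transactions run without coordination and are checked only at commit time, so only those touching the same rows get rolled back. This is a deliberately simplified model, not Postgres-R's actual algorithm; the `Txn` type, the snapshot counter, and the row names are all invented for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Txn:
    name: str
    snapshot: int           # number of commits this txn had seen when it began
    writes: FrozenSet[str]  # rows it wants to modify


def certify(ordered: List[Txn]) -> List[str]:
    """Process transactions in one agreed total order.

    A transaction commits unless a transaction that committed after its
    snapshot wrote one of the same rows.
    """
    committed: List[FrozenSet[str]] = []
    outcome: List[str] = []
    for t in ordered:
        concurrent = committed[t.snapshot:]
        if any(t.writes & w for w in concurrent):
            outcome.append(f"{t.name}: rollback")
        else:
            committed.append(t.writes)
            outcome.append(f"{t.name}: commit")
    return outcome


txns = [
    Txn("t1", 0, frozenset({"row1"})),
    Txn("t2", 0, frozenset({"row2"})),  # different row, no conflict
    Txn("t3", 0, frozenset({"row1"})),  # same row as t1, loses
]
print(certify(txns))
# ['t1: commit', 't2: commit', 't3: rollback']
```

When most transactions touch different rows, nearly everything commits; when many hit the same row, rollbacks dominate, which is the high-contention case discussed above.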

You are right in that the serialization of the GCS can be a bottleneck.
However, there's lots of research going on in that area and I'm
convinced that Postgres-R has its value.

Regards

Markus