Thread: Replication

Replication

From
Gerry Reno
Date:
I noticed that the user survey on the community page does not list
replication among the choices for development priority.  For me,
replication is the most important thing that is critically missing from
postgresql.  We need something as good as MySQL Replication.  Both
statement-based and row-based replication.  And support for
Master-Master and full cyclic replication setups.  Postgresql is just a
toy database without this as far as I am concerned.

Regards,
Gerry


Re: Replication

From
"Joshua D. Drake"
Date:
On Mon, 2009-06-22 at 17:53 -0400, Gerry Reno wrote:
> I noticed that the user survey on the community page does not list
> replication among the choices for development priority.  For me,
> replication is the most important thing that is critically missing from
> postgresql.  We need something as good as MySQL Replication.  Both
> statement-based and row-based replication.  And support for
> Master-Master and full cyclic replication setups.  Postgresql is just a
> toy database without this as far as I am concerned.

Funny.

Joshua D. Drake


>
> Regards,
> Gerry
>
>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Replication

From
Kevin Barnard
Date:
On Jun 22, 2009, at 4:53 PM, Gerry Reno wrote:

> I noticed that the user survey on the community page does not list
> replication among the choices for development priority.  For me,
> replication is the most important thing that is critically missing
> from postgresql.  We need something as good as MySQL Replication.
> Both statement-based and row-based replication.  And support for
> Master-Master and full cyclic replication setups.  Postgresql is
> just a toy database without this as far as I am concerned.
>
> Regards,
> Gerry
>

Google postgresql replication.  There are multiple replication /
clustering options depending on you needs.  It's not built in to the
DB nor should it be because everyone  has different replication needs.

The idea of separating replication functionality from the core DB
product isn't new.  AFAIK IBM has always done this on there big iron
based DB2.  Granted their cheap replication software costs more then
you paid for that server that is running MySQL, and the expensive
replication probably costs more then a cabinet worth of MySQL
servers. :-)

--
Kevin Barnard
kevin.barnard@laser2mail.com




Re: Replication

From
Gerry Reno
Date:
Kevin Barnard wrote:
>
> On Jun 22, 2009, at 4:53 PM, Gerry Reno wrote:
>
>> I noticed that the user survey on the community page does not list
>> replication among the choices for development priority. For me,
>> replication is the most important thing that is critically missing
>> from postgresql. We need something as good as MySQL Replication. Both
>> statement-based and row-based replication. And support for
>> Master-Master and full cyclic replication setups. Postgresql is just
>> a toy database without this as far as I am concerned.
>>
>> Regards,
>> Gerry
>>
>
> Google postgresql replication. There are multiple replication /
> clustering options depending on you needs. It's not built in to the DB
> nor should it be because everyone has different replication needs.
>
> The idea of separating replication functionality from the core DB
> product isn't new. AFAIK IBM has always done this on there big iron
> based DB2. Granted their cheap replication software costs more then
> you paid for that server that is running MySQL, and the expensive
> replication probably costs more then a cabinet worth of MySQL servers.
> :-)
>
> --
> Kevin Barnard
> kevin.barnard@laser2mail.com
>
>
Have you ever tried any of the postgresql replication offerings? The
only one that is remotely viable is slony and it is so quirky you may as
well forget it. The rest are in some stage of decay/abandonment. There
is no real replication available for postgresql. Postgresql needs to
develop a real replication offering for postgresql. Builtin or a
separate module.

Regards,
Gerry


Re: Replication

From
"Joshua D. Drake"
Date:
On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
> Kevin Barnard wrote:

> >
> Have you ever tried any of the postgresql replication offerings? The
> only one that is remotely viable is slony and it is so quirky you may as
> well forget it. The rest are in some stage of decay/abandonment. There
> is no real replication available for postgresql. Postgresql needs to
> develop a real replication offering for postgresql. Builtin or a
> separate module.

Well this certainly isn't true but what do I know.

Joshua D. Drake



--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Replication

From
Gerry Reno
Date:
Joshua D. Drake wrote:
> On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
>
>> Kevin Barnard wrote:
>>
>
>
>> Have you ever tried any of the postgresql replication offerings? The
>> only one that is remotely viable is slony and it is so quirky you may as
>> well forget it. The rest are in some stage of decay/abandonment. There
>> is no real replication available for postgresql. Postgresql needs to
>> develop a real replication offering for postgresql. Builtin or a
>> separate module.
>>
>
> Well this certainly isn't true but what do I know.
>
> Joshua D. Drake
>
>
>
>
It is true.  Otherwise show me a viable replication offering for
postgresql that I can put into production and obtain support for it.

Regards,
Gerry


Re: Replication

From
"Joshua D. Drake"
Date:
On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:
> Joshua D. Drake wrote:

> It is true.  Otherwise show me a viable replication offering for
> postgresql that I can put into production and obtain support for it.

Well, you can get support for Slony (known to to be a bit complicated
but stable and flexible). You can also get support for Londiste (which
is used in production by Skype... I think that speaks for itself). You
can get support for log shipping if all you need is simple master->slave
redundancy.

I can name others if you like but since you are clearly not able to
effectively use Google nor actually present production requirements so
people can help you, I doubt it would do much good.

Joshua D. Drake

>
> Regards,
> Gerry
>
>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Replication

From
Gerry Reno
Date:
Joshua D. Drake wrote:
> On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:
>
>> Joshua D. Drake wrote:
>>
>
>
>> It is true.  Otherwise show me a viable replication offering for
>> postgresql that I can put into production and obtain support for it.
>>
>
> Well, you can get support for Slony (known to to be a bit complicated
> but stable and flexible).
I've already tried Slony last year and unless something major has
changed it is not viable.  I cannot have replication that just stops for
no known reason.

> You can also get support for Londiste (which
> is used in production by Skype... I think that speaks for itself).
Londiste is beta.  The fact that Skype uses it is because it's part of
Skytools which is their product.  They may want to run their own beta
stuff.  I don't.


> You
> can get support for log shipping if all you need is simple master->slave
> redundancy.
>
If all I needed was log shipping I can do that myself with some scripts.

> I can name others if you like but since you are clearly not able to
> effectively use Google nor actually present production requirements so
> people can help you, I doubt it would do much good.
>
> Joshua D. Drake
>
So name others.

Regards,
Gerry


Re: Replication

From
Greg Smith
Date:
On Mon, 22 Jun 2009, Gerry Reno wrote:

> We need something as good as MySQL Replication.

I certainly hope not, I was hoping for a reliable replication solution
instead.  Wow is the information you get searching for something like
"mysql replication corruption [replay log|bin log]" scary.  I also
appreciate fun bits like how you'll get completely quiet master/slave
mismatches if you should do something crazy like, say, use LIMIT the wrong
way (see http://dev.mysql.com/doc/refman/5.0/en/replication-features.html
for more fun like that).

Anyway, you seem to be unaware that built-in replication for PostgreSQL
already is moving along, with an implementation that's just not quite
production quality yet, and might make into the next version after 8.4 if
things go well.  That's probably why it's not on the survey--everybody
knows that's important and it's already being worked on actively.

P.S. another Google search, this one for "postgresql replication support",
finds the mythical company that sells multiple products and support for
this purpose on hit #2 for me.  Or you could use the alternate approach of
looking at the jobs of the everyone who's been giving your a hard time in
this thread...

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Replication

From
Gerry Reno
Date:
Greg Smith wrote:
> On Mon, 22 Jun 2009, Gerry Reno wrote:
>
>> We need something as good as MySQL Replication.
>
> I certainly hope not, I was hoping for a reliable replication solution
> instead. Wow is the information you get searching for something like
> "mysql replication corruption [replay log|bin log]" scary. I also
> appreciate fun bits like how you'll get completely quiet master/slave
> mismatches if you should do something crazy like, say, use LIMIT the
> wrong way (see
> http://dev.mysql.com/doc/refman/5.0/en/replication-features.html for
> more fun like that).
>
I didn't mean to imply that MySQL Replication was perfect. But I've been
using it for over three years with very few problems. And yes with
statement-based replication you can get some interesting replication
anomalies if you're not careful. But, that's true of any statement-based
replication with any database.


> Anyway, you seem to be unaware that built-in replication for
> PostgreSQL already is moving along, with an implementation that's just
> not quite production quality yet, and might make into the next version
> after 8.4 if things go well.
No, I'm aware of this basic builtin replication. It was rather
disappointing to see it moved out of the 8.4 release. We need something
more that just basic master-slave replication which is all this simple
builtin replication will provide. We need a real replication solution
that can handle statement-based and row-based replication. Multi-master
replication. Full cyclic replication chain setups. Simple master-slave
just doesn't cut it.

> That's probably why it's not on the survey--everybody knows that's
> important and it's already being worked on actively.
Ok, I just felt it should still be there. But, I hope development
understands just how important good replication really is.
>
> P.S. another Google search, this one for "postgresql replication
> support", finds the mythical company that sells multiple products and
> support for this purpose on hit #2 for me. Or you could use the
> alternate approach of looking at the jobs of the everyone who's been
> giving your a hard time in this thread...
I figured as much.

>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>


Regards,
Gerry


Re: Replication

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> On Mon, 22 Jun 2009, Gerry Reno wrote:
>> We need something as good as MySQL Replication.

> I certainly hope not, I was hoping for a reliable replication solution
> instead.  Wow is the information you get searching for something like
> "mysql replication corruption [replay log|bin log]" scary.

My experience, stretching over more than five years now, is that mysql
replication fails its own regression tests a significant percentage of
the time ... in nonreproducible fashion of course, so it's hard to file
bug reports.  I'm aware of this because I package the thing for Red Hat,
and I run mysql's regression tests as part of that build, and close to
half the time the build fails in the regression tests, invariably in
the replication-related tests.  Never twice the same mind you; when
I resubmit the job, with the exact same SRPM, it usually works.

This might be some artifact of the Red Hat/Fedora build farm
environment, since my builds on my own workstation seldom fail.  But
it's persisted over multiple incarnations of that build farm and quite
a few versions of mysql.  I've never been able to pin it down enough
to file a bug report.

I can't say I'd trust mysql replication with any data I cared about.

            regards, tom lane

Re: Replication

From
Craig Ringer
Date:
On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

> > Anyway, you seem to be unaware that built-in replication for
> > PostgreSQL already is moving along, with an implementation that's just
> > not quite production quality yet, and might make into the next version
> > after 8.4 if things go well.

> No, I'm aware of this basic builtin replication. It was rather
> disappointing to see it moved out of the 8.4 release. We need something
> more that just basic master-slave replication which is all this simple
> builtin replication will provide. We need a real replication solution
> that can handle statement-based and row-based replication. Multi-master
> replication. Full cyclic replication chain setups. Simple master-slave
> just doesn't cut it.

Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned on the MySQL replication docs.

Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence.

My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads?

- Client-transparent fault-tolerance?

- ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

- Client must connect to master for writes, otherwise master or slave,
  so must be more aware of connection management

- Client drivers have no way to transparently discover active master,
  must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads ? Or do you expect throughput gains in write-heavy
workloads too?

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.

> > That's probably why it's not on the survey--everybody knows that's
> > important and it's already being worked on actively.
> Ok, I just felt it should still be there. But, I hope development
> understands just how important good replication really is.

"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.

--
Craig Ringer


Re: Replication

From
Gerry Reno
Date:
Craig Ringer wrote:
On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:
 
Anyway, you seem to be unaware that built-in replication for 
PostgreSQL already is moving along, with an implementation that's just 
not quite production quality yet, and might make into the next version 
after 8.4 if things go well.      
 
No, I'm aware of this basic builtin replication. It was rather 
disappointing to see it moved out of the 8.4 release. We need something 
more that just basic master-slave replication which is all this simple 
builtin replication will provide. We need a real replication solution 
that can handle statement-based and row-based replication. Multi-master 
replication. Full cyclic replication chain setups. Simple master-slave 
just doesn't cut it.   
Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned on the MySQL replication docs. 
I don't know how it could guarantee that.  That's really why row-based is better.


Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence. 
Sequences I deal with by setting up an offset and increment for each replica so that there are no conflicts.
You have to know the entire replication array size prior to setup.  I usually set increment to 10 and then I can offset up to 10 replicas.
My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads? 
yes
- Client-transparent fault-tolerance? 
yes.
- ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you? 
failure of single master. 
- Client must connect to master for writes, otherwise master or slave, so must be more aware of connection management

- Client drivers have no way to transparently discover active master, must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads ? Or do you expect throughput gains in write-heavy
workloads too?

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.
 
That's probably why it's not on the survey--everybody knows that's 
important and it's already being worked on actively.     
Ok, I just felt it should still be there. But, I hope development 
understands just how important good replication really is.   
"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.
 

Regards,
Gerry

Re: Replication

From
Craig Ringer
Date:
On Mon, 2009-06-22 at 21:29 -0400, Gerry Reno wrote:

> I don't know how it could guarantee that.  That's really why row-based
> is better.

Yep, especially in the face of things like user PL functions, C
functions, etc.

This page:

http://dev.mysql.com/doc/refman/5.0/en/replication-features-functions.html

is downright alarming, and (implicitly) says quite enough about how
statement-based replication is a really, REALLY bad idea.

Rather than replicating sets of changed rows, though, I suspect that
block-level replication using the WAL is probably more efficient.
Certainly it'll be easier on the slave in terms of the work required to
keep up with the master.

I guess block-level replication isn't much good for multi-master,
though, since you'd be spending half your time finding out what the
other masters were doing and what their state was, or telling them about
yours. (I guess that's the case anyway to some extent, though, any time
you have concurrent statements on different masters using the same data
and one or more of them is altering it).

> Sequences I deal with by setting up an offset and increment for each
> replica so that there are no conflicts.

Ah, so you don't actually care if the replicas are identical - you
expect things like different primary keys on master and replicas(s) ?

How do your applications cope if they switch from one replica to another
and suddenly primary key identifiers are different?

> > What limitations of master-slave replication with read-only slaves
> > present roadblocks for you?
> >
> failure of single master.

For that, read-only slave + heartbeat based failover with STONITH (shoot
the other node in the head) by something like IPMI remote-poweroff or a
USB-controlled power switch would be sufficient.

The only part of the requirements for this that PG can't already satisfy
is synchronous replication - the current WAL-based replication doesn't
guarantee that the slave has the changes before the client's commit
returns successfully, so recent changes that the client thinks are
committed might be lost on failover. Synchronous replication is, of
course, what's being worked on right now, partly to address just this
issue and partly to allow for read-only reporting slaves.

This technique is well established, very robust, and it's not hard to
implement in a way that ensures that the slave - when it takes over as
master - claims the master's MAC address and IP address so clients
barely notice a change.

With Pg it'd break any existing connections, but any database
application worth a damn must be able to handle re-issuing transactions
due to deadlocks, resource exhaustion, admin statement cancellation etc
anyway.

--
Craig Ringer


Re: Replication

From
Scott Marlowe
Date:
On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno<greno@verizon.net> wrote:
> Joshua D. Drake wrote:
>>
>> On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:
>>> Joshua D. Drake wrote:
>>>
>>> It is true.  Otherwise show me a viable replication offering for
>>> postgresql that I can put into production and obtain support for it.
>>
>> Well, you can get support for Slony (known to to be a bit complicated
>> but stable and flexible).
>
> I've already tried Slony last year and unless something major has changed it
> is not viable.  I cannot have replication that just stops for no known
> reason.

I've been running slony since 1.0 came out, and have NEVER had it just
stop replication for no known reason.  ever.  Your inability to use it
tells me much less about slony than it does about you.

>> You can also get support for Londiste (which
>> is used in production by Skype... I think that speaks for itself).
>
> Londiste is beta.  The fact that Skype uses it is because it's part of
> Skytools which is their product.  They may want to run their own beta stuff.
>  I don't.

So, if they said it was general release, but it sucked, you'd try it,
but since they say it's beta, no way?  Wow.  Just wow.  The amount of
dumb in that sentence is not measurable with modern instrumentation.

Re: Replication

From
Gerry Reno
Date:
Craig Ringer wrote:
On Mon, 2009-06-22 at 21:29 -0400, Gerry Reno wrote:
 
I don't know how it could guarantee that.  That's really why row-based
is better.   
Yep, especially in the face of things like user PL functions, C
functions, etc.

This page:

http://dev.mysql.com/doc/refman/5.0/en/replication-features-functions.html

is downright alarming, and (implicitly) says quite enough about how
statement-based replication is a really, REALLY bad idea.

Rather than replicating sets of changed rows, though, I suspect that
block-level replication using the WAL is probably more efficient.
Certainly it'll be easier on the slave in terms of the work required to
keep up with the master.

I guess block-level replication isn't much good for multi-master,
though, since you'd be spending half your time finding out what the
other masters were doing and what their state was, or telling them about
yours. (I guess that's the case anyway to some extent, though, any time
you have concurrent statements on different masters using the same data
and one or more of them is altering it).
 
Sequences I deal with by setting up an offset and increment for each
replica so that there are no conflicts.   
Ah, so you don't actually care if the replicas are identical - you
expect things like different primary keys on master and replicas(s) ?

How do your applications cope if they switch from one replica to another
and suddenly primary key identifiers are different? 
Here is a link that describes the technique:  http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1

<snip>

Regards,
Gerry


Re: Replication

From
Conrad Lender
Date:
On 23/06/09 03:44, Scott Marlowe wrote:
> On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno<greno@verizon.net> wrote:
>> Londiste is beta.  The fact that Skype uses it is because it's part
>> of Skytools which is their product.  They may want to run their own
>> beta stuff. I don't.
>
> So, if they said it was general release, but it sucked, you'd try it,
> but since they say it's beta, no way?  Wow.  Just wow.  The amount
> of dumb in that sentence is not measurable with modern
> instrumentation.

To be fair, the "beta" label has been abused a lot in the last years;
and what's more, it has been used as an excuse to refuse support (I'm
looking at Google here). Another point would be that Skype has come
under attack for using what basically amounts to a black box protocol in
their main application - many security-minded people are sceptical of
the company for this reason, and I can't blame them. That said, I do use
pgbouncer, which is also a Skype project (released under the BSD
license). After some casual code review I found it to be of good
quality, and I'm now using it in production environments. I don't think
it's so unreasonable to be questioning projects which are only available
as "betas". There was a time when "beta" meant caveat emptor, this
product is not fully tested, and if it breaks, we'd like to hear about
it, but we won't be surprised. Trusting such a product with database
replication may well work, but it's a risk not everybody's willing to take.


  - Conrad

Re: Replication

From
Craig Ringer
Date:
On Mon, 2009-06-22 at 22:20 -0400, Gerry Reno wrote:

> Here is a link that describes the technique:
> http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1

Ah. You were referring to multiple-master replication, and your
reference to setting non-overlapping sequences referred to avoiding
collisions caused by inserts on two different masters. Yes, using
non-overlapping allocation ranges for sequences is indeed one way to
handle that, but it's not actually related to what I was talking about
anyway.


What I was referring to in the parent post was an issue with
statement-based replication of concurrent statements sharing a sequence.
It's completely unrelated; both statements are running on the SAME
server (master) and replicating to the slave. For example, take two
concurrent statements each of which inserts 10 generated rows into the
dummy table 'x':

CREATE SEQUENCE x;
CREATE TABLE x (
  a INTEGER PRIMARY KEY DEFAULT nextval('x_id_seq'),
  b INTEGER NOT NULL
);



CONNECTION (1) TO MASTER          CONNECTION (2) TO MASTER
-----------------------------     --------------------------
Issues INSERT INTO x (a,b)
SELECT nextval('x_id_seq'),1
FROM generate_series(0,9);
                                  Issues INSERT INTO x (a,b)
                                  SELECT nextval('x_id_seq'),2
                                  FROM generate_series(0,9);

nextval() returns 1
nextval() returns 2
nextval() returns 3
                  nextval() returns 4
nextval() returns 5
                  nextval() returns 6
nextval() returns 7
nextval() returns 8
                                nextval() returns 9
                                  nextval() returns 10
                                  nextval() returns 11
nextval() returns 12

... etc


If you issue the same two statements on the slave, the ordering in which
those nextval() calls are interleaved will be different. So, while on
the master according to the example above table 'x' would contain:

 a b
(1,1)
(2,1)
(3,1)
(4,2)
(5,1)
(6,2)
(7,1)
(8,1)
(9,2)
(10,2)
(11,2)
...


on the slave it might land up containing something like:

 a b
(1,1)
(2,2)
(3,2)
(4,1)
(5,2)
(6,1)
(7,1)
(8,2)
(9,1)
(10,1)
(11,2)

so your slave and master contain TOTALLY DIFFERENT DATA. Yet, there's
nothing wrong with the ordering of execution on the master being
non-deterministic, as we still got what we asked for. We have 10 rows
with unique primary keys and b=1, and ten rows with unique primary keys
and b=2 . We don't actually care what those primary key values are since
they're synthetic primary keys, we only care that they're unique. In a
master/slave situation, though, we also care that the SAME primary key
identifies the SAME entity on both master and slave, and that won't
happen with statement-based replication when concurrent statements
interleave in non-deterministic ways.

Of course, it's rather nice in performance terms that such statements
CAN be interleaved without synchronisation or locking. In fact, that's
why PostgreSQL sequences exist.

In this particular case, the server could work around it by logging its
selection of generated values to some sort of side channel (akin to
MySQL's replication binlog) so the slave can use that as its source for
them. That's kind of error prone, though, as it requires every such
function ( nextval, random(), etc ) to have support for replication
manually added, and will result in hopelessly out-of-sync slaves if a
function isn't handled. It also doesn't provide an answer for other
non-deterministic result sets like use of a function in a result set
with LIMIT without ORDER BY .

The problem is that if you do statement-based replication, the order in
which reads from the sequence by each statement are interleaved is
undefined and depends on the OS's I/O and processor scheduling. The
slave will not produce the same ordering, so the same statements
executed on the slave will result in inserted rows having different
generated keys than on the master.

MySQL appears to tackle these problems by
   look! a cassowary! Over there!

Anyway, what was I saying? Oh, yes, MySQL appears to ignore these
problems or expect a paranoidly careful admin to avoid them. Some
functions are just broken and don't replicate properly; some statements
will produce wrong results on the slave, etc.

You won't EVER see that sort of thing in PostgreSQL.

So ... it doesn't seem likely that statement-level replication would
ever get far in Pg because of nasty issues like this one.

That was my point re concurrent execution of statements. Nothing to do
with ensuring key uniqueness without inter-node synchronisation in
multi-master environments.

Block-level master/slave synchronous replication, however, is already on
the way. (Also, Slony provides row-level master/slave replication that
seems to work well for a lot of people, though it's widely admitted to
be a bit of a pain to work with and not particularly nice.)

--
Craig Ringer


Re: Replication

From
Scott Marlowe
Date:
On Mon, Jun 22, 2009 at 8:50 PM, Conrad Lender<crlender@gmail.com> wrote:
> On 23/06/09 03:44, Scott Marlowe wrote:
>> On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno<greno@verizon.net> wrote:
>>> Londiste is beta.  The fact that Skype uses it is because it's part
>>> of Skytools which is their product.  They may want to run their own
>>> beta stuff. I don't.
>>
>> So, if they said it was general release, but it sucked, you'd try it,
>> but since they say it's beta, no way?  Wow.  Just wow.  The amount
>> of dumb in that sentence is not measurable with modern
>> instrumentation.
>
> To be fair, the "beta" label has been abused a lot in the last years;
> and what's more, it has been used as an excuse to refuse support (I'm
> looking at Google here). Another point would be that Skype has come
> under attack for using what basically amounts to a black box protocol in
> their main application - many security-minded people are sceptical of
> the company for this reason, and I can't blame them. That said, I do use
> pgbouncer, which is also a Skype project (released under the BSD
> license). After some casual code review I found it to be of good
> quality, and I'm now using it in production environments. I don't think
> it's so unreasonable to be questioning projects which are only available
> as "betas". There was a time when "beta" meant caveat emptor, this
> product is not fully tested, and if it breaks, we'd like to hear about
> it, but we won't be surprised. Trusting such a product with database
> replication may well work, but it's a risk not everybody's willing to take.

Beta or alpha or final or production, they all mean nothing unless
they are applied to a specific piece of code and it's rep.  I've seen
plenty of software that was supposedly supported that was never fixed
or fixed at a leisurely pace (see mysql and packaging mistakes and
innodb order by desc bugs for examples).  I've used "alpha" products
in limited, well tested roles in production that worked and worked
well.  OpenSSL which I trust to do a good job, is 0.9. something right
now, which screams not "release" to me.

What makes code production worthy is that YOU have tested it
thoroughly and that YOU guarantee it to work or you'll fix it as long
as it's used in a way you can test for properly before upgrade /
update deployments.  How fast do fixes come out? How well is it
maintained.  An actively maintained beta may be a better answer in a
moving landscape because it can keep up.  Beta means beta.  And what
that means to an individual developer may not be what you expect it to
be.  The risk is purely non-existent based on the naming of the
release IF IT'S BEEN TESTED PROPERLY.

Re: Replication

From
Scott Marlowe
Date:
On Mon, Jun 22, 2009 at 8:59 PM, Craig
Ringer<craig@postnewspapers.com.au> wrote:

> So ... it doesn't seem likely that statement-level replication would
> ever get far in Pg because of nasty issues like this one.

It's exactly what pg_pool does, and you can choose it if you know what
you're doing.  But yes, it's usually a bad fit for replication by
itself.

> That was my point re concurrent execution of statements. Nothing to do
> with ensuring key uniqueness without inter-node synchronisation in
> multi-master environments.
>
> Block-level master/slave synchronous replication, however, is already on
> the way. (Also, Slony provides row-level master/slave replication that
> seems to work well for a lot of people, though it's widely admitted to
> be a bit of a pain to work with and not particularly nice.)

I think it's real easy to work with, once you understand that "it's
boss". I.e. you do things the slony way, or get used to recreating /
resubscribing a lot of times during maintenance windows when you can
run on one db.  The mis-feature of no ability to drop tables caught me
out.  Now we don't drop tables, period.  We rename and alter to get
around that.  Once I told the developers not to drop tables in order
to change them, things got better.  Really it was bad habits learned
from other dbs.

Re: Replication

From
Craig Ringer
Date:
On Mon, 2009-06-22 at 21:12 -0600, Scott Marlowe wrote:
> On Mon, Jun 22, 2009 at 8:59 PM, Craig
> Ringer<craig@postnewspapers.com.au> wrote:
>
> > So ... it doesn't seem likely that statement-level replication would
> > ever get far in Pg because of nasty issues like this one.
>
> It's exactly what pg_pool does, and you can choose it if you know what
> you're doing.  But yes, it's usually a bad fit for replication by
> itself.

Ah - I didn't realise that pg_pool did statement-based inter-node
replication; I was under the impression that it was primarily for
distribution of read-only queries between multiple clone nodes. Am I
confusing it and pgbouncer? I haven't had any need for tools intended to
scale up client counts, and haven't looked into them much.

In any case, I have a _bit_ less of a problem with the notion of
statement-level replication when it's add-on software rather than a core
part of the DB. To me, it seems like you should be able to trust the DB
to just get it right, and not have to worry about whether something's
safe with replication, etc.

If you have some kind of add-on or intermediary messing with things, at
least it's clear where responsibility lies - with the admin and the
add-on. The DB just does what it's told, consistently and just like
normal. Even so, I'm still very definitely not a fan of something that
can essentially cause silent data corruption as a result of minor
oversights in statement design.

I'm a bit leery of Slony for related reasons - because it messes with
the way Pg works to the point where, as you noted, you can't even drop
tables. Clients have to be very wary of the replication system's quirks
and issues, which I intensely dislike.

I know all these things, like MySQL's statement-level replication, have
their uses and have real advantages in situations where it's worth
trading coding pain and admin time for performance (to the point where
they may in some situations be BETTER than clustering / mm replication),
but I do think they're still VERY ugly. I personally really wouldn't
want to see statement level replication in particular in the core.

As you can probably imagine, I'm really rather excited about the coming
work on synchronous block-level replication to a read-only slave. I'd
also be pretty happy to see DDL triggers and/or triggers on system
tables to allow things like Slony to be a bit less ugly to work with.

> Once I told the developers not to drop tables in order
> to change them, things got better.  Really it was bad habits learned
> from other dbs.

I can understand working around such limitations, but it seems a stretch
to call use of the DB's normal capabilities "bad habits". Well, unless
you mean people using DROP TABLE / CREATE TABLE instead of just using
ALTER TABLE to do the job - but even then, there are times when a drop
and re-create is the preferable option in the absence of external
limitations like Slony.

--
Craig Ringer


Re: Replication

From
Arndt Lehmann
Date:
Hi Craig,

just wanted to mention that there is a new open-source solution
available that now also enables asynchronous, row-based, master-master
replication of PostgreSQL databases.

Name: rubyrep

Project website with full feature list, step-by-step tutorial and
screencast (from zero to running replication in under 5 minutes)
available here:
    http://www.rubyrep.org

Best Regards,
  Arndt Lehmann


Re: Replication

From
Mike Christensen
Date:
Rubyrep looks very interesting, I just watched their 5min video and looks very easy to setup.

Few questions..  The left/right database looks very limiting (you can only replicate two databases at a time)..  Their documentation says that the solution is to setup a chain.  To keep A, B and C in sync, A should replicate with C and B should replicate with C:

http://www.rubyrep.org/replication_between_more_than_two_databases.html

There will be a set of triggers for each replication.  Since MySql doesn't support more than one trigger on a table, this approach won't work which I guess is their way of saying "We're database independent, as long as you use either Postgres or MySql oh and btw we have no replication story above 2 nodes on MySQL"

Also, if database C goes down, then everything goes kaboom, right?  Even if you did A replicates with B, B replicates with C, if one database goes down your chain is broken.  I'm worried about this scenario, and any perf implications with having a whole bunch of triggers on a table.  Maybe someone can comment.

Mike

On Mon, Jun 22, 2009 at 10:23 PM, Arndt Lehmann <arndt.lehmann@gmail.com> wrote:
Hi Craig,

just wanted to mention that there is a new open-source solution
available that now also enables asynchronous, row-based, master-master
replication of PostgreSQL databases.

Name: rubyrep

Project website with full feature list, step-by-step tutorial and
screencast (from zero to running replication in under 5 minutes)
available here:
   http://www.rubyrep.org

Best Regards,
 Arndt Lehmann


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Replication

From
Arndt Lehmann
Date:
Note to self: better proof reading...

On Jun 23, 6:14 pm, Arndt Lehmann <arndt.lehm...@gmail.com> wrote:
> If let's say C goes down, then replication between A and B will not be
> affected.
> Same if A goes down: replication between B and C will continue.
Correct:
Let's say B goes down, then replication between A and C will not be
affected.
Same if A goes down: replication between B and C will continue.

Re: Replication

From
Arndt Lehmann
Date:
Hi Mike

thanks for your interest in rubyrep. I developed rubyrep. Let me
answer your questions.

On Jun 23, 4:16 pm, m...@kitchenpc.com (Mike Christensen) wrote:
> There will be a set of triggers for each replication.  Since MySql doesn't
> support more than one trigger on a table, this approach won't work which I
> guess is their way of saying "We're database independent, as long as you use
> either Postgres or MySql oh and btw we have no replication story above 2
> nodes on MySQL"
The first statement on the rubyrep project website says that rubyrep
provides database independent - currently supporting PostgreSQL and
MySQL - master-master replication [i. e. 2 databases].
That statement is 100% correct.
Regarding multi-master replication (i. e. more than 2 databases) the
according sub page says that it only works for PostgreSQL.
I intended both statements to be very clear and *not* misleading. If
you think they are not, I am interested in hearing your improvement
suggestions.
>
> Also, if database C goes down, then everything goes kaboom, right?  Even if
> you did A replicates with B, B replicates with C, if one database goes down
> your chain is broken.  I'm worried about this scenario,
Assuming that A replicates with C and C with B then once C goes down,
indeed replication will stop.
However nothing goes "kaboom". All changes in either A or B are still
tracked in the according queue tables. As soon as C comes up again,
all pending changes will be replicated.
If let's say C goes down, then replication between A and B will not be
affected.
Same if A goes down: replication between B and C will continue.
>and any perf
> implications with having a whole bunch of triggers on a table.  Maybe
> someone can comment.
The triggers are designed to be as "lean" as possible. Actually all
they do is to add the primary key of a created / modified / deleted
row into the queue table. I don't think that this will cause any
performance impact.
(The actual work is done by the rubyrep process which applies all
changes. That process can be run on a totally different server to
avoid impacting the database server performance.)


Regards,
  Arndt

Re: Replication

From
Devrim GÜNDÜZ
Date:
On Mon, 2009-06-22 at 17:53 -0400, Gerry Reno wrote:
> We need something as good as MySQL Replication.

Either you did not use MySQL Replication, or you don't know what good
means...
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                   http://www.gunduz.org

Attachment

Re: Replication

From
Devrim GÜNDÜZ
Date:
On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
> The only one that is remotely viable is slony and it is so quirky you
> may as well forget it.

Like what? I agree that Slony-I is not a plug-in-play replication
solution, but I don't agree that it is so quirky.

> The rest are in some stage of decay/abandonment. There is no real
> replication available for postgresql.

There are:

http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling

IIRC only Slony-II and PGCluster-II is abandoned. PGCluster-I is not
up2date, but the others are being worked on.

--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                   http://www.gunduz.org

Attachment

Re: Replication

From
Grzegorz Jaśkiewicz
Date:
2009/6/23 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Mon, 2009-06-22 at 17:53 -0400, Gerry Reno wrote:
>> We need something as good as MySQL Replication.
>
> Either you did not use MySQL Replication, or you don't know what good
> means...

He is referring to user friendliness, and you to quality and reliability.
This is just quintessence of any "mysql vs postgresql" world argument :)


--
GJ

Re: Replication

From
Devrim GÜNDÜZ
Date:
On Tue, 2009-06-23 at 11:00 +0100, Grzegorz Jaśkiewicz wrote:
> > Either you did not use MySQL Replication, or you don't know what
> good
> > means...
>
> He is referring to user friendliness, and you to quality and
> reliability.

I *used* MySQL replication, and I do know what it means. It is not user
friendly, come on.

--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                   http://www.gunduz.org

Attachment

Re: Replication

From
Grzegorz Jaśkiewicz
Date:
On Tue, Jun 23, 2009 at 11:03 AM, Devrim GÜNDÜZ<devrim@gunduz.org> wrote:

> I *used* MySQL replication, and I do know what it means. It is not user
> friendly, come on.


Well, I used it too - it is. If you want to replicate _everything_.
Never had troubles with it, but that was database holding stuff for
Zope, rest of the crap was in Zope's "database" (which is replicated
by rsync).

But I had so many other issues with mysql, I gave up. It is so bad
now, that I refuse to work on contracts with mysql involved. (random
crashes, data loss, so many bugs, and odd behaviour, it allows
strange sql syntax, etc) - iow, I can honestly believe that they
screwed up replication too. Especially since their motto is, that the
thing must be easy enough for someone without SQL experience. Which
explains everything anyway :)



--
GJ

Re: Replication

From
Jasen Betts
Date:
On 2009-06-23, Craig Ringer <craig@postnewspapers.com.au> wrote:
>
> With Pg it'd break any existing connections, but any database
> application worth a damn must be able to handle re-issuing transactions
> due to deadlocks, resource exhaustion, admin statement cancellation etc
> anyway.

Any app that dies when the database socket fails will itself earn a
few "Damn!"s from users :)


Re: Replication

From
Gerry Reno
Date:
Craig Ringer wrote:
On Mon, 2009-06-22 at 22:20 -0400, Gerry Reno wrote:
 
Here is a link that describes the technique:
http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1   
Ah. You were referring to multiple-master replication, and your
reference to setting non-overlapping sequences referred to avoiding
collisions caused by inserts on two different masters. Yes, using
non-overlapping allocation ranges for sequences is indeed one way to
handle that, but it's not actually related to what I was talking about
anyway.


What I was referring to in the parent post was an issue with
statement-based replication of concurrent statements sharing a sequence.
It's completely unrelated; both statements are running on the SAME
server (master) and replicating to the slave. 
I think mysql actually turns off autoincrement on the slave inserts and just plugs them in.  Since the masters have non-overlapping allocation this assures that the slaves will have proper sequences as well.

Regards,
Gerry

Re: Replication

From
Merlin Moncure
Date:
2009/6/23 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
>> The only one that is remotely viable is slony and it is so quirky you
>> may as well forget it.
>
> Like what? I agree that Slony-I is not a plug-in-play replication
> solution, but I don't agree that it is so quirky.
>
>> The rest are in some stage of decay/abandonment. There is no real
>> replication available for postgresql.
>
> There are:
>
> http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling
>
> IIRC only Slony-II and PGCluster-II is abandoned. PGCluster-I is not
> up2date, but the others are being worked on.

People interested in so called 'plug and play' replication will
probably want to use 'hot standby' if/when it becomes available.
Unfortunately it did not make the cut for 8.4, but was very usable
during the cycle.  Hot standby gives you n read only copies of the
master database with very little fuss and no setup...it works off the
WAL system which is already giving us point in time backups (which are
already arguably better than anything mysql has, btw).

I expect hot standby will be introduced with postgresql 8.5, in both
sychronous and asychronous variants.  This will nicely round out
postgresql's replication options.  If you are
adventurous/enterprising, you could dig up the hot standby patch, get
it in line with the soon to be released 8.4, and play with it...it
works very well.

merlin

Re: Replication

From
Ray Stell
Date:
On Tue, Jun 23, 2009 at 10:24:28AM -0400, Merlin Moncure wrote:
> If you are
> adventurous/enterprising, you could dig up the hot standby patch, get
> it in line with the soon to be released 8.4, and play with it...it
> works very well.

What is the entry point for source and config documentation of the standby
patch?  Thanks.

Re: Replication

From
Scott Marlowe
Date:
On Mon, Jun 22, 2009 at 9:43 PM, Craig
Ringer<craig@postnewspapers.com.au> wrote:
> On Mon, 2009-06-22 at 21:12 -0600, Scott Marlowe wrote:
>> On Mon, Jun 22, 2009 at 8:59 PM, Craig
>> Ringer<craig@postnewspapers.com.au> wrote:
>>
>> > So ... it doesn't seem likely that statement-level replication would
>> > ever get far in Pg because of nasty issues like this one.
>>
>> It's exactly what pg_pool does, and you can choose it if you know what
>> you're doing.  But yes, it's usually a bad fit for replication by
>> itself.
>
> Ah - I didn't realise that pg_pool did statement-based inter-node
> replication; I was under the impression that it was primarily for
> distribution of read-only queries between multiple clone nodes. Am I
> confusing it and pgbouncer? I haven't had any need for tools intended to
> scale up client counts, and haven't looked into them much.

pgpool can work two ways really.  One way has a single write master,
then you use something like slony to replicate to slaves, and pgpool
can then read from all slaves, or it can do sync write and read to all
machines.

> In any case, I have a _bit_ less of a problem with the notion of
> statement-level replication when it's add-on software rather than a core
> part of the DB. To me, it seems like you should be able to trust the DB
> to just get it right, and not have to worry about whether something's
> safe with replication, etc.

Exactly.

> I'm a bit leery of Slony for related reasons - because it messes with
> the way Pg works to the point where, as you noted, you can't even drop
> tables. Clients have to be very wary of the replication system's quirks
> and issues, which I intensely dislike.

Later versions fixed that bug I believe, but we're still on 1.2, not 2.0.

>> Once I told the developers not to drop tables in order
>> to change them, things got better.  Really it was bad habits learned
>> from other dbs.
>
> I can understand working around such limitations, but it seems a stretch
> to call use of the DB's normal capabilities "bad habits". Well, unless
> you mean people using DROP TABLE / CREATE TABLE instead of just using
> ALTER TABLE to do the job - but even then, there are times when a drop
> and re-create is the preferable option in the absence of external
> limitations like Slony.

What the developers were doing was this:

drop table user_widgets;
create table user_widgets ...

i.e. they were using drop / create table where a simple alter table
add column would suffice.  That is definitely a bad habit in my book.
And keep in mind, the developers never got to run ddl against the
production db, they run it against the dev db, put it into the db dev
notes and then I collect them all and run them in the proper sequence
in slony.  Pain in the butt, but it gets the job done and works well
under heavy daily load.

Re: Replication

From
Scott Marlowe
Date:
2009/6/23 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
>> The only one that is remotely viable is slony and it is so quirky you
>> may as well forget it.
>
> Like what? I agree that Slony-I is not a plug-in-play replication
> solution, but I don't agree that it is so quirky.
>
>> The rest are in some stage of decay/abandonment. There is no real
>> replication available for postgresql.
>
> There are:
>
> http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling
>
> IIRC only Slony-II and PGCluster-II is abandoned. PGCluster-I is not
> up2date, but the others are being worked on.

I wonder how many replication engines for MySQL never got built
because "it already has replication built in"?

Re: Replication

From
Merlin Moncure
Date:
On Tue, Jun 23, 2009 at 11:01 AM, Ray Stell<stellr@cns.vt.edu> wrote:
> On Tue, Jun 23, 2009 at 10:24:28AM -0400, Merlin Moncure wrote:
>> If you are
>> adventurous/enterprising, you could dig up the hot standby patch, get
>> it in line with the soon to be released 8.4, and play with it...it
>> works very well.
>
> What is the entry point for source and config documentation of the standby
> patch?  Thanks.

not sure what you mean by entry point?  getting it running is a snap
if you've already ever done a warm standby setup.  The hard part will
be patching it in.

merlin

Re: Replication

From
Mike Christensen
Date:
Hi Mike

thanks for your interest in rubyrep. I developed rubyrep. Let me
answer your questions.

On Jun 23, 4:16 pm, m...@kitchenpc.com (Mike Christensen) wrote:
> There will be a set of triggers for each replication.  Since MySql doesn't
> support more than one trigger on a table, this approach won't work which I
> guess is their way of saying "We're database independent, as long as you use
> either Postgres or MySql oh and btw we have no replication story above 2
> nodes on MySQL"
The first statement on the rubyrep project website says that rubyrep
provides database independent - currently supporting PostgreSQL and
MySQL - master-master replication [i. e. 2 databases].
That statement is 100% correct.
Regarding multi-master replication (i. e. more than 2 databases) the
according sub page says that it only works for PostgreSQL.
I intended both statements to be very clear and *not* misleading. If
you think they are not, I am interested in hearing your improvement
suggestions.

Hi Arndt - Sorry, my email was just being a little on the sarcastic side.  I don't think your statements are misleading, just a bit marketed.  To me, something "database independent" should be able to be configured to work on any DB, not just two.  And the fact that there's a pretty big limitation in one of those databases really limits your story for MySQL.  I can't say I'm too concerned, as like many others on this thread I'm not the biggest fan of MySQL (I tried it for about a day and kept saying "Wait, it can't do /that/?" too many times so I gave up)

I think unless you want to either develop a full replication story for MySQL or add support for other software such as IBM and Oracle, you might just want to focus all your energy on being the best Postgres replication story around and market as being just that.  It would definitely keep the support costs down having to deal with random bugs on mySQL and trying to please everybody.

>
> Also, if database C goes down, then everything goes kaboom, right?  Even if
> you did A replicates with B, B replicates with C, if one database goes down
> your chain is broken.  I'm worried about this scenario,
Assuming that A replicates with C and C with B then once C goes down,
indeed replication will stop.
However nothing goes "kaboom". All changes in either A or B are still
tracked in the according queue tables. As soon as C comes up again,
all pending changes will be replicated.
If let's say C goes down, then replication between A and B will not be
affected.
Same if A goes down: replication between B and C will continue.
>and any perf
> implications with having a whole bunch of triggers on a table.  Maybe
> someone can comment.
The triggers are designed to be as "lean" as possible. Actually all
they do is to add the primary key of a created / modified / deleted
row into the queue table. I don't think that this will cause any
performance impact.
(The actual work is done by the rubyrep process which applies all
changes. That process can be run on a totally different server to
avoid impacting the database server performance.)


Thanks for clarifying.  After sending the email I put some thought into what a true fault tolerant replication story would entale and it seems like it'd be a complete nightmare to design.  Does anyone actually have that (any node can go down and the others still replicate amongst themselves?)

On my system, if the master DB goes down and some data is out of date by a few minutes or an hour or so shouldn't cause too many problems so long as "half a transaction" isn't replicated or anything silly like that.  The only scenario I could come up with that would be annoying is if I user created a new account on the system, then switched over to another webserver/DB node and all of a sudden it said their user account didn't exist.

The "left/right" database idea I can see is a very straight forward architecture and easy to setup and maintain, I'm just worried if you had a /huge/ cluster (Like Facebook size) then it really puts a massive strain on your master DB, since with every piece of data that changes, keys have to be inserted in dozens of tables.  If seems if you could optimize this redundancy and have all the triggers using the same table, it might scale better.  However, I haven't done any conclusive studies to back this statement up :)

Keep up the good work!  One of these days I have to start looking into replication stories before my site goes live, and rubyrep will for sure be on the list.

Mike

Re: Replication

From
Thomas Kellerer
Date:
Mike Christensen wrote on 23.06.2009 19:37:
> Does anyone actually have that (any node can go down and the others still
> replicate amongst themselves?)

I think this is what Oracle promises with their RAC technology.

Thomas

Re: Replication

From
Ray Stell
Date:
On Tue, Jun 23, 2009 at 01:31:59PM -0400, Merlin Moncure wrote:
> not sure what you mean by entry point?  getting it running is a snap
> if you've already ever done a warm standby setup.  The hard part will
> be patching it in.

In my case, I'd need to know where the patch can be downloaded and where
it is documented.  Entry point, as in the best place to get started.

Re: Replication

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> It is true.  Otherwise show me a viable replication offering for
> postgresql that I can put into production and obtain support for it.

It depends on what you mean by "replication", but if you need
master-master (or even a really good master-multislave) you
can evaluate (and get support for) Bucardo:

http://bucardo.org

As with all replication systems, there are some things it does better
than other systems (i.e. MySQL's), and some things it does worse.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200906231414
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkpBG4wACgkQvJuQZxSWSsiIWQCg87Vy6iHu7ytx4yYeB311PG9u
KCQAn2bwerlEYXVtvFR8IFEINqblOTj1
=h7cE
-----END PGP SIGNATURE-----



Re: Replication

From
Glyn Astill
Date:
--- On Mon, 22/6/09, Gerry Reno <greno@verizon.net> wrote:

> Have you ever tried any of the postgresql replication
> offerings? The only one that is remotely viable is slony and
> it is so quirky you may as well forget it. The rest are in
> some stage of decay/abandonment. There is no real
> replication available for postgresql. Postgresql needs to
> develop a real replication offering for postgresql. Builtin
> or a separate module.
>

There was a similar thread on the Ferrari mailing list last week; some chap asking why the FFX didn't have a big red
buttonto "make the steering go light". Apparently it is too hard to drive, whereas the Fiat Punto is easy and has this
magictechnology. 

Seriously though, we use slony here in production and whilst it can be a pain in the arse at times it's a solid piece
ofkit. And bucardo, mammoth, londisite, pgpool are all good solutions - as long as you make yourself familiar with the
one(s)you choose. 

I've used the binlog streaming replication in mysql before, but I wouldn't trust it with my employer’s data.





Re: Replication

From
Emanuel Calvo Franco
Date:
2009/6/23 Greg Sabino Mullane <greg@turnstep.com>:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
>
>> It is true.  Otherwise show me a viable replication offering for
>> postgresql that I can put into production and obtain support for it.
>
> It depends on what you mean by "replication", but if you need
> master-master (or even a really good master-multislave) you
> can evaluate (and get support for) Bucardo:
>
> http://bucardo.org
>


There is another Replication Tool for Postgresql: PyReplica.

http://pgfoundry.org/projects/pyreplica/

The project is growing and have a very easy setup and administration.



--
              Emanuel Calvo Franco
ArPUG [www.arpug.com.ar] / AOSUG Member
        www.emanuelcalvofranco.com.ar

Re: Replication

From
Craig Ringer
Date:
Thomas Kellerer wrote:
> Mike Christensen wrote on 23.06.2009 19:37:
>> Does anyone actually have that (any node can go down and the others still
>> replicate amongst themselves?)
>
> I think this is what Oracle promises with their RAC technology.

Isn't RAC a shared-storage cluster?

--
Craig Ringer

Re: Replication

From
Scott Mead
Date:

On Tue, Jun 23, 2009 at 10:07 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
Thomas Kellerer wrote:
> Mike Christensen wrote on 23.06.2009 19:37:
>> Does anyone actually have that (any node can go down and the others still
>> replicate amongst themselves?)
>
> I think this is what Oracle promises with their RAC technology.

Isn't RAC a shared-storage cluster?

 Shared-

   Storage
   Memory
   Listener

   Storage : Obvious
   Memory: Cache fusion maintains buffer consistency across the network
   Listener: Load balancing and failover of connections are handled transparently by TNS (transparent network substrate).

--Scott 


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Replication

From
Dimitri Fontaine
Date:
Hi,

Gerry Reno <greno@verizon.net> writes:
>> You can also get support for Londiste (which
>> is used in production by Skype... I think that speaks for itself).
>
> Londiste is beta.  The fact that Skype uses it is because it's part of
> Skytools which is their product.  They may want to run their own beta stuff.
> I don't.

Yes, Londiste is beta, indeed. What it means is that the current stable
branch, skytools_2_1_stable, is not feature complete. It's production
ready and only receives bugfixes, and has been for several months.

I happen to run it on several production systems on quite different
contexts, and in some of them I could not use a synchronous replication
system, as it must tolerate network glitches: the service still has some
incoming queries when TCP/IP is down. The 30 subscribers are all keeping
up nicely, and they all are also provider to the central server which
federates the data.

The missing features of londiste 2.1 are being stabilized in next
skytools version, 3.0, of which an alpha release has been made, so that
you can easily check what it's worth.

Summary: yes londiste is beta, but that means it won't handle automatic
DDL scripting, cascading replication and automated failover. It's
stable, reliable, production quality code. I know it because I'm on call
every other week and in 2009 I'm yet to receive a Skytools related
alarm. I really hate it when people call me at nigth, that's one of the
reasons I use PostgreSQL and Skytools :)

Regards,
--
dim

Re: Replication

From
Thomas Kellerer
Date:
Craig Ringer wrote on 24.06.2009 04:07:
> Thomas Kellerer wrote:
>> Mike Christensen wrote on 23.06.2009 19:37:
>>> Does anyone actually have that (any node can go down and the others still
>>> replicate amongst themselves?)
>> I think this is what Oracle promises with their RAC technology.
>
> Isn't RAC a shared-storage cluster?

As far as I know it is also a cluster of servers which can even hand over the
processing of a single statement "in-flight" if the node goes down that is
current processing the statement.

Thomas

Re: Replication

From
Mike Christensen
Date:
We need to stop this thread, you guys are making me want to ditch Postgres and get Oracle (after taking out a second mortgage on my house that is)..

Mike

On Wed, Jun 24, 2009 at 9:40 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
Craig Ringer wrote on 24.06.2009 04:07:

Thomas Kellerer wrote:
Mike Christensen wrote on 23.06.2009 19:37:
Does anyone actually have that (any node can go down and the others still
replicate amongst themselves?)
I think this is what Oracle promises with their RAC technology.

Isn't RAC a shared-storage cluster?

As far as I know it is also a cluster of servers which can even hand over the processing of a single statement "in-flight" if the node goes down that is current processing the statement.

Thomas



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Replication

From
"Joshua D. Drake"
Date:
On Wed, 2009-06-24 at 12:13 -0700, Mike Christensen wrote:
> We need to stop this thread, you guys are making me want to ditch
> Postgres and get Oracle (after taking out a second mortgage on my
> house that is)..

You must have a really big house... I could send all of my kids to
college (4), easily for the cost of RAC.

Joshua d. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Replication

From
Geoffrey
Date:
Mike Christensen wrote:
> We need to stop this thread, you guys are making me want to ditch
> Postgres and get Oracle (after taking out a second mortgage on my house
> that is)..

You must have a REALLY nice house with a LOT of equity...

>
> Mike
>
> On Wed, Jun 24, 2009 at 9:40 AM, Thomas Kellerer <spam_eater@gmx.net
> <mailto:spam_eater@gmx.net>> wrote:
>
>     Craig Ringer wrote on 24.06.2009 04:07:
>
>         Thomas Kellerer wrote:
>
>             Mike Christensen wrote on 23.06.2009 19:37:
>
>                 Does anyone actually have that (any node can go down and
>                 the others still
>                 replicate amongst themselves?)
>
>             I think this is what Oracle promises with their RAC technology.
>
>
>         Isn't RAC a shared-storage cluster?
>
>
>     As far as I know it is also a cluster of servers which can even hand
>     over the processing of a single statement "in-flight" if the node
>     goes down that is current processing the statement.
>
>     Thomas
>
>
>
>     --
>     Sent via pgsql-general mailing list (pgsql-general@postgresql.org
>     <mailto:pgsql-general@postgresql.org>)
>     To make changes to your subscription:
>     http://www.postgresql.org/mailpref/pgsql-general
>
>


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

Re: Replication

From
Scott Marlowe
Date:
Not me.  I've fed and watered PostgreSQL for the better part of a
decade and did so for Oracle for three years.  At no point did I want
to switch preference from pgsql ro Oracle.  There were some things
Oracle did I wanted to see in pgsql, but the overall package of pgsql
is way better in my opinion.  And part of that is NOT putting things
in core before they're ready.

On Wed, Jun 24, 2009 at 1:13 PM, Mike Christensen<mike@kitchenpc.com> wrote:
> We need to stop this thread, you guys are making me want to ditch Postgres
> and get Oracle (after taking out a second mortgage on my house that is)..
>
> Mike
>
> On Wed, Jun 24, 2009 at 9:40 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>>
>> Craig Ringer wrote on 24.06.2009 04:07:
>>>
>>> Thomas Kellerer wrote:
>>>>
>>>> Mike Christensen wrote on 23.06.2009 19:37:
>>>>>
>>>>> Does anyone actually have that (any node can go down and the others
>>>>> still
>>>>> replicate amongst themselves?)
>>>>
>>>> I think this is what Oracle promises with their RAC technology.
>>>
>>> Isn't RAC a shared-storage cluster?
>>
>> As far as I know it is also a cluster of servers which can even hand over
>> the processing of a single statement "in-flight" if the node goes down that
>> is current processing the statement.
>>
>> Thomas
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>



--
When fascism comes to America, it will be intolerance sold as diversity.

Re: Replication

From
Greg Smith
Date:
On Tue, 23 Jun 2009, Ray Stell wrote:

> What is the entry point for source and config documentation of the standby
> patch?

http://wiki.postgresql.org/wiki/Hot_Standby

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Replication

From
Jasen Betts
Date:
On 2009-06-23, Mike Christensen <mike@kitchenpc.com> wrote:

> Does anyone actually have that (any node
> can go down and the others still replicate amongst themselves?)

you are describing usenet. (it was not designed for relational database
replication but it does have that feature)


Re: Replication

From
Dimitri Fontaine
Date:
Hi,

Le 25 juin 09 à 11:26, Jasen Betts a écrit :
> On 2009-06-23, Mike Christensen <mike@kitchenpc.com> wrote:
>
>> Does anyone actually have that (any node
>> can go down and the others still replicate amongst themselves?)
>
> you are describing usenet. (it was not designed for relational
> database
> replication but it does have that feature)

   http://postgres-r.org/about/about

   Reliability, Availability and Scalability
Mirroring the data on multiple nodes enhances reliability of the
database system. As Postgres-R is using a group communication system
it is easily possible to add or remove nodes on the running system.
Crashing nodes are automatically detected and removed, thus do not
affect the operating of the database system. This facilitates
administration tasks and ensures greater availability and scalability
of the database system.

Regards,
--
dim

Re: Replication

From
Gerry Reno
Date:
Dimitri Fontaine wrote:
> Hi,
>
> Le 25 juin 09 à 11:26, Jasen Betts a écrit :
>> On 2009-06-23, Mike Christensen <mike@kitchenpc.com> wrote:
>>
>>> Does anyone actually have that (any node
>>> can go down and the others still replicate amongst themselves?)
>>
>> you are describing usenet. (it was not designed for relational database
>> replication but it does have that feature)
>
>   http://postgres-r.org/about/about
>
>   Reliability, Availability and Scalability
> Mirroring the data on multiple nodes enhances reliability of the
> database system. As Postgres-R is using a group communication system
> it is easily possible to add or remove nodes on the running system.
> Crashing nodes are automatically detected and removed, thus do not
> affect the operating of the database system. This facilitates
> administration tasks and ensures greater availability and scalability
> of the database system.
>
> Regards,
> --dim
I think some disclosure is in order about postgres-r:

"Please note that this is *experimental* software. primarily released
for review by developers familiar with Postgres."

Looks interesting though...

Regards,
Gerry


Re: Replication

From
Simon Riggs
Date:
On Mon, 2009-06-22 at 17:53 -0400, Gerry Reno wrote:

> I noticed that the user survey on the community page does not list
> replication among the choices for development priority.  For me,
> replication is the most important thing that is critically missing from
> postgresql.  We need something as good as MySQL Replication.  Both
> statement-based and row-based replication.  And support for
> Master-Master and full cyclic replication setups.  Postgresql is just a
> toy database without this as far as I am concerned.

Just wanted to add thanks for putting this so succinctly. It's important
we listen to people that explain why PostgreSQL isn't good enough yet.
We'll do what we can in PostgreSQL 8.5.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Replication

From
Tim Uckun
Date:
Does anybody have any experience with tungsten or sequia they would
like to share?

How about pgcluster or cybercluster?

There are a lot of options but not a lot of insight or documentation really.


Also note that the postgres-r web site says it's not production ready
and it will take months if not years to do so.