Thread: Best replication solution?

Best replication solution?

From
Lists
Date:
I am looking to setup replication of my postgresql database, primarily for performance reasons.

The searching I've done shows a lot of different options, can anyone give suggestions about which one(s) are best? I've read the archives, but there seems to be more replication solutions since the last thread on this subject and it seems to change frequently.

I'd really like a solution that replicates DDL, but very few do so I think I'm out of luck for that. I can live without it.
Multi-master support would be nice too, but also seems to cause too many problems so it looks like I'll have to do without it too.


Slony-I - I've used this in the past, but it's a huge pain to work with, caused serious performance issues under heavy load due to long running transactions (may not be the case anymore, it's been a while since I used it on a large database with many writes), and doesn't seem very reliable (I've had replication break on me multiple times).

Mammoth Replicator - This is open source now, is it any good? It sounds like it's trigger based like Slony. Is it based on Slony, or simply use a similar solution?

pgpool - Won't work for us reliably for replication because we have some triggers and stored procedures that write data.

PGCluster - Sounds cool, but based on the mailing list traffic and the last news post on the site being from 2005, development seems to be near dead. Also, no releases seems to make it beyond the RC stage -- for multi-master stability is particularly important for data integrity.

PGReplicator - Don't know anything special about it.

Bucardo
- Don't know anything special about it.

Postgres-R - Don't know anything special about it.

SkyTools/Londiste - Don't know anything special about it.

Re: Best replication solution?

From
Lists
Date:
I have a high traffic database with high volumes of reads, and moderate volumes of writes. Millions of queries a day.

Running the latest version of Postgresql 8.2.x (I want to upgrade to 8.3, but the dump/reload requires an unacceptable amount of downtime)

Server is a dual core xeon 3GB ram and 2 mirrors of 15k SAS drives (1 for most data, 1 for wal and a few tables and indexes)

In total all databases on the server are about 10G on disk (about 2GB in pgdump format).


The IO on the disks is being maxed out and I don't have the budget to add more disks at this time. The web server has a raid10 of sata drives with some io bandwidth to spare so I would like to replicate all data over, and send some read queries to that server -- in particular the very IO intensive FTI based search queries.


ries van Twisk wrote:
Dr Mr No Name,

what replication solution is the best depends on your requirements.
May be you can tell a bit more what your situation is?
Since you didn't gave us to much information about your requirements it's hard to give you any advice.

Ries

On Apr 5, 2009, at 1:36 PM, Lists wrote:

I am looking to setup replication of my postgresql database, primarily for performance reasons.

The searching I've done shows a lot of different options, can anyone give suggestions about which one(s) are best? I've read the archives, but there seems to be more replication solutions since the last thread on this subject and it seems to change frequently.

I'd really like a solution that replicates DDL, but very few do so I think I'm out of luck for that. I can live without it.
Multi-master support would be nice too, but also seems to cause too many problems so it looks like I'll have to do without it too.


Slony-I - I've used this in the past, but it's a huge pain to work with, caused serious performance issues under heavy load due to long running transactions (may not be the case anymore, it's been a while since I used it on a large database with many writes), and doesn't seem very reliable (I've had replication break on me multiple times).

Mammoth Replicator - This is open source now, is it any good? It sounds like it's trigger based like Slony. Is it based on Slony, or simply use a similar solution?

pgpool - Won't work for us reliably for replication because we have some triggers and stored procedures that write data.

PGCluster - Sounds cool, but based on the mailing list traffic and the last news post on the site being from 2005, development seems to be near dead. Also, no releases seems to make it beyond the RC stage -- for multi-master stability is particularly important for data integrity.

PGReplicator - Don't know anything special about it.

Bucardo
- Don't know anything special about it.

Postgres-R - Don't know anything special about it.

SkyTools/Londiste - Don't know anything special about it.







Re: Best replication solution?

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Running the latest version of Postgresql 8.2.x (I want to upgrade to
> 8.3, but the dump/reload requires an unacceptable amount of downtime)

You can use Slony or Bucardo to ugrade in place. Both will incur some
overhead and more overall complexity than a dump/reload, but going to
8.3 is well worth it (and will bring your IO down).

> The IO on the disks is being maxed out and I don't have the budget to
> add more disks at this time. The web server has a raid10 of sata drives
> with some io bandwidth to spare so I would like to replicate all data
> over, and send some read queries to that server -- in particular the
> very IO intensive FTI based search queries.

Sounds like a good solution for a table-based, read-only-slaves solutions,
especially if you only need enough of the schema to perform some of the
more intense queries. Again, Slony and Bucardo are probably the best fit.
All this assumes that the tables in question have some sort of unique key,
you aren't using large objects, or changing DDL frequently. I'd give Slony a
second try and Bucardo a first one on your QA/test cluster and see how
they work out for you. You could even make the read-only slaves 8.3, since
they will be starting from scratch.

Of course, if the underlying problem replication is trying to solve is too
much search traffic (e.g. select queries) on the main database, there are other
solutions you could consider (e.g. external search such as Sphinx or SOLR,
caching solutions such as Squid or Varnish, moving the slaves to the cloud, etc.)

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200904052158
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAknZZMgACgkQvJuQZxSWSsjbcgCfWqTUEDGlDqAnLaCAhcJlSLCk
EVMAni0oCevrnMdZ2Fuw8Tysaxp3q+/U
=0vu6
-----END PGP SIGNATURE-----



Re: Best replication solution?

From
Heikki Linnakangas
Date:
Lists wrote:
> Server is a dual core xeon 3GB ram and 2 mirrors of 15k SAS drives (1
> for most data, 1 for wal and a few tables and indexes)
>
> In total all databases on the server are about 10G on disk (about 2GB in
> pgdump format).

I'd suggest buying as much RAM as you can fit into the server. RAM is
cheap, and with a database of that size more cache could have a dramatic
effect.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Best replication solution?

From
Andrew Sullivan
Date:
On Sun, Apr 05, 2009 at 11:36:33AM -0700, Lists wrote:

> *Slony-I* - I've used this in the past, but it's a huge pain to work
> with, caused serious performance issues under heavy load due to long
> running transactions (may not be the case anymore, it's been a while
> since I used it on a large database with many writes), and doesn't seem
> very reliable (I've had replication break on me multiple times).

It is indeed a pain to work with, but I find it hard to believe that
it is the actual source of performance issues.  What's more likely
true is that it wasn't tuned to your write load -- that _will_ cause
performance issues.  Of course, tuning it is a major pain, as
mentioned.  I'm also somewhat puzzled by the claim of unreliability:
most of the actual replication failures I've ever seen under Slony are
due to operator error (these are trivial to induce, alas --
aforementioned pain to work with again).  Slony is baroque and
confusing, but it's specifically designed to fail in safe ways (which
is not true of some of the other systems: several of them have modes
in which it's possible to have systems out of sync with each other,
but with no way to detect as much.  IMO, that's much worse, so we
designed Slony to fail noisily if it was going to fail at all).

> *Mammoth Replicator* - This is open source now, is it any good? It
> sounds like it's trigger based like Slony. Is it based on Slony, or
> simply use a similar solution?

It's completely unrelated, and it doesn't use triggers.  I think the
people programming it are first-rate.  Last I looked at it, I felt a
little uncomfortable with certain design choices, which seemed to me
to be a little hacky.  They were all on the TODO list, though.

> *SkyTools/Londiste* - Don't know anything special about it.

I've been quite impressed by the usability.  It's not quite as
flexible as Slony, but it has the same theory of operation.  The
documentation is not as voluminous, although it's also much handier as
reference material than Slony's (which is, in my experience, a little
hard to navigate if you don't already know the system pretty well).

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: Best replication solution?

From
Lists
Date:
I'm currently running 32bit FreeBSD so I can't really add more ram (PAE
doesn't work well under FreeBSD from what I've read) and there are
enough writes that more ram won't solve the problem completely.

However I will add plenty more ram next time I rebuild it.


Heikki Linnakangas wrote:
> Lists wrote:
>> Server is a dual core xeon 3GB ram and 2 mirrors of 15k SAS drives (1
>> for most data, 1 for wal and a few tables and indexes)
>>
>> In total all databases on the server are about 10G on disk (about 2GB
>> in pgdump format).
>
> I'd suggest buying as much RAM as you can fit into the server. RAM is
> cheap, and with a database of that size more cache could have a
> dramatic effect.
>


Re: Best replication solution?

From
Lists
Date:
Andrew Sullivan wrote:
On Sun, Apr 05, 2009 at 11:36:33AM -0700, Lists wrote:
 
*Slony-I* - I've used this in the past, but it's a huge pain to work  
with, caused serious performance issues under heavy load due to long  
running transactions (may not be the case anymore, it's been a while  
since I used it on a large database with many writes), and doesn't seem  
very reliable (I've had replication break on me multiple times).   
It is indeed a pain to work with, but I find it hard to believe that
it is the actual source of performance issues.  What's more likely
true is that it wasn't tuned to your write load -- that _will_ cause
performance issues.  
Can you point me in the direction of the documentation for tuning it? I don't see anything in the documentation for tuning for write load.

Of course, tuning it is a major pain, as
mentioned.  I'm also somewhat puzzled by the claim of unreliability:
most of the actual replication failures I've ever seen under Slony are
due to operator error (these are trivial to induce, alas --
aforementioned pain to work with again).  
Recently I had a problem with "duplicate key" errors on the slave, which shouldn't be possible since they keys are the same.
I've just noticed in the documentation that
The Duplicate Key Violation bug has helped track down a number of rather obscure PostgreSQL race conditions, so that in modern versions of Slony-I and PostgreSQL, there should be little to worry about.
so that may no longer be an issue. However I experienced with this the latest Slony (as of late last year) and Postgresql 8.3.

Also the dupe key error linked appears to be duplicate key of slony meta-data were as this was a duplicate key of one of my table's primary key.
Slony is baroque and
confusing, but it's specifically designed to fail in safe ways (which
is not true of some of the other systems: several of them have modes
in which it's possible to have systems out of sync with each other,
but with no way to detect as much.  IMO, that's much worse, so we
designed Slony to fail noisily if it was going to fail at all).   
An error is better than silently failing, but of course neither is optimal.

The slony project could really benefit from a simpler user interface and simpler documentation. It's integration into pgadminIII is a good step, but even with that it is still a bit of a pain so I hope it continues to improve in ease of use.

Being powerful and flexable is good, but ease of use with sensible defaults for complex items that can be easily overridden is even better.

 
*Mammoth Replicator* - This is open source now, is it any good? It  
sounds like it's trigger based like Slony. Is it based on Slony, or  
simply use a similar solution?   
It's completely unrelated, and it doesn't use triggers.  I think the
people programming it are first-rate.  Last I looked at it, I felt a
little uncomfortable with certain design choices, which seemed to me
to be a little hacky.  They were all on the TODO list, though.
 
*SkyTools/Londiste* - Don't know anything special about it.   
I've been quite impressed by the usability.  It's not quite as
flexible as Slony, but it has the same theory of operation.  The
documentation is not as voluminous, although it's also much handier as
reference material than Slony's (which is, in my experience, a little
hard to navigate if you don't already know the system pretty well).

A
 
Thanks, I'll look into both of those as well.

Re: Best replication solution?

From
Dimitri Fontaine
Date:
On Monday 06 April 2009 14:35:30 Andrew Sullivan wrote:
> > *SkyTools/Londiste* - Don't know anything special about it.
>
> I've been quite impressed by the usability.  It's not quite as
> flexible as Slony, but it has the same theory of operation.  The
> documentation is not as voluminous, although it's also much handier as
> reference material than Slony's (which is, in my experience, a little
> hard to navigate if you don't already know the system pretty well).

As a londiste user I find it really trustworthy solution, and very easy to use
and understand. We made some recent efforts on documentation front:
  http://wiki.postgresql.org/wiki/SkyTools
  http://wiki.postgresql.org/wiki/Londiste_Tutorial

Regards,
--
dim

Attachment

Re: Best replication solution?

From
Ivan Voras
Date:
Lists wrote:
> I'm currently running 32bit FreeBSD so I can't really add more ram (PAE
> doesn't work well under FreeBSD from what I've read)

That's probably left-over from the time many drivers were not 64-bit
friendly. I've yet to see a new configuration that doesn't work with PAE
(also, the default "PAE" configuration file is too conservative. Drivers
that work on amd64 should work on PAE without problems). In any case,
it's easy to try it - you can always boot the kernel.old.




Attachment

Re: Best replication solution?

From
Mark Kirkwood
Date:
Andrew Sullivan wrote:
> On Sun, Apr 05, 2009 at 11:36:33AM -0700, Lists wrote:
>
>
>> *Slony-I* - I've used this in the past, but it's a huge pain to work
>> with, caused serious performance issues under heavy load due to long
>> running transactions (may not be the case anymore, it's been a while
>> since I used it on a large database with many writes), and doesn't seem
>> very reliable (I've had replication break on me multiple times).
>>
>
>
> I'm also somewhat puzzled by the claim of unreliability:
> most of the actual replication failures I've ever seen under Slony are
> due to operator error (these are trivial to induce, alas --
> aforementioned pain to work with again).  Slony is baroque and
> confusing, but it's specifically designed to fail in safe ways (which
> is not true of some of the other systems: several of them have modes
> in which it's possible to have systems out of sync with each other,
> but with no way to detect as much.  IMO, that's much worse, so we
> designed Slony to fail noisily if it was going to fail at all).
>
>

 From my experience - gained from unwittingly being in the wrong place
at the wrong time and so being volunteered into helping people with
Slony failures - it seems to be quite possible to have nodes out of sync
and not be entirely aware of it - in addition to there being numerous
ways to shoot yourself in the foot via operator error. Complexity seems
to be the major evil here.

I've briefly experimented with Londiste, and it is certainly much
simpler to administer. Currently it lacks a couple of features Slony has
(chained slaves and partial DDL support), but I'll be following its
development closely - because if these can be added - whilst keeping the
operator overhead (and the foot-gun) small, then this looks like a winner.

regards

Mark


Re: Best replication solution?

From
Andrew Sullivan
Date:
On Mon, Apr 06, 2009 at 09:07:05PM -0700, Lists wrote:

> Can you point me in the direction of the documentation for tuning it? I
> don't see anything in the documentation for tuning for write load.

No, exactly.  As I said, it's a pain.  The main thing you need to do
is to make sure that your set size is just right for your workload.
The only way to get this right, unhappily, is trial and error and a
bunch of oral-tradition rules of thumb.  It's one of the weakest parts
of Slony from a user's point of view, IMO, but nobody's ever offered
to do the work to make really good tuning tools.

> Recently I had a problem with "duplicate key" errors on the slave, which
> shouldn't be possible since they keys are the same.
> I've just noticed in the documentation that
>
>    The Duplicate Key Violation
>    <http://www.slony.info/documentation/faq.html#DUPKEY> bug has helped
>    track down a number of rather obscure PostgreSQL race conditions, so
>    that in modern versions of Slony-I and PostgreSQL, there should be
>    little to worry about.
>
> so that may no longer be an issue. However I experienced with this the
> latest Slony (as of late last year) and Postgresql 8.3.

That problem was quite an old one.  "8.3" doesn't tell me very much,
but the issues should be covered there anyway.  It is of course
logically possible that there is a bug.  Often, however, the duplicate
key violations I've seen turn out to be from operator error.  There
are a _lot_ of sharp, pointy bits in Slony administration, and it's
nearly trivial to make an apparently innocuous error that causes you
this sort of big pain.

> Also the dupe key error linked appears to be duplicate key of slony
> meta-data were as this was a duplicate key of one of my table's primary
> key.

This really ought to be impossible -- Slony just speaks standard SQL
statements between nodes.  But I won't say there's no possible bug
there.  Your best bet is the Slony list.  It's a smallish community,
though, so you don't always get the response as fast as you want.

A
--
Andrew Sullivan
ajs@crankycanuck.ca

Re: Best replication solution?

From
Andrew Sullivan
Date:
On Tue, Apr 07, 2009 at 10:31:02PM +1200, Mark Kirkwood wrote:
>
> From my experience - gained from unwittingly being in the wrong place at
> the wrong time and so being volunteered into helping people with Slony
> failures - it seems to be quite possible to have nodes out of sync and
> not be entirely aware of it

I should have stated that differently.  First, you're right that if
you don't know where to look or what to look for, you can easily be
unaware of nodes being out of sync.  What's not a problem with Slony
is that the nodes can get out of internally consistent sync state: if
you have a node that is badly lagged, at least it represents, for
sure, an actual point in time of the origin set's history.  Some of
the replication systems aren't as careful about this, and it's
possible to get the replica into a state that never happened on the
origin.  That's much worse, in my view.

In addition, it is not possible that Slony's system tables report the
replica as being up to date without them actually being so, because
the system tables are updated in the same transaction as the data is
sent.  It's hard to read those tables, however, because you have to
check every node and understand all the states.

>  Complexity seems to be the major evil here.

Yes.  Slony is massively complex.

> simpler to administer. Currently it lacks a couple of features Slony has
> (chained slaves and partial DDL support), but I'll be following its
> development closely - because if these can be added - whilst keeping the
> operator overhead (and the foot-gun) small, then this looks like a
> winner.

Well, those particular features -- which are indeed the source of much
of the complexity in Slony -- were planned in from the beginning.
Londiste aimed to be simpler, so it would be interesting to see
whether those features could be incorporated without the same
complication.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: Best replication solution?

From
Mark Kirkwood
Date:
Andrew Sullivan wrote:
>
> I should have stated that differently.  First, you're right that if
> you don't know where to look or what to look for, you can easily be
> unaware of nodes being out of sync.  What's not a problem with Slony
> is that the nodes can get out of internally consistent sync state: if
> you have a node that is badly lagged, at least it represents, for
> sure, an actual point in time of the origin set's history.  Some of
> the replication systems aren't as careful about this, and it's
> possible to get the replica into a state that never happened on the
> origin.  That's much worse, in my view.
>
> In addition, it is not possible that Slony's system tables report the
> replica as being up to date without them actually being so, because
> the system tables are updated in the same transaction as the data is
> sent.  It's hard to read those tables, however, because you have to
> check every node and understand all the states.
>
>
>
Yes, and nicely explained!

> (on Londiste DDL + slave chaining)...
> Well, those particular features -- which are indeed the source of much
> of the complexity in Slony -- were planned in from the beginning.
> Londiste aimed to be simpler, so it would be interesting to see
> whether those features could be incorporated without the same
> complication.
>
>
>
>
Yeah, that's the challenge!

Personally I would like DDL to be possible without any special wrappers
or precautions, as the usual (accidental) breakage I end up looking at
in Slony is because someone (or an app's upgrade script) has performed
an ALTER TABLE directly on the master schema...

Cheers

Mark


Re: Best replication solution?

From
Marinos Yannikos
Date:
Heikki Linnakangas wrote:
> Lists wrote:
>> Server is a dual core xeon 3GB ram and 2 mirrors of 15k SAS drives (1
>> for most data, 1 for wal and a few tables and indexes)
>>
>> In total all databases on the server are about 10G on disk (about 2GB
>> in pgdump format).
>
> I'd suggest buying as much RAM as you can fit into the server. RAM is
> cheap, and with a database of that size more cache could have a dramatic
> effect.

I'll second this. Although it doesn't really answer the original
question, you have to keep in mind that for read-intensive workloads,
caching will give you the biggest benefit by far, orders of magnitude
more than replication solutions unless you want to spend a lot of $ on
hardware (which I take it you don't if you are reluctant to add new
disks). Keeping the interesting parts of the DB completely in RAM makes
a big difference, common older (P4-based) Xeon boards can usually be
upgraded to 12-16GB RAM, newer ones to anywhere between 16 and 192GB ...

As for replication solutions - Slony I wouldn't recommend (tried it for
workloads with large writes - bad idea), but PgQ looks very solid and
you could either use Londiste or build your own very fast non-RDBMS
slaves using PgQ by keeping the data in an optimized format for your
queries (e.g. if you don't need joins - use TokyoCabinet/Berkeley DB).

Regards,
  Marinos


Re: Best replication solution?

From
Jeff
Date:
On Apr 7, 2009, at 1:18 PM, Andrew Sullivan wrote:

> I should have stated that differently.  First, you're right that if
> you don't know where to look or what to look for, you can easily be
> unaware of nodes being out of sync.  What's not a problem with Slony

_$cluster.sl_status on the origin is a very handy tool to see your
slaves, how many sync's behind they are and whatnot.  Maybe I'm lucky,
but I haven't got into a funky state that didn't cause my alarms that
watch sl_status to go nuts.

>> Complexity seems to be the major evil here.
>
> Yes.  Slony is massively complex.
>

Configuring slony by hand using slonik commands does suck horribly.
But the included altperl tools that come with it, along with
slon_tools.conf removes a HUGE amount of that suck.

To add a table with a pk you edit slon_tools.conf and add something
along the lines of:

"someset" => {
    "set_id" => 5,
    "table_id" => 5,
    "pkeyedtables" => [ "tacos", "burritos", "gorditas" ]
}

then you just run

[create tables on slave(s)]
slonik_create_set someset;
slonik_subscribe_set 1 2;

there are other handy scripts in there as well for failing over,
adding tables, merging, etc. that hide a lot of the suck.  Especially
the suck of adding a node and creating the store paths.

I'm running slony on a rather write intensive system, works fine, just
make sure you've got beefy IO.  One sucky thing though is if a slave
is down sl_log can grow very large (I've had it get over 30M rows, the
slave was only down for hours) and this causes major cpu churn while
the queries slon issues sift through tons of data.  But, to be fair,
that'll hurt any replication system.

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/




Re: Best replication solution?

From
Dimitri Fontaine
Date:
Hi,

Ok I need to answer some more :)

Le 8 avr. 09 à 20:20, Jeff a écrit :
> To add a table with a pk you edit slon_tools.conf and add something
> along the lines of:
>
> "someset" => {
>     "set_id" => 5,
>     "table_id" => 5,
>     "pkeyedtables" => [ "tacos", "burritos", "gorditas" ]
> }
>
> then you just run
>
> [create tables on slave(s)]
> slonik_create_set someset;
> slonik_subscribe_set 1 2;

  $ londiste.py setup.ini provider add schema.table
  $ londiste.py setup.ini subscriber add schema.table

Note both of those commands are to be run from the same host (often
enough, the slave), if you have more than one slave, issue the second
of them only on the remaining ones.

> there are other handy scripts in there as well for failing over,
> adding tables, merging, etc. that hide a lot of the suck.
> Especially the suck of adding a node and creating the store paths.

There's no set in Londiste, so you just don't manage them. You add
tables to queues (referencing the provider in fact) and the subscriber
is free to subscribe to only a subset of the provider queue's tables.
And any table could participate into more than one queue at any time
too, of course.

> I'm running slony on a rather write intensive system, works fine,
> just make sure you've got beefy IO.  One sucky thing though is if a
> slave is down sl_log can grow very large (I've had it get over 30M
> rows, the slave was only down for hours) and this causes major cpu
> churn while the queries slon issues sift through tons of data.  But,
> to be fair, that'll hurt any replication system.

This could happen in Londiste too, just set pgq_lazy_fetch to a
reasonable value and Londiste will use a cursor to fetch the events,
lowering the load. Events are just tuples in an INSERT only table,
which when not used anymore is TRUNCATEd away. PGQ will use 3 tables
where to store events and will rotate its choice of where to insert
new envents, allowing to use TRUNCATE rather than DELETE. And
PostgreSQL is quite efficient to manage this :)
   http://wiki.postgresql.org/wiki/Londiste_Tutorial#Londiste_is_eating_all_my_CPU_and_lag_is_raising


Oh and some people asked what Londiste with failover and DDL would
look like. Here's what the API being cooked looks like at the moment:
  $ londiste setup.ini execute myddl.script.sql

  $ londiste conf/londiste_db3.ini change-provider --provider=rnode1
  $ londiste conf/londiste_db1.ini switchover --target=rnode2

But I'm not the one who should be unveiling all of this, which is
currently being prepared to reach alpha soon'ish.

Regards,
--
dim


Re: Best replication solution?

From
Jeff
Date:
On Apr 8, 2009, at 4:46 PM, Dimitri Fontaine wrote:

>
> $ londiste.py setup.ini provider add schema.table
> $ londiste.py setup.ini subscriber add schema.table
>

That is nice.  One could probably do that for slony too.

I may try some tests out with londiste.. I'm always open to new
(ideally, better) things.
>
> This could happen in Londiste too, just set pgq_lazy_fetch to a
> reasonable value and Londiste will use a cursor to fetch the events,
> lowering the load. Events are just tuples in an INSERT only table,
> which when not used anymore is TRUNCATEd away. PGQ will use 3 tables
> where to store events and will rotate its choice of where to insert
> new envents, allowing to use TRUNCATE rather than DELETE. And
> PostgreSQL is quite efficient to manage this :)
>  http://wiki.postgresql.org/wiki/Londiste_Tutorial#Londiste_is_eating_all_my_CPU_and_lag_is_raising
>

Well, Slony always uses a cursor to fetch, the problem is it may have
to slog through millions of rows to find the new data - I've analyzed
the queries and there isn't much it can do -  lots of calls to the
xxid_ functions to determine whats to be used, whats not to be used.
When all slaves have a sync event ack'd it is free to be removed by
the cleanup routine which is run every few minutes.

>
> Oh and some people asked what Londiste with failover and DDL would
> look like. Here's what the API being cooked looks like at the moment:
> $ londiste setup.ini execute myddl.script.sql
>
> $ londiste conf/londiste_db3.ini change-provider --provider=rnode1
> $ londiste conf/londiste_db1.ini switchover --target=rnode2
>

ok, so londiste can't do failover yet, or is it just somewhat
convoluted at this point?

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/