Thread: Integrating Replication into Core

Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 14:23:46

Hi,

[ moving to -hackers, that seems more appropriate. ]

Jeff Davis wrote:
> If there is some great replication solution that a lot of people need
> and it will only work with a change to core, that change might make it
> in.

That's what I'm saying. Although it's hypothetical.

> However, there may not be nifty syntax changes nor GUCs in core to
> support a specific implementation of a replicator.

I'd love to get into that one. Some of the people who have attended my 
talk at the summit might know that I've introduced the following syntax 
to Postgres-R:

ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs;

And I'm using the system catalogs to store replication settings. What's 
so wrong with that?

Joshua D. Drake wrote:
> There is definitely another reason though :). Adding a replication
> solution that is integrated *will* increase development overhead in
> terms of support.

Sure. It's an additional feature after all. Refusing to add stuff to 
core because it increases development overhead certainly is a dead end.

> Replication touches (a lot) of places.

Yes, that's exactly why I'm going the integrated way with Postgres-R.
:-)

Regards

Markus

Re: Integrating Replication into Core

From

Alvaro Herrera

Date:

22 November 2006, 14:49:33

Markus Schiltknecht wrote:

> >However, there may not be nifty syntax changes nor GUCs in core to
> >support a specific implementation of a replicator.
> 
> I'd love to get into that one. Some of the people who have attended my 
> talk at the summit might know that I've introduced the following syntax 
> to Postgres-R:
> 
> ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs;
> 
> And I'm using the system catalogs to store replication settings. What's 
> so wrong with that?

I don't know if there's anything wrong, but in Mammoth Replicator, the
syntax to enable replication of a single table is 

ALTER TABLE foo ENABLE REPLICATION 

and we store the replication settings in system catalogs as well.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 14:56:16

Hi,

Alvaro Herrera wrote:
> I don't know if there's anything wrong, but in Mammoth Replicator, the
> syntax to enable replication of a single table is 
> 
> ALTER TABLE foo ENABLE REPLICATION 
> 
> and we store the replication settings in system catalogs as well.

Oh, that's nice to know.

Regards

Markus

Re: Integrating Replication into Core

From

Andrew Dunstan

Date:

22 November 2006, 14:59:21

Alvaro Herrera wrote:
> Markus Schiltknecht wrote:
>
>   
>>> However, there may not be nifty syntax changes nor GUCs in core to
>>> support a specific implementation of a replicator.
>>>       
>> I'd love to get into that one. Some of the people who have attended my 
>> talk at the summit might know that I've introduced the following syntax 
>> to Postgres-R:
>>
>> ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs;
>>
>> And I'm using the system catalogs to store replication settings. What's 
>> so wrong with that?
>>     
>
> I don't know if there's anything wrong, but in Mammoth Replicator, the
> syntax to enable replication of a single table is 
>
> ALTER TABLE foo ENABLE REPLICATION 
>
> and we store the replication settings in system catalogs as well.
>
>   

Wasn't there supposed to be some discussion among replication authors to 
try to come up with at least some common hooks?

If everybody invents their own grammar, GUC vars, etc. etc. it will be 
impossible to handle down the track. We'd be faced with a choice of 
never having any replication in core, or picking one and leaving the 
others out in the cold. This is supposed to be a *community*.

cheers

andrew

Re: Integrating Replication into Core

From

"Jonah H. Harris"

Date:

22 November 2006, 15:07:12

On 11/22/06, Andrew Dunstan <andrew@dunslane.net> wrote:
> Wasn't there supposed to be some discussion among replication authors to
> try to come up with at least some common hooks?

That was my understanding as well.

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

22 November 2006, 15:10:07

> Wasn't there supposed to be some discussion among replication authors to 
> try to come up with at least some common hooks?

Well yes, but as far as I know that never happened, and we have been
implementing the new version with the above syntax for a year and our
GUC variables have been around for over 4 years.

> 
> If everybody invents their own grammar, GUC vars, etc. etc. it will be 
> impossible to handle down the track. We'd be faced with a choice of 
> never having any replication in core, or picking one and leaving the 
> others out in the cold. This is supposed to be a *community*.

Agreed.

Joshua D. Drake

> 
> cheers
> 
> andrew
> 
-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: Integrating Replication into Core

From

Jeff Davis

Date:

22 November 2006, 15:11:51

On Wed, 2006-11-22 at 19:23 +0100, Markus Schiltknecht wrote:
> Hi,
> 
> [ moving to -hackers, that seems more appropriate. ]
> 
> Jeff Davis wrote:
> > If there is some great replication solution that a lot of people need
> > and it will only work with a change to core, that change might make it
> > in.
> 
> That's what I'm saying. Although it's hypothetical.
> 
> > However, there may not be nifty syntax changes nor GUCs in core to
> > support a specific implementation of a replicator.
> 
> I'd love to get into that one. Some of the people who have attended my 
> talk at the summit might know that I've introduced the following syntax 
> to Postgres-R:
> 
> ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs;
> 
> And I'm using the system catalogs to store replication settings. What's 
> so wrong with that?
> 

Nothing's wrong with that approach. My prediction, however, is that:

(1) Similar replication solutions will first agree on some common hooks
they need in the backend that may have no actual SQL syntax associated,
and get patches in
(2) then agree on some implementation details
(3) then agree on the syntax

To talk about getting syntax in the backend now seems like putting the
cart before the horse, to me anyway. But there's nothing wrong with
having SQL syntax for the replication. 

Regards,
Jeff Davis

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 15:21:32

Hi,

Andrew Dunstan wrote:
> Wasn't there supposed to be some discussion among replication authors to 
> try to come up with at least some common hooks?

Yes, Andrew Sullivan even opened a PgFoundry project and a mailing list. 
But up to now, only the GORDA project has proposed some hooks.

For Postgres-R, I definitely don't want to settle on any hooks yet, 
because I want to stay flexible. Hooks would only get in my way and 
serve no purpose.

> If everybody invents their own grammar, GUC vars, etc. etc. it will be 
> impossible to handle down the track. 

Why is that? I can very well change all of the configuration stuff; I 
just don't see any use for that.

> We'd be faced with a choice of 
> never having any replication in core, or picking one and leaving the 
> others out in the cold. 

...or wait for *the one* superior set of hooks we never can come up with?

Remember that the problem in replication is not interfacing with the 
database. That can and has been solved in multiple different ways. And 
interfaces can change (especially as long as they are still part of 
experimental software).

Regards

Markus

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 15:23:53

Hi,

Joshua D. Drake wrote:
> Well yes, but as far as I know that never happened, and we have been
> implementing the new version with the above syntax for a year and our
> GUC variables have been around for over 4 years.

Sorry, new version of what? what GUC variables?

Regards

Markus

Re: Integrating Replication into Core

From

"Simon Riggs"

Date:

22 November 2006, 15:27:25

On Wed, 2006-11-22 at 19:23 +0100, Markus Schiltknecht wrote:

> Jeff Davis wrote:
> > If there is some great replication solution that a lot of people need
> > and it will only work with a change to core, that change might make it
> > in.
> 
> That's what I'm saying. Although it's hypothetical.

My interest is in extending Warm Standby [8.2] to include the following
forms of replication:
1. asynchronous WAL-record level transfer to Standby server
2. synchronous WAL-record level transfer to Standby server
I expect that this would require some improvements
in Group Commit, but I've not done the design for this *yet*.
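
As a rough illustration of the difference between the two modes, here is a minimal Python sketch. It is not PostgreSQL code: the `Standby` and `Primary` classes and the `synchronous` flag are hypothetical names invented for this example. The only point it makes is that a synchronous commit returns after the standby has the record, while an asynchronous commit returns before.

```python
from collections import deque


class Standby:
    """Toy standby that stores the WAL records it has received."""

    def __init__(self):
        self.received = []

    def receive(self, record):
        self.received.append(record)


class Primary:
    """Toy primary; `synchronous` is a hypothetical switch for this sketch."""

    def __init__(self, standby, synchronous):
        self.standby = standby
        self.synchronous = synchronous
        self.pending = deque()  # records not yet shipped to the standby

    def commit(self, record):
        if self.synchronous:
            # Synchronous: ship the record before reporting the commit.
            self.standby.receive(record)
        else:
            # Asynchronous: report the commit now, ship the record later.
            self.pending.append(record)
        return "committed"

    def ship_pending(self):
        """Deliver all queued records to the standby, oldest first."""
        while self.pending:
            self.standby.receive(self.pending.popleft())
```

In the asynchronous case a crash of the primary before `ship_pending()` runs would lose the queued records on the standby, which is the trade-off the synchronous mode removes at the cost of commit latency.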

I would also like to include some performance optimisations into Core
that are specifically aimed at improving Slony performance. (I'm more
than happy if those things also increase performance of other
situations). That's a slightly different thing from embedding Slony in Core,
which I am *not* suggesting. Suggestions welcome.

This will then give PostgreSQL:
- improved performance for the most popular production replication
system for PostgreSQL (Slony)
- a capability for Synchronous Replication, when it is requested

That's the limit of my ambitions for 8.3.

Personally, I won't be investing time in multi-master solutions for a
host of reasons; please just regard that as a personal time allocation
decision rather than a suggestion to prevent others from doing so.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

22 November 2006, 15:27:32

On Wed, 2006-11-22 at 20:23 +0100, Markus Schiltknecht wrote:
> Hi,
> 
> Joshua D. Drake wrote:
> > Well yes, but as far as I know that never happened, and we have been
> > implementing the new version with the above syntax for a year and our
> > GUC variables have been around for over 4 years.
> 
> Sorry, new version of what? what GUC variables?

Our new version of replicator (1.7) has been in development for a year
and that is the version that supports ALTER TABLE.

The GUC variables we use have mostly been static for years. 1.7 has some
clean up etc..

Sincerely,

Joshua D. Drake


> 
> Regards
> 
> Markus
> 

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 15:31:34

Hi,

Jeff Davis wrote:
> Nothing's wrong with that approach. My prediction, however, is that:
> 
> (1) Similar replication solutions will first agree on some common hooks
> they need in the backend that may have no actual SQL syntax associated,
> and get patches in

Well, before that, you need to know what hooks you need. And that again 
involves lots of implementation details. Thus better first implement 
without hooks, otherwise you might later notice that there is something 
you didn't think of.

> (2) then agree on some implementation details
> (3) then agree on the syntax
> 
> To talk about getting syntax in the backend now seems like putting the
> cart before the horse, to me anyway. 

That was just an example. Postgres-R actually already does a lot behind 
the scenes if you type that command. So, yes, the horse definitely came 
before the cart.

> But there's nothing wrong with
> having SQL syntax for the replication. 

Okay. After reading the tsearch2 discussion I got another feeling, but 
that might just have been me.

Regards

Markus

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 15:35:35

Hi,

Joshua D. Drake wrote:
>> Joshua D. Drake wrote:
>>> Well yes, but as far as I know that never happened, and we have been
>>> implementing the new version with the above syntax for a year and our
>>> GUC variables have been around for over 4 years.
>> Sorry, new version of what? what GUC variables?
> 
> Our new version of replicator (1.7) has been in development for a year
> and that is the version that supports ALTER TABLE.
> 
> The GUC variables we use have mostly been static for years. 1.7 has some
> clean up etc..

Aha. Well, could you name the places where you'd need hooks? Would you 
like to use hooks? What purpose would that serve you?

Regards

Markus

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

22 November 2006, 15:39:53

On Wed, 2006-11-22 at 20:35 +0100, Markus Schiltknecht wrote:
> Hi,
> 
> Joshua D. Drake wrote:
> >> Joshua D. Drake wrote:
> >>> Well yes, but as far as I know that never happened, and we have been
> >>> implementing the new version with the above syntax for a year and our
> >>> GUC variables have been around for over 4 years.
> >> Sorry, new version of what? what GUC variables?
> > 
> > Our new version of replicator (1.7) has been in development for a year
> > and that is the version that supports ALTER TABLE.
> > 
> > The GUC variables we use have mostly been static for years. 1.7 has some
> > clean up etc..
> 
> Aha. Well, could you name the places where you'd need hooks? Would you 
> like to use hooks? What purpose would that serve you?

I would be the wrong person to ask; however, I can say that I don't see a
need for the hooks. If we (the community) somehow created some
reasonable generic interface, we would likely make use of it, but other
than that, I am happy with how we are doing it.

Sincerely,

Joshua D. Drake



Re: Integrating Replication into Core

From

Andrew Dunstan

Date:

22 November 2006, 15:47:35

Markus Schiltknecht wrote:
>
>> But there's nothing wrong with
>> having SQL syntax for the replication. 
>
> Okay. After reading the tsearch2 discussion I got another feeling, but 
> that might just have been me.
>

The objection then was that about 8 or 9 new commands were proposed, and 
that a functional interface might be just as good. 

What sort of grammar support do you want?

cheers

andrew

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 15:58:39

Hi,

Andrew Dunstan wrote:
> What sort of grammar support do you want?

Support? I would have just extended the bison gram.y myself. :-)

I don't yet know what I will need. I'll probably have to add settings 
per database, some per table, others per transaction. I thought about 
some additions to existing ALTER DATABASE and ALTER TABLE commands as 
well as some SET variables, probably within the syntax of SET TRANSACTION...

Stuffing them into such a syntax seems more consistent to me than using 
function calls.
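
To picture how settings at the three levels mentioned here might combine, a small Python sketch follows. It is purely illustrative: the function name, the setting names, and above all the precedence (transaction over table over database) are assumptions of this sketch, not something Postgres-R is stated to do.

```python
def effective_setting(name, database, table=None, transaction=None):
    """Return the value of `name` from the narrowest scope that defines it.

    Each scope is a plain dict of settings. The lookup order
    (transaction, then table, then database) is an assumption.
    """
    for scope in (transaction, table, database):
        if scope is not None and name in scope:
            return scope[name]
    raise KeyError(name)


# Hypothetical setting names, for illustration only.
db_settings = {"replicate": True, "mode": "eager"}
table_settings = {"mode": "lazy"}
print(effective_setting("mode", db_settings, table_settings))
```

A functional interface would need the same resolution rule; the difference argued above is only in how the settings are written down by the user.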

Regards

Markus

Re: Integrating Replication into Core

From

Jeff Davis

Date:

22 November 2006, 18:34:19

On Wed, 2006-11-22 at 20:31 +0100, Markus Schiltknecht wrote:
> Hi,
> 
> Jeff Davis wrote:
> > Nothing's wrong with that approach. My prediction, however, is that:
> > 
> > (1) Similar replication solutions will first agree on some common hooks
> > they need in the backend that may have no actual SQL syntax associated,
> > and get patches in
> 
> Well, before that, you need to know what hooks you need. And that again 
> involves lots of implementation details. Thus better first implement 
> without hooks, otherwise you might later notice that there is something 
> you didn't think of.

I think you misunderstand my point. I was talking about replication
implementations that already exist. They already have patches on the
backend that are necessary for their solution to work.

The idea is to design a single set of hooks that can be used to
implement an entire class of replication. This only makes sense after
existing solutions come to some agreement. I view that as a first step,
assuming that it is necessary to alter the core in order to implement
the class of replication in question.

Once that step is complete, ideally you'd be able to implement Postgres-
R without having to patch the postgresql backend to accomplish it
(except for maybe adding the syntax for your solution). Then, when a
syntax is agreed upon, you won't need to patch the backend at all. Isn't
that the goal, to be able to implement your replication without patching
the backend?

Regards,
Jeff Davis

Re: Integrating Replication into Core

From

Alvaro Herrera

Date:

22 November 2006, 18:39:17

Andrew Dunstan wrote:

> Wasn't there supposed to be some discussion among replication authors to 
> try to come up with at least some common hooks?
> 
> If everybody invents their own grammar, GUC vars, etc. etc. it will be 
> impossible to handle down the track. We'd be faced with a choice of 
> never having any replication in core, or picking one and leaving the 
> others out in the cold. This is supposed to be a *community*.

I don't have the expectation that Mammoth Replicator will ever be
open-sourced (this is my personal opinion; the company owner may
differ).  And even if it were, I doubt it would serve as a basis for
whatever community effort to build a replication engine.  I don't think
it's in anybody's best interest to base design decisions on Mammoth
Replicator "experience".  The projects that are already open source are
in a much better standing for that (GORDA, Postgres-R, Slony, etc).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

22 November 2006, 18:47:25

Hi,

[ copying back to -hackers. ]

Andrew Dunstan wrote:
> You have totally misunderstood me. I mean, what sort of grammar changes 
> do you need. Oh, and *you* might submit changes, but *we* have to 
> support them 

Agreed, as long as *we* includes me. I'm not reading it like that, but I 
don't know how you meant it.

> if they go into the core.

Sure, if...

I think the group of developers working on PostgreSQL can be extended, 
by accepting patches, in the hope that the original authors keep 
supporting it - especially because it's easy to revert patches again. 
Not accepting extensions because the current group thinks they can't 
support it won't help in attracting more developers and enlarging that 
group.

I've been carrying Postgres-R along since 7.4, for example. You can be very sure 
that I won't drop it because it got into core. (Bad example, because 
this is not going to happen anytime soon, if at all.)

Regards

Markus

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

23 November 2006, 03:50:21

Hi,

Jeff Davis wrote:
> I think you misunderstand my point.

That may well be. Please keep in mind that I'm not a native English 
speaker, thus please speak loud and clear ;-)

> I was talking about replication
> implementations that already exist. They already have patches on the
> backend that are necessary for their solution to work.

Do they? I'm only aware of the GORDA patch. The old Postgres-R patches 
are out of date. Sequoia, PgPool and PgPool-II obviously do not need 
patches. Slony-II, Postgres-R (8) (mine) as well as PGCluster-II are not 
open sourced, yet. And I haven't heard much regarding hooks from any of 
the proprietary vendors (except Joshua's recent statement that he's 
happy without such hooks).

> The idea is to design a single set of hooks that can be used to
> implement an entire class of replication. This only makes sense after
> existing solutions come to some agreement. I view that as a first step,
> assuming that it is necessary to alter the core in order to implement
> the class of replication in question.

As there's not even *one* existing and open replication solution which 
needs patching the backend, you are basing your statements on a false 
premise. Thus, speaking of hooks as a "first step" is very confusing, at 
least.

> Once that step is complete, ideally you'd be able to implement Postgres-
> R without having to patch the postgresql backend to accomplish it
> (except for maybe adding the syntax for your solution). Then, when a
> syntax is agreed upon, you won't need to patch the backend at all. Isn't
> that the goal, to be able to implement your replication without patching
> the backend?

No, it's not. What would that buy me? My goal is to write a widely 
usable replication system. How that interacts with the backend is of 
much less importance to me. And currently fiddling with the backend is 
much easier than maintaining hooks and keeping all the replication stuff 
separate.

Postgres-R can be one of the solutions used to decide what hooks we 
need. Waiting for hooks to establish before implementing Postgres-R 
would be what you call 'putting the cart before the horse'.

Regards

Markus

Re: Integrating Replication into Core

From

José Orlando Pereira

Date:

23 November 2006, 07:14:41

On Wednesday 22 November 2006 7:21 pm, Markus Schiltknecht wrote:
>
> Yes, Andrew Sullivan even opened a PgFoundry project and a mailing list.
> But up to now, only the GORDA project has proposed some hooks.
>
> For Postgres-R, I definitely don't want to settle on any hooks yet,
> because I want to stay flexible. Hooks would only get in my way and
> serve no purpose.
>
> > If everybody invents their own grammar, GUC vars, etc. etc. it will be
> > impossible to handle down the track.
>
> Why is that? I can very well change all of the configuration stuff; I
> just don't see any use for that.

Indeed, we in GORDA have also come up with yet another set of changes to 
grammar and GUC variables. This is not the ideal scenario. :-(

I understand that different people have different motives not to agree with 
the GORDA-style hook based approach. Therefore, I suggest that we try to 
agree on small but sure steps. The worst outcome of this would be that we all 
end up with smaller patches to maintain...

The configuration stuff seems to be a good place to start. What about each of 
us summarizing their changes to grammar and GUC to the hooks list to get the 
discussion started on a solid ground?

BTW, we have released a new version of the GORDA platform. This version has 
been completely rewritten (reusing some code from PL-J) and has a lot more 
functionality. The announcement will follow shortly.

-- 
Jose Orlando Pereira

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

23 November 2006, 07:46:45

Hello Jose,

José Orlando Pereira wrote:
> Indeed, we in GORDA have also come up with yet another set of changes to 
> grammar and GUC variables. This is not the ideal scenario. :-(
> 
> I understand that different people have different motives not to agree with 
> the GORDA-style hook based approach. Therefore, I suggest that we try to 
> agree on small but sure steps. 

I appreciate your efforts to come up with hooks. But as I've already 
stated, I'm not ready to settle on concrete hooks for Postgres-R 
(8), so I probably can't help.

> The worst outcome of this would be that we all 
> end up with smaller patches to maintain...

Do you really maintain patches? I'm maintaining a source tree and I'd 
like to keep it that way, as of now.

I'd rather work together in other areas. For example, what do 
you use for testing? I've read that the Sequoia people use their 
home-grown (and closed source) test suite. I'm about to write the third 
generation of my own test suite...

For simulations, I'm using qemu, sometimes also trying Xen, but that 
does not run on my laptop. :-(

Perhaps we can share test suites, or even automated benchmarks? IMO, we 
would gain a whole lot more with that than with hooks.

Regards

Markus

Re: Integrating Replication into Core

From

José Orlando Pereira

Date:

23 November 2006, 08:32:24

On Thursday 23 November 2006 11:46 am, Markus Schiltknecht wrote:
> I appreciate your efforts to come up with hooks.

Thank you. :-)

> But as I've already 
> stated, I'm not ready to settle on concrete hooks for Postgres-R
> (8), so I probably can't help.

Sure, I know that you don't like hooks.

I just suggested that we should compare *interfaces* to configure replication 
(i.e. variable names, grammar, etc), since it looks like we have a bunch of 
different syntaxes to achieve the same.

It might turn out that there is no common ground, but it is worth trying it.

> I'd rather work together in other areas. For example, what do
> you use for testing? I've read that the Sequoia people use their
> home-grown (and closed source) test suite. I'm about to write the third
> generation of my own test suite...

It is somewhat difficult to share a test-suite if we have to maintain multiple 
versions of the code that sets up the replicated db. 

See the point? ;)

> > The worst outcome of this would be that we all
> > end up with smaller patches to maintain...
>
> Do you really maintain patches? I'm maintaining a source tree and I'd
> like to keep it that way, as of now.

We do maintain a patch, as you do, unless you have forked from mainline for 
good. Using a good revision control system helps (we use Canonical's Bazaar, 
BTW), but does not fundamentally change the problem.

The smaller the diff, the better.

-- 
Jose Orlando Pereira

Re: Integrating Replication into Core

From

alfranio correia junior

Date:

23 November 2006, 08:34:10



> The idea is to design a single set of hooks that can be used to
> implement an entire class of replication. This only makes sense after
> existing solutions come to some agreement. I view that as a first step,
> assuming that it is necessary to alter the core in order to implement
> the class of replication in question.
> 
> Once that step is complete, ideally you'd be able to implement Postgres-
> R without having to patch the postgresql backend to accomplish it
> (except for maybe adding the syntax for your solution). Then, when a
> syntax is agreed upon, you won't need to patch the backend at all. Isn't
> that the goal, to be able to implement your replication without patching
> the backend?


We should go in that direction.

In a database life cycle, there are different events that may be useful 
for different replication solutions. For instance, we may say:
- database startup and shutdown
- connection startup and shutdown
- transaction begin, commit, rollback
- statement request
- updates (i.e., insert, delete, update)
- logging
 
First, we should agree on which events we need to support a set of 
replication protocols (e.g., gorda, postgres-r, slony-i and ii, etc). 
Second, we should decide how such events will be notified.

In particular, the gorda project decided to use "special triggers" but 
any sort of callback would be great for us. We adopted these hooks 
because we thought that it would be useful to different applications 
(e.g, materialized views).
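
To make the "any sort of callback" idea concrete, here is a minimal Python sketch of an event registry. It is not GORDA's interface and not PostgreSQL code. The event names are taken from the list above, but their spelling, the `HookRegistry` class, and its methods are all assumptions made for this illustration.

```python
from collections import defaultdict

# Event names follow the list above; the identifiers are this sketch's own.
EVENTS = {
    "database_startup", "database_shutdown",
    "connection_startup", "connection_shutdown",
    "transaction_begin", "transaction_commit", "transaction_rollback",
    "statement_request", "update", "logging",
}


class HookRegistry:
    """Hypothetical registry mapping life-cycle events to callbacks."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event, callback):
        """Attach `callback` to a known event; reject unknown event names."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._callbacks[event].append(callback)

    def notify(self, event, **details):
        """Call every callback registered for `event`, in registration order."""
        return [cb(**details) for cb in self._callbacks[event]]
```

A replication protocol would register for the events it cares about and ignore the rest, which is why agreeing on the event set comes before agreeing on the notification mechanism.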

Third, we should discuss what interface would be provided to inject 
information into remote replicas. Is the SPI_* interface good enough? How 
do we inject binary data into tables? I know that PostgreSQL allows 
that, but is the interface provided sufficient? Would it not be 
interesting to inject things directly into the log?

Fourth, we should have a discussion on locks, high priority 
transactions, notifications on blocking, etc...

And finally, we may be able to discuss meta information, syntax, etc...


Regards,

Alfranio Junior.

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

23 November 2006, 10:01:32

Hi,

[ I suggest moving from hackers to replica-hooks-discuss@pgfoundry.org,
as that's what that list has been created for. ]

José Orlando Pereira wrote:
> Sure, I know that you don't like hooks.

Yes, but that's yet another story. ;-)

> I just suggested that we should compare *interfaces* to configure replication 
> (i.e. variable names, grammar, etc), since it looks like we have a bunch of 
> different syntaxes to achieve the same.

The same?

Let's see. I currently have these additional commands:

ALTER DATABASE testdb START REPLICATION
    IN GROUP testgroup USING egcs;

and

ALTER DATABASE testdb ACCEPT REPLICATION
    FROM GROUP testgroup USING egcs;

I've added a system table pg_replication_gcs to describe the different 
group communication systems and connections to them:

Table "pg_catalog.pg_replication_gcs"
  Column  |  Type   | Modifiers
----------+---------+-----------
 rgcsname | name    | not null
 rgcstype | integer | not null
 rgcsport | integer | not null
 rgcssock | text    |

(Splitting into rgcsport and rgcssock proved not to be very helpful.)

And I've added two fields to pg_database to define the GCS and the group 
in which to replicate a database:

.. datreplgcs    | oid       | not null
.. datreplgrp    | text      |

But as I said: these might change any time. And I certainly will have to 
add others, but no idea what those additions will look like.
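
As a way to visualise how these catalog additions relate, here is a rough model using an in-memory SQLite database from Python. It is only a sketch: the tables `replication_gcs` and `db_catalog` stand in for pg_replication_gcs and the pg_database additions, SQLite types approximate `name` and `oid`, and the inserted values (type 1, port 4803) and the lookup function are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replication_gcs (      -- stands in for pg_replication_gcs
    rgcsname TEXT    NOT NULL,
    rgcstype INTEGER NOT NULL,
    rgcsport INTEGER NOT NULL,
    rgcssock TEXT
);
CREATE TABLE db_catalog (           -- stands in for the pg_database additions
    datname    TEXT    NOT NULL,
    datreplgcs INTEGER NOT NULL,    -- refers to a replication_gcs row
    datreplgrp TEXT
);
""")

cur = conn.execute(
    "INSERT INTO replication_gcs VALUES (?, ?, ?, ?)",
    ("egcs", 1, 4803, None),        # type and port values are made up
)
gcs_oid = cur.lastrowid             # SQLite rowid plays the role of the oid
conn.execute(
    "INSERT INTO db_catalog VALUES (?, ?, ?)",
    ("testdb", gcs_oid, "testgroup"),
)


def replication_target(conn, datname):
    """Return (gcs name, group) for a database, or None if not found."""
    return conn.execute(
        "SELECT g.rgcsname, d.datreplgrp FROM db_catalog d "
        "JOIN replication_gcs g ON g.rowid = d.datreplgcs "
        "WHERE d.datname = ?",
        (datname,),
    ).fetchone()
```

Looking up `testdb` here yields the GCS name and group that the ALTER DATABASE command would have recorded, assuming the command writes exactly these two fields.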

When comparing to the Mammoth Replicator syntax that Alvaro posted, this 
seems very different. PGCluster-II does not use a GCS at all. And I 
haven't seen others.

> It is somewhat difficult to share a test-suite if we have to maintain multiple 
> versions of the code that sets up the replicated db.

Well, we wouldn't have to share test cases, but at least the *suite*. 
All the code which starts and stops postmasters, does initdb etc..

Probably that's just me, but I'm not aware of any (OSS) project which 
can emulate a network (or even a GCS), start and stop processes as 
requested and check how they react upon different inputs. If you know 
such a thing, please email me! (I've looked at STAF, but that seems 
overly complex and targeted at a completely different use case.)
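
The "start and stop processes as requested" part of such a suite can be sketched in a few lines of Python. This is a hedged illustration only: `ManagedProcess` is a hypothetical helper, and a sleeping Python child stands in for a postmaster. It does not emulate a network or a GCS.

```python
import subprocess
import sys


class ManagedProcess:
    """Start and stop one child process; a stand-in for a postmaster."""

    def __init__(self, argv):
        self.argv = argv
        self.proc = None

    def start(self):
        self.proc = subprocess.Popen(self.argv)

    def is_running(self):
        return self.proc is not None and self.proc.poll() is None

    def stop(self, timeout=5):
        """Ask the child to terminate; kill it if it does not exit in time."""
        if self.is_running():
            self.proc.terminate()
            try:
                self.proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()


# Example: a child that just sleeps, in place of a real database node.
node = ManagedProcess([sys.executable, "-c", "import time; time.sleep(30)"])
```

A shared suite would wrap several such nodes and script the order in which they are started, stopped, and fed input, leaving the test cases themselves to each project.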

> See the point? ;)

Sure, but it's wishful thinking.

> We do maintain a patch, as you do, unless you have forked from mainline for 
> good. Using a good revision control system helps (we use Canonical's Bazaar, 
> BTW), but does not fundamentally change the problem.

I'm using monotone. And I don't need much time to fiddle with patches. A 
simple 'mtn diff -r ${TRUNK_REVISION}' does all I need. That's why I'd 
still say that I don't maintain a patch.

> The smaller the diff, the better.

I disagree. Where exactly does size of the patch matter for you?

The number you mean, which is important, is the number of points in the 
code where you need to interact with the database, i.e. the number of 
hooks you would need. Because as PostgreSQL moves along, changes at 
these points are probably necessary. But that number certainly has 
nothing to do with the patch size.

Regards

Markus

Re: Integrating Replication into Core

From

Dimitri Fontaine

Date:

23 November 2006, 10:15:22

Hi Markus,

On Thursday 23 November 2006 12:46, Markus Schiltknecht wrote:
> For simulations, I'm using qemu, sometimes also trying Xen, but that
> does not run on my laptop. :-(

So you still have only your laptop as a 'development facility'?

At dalibo we have a couple of machines we're not using anymore, partly because
we don't have a need for them nowadays, mainly because it's end-of-life
hardware, not that trusty.

They are dual Pentium III machines, 2*700MHz, 1GB RAM, 2 IDE disks (20GB system and
either 20GB or 120GB data), with two 100Mbps network cards per machine.
A direct link should be possible to set up.

We can provide you access to those two servers for you to test postgres-r if
you want to,

Regards,
--
Dimitri Fontaine
http://www.dalibo.com/

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

23 November 2006, 10:26:46

Hello Dimitri,

Dimitri Fontaine wrote:
> So you still only have your laptop as 'development facility' ?

Yes.

> At dalibo we have a couple of machines we're not using anymore, partly because 
> we don't have a need for them nowadays, mainly because it's end-of-life 
> hardware, not that trusty.
> 
> They are dual Pentium III machines, 2*700MHz, 1GB RAM, 2 IDE disks (20GB system and 
> either 20GB or 120GB data), with two 100Mbps network cards per machine. 
> A direct link should be possible to set up.

Thank you very much. But I think two machines is not quite enough. :-(

Having a whole cluster emulated on my laptop allows me to work on the 
road. That's a very nice thing (tm). And the emulated machines are 
probably already faster than PIIIs... (Memory is the limiting factor, 
unfortunately I can't stuff more than 2GB in my laptop.)

Regards

Markus

Re: Integrating Replication into Core

From

David Boreham

Date:

23 November 2006, 10:50:49

Markus Schiltknecht wrote:

> Probably that's just me, but I'm not aware of any (OSS) project which 
> can emulate a network (or even a GCS), start and stop processes as 
> requested and check how they react upon different inputs.

I've worked on an emulated test rig for a replication system (not RDBMS 
but for LDAP). We used netem (OSS) for the network emulation and a pile 
of python and shell scripts and C client test apps. Testing replication 
is hard, of course, and you have to roll most of it yourself :(

> If you know such a thing, please email me! (I've looked at STAF, but 
> that seems overly complex and targeted at completely different use-case.)

In my experience test frameworks tend to provide less useful 
functionality than one might hope.
Sometimes to the point that they're hardly worth bothering with at all.

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

23 November 2006, 11:03:11

Hi,

David Boreham wrote:
> I've worked on an emulated test rig for a replication system (not RDBMS 
> but for LDAP).
> We used netem (OSS)

Thanks. I've already heard about that one some while ago, but didn't 
remember it. I'll have another look.

> for the network emulation and a pile of python and 
> shell scripts and
> C client test apps.
> Testing replication is hard, of course, and you have to roll most of it 
> yourself :(

Yeah, I'm also using python for that.

>> If you know such a thing, please email me! (I've looked at STAF, but 
>> that seems overly complex and targeted at completely different use-case.)
> 
> In my experience test frameworks tend to provide less useful 
> functionality than one might hope.
> Sometimes to the point that they're hardly worth bothering with at all.

ACK. Same experience here.

Regards

Markus

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

23 November 2006, 11:06:53

Hi,

David Boreham wrote:
> We used netem (OSS) for the network emulation and a pile of python and 
> shell scripts and

LOL, I've just figured that netem is the project behind:

tc qdisc ... netem ...

I'm already using that, too ;-)  Just wasn't aware it's called netem. 
Sounds silly, since the name is in the command line, I know...

Regards

Markus

Re: Integrating Replication into Core

From

alfranio correia junior

Date:

23 November 2006, 12:55:33

>> I just suggested that we should compare *interfaces* to configure 
>> replication (i.e. variable names, grammar, etc), since it looks like 
>> we have a bunch of different syntaxes to achieve the same.
> 
> The same?
> 
> Let's see. I currently have these additional commands:
> 
> ALTER DATABASE testdb START REPLICATION
>     IN GROUP testgroup USING egcs;
> 
> and
> 
> ALTER DATABASE testdb ACCEPT REPLICATION
>     FROM GROUP testgroup USING egcs;
> 

We have the following commands:

SET TRANSACTION MASTER

and

CREATE TRIGGER <name> for { STARTUP | SHUTDOWN |
    BEGIN TRANSACTION | COMMIT TRANSACTION | ROLLBACK TRANSACTION }
    execute procedure <func> ( <funcargs> )
 


It is worth noting that none of them have references to replication.
Metainformation on replication is stored in normal tables.

I think that we should discuss requirements first instead of going 
towards syntax. The latter is the last step to achieve a common
set of ideas.

I suggest the following road map.

In a database life cycle, there are different events that may be useful 
for different replication solutions. For instance, we may say:

    - database startup and shutdown
    - connection startup and shutdown
    - transaction begin, commit, rollback
    - statement request
    - updates (i.e., insert, delete, update)
    - logging
 

First, we should agree on which events we need to support a set of 
replication protocols (e.g., gorda, postgres-r, slony-i and ii, etc). 
Then, we should decide how such events will be notified.

In particular, the gorda project decided to use "special triggers" but 
any sort of callback would be great for us. We adopted these hooks 
because we thought that it would be useful to different applications 
(e.g, materialized views).
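
To make the idea of event notification concrete, here is a rough Python sketch of a callback registry for some of the events listed above. It is only a model of the concept: `Event`, `HookRegistry`, `register` and `fire` are hypothetical names, not the GORDA trigger implementation and not any PostgreSQL API.

```python
from collections import defaultdict
from enum import Enum


class Event(Enum):
    DATABASE_STARTUP = "database startup"
    DATABASE_SHUTDOWN = "database shutdown"
    TRANSACTION_BEGIN = "transaction begin"
    TRANSACTION_COMMIT = "transaction commit"
    TRANSACTION_ROLLBACK = "transaction rollback"


class HookRegistry:
    """Maps each event to the callbacks that asked to be notified about it."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event, callback):
        self._callbacks[event].append(callback)

    def fire(self, event, **context):
        """Call every callback for this event, in registration order."""
        return [callback(event, **context) for callback in self._callbacks[event]]
```

A replication protocol would register a callback for, say, `Event.TRANSACTION_COMMIT` and receive whatever context the core agrees to pass. Which events exist and what context they carry is exactly the part that needs agreement first.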

Third, we should discuss what interface would be provided to inject 
information into remote replicas. Is the SPI_* interface good enough? 
How do we inject binary data into tables? I know that PostgreSQL allows 
that, but is the provided interface sufficient? Would it not be 
interesting to inject things directly into the log?

Fourth, we should have a discussion on locks, high priority 
transactions, notifications on blocking, etc...

And finally, we may be able to discuss meta information, syntax, etc...


What do you think ?

Re: [Replica-hooks-discuss] Integrating Replication ino

From

Markus Schiltknecht

Date:

23 November 2006, 13:26:28

Hi,

alfranio correia junior wrote:
> We have the following commands:
> 
> SET TRANSACTION MASTER
> 
> and
> 
> CREATE TRIGGER <name> for { STARTUP | SHUTDOWN |
>     BEGIN TRANSACTION | COMMIT TRANSACTION | ROLLBACK TRANSACTION }
>     execute procedure <func> ( <funcargs> )

Okay.

> I think that we should discuss requirements first instead of going 
> towards syntax. The latter is the last step to achieve a common
> set of ideas.

I still maintain the point that I want to check requirements first. For 
that I need a working prototype. And I'm easy with prototyping in C in 
the backend code. If there's really a requirement for hooks, I can add 
them and decouple from PostgreSQL source code later on.

What do you currently base your hooks on? IMO it's just naive to expect 
to be able to define hooks now, especially hooks as general as you seem 
to be heading to (I've read about sync and async multi master 
replication, single master replication as well as materialized views).

Another point: modularization is nice and well, where appropriate. But 
here I don't see how it could help the user. Or do you expect users to 
plug in and out replication solutions like USB sticks? I think most 
users want to have *one* replication solution that works. Out of the 
box. Maybe they want one which can do sync as well as async replication, 
sure. But hooks don't give you that, nor do they make it any easier.

I agree that it's helpful to modularize it in code. But you don't need 
hooks for that.

I know I'm probably somewhat alone with that point of view.

Regards

Markus

Re: [Replica-hooks-discuss] Integrating Replication ino

From

alfranio correia junior

Date:

24 November 2006, 07:53:58

Hi !!!

> I still maintain the point that I want to check requirements first. For 
> that I need a working prototype. And I'm easy with prototyping in C in 
> the backend code. If there's really a requirement for hooks, I can add 
> them and decouple from PostgreSQL source code later on.

I agree with you. You should build prototypes and try things in order to
figure out exactly what we need.
However, based on the experience that you already have in developing 
such prototypes, there are most likely different features that you would 
like to see in PostgreSQL. What are they?


> What do you currently base your hooks on? IMO it's just naive to expect 
> to be able to define hooks now, especially hooks as general as you seem 
> to be heading to (I've read about sync and async multi master 
> replication, single master replication as well as materialized views).

You have "prototypes" built upon such hooks: sync and async, single 
master and multi master. However, I am not arguing that hooks are the 
solution to any problem. But they work for the limited view that we have 
on the subject.

> Another point: modularization is nice and well, where appropriate. But 
> here I don't see how it could help the user. Or do you expect users to 
> plug in and out replication solutions like USB sticks? I think most 
> users want to have *one* replication solution that works. Out of the 
> box. Maybe they want one which can do sync as well as async replication, 
> sure. But hooks don't give you that, nor do they make it any easier.

I don't expect that. But I would like to test different replication 
protocols without patching PostgreSQL. And I believe that we might 
come up with a set of in-core features that would enable this.

Regards,

Alfranio.

Re: [Replica-hooks-discuss] Integrating Replication ino

From

"Florian G. Pflug"

Date:

24 November 2006, 11:40:55

Markus Schiltknecht wrote:
> Another point: modularization is nice and well, where appropriate. But 
> here I don't see how it could help the user. Or do you expect users to 
> plug in and out replication solutions like USB sticks? I think most 
> users want to have *one* replication solution that works. Out of the 
> box. Maybe they want one which can do sync as well as async replication, 
> sure. But hooks don't give you that, nor do they make it any easier.

I, as a mostly-user, fully subscribe to that point of view. IMHO one
of the biggest mistakes mysql made was those "pluggable storage
managers". While all those different storage managers (innodb, bdb,
myisam, ...) _look_ interchangeable from an interface point of view
(You just specify which one to use when creating the table, right?),
they all have _different_ semantics. Just forget to write "with innodb"
in _one_ of your table definitions, and transaction isolation goes
out of the window :-(.

I understand that different usecases need different replication
solutions - but I think "Hey, let's just make them plugins" is
not the way to go. It would work if all replication solutions
had _exactly_ the same semantics - but if they do, then what is
the point of all the different solutions anyway?

Just my 2 eurocents...
Greetings, Florian Pflug

Re: Integrating Replication into Core

From

Andrew Sullivan

Date:

24 November 2006, 15:09:02

On Wed, Nov 22, 2006 at 01:58:34PM -0500, Andrew Dunstan wrote:
> Wasn't there supposed to be some discussion among replication authors to 
> try to come up with at least some common hooks?

That was what I was aiming at, yes.

http://pgfoundry.org/projects/replica-hooks/

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Unfortunately reformatting the Internet is a little more painful 
than reformatting your hard drive when it gets out of whack.
        --Scott Morris

Re: Integrating Replication into Core

From

Andrew Sullivan

Date:

24 November 2006, 15:56:13

On Wed, Nov 22, 2006 at 08:21:23PM +0100, Markus Schiltknecht wrote:
> 
> For Postgres-R, I definitely don't want to settle for any hooks, yet, 
> because I want to keep flexible. Hooks would only get into my way and 
> serve no purpose.

Let me make the following argument to the contrary.  This is a
rationale argument for the other discussion, and not a discussion of
the hooks themselves, so I think it's still appropriate for -hackers.

The reason to write down what the _requirements_ are for hooks is so
that the community can get to work on any of the general approaches
to replication that they want.  These hooks might, in fact, turn out
to be nothing more than a layer of indirection in the core PostgreSQL
code.

The reason the earlier attempts at Postgres-R didn't ever make it out
of testing was precisely, I argue, because there just wasn't an
interface for the rest of the PostgreSQL project (maybe not
interested in replication) to keep stable.  So merely keeping up with
the pace of change in the core code turned into a significant
undertaking.  Those are cycles stolen from the more useful work of
making the replication code work better.

The same thing is true of other pieces that have fallen by the side:
because the whole of the PostgreSQL project moves so quickly, a small
number of people working on a large feature set in relative isolation
can end up spending way too much time keeping up with the core, and
not enough time working on the features they desire.  The result is a
loss to everyone.

So that's why I was trying to outline what, at least, the
requirements are.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
        --Damien Katz

Re: [Replica-hooks-discuss] Integrating Replication ino

From

Andrew Sullivan

Date:

24 November 2006, 16:16:18

I'm responding with a short answer here.  But more of this sort of
discussion would really help our meta discussion on what the problem
is we're trying to solve.  I'm trying to host that on the other list
just on the grounds that -hackers has enough traffic about _actual_
features without cluttering it with discussion of wishlist items that
nobody is yet committed to do the work on.

On Fri, Nov 24, 2006 at 04:21:11PM +0100, Florian G. Pflug wrote:
> managers". While all those different storage managers (innodb, bdb,
> myisam, ...) _look_ interchangeable from an interface point of view
> (You just specify which one to use when creating the table, right?),
> they all have _different_ semantics. 

Yes.  But one way MySQL could have done that right was to identify in
their core that they needed an idea of storage management state. 
Then BEGIN; INSERT INTO innodb_table; UPDATE myisam_table; COMMIT;
would fail in the way the ACID gods intended.  But that, of course,
would have required writing down in advance how these things should
work.  Which is what I'm proposing to do.
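
As a rough model of what such storage management state could look like, consider the following Python sketch. It is purely illustrative and says nothing about how MySQL is actually built: `Transaction`, `MixedEngineError` and the `ENGINE_IS_TRANSACTIONAL` table are hypothetical, and the only fact assumed is that innodb supports rollback while myisam does not.

```python
class MixedEngineError(Exception):
    """Raised when a transaction touches a table that cannot be rolled back."""


# Assumption for illustration: which engines can take part in a transaction.
ENGINE_IS_TRANSACTIONAL = {"innodb": True, "myisam": False}


class Transaction:
    def __init__(self, table_engines):
        self.table_engines = table_engines  # table name -> engine name
        self.pending = []                   # writes not yet committed
        self.state = "open"

    def write(self, table, change):
        if self.state != "open":
            raise RuntimeError("transaction is " + self.state)
        engine = self.table_engines[table]
        if not ENGINE_IS_TRANSACTIONAL[engine]:
            # The core knows this engine cannot honour ROLLBACK, so fail now
            # and discard everything instead of committing half of the work.
            self.pending.clear()
            self.state = "aborted"
            raise MixedEngineError(f"{table} uses non-transactional engine {engine}")
        self.pending.append((table, change))

    def commit(self):
        if self.state != "open":
            raise RuntimeError("transaction is " + self.state)
        self.state = "committed"
        return list(self.pending)
```

With this model, the INSERT into innodb_table is accepted, the UPDATE on myisam_table raises `MixedEngineError`, and the COMMIT never happens. The point is that the check lives in the core, which is only possible if the core has written down what it expects from every engine.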

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                --Brad Holland

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

25 November 2006, 15:07:35

> The reason the earlier attempts at Postgres-R didn't ever make it out
> of testing was precisely, I argue, because there just wasn't an
> interface for the rest of the PostgreSQL project (maybe not
> interested in replication) to keep stable.  So merely keeping up with
> the pace of change in the core code turned into a significant
> undertaking.  Those are cycles stolen from the more useful work of
> making the replication code work better.

Actually I don't buy this argument. The only major change in 
*postgresql* that has slowed down Replicator is the move from 
users/groups to roles. We added a feature in the internal 1.6 release to 
replicate users/groups.

We are currently behind because of things that have really nothing to do 
with PostgreSQL and more to do with reworking an evolutionary code base 
to be more manageable.

I don't know much (anything) about Postgres-R but my guess is that the 
only major change that would have affected that project in recent years 
would have been two phase commit and that is only if they chose to take 
advantage of it.

Sincerely,

Joshua D. Drake

Re: Integrating Replication into Core

From

Andrew Sullivan

Date:

26 November 2006, 00:16:49

On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote:
> Actually I don't buy this argument. The only major change in 

Ok, good.  So why isn't Postgres-R something we have _now_?  The work
that I've seen on it, so far (and I speak as someone who invested a
significant amount of staff time, cash money, and -- frankly --
"political" credibility in software based on that idea) is that there
isn't a way to make it production-grade without pretty severe
constraints on what it can do.

It was that unhappy discovery that led me to say, "Can we please
_write down_ what we think 'replication' might require, and what the
trade-offs can be?"  I'm trying to write requirements in public here;
but all I get is silence.  This frustrates me partly because, as
someone who stuck his neck out to make sure Slony was released as
free software, I hear a lot of demands for features people apparently
want without much in the way of design proposals -- never mind code -- 
to achieve those features.  When Jan delivered the initial release of
Slony, it was preceded by a design doc.  I note on -hackers long
emails from (for example) Tom doing something very similar when
proposing a major feature.  What I'm trying to do is to get the
replication-interested community of PostgreSQL users to say "here's
what we mean by 'replication'" before we all go off inventing the
grammar.  We need to have a clue about the domain of discourse before
we start settling the variable assignments.

It seems to me that every single replication discussion on -hackers
amounts to a bunch of futile attempts by colour blind people (of
which I am one) to describe the colour 'high note', while their
interlocutors describe the sound 'red'.  I'm trying to get us to say
what it would mean even to do the describing.

Specifying requirements for what software is supposed to do is one of
those thankless tasks that everyone complains is never done in the
free software community.  I am offering, earnestly, to do that.  I
just need a few people to tell me what _they think_ the software in
question ought to do.  I set up a mailing list.  I have solicited
comments.  I'm not sure what else to do, but so far, I have the
positive remarks of Jose (GORDA), the remarks of Markus (which amount
to "this is a waste of time", unless I misread him), and nothing
else.

Surely, in a community that spends time on the topic of whether
replication "should be in the back end", we oughta be able to come up
with 10 or so people who are willing to say what "being in the back
end" would mean.  At the moment, this trivial goal is all I'm aiming
for.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
When my information changes, I alter my conclusions.  What do you do sir?
        --attr. John Maynard Keynes

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

26 November 2006, 00:52:38

Andrew Sullivan wrote:
> On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote:
>> Actually I don't buy this argument. The only major change in 
> 
> Ok, good.  So why isn't Postgres-R something we have _now_? 

That's a good question and as I mentioned, I don't know much about 
Postgres-R. My point was directly to the argument that a fast moving 
PostgreSQL somehow limits the ability for replication to be built. That 
argument, I believe is false.

I originally responded to the rest of your email but thought better of 
it. The only thing I can say is, my experience is that something like 
replication will only be productively completed, outside the community.

Jan, for the most part, created his own community with Slony. Postgres-R 
is doing the same, as are the others such as pgPool.

The nature that they are all their own communities, not to mention 
several closed source products (Replicator, Unicluster) pretty much sets 
the whole thing up to fail IMHO.

Otherwise you are just herding cats.

Joshua D. Drake

> 
> A
>

Re: Integrating Replication into Core

From

David Boreham

Date:

26 November 2006, 12:42:53

Markus Schiltknecht wrote:

> LOL, I've just figured that netem is the project behind:
>
> tc qdisc ... netem ...
>
> I'm already using that, too ;-)  Just wasn't aware it's called netem. 
> Sounds silly, since the name is in the command line, I know...

Heh. AFAIK netem is the tc stuff that isn't much use for production 
router use (e.g. introduce a 10ms packet delay on this kind of 
traffic...). We used a mixture of netem and regular tc kernel modules, 
in a Linux box that had 6 NICs, with Python driving it. Each replication 
node test machine was connected with a straight-through patch cable to 
one of the NICs on the 'spider' machine. The Python could set up the 
netem/tc on the router such that various test scenarios with different 
bandwidth/delay values were implemented. Also of course loss of 
connectivity by dropping all packets on an interface. Each test machine 
had two NICs - the second one being used to communicate with it out of 
band from the replication traffic and network emulation.  Then on top of 
all this the actual replication tests were run. One of the things we 
were interested in was replication throughput vs network latency, so we 
also measured performance and made acceptable performance a test pass 
condition.
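
As a hedged sketch of what the Python side of such a rig might generate, the function below builds tc command lines for one interface: netem for delay, a tbf qdisc chained under it for bandwidth, and netem `loss 100%` as one way to drop every packet. `scenario_commands` and its parameters are hypothetical, the tbf `buffer` and `limit` values are just sample numbers, and nothing is executed here, since running tc needs root.

```python
def scenario_commands(dev, delay_ms=0, rate_kbit=None, outage=False):
    """Return tc invocations (as argv lists) that set up one test scenario."""
    netem = ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:0", "netem"]
    if outage:
        # Simulate loss of connectivity by dropping every packet.
        netem += ["loss", "100%"]
    else:
        netem += ["delay", f"{delay_ms}ms"]
    commands = [netem]
    if rate_kbit is not None and not outage:
        # Chain a token bucket filter below netem to limit bandwidth.
        commands.append([
            "tc", "qdisc", "add", "dev", dev, "parent", "1:1", "handle", "10:",
            "tbf", "rate", f"{rate_kbit}kbit", "buffer", "1600", "limit", "3000",
        ])
    return commands


def reset_command(dev):
    """Return the tc invocation that removes the emulation again."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]
```

A driver script would pass each list to something like `subprocess.run(argv, check=True)` on the router box between test runs, one interface per replication node.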

If you want really fancy network emulation you'd need to use nistnet.
It can do some things that are not possible with netem (statistical packet
drop for example). However IMHO this is only appropriate for testing
TCP/IP stack implementation. Varying latency, throughput, and introducing
connectivity outages is good enough for user mode code I believe. Nistnet
is not in the stock kernel, whereas netem is.

Re: Integrating Replication into Core

From

Bruce Momjian

Date:

27 November 2006, 08:29:17

Have you looked at the new HA/load balancing section of the docs?
http://developer.postgresql.org/pgdocs/postgres/high-availability.html

I got a lot of feedback on that.  Perhaps it can be a starting point for
you.

---------------------------------------------------------------------------

Andrew Sullivan wrote:
> On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote:
> > Actually I don't buy this argument. The only major change in 
> 
> Ok, good.  So why isn't Postgres-R something we have _now_?  The work
> that I've seen on it, so far (and I speak as someone who invested a
> significant amount of staff time, cash money, and -- frankly --
> "political" credibility in software based on that idea) is that there
> isn't a way to make it production-grade without pretty severe
> constraints on what it can do.
> 
> It was that unhappy discovery that led me to say, "Can we please
> _write down_ what we think 'replication' might require, and what the
> trade-offs can be?"  I'm trying to write requirements in public here;
> but all I get is silence.  This frustrates me partly because, as
> someone who stuck his neck out to make sure Slony was released as
> free software, I hear a lot of demands for features people apparently
> want without much in the way of design proposals -- never mind code -- 
> to achieve those features.  When Jan delivered the initial release of
> Slony, it was preceded by a design doc.  I note on -hackers long
> emails from (for example) Tom doing something very similar when
> proposing a major feature.  What I'm trying to do is to get the
> replication-interested community of PostgreSQL users to say "here's
> what we mean by 'replication'" before we all go off inventing the
> grammar.  We need to have a clue about the domain of discourse before
> we start settling the variable assignments.
> 
> It seems to me that every single replication discussion on -hackers
> amounts to a bunch of futile attempts by colour blind people (of
> which I am one) to describe the colour 'high note', while their
> interlocutors describe the sound 'red'.  I'm trying to get us to say
> what it would mean even to do the describing.
> 
> Specifying requirements for what software is supposed to do is one of
> those thankless tasks that everyone complains is never done in the
> free software community.  I am offering, earnestly, to do that.  I
> just need a few people to tell me what _they think_ the software in
> question ought to do.  I set up a mailing list.  I have solicited
> comments.  I'm not sure what else to do, but so far, I have the
> positive remarks of Jose (GORDA), the remarks of Markus (which amount
> to "this is a waste of time", unless I misread him), and nothing
> else.
> 
> Surely, in a community that spends time on the topic of whether
> replication "should be in the back end", we oughta be able to come up
> with 10 or so people who are willing to say what "being in the back
> end" would mean.  At the moment, this trivial goal is all I'm aiming
> for.
> 
> A
> 
> -- 
> Andrew Sullivan  | ajs@crankycanuck.ca
> When my information changes, I alter my conclusions.  What do you do sir?
>         --attr. John Maynard Keynes
> 

-- 
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Integrating Replication into Core

From

Alvaro Herrera

Date:

27 November 2006, 10:12:40

Joshua D. Drake wrote:
> Andrew Sullivan wrote:
> >On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote:
> >>Actually I don't buy this argument. The only major change in 
> >
> >Ok, good.  So why isn't Postgres-R something we have _now_? 
> 
> That's a good question and as I mentioned, I don't know much about 
> Postgres-R. My point was directly to the argument that a fast moving 
> PostgreSQL somehow limits the ability for replication to be built. That 
> argument, I believe is false.
> 
> I originally responded to the rest of your email but thought better of 
> it. The only thing I can say is, my experience is that something like 
> replication will only be productively completed, outside the community.

This is like nVidia saying that "open source developers are not
competent enough to understand the coding of a graphics card driver".

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Integrating Replication into Core

From

Andrew Sullivan

Date:

27 November 2006, 10:18:03

On Mon, Nov 27, 2006 at 07:27:32AM -0500, Bruce Momjian wrote:
> 
> Have you looked at the new HA/load balancing section of the docs?
> 
>     http://developer.postgresql.org/pgdocs/postgres/high-availability.html
> 
> I got a lot of feedback on that.  Perhaps it can be a starting point for
> you.

Yes, I have; and yes, it helps.  

What I was hoping to do, though, as well, was come up with the list
of facilities that developers of these various systems say they need. 
I don't expect this will happen quickly (which is why I figured it
needed a project -- if I could do it in six weeks, then we wouldn't
need a mailing list and the like).  But it seemed to me that, with so
many projects on the go, getting a list together of what the
developers of those systems say they need would be the obvious way to
define, later, what hooks, if any, are needed in the core system.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
If they don't do anything, we don't need their acronym.
        --Josh Hamilton, on the US FEMA

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

27 November 2006, 11:50:06

> > I originally responded to the rest of your email but thought better of 
> > it. The only thing I can say is, my experience is that something like 
> > replication will only be productively completed, outside the community.
> 
> This is like nVidia saying that "open source developers are not
> competent enough to understand the coding of a graphics card driver".
> 

I believe you misunderstood me. I am not saying that replication can not
be built in an Open Source manner. Slony is a perfect example of that. I
am saying that involving the larger, general PostgreSQL community in
such a task would be counter-productive.

Sincerely,

Joshua D. Drake


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: Integrating Replication into Core

From

Hannu Krosing

Date:

27 November 2006, 14:57:18

One fine day, Mon, 2006-11-27 at 07:50, Joshua D. Drake wrote:
> > > I originally responded to the rest of your email but thought better of 
> > > it. The only thing I can say is, my experience is that something like 
> > > replication will only be productively completed, outside the community.
> > 
> > This is like nVidia saying that "open source developers are not
> > competent enough to understand the coding of a graphics card driver".
> > 
> 
> I believe you misunderstood me. I am not saying that replication can not
> be built in an Open Source manner. Slony is a perfect example of that. I
> am saying that involving the larger, general PostgreSQL community in
> such a task would be counter-productive.

As several different approaches to "replication" involve same
requirements and/or touching the same places in code it seems a good
idea to at least get some more or less formal descriptions from parties
involved. 

Also, it seems that largely the same things are needed for other
projects, like precomputed/materialized views and auditing or some other
non-replication data moving methods.

While each of the replication and non-replication projects does do its
own thing, it may still be beneficial to try to provide some hooks in
right places for them. Not all projects need to use all of them but
having all projects patch the same places in core code will make it
pretty much impossible to use more than one at a time. 

As an example, one may want to have both synchronous auditing data and
async replication to be done on the same live database, both gathered at
the point of data manipulation and both moved to different machines.
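
A small Python sketch of that scenario, under the assumption that the core exposes a single notification point for data changes: the auditing consumer does its work before the caller continues, while the replication consumer only queues the change for shipping later. `DataChangeHook`, `audit_sync`, `replicate_async` and `ship_pending` are made-up names for illustration.

```python
import queue


class DataChangeHook:
    """One notification point in the core that several projects can share."""

    def __init__(self):
        self._consumers = []

    def attach(self, consumer):
        self._consumers.append(consumer)

    def row_changed(self, change):
        for consumer in self._consumers:
            consumer(change)


audit_log = []
replication_queue = queue.Queue()


def audit_sync(change):
    # Synchronous: the audit record exists before the caller continues.
    audit_log.append(change)


def replicate_async(change):
    # Asynchronous: only queue the change; it is shipped some time later.
    replication_queue.put(change)


def ship_pending(send):
    """Drain the queue through send() and return how many changes were shipped."""
    shipped = 0
    while True:
        try:
            change = replication_queue.get_nowait()
        except queue.Empty:
            return shipped
        send(change)
        shipped += 1
```

Both consumers attach to the same hook without knowing about each other, which is what separate patches against the same lines of core code cannot offer.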

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

Re: Integrating Replication into Core

From

Jeff Davis

Date:

27 November 2006, 21:31:54

On Thu, 2006-11-23 at 08:50 +0100, Markus Schiltknecht wrote:
> Hi,
> 
> Jeff Davis wrote:
> > I think you misunderstand my point.
> 
> That may well be. Please keep in mind that I'm not a native English 
> speaker, thus please speak loud and clear ;-)
> 
> > I was talking about replication
> > implementations that already exist. They already have patches on the
> > backend that are necessary for their solution to work.
> 
> Do they? I'm only aware of the GORDA patch. The old Postgres-R patches 
> are out of date. Sequoia, PgPool and PgPool-II obviously do not need 
> patches. Slony-II, Postgres-R (8) (mine) as well as PGCluster-II are not 
> open sourced, yet. And I haven't heard much regarding hooks from any of 
> the proprietary vendors (except Joshua's recent statement that he's 
> happy without such hooks).

Because we're talking about replication, I don't think we can limit the
discussion to current open source solutions. I could be mistaken, but I
am under the impression that commercial replication solutions do patch
the backend.

> > The idea is to design a single set of hooks that can be used to
> > implement an entire class of replication. This only makes sense after
> > existing solutions come to some agreement. I view that as a first step,
> > assuming that it is necessary to alter the core in order to implement
> > the class of replication in question.
> 
> As there's not even *one* existing and open replication solution which 
> needs patching the backend, you are basing your statements on a false 
> premise. Thus, speaking of hooks as a "first step" is very confusing, at 
> least.
> 

You're right, there is no agreement yet. When I say "first step," I mean
that it's the first step toward getting any form of replication support
in the _backend_, _not_ a first step toward a replication solution at
all. It may be a long time before the backend has replication-specific
support of any kind, but many replication projects have passed the first
step toward replication a long time ago. 

I am not advocating replication support in the backend (since I don't
even know what form that would take), nor am I saying that it will
appear soon. I am just saying that replication-specific syntax is
unlikely to appear before other replication-specific details.

Regards,
Jeff Davis

Re: Integrating Replication into Core

From

"Joshua D. Drake"

Date:

27 November 2006, 21:38:33

> > Do they? I'm only aware of the GORDA patch. The old Postgres-R patches 
> > are out of date. Sequoia, PgPool and PgPool-II obviously do not need 
> > patches. Slony-II, Postgres-R (8) (mine) as well as PGCluster-II are not 
> > open sourced, yet. And I haven't heard much regarding hooks from any of 
> > the proprietary vendors (except Joshua's recent statement that he's 
> > happy without such hooks).
> 
> Because we're talking about replication, I don't think we can limit the
> discussion to current open source solutions. I could be mistaken, but I
> am under the impression that commercial replication solutions do patch
> the backend.

Quite.


> I am not advocating replication support in the backend (since I don't
> even know what form that would take), nor am I saying that it will

patch -p1 < replicator.diff

;)

Joshua D. Drake


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

28 November 2006, 03:42:26

Hi,

Jeff Davis wrote:
> Because we're talking about replication, I don't think we can limit the
> discussion to current open source solutions. I could be mistaken, but I
> am under the impression that commercial replication solutions do patch
> the backend.

Sure. But as you see, at least Joshua D. Drake is quite happy with

patch -p1 < all_his_replicator_changes.diff

So am I, because I don't think I could get anything useful into core.
And as long as I'd still have to patch the backend, what good would that
do me?

I really think this decision should be left to the developers of 
replication systems. We *will* ask core, if we want to have something 
added (as did the GORDA project). I state the same in my FAQ at [1].

> You're right, there is no agreement yet. When I say "first step," I mean
> that it's the first step toward getting any form of replication support
> in the _backend_, _not_ a first step toward a replication solution at
> all. 

Okay, sorry, then I misread you.

> It may be a long time before the backend has replication-specific
> support of any kind, but many replication projects have passed the first
> step toward replication a long time ago. 

Have they? Have you heard requests for specific additions into core from 
any of them?

> I am not advocating replication support in the backend (since I don't
> even know what form that would take), nor am I saying that it will
> appear soon. I am just saying that replication-specific syntax is
> unlikely to appear before other replication-specific details.

Sure.

Regards

Markus

[1]: http://www.postgres-r.org/about/faqs

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

28 November 2006, 09:21:20

Hello Andrew,

Andrew Sullivan wrote:
> On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote:
>> Actually I don't buy this argument.

Neither do I. I can only reiterate that interfacing with the database
backend is *not* the problem. I've been porting Postgres-R forward since
7.4 and only a few changes were necessary since then. And using a decent
version control system simplifies the task of propagating from CVS HEAD
to my branch. The few conflicts that arose were mostly trivial to
resolve (renaming or slight calling convention changes).

Andrew Sullivan wrote:
> Ok, good.  So why isn't Postgres-R something we have _now_?

(I note you don't count my version of Postgres-R (8), that might be 
reasonable depending on your definition of 'having Postgres-R'.)

I can't speak for others, but I just don't have much spare time left.
And it's a complex matter involving lots of corner cases like network
outages, crashes of the replication manager or GCS daemon, etc. Testing
and making it production grade software really takes a lot of time. IMO
this is where replication solutions could work together, because all of
them need to simulate a cluster somehow, to test their project. But this
certainly has nothing to do with PostgreSQL Core.

Another point for me is that the feedback I got on Postgres-R since
Toronto is very close to zero. Some people haven't even noticed that
there is Postgres-R code for 8.2. Or they don't count my variant for
some reason. Tom Lane, for example, recently pointed to Postgres-R
as an example of code drift in [1]. No offense, it's just very
contradictory to the hype around replication.

> The work that I've seen on it, so far (and I speak as someone who
> invested a significant amount of staff time, cash money, and --
> frankly -- "political" credibility in software based on that idea) is
> that there isn't a way to make it production-grade without pretty
> severe constraints on what it can do.

Right, the Postgres-R algorithm has limitations. And it certainly does
not fit all use cases. The Toronto meeting opened my eyes in that
respect and I'm thankful for that.

> It was that unhappy discovery that led me to say, "Can we please
> _write down_ what we think 'replication' might require, and what the
> trade-offs can be?"  I'm trying to write requirements in public here;
> but all I get is silence.  This frustrates me partly because, as
> someone who stuck his neck out to make sure Slony was released as
> free software, I hear a lot of demands for features people apparently
> want without much in the way of design proposals -- never mind code -- 
> to achieve those features.  When Jan delivered the initial release of
> Slony, it was preceded by a design doc.  I note on -hackers long
> emails from (for example) Tom doing something very similar when
> proposing a major feature.  What I'm trying to do is to get the
> replication-interested community of PostgreSQL users to say "here's
> what we mean by 'replication'" before we all go off inventing the
> grammar.  We need to have a clue about the domain of discourse before
> we start settling the variable assignments.

As you surely have noticed, I've been going back and forth with
Bruce about replication for the documentation. I've been doing that
because I wanted to clarify what 'replication' is, what we are talking
about when we say 'multi-master replication' or 'data partitioning', etc.

Sadly, only very few people from the 'replication interested community'
took part in that discussion. I've even been trying to get more of them
involved.

> It seems to me that every single replication discussion on -hackers
> amounts to a bunch of futile attempts by colour blind people (of
> which I am one) to describe the colour 'high note', while their
> interlocutors describe the sound 'red'.  I'm trying to get us to say
> what it would mean even to do the describing.
>
> Specifying requirements for what software is supposed to do is one of
> those thankless tasks that everyone complains is never done in the
> free software community.  I am offering, earnestly, to do that.  I
> just need a few people to tell me what _they think_ the software in
> question ought to do.  I set up a mailing list.  I have solicited
> comments.  I'm not sure what else to do, but so far, I have the
> positive remarks of Jose (GORDA), the remarks of Markus (which amount
> to "this is a waste of time", unless I misread him), and nothing
> else.

I'm sorry if this sounded that negative. Defining what software is
supposed to do is certainly necessary, especially as long as replication
discussions on -hackers look like what you described above. Thus we
had better first define what we mean, to make sure we are talking
about the same thing when speaking of 'multi-master replication', for
example.

Please note that I've never raised my voice against that. I'm just
saying: it's not time for hooks or any other framework, yet. We don't
even agree that we need hooks to interface with the database. Even
having to define points in code where I could hook in would limit me in
an unacceptable way, if I couldn't redefine them whenever I wanted.

> Surely, in a community that spends time on the topic of whether
> replication "should be in the back end", we oughta be able to come up
> with 10 or so people who are willing to say what "being in the back
> end" would mean.  At the moment, this trivial goal is all I'm aiming
> for.

Being in the back end for me means, I can code in C, use shared memory
and system catalogs, add another sub-process to PostgreSQL, introduce
another operation mode for (remote) backends, mess with the postmaster
and communicate to the backends via shared memory and signals (IPC).

IPC is even a good example of something which could be of use to me.
Back in April, I sent a patch implementing internal message passing
(see [2]).  It's a very general feature I need and, as pointed out in
the mail, it could even be of use to others. But I have no hope of it
making it into core, because I've never seen something accepted which
could perhaps be of use in the future.
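
For readers unfamiliar with the idea, the following is a minimal sketch of message passing through a fixed-size buffer. It is not the IMessages patch referenced in [2] and makes no claim about how that patch works. It is written in Python for brevity, a `bytearray` stands in for a shared memory segment, and the names (`MessageRing`, `send`, `receive`) and the header layout are assumptions chosen for illustration. A real backend implementation would additionally need locking and a way to wake the receiver (for example a signal), both of which are omitted here.

```python
import struct

# Assumed frame layout for this sketch: (sender id, payload length), both uint32.
HEADER = struct.Struct("<II")


class MessageRing:
    """Fixed-size byte ring; a bytearray stands in for a shared memory segment."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # index of the next byte to read
        self.used = 0  # number of bytes currently occupied

    def _write(self, data: bytes) -> None:
        tail = (self.head + self.used) % self.size
        for b in data:
            self.buf[tail] = b
            tail = (tail + 1) % self.size  # wrap around at the end of the buffer
        self.used += len(data)

    def _read(self, n: int) -> bytes:
        out = bytes(self.buf[(self.head + i) % self.size] for i in range(n))
        self.head = (self.head + n) % self.size
        self.used -= n
        return out

    def send(self, sender: int, payload: bytes) -> bool:
        needed = HEADER.size + len(payload)
        if needed > self.size - self.used:
            return False  # ring is full: the caller has to retry later
        self._write(HEADER.pack(sender, len(payload)) + payload)
        return True

    def receive(self):
        if self.used == 0:
            return None  # nothing queued
        sender, length = HEADER.unpack(self._read(HEADER.size))
        return sender, self._read(length)


# Usage: one "backend" queues a message, another picks it up.
ring = MessageRing(64)
ring.send(1, b"commit txn 17")
message = ring.receive()
```

Messages come out in the order they were sent, and `send` refuses a message rather than overwriting unread data, which is the property a pipe would otherwise provide.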

I've very well noticed that you and others offered to help in various
ways. Thank you for that. But I also got the impression that there's an
urge towards hooks or a framework or something, so that PostgreSQL can
provide it and refer to it as "having everything needed" for
replication. That sounds marketing driven, IMO.

I can assure you that I will continue to work on Postgres-R. I think its
design has been described well enough already. I will post more
design ideas for extensions and additions on the Postgres-R or on the 
replica-hooks mailing list as soon as I have them completely thought 
through and written down. And for sure I'll let you know if and how you 
or others can help me.

Regards

Markus

[1]: Tom Lane: Re: Getting a move on for 8.2 beta:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg00139.php

[2]: My Patch for IMessages:
http://archives.postgresql.org/pgsql-patches/2006-04/msg00047.php

Re: Integrating Replication into Core

From

Andrew Sullivan

Date:

28 November 2006, 12:58:41

On Tue, Nov 28, 2006 at 02:19:51PM +0100, Markus Schiltknecht wrote:
> (I note you don't count my version of Postgres-R (8), that might be 
> reasonable depending on your definition of 'having Postgres-R'.)

Yes; what I meant was "production-grade, ready to go."  I've played
with your code.  I'm mightily impressed that you managed to get it
working.  But I don't think it's ready for production use tomorrow in
the environments where this sort of availability is actually worth
the cost (think "money depends on this").  That's what I mean by
"have".

> and making it production grade software really takes a lot of time. IMO
> this is where replication solutions could work together, because all of
> them need to simulate a cluster somehow, to test their project. But this
> certainly has nothing to do with PostgreSQL Core.

I agree with you that such supporting tools would be a very good
thing.  Maybe nothing else is needed.  Like I said before, a negative
result is still a result.

> Another point for me is that the feedback I got on Postgres-R since
> Toronto is very close to zero. Some people haven't even noticed that
> there is Postgres-R code for 8.2. 

Well, part of the problem is there isn't much to say to code that I
can't look at.  I can play with it on the live CD, but so far the
source isn't on the web page at postgres-r.org, which is the only
source I know for it.  This makes the whole matter trickier for
potential adopters, because it's basically a black box.

> As you surely have noticed, I've been discussing forth and back with
> Bruce about replication for the documentation. I've been doing that
> because I wanted to clarify what 'replication' is, what we are talking
> about when we say 'multi-master replication' or 'data partitioning', etc..

Yes, I think those docs are very good.  But it's one thing to say,
"This is what replication means," &c., and quite another to say,
"Here are the sorts of things we plan to do, which have to work with
that pile of code over there."

> I'm sorry if this sounded that negative.

No, not negative.  Remember, as I said, if it turns out that we can't
actually come up with an outline of replication framework necessary
conditions, we have also discovered something.  That's a useful
result, because it tells us that the next thing we need to do
is figure out where the exclusive features are, so we can say "you
can have A or B, but not both."

> through and written down. And for sure I'll let you know if and how you 
> or others can help me.

Ok, thanks.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
When my information changes, I alter my conclusions.  What do you do sir?    --attr. John Maynard Keynes

Re: Integrating Replication into Core

From

Jeff Davis

Date:

28 November 2006, 13:30:45

On Tue, 2006-11-28 at 08:42 +0100, Markus Schiltknecht wrote:
> > You're right, there is no agreement yet. When I say "first step," I mean
> > that it's the first step toward getting any form of replication support
> > in the _backend_, _not_ a first step toward a replication solution at
> > all. 
> 
> Okay, sorry, then I misread you.
> 
> > It may be a long time before the backend has replication-specific
> > support of any kind, but many replication projects have passed the first
> > step toward replication a long time ago. 
> 
> Have they? Have you heard requests for specific additions into core from 
> any of them?
> 

I think you misread me again. I was again trying to make a distinction
between the progress of replication _for_ postgresql (which has been
very good, way past the first step) and the progress of replication
natively in the community version of the postgresql core, which has a
long way to go.

I wasn't very clear, but I don't think you actually disagree with me.

Regards,
Jeff Davis

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

28 November 2006, 14:18:52

Hi,

Andrew Sullivan wrote:
> Yes; what I meant was "production-grade, ready to go."  I've played
> with your code.  I'm mightily impressed that you managed to get it
> working.  But I don't think it's ready for production use tomorrow in
> the environments where this sort of availability is actually worth
> the cost (think "money depends on this").  That's what I mean by
> "have".

Agreed.

> I agree with you that such supporting tools would be a very good
> thing.  Maybe nothing else is needed.  Like I said before, a negative
> result is still a result.

Okay.

> Well, part of the problem is there isn't much to say to code that I
> can't look at.  I can play with it on the live CD, but so far the
> source isn't on the web page at postgres-r.org, which is the only
> source I know for it.  This makes the whole matter trickier for
> potential adopters, because it's basically a black box.

Very understandable. I'm trying to find ways to open source Postgres-R.

> Yes, I think those docs are very good.  But it's one thing to say,
> "This is what replication means," &c., and quite another to say,
> "Here are the sorts of things we plan to do, which have to work with
> that pile of code over there."

ACK.

>> I'm sorry if this sounded that negative.
> 
> No, not negative.  Remember, as I said, if it turns out that we can't
> actually come up with an outline of replication framework necessary
> conditions, we have also discovered something.  That's a useful
> result, because it tells us that the next thing we need to do
> is figure out where the exclusive features are, so we can say "you
> can have A or B, but not both."

Okay.

>> through and written down. And for sure I'll let you know if and how you 
>> or others can help me.
> 
> Ok, thanks.

Thank you.

Markus

Re: Integrating Replication into Core

From

Brad Nicholson

Date:

28 November 2006, 15:23:06

On Wed, 2006-11-22 at 19:27 +0000, Simon Riggs wrote:
> On Wed, 2006-11-22 at 19:23 +0100, Markus Schiltknecht wrote:
> 
> > Jeff Davis wrote:
> > > If there is some great replication solution that a lot of people need
> > > and it will only work with a change to core, that change might make it
> > > in.
> > 
> > That's what I'm saying. Although it's hypothetical.
> 
> My interest is in extending Warm Standby [8.2] to include the following
> forms of replication:
> 1. asynchronous WAL-record level transfer to Standby server
> 2. synchronous WAL-record level transfer to Standby server
> My foresight includes that this would likely require some improvements
> in Group Commit, but I've not done the design for this *yet*.
> 
> I would also like to include some performance optimisations into Core
> that are specifically aimed at improving Slony performance. (I'm more
> than happy if those things also increase performance of other
> situations). That's slightly different thing to embedding Slony in Core,
> which I am *not* suggesting. Suggestions welcome.
> 
> This will then give PostgreSQL:
> - improved performance for the most popular production replication
> system for PostgreSQL (Slony)
> - a capability for Synchronous Replication, when it is requested
> 
> That's the limit of my ambitions for 8.3.

Very curious slony user here.  Can I ask what you have planned for 8.3
in regards to Slony performance?

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.

Re: Integrating Replication into Core

From

"Simon Riggs"

Date:

29 November 2006, 08:49:00

On Tue, 2006-11-28 at 14:22 -0500, Brad Nicholson wrote:
> On Wed, 2006-11-22 at 19:27 +0000, Simon Riggs wrote:
> > I would also like to include some performance optimisations into Core
> > that are specifically aimed at improving Slony performance. (I'm more
> > than happy if those things also increase performance of other
> > situations). That's slightly different thing to embedding Slony in Core,
> > which I am *not* suggesting. Suggestions welcome.

> Very curious slony user here.  Can I ask what you have planned for 8.3
> in regards to Slony performance?

Discussion opened on slony-general list. See you there.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: Integrating Replication into Core

From

Jim Nasby

Date:

05 December 2006, 01:48:41

On Nov 28, 2006, at 10:18 AM, Markus Schiltknecht wrote:
>> Well, part of the problem is there isn't much to say to code that I
>> can't look at.  I can play with it on the live CD, but so far the
>> source isn't on the web page at postgres-r.org, which is the only
>> source I know for it.  This makes the whole matter trickier for
>> potential adopters, because it's basically a black box.
>
> Very understandable. I'm trying to find ways to open source  
> Postgres-R.

Related to that, and your comment about people not using
Postgres-R... I think it's going to be very, very hard to get people to
seriously consider using Postgres-R while it's essentially a fork of
the community code, with little/no visibility into what changes have  
been made and how they could affect data stored in the database.  
Contrast this with Slony, where there are no back-end changes and the  
trigger code (which is essentially the only thing that touches your  
live data) is readily visible just via \df+. That makes it very easy  
for people to convince themselves that Slony is unlikely to hose  
their data. Of course at this point there are enough people using Slony
that that's no longer a concern, but back when it was introduced it  
would have been.

Given the nature of Postgres-R, I suppose there's no real way people  
could become comfortable without looking at most/all of the code,  
since it does tie pretty deeply into the backend. But that's one way  
that having published hooks would help; if you could at least put the  
code that touches the guts of the database and the source data out in  
the open, people might be more willing to give Postgres-R a try.

You also mentioned putting IPC in the backend, since it's something  
that you need. I think breaking something as complex as replication  
into smaller chunks that can stand on their own is a great idea.  
Oracle's replication does this, and I wish Slony would. Having access  
to the queuing/communications mechanism that the Slony folks have  
built would be very useful. So I'd definitely encourage making  
subsets of Postgres-R functionality available, and promoting them via  
pgFoundry.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Integrating Replication into Core

From

Markus Schiltknecht

Date:

05 December 2006, 04:08:46

Hi,

Jim Nasby wrote:
> Related to that, and your comment about people not using Postgres-R... 

I commented about the feedback I got, which would include rants about
why it's not open source and such. But I didn't even get such responses.

I don't expect anybody to use Postgres-R currently. I don't use it in
production myself. And the LiveCD currently serves mainly as evidence
of real code behind my words. ;-)

> I think it's going to be very, very hard to get people to seriously 
> consider using Postgres-R while it's essentially a fork of the community 
> code, with little/no visibility into what changes have been made and how 
> they could affect data stored in the database. 

Agreed.

> Given the nature of Postgres-R, I suppose there's no real way people 
> could become comfortable without looking at most/all of the code, since 
> it does tie pretty deeply into the backend. 

Most *people* use PostgreSQL in production without having ever looked at
its source code. Why should *they* want to look at Postgres-R sources?

I surely see that I could gain *developers'* acceptance by opening up the
source code. Please note that I'm absolutely for open source software; I
always wanted to release my changes to Postgres-R under a BSD license
one day.

I'm so much for open source software that I want to make a living from
writing OSS. I simply don't know exactly how to do that, yet. So I'm
keeping Postgres-R closed to keep more options open for myself.

> But that's one way that 
> having published hooks would help; if you could at least put the code 
> that touches the guts of the database and the source data out in the 
> open, people might be more willing to give Postgres-R a try.

I don't really buy that argument. It would be quite some work for me and 
not really help other developers, because the real code is still hidden 
away.

> You also mentioned putting IPC in the backend, since it's something that 
> you need. I think breaking something as complex as replication into 
> smaller chunks that can stand on their own is a great idea.

Agreed.

But once again, responses on my trivial IMessages implementations 
were... zero. Not even complaints about how lacking it is. Or discussing 
performance of pipes vs. this shared memory message passing approach. 
Nothing. Why should I work on something nobody else seems to be 
interested in?

> Oracle's 
> replication does this, and I wish Slony would. Having access to the 
> queuing/communications mechanism that the Slony folks have built would 
> be very useful. So I'd definitely encourage making subsets of Postgres-R 
> functionality available, and promoting them via pgFoundry.

Agreed.

I myself have thought about splitting some things out (i.e. this IPC 
stuff, another chunk to split out could be the GCS interface). It could 
make testing and development easier. But making it available via 
pgFoundry and promoting it as a separate project is another story which 
certainly depends on some interested people asking for it.

If Linus hadn't gotten any answers to his famous post "What would you like
to see most in minix?" he most probably wouldn't have published Linux.

Regards

Markus

Re: Integrating Replication into Core

From

Andrew Sullivan

Date:

06 December 2006, 19:33:46

On Sun, Dec 03, 2006 at 10:04:46PM -0800, Jim Nasby wrote:
> Oracle's replication does this, and I wish Slony would. Having access  
> to the queuing/communications mechanism that the Slony folks have  
> built would be very useful.

Abstraction patches are welcome ;-)

Seriously, though, part of what I'm attempting to achieve (and that
it keeps happening here suggests to me that another list was a bad
idea) is to identify these _elements_.  Then we can recycle them,
after all.

A
-- 
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.    --H.W. Fowler