Thread: PostgreSQL + Replicator developer meeting 10/28

PostgreSQL + Replicator developer meeting 10/28

From
Joshua Drake
Date:
With the recent open sourcing of Replicator, the team has been trying
to come up with ways to ensure an open development process. In that
light we have decided to have our first release 1.9 meeting on
Freenode. All people interested in participating in a discussion about
the upcoming Replicator 1.9 are welcome to attend. The current topics
are:

   * New MCP architecture
   * DDL Replication
   * Release timeline
   * Questions


Replicator is set to be a short cycle release, hopefully landing before
PostgreSQL 8.4. It will support PostgreSQL 8.3 and PostgreSQL 8.4 (when
8.4 is available). We will be meeting in the #replicator channel at
10:00 AM PDT...

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/




Re: PostgreSQL + Replicator developer meeting 10/28

From
Hannu Krosing
Date:
On Mon, 2008-10-27 at 13:42 -0700, Joshua Drake wrote:
> With the recent open sourcing of Replicator, the team has been trying
> to come up with ways to ensure an open development process. In that
> light we have decided to have our first release 1.9 meeting on
> Freenode. All people interested in participating in a discussion about
> the upcoming Replicator 1.9 are welcome to attend.

I missed the meeting at #replicator, but still have some questions:
* Is there a mailing list for Replicator?

>  The current topics are:
>
>     * New MCP architecture

What's new ? 

I have some doubts about the current architecture based on my reading of
the Replicator wiki, but would like to learn about the "new" architecture
before spending too much time studying the "old" one.

>     * DDL Replication

Is it there already, or is it just a planned feature?

>     * Release timeline
>     * Questions
* How hard would it be to extract the DDL replication part and use it as
  a basis for DDL "after trigger" support, for use in trigger-based
  replication/auditing like Slony and pgQ/Londiste?

* Special DDL triggers capturing the DDL statement source could also
  be an alternative to full DDL triggers.

> Replicator is set to be a short cycle release, hopefully landing before
> PostgreSQL 8.4. It will support PostgreSQL 8.3 and PostgreSQL 8.4 (when
> 8.4 is available). We will be meeting in the #replicator channel at
> 10:00 AM PDT...


--------------
Hannu




Re: PostgreSQL + Replicator developer meeting 10/28

From
Joshua Drake
Date:
On Tue, 28 Oct 2008 19:46:42 +0200
Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> On Mon, 2008-10-27 at 13:42 -0700, Joshua Drake wrote:
> > With the recent open sourcing of Replicator, the team has been
> > trying to come up with ways to ensure an open development process.
> > In that light we have decided to have our first release 1.9 meeting
> > on Freenode. All people interested in participating in a discussion
> > about the upcoming Replicator 1.9 are welcome to attend.
> 
> I missed the meeting at #replicator, but still have some questions

Meeting is still going on. I will answer your questions and post
results this afternoon.

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/




Re: PostgreSQL + Replicator developer meeting 10/28

From
Devrim GÜNDÜZ
Date:
On Tue, 2008-10-28 at 19:46 +0200, Hannu Krosing wrote:
>
>  * Is there a mailing list for replicator ?

https://lists.commandprompt.com/mailman/listinfo/replicator-general

Regards,
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org

Re: PostgreSQL + Replicator developer meeting 10/28

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 11:01 -0700, Joshua Drake wrote:
> On Tue, 28 Oct 2008 19:46:42 +0200
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> 
> > On Mon, 2008-10-27 at 13:42 -0700, Joshua Drake wrote:
> > > With the recent open sourcing of Replicator, the team has been
> > > trying to come up with ways to ensure an open development process.
> > > In that light we have decided to have our first release 1.9 meeting
> > > on Freenode. All people interested in participating in a discussion
> > > about the upcoming Replicator 1.9 are welcome to attend.
> > 
> > I missed the meeting at #replicator, but still have some questions
> 
> Meeting is still going on. 

Heh. Seems I underestimated the time difference :)

> I will answer your questions and post results this afternoon.

Thanks!

----------------
Hannu



Re: PostgreSQL + Replicator developer meeting 10/28

From
Joshua Drake
Date:
On Tue, 28 Oct 2008 19:46:42 +0200
Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> >  The current topics are:
> >
> >     * New MCP architecture
> 
> What's new ? 
> 
> I have some doubts about the current architecture based on my reading
> of replicator wiki, but would like to learn about the "new"
> architecture before spending too much time in studying the "old" one.

The two obvious problems with the existing MCP architecture are:

 1. Single point of failure
 2. Portability

The new architecture is set to remove both of those. The short version
is the MCP will be moved into the backend. Thus:

Master->MCP|Slave ->Slave1
                  ->Slave2
                  ->Slave3

The process being: the Master sends data to the MCP|Slave; the MCP|Slave
writes it to disk (and optionally restores it) and then forwards it to
Slaves 1, 2 and 3, which receive the data, write it to disk and then
restore it.

If the master dies, you can promote any of the slaves, and the leftover
slaves will connect to the promoted slave and begin receiving updates.

If the MCP|Slave dies, a new Slave can begin the MCP|Slave process.

Alvaro or Alexey can speak more technically about implementation than I
can.

> 
> >     * DDL Replication
> 
> Is it there alread, or is it just  a planned feature ?

Planned feature.

> 
>  * How hard would it be to extract DDL replication part and use it as
>    basis for DDL after trigger support for use in trigger based
>    replication/auditing like Slony ann pgQ/Londiste ?

Hmm, I am not sure. We are pretty deep into the core and only use
triggers for GRANT/REVOKE/CREATE ROLE.

Sincerely,

Joshua D. Drake



-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/




Re: PostgreSQL + Replicator developer meeting 10/28

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> On Tue, 28 Oct 2008 19:46:42 +0200
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> 
> > >  The current topics are:
> > >
> > >     * New MCP architecture
> > 
> > What's new ? 
> > 
> > I have some doubts about the current architecture based on my reading
> > of replicator wiki, but would like to learn about the "new"
> > architecture before spending too much time in studying the "old" one.
> 
> The two obvious problems with the existing MCP architecture is:
> 
>  1. Single point of failure

For async replication there is always a SPoF; at least the master is a
SPoF until the first slave has acquired the log. Or do you plan for both
the Master and the "MCP|Slave" to keep the log and be able to step in for
each other if the other fails?

>  2. Portability

Portability to where? Other DBMSs? Other PG versions?


For me there were also two more problems:

3. A separate "replication log", which at least seems to be able to get
out of sync with the main DB.

Why don't you just use a DB table, WAL-logged and all? (A sketch of what
I mean is below.)

4. Also, again from reading the Replicator FAQ, it seems that there is a
window of corruption/data loss when rotating Replicator's transaction
log. I think that doing it with copy/truncate either needs locking of the
logfile (== bad performance during the copy/truncate) or is just a
data-eating failure waiting to happen.

pgQ has a solution to that by rotating queue tables, and PostgreSQL core
also effectively "rotates" WAL segments. To be robust _and_ effective,
Replicator should also rotate the logfile itself.
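
To make point 3 concrete, here is a minimal sketch of the kind of
WAL-logged log table I have in mind (table and column names are purely
illustrative, not anything Replicator actually has):

-- Illustrative sketch only: a replication log kept as an ordinary table.
CREATE TABLE replication_log (
    ev_id     bigserial    PRIMARY KEY,
    ev_xid    xid          NOT NULL,     -- originating transaction
    ev_table  text         NOT NULL,     -- fully qualified table name
    ev_type   "char"       NOT NULL,     -- 'I', 'U' or 'D'
    ev_data   bytea        NOT NULL,     -- serialized row image
    ev_time   timestamptz  NOT NULL DEFAULT now()
);
-- Being an ordinary heap, every insert goes through WAL, so the log
-- cannot get out of sync with the database it lives in and is recovered
-- together with the rest of the data after a crash.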

> The new architecture is set to remove both of those. The short version
> is the MCP will be moved into the backend. Thus:
> 
> Master->MCP|Slave ->Slave1
>                   ->Slave2
>                   ->Slave3
> 
> The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> to disk (optionally restores it)

Will this first send be sync or async? Or have you planned to have it be
configurable among several robustness vs. performance levels, similar to
the planned integrated WAL shipping?

If async, will it also use MVCC for keeping the log on the Master (like
Slony and pgQ do), just to be at least as reliable as PostgreSQL core
itself and not require a full resync after a server crash?

> and then forwards it to 1,2,3 who then
> receive the data, write it to disk and then restore it.
> 
> If master dies, you can promote to any of the slaves and the left over
> slaves will connect to the promoted slave and begin receiving updates.
> 
> If the MCP|Slave dies a new Slave can begin the MCP|Slave process.
> 
> Alvaro or Alexey can speak more technically about implementation than I
> can.

Alvaro - I guess you already have discussed most of it, but basically
you need to solve all the same problems that WAL-shipping based Hot
Standby is solving and Slony/pgQ/Londiste have solved.

Hopefully you get it more robust than Slony when making changes under
high load :)

Will there be a helper application for setting up and configuring
changes in replication, or will it all be done using added SQL
commands?

How will DDL be handled (I understood that you don't yet have DDL
replication)?

Will Slave tables be kind-of-read-only like Slony slaves ? Or even
_really_ read only like Simon's Hot Standby ?

> > 
> > >     * DDL Replication
> > 
> > Is it there alread, or is it just  a planned feature ?
> 
> Planned feature.

Did the plans get any clearer during this meeting?

> >  * How hard would it be to extract DDL replication part and use it as
> >    basis for DDL after trigger support for use in trigger based
> >    replication/auditing like Slony ann pgQ/Londiste ?
> 
> Hmm I am not sure. We are pretty deep into the core and only use
> triggers for GRANT/REVOKE/CREATE ROLE .

By the way, why did you choose the "pretty deep into the core" approach
instead of triggers?

I mean, you probably end up duplicating (or missing) lots of
PostgreSQL's internal goodness instead of just using what is already
available?

> Sincerely,
> 
> Joshua D. Drake

Thanks for the update.

I hope something useful will come out of this too, though I had hoped
that it already had some advantages over trigger-based replication, like
the ability to replicate DDL.

-----------------------
Hannu Krosing







Re: PostgreSQL + Replicator developer meeting 10/28

From
Alvaro Herrera
Date:
Hannu Krosing wrote:
> On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:

> > The two obvious problems with the existing MCP architecture is:
> > 
> >  1. Single point of failure
> 
> For async replication there is always SPoF, at least the master until
> first slave has aquired log is a SPoF, or do you plan that both Master
> and "MCP|Slave" to keep the log and be able to step in for each other if
> the other fails?

Yeah, with the new architecture there is still going to be a bit of a
SPoF in the master->MCP but it's a lot smaller than the current setup,
in which if you lose the MCP you basically lose everything.

> >  2. Portability
> 
> Portability to where ? Other DBMS's ? Other PG versions ?

Other operating systems mainly.  The trouble is we never got around to
porting the MCP to any OS beyond Linux; I think it should work on
Solaris and BSDs, but surely not Windows.  We want to just get rid of
what I consider a (crappy) reimplementation of postmaster; instead we
should just let postmaster do the job.

Additionally we would get rid of the ugly way we "import" backend code
into the MCP server.


> for me there was also two more problems:
> 
> 3. separate "replication log", which at least seems to be able to get
> out of sync with main DB. 
> 
> Why don't you just use a DB table, WAL-logged and all

The whole replication log thing is a topic of dissent in the team ;-)

> 4. Also, again from reading Replicator FAQ, it seems that there is a
> window of corruption/data loss when rotating the Replicators transaction
> log. I think that doing it with copy/truncate either needs locking the
> logfile (== bad performance, during copy/truncate) or is just a
> data-eating failure waiting to happen.

Hmm, what Replicator FAQ?  We used to have this copy/truncate problem,
and we rearchitected the log to avoid it (we use a rotating setup
now).

> > Master->MCP|Slave ->Slave1
> >                   ->Slave2
> >                   ->Slave3
> > 
> > The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> > to disk (optionally restores it)
> 
> Will this first send be sync or async ? Or have you planned to have it
> be configurable among several robustness vs. performance levels, similar
> to the planned integrated WAL-shipping.

It is async, and we haven't talked about sync.

> if async, will it also use MVCC for keeping log on Master (l.ike Slony
> and pgQ do), just to be at least as reliable as postgreSQL core itself
> and not require a full resync at server crash.

You mean WAL?  We don't currently.


> > Alvaro or Alexey can speak more technically about implementation than I
> > can.
> 
> Alvaro - I guess you already have discussed most of it, but basically
> you need to solve all the same problems that WAL-shipping based Hot
> Standby is solving and Slony/pgQ/Londiste have solved.

If you mean that we're duplicating the effort that's already going
elsewhere, my opinion is yes, we are.

> Hopefully you get it more robust than Slony when making changes under
> high load :)

Hmm, I don't know about lack of robustness in Slony, so I don't know.

> Will there be an helper application for setting up and configuring
> changes in replication. or will it all be done using added SQL
> commands ?

Well, the interface I work on is all SQL commands :-)

> How will DDL be handled ( i understood that you don't yet have DDL
> replication) ?

We don't have it yet.  However, since we can just add any code in any
place we like, and we have a protocol to transmit changes, it is
relatively easy to add calls to collect the needed information and
replay it on the slave.

> Will Slave tables be kind-of-read-only like Slony slaves ? Or even
> _really_ read only like Simon's Hot Standby ?

Heh -- they are read-only, and they turn into read-write when the slave
promotes.  I'm not sure what kind that makes it :-)


> > > >     * DDL Replication
> > > 
> > > Is it there alread, or is it just  a planned feature ?
> > 
> > Planned feature.
> 
> Did the plans got any clearer during this meeting ?

Not really; we didn't talk about that.


> > Hmm I am not sure. We are pretty deep into the core and only use
> > triggers for GRANT/REVOKE/CREATE ROLE .
> 
> By the way, why did you choose "pretty deep into the core" approach
> instead of triggers ?

Speed maybe?  I don't know.

> I mean, you probably end up duplicating (or missing) lots of
> postgreSQL-s internal goodness instead of just using what is already
> available ?

Yeah.


> I hope something useful will come out of this too, though I hoped that
> it already had some advantages over trigger-based replication, like
> ability to replicate DDL .

I fear that our approach to replication is so ad-hoc that there's not
much to be gained from elsewhere.  Replicator is pretty much a fork
that's not likely to yield anything useful to upstream.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PostgreSQL + Replicator developer meeting 10/28

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Hannu Krosing wrote:
>> On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:

>> Will there be an helper application for setting up and configuring
>> changes in replication. or will it all be done using added SQL
>> commands ?
> 
> Well, the interface I work on is all SQL commands :-)
>

Case in point. To replicate a table currently you do this:

ALTER TABLE foo ENABLE REPLICATION;
ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;

> 
>> Will Slave tables be kind-of-read-only like Slony slaves ? Or even
>> _really_ read only like Simon's Hot Standby ?
> 
> Heh -- they are read only, and they turn into read-write when the slave
> promotes.  I'm not sure what kind does that make it :-)

The relation is locked read-only if it is considered replicated. If it
is not replicated you can write to it.

>> Did the plans got any clearer during this meeting ?
> 
> Not really; we didn't talk about that.

Right, the meeting was about two specific things, which are the core
features for 1.9: the MCP architecture and DDL replication.

>>> Hmm I am not sure. We are pretty deep into the core and only use
>>> triggers for GRANT/REVOKE/CREATE ROLE .
>> By the way, why did you choose "pretty deep into the core" approach
>> instead of triggers ?
> 
> Speed maybe?  I don't know.

Well actually that was my decision years ago and it was made for two 
reasons:

1. Speed
2. Lots of people think trigger replication is a hack. (Not interested
in arguing, just stating an observation.)

One of the core requirements of the original Replicator, which can still
be seen today AND is a requirement of 1.9 as well, is:

Thou shalt not cause grief to thy master even if thou replicatest many
children.

That is where the original idea of the MCP came from. The theory was,
we could have 50 slaves and the master wouldn't care.

> 
>> I mean, you probably end up duplicating (or missing) lots of
>> postgreSQL-s internal goodness instead of just using what is already
>> available ?
> 

Well, yes and no. We have made more and more use of the internal
PostgreSQL code through 1.7 and 1.8. I expect that trend will continue. A
hairy wart would be the use of our own log, but hey, it's BSD: if someone
can figure out how to make it work with WAL and not lose the feature set
we have, I would not vote against it and would happily accept the patch
(assuming technical sign-off by Alvaro and Alexey).

> 
>> I hope something useful will come out of this too, though I hoped that
>> it already had some advantages over trigger-based replication, like

Well, it does. We can replicate large objects, for example, and our user
experience is far more friendly than anything else.

>> ability to replicate DDL .
> 

That is what 1.9 is all about. Remember that the "hope" (no laughing 
now) is that 1.9 will hit for 8.3 and 8.4 around the time 8.4 releases. 
So this isn't a year away.

> I fear that our approach to replication is so ad-hoc that there's not
> much to be gained from elsewhere.  Replicator is pretty much a fork
> that's not likely to yield anything useful to upstream.

That is probably true.

Sincerely,

Joshua D. Drake




Re: PostgreSQL + Replicator developer meeting 10/28

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> 
> > > The two obvious problems with the existing MCP architecture is:
> > > 
> > >  1. Single point of failure
> > 
> > For async replication there is always SPoF, at least the master until
> > first slave has aquired log is a SPoF, or do you plan that both Master
> > and "MCP|Slave" to keep the log and be able to step in for each other if
> > the other fails?
> 
> Yeah, with the new architecture there is still going to be a bit of a
> SPoF in the master->MCP but it's a lot smaller than the current setup,
> in which if you lose the MCP you basically lose everything.
> 
> > >  2. Portability
> > 
> > Portability to where ? Other DBMS's ? Other PG versions ?
> 
> Other operating systems mainly.  The trouble is we never got around to
> porting the MCP to any OS beyond Linux; I think it should work on
> Solaris and BSDs, but surely not Windows.  We want to just get rid of
> what I consider a (crappy) reimplementation of postmaster; instead we
> should just let postmaster do the job.
> 
> Additionally we would get rid of the ugly way we "import" backend code
> into the MCP server.
> 
> 
> > for me there was also two more problems:
> > 
> > 3. separate "replication log", which at least seems to be able to get
> > out of sync with main DB. 
> > 
> > Why don't you just use a DB table, WAL-logged and all
> 
> The whole replication log thing is a topic of dissent in the team ;-)

I see. To work reliably, the replication log should work very similarly
to WAL, so why not just use a table + WAL, or, if you want extra
performance from storing it on a separate disk, work on having multiple
WALs in the backend ;)

> > 4. Also, again from reading Replicator FAQ, it seems that there is a
> > window of corruption/data loss when rotating the Replicators transaction
> > log. I think that doing it with copy/truncate either needs locking the
> > logfile (== bad performance, during copy/truncate) or is just a
> > data-eating failure waiting to happen.
> 
> Hmm, what Replicator FAQ?  We used to have this copy/truncate problem,
> and we rearchitected the log to avoid this (we use a rotating setup
> now)

It was in the subsection "mcp_server mysteriously dies":
http://www.commandprompt.com/products/mammothreplicator/tips

> > > Master->MCP|Slave ->Slave1
> > >                   ->Slave2
> > >                   ->Slave3
> > > 
> > > The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> > > to disk (optionally restores it)
> > 
> > Will this first send be sync or async ? Or have you planned to have it
> > be configurable among several robustness vs. performance levels, similar
> > to the planned integrated WAL-shipping.
> 
> It is async, and we haven't talked about sync.
> 
> > if async, will it also use MVCC for keeping log on Master (l.ike Slony
> > and pgQ do), just to be at least as reliable as postgreSQL core itself
> > and not require a full resync at server crash.
> 
> You mean WAL?  We don't currently.

So how do you cope with possible loss of sync on a master crash?

> > > Alvaro or Alexey can speak more technically about implementation than I
> > > can.
> > 
> > Alvaro - I guess you already have discussed most of it, but basically
> > you need to solve all the same problems that WAL-shipping based Hot
> > Standby is solving and Slony/pgQ/Londiste have solved.
> 
> If you mean that we're duplicating the effort that's already going
> elsewhere, my opinion is yes, we are.

Duplicating the effort is not always a bad thing. I was mostly
suggesting watching discussions, digging around in materials and/or
asking people who have been working on these same issues.

And of course to _think_ deeply about design before writing lots of
duplicate code which ends up being an often inferior implementation of
something that already exists, ( see:
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx )
;-)


> > Hopefully you get it more robust than Slony when making changes under
> > high load :)
> 
> Hmm, I don't know about lack of robustness in Slony, so I don't know.

Slony is brittle once you start using it under high load and tends to
display all kinds of frustrating qualities:
1) It does not have enough controls built in for conf changes to
guarantee either success or clean rollback, so if something goes wrong
(like some conf change not having propagated to all nodes in the right
order), you end up with no working replication.
2) You usually can't test for 1) on your test setup, as it happens only
under really high loads, which most test setups don't provide.

There are/were other warts (like forcing an index scan covering the
whole table, or being unable to continue replication after some slonik
downtime because PostgreSQL would give "query too complex" errors on a
generated 700 kB long query), some of which are fixed in 1.x, and some
are maybe fixed in 2.0.

I was a heavy user (at Skype) at some point and have helped in fixing
some of them. But in the end we could not figure out how to make it
robust, so we extracted the good stuff into pgQ and wrote our own
replication based on that, which we could make perform and be robust
when changing conf.


> > Will there be an helper application for setting up and configuring
> > changes in replication. or will it all be done using added SQL
> > commands ?
> 
> Well, the interface I work on is all SQL commands :-)
> 
> > How will DDL be handled ( i understood that you don't yet have DDL
> > replication) ?
> 
> We don't have it yet.  However, since we can just add any code in any
> place we like, and that we have a protocol to transmit changes, it is
> relatively easy to add calls to collect the needed information and
> replay it on the slave.

Do you transmit changes to, and apply changes on, the slave as binary
or as SQL statements?

Do slaves also have to be modified just to receive changes ?

I think the hairy part will be getting the order of commands _exactly_
right (like Hot Standby again), but if you are similar to
Slony/pgQ/Londiste in that you just transfer logical changes, not
physical page-level changes, then the DDL locking on the master may be
enough to guarantee the right order. That is assuming that you can
already guarantee the right (commit "time") order on slaves; this is not
the same as transaction start order, which may give wrong/inconsistent
data states.

> > Will Slave tables be kind-of-read-only like Slony slaves ? Or even
> > _really_ read only like Simon's Hot Standby ?
> 
> Heh -- they are read only, and they turn into read-write when the
> slave
> promotes.  I'm not sure what kind does that make it :-)

This seems similar to Hot Standby. Slony enforces read-only slaves using
triggers, and this can be circumvented by telling those triggers that you
are a Slony replication process yourself.


-- 
------------------------------------------
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training



Re: PostgreSQL + Replicator developer meeting 10/28

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: 
> Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> >> On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> 
> >> Will there be an helper application for setting up and configuring
> >> changes in replication. or will it all be done using added SQL
> >> commands ?
> > 
> > Well, the interface I work on is all SQL commands :-)
> >
> 
> Case in point. To replicate a table currently you do this:
> 
> ALTER TABLE foo ENABLE REPLICATION;
> ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;

How do you define SLAVE 0 ?

That is, do you use something similar to SQL/MED , discussed currently
on other thread on pgsql-hackers, or just tables, or external conf
files, or ... ?

> >>> Hmm I am not sure. We are pretty deep into the core and only use
> >>> triggers for GRANT/REVOKE/CREATE ROLE .
> >> By the way, why did you choose "pretty deep into the core" approach
> >> instead of triggers ?
> > 
> > Speed maybe?  I don't know.
> 
> Well actually that was my decision years ago and it was made for two 
> reasons:
> 
> 1. Speed

Is this some MySQL kind of speed, achieved by compromising on ACID ?

> 2. Lots of people think trigger replication is a hack. (not interested 
> in arguing just stating an observation).

And drawing the illogical conclusion that non-trigger replication is not a hack?

Probably the same people who think that "expensive" and "closed source"
automatically mean high quality :)

Actually I see trigger-based replication as a power tool. You can make it (almost)
as fast as any source-level hack, even if the latter compromises on ACID, plus
you can be much more flexible on what/when/how you replicate.
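
For reference, the kind of trigger-based capture I mean is roughly the
following. This is a generic sketch with made-up names, not Slony's or
Londiste's actual trigger code:

-- Generic sketch of trigger-based change capture; names are illustrative.
CREATE TABLE change_log (
    id        bigserial    PRIMARY KEY,
    tbl       text         NOT NULL,
    op        text         NOT NULL,     -- 'INSERT', 'UPDATE' or 'DELETE'
    row_data  text,                      -- captured row rendered as text
    logged    timestamptz  NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO change_log (tbl, op, row_data)
            VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
        RETURN OLD;
    END IF;
    INSERT INTO change_log (tbl, op, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_capture
    AFTER INSERT OR UPDATE OR DELETE ON public.accounts
    FOR EACH ROW EXECUTE PROCEDURE capture_change();

A separate daemon would then read change_log in commit order and apply
the rows on the slaves; that reader/applier part is where most of the
real work is.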

> One of the core requirements of the original replicator which can still 
> be seen today AND is a requirement of 1.9 as well is:
> 
> Though shall not cause grief to thy master even if thy replicates many 
> children.

IOW, one slave and many slaves cause similar load on master.

And you do it by initially replicating to a single "distributor" slave ?

> That is where the original idea of the MCP came from. They theory was, 
> we could have 50 slaves and the master wouldn't care.

A good goal. But why would anybody _need_ 50 slaves ? 

Perhaps some application with small data amount (meaning it fits on one server) 
and massively (100:1) skewed towards reading, like dynamically generated 
mostly-static-content web pages ?

> >> I mean, you probably end up duplicating (or missing) lots of
> >> postgreSQL-s internal goodness instead of just using what is already
> >> available ?
> > 
> 
> Well yes and no. We have made more and more use of the internal 
> postgresql code through 1.7, 1.8. I expect that trend will continue. A 
> hairy wart would be the use of own log, but hey its BSD if someone can 
> figure out how to make it work with WAL and not lose the feature set we 
> have 

Why not just make a new kind of WAL-logged heap which stores your version of the trx log?

> I would not vote against it and would happily except the patch 
> (assuming technical sign off by Alvaro and Alexey).
>
> >> I hope something useful will come out of this too, though I hoped that
> >> it already had some advantages over trigger-based replication, like
> 
> Well it does. We can replicate large objects for example 

I guess we get more benefit to the general public in the long run by
making it possible to have triggers on large objects (and system objects
or DDL statements).

They could be some kind of restricted or "system" triggers, at least for DDL, but 
exposing the mechanisms to power users without them having to start patching backend 
code would be a good thing.
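
Something like the sketch below is what I have in mind; note that nothing
like this exists in 8.3/8.4, so the syntax and names are only an
illustration of the desired interface, not a description of Replicator or
of current PostgreSQL:

-- Illustrative sketch of a restricted "system" trigger that captures DDL.
CREATE TABLE ddl_log (
    tag        text,                      -- e.g. 'CREATE TABLE'
    statement  text,                      -- the DDL statement source
    logged_at  timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
    INSERT INTO ddl_log (tag, statement)
        VALUES (TG_TAG, current_query());
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER capture_ddl ON ddl_command_end
    EXECUTE PROCEDURE log_ddl();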

> and our user 
> experience is far more friendly than anything else.

Do you mean that you have a friendly UI for beginners ?

Or is it proven to be robust in continued use, with changes going
smoothly in and automatic recovery from network outages and random
reboots on live systems in a 24/7 environment?

> >> ability to replicate DDL .
> > 
> 
> That is what 1.9 is all about. Remember that the "hope" (no laughing 
> now) is that 1.9 will hit for 8.3 and 8.4 around the time 8.4 releases. 
> So this isn't a year away.
> 
> > I fear that our approach to replication is so ad-hoc that there's not
> > much to be gained from elsewhere.  Replicator is pretty much a fork
> > that's not likely to yield anything useful to upstream.
> 
> That is probably true.

If/when you get DDL replication done (and tested, and robust, ... :) ), it
at least solves the following two problems for others as well:

1) where to get at the DDL statements at the right time

2) how to put them into the replication stream in the right place / right order

Having these solved is a big thing in itself, even if no actual code reuse is possible.

It may be easier to get no. 2) right in a more flexible trigger-based
environment, so I'll try to talk Marko into looking at it for pgQ as well ;)


-- 
------------------------------------------
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training



Re: PostgreSQL + Replicator developer meeting 10/28

From
Andrew Sullivan
Date:
On Wed, Oct 29, 2008 at 12:02:20PM +0200, Hannu Krosing wrote:
> A good goal. But why would anybody _need_ 50 slaves ? 

They might have a contractual responsibility for extremely wide
geographic distribution.  Or they might be building an application
that needs extremely wide network-topological distribution to avoid
large loads on any one network.  For instance, I can imagine building
a network of nameservers in which you peered the nameservers,
colocated in every ISP you could think of.  If you were backing the
nameserver with Postgres, this would work.  To be clear, this is _not_
the case with any product I've ever built, but it is a design I have
seen deployed.  That design was supposed to be on top of Oracle.
There were well over 50 slaves.  I don't really believe they had that
many Oracle-using slaves, though.

A


-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/


Re: PostgreSQL + Replicator developer meeting 10/28

From
"Joshua D. Drake"
Date:
On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: 
> > Alvaro Herrera wrote:
> > Case in point. To replicate a table currently you do this:
> > 
> > ALTER TABLE foo ENABLE REPLICATION;
> > ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
> 
> How do you define SLAVE 0 ?

SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
(I would have to check the docs.)

> > Well actually that was my decision years ago and it was made for two 
> > reasons:
> > 
> > 1. Speed
> 
> Is this some MySQL kind of speed, achieved by compromising on ACID ?
> 

No this is the kind that realizes that throwing triggers on 300
relations is a slow way to go about replication.

> Actually I see trigger based replication as a power tool. You can make it (almost) 
> as fast as any source level hack, even if the latter compromises on on ACID, plus 
> you can be much more flexible on what/when/how you replicate.
> 

Like I said, really not interested in the trigger versus not discussion.

> > One of the core requirements of the original replicator which can still 
> > be seen today AND is a requirement of 1.9 as well is:
> > 
> > Though shall not cause grief to thy master even if thy replicates many 
> > children.
> 
> IOW, one slave and many slaves cause similar load on master.

One slave doesn't affect the load on the master any more than fifty.

> 
> And you do it by initially replicating to a single "distributor" slave ?
> 

Essentially.

> > That is where the original idea of the MCP came from. They theory was, 
> > we could have 50 slaves and the master wouldn't care.
> 
> A good goal. But why would anybody _need_ 50 slaves ? 
> 
> Perhaps some application with small data amount (meaning it fits on one server) 
> and massively (100:1) skewed towards reading, like dynamically generated 
> mostly-static-content web pages ?

Well it was just a number. It could be any number but yes your example
would fit.

> Why not just make a new kind of wal-logged heap, which stores your version of trx log ?
> 

That would be a question for Alvaro or Alexey not I :)

> > and our user 
> > experience is far more friendly than anything else.
> 
> Do you mean that you have a friendly UI for beginners ?
> 

Not just beginners. People don't like complicated software. Generally
speaking, if you go to a DBA of other systems, someone with 10 years of
experience, and you show them Slony, they look at you like you're nuts.
It doesn't matter that Slony works great. It doesn't matter that it is
proven in the field.

> Or is it proved to be robust in continued use, with changes going smoothly 
> in and automatic recovery from network outages and random reboots on live 
> systems in 24/7 environment ?
> 

Well, we have had plenty of people use it, although I admit over the
last year we have been weaning them off in preparation for 1.8. 1.8 is a
big step up from previous releases, and yes it has some issues (all
software does), but we are working on them.

Sincerely,

Joshua D. Drake


-- 



Re: PostgreSQL + Replicator developer meeting 10/28

From
"Joshua D. Drake"
Date:
On Wed, 2008-10-29 at 09:54 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> > > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:

> it was in subsection "mcp_server mysteriously dies"
> http://www.commandprompt.com/products/mammothreplicator/tips , 
> 

Ehh, need to remove that. That is very old.

Joshua D. Drake

-- 



Re: PostgreSQL + Replicator developer meeting 10/28

From
Hannu Krosing
Date:
On Wed, 2008-10-29 at 09:01 -0700, Joshua D. Drake wrote:
> On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: 
> > > Alvaro Herrera wrote:
> > > Case in point. To replicate a table currently you do this:
> > > 
> > > ALTER TABLE foo ENABLE REPLICATION;
> > > ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
> > 
> > How do you define SLAVE 0 ?
> 
> SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
> ( I would have to check the docs)

And the only other GUC the slave needs for replication is the MCP/Slave
connection info?

> > > Well actually that was my decision years ago and it was made for two 
> > > reasons:
> > > 
> > > 1. Speed
> > 
> > Is this some MySQL kind of speed, achieved by compromising on ACID ?
> > 
> 
> No this is the kind that realizes that throwing triggers on 300
> relations is a slow way to go about replication.

It may be slow-ish to set up initially, but on your (old) tips wiki page
you say that replicating more than 1000 relations would be prohibitively
slow.

I don't think this would be true for trigger-based replication.

And FK checks are implemented using triggers, and I have not heard much
complaining about that being a hack, or unsuitable upwards of some
N hundred tables.

> > Actually I see trigger based replication as a power tool. You can make it (almost) 
> > as fast as any source level hack, even if the latter compromises on on ACID, plus 
> > you can be much more flexible on what/when/how you replicate.
> 
> Like I said, really not interested in the trigger versus not discussion.

Why not? Because you are actually doing it with triggers, just not
standard PostgreSQL table-level triggers, but your own ones patched into
the backend?

When doing it without triggers (or, to be more exact, with triggers
hardwired into the backend) you still have to solve most of the same
problems that trigger-based replication does, just with less visibility
and less help from existing infrastructure.


One more question about triggers - what happens to triggers, foreign
keys and other constraints on the slave?


> > > One of the core requirements of the original replicator which can still 
> > > be seen today AND is a requirement of 1.9 as well is:
> > > 
> > > Though shall not cause grief to thy master even if thy replicates many 
> > > children.
> > 
> > IOW, one slave and many slaves cause similar load on master.
> 
> One slave doesn't effect the load on the master any more than fifty.

But have you measured what the impact of replication is when going from
0 to 1 slave?

...

> > > and our user 
> > > experience is far more friendly than anything else.
> > 
> > Do you mean that you have a friendly UI for beginners ?
> > 
> 
> Not just beginners. People don't like complicated software. Generally
> speaking, if you go to a DBA of other systems, someone with 10 years
> experience and you show them Slony and they look at you like your nuts.

I wonder why. To me Slony's UI (slonik) seems quite simple; the only
thing it has above what Replicator seems to have is grouping tables, so
that tables inside one group are always guaranteed to be in a consistent
state on the slave.

Otherwise I can't see how

------------------
create set (id=1, origin=1, comment="Master");
set add table (set id=1, origin=1, id=1, fully qualified name = 'public.accounts');
---
store node (id=2, comment = 'Slave node');
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
--
subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
------------------

can be made much simpler.

You can't replicate a table on Replicator by just one command:

ALTER TABLE public.accounts ENABLE REPLICATION;
ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;

You first have to set things up by telling master and slave about 
themselves ("Thou Art the Master") and about each other.

> It doesn't matter that Slony works great. It doesn't matter that it is
> proven in the field. 

Actually Slony works great for simple cases, and not so great for more
complex topologies or higher loads. It is relatively easy to push Slony
over by accident.

And it was Slony's apparent _simplicity_ that fooled me into believing
that it is a robust system, similar to PostgreSQL proper. A belief whose
consequences caused me a lot of grief and frustration for about two
years.

> > Or is it proved to be robust in continued use, with changes going smoothly 
> > in and automatic recovery from network outages and random reboots on live 
> > systems in 24/7 environment ?
> > 
> 
> Well we have had plenty of people use it although I admit over the last
> year we have been weening them off in preparation for 1.8. 1.8 is a big
> step up from previous releases and yes it has some issues all software
> does but we are working on them.

Is Replicator always replicating from one single master to one or more
slaves?

Or is it possible to set it up so that some tables are replicated from
one master and some others from another master?

I don't mean real multi-master, just that the single master is not the
same for all tables. A good example would be a sales system where each
office has its own sales table for inserting/updating data, but has a
read-only fresh copy of all other offices' data via replication.

-- 
------------------------------------------
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training



Re: PostgreSQL + Replicator developer meeting 10/28

From
"Joshua D. Drake"
Date:
On Wed, 2008-10-29 at 19:15 +0200, Hannu Krosing wrote:
>  
> > SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
> > ( I would have to check the docs)
> 
> And the only other GUC slave needs for replication is MCP/Slave connect
> info ?

https://projects.commandprompt.com/public/replicator/wiki/Documentation


> 
> > > > Well actually that was my decision years ago and it was made for two 
> > > > reasons:
> > > > 
> > > > 1. Speed
> > > 
> > > Is this some MySQL kind of speed, achieved by compromising on ACID ?
> > > 
> > 
> > No this is the kind that realizes that throwing triggers on 300
> > relations is a slow way to go about replication.
> 
> It may be slow-ish to set up initially, but in your (old) tips wikipage
> you say that replicating more than 1000 relations would be prohibitively
> slow.
> 

Depends on the workload I would assume.


> > > Actually I see trigger based replication as a power tool. You can make it (almost) 
> > > as fast as any source level hack, even if the latter compromises on on ACID, plus 
> > > you can be much more flexible on what/when/how you replicate.
> > 
> > Like I said, really not interested in the trigger versus not discussion.
> 
> Why not ? Because you are actually doing it with triggers, just not
> standard postgreSQL table-level triggers, but your own ones patched into
> backend ?

Because it isn't productive. I am happy to answer all questions (that I
can) about usage, implementation etc... but hashing through decisions
that were made 6 years ago isn't helpful, imo.

> One more question about triggers - what happens to triggers, foreign
> keys and other constraints on slave ? 

Alvaro?

>  
> > One slave doesn't effect the load on the master any more than fifty.
> 
> But have you measured, what is impact of replication when going from 0
> to 1 slaves ? 

Not anytime recently no.

> 
> You can't replicate a table on Replicator by just one command:

Actually you can. We provide stored procedures you can use if you like.

> 
> ALTER TABLE public.accounts ENABLE REPLICATION;
> ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;
> 
> You first have to set things up by telling master and slave about 
> themselves ("Thou Art the Master") and about each other.

Actually you have to tell the MCP, but yes, there is configuration to be
had. It takes about 15 minutes from start to finish for a current
master, MCP, two-slave scenario.

> Is Replicator always replicating from one single master to one or more
> slaves ?

Yes.

> 
> Or is it possible to set it up so, that some tables are replicated from
> one and some others from some other master.

We do not support cascading like that, but we do support partial
replication. E.g.:

ALTER TABLE foo ENABLE REPLICATION;
ALTER TABLE foo ENABLE REPLICATION ON SLAVE 1;
ALTER TABLE BAR ENABLE REPLICATION;
ALTER TABLE BAR ENABLE REPLICATION ON SLAVE 2;

> 
> I don't mean real multi-master, just that the single master is not the
> same for all tables - A good example would be a sales system where each
> office has its own sales table for inserting/updating data, but has a
> read-only fresh copy of all other offices data via replication .
> 

Yes this you could do. What you can't do is this:

MASTER->[SLAVE0|MASTER]->SLAVE1

Sincerely,

Joshua D. Drake


-- 



Re: PostgreSQL + Replicator developer meeting 10/28

From
Alvaro Herrera
Date:
Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:

> > If you mean that we're duplicating the effort that's already going
> > elsewhere, my opinion is yes, we are.
> 
> duplicating the effort is not always a bad thing. I was mostly
> suggesting to watch discussions and dig around in materials and/or
> asking people who have been working on these same issues.
> 
> And of course to _think_ deeply about design before writing lots of
> duplicate code which ends up being an often inferior implementation of
> something that already exists, ( see:
> http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx )
> ;-)

Are you asking whether the Replicator implementation contains gloves'
replacement?  Yes, we are.  In fact, a lot of the time I've spent on
Replicator code has been removing those (which were written by the
original Replicator developers) and just using already existing gloves.
For example, we removed code that dealt with TCP sockets directly; we now use libpq,
to which we only had to add a single support function.

> I was a heavy user (at Skype) at some point and have helped in fixing
> some. But in the end we could not figure out how to make it robust and
> extracted the good stuff for pgQ and wrote our own replication based on
> that, which we could make perform and be robust when changing conf.

A lot of the implementation effort has gone into the queue
implementation.  Personally I am not happy with it yet.  Perhaps we
should take a look at pgQ and remove our own implementation ...

> Do you transmit changes to and apply changes on slave as binary or as
> SQL statements ? 

We have a binary protocol.  We haven't worked out the details yet but it
certainly won't be SQL statements.

> Do slaves also have to be modified just to receive changes ?

Modify in what form?  Obviously they must be capable of decoding the
binary protocol ...


> I think the hairy part will be getting the order of commands _exactly_
> right (like Hot Standby again), but if you are similar to
> Slony/pgQ/Londiste in that you just transfer logical changes, not
> physical page-level changes, then the DDL locking on master may be
> enough to guarantee the right order. That is assuming that you already
> can guarantee right (commit "time") order on slaves. this is not the
> same as transaction start order, which may give wrong/inconsistent data
> states. 

Yeah, we transfer logical changes, not physical.  We enqueue transaction
data at commit time (in commit order).  I think being able to modify the
Postgres commit source code was the only reason we could do it.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support