Thread: PostgreSQL + Replicator developer meeting 10/28
With the recent open sourcing of Replicator, the team has been trying to come up with ways to ensure an open development process. In that light we have decided to have our first release 1.9 meeting on Freenode. All people interested in participating in a discussion about the upcoming Replicator 1.9 are welcome to attend. The current topics are:

* New MCP architecture
* DDL Replication
* Release timeline
* Questions

Replicator is set to be a short cycle release, hopefully landing before PostgreSQL 8.4. It will support PostgreSQL 8.3 and PostgreSQL 8.4 (when 8.4 is available).

We will be meeting in the #replicator channel at 10:00 AM PDT...

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Mon, 2008-10-27 at 13:42 -0700, Joshua Drake wrote:
> With the recent open sourcing of Replicator, the team has been trying
> to come up with ways to ensure an open development process. In that
> light we have decided to have our first release 1.9 meeting on
> Freenode. All people interested in participating in a discussion about
> the upcoming Replicator 1.9 are welcome to attend.

I missed the meeting at #replicator, but still have some questions.

* Is there a mailing list for Replicator?

> The current topics are:
>
> * New MCP architecture

What's new?

I have some doubts about the current architecture based on my reading of the Replicator wiki, but would like to learn about the "new" architecture before spending too much time studying the "old" one.

> * DDL Replication

Is it there already, or is it just a planned feature?

> * Release timeline
> * Questions

* How hard would it be to extract the DDL replication part and use it as the basis for DDL after-trigger support, for use in trigger-based replication/auditing like Slony and pgQ/Londiste?

* Special DDL triggers capturing DDL statement source could also be an alternative to full DDL triggers.

> Replicator is set to be a short cycle release, hopefully landing before
> PostgreSQL 8.4. It will support PostgreSQL 8.3 and PostgreSQL 8.4 (when
> 8.4 is available). We will be meeting in the #replicator channel at
> 10:00 AM PDT...

--------------
Hannu
On Tue, 28 Oct 2008 19:46:42 +0200 Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> On Mon, 2008-10-27 at 13:42 -0700, Joshua Drake wrote:
> > With the recent open sourcing of Replicator, the team has been
> > trying to come up with ways to ensure an open development process.
> > In that light we have decided to have our first release 1.9 meeting
> > on Freenode. All people interested in participating in a discussion
> > about the upcoming Replicator 1.9 are welcome to attend.
>
> I missed the meeting at #replicator, but still have some questions

The meeting is still going on. I will answer your questions and post results this afternoon.

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 2008-10-28 at 19:46 +0200, Hannu Krosing wrote:
> * Is there a mailing list for replicator ?

https://lists.commandprompt.com/mailman/listinfo/replicator-general

Regards,
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
On Tue, 2008-10-28 at 11:01 -0700, Joshua Drake wrote:
> On Tue, 28 Oct 2008 19:46:42 +0200
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
>
> > On Mon, 2008-10-27 at 13:42 -0700, Joshua Drake wrote:
> > > With the recent open sourcing of Replicator, the team has been
> > > trying to come up with ways to ensure an open development process.
> > > In that light we have decided to have our first release 1.9 meeting
> > > on Freenode. All people interested in participating in a discussion
> > > about the upcoming Replicator 1.9 are welcome to attend.
> >
> > I missed the meeting at #replicator, but still have some questions
>
> Meeting is still going on.

Heh. Seems I underestimated the time difference :)

> I will answer your questions and post results this afternoon.

Thanks!

----------------
Hannu
On Tue, 28 Oct 2008 19:46:42 +0200 Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> > The current topics are:
> >
> > * New MCP architecture
>
> What's new ?
>
> I have some doubts about the current architecture based on my reading
> of the Replicator wiki, but would like to learn about the "new"
> architecture before spending too much time studying the "old" one.

The two obvious problems with the existing MCP architecture are:

1. Single point of failure
2. Portability

The new architecture is set to remove both of those. The short version is that the MCP will be moved into the backend. Thus:

Master->MCP|Slave ->Slave1
                  ->Slave2
                  ->Slave3

The process being: the Master sends data to the MCP|Slave; the MCP|Slave writes it to disk (optionally restores it) and then forwards it to slaves 1, 2, and 3, who then receive the data, write it to disk, and then restore it.

If the master dies, you can promote any of the slaves, and the leftover slaves will connect to the promoted slave and begin receiving updates.

If the MCP|Slave dies, a new Slave can begin the MCP|Slave process.

Alvaro or Alexey can speak more technically about implementation than I can.

> > * DDL Replication
>
> Is it there already, or is it just a planned feature ?

Planned feature.

> * How hard would it be to extract the DDL replication part and use it as
> the basis for DDL after-trigger support, for use in trigger-based
> replication/auditing like Slony and pgQ/Londiste ?

Hmm, I am not sure. We are pretty deep into the core and only use triggers for GRANT/REVOKE/CREATE ROLE.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
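The fan-out and promotion scheme described above can be sketched in miniature (purely illustrative Python; the class and function names are invented here, not Replicator's API):

```python
# Hypothetical sketch of the Master -> MCP|Slave -> Slave1..3 topology.
# Each node writes a change to its own log ("disk") before forwarding.

class Node:
    def __init__(self, name):
        self.name = name
        self.log = []          # replicated changes written to disk
        self.downstream = []   # nodes we forward to

    def receive(self, change):
        self.log.append(change)       # write to disk first
        for node in self.downstream:  # then forward downstream
            node.receive(change)

def promote(forwarder):
    """After the master dies: promote one slave; the forwarder and the
    remaining slaves reconnect to it and keep receiving updates."""
    new_master = forwarder.downstream[0]
    forwarder.downstream = [n for n in forwarder.downstream
                            if n is not new_master]
    new_master.downstream = [forwarder]
    return new_master

master = Node("master")
mcp_slave = Node("mcp|slave")   # combined forwarder + slave
slaves = [Node(f"slave{i}") for i in (1, 2, 3)]
master.downstream = [mcp_slave]
mcp_slave.downstream = slaves

master.receive("INSERT ...")    # every node ends up with the change
```

The point of the sketch is only the shape of the data flow: the master talks to exactly one node regardless of how many slaves hang off the forwarder.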
On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> On Tue, 28 Oct 2008 19:46:42 +0200 Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> > I have some doubts about the current architecture based on my reading
> > of the Replicator wiki, but would like to learn about the "new"
> > architecture before spending too much time studying the "old" one.
>
> The two obvious problems with the existing MCP architecture are:
>
> 1. Single point of failure

For async replication there is always a SPoF; at least the master, until the first slave has acquired the log, is a SPoF. Or do you plan for both the Master and the "MCP|Slave" to keep the log and be able to step in for each other if the other fails?

> 2. Portability

Portability to where? Other DBMSs? Other PG versions?

For me there were also two more problems:

3. A separate "replication log", which at least seems to be able to get out of sync with the main DB. Why don't you just use a DB table, WAL-logged and all?

4. Also, again from reading the Replicator FAQ, it seems that there is a window of corruption/data loss when rotating the Replicator's transaction log. I think that doing it with copy/truncate either needs locking the logfile (== bad performance during copy/truncate) or is just a data-eating failure waiting to happen. pgQ has a solution to that by rotating queue tables, and PostgreSQL core also effectively "rotates" WAL log segments. To be robust _and_ effective, Replicator should also rotate the logfile itself.

> The new architecture is set to remove both of those. The short version
> is the MCP will be moved into the backend. Thus:
>
> Master->MCP|Slave ->Slave1
>                   ->Slave2
>                   ->Slave3
>
> The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> to disk (optionally restores it)

Will this first send be sync or async? Or have you planned for it to be configurable among several robustness vs. performance levels, similar to the planned integrated WAL shipping?

If async, will it also use MVCC for keeping the log on the Master (like Slony and pgQ do), just to be at least as reliable as PostgreSQL core itself and not require a full resync after a server crash?

> and then forwards it to 1,2,3 who then
> receive the data, write it to disk and then restore it.
>
> If master dies, you can promote to any of the slaves and the left over
> slaves will connect to the promoted slave and begin receiving updates.
>
> If the MCP|Slave dies a new Slave can begin the MCP|Slave process.
>
> Alvaro or Alexey can speak more technically about implementation than I
> can.

Alvaro - I guess you have already discussed most of it, but basically you need to solve all the same problems that WAL-shipping-based Hot Standby is solving and that Slony/pgQ/Londiste have solved. Hopefully you get it more robust than Slony when making changes under high load :)

Will there be a helper application for setting up and configuring changes in replication, or will it all be done using added SQL commands?

How will DDL be handled (I understood that you don't yet have DDL replication)?

Will Slave tables be kind-of-read-only like Slony slaves? Or even _really_ read-only like Simon's Hot Standby?

> > > * DDL Replication
> >
> > Is it there already, or is it just a planned feature ?
>
> Planned feature.

Did the plans get any clearer during this meeting?

> > * How hard would it be to extract the DDL replication part and use it as
> > the basis for DDL after-trigger support, for use in trigger-based
> > replication/auditing like Slony and pgQ/Londiste ?
>
> Hmm, I am not sure. We are pretty deep into the core and only use
> triggers for GRANT/REVOKE/CREATE ROLE.

By the way, why did you choose the "pretty deep into the core" approach instead of triggers? I mean, you probably end up duplicating (or missing) lots of PostgreSQL's internal goodness instead of just using what is already available?

> Sincerely,
>
> Joshua D. Drake

Thanks for the update. I hope something useful will come out of this too, though I had hoped that it already had some advantages over trigger-based replication, like the ability to replicate DDL.

-----------------------
Hannu Krosing
Hannu Krosing wrote:
> On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> > The two obvious problems with the existing MCP architecture are:
> >
> > 1. Single point of failure
>
> For async replication there is always a SPoF; at least the master, until
> the first slave has acquired the log, is a SPoF. Or do you plan for both
> the Master and the "MCP|Slave" to keep the log and be able to step in
> for each other if the other fails?

Yeah, with the new architecture there is still going to be a bit of a SPoF in the master->MCP, but it's a lot smaller than the current setup, in which if you lose the MCP you basically lose everything.

> > 2. Portability
>
> Portability to where? Other DBMSs? Other PG versions?

Other operating systems mainly. The trouble is we never got around to porting the MCP to any OS beyond Linux; I think it should work on Solaris and the BSDs, but surely not Windows. We want to just get rid of what I consider a (crappy) reimplementation of postmaster; instead we should just let postmaster do the job.

Additionally we would get rid of the ugly way we "import" backend code into the MCP server.

> For me there were also two more problems:
>
> 3. A separate "replication log", which at least seems to be able to get
> out of sync with the main DB. Why don't you just use a DB table,
> WAL-logged and all?

The whole replication log thing is a topic of dissent in the team ;-)

> 4. Also, again from reading the Replicator FAQ, it seems that there is a
> window of corruption/data loss when rotating the Replicator's
> transaction log. I think that doing it with copy/truncate either needs
> locking the logfile (== bad performance during copy/truncate) or is
> just a data-eating failure waiting to happen.

Hmm, what Replicator FAQ? We used to have this copy/truncate problem, and we rearchitected the log to avoid it (we use a rotating setup now).

> > Master->MCP|Slave ->Slave1
> >                   ->Slave2
> >                   ->Slave3
> >
> > The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> > to disk (optionally restores it)
>
> Will this first send be sync or async? Or have you planned for it to be
> configurable among several robustness vs. performance levels, similar
> to the planned integrated WAL shipping?

It is async, and we haven't talked about sync.

> If async, will it also use MVCC for keeping the log on the Master (like
> Slony and pgQ do), just to be at least as reliable as PostgreSQL core
> itself and not require a full resync after a server crash?

You mean WAL? We don't currently.

> > Alvaro or Alexey can speak more technically about implementation than
> > I can.
>
> Alvaro - I guess you have already discussed most of it, but basically
> you need to solve all the same problems that WAL-shipping-based Hot
> Standby is solving and that Slony/pgQ/Londiste have solved.

If you mean that we're duplicating the effort that's already going on elsewhere, my opinion is yes, we are.

> Hopefully you get it more robust than Slony when making changes under
> high load :)

Hmm, I don't know about lack of robustness in Slony, so I don't know.

> Will there be a helper application for setting up and configuring
> changes in replication, or will it all be done using added SQL
> commands ?

Well, the interface I work on is all SQL commands :-)

> How will DDL be handled (I understood that you don't yet have DDL
> replication) ?

We don't have it yet. However, since we can just add any code in any place we like, and we have a protocol to transmit changes, it is relatively easy to add calls to collect the needed information and replay it on the slave.

> Will Slave tables be kind-of-read-only like Slony slaves ? Or even
> _really_ read only like Simon's Hot Standby ?

Heh -- they are read only, and they turn into read-write when the slave promotes. I'm not sure what kind that makes them :-)

> > > > * DDL Replication
> > >
> > > Is it there already, or is it just a planned feature ?
> >
> > Planned feature.
>
> Did the plans get any clearer during this meeting ?

Not really; we didn't talk about that.

> > Hmm, I am not sure. We are pretty deep into the core and only use
> > triggers for GRANT/REVOKE/CREATE ROLE.
>
> By the way, why did you choose the "pretty deep into the core" approach
> instead of triggers ?

Speed maybe? I don't know.

> I mean, you probably end up duplicating (or missing) lots of
> PostgreSQL's internal goodness instead of just using what is already
> available ?

Yeah.

> I hope something useful will come out of this too, though I had hoped
> that it already had some advantages over trigger-based replication,
> like the ability to replicate DDL.

I fear that our approach to replication is so ad-hoc that there's not much to be gained from elsewhere. Replicator is pretty much a fork that's not likely to yield anything useful to upstream.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
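For what it's worth, the rotating setup Alvaro describes can be sketched like this (a hypothetical Python illustration of segment rotation in the style of WAL recycling; the file names and sizes are made up, and this is not Replicator code). The contrast with copy/truncate is that a full segment is simply closed and left intact, so there is no window where records sit half-copied:

```python
import os
import tempfile

class RotatingLog:
    """Append-only log split into fixed-size segments. A full segment is
    closed and a new one opened, so no copy/truncate step ever blocks
    writers or risks losing records mid-copy."""

    def __init__(self, directory, segment_bytes=1024):
        self.directory = directory
        self.segment_bytes = segment_bytes
        self.segno = 0
        self._open_segment()

    def _open_segment(self):
        path = os.path.join(self.directory, f"seg{self.segno:08d}.log")
        self.fh = open(path, "ab")

    def append(self, record: bytes):
        if self.fh.tell() + len(record) > self.segment_bytes:
            self.fh.close()      # rotate: the old segment stays intact
            self.segno += 1
            self._open_segment()
        self.fh.write(record + b"\n")
        self.fh.flush()

with tempfile.TemporaryDirectory() as d:
    log = RotatingLog(d, segment_bytes=64)
    for i in range(20):
        log.append(b"change-%02d" % i)
    log.fh.close()
    segments = sorted(os.listdir(d))
    # older segments are complete files a slave (or archiver) can still read
```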
Alvaro Herrera wrote:
> Hannu Krosing wrote:
> > Will there be a helper application for setting up and configuring
> > changes in replication, or will it all be done using added SQL
> > commands ?
>
> Well, the interface I work on is all SQL commands :-)

Case in point. To replicate a table currently you do this:

ALTER TABLE foo ENABLE REPLICATION;
ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;

> > Will Slave tables be kind-of-read-only like Slony slaves ? Or even
> > _really_ read only like Simon's Hot Standby ?
>
> Heh -- they are read only, and they turn into read-write when the slave
> promotes. I'm not sure what kind that makes them :-)

The relation is locked read-only if the relation is considered replicated. If it is not replicated you can write to it.

> > Did the plans get any clearer during this meeting ?
>
> Not really; we didn't talk about that.

Right, the meeting was about two specific things, which are the core features for 1.9: the MCP architecture and DDL replication.

> > > Hmm, I am not sure. We are pretty deep into the core and only use
> > > triggers for GRANT/REVOKE/CREATE ROLE.
> > By the way, why did you choose the "pretty deep into the core"
> > approach instead of triggers ?
>
> Speed maybe? I don't know.

Well, actually that was my decision years ago, and it was made for two reasons:

1. Speed
2. Lots of people think trigger replication is a hack. (Not interested in arguing, just stating an observation.)

One of the core requirements of the original Replicator, which can still be seen today AND is a requirement of 1.9 as well, is:

Thou shalt not cause grief to thy master even if thou replicatest many children.

That is where the original idea of the MCP came from. The theory was, we could have 50 slaves and the master wouldn't care.

> > I mean, you probably end up duplicating (or missing) lots of
> > PostgreSQL's internal goodness instead of just using what is already
> > available ?

Well, yes and no. We have made more and more use of the internal PostgreSQL code through 1.7 and 1.8. I expect that trend will continue. A hairy wart would be the use of our own log, but hey, it's BSD; if someone can figure out how to make it work with WAL and not lose the feature set we have, I would not vote against it and would happily accept the patch (assuming technical sign-off by Alvaro and Alexey).

> > I hope something useful will come out of this too, though I had hoped
> > that it already had some advantages over trigger-based replication,
> > like

Well, it does. We can replicate large objects, for example, and our user experience is far more friendly than anything else.

> > the ability to replicate DDL.

That is what 1.9 is all about. Remember that the "hope" (no laughing now) is that 1.9 will hit for 8.3 and 8.4 around the time 8.4 releases. So this isn't a year away.

> I fear that our approach to replication is so ad-hoc that there's not
> much to be gained from elsewhere. Replicator is pretty much a fork
> that's not likely to yield anything useful to upstream.

That is probably true.

Sincerely,

Joshua D. Drake
On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> > > The two obvious problems with the existing MCP architecture are:
> > >
> > > 1. Single point of failure
> >
> > For async replication there is always a SPoF; at least the master,
> > until the first slave has acquired the log, is a SPoF. Or do you plan
> > for both the Master and the "MCP|Slave" to keep the log and be able
> > to step in for each other if the other fails?
>
> Yeah, with the new architecture there is still going to be a bit of a
> SPoF in the master->MCP, but it's a lot smaller than the current setup,
> in which if you lose the MCP you basically lose everything.
>
> > For me there were also two more problems:
> >
> > 3. A separate "replication log", which at least seems to be able to
> > get out of sync with the main DB. Why don't you just use a DB table,
> > WAL-logged and all?
>
> The whole replication log thing is a topic of dissent in the team ;-)

I see. To work reliably, the replication log should work very similarly to WAL, so why not just use a table + WAL, or, if you want extra performance from storing it on a separate disk, work on having multiple WALs in the backend ;)

> > 4. Also, again from reading the Replicator FAQ, it seems that there is
> > a window of corruption/data loss when rotating the Replicator's
> > transaction log. I think that doing it with copy/truncate either needs
> > locking the logfile (== bad performance during copy/truncate) or is
> > just a data-eating failure waiting to happen.
>
> Hmm, what Replicator FAQ? We used to have this copy/truncate problem,
> and we rearchitected the log to avoid it (we use a rotating setup now).

It was in the subsection "mcp_server mysteriously dies": http://www.commandprompt.com/products/mammothreplicator/tips

> > Will this first send be sync or async? Or have you planned for it to
> > be configurable among several robustness vs. performance levels,
> > similar to the planned integrated WAL shipping?
>
> It is async, and we haven't talked about sync.
>
> > If async, will it also use MVCC for keeping the log on the Master
> > (like Slony and pgQ do), just to be at least as reliable as PostgreSQL
> > core itself and not require a full resync after a server crash?
>
> You mean WAL? We don't currently.

So how do you cope with a possible loss of sync on a master crash?

> > Alvaro - I guess you have already discussed most of it, but basically
> > you need to solve all the same problems that WAL-shipping-based Hot
> > Standby is solving and that Slony/pgQ/Londiste have solved.
>
> If you mean that we're duplicating the effort that's already going on
> elsewhere, my opinion is yes, we are.

Duplicating the effort is not always a bad thing. I was mostly suggesting that you watch discussions and dig around in materials, and/or ask people who have been working on these same issues.

And of course to _think_ deeply about design before writing lots of duplicate code which ends up being an often inferior implementation of something that already exists (see: http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx ) ;-)

> > Hopefully you get it more robust than Slony when making changes under
> > high load :)
>
> Hmm, I don't know about lack of robustness in Slony, so I don't know.

Slony is brittle once you start using it under high load and tends to display all kinds of frustrating qualities:

1) It does not have enough controls in place for conf changes to guarantee either success or clean rollback, so if something goes wrong (like some conf change not propagating to all nodes in the right order), you end up with no working replication.

2) You usually can't test for 1) on your test setup, as it happens only under really high loads, which most test setups don't provide.

There are/were other warts (like forcing an index scan covering the whole table, or being unable to continue replication after some slonik downtime because PostgreSQL would give "query too complex" errors on a generated 700 kB long query), some of which are fixed in 1.x and some maybe fixed in 2.0. I was a heavy user (at Skype) at some point and have helped in fixing some. But in the end we could not figure out how to make it robust, so we extracted the good stuff into pgQ and wrote our own replication based on that, which we could make perform and be robust when changing conf.

> > Will there be a helper application for setting up and configuring
> > changes in replication, or will it all be done using added SQL
> > commands ?
>
> Well, the interface I work on is all SQL commands :-)
>
> > How will DDL be handled (I understood that you don't yet have DDL
> > replication) ?
>
> We don't have it yet. However, since we can just add any code in any
> place we like, and we have a protocol to transmit changes, it is
> relatively easy to add calls to collect the needed information and
> replay it on the slave.

Do you transmit changes to, and apply changes on, the slave as binary or as SQL statements? Do slaves also have to be modified just to receive changes?

I think the hairy part will be getting the order of commands _exactly_ right (like Hot Standby again), but if you are similar to Slony/pgQ/Londiste in that you just transfer logical changes, not physical page-level changes, then the DDL locking on the master may be enough to guarantee the right order. That is assuming that you can already guarantee the right (commit "time") order on slaves; this is not the same as transaction start order, which may give wrong/inconsistent data states.

> > Will Slave tables be kind-of-read-only like Slony slaves ? Or even
> > _really_ read only like Simon's Hot Standby ?
>
> Heh -- they are read only, and they turn into read-write when the slave
> promotes. I'm not sure what kind that makes them :-)

This seems similar to Hot Standby. Slony enforces read-only using triggers, and it can be circumvented by telling these triggers that you are a Slony replication process yourself.

--
------------------------------------------
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training
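The commit-order point above can be shown with a toy example (illustrative Python, not Replicator code): replaying changes in transaction-*start* order can leave the slave in a state the master never had, while commit order reproduces the master's final state.

```python
# Each event: (txid, start_seq, commit_seq, (op, key, value)).
# tx1 started first but committed last, so on the master its write
# to "x" is the one that wins.
events = [
    (1, 1, 3, ("set", "x", "from-tx1")),  # started first, committed last
    (2, 2, 1, ("set", "x", "from-tx2")),  # started later, committed first
    (3, 3, 2, ("set", "y", "from-tx3")),
]

def replay(order_key):
    state = {}
    for _, _, _, (op, key, value) in sorted(events, key=order_key):
        if op == "set":
            state[key] = value
    return state

by_start = replay(lambda e: e[1])   # wrong: "x" ends up as tx2's value
by_commit = replay(lambda e: e[2])  # matches the master: tx1 wins "x"
```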
On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> >> Will there be a helper application for setting up and configuring
> >> changes in replication, or will it all be done using added SQL
> >> commands ?
> >
> > Well, the interface I work on is all SQL commands :-)
>
> Case in point. To replicate a table currently you do this:
>
> ALTER TABLE foo ENABLE REPLICATION;
> ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;

How do you define SLAVE 0? That is, do you use something similar to SQL/MED, discussed currently in another thread on pgsql-hackers, or just tables, or external conf files, or ...?

> >>> Hmm, I am not sure. We are pretty deep into the core and only use
> >>> triggers for GRANT/REVOKE/CREATE ROLE.
> >> By the way, why did you choose the "pretty deep into the core"
> >> approach instead of triggers ?
> >
> > Speed maybe? I don't know.
>
> Well, actually that was my decision years ago, and it was made for two
> reasons:
>
> 1. Speed

Is this some MySQL kind of speed, achieved by compromising on ACID?

> 2. Lots of people think trigger replication is a hack. (Not interested
> in arguing, just stating an observation.)

And making the illogical conclusion that non-trigger replication is not a hack? Probably the same people who think that "expensive" and "closed source" automatically mean high quality :)

Actually, I see trigger-based replication as a power tool. You can make it (almost) as fast as any source-level hack, even if the latter compromises on ACID, plus you can be much more flexible about what/when/how you replicate.

> One of the core requirements of the original Replicator, which can
> still be seen today AND is a requirement of 1.9 as well, is:
>
> Thou shalt not cause grief to thy master even if thou replicatest many
> children.

IOW, one slave and many slaves cause similar load on the master. And you do it by initially replicating to a single "distributor" slave?

> That is where the original idea of the MCP came from. The theory was,
> we could have 50 slaves and the master wouldn't care.

A good goal. But why would anybody _need_ 50 slaves?

Perhaps some application with a small data amount (meaning it fits on one server) that is massively (100:1) skewed towards reading, like dynamically generated mostly-static-content web pages?

> >> I mean, you probably end up duplicating (or missing) lots of
> >> PostgreSQL's internal goodness instead of just using what is already
> >> available ?
>
> Well, yes and no. We have made more and more use of the internal
> PostgreSQL code through 1.7 and 1.8. I expect that trend will continue.
> A hairy wart would be the use of our own log, but hey, it's BSD; if
> someone can figure out how to make it work with WAL and not lose the
> feature set we have

Why not just make a new kind of WAL-logged heap, which stores your version of the trx log?

> I would not vote against it and would happily accept the patch
> (assuming technical sign-off by Alvaro and Alexey).
>
> >> I hope something useful will come out of this too, though I had hoped
> >> that it already had some advantages over trigger-based replication,
> >> like
>
> Well, it does. We can replicate large objects, for example,

I guess we get more benefit to the general public in the long run by making it possible to have triggers on large objects (and system objects or DDL statements). They could be some kind of restricted or "system" triggers, at least for DDL, but exposing the mechanisms to power users without them having to start patching backend code would be a good thing.

> and our user
> experience is far more friendly than anything else.

Do you mean that you have a friendly UI for beginners?

Or has it proved to be robust in continued use, with changes going smoothly in and automatic recovery from network outages and random reboots on live systems in a 24/7 environment?

> >> the ability to replicate DDL.
>
> That is what 1.9 is all about. Remember that the "hope" (no laughing
> now) is that 1.9 will hit for 8.3 and 8.4 around the time 8.4 releases.
> So this isn't a year away.
>
> > I fear that our approach to replication is so ad-hoc that there's not
> > much to be gained from elsewhere. Replicator is pretty much a fork
> > that's not likely to yield anything useful to upstream.
>
> That is probably true.

If/when you get DDL replication done (and tested, and robust, ... :) ), it at least solves the following two problems for others as well:

1) where to get at DDL statements at the right time
2) how to put them in the replication stream in the right place / right order

Having these solved is a big thing in itself, even if no actual code reuse is possible. It may be easier to get nr 2) right in a more flexible trigger-based environment, so I'll try to talk Marko into looking at it for pgQ as well ;)

--
------------------------------------------
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training
On Wed, Oct 29, 2008 at 12:02:20PM +0200, Hannu Krosing wrote:
> A good goal. But why would anybody _need_ 50 slaves ?

They might have a contractual responsibility for extremely wide geographic distribution. Or they might be building an application that needs extremely wide network-topological distribution to avoid large loads on any one network. For instance, I can imagine building a network of nameservers in which you peered the nameservers, colocated in every ISP you could think of. If you were backing the nameserver with Postgres, this would work.

To be clear, this is _not_ the case with any product I've ever built, but it is a design I have seen deployed. That design was supposed to be on top of Oracle. There were well over 50 slaves. I don't really believe they had that many Oracle-using slaves, though.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote: > On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: > > Alvaro Herrera wrote: > > Case in point. To replicate a table currently you do this: > > > > ALTER TABLE foo ENABLE REPLICATION; > > ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0; > > How do you define SLAVE 0 ? SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think. ( I would have to check the docs) > > Well actually that was my decision years ago and it was made for two > > reasons: > > > > 1. Speed > > Is this some MySQL kind of speed, achieved by compromising on ACID ? > No this is the kind that realizes that throwing triggers on 300 relations is a slow way to go about replication. > Actually I see trigger based replication as a power tool. You can make it (almost) > as fast as any source level hack, even if the latter compromises on on ACID, plus > you can be much more flexible on what/when/how you replicate. > Like I said, really not interested in the trigger versus not discussion. > One of the core requirements of the original replicator which can still > > be seen today AND is a requirement of 1.9 as well is: > > > > Though shall not cause grief to thy master even if thy replicates many > > children. > > IOW, one slave and many slaves cause similar load on master. One slave doesn't effect the load on the master any more than fifty. > > And you do it by initially replicating to a single "distributor" slave ? > Essentially. > > That is where the original idea of the MCP came from. They theory was, > > we could have 50 slaves and the master wouldn't care. > > A good goal. But why would anybody _need_ 50 slaves ? > > Perhaps some application with small data amount (meaning it fits on one server) > and massively (100:1) skewed towards reading, like dynamically generated > mostly-static-content web pages ? Well it was just a number. It could be any number but yes your example would fit. 
> Why not just make a new kind of wal-logged heap, which stores your
> version of trx log ?

That would be a question for Alvaro or Alexey, not I. :)

> > and our user
> > experience is far more friendly than anything else.
>
> Do you mean that you have a friendly UI for beginners ?

Not just beginners. People don't like complicated software. Generally speaking, if you go to a DBA of other systems, someone with 10 years of experience, and you show them Slony, they look at you like you're nuts. It doesn't matter that Slony works great. It doesn't matter that it is proven in the field.

> Or is it proved to be robust in continued use, with changes going
> smoothly in and automatic recovery from network outages and random
> reboots on live systems in a 24/7 environment ?

Well, we have had plenty of people use it, although I admit over the last year we have been weaning them off in preparation for 1.8. 1.8 is a big step up from previous releases, and yes, it has some issues (all software does), but we are working on them.

Sincerely,

Joshua D. Drake

--
On Wed, 2008-10-29 at 09:54 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> > > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
>
> it was in subsection "mcp_server mysteriously dies"
> http://www.commandprompt.com/products/mammothreplicator/tips

Ehh, need to remove that. That is very old.

Joshua D. Drake

--
On Wed, 2008-10-29 at 09:01 -0700, Joshua D. Drake wrote:
> On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote:
> > > Alvaro Herrera wrote:
> > > Case in point. To replicate a table currently you do this:
> > >
> > > ALTER TABLE foo ENABLE REPLICATION;
> > > ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
> >
> > How do you define SLAVE 0 ?
>
> SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
> ( I would have to check the docs)

And the only other GUC the slave needs for replication is the MCP/slave connect info ?

> > > Well actually that was my decision years ago and it was made for two
> > > reasons:
> > >
> > > 1. Speed
> >
> > Is this some MySQL kind of speed, achieved by compromising on ACID ?
>
> No this is the kind that realizes that throwing triggers on 300
> relations is a slow way to go about replication.

It may be slow-ish to set up initially, but in your (old) tips wiki page you say that replicating more than 1000 relations would be prohibitively slow. I don't think this would be true for trigger-based replication. FK checks are also implemented using triggers, and I have not heard much complaining about that being a hack, or unsuitable upwards of some N hundred tables.

> > Actually I see trigger based replication as a power tool. You can make
> > it (almost) as fast as any source level hack, even if the latter
> > compromises on ACID, plus you can be much more flexible on
> > what/when/how you replicate.
>
> Like I said, really not interested in the trigger versus not discussion.

Why not ? Because you are actually doing it with triggers, just not standard PostgreSQL table-level triggers, but your own ones patched into the backend ?

When doing it without triggers (or, to be more exact, with triggers hardwired into the backend) you still have to solve most of the same problems that trigger-based replication does, just with less visibility and less help from existing infrastructure.
One more question about triggers - what happens to triggers, foreign keys and other constraints on the slave ?

> > > One of the core requirements of the original replicator which can
> > > still be seen today AND is a requirement of 1.9 as well is:
> > >
> > > Thou shalt not cause grief to thy master even if thou replicatest
> > > many children.
> >
> > IOW, one slave and many slaves cause similar load on master.
>
> One slave doesn't affect the load on the master any more than fifty.

But have you measured what the impact of replication is when going from 0 to 1 slaves ?

...

> > > and our user
> > > experience is far more friendly than anything else.
> >
> > Do you mean that you have a friendly UI for beginners ?
>
> Not just beginners. People don't like complicated software. Generally
> speaking, if you go to a DBA of other systems, someone with 10 years
> experience and you show them Slony and they look at you like you're nuts.

I wonder why. To me Slony's UI (slonik) seems quite simplistic; the only thing it has above what Replicator seems to have is grouping tables, so that tables inside one group are always guaranteed to be in a consistent state on the slave. Otherwise I can't see how

------------------
create set (id=1, origin=1, comment='Master');
set add table (set id=1, origin=1, id=1, fully qualified name = 'public.accounts');
---
store node (id=2, comment = 'Slave node');
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
--
subscribe set (id = 1, provider = 1, receiver = 2);
------------------

can be made much simpler.

You can't replicate a table on Replicator with just these commands:

ALTER TABLE public.accounts ENABLE REPLICATION;
ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;

You first have to set things up by telling the master and slave about themselves ("Thou Art the Master") and about each other.

> It doesn't matter that Slony works great. It doesn't matter that it is
> proven in the field.
Actually Slony works great for simple cases, and not so great for more complex topologies or higher loads. It is relatively easy to push Slony over by accident.

And it was Slony's apparent _simplicity_ that fooled me into believing that it is a robust system, similar to PostgreSQL proper. A belief, the consequences of which caused me a lot of grief and frustration for about two years.

> > Or is it proved to be robust in continued use, with changes going
> > smoothly in and automatic recovery from network outages and random
> > reboots on live systems in a 24/7 environment ?
>
> Well we have had plenty of people use it although I admit over the last
> year we have been weaning them off in preparation for 1.8. 1.8 is a big
> step up from previous releases and yes it has some issues all software
> does but we are working on them.

Is Replicator always replicating from one single master to one or more slaves ?

Or is it possible to set it up so that some tables are replicated from one master and some others from some other master ? I don't mean real multi-master, just that the single master is not the same for all tables. A good example would be a sales system where each office has its own sales table for inserting/updating data, but has a read-only fresh copy of all other offices' data via replication.

--
------------------------------------------
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training
On Wed, 2008-10-29 at 19:15 +0200, Hannu Krosing wrote:
>
> > SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
> > ( I would have to check the docs)
>
> And the only other GUC slave needs for replication is MCP/Slave connect
> info ?

https://projects.commandprompt.com/public/replicator/wiki/Documentation

> > > > Well actually that was my decision years ago and it was made for two
> > > > reasons:
> > > >
> > > > 1. Speed
> > >
> > > Is this some MySQL kind of speed, achieved by compromising on ACID ?
> >
> > No this is the kind that realizes that throwing triggers on 300
> > relations is a slow way to go about replication.
>
> It may be slow-ish to set up initially, but in your (old) tips wiki page
> you say that replicating more than 1000 relations would be prohibitively
> slow.

Depends on the workload, I would assume.

> > > Actually I see trigger based replication as a power tool. You can make
> > > it (almost) as fast as any source level hack, even if the latter
> > > compromises on ACID, plus you can be much more flexible on
> > > what/when/how you replicate.
> >
> > Like I said, really not interested in the trigger versus not discussion.
>
> Why not ? Because you are actually doing it with triggers, just not
> standard PostgreSQL table-level triggers, but your own ones patched into
> the backend ?

Because it isn't productive. I am happy to answer all questions (that I can) about usage, implementation, etc., but rehashing decisions that were made six years ago isn't helpful, IMO.

> One more question about triggers - what happens to triggers, foreign
> keys and other constraints on the slave ?

Alvaro?

> > One slave doesn't affect the load on the master any more than fifty.
>
> But have you measured what the impact of replication is when going from
> 0 to 1 slaves ?

Not anytime recently, no.

> You can't replicate a table on Replicator by just one command:

Actually you can. We provide stored procedures you can use if you like.
> ALTER TABLE public.accounts ENABLE REPLICATION;
> ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;
>
> You first have to set things up by telling master and slave about
> themselves ("Thou Art the Master") and about each other.

Actually you have to tell the MCP, but yes, there is configuration to be had. It takes about 15 minutes from start to finish for a current master, MCP, two-slave scenario.

> Is Replicator always replicating from one single master to one or more
> slaves ?

Yes.

> Or is it possible to set it up so, that some tables are replicated from
> one and some others from some other master.

We do not support cascading like that, but we do support partial replication. E.g.:

ALTER TABLE foo ENABLE REPLICATION;
ALTER TABLE foo ENABLE REPLICATION ON SLAVE 1;

ALTER TABLE bar ENABLE REPLICATION;
ALTER TABLE bar ENABLE REPLICATION ON SLAVE 2;

> I don't mean real multi-master, just that the single master is not the
> same for all tables - A good example would be a sales system where each
> office has its own sales table for inserting/updating data, but has a
> read-only fresh copy of all other offices' data via replication.

Yes, this you could do. What you can't do is this:

MASTER->[SLAVE0|MASTER]->SLAVE1

Sincerely,

Joshua D. Drake

--
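Putting the two answers above together, the sales-office scenario could be sketched roughly as follows, with each office acting as master only for its own table in its own replication cluster. (This is a hypothetical illustration reusing the ALTER TABLE syntax quoted in this thread; the table names and slave numbers are invented, not taken from the Replicator docs.)

```
-- On office A's master: publish only A's own sales table.
ALTER TABLE sales_office_a ENABLE REPLICATION;
ALTER TABLE sales_office_a ENABLE REPLICATION ON SLAVE 1;  -- slave hosted at office B

-- On office B's master (a separate master/MCP/slave cluster): publish B's table.
ALTER TABLE sales_office_b ENABLE REPLICATION;
ALTER TABLE sales_office_b ENABLE REPLICATION ON SLAVE 1;  -- slave hosted at office A
```

Each office then reads the other office's data from a local read-only slave copy, and no node ever acts as both a master and a cascaded slave within the same cluster, which is the MASTER->[SLAVE0|MASTER]->SLAVE1 shape ruled out above.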
Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> > If you mean that we're duplicating the effort that's already going
> > elsewhere, my opinion is yes, we are.
>
> duplicating the effort is not always a bad thing. I was mostly
> suggesting to watch discussions and dig around in materials and/or
> asking people who have been working on these same issues.
>
> And of course to _think_ deeply about design before writing lots of
> duplicate code which ends up being an often inferior implementation of
> something that already exists, ( see:
> http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx )
> ;-)

Are you asking whether the Replicator implementation contains a gloves replacement? Yes, it does. In fact, a lot of the time I've spent on Replicator code has been removing those (which were written by the original Replicator developers) and just using already existing gloves. For example, we removed code that dealt with TCP sockets directly; we now use libpq, to which we only had to add a single support function.

> I was a heavy user (at Skype) at some point and have helped in fixing
> some. But in the end we could not figure out how to make it robust and
> extracted the good stuff for pgQ and wrote our own replication based on
> that, which we could make perform and be robust when changing conf.

A lot of the implementation effort has gone into the queue implementation. Personally, I am not happy with it yet. Perhaps we should take a look at pgQ and remove our own implementation ...

> Do you transmit changes to and apply changes on slave as binary or as
> SQL statements ?

We have a binary protocol. We haven't worked out the details yet, but it certainly won't be SQL statements.

> Do slaves also have to be modified just to receive changes ?

Modified in what way? Obviously they must be capable of decoding the binary protocol ...
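For readers unfamiliar with it, pgQ (the Skytools component Hannu mentions extracting at Skype) exposes its queue as plain SQL functions; a minimal producer/consumer round trip looks roughly like this. (Function names are as I recall them from the Skytools documentation and should be checked against it; the queue and consumer names are made up.)

```
-- One-time setup: create a queue and register a consumer on it.
SELECT pgq.create_queue('replication_queue');
SELECT pgq.register_consumer('replication_queue', 'slave_1');

-- Producer side: events are inserted inside the producing transaction,
-- so they become visible to consumers in commit order.
SELECT pgq.insert_event('replication_queue', 'row_change', 'payload goes here');

-- Consumer side: fetch the next batch id, read its events, close the batch.
SELECT pgq.next_batch('replication_queue', 'slave_1');
SELECT * FROM pgq.get_batch_events(:batch_id);
SELECT pgq.finish_batch(:batch_id);
```

The batch-oriented consumer loop (next_batch / get_batch_events / finish_batch) is what gives pgQ its replayable, commit-ordered delivery, which is presumably why Alvaro muses about reusing it instead of Replicator's own queue.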
> I think the hairy part will be getting the order of commands _exactly_
> right (like Hot Standby again), but if you are similar to
> Slony/pgQ/Londiste in that you just transfer logical changes, not
> physical page-level changes, then the DDL locking on master may be
> enough to guarantee the right order. That is assuming that you already
> can guarantee the right (commit "time") order on slaves; this is not
> the same as transaction start order, which may give wrong/inconsistent
> data states.

Yeah, we transfer logical changes, not physical. We enqueue transaction data at commit time (in commit order). I think being able to modify the Postgres commit source code was the only reason we could do it.

--
Alvaro Herrera   http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
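Hannu's distinction between commit order and transaction-start order can be seen with two overlapping sessions (a generic Postgres illustration, not Replicator-specific; the table and values are invented):

```
-- Session A starts first but commits last:
BEGIN;
UPDATE accounts SET balance = 100 WHERE id = 1;

    -- Meanwhile, session B starts second but commits first:
    BEGIN;
    UPDATE accounts SET balance = 200 WHERE id = 2;
    COMMIT;   -- B's change becomes visible on the master first

COMMIT;       -- A's change becomes visible second

-- A slave replaying in start order (A before B) passes through an
-- intermediate state the master never exposed; replaying in commit
-- order (B before A) reproduces exactly the states readers on the
-- master could have seen.
```

Enqueueing at commit time, as Alvaro describes, makes the queue order and the commit order the same thing by construction.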