Thread: Integrating Replication into Core
Hi, [ moving to -hackers, that seems more appropriate. ] Jeff Davis wrote: > If there is some great replication solution that a lot of people need > and it will only work with a change to core, that change might make it > in. That's what I'm saying. Although it's hypothetical. > However, there may not be nifty syntax changes nor GUCs in core to > support a specific implementation of a replicator. I'd love to get into that one. Some of the people who have attended my talk at the summit might know that I've introduced the following syntax to Postgres-R: ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs; And I'm using the system catalogs to store replication settings. What's so wrong with that? Joshua D. Drake wrote: > There is definitely another reason though :). Adding a replication > solution that is integrated *will* increase development overhead in > terms of support. Sure. It's an additional feature after all. Refusing to add stuff to core because it increases development overhead certainly is a dead end. > Replication touches (a lot) of places. Yes, that's exactly why I'm going the integrated way with Postgres-R. :-) Regards Markus
Markus Schiltknecht wrote: > >However, there may not be nifty syntax changes nor GUCs in core to > >support a specific implementation of a replicator. > > I'd love to get into that one. Some of the people who have attended my > talk at the summit might know that I've introduced the following syntax > to Postgres-R: > > ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs; > > And I'm using the system catalogs to store replication settings. What's > so wrong with that? I don't know if there's anything wrong, but in Mammoth Replicator, the syntax to enable replication of a single table is ALTER TABLE foo ENABLE REPLICATION and we store the replication settings in system catalogs as well. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi, Alvaro Herrera wrote: > I don't know if there's anything wrong, but in Mammoth Replicator, the > syntax to enable replication of a single table is > > ALTER TABLE foo ENABLE REPLICATION > > and we store the replication settings in system catalogs as well. Oh, that's nice to know. Regards Markus
Alvaro Herrera wrote: > Markus Schiltknecht wrote: > > >>> However, there may not be nifty syntax changes nor GUCs in core to >>> support a specific implementation of a replicator. >>> >> I'd love to get into that one. Some of the people who have attended my >> talk at the summit might know that I've introduced the following syntax >> to Postgres-R: >> >> ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs; >> >> And I'm using the system catalogs to store replication settings. What's >> so wrong with that? >> > > I don't know if there's anything wrong, but in Mammoth Replicator, the > syntax to enable replication of a single table is > > ALTER TABLE foo ENABLE REPLICATION > > and we store the replication settings in system catalogs as well. > > Wasn't there supposed to be some discussion among replication authors to try to come up with at least some common hooks? If everybody invents their own grammar, GUC vars, etc. etc. it will be impossible to handle down the track. We'd be faced with a choice of never having any replication in core, or picking one and leaving the others out in the cold. This is supposed to be a *community*. cheers andrew
On 11/22/06, Andrew Dunstan <andrew@dunslane.net> wrote: > Wasn't there supposed to be some discussion among replication authors to > try to come up with at least some common hooks? That was my understanding as well. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
> Wasn't there supposed to be some discussion among replication authors to > try to come up with at least some common hooks? Well yes, but as far as I know that never happened, and we have been implementing the new version with the above syntax for a year and our GUC variables have been around for over 4 years. > > If everybody invents their own grammar, GUC vars, etc. etc. it will be > impossible to handle down the track. We'd be faced with a choice of > never having any replication in core, or picking one and leaving the > others out in the cold. This is supposed to be a *community*. Agreed. Joshua D. Drake > > cheers > > andrew > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Wed, 2006-11-22 at 19:23 +0100, Markus Schiltknecht wrote: > Hi, > > [ moving to -hackers, that seems more appropriate. ] > > Jeff Davis wrote: > > If there is some great replication solution that a lot of people need > > and it will only work with a change to core, that change might make it > > in. > > That's what I'm saying. Although it's hypothetical. > > > However, there may not be nifty syntax changes nor GUCs in core to > > support a specific implementation of a replicator. > > I'd love to get into that one. Some of the people who have attended my > talk at the summit might know that I've introduced the following syntax > to Postgres-R: > > ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs; > > And I'm using the system catalogs to store replication settings. What's > so wrong with that? > Nothing's wrong with that approach. My prediction, however, is that: (1) Similar replication solutions will first agree on some common hooks they need in the backend that may have no actual SQL syntax associated, and get patches in (2) then agree on some implementation details (3) then agree on the syntax To talk about getting syntax in the backend now seems like putting the cart before the horse, to me anyway. But there's nothing wrong with having SQL syntax for the replication. Regards, Jeff Davis
Hi, Andrew Dunstan wrote: > Wasn't there supposed to be some discussion among replication authors to > try to come up with at least some common hooks? Yes, Andrew Sullivan even opened a PgFoundry project and a mailing list. But up to now, only the GORDA project has proposed some hooks. For Postgres-R, I definitely don't want to settle for any hooks, yet, because I want to keep flexible. Hooks would only get into my way and serve no purpose. > If everybody invents their own grammar, GUC vars, etc. etc. it will be > impossible to handle down the track. Why is that? I can very well change all of the configuration stuff, I just don't see no use for that. > We'd be faced with a choice of > never having any replication in core, or picking one and leaving the > others out in the cold. ...or wait for *the one* superior set of hooks we never can come up with? Remember that the problem in replication is not interfacing with the database. That can and has been solved in multiple different ways. And interfaces can change (especially as long as they are still part of experimental software). Regards Markus
Hi, Joshua D. Drake wrote: > Well yes, but as far as I know that never happened, and we have been > implementing the new version with the above syntax for a year and our > GUC variables have been around for over 4 years. Sorry, new version of what? What GUC variables? Regards Markus
On Wed, 2006-11-22 at 19:23 +0100, Markus Schiltknecht wrote: > Jeff Davis wrote: > > If there is some great replication solution that a lot of people need > > and it will only work with a change to core, that change might make it > > in. > > That's what I'm saying. Although it's hypothetical. My interest is in extending Warm Standby [8.2] to include the following forms of replication: 1. asynchronous WAL-record level transfer to Standby server 2. synchronous WAL-record level transfer to Standby server My foresight includes that this would likely require some improvements in Group Commit, but I've not done the design for this *yet*. I would also like to include some performance optimisations into Core that are specifically aimed at improving Slony performance. (I'm more than happy if those things also increase performance of other situations). That's slightly different thing to embedding Slony in Core, which I am *not* suggesting. Suggestions welcome. This will then give PostgreSQL: - improved performance for the most popular production replication system for PostgreSQL (Slony) - a capability for Synchronous Replication, when it is requested That's the limit of my ambitions for 8.3. Personally, I won't be investing time in multi-master solutions for a host of reasons; please just regard that as a personal time allocation decision rather than a suggestion to prevent others from doing so. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Wed, 2006-11-22 at 20:23 +0100, Markus Schiltknecht wrote: > Hi, > > Joshua D. Drake wrote: > > Well yes, but as far as I know that never happened, and we have been > > implementing the new version with the above syntax for a year and our > > GUC variables have been around for over 4 years. > > Sorry, new version of what? What GUC variables? Our new version of replicator (1.7) has been in development for a year and that is the version that supports ALTER TABLE. The GUC variables we use have mostly been static for years. 1.7 has some clean up etc. Sincerely, Joshua D. Drake > > Regards > > Markus
Hi, Jeff Davis wrote: > Nothing's wrong with that approach. My prediction, however, is that: > > (1) Similar replication solutions will first agree on some common hooks > they need in the backend that may have no actual SQL syntax associated, > and get patches in Well, before that, you need to know what hooks you need. And that again involves lots of implementation details. Thus better first implement without hooks, otherwise you might later notice that there is something you didn't think of. > (2) then agree on some implementations details > (3) then agree on the syntax > > To talk about getting syntax in the backend now seems like putting the > cart before the horse, to me anyway. That was just an example. Postgres-R actually already does a lot behind the scenes if you type that command. So, yes, the horse definitely came before the cart. > But there's nothing wrong with > having SQL syntax for the replication. Okay. After reading the tsearch2 discussion I got another feeling, but that might just have been me. Regards Markus
Hi, Joshua D. Drake wrote: >> Joshua D. Drake wrote: >>> Well yes, but as far as I know that never happened, and we have been >>> implementing the new version with the above syntax for a year and our >>> GUC variables have been around for over 4 years. >> Sorry, new version of what? What GUC variables? > > Our new version of replicator (1.7) has been in development for a year > and that is the version that supports ALTER TABLE. > > The GUC variables we use have mostly been static for years. 1.7 has some > clean up etc. Aha. Well, could you name the places where you'd need hooks? Would you like to use hooks? What purpose would that serve you? Regards Markus
On Wed, 2006-11-22 at 20:35 +0100, Markus Schiltknecht wrote: > Hi, > > Joshua D. Drake wrote: > >> Joshua D. Drake wrote: > >>> Well yes, but as far as I know that never happened, and we have been > >>> implementing the new version with the above syntax for a year and our > >>> GUC variables have been around for over 4 years. > >> Sorry, new version of what? What GUC variables? > > > > Our new version of replicator (1.7) has been in development for a year > > and that is the version that supports ALTER TABLE. > > > > The GUC variables we use have mostly been static for years. 1.7 has some > > clean up etc. > > Aha. Well, could you name the places where you'd need hooks? Would you > like to use hooks? What purpose would that serve you? I would be the wrong person to ask; however, I can say that I don't see a need for the hooks. If we (the community) somehow created some reasonable generic interface, we would likely make use of it, but other than that, I am happy with how we are doing it. Sincerely, Joshua D. Drake
Markus Schiltknecht wrote: > >> But there's nothing wrong with >> having SQL syntax for the replication. > > Okay. After reading the tsearch2 discussion I got another feeling, but > that might just have been me. > The objection then was that about 8 or 9 new commands were proposed, and that a functional interface might be just as good. What sort of grammar support do you want? cheers andrew
Hi, Andrew Dunstan wrote: > What sort of grammar support do you want? Support? I would have just extended the bison gram.y myself. :-) I don't yet know what I will need. I'll probably have to add settings per database, some per table, others per transaction. I thought about some additions to existing ALTER DATABASE and ALTER TABLE commands as well as some SET variables, probably within the syntax of SET TRANSACTION... Stuffing them into such a syntax seems more consistent to me than using function calls. Regards Markus
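As a rough illustration of the "settings per database, per table, per transaction" idea in the message above, here is a minimal Python sketch of how such layered settings could be resolved. Nothing here comes from Postgres-R: the setting names (`replicate`, `mode`) and the precedence order (transaction over table over database) are assumptions made purely for the example.

```python
from collections import ChainMap
from typing import Any, Dict


def effective_settings(
    database: Dict[str, Any],
    table: Dict[str, Any],
    transaction: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge three hypothetical settings layers into one dict.

    Assumed precedence (not stated in the thread): a per-transaction
    value overrides a per-table value, which overrides a per-database
    value. ChainMap returns the value from the first mapping that
    contains the key, so the most specific layer is listed first.
    """
    return dict(ChainMap(transaction, table, database))
```

For example, a database-wide `{"replicate": True, "mode": "async"}` combined with a table-level `{"mode": "sync"}` and no transaction-level overrides would resolve to `replicate=True, mode="sync"` under this assumed precedence.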
On Wed, 2006-11-22 at 20:31 +0100, Markus Schiltknecht wrote: > Hi, > > Jeff Davis wrote: > > Nothing's wrong with that approach. My prediction, however, is that: > > > > (1) Similar replication solutions will first agree on some common hooks > > they need in the backend that may have no actual SQL syntax associated, > > and get patches in > > Well, before that, you need to know what hooks you need. And that again > involves lots of implementation details. Thus better first implement > without hooks, otherwise you might later notice that there is something > you didn't think of. I think you misunderstand my point. I was talking about replication implementations that already exist. They already have patches on the backend that are necessary for their solution to work. The idea is to design a single set of hooks that can be used to implement an entire class of replication. This only makes sense after existing solutions come to some agreement. I view that as a first step, assuming that it is necessary to alter the core in order to implement the class of replication in question. Once that step is complete, ideally you'd be able to implement Postgres-R without having to patch the postgresql backend to accomplish it (except for maybe adding the syntax for your solution). Then, when a syntax is agreed upon, you won't need to patch the backend at all. Isn't that the goal, to be able to implement your replication without patching the backend? Regards, Jeff Davis
Andrew Dunstan wrote: > Wasn't there supposed to be some discussion among replication authors to > try to come up with at least some common hooks? > > If everybody invents their own grammar, GUC vars, etc. etc. it will be > impossible to handle down the track. We'd be faced with a choice of > never having any replication in core, or picking one and leaving the > others out in the cold. This is supposed to be a *community*. I don't have the expectation that Mammoth Replicator will ever be open-sourced (this is my personal opinion; the company owner may differ). And even if it were, I doubt it would serve as a basis for whatever community effort to build a replication engine. I don't think it's in anybody's best interest to base design decisions on Mammoth Replicator "experience". The projects that are already open source are in a much better standing for that (GORDA, Postgres-R, Slony, etc). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Hi, [ copying back to -hackers. ] Andrew Dunstan wrote: > You have totally misunderstood me. I mean, what sort of grammar changes > do you need. Oh, and *you* might submit changes, but *we* have to > support them Agreed, as long as *we* includes me. I'm not reading it like that, but I don't know how you meant it. > if they go into the core. Sure, if... I think the group of developers working on PostgreSQL can be extended, by accepting patches, in the hope that the original authors keep supporting it - especially because it's easy to revert patches again. Not accepting extensions because the current group thinks they can't support it won't help in attracting more developers and enlarging that group. I'm taking Postgres-R along since 7.4, for example. You can be very sure that I won't drop it because it got into core. (Bad example, because this is not going to happen anytime soon, if at all.) Regards Markus
Hi, Jeff Davis wrote: > I think you misunderstand my point. That may well be. Please keep in mind that I'm not a native English speaker, thus please speak loud and clear ;-) > I was talking about replication > implementations that already exist. They already have patches on the > backend that are necessary for their solution to work. Do they? I'm only aware of the GORDA patch. The old Postgres-R patches are out of date. Sequoia, PgPool and PgPool-II obviously do not need patches. Slony-II, Postgres-R (8) (mine) as well as PGCluster-II are not open sourced, yet. And I haven't heard much regarding hooks from any of the proprietary vendors (except Joshua's recent statement that he's happy without such hooks). > The idea is to design a single set of hooks that can be used to > implement an entire class of replication. This only makes sense after > existing solutions come to some agreement. I view that as a first step, > assuming that it is necessary to alter the core in order to implement > the class of replication in question. As there's not even *one* existing and open replication solution which needs patching the backend, you are basing your statements on a false premise. Thus, speaking of hooks as a "first step" is very confusing, at least. > Once that step is complete, ideally you'd be able to implement Postgres- > R without having to patch the postgresql backend to accomplish it > (except for maybe adding the syntax for your solution). Then, when a > syntax is agreed upon, you won't need to patch the backend at all. Isn't > that the goal, to be able to implement your replication without patching > the backend? No, it's not. What would that buy me? My goal is to write a widely usable replication system. How that interacts with the backend is of much less importance to me. And currently fiddling with the backend is much easier than maintaining hooks and keep all the replication stuff separate. 
Postgres-R can be one of the solutions used to decide what hooks we need. Waiting for hooks to establish before implementing Postgres-R would be what you call 'putting the cart before the horse'. Regards Markus
On Wednesday 22 November 2006 7:21 pm, Markus Schiltknecht wrote: > > Yes, Andrew Sullivan even opened a PgFoundry project and a mailing list. > But up to now, only the GORDA project has proposed some hooks. > > For Postgres-R, I definitely don't want to settle for any hooks, yet, > because I want to keep flexible. Hooks would only get into my way and > serve no purpose. > > > If everybody invents their own grammar, GUC vars, etc. etc. it will be > > impossible to handle down the track. > > Why is that? I can very well change all of the configuration stuff, I > just don't see no use for that. Indeed, we in GORDA have also come up with yet another set of changes to grammar and GUC variables. This is not the ideal scenario. :-( I understand that different people have different motives not to agree with the GORDA-style hook based approach. Therefore, I suggest that we try to agree on small but sure steps. The worst outcome of this would be that we all end up with smaller patches to maintain... The configuration stuff seems to be a good place to start. What about each of us summarizing their changes to grammar and GUC to the hooks list to get the discussion started on a solid ground? BTW, we have released a new version of the GORDA platform. This version has been completely rewritten (reusing some code from PL-J) and has a lot more functionality. The announcement will follow shortly. -- Jose Orlando Pereira
Hello Jose, José Orlando Pereira wrote: > Indeed, we in GORDA have also came up with yet another set of changes to > grammar and GUC variables. This is not the ideal scenario. :-( > > I understand that different people have different motives not to agree with > the GORDA-style hook based approach. Therefore, I suggest that we try to > agree on small but sure steps. I appreciate your efforts to come up with hooks. But as I've already stated, I'm not ready to settle down for concrete hooks for Postgres-R (8), so I probably can't help. > The worst outcome of this would be that we all > end up with smaller patches to maintain... Do you really maintain patches? I'm maintaining a source tree and I'd like to keep it that way, as of now. I'd better like to work together in other areas, for example, what do you use for testing? I've read that the Sequoia people use their home-grown (and closed source) test suite. I'm about to write the third generation of my own test suite... For simulations, I'm using qemu, sometimes also trying Xen, but that does not run on my laptop. :-( Perhaps we can share test suites, or even automated benchmarks? IMO, we would gain a whole lot more with that than with hooks. Regards Markus
On Thursday 23 November 2006 11:46 am, Markus Schiltknecht wrote: > I appreciate your efforts to come up with hooks. Thank you. :-) > But as I've already > stated, I'm not ready to settle down for concrete hooks for Postgres-R > (8), so I probably can't help. Sure, I know that you don't like hooks. I just suggested that we should compare *interfaces* to configure replication (i.e. variable names, grammar, etc), since it looks like we have a bunch of different syntaxes to achieve the same. It might turn out that there is no common ground, but it is worth trying it. > I'd better like to work together in other areas, for example, what do > you use for testing? I've read that the Sequoia people use their > home-grown (and closed source) test suite. I'm about to write the third > generation of my own test suite... It is somewhat difficult to share a test-suite if we have to maintain multiple versions of the code that sets up the replicated db. See the point? ;) > > The worst outcome of this would be that we all > > end up with smaller patches to maintain... > > Do you really maintain patches? I'm maintaining a source tree and I'd > like to keep it that way, as of now. We do maintain a patch, as you do, unless you have forked from mainline for good. Using a good revision control system helps (we use Canonical's Bazaar, BTW), but does not fundamentally change the problem. The smaller the diff, the better. -- Jose Orlando Pereira
> The idea is to design a single set of hooks that can be used to
> implement an entire class of replication. This only makes sense after
> existing solutions come to some agreement. I view that as a first step,
> assuming that it is necessary to alter the core in order to implement
> the class of replication in question.
>
> Once that step is complete, ideally you'd be able to implement Postgres-
> R without having to patch the postgresql backend to accomplish it
> (except for maybe adding the syntax for your solution). Then, when a
> syntax is agreed upon, you won't need to patch the backend at all. Isn't
> that the goal, to be able to implement your replication without patching
> the backend?

We should go in that direction.

In a database life cycle, there are different events that may be useful for different replication solutions. For instance, we may say:

- database startup and shutdown
- connection startup and shutdown
- transaction begin, commit, rollback
- statement request
- updates (i.e., insert, delete, update)
- logging

First, we should agree on which events we need to support a set of replication protocols (e.g., gorda, postgres-r, slony-i and ii, etc).

Then, we should decide how such events will be notified. In particular, the gorda project decided to use "special triggers", but any sort of callback would be great for us. We adopted these hooks because we thought that they would be useful to different applications (e.g., materialized views).

Third, we should discuss what interface would be provided to inject information into remote replicas. Is the SPI_* interface good? How to inject binary data into tables? I know that PostgreSQL allows that. But is the interface provided enough? Would it not be interesting to inject things directly into the log?

Fourth, we should have a discussion on locks, high priority transactions, notifications on blocking, etc...

And finally, we may be able to discuss meta information, syntax, etc...

Regards,

Alfranio Junior.
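To make the road map above a little more concrete, here is a minimal Python sketch of a callback registry keyed by the life-cycle events the message lists. It is only an illustration of the "any sort of callback" idea: the class `HookRegistry`, its methods, and the event identifiers are hypothetical and do not correspond to any GORDA, Postgres-R, or PostgreSQL interface.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Event names derived from the list in the message; the spelling of the
# identifiers is an assumption made for this sketch.
EVENTS = (
    "database_startup", "database_shutdown",
    "connection_startup", "connection_shutdown",
    "transaction_begin", "transaction_commit", "transaction_rollback",
    "statement_request",
    "update",   # insert, delete, update
    "logging",
)

Callback = Callable[[Dict], None]


class HookRegistry:
    """Hypothetical registry mapping life-cycle events to callbacks."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def _check(self, event: str) -> None:
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")

    def register(self, event: str, callback: Callback) -> None:
        """Attach a callback to one of the known events."""
        self._check(event)
        self._callbacks[event].append(callback)

    def fire(self, event: str, payload: Dict) -> int:
        """Invoke callbacks in registration order; return how many ran."""
        self._check(event)
        callbacks = self._callbacks[event]
        for callback in callbacks:
            callback(payload)
        return len(callbacks)
```

A replication layer could, under these assumptions, register a function on `transaction_commit` to ship a write set, while an unrelated consumer such as a materialized-view refresher registers on `update`; neither would need to know about the other.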
Hi,

[ I suggest moving from hackers to replica-hooks-discuss@pgfoundry.org, as that's what that list has been created for. ]

José Orlando Pereira wrote:
> Sure, I know that you don't like hooks.

Yes, but that's yet another story. ;-)

> I just suggested that we should compare *interfaces* to configure replication
> (i.e. variable names, grammar, etc), since it looks like we have a bunch of
> different syntaxes to achieve the same.

The same?

Let's see. I currently have these additional commands:

ALTER DATABASE testdb START REPLICATION IN GROUP testgroup USING egcs;

and

ALTER DATABASE testdb ACCEPT REPLICATION FROM GROUP testgroup USING egcs;

I've added a system table pg_replication_gcs to describe the different group communication systems and connections to them:

      Table "pg_catalog.pg_replication_gcs"
  Column  |  Type   | Modifiers
----------+---------+-----------
 rgcsname | name    | not null
 rgcstype | integer | not null
 rgcsport | integer | not null
 rgcssock | text    |

(Splitting into rgcsport and rgcssock proved to be not very helpful.)

And I've added two fields to pg_database to define the GCS and the group in which to replicate a database:

 ..
 datreplgcs | oid     | not null
 ..
 datreplgrp | text    |

But as I said: these might change any time. And I certainly will have to add others, but no idea what those additions will look like.

When comparing to the Mammoth Replicator syntax that Alvaro posted, this seems very different. PGCluster-II does not use a GCS at all. And I haven't seen others.

> It is somewhat difficult to share a test-suite if we have to maintain multiple
> versions of the code that sets up the replicated db.

Well, we wouldn't have to share test cases, but at least the *suite*. All the code which starts and stops postmasters, does initdb etc..

Probably that's just me, but I'm not aware of any (OSS) project which can emulate a network (or even a GCS), start and stop processes as requested and check how they react upon different inputs.

If you know such a thing, please email me! (I've looked at STAF, but that seems overly complex and targeted at a completely different use case.)

> See the point? ;)

Sure, but it's wishful thinking.

> We do maintain a patch, as you do, unless you have forked from mainline for
> good. Using a good revision control system helps (we use Canonical's Bazaar,
> BTW), but does not fundamentally change the problem.

I'm using monotone. And I don't need much time to fiddle with patches. A simple 'mtn diff -r ${TRUNK_REVISION}' does all I need. That's why I'd still say that I don't maintain a patch.

> The smaller the diff, the better.

I disagree. Where exactly does size of the patch matter for you?

The number you mean, which is important, is the number of points in the code where you need to interact with the database, i.e. the number of hooks you would need. Because as PostgreSQL moves along, changes at these points are probably necessary. But that number certainly has nothing to do with the patch size.

Regards

Markus
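The shared *suite* code described above (starting and stopping processes on request) might look roughly like the following Python sketch. It is an assumption-laden illustration, not anyone's actual test suite: `ProcessSuite` and `demo` are hypothetical names, and a sleeping Python interpreter stands in for a postmaster so that the example does not depend on a PostgreSQL installation.

```python
import subprocess
import sys
from typing import Dict, List


class ProcessSuite:
    """Hypothetical helper that starts and stops named test processes."""

    def __init__(self) -> None:
        self._procs: Dict[str, subprocess.Popen] = {}

    def start(self, name: str, argv: List[str]) -> None:
        """Launch a process under a unique name."""
        if name in self._procs:
            raise ValueError(f"process already started: {name}")
        self._procs[name] = subprocess.Popen(argv)

    def is_running(self, name: str) -> bool:
        """True while the named process has not exited."""
        return self._procs[name].poll() is None

    def stop(self, name: str, timeout: float = 5.0) -> int:
        """Terminate politely, kill after the timeout; return exit code."""
        proc = self._procs.pop(name)
        proc.terminate()
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()

    def stop_all(self) -> None:
        """Stop every process that is still registered."""
        for name in list(self._procs):
            self.stop(name)


def demo() -> bool:
    """Start a stand-in 'node' (a sleeping interpreter), then stop it."""
    suite = ProcessSuite()
    suite.start("node1", [sys.executable, "-c", "import time; time.sleep(30)"])
    was_running = suite.is_running("node1")
    suite.stop_all()
    return was_running
```

In a real suite the `argv` passed to `start` would be whatever command launches a database node; the point of the sketch is only that process control can be shared even when the replication-specific setup differs.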
Hi Markus, On Thursday 23 November 2006 12:46, Markus Schiltknecht wrote: > For simulations, I'm using qemu, sometimes also trying Xen, but that > does not run on my laptop. :-( So you still only have your laptop as 'development facility'? At dalibo we have a couple of machines we're not using anymore, partly because we don't have a need for them nowadays, mainly because it's end-of-life hardware, not that trusty. It's some dual Pentium III, 2*700MHz, 1GB RAM, 2 IDE disks (20GB system and either 20GB or 120GB data), and two 100Mbps network cards per machine. Direct link should be possible to set up. We can provide you access to those two servers for you to test postgres-r if you want to, Regards, -- Dimitri Fontaine http://www.dalibo.com/
Hello Dimitri, Dimitri Fontaine wrote: > So you still only have your laptop as 'development facility'? Yes. > At dalibo we have a couple of machines we're not using anymore, partly because > we don't have a need for them nowadays, mainly because it's end-of-life > hardware, not that trusty. > > It's some dual Pentium III, 2*700MHz, 1GB RAM, 2 IDE disks (20GB system and > either 20GB or 120GB data), and two 100Mbps network cards per machine. > Direct link should be possible to set up. Thank you very much. But I think two machines is not quite enough. :-( Having a whole cluster emulated on my laptop allows me to work on the road. That's a very nice thing (tm). And the emulated machines are probably already faster than PIIIs... (Memory is the limiting factor, unfortunately I can't stuff more than 2GB in my laptop.) Regards Markus
Markus Schiltknecht wrote: > Probably that's just me, but I'm not aware of any (OSS) project which > can emulate a network (or even a GCS), start and stop processes as > requested and check how they react upon different inputs. I've worked on an emulated test rig for a replication system (not RDBMS but for LDAP). We used netem (OSS) for the network emulation and a pile of python and shell scripts and C client test apps. Testing replication is hard, of course, and you have to roll most of it yourself :( > If you know such a thing, please email me! (I've looked at STAF, but > that seems overly complex and targeted at completely different use-case.) In my experience test frameworks tend to provide less useful functionality than one might hope. Sometimes to the point that they're hardly worth bothering with at all.
Hi, David Boreham wrote: > I've worked on an emulated test rig for a replication system (not RDBMS > but for LDAP). > We used netem (OSS) Thanks. I've already heard about that one some while ago, but didn't remember it. I'll have another look. > for the network emulation and a pile of python and > shell scripts and > C client test apps. > Testing replication is hard, of course, and you have to roll most of it > yourself :( Yeah, I'm also using python for that. >> If you know such a thing, please email me! (I've looked at STAF, but >> that seems overly complex and targeted at completely different use-case.) > > In my experience test frameworks tend to provide less useful > functionality than one might hope. > Sometimes to the point that they're hardly worth bothering with at all. ACK. Same experience here. Regards Markus
Hi, David Boreham wrote: > We used netem (OSS) for the network emulation and a pile of python and > shell scripts and LOL, I've just figured that netem is the project behind: tc qdisc ... netem ... I'm already using that, too ;-) Just wasn't aware it's called netem. Sounds silly, since the name is in the command line, I know... Regards Markus
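Since the thread mentions driving netem through `tc qdisc ... netem ...` from Python test scripts, here is a small hedged sketch of how such a command line could be assembled. The helper name `netem_command` and its parameters are hypothetical; the sketch only builds the argument list for the standard `tc qdisc add|change dev <dev> root netem delay <N>ms loss <P>%` form and does not run it (doing so typically requires root on Linux).

```python
from typing import List, Optional


def netem_command(
    dev: str,
    delay_ms: Optional[int] = None,
    loss_pct: Optional[float] = None,
    action: str = "add",
) -> List[str]:
    """Build an argv list for a netem qdisc on the root of `dev`.

    Only the 'add' and 'change' actions and the 'delay' and 'loss'
    options are covered; netem supports more than this sketch does.
    """
    if action not in ("add", "change"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["tc", "qdisc", action, "dev", dev, "root", "netem"]
    if delay_ms is not None:
        argv += ["delay", f"{delay_ms}ms"]
    if loss_pct is not None:
        # ':g' avoids trailing zeros, so 1 becomes "1%" and 0.5 "0.5%".
        argv += ["loss", f"{loss_pct:g}%"]
    return argv
```

A test script could pass the returned list to `subprocess.run` to add latency or packet loss between emulated nodes, and later remove the qdisc again with `tc qdisc del dev <dev> root`.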
>> I just suggested that we should compare *interfaces* to configure
>> replication (i.e. variable names, grammar, etc), since it looks like
>> we have a bunch of different syntaxes to achieve the same.
>
> The same?
>
> Let's see. I currently have these additional commands:
>
> ALTER DATABASE testdb START REPLICATION
> IN GROUP testgroup USING egcs;
>
> and
>
> ALTER DATABASE testdb ACCEPT REPLICATION
> FROM GROUP testgroup USING egcs;
>

We have the following commands:

SET TRANSACTION MASTER

and

CREATE TRIGGER <name> for { STARTUP | SHUTDOWN |
BEGIN TRANSACTION | COMMIT TRANSACTION | ROLLBACK TRANSACTION }
execute procedure <func> ( <funcargs> )

It is worth noting that none of them refers to replication. Metainformation on replication is stored in normal tables.

I think that we should discuss requirements first instead of going towards syntax. The latter is the last step to achieve a common set of ideas. I suggest the following road map.

In a database life cycle, there are different events that may be useful for different replication solutions. For instance, we may say:

- database startup and shutdown
- connection startup and shutdown
- transaction begin, commit, rollback
- statement request
- updates (i.e., insert, delete, update)
- logging

First, we should agree on which events we need to support a set of replication protocols (e.g., gorda, postgres-r, slony-i and ii, etc).

Then, we should decide how such events will be notified. In particular, the gorda project decided to use "special triggers" but any sort of callback would be great for us. We adopted these hooks because we thought that they would be useful to different applications (e.g., materialized views).

Third, we should discuss what interface would be provided to inject information into remote replicas. Is the SPI_* interface good? How to inject binary data into tables? I know that PostgreSQL allows that. But is the interface provided enough? Would it not be interesting to inject things directly into the log?

Fourth, we should have a discussion on locks, high priority transactions, notifications on blocking, etc...

And finally, we may be able to discuss meta information, syntax, etc...

What do you think?
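The first two steps of this road map (agree on a set of events, then decide how they are notified) can be pictured as a small callback registry. What follows is only a sketch in Python, not the GORDA trigger interface or anything that exists in PostgreSQL: the `Event` members mirror some of the events listed in the mail, while `HookRegistry`, `register`, and `fire` are hypothetical names invented for illustration.

```python
from collections import defaultdict
from enum import Enum, auto


class Event(Enum):
    """Life-cycle events named in the mail (member names are illustrative)."""
    DATABASE_STARTUP = auto()
    DATABASE_SHUTDOWN = auto()
    TRANSACTION_BEGIN = auto()
    TRANSACTION_COMMIT = auto()
    TRANSACTION_ROLLBACK = auto()


class HookRegistry:
    """Hypothetical registry mapping each event to an ordered callback list."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event, callback):
        self._callbacks[event].append(callback)

    def fire(self, event, **info):
        # Call every callback registered for this event, in registration order.
        return [cb(event, **info) for cb in self._callbacks[event]]


registry = HookRegistry()
log = []
registry.register(Event.TRANSACTION_COMMIT,
                  lambda event, **info: log.append((event.name, info["xid"])))
registry.fire(Event.TRANSACTION_BEGIN, xid=42)   # nothing registered: no effect
registry.fire(Event.TRANSACTION_COMMIT, xid=42)  # the commit callback runs
```

A replication protocol would subscribe only to the events it cares about, which is why agreeing on the event list comes before any discussion of syntax.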
Hi, alfranio correia junior wrote: > We have the following commands: > > SET TRANSACTION MASTER > > and > > CREATE TRIGGER <name> for { STARTUP | SHUTDOWN | > BEGIN TRANSACTION | COMMIT TRANSACTION | ROLLBACK TRANSACTION } > execute procedure <func> ( <funcargs> ) Okay. > I think that we should discuss requirements first instead of going > towards syntax. The latter is the last step to achieve a common > set of ideas. I still maintain the point that I want to check requirements first. For that I need a working prototype. And I'm easy with prototyping in C in the backend code. If there's really a requirement for hooks, I can add them and decouple from PostgreSQL source code later on. What do you currently base your hooks on? IMO it's just naive to expect to be able to define hooks now, especially hooks as general as you seem to be heading to (I've read about sync and async multi master replication, single master replication as well as materialized views). Another point: modularization is nice and well, where appropriate. But here I don't see how it could help the user. Or do you expect users to plug in and out replication solutions like USB sticks? I think most users want to have *one* replication solution that works. Out of the box. Maybe they want one which can do sync as well as async replication, sure. But hooks don't give you that, nor do they make it any easier. I agree that it's helpful to modularize it in code. But you don't need hooks for that. I know I'm probably somewhat alone with that point of view. Regards Markus
Hi !!! > I still maintain the point that I want to check requirements first. For > that I need a working prototype. And I'm easy with prototyping in C in > the backend code. If there's really a requirement for hooks, I can add > them and decouple from PostgreSQL source code later on. I agree with you. You should build prototypes and try things in order to figure out exactly what we need. However, based on the experience that you already have in developing such prototypes, there are most likely different features that you would like to see in PostgreSQL. What are they? > What do you currently base your hooks on? IMO it's just naive to expect > to be able to define hooks now, especially hooks as general as you seem > to be heading to (I've read about sync and async multi master > replication, single master replication as well as materialized views). You have "prototypes" built upon such hooks: sync and async, single master and multi master. However, I am not arguing that hooks are the solution to any problem. But they work for the limited view that we have on the subject. > Another point: modularization is nice and well, where appropriate. But > here I don't see how it could help the user. Or do you expect users to > plug in and out replication solutions like USB sticks? I think most > users want to have *one* replication solution that works. Out of the > box. Maybe they want one which can do sync as well as async replication, > sure. But hooks don't give you that, nor do they make it any easier. I don't expect that. But I would like to test different replication protocols without patching PostgreSQL. And I believe that we might come up with a set of in-core features that would enable this. Regards, Alfranio.
Markus Schiltknecht wrote: > Another point: modularization is nice and well, where appropriate. But > here I don't see how it could help the user. Or do you expect users to > plug in and out replication solutions like USB sticks? I think most > users want to have *one* replication solution that works. Out of the > box. Maybe they want one which can do sync as well as async replication, > sure. But hooks don't give you that, nor do they make it any easier. I, as a mostly-user, fully subscribe to that point of view. IMHO one of the biggest mistakes MySQL made was those "pluggable storage managers". While all those different storage managers (innodb, bdb, myisam, ...) _look_ interchangeable from an interface point of view (You just specify which one to use when creating the table, right?), they all have _different_ semantics. Just forget to write "with innodb" in _one_ of your table definitions, and transaction isolation goes out of the window :-(. I understand that different use cases need different replication solutions - but I think "Hey, let's just make them plugins" is not the way to go. It would work if all replication solutions had _exactly_ the same semantics - but if they do, then what is the point of all the different solutions anyway? Just my 2 eurocents... Greetings, Florian Pflug
On Wed, Nov 22, 2006 at 01:58:34PM -0500, Andrew Dunstan wrote: > Wasn't there supposed to be some discussion among replication authors to > try to come up with at least some common hooks? That was what I was aiming at, yes. http://pgfoundry.org/projects/replica-hooks/ A -- Andrew Sullivan | ajs@crankycanuck.ca Unfortunately reformatting the Internet is a little more painful than reformatting your hard drive when it gets out of whack. --Scott Morris
On Wed, Nov 22, 2006 at 08:21:23PM +0100, Markus Schiltknecht wrote: > > For Postgres-R, I definitely don't want to settle for any hooks, yet, > because I want to keep flexible. Hooks would only get into my way and > serve no purpose. Let me make the following argument to the contrary. This is a rationale argument for the other discussion, and not a discussion of the hooks themselves, so I think it's still appropriate for -hackers. The reason to write down what the _requirements_ are for hooks is so that the community can get to work on any of the general approaches to replication that they want. These hooks might, in fact, turn out to be nothing more than a layer of indirection in the core PostgreSQL code. The reason the earlier attempts at Postgres-R didn't ever make it out of testing was precisely, I argue, because there just wasn't an interface for the rest of the PostgreSQL project (maybe not interested in replication) to keep stable. So merely keeping up with the pace of change in the core code turned into a significant undertaking. Those are cycles stolen from the more useful work of making the replication code work better. The same thing is true of other pieces that have fallen by the side: because the whole of the PostgreSQL project moves so quickly, a small number of people working on a large feature set in relative isolation can end up spending way too much time keeping up with the core, and not enough time working on the features they desire. The result is a loss to everyone. So that's why I was trying to outline what, at least, the requirements are. A -- Andrew Sullivan | ajs@crankycanuck.ca Users never remark, "Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath." --Damien Katz
I'm responding with a short answer here. But more of this sort of discussion would really help our meta discussion on what the problem is we're trying to solve. I'm trying to host that on the other list just on the grounds that -hackers has enough traffic about _actual_ features without cluttering it with discussion of wishlist items that nobody is yet committed to do the work on. On Fri, Nov 24, 2006 at 04:21:11PM +0100, Florian G. Pflug wrote: > managers". While all those different storage managers (innodb, bdb, > myisam, ...) _look_ interchangeable from an interface point of view > (You just specify which one to use when creating the table, right?), > they all have _different_ semantics. Yes. But one way MySQL could have done that right was to identify in their core that they needed an idea of storage management state. Then BEGIN; INSERT INTO innodb_table; UPDATE myisam_table; COMMIT; would fail in the way the ACID gods intended. But that, of course, would have required writing down in advance how these things should work. Which is what I'm proposing to do. A -- Andrew Sullivan | ajs@crankycanuck.ca In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland
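The "storage management state" idea can be sketched in a few lines: the transaction remembers whether the first table it wrote to is transactional and rejects any later write that would mix the two kinds. This is an illustrative Python model only, not how MySQL or PostgreSQL behaves; `Transaction`, `MixedEngineError`, and the `TRANSACTIONAL` capability table are hypothetical names, and the only facts assumed are that InnoDB is transactional and MyISAM is not.

```python
class MixedEngineError(Exception):
    """Raised when one transaction touches incompatible storage engines."""


# Assumed capability table: what the core would need to know in advance.
TRANSACTIONAL = {"innodb": True, "myisam": False}


class Transaction:
    def __init__(self, table_engines):
        self._table_engines = table_engines  # table name -> engine name
        self._seen = None                    # transactional flag of first write

    def write(self, table):
        flag = TRANSACTIONAL[self._table_engines[table]]
        if self._seen is None:
            self._seen = flag
        elif flag != self._seen:
            raise MixedEngineError(
                f"{table!r} cannot join this transaction: "
                "it mixes transactional and non-transactional tables")


txn = Transaction({"innodb_table": "innodb", "myisam_table": "myisam"})
txn.write("innodb_table")          # INSERT INTO innodb_table
try:
    txn.write("myisam_table")      # UPDATE myisam_table
    outcome = "committed"
except MixedEngineError:
    outcome = "rejected"
```

The point of the sketch is that the check is only possible once the core has a written-down notion of what each pluggable component guarantees.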
> The reason the earlier attempts at Postgres-R didn't ever make it out > of testing was precisely, I argue, because there just wasn't an > interface for the rest of the PostgreSQL project (maybe not > interested in replication) to keep stable. So merely keeping up with > the pace of change in the core code turned into a significant > undertaking. Those are cycles stolen from the more useful work of > making the replication code work better. Actually I don't buy this argument. The only major change in *postgresql* that has slowed down Replicator is the move from users/groups to roles. We added a feature in the internal 1.6 release to replicate users/groups. We are currently behind because of things that have really nothing to do with PostgreSQL and more to do with reworking an evolutionary code base to be more manageable. I don't know much (anything) about Postgres-R but my guess is that the only major change that would have affected that project in recent years would have been two phase commit and that is only if they chose to take advantage of it. Sincerely, Joshua D. Drake
On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote: > Actually I don't buy this argument. The only major change in Ok, good. So why isn't Postgres-R something we have _now_? The work that I've seen on it, so far (and I speak as someone who invested a significant amount of staff time, cash money, and -- frankly -- "political" credibility in software based on that idea) is that there isn't a way to make it production-grade without pretty severe constraints on what it can do. It was that unhappy discovery that led me to say, "Can we please _write down_ what we think 'replication' might require, and what the trade-offs can be?" I'm trying to write requirements in public here; but all I get is silence. This frustrates me partly because, as someone who stuck his neck out to make sure Slony was released as free software, I hear a lot of demands for features people apparently want without much in the way of design proposals -- never mind code -- to achieve those features. When Jan delivered the initial release of Slony, it was preceded by a design doc. I note on -hackers long emails from (for example) Tom doing something very similar when proposing a major feature. What I'm trying to do is to get the replication-interested community of PostgreSQL users to say "here's what we mean by 'replication'" before we all go off inventing the grammar. We need to have a clue about the domain of discourse before we start settling the variable assignments. It seems to me that every single replication discussion on -hackers amounts to a bunch of futile attempts by colour blind people (of which I am one) to describe the colour 'high note', while their interlocutors describe the sound 'red'. I'm trying to get us to say what it would mean even to do the describing. Specifying requirements for what software is supposed to do is one of those thankless tasks that everyone complains is never done in the free software community. I am offering, earnestly, to do that. 
I just need a few people to tell me what _they think_ the software in question ought to do. I set up a mailing list. I have solicited comments. I'm not sure what else to do, but so far, I have the positive remarks of Jose (GORDA), the remarks of Markus (which amount to "this is a waste of time", unless I misread him), and nothing else. Surely, in a community that spends time on the topic of whether replication "should be in the back end", we oughta be able to come up with 10 or so people who are willing to say what "being in the back end" would mean. At the moment, this trivial goal is all I'm aiming for. A -- Andrew Sullivan | ajs@crankycanuck.ca When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes
Andrew Sullivan wrote: > On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote: >> Actually I don't buy this argument. The only major change in > > Ok, good. So why isn't Postgres-R something we have _now_? That's a good question and as I mentioned, I don't know much about Postgres-R. My point was directly to the argument that a fast moving PostgreSQL somehow limits the ability for replication to be built. That argument, I believe, is false. I originally responded to the rest of your email but thought better of it. The only thing I can say is, my experience is that something like replication will only be productively completed outside the community. Jan, for the most part, created his own community with Slony. Postgres-R is doing the same, as are the others such as pgPool. The fact that they are all their own communities, not to mention several closed source products (Replicator, Unicluster), pretty much sets the whole thing up to fail IMHO. Otherwise you are just herding cats. Joshua D. Drake
Markus Schiltknecht wrote: > LOL, I've just figured that netem is the project behind: > > tc qdisc ... netem ... > > I'm already using that, too ;-) Just wasn't aware it's called netem. > Sounds silly, since the name is in the command line, I know... Heh. AFAIK netem is the tc stuff that isn't much use for production router use (e.g. introduce a 10ms packet delay on this kind of traffic...). We used a mixture of netem and regular tc kernel modules, in a Linux box that had 6 NICs, with Python driving it. Each replication node test machine was connected with a straight-through patch cable to one of the NICs on the 'spider' machine. The Python could set up the netem/tc on the router such that various test scenarios with different bandwidth/delay values were implemented. Also of course loss of connectivity by dropping all packets on an interface. Each test machine had two NICs - the second one being used to communicate with it out of band from the replication traffic and network emulation. Then on top of all this the actual replication tests were run. One of the things we were interested in was replication throughput vs network latency, so we also measured performance and made acceptable performance a test pass condition. If you want really fancy network emulation you'd need to use nistnet. It can do some things that are not possible with netem (statistical packet drop for example). However IMHO this is only appropriate for testing TCP/IP stack implementation. Varying latency, throughput, and introducing connectivity outages is good enough for user mode code I believe. Nistnet is not in the stock kernel, whereas netem is.
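A rough idea of what "Python driving it" can look like: the sketch below only builds `tc qdisc ... netem` command lines for a per-interface scenario and does not run them. It is not the rig described in the mail. The helper names `netem_command` and `outage_command` and the interface names `eth1`/`eth2` are hypothetical; the `delay` and `loss` keywords are standard netem options, and actually applying the commands would require root privileges on a Linux machine with iproute2.

```python
def netem_command(dev, delay_ms=None, loss_pct=None):
    """Build a `tc qdisc replace ... netem` command line for one interface."""
    cmd = ["tc", "qdisc", "replace", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def outage_command(dev):
    """Simulate loss of connectivity by dropping every packet."""
    return netem_command(dev, loss_pct=100)


# Hypothetical scenario: eth1 faces a node behind a slow link, eth2 is cut off.
scenario = [netem_command("eth1", delay_ms=50), outage_command("eth2")]
lines = [" ".join(cmd) for cmd in scenario]
```

Each list in `scenario` could be handed to `subprocess.run` on the router machine; keeping command construction separate from execution makes the scenarios themselves testable without root.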
Have you looked at the new HA/load balancing section of the docs? http://developer.postgresql.org/pgdocs/postgres/high-availability.html I got a lot of feedback on that. Perhaps it can be a starting point for you. --------------------------------------------------------------------------- Andrew Sullivan wrote: > On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote: > > Actually I don't buy this argument. The only major change in > > Ok, good. So why isn't Postgres-R something we have _now_? The work > that I've seen on it, so far (and I speak as someone who invested a > significant amount of staff time, cash money, and -- frankly -- > "political" credibility in software based on that idea) is that there > isn't a way to make it production-grade without pretty severe > constraints on what it can do. > > It was that unhappy discovery that led me to say, "Can we please > _write down_ what we think 'replication' might require, and what the > trade-offs can be?" I'm trying to write requirements in public here; > but all I get is silence. This frustrates me partly because, as > someone who stuck his neck out to make sure Slony was released as > free software, I hear a lot of demands for features people apparently > want without much in the way of design proposals -- never mind code -- > to achieve those features. When Jan delivered the initial release of > Slony, it was preceded by a design doc. I note on -hackers long > emails from (for example) Tom doing something very similar when > proposing a major feature. What I'm trying to do is to get the > replication-interested community of PostgreSQL users to say "here's > what we mean by 'replication'" before we all go off inventing the > grammar. We need to have a clue about the domain of discourse before > we start settling the variable assignments. 
> > It seems to me that every single replication discussion on -hackers > amounts to a bunch of futile attempts by colour blind people (of > which I am one) to describe the colour 'high note', while their > interlocutors describe the sound 'red'. I'm trying to get us to say > what it would mean even to do the describing. > > Specifying requirements for what software is supposed to do is one of > those thankless tasks that everyone complains is never done in the > free software community. I am offering, earnestly, to do that. I > just need a few people to tell me what _they think_ the software in > question ought to do. I set up a mailing list. I have solicited > comments. I'm not sure what else to do, but so far, I have the > positive remarks of Jose (GORDA), the remarks of Markus (which amount > to "this is a waste of time", unless I misread him), and nothing > else. > > Surely, in a community that spends time on the topic of whether > replication "should be in the back end", we oughta be able to come up > with 10 or so people who are willing to say what "being in the back > end" would mean. At the moment, this trivial goal is all I'm aiming > for. > > A > > -- > Andrew Sullivan | ajs@crankycanuck.ca > When my information changes, I alter my conclusions. What do you do sir? > --attr. John Maynard Keynes -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Joshua D. Drake wrote: > Andrew Sullivan wrote: > >On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote: > >>Actually I don't buy this argument. The only major change in > > > >Ok, good. So why isn't Postgres-R something we have _now_? > > That's is a good question and as I mentioned, I don't know much about > Postgres-R. My point was directly to the argument that a fast moving > PostgreSQL somehow limits the ability for replication to be built. That > argument, I believe is false. > > I originally responded to the rest of your email but thought better of > it. The only thing I can say is, my experience is that something like > replication will only be productively completed, outside the community. This is like nVidia saying that "open source developers are not competent enough to understand the coding of a graphics card driver". -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Nov 27, 2006 at 07:27:32AM -0500, Bruce Momjian wrote: > > Have you looked at the new HA/load balancing section of the docs? > > http://developer.postgresql.org/pgdocs/postgres/high-availability.html > > I got a lot of feedback on that. Perhaps it can be a starting point for > you. Yes, I have; and yes, it helps. What I was hoping to do, though, as well, was come up with the list of facilities that developers of these various systems say they need. I don't expect this will happen quickly (which is why I figured it needed a project -- if I could do it in six weeks, then we wouldn't need a mailing list and the like). But it seemed to me that, with so many projects on the go, getting a list together of what the developers of those systems say they need would be the obvious way to define, later, what hooks, if any, are needed in the core system. A -- Andrew Sullivan | ajs@crankycanuck.ca If they don't do anything, we don't need their acronym. --Josh Hamilton, on the US FEMA
> > I originally responded to the rest of your email but thought better of > > it. The only thing I can say is, my experience is that something like > > replication will only be productively completed, outside the community. > > This is like nVidia saying that "open source developers are not > competent enough to understand the coding of a graphics card driver". > I believe you misunderstood me. I am not saying that replication can not be built in an Open Source manner. Slony is a perfect example of that. I am saying that involving the larger, general PostgreSQL community in such a task would be counter-productive. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On a fine day, Mon, 2006-11-27 at 07:50, Joshua D. Drake wrote: > > > I originally responded to the rest of your email but thought better of > > > it. The only thing I can say is, my experience is that something like > > > replication will only be productively completed, outside the community. > > > > This is like nVidia saying that "open source developers are not > > competent enough to understand the coding of a graphics card driver". > > > > I believe you misunderstood me. I am not saying that replication can not > be built in an Open Source manner. Slony is a perfect example of that. I > am saying that involving the larger, general PostgreSQL community in > such a task would be counter-productive. As several different approaches to "replication" involve the same requirements and/or touch the same places in code, it seems a good idea to at least get some more or less formal descriptions from the parties involved. Also, it seems that largely the same things are needed for other projects, like precomputed/materialized views and auditing or some other non-replication data moving methods. While each of the replication and non-replication projects does its own thing, it may still be beneficial to try to provide some hooks in the right places for them. Not all projects need to use all of them, but having all projects patch the same places in core code will make it pretty much impossible to use more than one at a time. As an example, one may want to have both synchronous auditing data and async replication done on the same live database, both gathered at the point of data manipulation and both moved to different machines. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
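The closing example (synchronous auditing and asynchronous replication gathered at the same point of data manipulation) can be modelled as one hook point with two kinds of consumers. This is a minimal Python sketch under stated assumptions, not a proposal for the actual hook API: `ChangeHook`, `add_sync`, `add_async`, `on_change`, and `drain` are hypothetical names, and `drain` merely stands in for a background sender.

```python
from collections import deque


class ChangeHook:
    """Hypothetical hook point at data manipulation with several consumers."""

    def __init__(self):
        self._sync = []          # consumers run before the change returns
        self._async = []         # consumers fed later from the queue
        self._queue = deque()    # changes awaiting asynchronous shipping

    def add_sync(self, consumer):
        self._sync.append(consumer)

    def add_async(self, consumer):
        self._async.append(consumer)

    def on_change(self, change):
        for consumer in self._sync:
            consumer(change)             # synchronous: completes inside the call
        self._queue.append(change)       # asynchronous: only enqueued here

    def drain(self):
        # Stand-in for a background sender shipping queued changes later.
        while self._queue:
            change = self._queue.popleft()
            for consumer in self._async:
                consumer(change)


audit_log, replica = [], []
hook = ChangeHook()
hook.add_sync(audit_log.append)     # auditing sees the change immediately
hook.add_async(replica.append)      # replication sees it after drain()
hook.on_change(("INSERT", "accounts", 1))
after_change = (list(audit_log), list(replica))
hook.drain()
```

Because both consumers attach to the same hook point, neither has to patch the code path the other relies on, which is the coexistence argument made above.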
On Thu, 2006-11-23 at 08:50 +0100, Markus Schiltknecht wrote: > Hi, > > Jeff Davis wrote: > > I think you misunderstand my point. > > That may well be. Please keep in mind that I'm not a native English > speaker, thus please speak loud and clear ;-) > > > I was talking about replication > > implementations that already exist. They already have patches on the > > backend that are necessary for their solution to work. > > Do they? I'm only aware of the GORDA patch. The old Postgres-R patches > are out of date. Sequoia, PgPool and PgPool-II obviously do not need > patches. Slony-II, Postgres-R (8) (mine) as well as PGCluster-II are not > open sourced, yet. And I haven't heard much regarding hooks from any of > the proprietary vendors (except Joshua's recent statement that he's > happy without such hooks). Because we're talking about replication, I don't think we can limit the discussion to current open source solutions. I could be mistaken, but I am under the impression that commercial replication solutions do patch the backend. > > The idea is to design a single set of hooks that can be used to > > implement an entire class of replication. This only makes sense after > > existing solutions come to some agreement. I view that as a first step, > > assuming that it is necessary to alter the core in order to implement > > the class of replication in question. > > As there's not even *one* existing and open replication solution which > needs patching the backend, you are basing your statements on a false > premise. Thus, speaking of hooks as a "first step" is very confusing, at > least. > You're right, there is no agreement yet. When I say "first step," I mean that it's the first step toward getting any form of replication support in the _backend_, _not_ a first step toward a replication solution at all. 
It may be a long time before the backend has replication-specific support of any kind, but many replication projects have passed the first step toward replication a long time ago. I am not advocating replication support in the backend (since I don't even know what form that would take), nor am I saying that it will appear soon. I am just saying that replication-specific syntax is unlikely to appear before other replication-specific details. Regards, Jeff Davis
> > Do they? I'm only aware of the GORDA patch. The old Postgres-R patches
> > are out of date. Sequoia, PgPool and PgPool-II obviously do not need
> > patches. Slony-II, Postgres-R (8) (mine) as well as PGCluster-II are not
> > open sourced, yet. And I haven't heard much regarding hooks from any of
> > the proprietary vendors (except Joshua's recent statement that he's
> > happy without such hooks).
>
> Because we're talking about replication, I don't think we can limit the
> discussion to current open source solutions. I could be mistaken, but I
> am under the impression that commercial replication solutions do patch
> the backend.

Quite.

> I am not advocating replication support in the backend (since I don't
> even know what form that would take), nor am I saying that it will

patch -p1 < replicator.diff

;)

Joshua D. Drake
Hi, Jeff Davis wrote: > Because we're talking about replication, I don't think we can limit the > discussion to current open source solutions. I could be mistaken, but I > am under the impression that commercial replication solutions do patch > the backend. Sure. But as you see, at least Joshua D. Drake is quite happy with patch -p1 < all_his_replicator_changes.diff I'm, too. Because I don't think I could get anything useful into core. And as long as I'd still have to patch the backend, what would that serve me? I really think this decision should be left to the developers of replication systems. We *will* ask core, if we want to have something added (as did the GORDA project). I state the same in my FAQ at [1]. > You're right, there is no agreement yet. When I say "first step," I mean > that it's the first step toward getting any form of replication support > in the _backend_, _not_ a first step toward a replication solution at > all. Okay, sorry, then I misread you. > It may be a long time before the backend has replication-specific > support of any kind, but many replication projects have passed the first > step toward replication a long time ago. Have they? Have you heard requests for specific additions into core from any of them? > I am not advocating replication support in the backend (since I don't > even know what form that would take), nor am I saying that it will > appear soon. I am just saying that replication-specific syntax is > unlikely to appear before other replication-specific details. Sure. Regards Markus [1]: http://www.postgres-r.org/about/faqs
Hello Andrew, Andrew Sullivan wrote: > On Sat, Nov 25, 2006 at 11:05:34AM -0800, Joshua D. Drake wrote: >> Actually I don't buy this argument. Neither do I. I can only reiterate that interfacing with the database backend is *not* the problem. I've been porting Postgres-R forward since 7.4 and only a few changes have been necessary since then. And using a decent version control system simplifies the task of propagating from CVS HEAD to my branch. The few conflicts that arose were mostly trivial to resolve (renaming or slight calling convention changes). Andrew Sullivan wrote: > Ok, good. So why isn't Postgres-R something we have _now_? (I note you don't count my version of Postgres-R (8), that might be reasonable depending on your definition of 'having Postgres-R'.) I can't speak for others, but I just don't have much spare time left. And it's a complex matter involving lots of corner cases like network outages, crashes of the replication manager or GCS daemon, etc. Testing and making it production grade software really takes a lot of time. IMO this is where replication solutions could work together, because all of them need to simulate a cluster somehow, to test their project. But this certainly has nothing to do with PostgreSQL Core. Another point for me is that the feedback I got on Postgres-R since Toronto is very close to zero. Some people haven't even noticed that there is Postgres-R code for 8.2. Or they don't count my variant for some reason. For example, Tom Lane recently pointed to Postgres-R as an example of code drift in [1]. No offense, it's just very contradictory to the hype around replication. > The work that I've seen on it, so far (and I speak as someone who > invested a significant amount of staff time, cash money, and -- > frankly -- "political" credibility in software based on that idea) is > that there isn't a way to make it production-grade without pretty > severe constraints on what it can do. Right, the Postgres-R algorithm has limitations. 
And it certainly does not fit all use cases. The Toronto Meeting has opened my eyes in that aspect and I'm thankful for that. > It was that unhappy discovery that led me to say, "Can we please > _write down_ what we think 'replication' might require, and what the > trade-offs can be?" I'm trying to write requirements in public here; > but all I get is silence. This frustrates me partly because, as > someone who stuck his neck out to make sure Slony was released as > free software, I hear a lot of demands for features people apparently > want without much in the way of design proposals -- never mind code -- > to achieve those features. When Jan delivered the initial release of > Slony, it was preceded by a design doc. I note on -hackers long > emails from (for example) Tom doing something very similar when > proposing a major feature. What I'm trying to do is to get the > replication-interested community of PostgreSQL users to say "here's > what we mean by 'replication'" before we all go off inventing the > grammar. We need to have a clue about the domain of discourse before > we start settling the variable assignments. As you surely have noticed, I've been discussing forth and back with Bruce about replication for the documentation. I've been doing that because I wanted to clarify what 'replication' is, what we are talking about when we say 'multi-master replication' or 'data partitioning', etc.. Sadly, only very few people from the 'replication interested community' were discussing. I've even been trying to get more of them involved. > It seems to me that every single replication discussion on -hackers > amounts to a bunch of futile attempts by colour blind people (of > which I am one) to describe the colour 'high note', while their > interlocutors describe the sound 'red'. I'm trying to get us to say > what it would mean even to do the describing. 
> > Specifying requirements for what software is supposed to do is one of > those thankless tasks that everyone complains is never done in the > free software community. I am offering, earnestly, to do that. I > just need a few people to tell me what _they think_ the software in > question ought to do. I set up a mailing list. I have solicited > comments. I'm not sure what else to do, but so far, I have the > positive remarks of Jose (GORDA), the remarks of Markus (which amount > to "this is a waste of time", unless I misread him), and nothing > else. I'm sorry if this sounded that negative. Defining what software is supposed to do is certainly necessary, especially as long as replication discussions on -hackers look like what you described above. Thus we should better first define what we mean to make sure we are talking about the same when speaking of 'multi-master replication' for example. Please note that I've never raised my voice against that. I'm just saying: it's not time for hooks or any other framework, yet. We don't even agree in that we need hooks to interface with the database. Even having to define points in code where I could hook would limit me in an unacceptable way, if I couldn't redefine them whenever I wanted. > Surely, in a community that spends time on the topic of whether > replication "should be in the back end", we oughta be able to come up > with 10 or so people who are willing to say what "being in the back > end" would mean. At the moment, this trivial goal is all I'm aiming > for. Being in the back end for me means, I can code in C, use shared memory and system catalogs, add another sub-process to PostgreSQL, introduce another operation mode for (remote) backends, mess with the postmaster and communicate to the backends via shared memory and signals (IPC). IPC is even a good example for something which could be of use for me. Back in April, I've sent a patch implementing internal messages passing (see [2]). 
It's a very general feature I need and, as pointed out in the mail, it could even be of use for others. But I have no hope for it to make it into core, because I've never seen something accepted which could perhaps be of use in the future. I've very well noticed that you and others offered to help in various ways. Thank you for that. But I also got the impression that there's an urge towards hooks or a framework or something so as PostgreSQL can provide that and refer to it as "having everything needed" for replication. That sounds marketing driven, IMO. I can assure you that I will continue to work on Postgres-R. I think its design has been described well enough already. I will post more design ideas for extensions and additions on the Postgres-R or on the replica-hooks mailing list as soon as I have them completely thought through and written down. And for sure I'll let you know if and how you or others can help me. Regards Markus [1]: Tom Lane: Re: Getting a move on for 8.2 beta: http://archives.postgresql.org/pgsql-hackers/2006-09/msg00139.php [2]: My Patch for IMessages: http://archives.postgresql.org/pgsql-patches/2006-04/msg00047.php
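A note on the internal message passing mentioned in the message above: the idea of one process placing a message for another into shared memory and then signalling it can be sketched roughly as follows. This is not the IMessages patch from [2], and it makes no claim about how Postgres-R works. The `MessageBoard` class, its methods and the message layout are hypothetical names invented for this illustration; a plain `bytearray` stands in for the shared memory segment and a set of ids stands in for pending signals. A real backend implementation would be written in C on top of PostgreSQL's own shared memory and signal handling.

```python
import struct

# Message header: sender id, recipient id, payload length (unsigned 32-bit each).
HEADER = struct.Struct("<III")


class MessageBoard:
    """Toy stand-in for a fixed-size shared memory area holding messages.

    Hypothetical illustration only: a bytearray plays the role of the
    shared segment, a set of ids plays the role of pending signals.
    """

    def __init__(self, size):
        self.buf = bytearray(size)  # the "shared memory" segment
        self.used = 0               # bytes occupied by queued messages
        self.signalled = set()      # recipients with a pending "signal"

    def send(self, sender, recipient, payload):
        """Append a message; return False if the segment is full."""
        need = HEADER.size + len(payload)
        if self.used + need > len(self.buf):
            return False
        HEADER.pack_into(self.buf, self.used, sender, recipient, len(payload))
        start = self.used + HEADER.size
        self.buf[start:start + len(payload)] = payload
        self.used += need
        self.signalled.add(recipient)  # stands in for signalling the process
        return True

    def receive(self, recipient):
        """Remove and return all messages for recipient as (sender, payload)."""
        found = []
        kept = bytearray()
        pos = 0
        while pos < self.used:
            sender, rcpt, length = HEADER.unpack_from(self.buf, pos)
            end = pos + HEADER.size + length
            if rcpt == recipient:
                found.append((sender, bytes(self.buf[pos + HEADER.size:end])))
            else:
                kept += self.buf[pos:end]
            pos = end
        # Compact the segment so the freed space can be reused.
        self.buf[:len(kept)] = kept
        self.used = len(kept)
        self.signalled.discard(recipient)
        return found


if __name__ == "__main__":
    board = MessageBoard(64)
    board.send(1, 2, b"commit 42")
    board.send(1, 3, b"abort 43")
    print(board.receive(2))  # [(1, b'commit 42')]
```

Compared with pipes, a scheme like this needs no file descriptor per pair of processes, but the fixed segment size means senders must cope with a full buffer, which is why `send` reports failure instead of blocking in this sketch.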
On Tue, Nov 28, 2006 at 02:19:51PM +0100, Markus Schiltknecht wrote:
> (I note you don't count my version of Postgres-R (8), that might be
> reasonable depending on your definition of 'having Postgres-R'.)

Yes; what I meant was "production-grade, ready to go." I've played with your code. I'm mightily impressed that you managed to get it working. But I don't think it's ready for production use tomorrow in the environments where this sort of availability is actually worth the cost (think "money depends on this"). That's what I mean by "have".

> and making it production grade software really takes a lot of time. IMO
> this is where replication solutions could work together, because all of
> them need to simulate a cluster somehow, to test their project. But this
> certainly has nothing to do with PostgreSQL Core.

I agree with you that such supporting tools would be a very good thing. Maybe nothing else is needed. Like I said before, a negative result is still a result.

> Another point for me is that the feedback I got on Postgres-R since
> Toronto is very close to zero. Some people haven't even noticed that
> there is Postgres-R code for 8.2.

Well, part of the problem is there isn't much to say to code that I can't look at. I can play with it on the live CD, but so far the source isn't on the web page at postgres-r.org, which is the only source I know for it. This makes the whole matter trickier for potential adopters, because it's basically a black box.

> As you surely have noticed, I've been discussing back and forth with
> Bruce about replication for the documentation. I've been doing that
> because I wanted to clarify what 'replication' is, what we are talking
> about when we say 'multi-master replication' or 'data partitioning', etc.

Yes, I think those docs are very good. But it's one thing to say, "This is what replication means," &c., and quite another to say, "Here are the sorts of things we plan to do, which have to work with that pile of code over there."

> I'm sorry if this sounded that negative.

No, not negative. Remember, as I said, if it turns out that we can't actually come up with an outline of replication framework necessary conditions, we have also discovered something. That's a useful result, because it tells us that the next thing we need to do is figure out where the exclusive features are, so we can say "you can have A or B, but not both."

> through and written down. And for sure I'll let you know if and how you
> or others can help me.

Ok, thanks.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
When my information changes, I alter my conclusions. What do you do sir?
    --attr. John Maynard Keynes
On Tue, 2006-11-28 at 08:42 +0100, Markus Schiltknecht wrote:
> > You're right, there is no agreement yet. When I say "first step," I mean
> > that it's the first step toward getting any form of replication support
> > in the _backend_, _not_ a first step toward a replication solution at
> > all.
>
> Okay, sorry, then I misread you.
>
> > It may be a long time before the backend has replication-specific
> > support of any kind, but many replication projects have passed the first
> > step toward replication a long time ago.
>
> Have they? Have you heard requests for specific additions into core from
> any of them?
>

I think you misread me again. I was again trying to make a distinction between the progress of replication _for_ postgresql (which has been very good, way past the first step) and the progress of replication natively in the community version of the postgresql core, which has a long way to go.

I wasn't very clear, but I don't think you actually disagree with me.

Regards,
Jeff Davis
Hi,

Andrew Sullivan wrote:
> Yes; what I meant was "production-grade, ready to go." I've played
> with your code. I'm mightily impressed that you managed to get it
> working. But I don't think it's ready for production use tomorrow in
> the environments where this sort of availability is actually worth
> the cost (think "money depends on this"). That's what I mean by
> "have".

Agreed.

> I agree with you that such supporting tools would be a very good
> thing. Maybe nothing else is needed. Like I said before, a negative
> result is still a result.

Okay.

> Well, part of the problem is there isn't much to say to code that I
> can't look at. I can play with it on the live CD, but so far the
> source isn't on the web page at postgres-r.org, which is the only
> source I know for it. This makes the whole matter trickier for
> potential adopters, because it's basically a black box.

Very understandable. I'm trying to find ways to open source Postgres-R.

> Yes, I think those docs are very good. But it's one thing to say,
> "This is what replication means," &c., and quite another to say,
> "Here are the sorts of things we plan to do, which have to work with
> that pile of code over there."

ACK.

>> I'm sorry if this sounded that negative.
>
> No, not negative. Remember, as I said, if it turns out that we can't
> actually come up with an outline of replication framework necessary
> conditions, we have also discovered something. That's a useful
> result, because it tells us that the next thing we need to do
> is figure out where the exclusive features are, so we can say "you
> can have A or B, but not both."

Okay.

>> through and written down. And for sure I'll let you know if and how you
>> or others can help me.
>
> Ok, thanks.

Thank you.

Markus
On Wed, 2006-11-22 at 19:27 +0000, Simon Riggs wrote:
> On Wed, 2006-11-22 at 19:23 +0100, Markus Schiltknecht wrote:
>
> > Jeff Davis wrote:
> > > If there is some great replication solution that a lot of people need
> > > and it will only work with a change to core, that change might make it
> > > in.
> >
> > That's what I'm saying. Although it's hypothetical.
>
> My interest is in extending Warm Standby [8.2] to include the following
> forms of replication:
> 1. asynchronous WAL-record level transfer to Standby server
> 2. synchronous WAL-record level transfer to Standby server
> My foresight includes that this would likely require some improvements
> in Group Commit, but I've not done the design for this *yet*.
>
> I would also like to include some performance optimisations into Core
> that are specifically aimed at improving Slony performance. (I'm more
> than happy if those things also increase performance of other
> situations). That's a slightly different thing to embedding Slony in Core,
> which I am *not* suggesting. Suggestions welcome.
>
> This will then give PostgreSQL:
> - improved performance for the most popular production replication
> system for PostgreSQL (Slony)
> - a capability for Synchronous Replication, when it is requested
>
> That's the limit of my ambitions for 8.3.

Very curious Slony user here. Can I ask what you have planned for 8.3 in regards to Slony performance?

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
On Tue, 2006-11-28 at 14:22 -0500, Brad Nicholson wrote:
> On Wed, 2006-11-22 at 19:27 +0000, Simon Riggs wrote:
> > I would also like to include some performance optimisations into Core
> > that are specifically aimed at improving Slony performance. (I'm more
> > than happy if those things also increase performance of other
> > situations). That's a slightly different thing to embedding Slony in Core,
> > which I am *not* suggesting. Suggestions welcome.

> Very curious slony user here. Can I ask what you have planned for 8.3
> in regards to Slony performance?

Discussion opened on slony-general list. See you there.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Nov 28, 2006, at 10:18 AM, Markus Schiltknecht wrote:
>> Well, part of the problem is there isn't much to say to code that I
>> can't look at. I can play with it on the live CD, but so far the
>> source isn't on the web page at postgres-r.org, which is the only
>> source I know for it. This makes the whole matter trickier for
>> potential adopters, because it's basically a black box.
>
> Very understandable. I'm trying to find ways to open source
> Postgres-R.

Related to that, and your comment about people not using Postgres-R... I think it's going to be very, very hard to get people to seriously consider using Postgres-R while it's essentially a fork of the community code, with little/no visibility into what changes have been made and how they could affect data stored in the database.

Contrast this with Slony, where there are no back-end changes and the trigger code (which is essentially the only thing that touches your live data) is readily visible just via \df+. That makes it very easy for people to convince themselves that Slony is unlikely to hose their data. Of course at this point there are enough people using Slony that that's no longer a concern, but back when it was introduced it would have been.

Given the nature of Postgres-R, I suppose there's no real way people could become comfortable without looking at most/all of the code, since it does tie pretty deeply into the backend. But that's one way that having published hooks would help; if you could at least put the code that touches the guts of the database and the source data out in the open, people might be more willing to give Postgres-R a try.

You also mentioned putting IPC in the backend, since it's something that you need. I think breaking something as complex as replication into smaller chunks that can stand on their own is a great idea. Oracle's replication does this, and I wish Slony would. Having access to the queuing/communications mechanism that the Slony folks have built would be very useful. So I'd definitely encourage making subsets of Postgres-R functionality available, and promoting them via pgFoundry.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Hi,

Jim Nasby wrote:
> Related to that, and your comment about people not using Postgres-R...

I commented about the feedback I got, which would include rants about why it's not open source or such. But I didn't even get such responses.

I'm not expecting anybody to use Postgres-R currently. I don't use it in production myself. And the LiveCD currently serves mainly as evidence of real code behind my words. ;-)

> I think it's going to be very, very hard to get people to seriously
> consider using Postgres-R while it's essentially a fork of the community
> code, with little/no visibility into what changes have been made and how
> they could affect data stored in the database.

Agreed.

> Given the nature of Postgres-R, I suppose there's no real way people
> could become comfortable without looking at most/all of the code, since
> it does tie pretty deeply into the backend.

Most *people* use PostgreSQL in production without ever having looked at its source code. Why should *they* want to look at Postgres-R sources?

I surely see that I could gain *developers'* acceptance by opening up the source code. Please note that I'm absolutely for open source software; I always wanted to release my changes to Postgres-R under a BSD license one day. I'm so much for open source software that I want to make a living from writing OSS. I simply don't know exactly how to do that, yet. So I'm keeping Postgres-R closed to leave myself more options.

> But that's one way that
> having published hooks would help; if you could at least put the code
> that touches the guts of the database and the source data out in the
> open, people might be more willing to give Postgres-R a try.

I don't really buy that argument. It would be quite some work for me and would not really help other developers, because the real code would still be hidden away.

> You also mentioned putting IPC in the backend, since it's something that
> you need. I think breaking something as complex as replication into
> smaller chunks that can stand on their own is a great idea.

Agreed. But once again, responses to my trivial IMessages implementation were... zero. Not even complaints about how lacking it is. Or discussion of the performance of pipes vs. this shared memory message passing approach. Nothing. Why should I work on something nobody else seems to be interested in?

> Oracle's
> replication does this, and I wish Slony would. Having access to the
> queuing/communications mechanism that the Slony folks have built would
> be very useful. So I'd definitely encourage making subsets of Postgres-R
> functionality available, and promoting them via pgFoundry.

Agreed. I myself have thought about splitting some things out (i.e. this IPC stuff; another chunk to split out could be the GCS interface). It could make testing and development easier. But making it available via pgFoundry and promoting it as a separate project is another story, which certainly depends on some interested people asking for it.

If Linus hadn't gotten any answers to his famous post "What would you like to see most in minix?", he most probably wouldn't have published Linux.

Regards

Markus
On Sun, Dec 03, 2006 at 10:04:46PM -0800, Jim Nasby wrote:
> Oracle's replication does this, and I wish Slony would. Having access
> to the queuing/communications mechanism that the Slony folks have
> built would be very useful.

Abstraction patches are welcome ;-)

Seriously, though, part of what I'm attempting to achieve (and that it keeps happening here suggests to me that another list was a bad idea) is to identify these _elements_. Then we can recycle them, after all.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying November.
    --H.W. Fowler