Thread: Replication documentation addition
Here is a new replication documentation section I want to add for 8.2: ftp://momjian.us/pub/postgresql/mypatches/replication Comments welcomed. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hello Bruce, Bruce Momjian wrote: > Here is a new replication documentation section I want to add for 8.2: > > ftp://momjian.us/pub/postgresql/mypatches/replication > > Comments welcomed. Thank you, that sounds good. It's targeted to production use and currently available solutions, which makes sense in the official manual. You are explaining the sync vs. async categorization, but I sort of asked myself where the explanation of single vs. multi-master has gone. I then realized that you are talking about read-only and a "read/write mix of servers". Then again, you are mentioning 'Multi-Master Replication' as one type of replication solution. I think we should be consistent in our naming. As Single- and Multi-Master are the more common terms among database replication experts, I'd recommend using them and explaining what they mean instead of introducing new names. Along with that, I'd argue that Single- vs. Multi-Master is a categorization of its own, just as Sync vs. Async is. In that sense, the last chapter should probably be named 'Distributed-Shared-Memory Replication' or something like that instead of 'Multi-Master Replication', because as we know, there are several ways of doing Multi-Master Replication (Slony-II / Postgres-R, Distributed Shared Memory, 2PC in application code or the above mentioned 'Query Broadcast Replication', which would fall into a Multi-Master Replication model as well). Also in the last chapter, instead of just saying that "PostgreSQL does not offer this type of replication", we could probably say that different projects are trying to come up with better replication solutions. And there are several proprietary products based on PostgreSQL which do solve some kinds of Multi-Master Replication. Not that I want to advertise for any of them, but it just sounds better than the current "no, we don't offer that". 
As this documentation mainly covers production-quality solutions (which is absolutely perfect), can we document the status of current projects somewhere, probably in a wiki? Or at least mention them somewhere and point to their websites? It would help to get rid of all those rumors and uncertainties. Or are those intentional? Just my two cents. Regards Markus
On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote: > Here is a new replication documentation section I want to add for 8.2: > > ftp://momjian.us/pub/postgresql/mypatches/replication This is how data partitioning is currently described there > Data Partitioning > ----------------- > > Data partitioning splits the database into data sets. To achieve > replication, each data set can only be modified by one server. For > example, data can be partitioned by offices, e.g. London and Paris. > While London and Paris servers have all data records, only London can > modify London records, and Paris can only modify Paris records. Such > partitioning is usually accomplished in application code, though rules > and triggers can help enforce partitioning and keep the read-only data > sets current. Slony can also be used in such a setup. While Slony > replicates only entire tables, London and Paris can be placed in > separate tables, and inheritance can be used to access from both tables > using a single table name. Maybe another use of partitioning should also be mentioned: partitioning used to overcome the limitations of single servers (especially I/O and memory, but also CPU), where only a subset of the data is stored and processed on each server. As an example of this type of partitioning you could mention Bizgres MPP (a PG-based commercial product, http://www.greenplum.com ), which partitions data to use the I/O and CPU of several DB servers for processing complex OLAP queries, and Pl_Proxy ( http://pgfoundry.org/projects/plproxy/ ), which does the same for OLTP loads. I think the "official" term for this kind of "replication" is Shared-Nothing Clustering. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
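[Editorial illustration] The ownership rule in the quoted Data Partitioning text (every server holds all records, but each data set may be modified by only one server) can be sketched roughly as follows. This is a minimal in-memory sketch, not how Slony or any PostgreSQL feature implements it; the class names `OfficeServer` and `PartitionedCluster` and the `write`/`read` methods are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class OfficeServer:
    """Hypothetical stand-in for one database server (e.g. London or Paris)."""
    name: str
    rows: dict = field(default_factory=dict)  # (office, key) -> value


class PartitionedCluster:
    """Every server holds all records, but each office's records have one owner."""

    def __init__(self, offices):
        self.servers = {office: OfficeServer(office) for office in offices}

    def write(self, via, office, key, value):
        # Only the owning server may modify its own data set.
        if via != office:
            raise PermissionError(f"{via} may not modify {office} records")
        # Apply the change everywhere so the read-only copies stay current.
        # (A real setup would propagate asynchronously, e.g. via triggers.)
        for server in self.servers.values():
            server.rows[(office, key)] = value

    def read(self, via, office, key):
        # Any server can answer reads, because every server has all records.
        return self.servers[via].rows[(office, key)]
```

With `cluster = PartitionedCluster(["London", "Paris"])`, a call such as `cluster.write("London", "London", 1, "invoice")` succeeds and is readable via Paris, while `cluster.write("Paris", "London", 1, "x")` is rejected.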
Hannu Krosing wrote: > I think the "official" term for this kind of "replication" is > Shared-Nothing Clustering. Well, that's just another distinction for clusters. Most of the time it's between Shared-Disk vs. Shared-Nothing. You could also see the very Big Irons as a Shared-Everything Cluster. While it's certainly true that any kind of data partitioning for databases only makes sense for Shared-Nothing Clusters, I don't think it's a 'kind of replication'. AFAIK most database replication solutions are built for Shared-Nothing Clusters (with the exception of PgCluster-II, I think). Regards Markus
On Tue, 2006-10-24 at 00:20 -0400, Bruce Momjian wrote: > Here is a new replication documentation section I want to add for 8.2: > > ftp://momjian.us/pub/postgresql/mypatches/replication > > Comments welcomed. It's a very good start to a complete minefield of competing solutions. My first thought would be to differentiate between clustering and replication, which will bring out many differences. My second thought would be to differentiate between load balancing, multi-threading, parallel query, high availability and recoverability, which would probably sort out the true differences in the above mix. But that wouldn't help most people and almost everybody would find fault. IMHO most people I've spoken to take "replication" to mean an HA solution, so perhaps we should cover it in those terms. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
OK, I have updated the URL. Please let me know how you like it. --------------------------------------------------------------------------- Hannu Krosing wrote: > On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote: > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > This is how data partitioning is currently described there > > > Data Partitioning > > ----------------- > > > > Data partitioning splits the database into data sets. To achieve > > replication, each data set can only be modified by one server. For > > example, data can be partitioned by offices, e.g. London and Paris. > > While London and Paris servers have all data records, only London can > > modify London records, and Paris can only modify Paris records. Such > > partitioning is usually accomplished in application code, though rules > > and triggers can help enforce partitioning and keep the read-only data > > sets current. Slony can also be used in such a setup. While Slony > > replicates only entire tables, London and Paris can be placed in > > separate tables, and inheritance can be used to access from both tables > > using a single table name. > > Maybe another use of partitioning should also be mentioned. That is, > when partitioning is used to overcome limitations of single servers > (especially IO and memory, but also CPU), and only a subset of data is > stored and processed on each server. > > As an example of this type of partitioning you could mention Bizgres MPP > (a PG-based commercial product, http://www.greenplum.com ), which > partitions data to use I/O and CPU of several DB servers for processing > complex OLAP queries, and Pl_Proxy > ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP > loads. > > I think the "official" term for this kind of "replication" is > Shared-Nothing Clustering. 
> > -- > ---------------- > Hannu Krosing > Database Architect > Skype Technologies OÜ > Akadeemia tee 21 F, Tallinn, 12618, Estonia > > Skype me: callto:hkrosing > Get Skype for free: http://www.skype.com > -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian > Sent: Tuesday, October 24, 2006 5:16 PM > To: Hannu Krosing > Cc: PostgreSQL-documentation; PostgreSQL-development > Subject: Re: [HACKERS] Replication documentation addition > > > OK, I have updated the URL. Please let me know how you like it. There's a typo on line 8, first paragraph: "perhaps with only one server allowing write rwork together at the same time." Also, consider this wording of the last description: "Single-Query Clustering..." Replaced by: "Shared Nothing Clustering ----------------------- This allows multiple servers with separate disks to work together on each query. In shared nothing clusters, the work of answering each query is distributed among the servers to increase performance through parallelism. These systems will typically feature high availability by using other forms of replication internally. While there are no open source options for this type of clustering, there are several commercial products available that implement this approach, making PostgreSQL achieve very high performance for multi-terabyte business intelligence databases." - Luke
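[Editorial illustration] The proposed "Shared Nothing Clustering" wording says that each server holds its own data and that the work of answering a query is distributed among the servers. A rough sketch of that scatter/gather idea follows, using plain Python lists as stand-ins for servers. The function names `partition`, `local_sum` and `distributed_sum` are hypothetical, and threads here only illustrate the structure; this is not a description of how Bizgres MPP, PL/Proxy or any other product works.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Assign each (key, amount) row to exactly one node by hashing its key."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[hash(key) % node_count].append((key, amount))
    return nodes


def local_sum(node_rows):
    """Work done by a single node, using only the rows it stores itself."""
    return sum(amount for _, amount in node_rows)


def distributed_sum(nodes):
    """Scatter the aggregate to all nodes, then gather and combine the partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_sum, nodes))
    return sum(partials)
```

For example, `distributed_sum(partition([(i, i * 10) for i in range(100)], 4))` gives the same total as summing all rows on one machine, while no single node ever sees the whole data set.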
I have changed the text to reference "fail over" and "load balancing". I think it makes it clearer. Let me know what you think. I am hesitant to mention commercial PostgreSQL products in our documentation. --------------------------------------------------------------------------- Markus Schiltknecht wrote: > Hello Bruce, > > Bruce Momjian wrote: > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > > > Comments welcomed. > > Thank you, that sounds good. It's targeted to production use and > currently available solutions, which makes sense in the official manual. > > You are explaining the sync vs. async categorization, but I sort of > asked myself where the explanation of single vs multi-master has gone. I > then realized, that you are talking about read-only and a "read/write > mix of servers". Then again, you are mentioning 'Multi-Master > Replication' as one type of replication solutions. I think we should be > consistent in our naming. As Single- and Multi-Master are the more > common terms among database replication experts, I'd recommend to use > them and explain what they mean instead of introducing new names. > > Along with that, I'd argue that this Single- or Multi-Master is a > categorization as Sync vs Async. In that sense, the last chapter should > probably be named 'Distributed-Shared-Memory Replication' or something > like that instead of 'Multi-Master Replication', because as we know, > there are several ways of doing Multi-Master Replication (Slony-II / > Postgres-R, Distributed Shared Memory, 2PC in application code or the > above mentioned 'Query Broadcast Replication', which would fall into a > Multi-Master Replication model as well) > > Also in the last chapter, instead of just saying that "PostgreSQL does > not offer this type of replication", we could probably say that > different projects are trying to come up with better replication > solutions. 
And there are several proprietary products based on > PostgreSQL which do solve some kinds of Multi-Master Replication. Not > that I want to advertise for any of them, but it just sounds better than > the current "no, we don't offer that". > > As this documentation mainly covers production-quality solutions (which > is absolutely perfect), can we document the status of current projects > somewhere, probably in a wiki? Or at least mention them somewhere and > point to their websites? It would help to get rid of all those rumors > and uncertainties. Or are those intentional? > > Just my two cents. > > Regards > > Markus -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote: > On Tue, 2006-10-24 at 00:20 -0400, Bruce Momjian wrote: > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > > > Comments welcomed. > > It's a very good start to a complete minefield of competing solutions. > > My first thought would be to differentiate between clustering and > replication, which will bring out many differences. I have gone with "fail-over" and "load balancing" in the updated text. > My second thought would be to differentiate between load balancing, > multi-threading, parallel query, high availability and recoverability, > which would probably sort out the true differences in the above mix. But > that wouldn't help most people and almost everybody would find fault. Yep. > IMHO most people I've spoken to take "replication" to mean an HA > solution, so perhaps we should cover it in those terms. Yes, I removed any reference to replication. It seemed too general. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I have updated the text. Please let me know what else I should change. I am unsure if I should be mentioning commercial PostgreSQL products in our documentation. --------------------------------------------------------------------------- Hannu Krosing wrote: > On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote: > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > This is how data partitioning is currently described there > > > Data Partitioning > > ----------------- > > > > Data partitioning splits the database into data sets. To achieve > > replication, each data set can only be modified by one server. For > > example, data can be partitioned by offices, e.g. London and Paris. > > While London and Paris servers have all data records, only London can > > modify London records, and Paris can only modify Paris records. Such > > partitioning is usually accomplished in application code, though rules > > and triggers can help enforce partitioning and keep the read-only data > > sets current. Slony can also be used in such a setup. While Slony > > replicates only entire tables, London and Paris can be placed in > > separate tables, and inheritance can be used to access from both tables > > using a single table name. > > Maybe another use of partitioning should also be mentioned. That is, > when partitioning is used to overcome limitations of single servers > (especially IO and memory, but also CPU), and only a subset of data is > stored and processed on each server. > > As an example of this type of partitioning you could mention Bizgres MPP > (a PG-based commercial product, http://www.greenplum.com ), which > partitions data to use I/O and CPU of several DB servers for processing > complex OLAP queries, and Pl_Proxy > ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP > loads. 
> > I think the "official" term for this kind of "replication" is > Shared-Nothing Clustering. > > -- > ---------------- > Hannu Krosing > Database Architect > Skype Technologies OÜ > Akadeemia tee 21 F, Tallinn, 12618, Estonia > > Skype me: callto:hkrosing > Get Skype for free: http://www.skype.com -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I don't think the PostgreSQL documentation should be mentioning commercial solutions. --------------------------------------------------------------------------- Luke Lonergan wrote: > Bruce, > > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian > > Sent: Tuesday, October 24, 2006 5:16 PM > > To: Hannu Krosing > > Cc: PostgreSQL-documentation; PostgreSQL-development > > Subject: Re: [HACKERS] Replication documentation addition > > > > > > OK, I have updated the URL. Please let me know how you like it. > > There's a typo on line 8, first paragraph: > > "perhaps with only one server allowing write rwork together at the same > time." > > Also, consider this wording of the last description: > > "Single-Query Clustering..." > > Replaced by: > > "Shared Nothing Clustering > ----------------------- > > This allows multiple servers with separate disks to work together on a > each query. > In shared nothing clusters, the work of answering each query is > distributed among > the servers to increase the performance through parallelism. These > systems will > typically feature high availability by using other forms of replication > internally. > > While there are no open source options for this type of clustering, > there are several > commercial products available that implement this approach, making > PostgreSQL achieve > very high performance for multi-Terabyte business intelligence > databases." > > - Luke -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, > I have updated the text. Please let me know what else I should change. > I am unsure if I should be mentioning commercial PostgreSQL products in > our documentation. I think you should mention the postgresql-only ones, but just briefly with a link. Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote: > Bruce, > >> I have updated the text. Please let me know what else I should change. >> I am unsure if I should be mentioning commercial PostgreSQL products in >> our documentation. > > I think you should mention the postgresql-only ones, but just briefly with a > link. Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator. And to further this, I would expect that it would be a subsection, e.g. a <sect2> or <sect3>. I think the open source version should absolutely get top billing though. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake wrote: > Josh Berkus wrote: > > Bruce, > > > >> I have updated the text. Please let me know what else I should change. > >> I am unsure if I should be mentioning commercial PostgreSQL products in > >> our documentation. > > > > I think you should mention the postgresql-only ones, but just briefly with a > > link. Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator. > > And to further this I would expect that it would be a subsection.. e.g; > a <sect2> or <sect3>. I think the open source version should absolutely > get top billing though. I am not inclined to add commercial offerings. If people want commercial database offerings, they can get them from companies that advertise. People are coming to PostgreSQL for open source solutions, and I think mentioning commercial ones doesn't make sense. If we are to add them, I need to hear that from people who haven't worked in PostgreSQL commercial replication companies. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Oct 24, 2006, at 8:48 PM, Bruce Momjian wrote: > Joshua D. Drake wrote: >> Josh Berkus wrote: >>> Bruce, >>> >>>> I have updated the text. Please let me know what else I should >>>> change. >>>> I am unsure if I should be mentioning commercial PostgreSQL >>>> products in >>>> our documentation. >>> >>> I think you should mention the postgresql-only ones, but just >>> briefly with a >>> link. Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator. >> >> And to further this I would expect that it would be a subsection.. >> e.g; >> a <sect2> or <sect3>. I think the open source version should >> absolutely >> get top billing though. > > I am not inclined to add commercial offerings. If people wanted > commercial database offerings, they can get them from companies that > advertise. People are coming to PostgreSQL for open source solutions, > and I think mentioning commercial ones doesn't make sense. > > If we are to add them, I need to hear that from people who haven't > worked in PostgreSQL commercial replication companies. I'm not coming to PostgreSQL for open source solutions. I'm coming to PostgreSQL for _good_ solutions. I want to see what solutions might be available for a problem I have. I certainly want to know whether they're freely available, commercial or some flavour of open source, but I'd like to know about all of them. A big part of the value of PostgreSQL is the applications and extensions that support it. Hiding the existence of some subset of those just because of the way they're licensed is both underselling PostgreSQL and doing something of a disservice to the user of the document. Cheers, Steve
Steve Atkins wrote: > > If we are to add them, I need to hear that from people who haven't > > worked in PostgreSQL commercial replication companies. > > I'm not coming to PostgreSQL for open source solutions. I'm coming > to PostgreSQL for _good_ solutions. > > I want to see what solutions might be available for a problem I have. > I certainly want to know whether they're freely available, commercial > or some flavour of open source, but I'd like to know about all of them. > > A big part of the value of PostgreSQL is the applications and extensions > that support it. Hiding the existence of some subset of those just > because of the way they're licensed is both underselling PostgreSQL > and doing something of a disservice to the user of the document. OK, does that mean we mention EnterpriseDB in the section about Oracle functions? Why not mention MS SQL if they have a better solution? I just don't see where that line can clearly be drawn on what to include. Do we mention Netezza, which is loosely based on PostgreSQL? It just seems very arbitrary to include commercial software. If someone wants to put it on a wiki, I think that would be fine because that doesn't seem as official. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Oct 24, 2006, at 9:20 PM, Bruce Momjian wrote: > Steve Atkins wrote: >>> If we are to add them, I need to hear that from people who haven't >>> worked in PostgreSQL commercial replication companies. >> >> I'm not coming to PostgreSQL for open source solutions. I'm coming >> to PostgreSQL for _good_ solutions. >> >> I want to see what solutions might be available for a problem I have. >> I certainly want to know whether they're freely available, commercial >> or some flavour of open source, but I'd like to know about all of >> them. >> >> A big part of the value of PostgreSQL is the applications and >> extensions >> that support it. Hiding the existence of some subset of those just >> because of the way they're licensed is both underselling PostgreSQL >> and doing something of a disservice to the user of the document. > > OK, does that mean we mention EnterpriseDB in the section about Oracle > functions? Why not mention MS SQL if they have a better solution? I > just don't see where that line can clearly be drawn on what to > include. > Do we mention Netezza, which is loosely based on PostgreSQL? It just > seems very arbitrary to include commercial software. If someone wants > to put it on a wiki, I think that would be fine because that doesn't > seem as official. Good question. The line needs to be drawn somewhere. It's basically your judgement, tempered by other people's feedback, though. If it were me, I'd ask myself "Would I mention this product if it were open source? Would mentioning it help people using the document?". Cheers, Steve
Hi, I also wrote to Bruce about that. If you 'freely advertise' commercial solutions (rather than letting the vendors do so through other channels), you will always end up being an 'updater' of the docs whenever they change their product lines, change their business model, and so on. If you cite one commercial solution, then to be fair you should cite *all* of them. If one enterprise has the right to be listed in the documentation, all of them do, so that you never favour one of them. That's my main motivation for writing this. Moreover, if commercial solutions for high-end installs are cited as the providers of those solutions, it (to a point) discourages people from getting together and writing open source extensions to PostgreSQL. As Bruce asked, should the documentation then cover EnterpriseDB's Oracle functions? Should PostgreSQL also ship with them? Wouldn't it be painful to write, say, another description for an alternative product to EnterpriseDB if one arises? If the people who read the documentation work with PostgreSQL professionally, they may already have been briefed about those commercial offerings in some way. I think only the source and its tightly coupled components (read: can be compiled along with it, free as PostgreSQL) should be packaged into the tarball. However, I find Bruce's unofficial wiki idea a good one for comparisons. Regards, Cesar Steve Atkins wrote: > > On Oct 24, 2006, at 9:20 PM, Bruce Momjian wrote: > >> Steve Atkins wrote: >>>> If we are to add them, I need to hear that from people who haven't >>>> worked in PostgreSQL commercial replication companies. >>> >>> I'm not coming to PostgreSQL for open source solutions. I'm coming >>> to PostgreSQL for _good_ solutions. >>> >>> I want to see what solutions might be available for a problem I have. >>> I certainly want to know whether they're freely available, commercial >>> or some flavour of open source, but I'd like to know about all of them. 
>>> >>> A big part of the value of PostgreSQL is the applications and >>> extensions >>> that support it. Hiding the existence of some subset of those just >>> because of the way they're licensed is both underselling PostgreSQL >>> and doing something of a disservice to the user of the document. >> >> OK, does that mean we mention EnterpriseDB in the section about Oracle >> functions? Why not mention MS SQL if they have a better solution? I >> just don't see where that line can clearly be drawn on what to include. >> Do we mention Netezza, which is loosely based on PostgreSQL? It just >> seems very arbitrary to include commercial software. If someone wants >> to put it on a wiki, I think that would be fine because that doesn't >> seem as official. > > Good question. The line needs to be drawn somewhere. It's basically > your judgement, tempered by other people's feedback, though. If it > were me, I'd ask myself "Would I mention this product if it were open > source? Would mentioning it help people using the document?". > > Cheers, > Steve
On Tue, 2006-10-24 at 22:57, Bruce Momjian wrote: > I don't think the PostgreSQL documentation should be mentioning > commercial solutions. IMNSHO, having commercial solutions based on PostgreSQL which extend it in directions not (yet?) covered by core PostgreSQL is nothing to be ashamed of. And we should at least mention the OSS version of Bizgres as a place where quite a lot of initial development is done on performance improvements considered too risky for mainline PostgreSQL. And if you need a more technical reason, you can use the free libpq and psql to connect even to Bizgres MPP ;) > --------------------------------------------------------------------------- > > Luke Lonergan wrote: > > Bruce, > > > > > -----Original Message----- > > > From: pgsql-hackers-owner@postgresql.org > > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian > > > Sent: Tuesday, October 24, 2006 5:16 PM > > > To: Hannu Krosing > > > Cc: PostgreSQL-documentation; PostgreSQL-development > > > Subject: Re: [HACKERS] Replication documentation addition > > > > > > > > > OK, I have updated the URL. Please let me know how you like it. > > > > There's a typo on line 8, first paragraph: > > > > "perhaps with only one server allowing write rwork together at the same > > time." > > > > Also, consider this wording of the last description: > > > > "Single-Query Clustering..." > > > > Replaced by: > > > > "Shared Nothing Clustering > > ----------------------- > > > > This allows multiple servers with separate disks to work together on > > each query. > > In shared nothing clusters, the work of answering each query is > > distributed among > > the servers to increase the performance through parallelism. These > > systems will > > typically feature high availability by using other forms of replication > > internally. 
> > > > While there are no open source options for this type of clustering, > > there are several > > commercial products available that implement this approach, making > > PostgreSQL achieve > > very high performance for multi-Terabyte business intelligence > > databases." > > > > - Luke > -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hi, Bruce Momjian wrote: > I have updated the text. Please let me know what else I should change. > I am unsure if I should be mentioning commercial PostgreSQL products in > our documentation. I support your POV and vote for not including any pointers to commercial extensions in the official documentation. If at all, they should go to 'external-projects.sgml', where PostGIS, PgAdmin and other projects are mentioned. I can't really get excited about the exclusion of the term 'replication', because it's what most people are looking for. It's a well known term. Sorry if it sounded that way, but I didn't mean to avoid that term. The newly created terms 'Query Broadcast Load Balancing' or, even worse, 'Multi-Master Load Balancing' are more confusing than helpful, because these terms do not exist. (See the googlefight in [1]) Can we name the chapter "Fail-over, Load-Balancing and Replication Options"? That would fit everything and contain the necessary buzz words. Also, I'm still missing Multi- vs Single-Master, which are also commonly used terms. IMHO, it does not make sense to speak of synchronous replication for a 'Shared Disk Fail Over'. It's not replication, because there's no replica. The Data Partitioning paragraph should probably mention its close relation to data partitioning across table spaces (and make the differences clear). What you call 'Query Broadcast Load Balancing' is also a form of multi-master replication, so naming only the latter 'Multi-Master Load Balancing' is misleading. I'd propose to add a subsection 'Synchronous, Multi-Master Replication' and explain the different possibilities on how to do that: * Query-Based * with 2PC * Distributed SHMEM * (perhaps mention the optimized Postgres-R algorithm ;-) What you called 'Single-Query Clustering' is probably better known as 'Parallel Query Execution'. It can be combined with all types of replication (every combination of async / sync and Single- / Multi-Master). 
It's maybe load balancing, but it depends on some form of replication to distribute the data first. I liked Chris Browns documentation in [2] which was clearer regarding replication (which can be used to do fail-over, load-balancing, data-partitioning or parallel query execution). I'd like to keep all those things a little more separate to get them clear. Regards Markus [1]: Googlefight: "Multi-Master Load Balancing" vs "Multi-Master Replication": http://tinyurl.com/y3k76r [2]: Chris Browns proposal for a replication documentation: http://archives.postgresql.org/pgsql-patches/2006-08/msg00026.php
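[Editorial illustration] Two of the options listed above for synchronous multi-master replication, query-based broadcast and 2PC, can be combined in a small sketch: the same write is sent to every master, and it is committed only if all of them vote yes in a prepare phase. This is a toy in-memory model under stated assumptions; the `Node` class and `broadcast_write` function are hypothetical, and nothing here reflects how Postgres-R, Slony-II or any existing project is actually implemented.

```python
class Node:
    """Hypothetical in-memory stand-in for one master server."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        # Phase 1: stage the change and vote on whether it can be committed.
        if not self.healthy:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the staged change visible.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def rollback(self):
        # Phase 2 (someone voted no): discard the staged change.
        self.pending = None


def broadcast_write(nodes, key, value):
    """Send the same write to every node; commit only if all nodes vote yes."""
    votes = [node.prepare(key, value) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.rollback()
    return False
```

With two healthy nodes, `broadcast_write([a, b], "k", 1)` leaves both holding the same value; if one node is marked unhealthy, the next write is rolled back everywhere and the copies stay identical. The sketch ignores coordinator failure and concurrent conflicting writes, which are the hard parts in practice.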
> I don't think the PostgreSQL documentation should be > mentioning commercial solutions. I think maybe the PostgreSQL documentation should be careful about trying to list a "complete list" of commercial *or* free solutions. Instead linking to something on the main website or on techdocs that can more easily be updated. //Magnus
Bruce Momjian wrote: > OK, does that mean we mention EnterpriseDB in the section about Oracle > functions? Why not mention MS SQL if they have a better solution? I > just don't see where that line can clearly be drawn on what to include. > Do we mention Netiza, which is loosely based on PostgreSQL? It just > seems very arbitrary to include commercial software. If someone wants > to put in on a wiki, I think that would be fine because that doesn't > seems as official. I agree that the commercial offerings shouldn't be named directly in the docs, but it should be mentioned that some commercial options are available and a starting point to find more information. If potential new users look through the docs and it says no options available for what they want or consider they will need in the future then they go elsewhere, if they know that some options are available then they will look further if they want that feature. something like "There are currently no open source solutions available for this option but there are some commercial offerings. More details of some available solutions can be found at postgresql.org/support/...." -- Shane Ambler pgSQL@007Marketing.com Get Sheeky @ http://Sheeky.Biz
On Wed, Oct 25, 2006 at 08:22:25PM +0930, Shane Ambler wrote: > Bruce Momjian wrote: > > >OK, does that mean we mention EnterpriseDB in the section about Oracle > >functions? Why not mention MS SQL if they have a better solution? I > >just don't see where that line can clearly be drawn on what to include. > >Do we mention Netiza, which is loosely based on PostgreSQL? It just > >seems very arbitrary to include commercial software. If someone wants > >to put in on a wiki, I think that would be fine because that doesn't > >seems as official. > > I agree that the commercial offerings shouldn't be named directly in the > docs, but it should be mentioned that some commercial options are > available and a starting point to find more information. > > If potential new users look through the docs and it says no options > available for what they want or consider they will need in the future > then they go elsewhere, if they know that some options are available > then they will look further if they want that feature. > > something like > "There are currently no open source solutions available for this option > but there are some commercial offerings. More details of some available > solutions can be found at postgresql.org/support/...." I think this is probably the best compromise. Keep in mind that many people who are looking at us will also be looking at MySQL, which is itself a commercial offering. It's good to let folks know that with PostgreSQL, they have more control over how much money they spend for commercial add-ons and support. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote: > I can't really get excited about the exclusion of the term > 'replication', because it's what most people are looking for. It's a > well known term. Sorry if it sounded that way, but I've not meant to > avoid that term. <snip> > IMHO, it does not make sense to speak of a synchronous replication for a > 'Shared Disk Fail Over'. It's not replication, because there's no replica. Those two statements are at odds with each other, at least based on everyone I've ever talked to in a commercial setting. People will use terms like 'replication', 'HA' or 'clustering' fairly interchangeably. Usually what these folks want is some kind of high-availability solution. A few are more concerned with scalability. Sometimes it's a combination of both. That's why I think it's good for the chapter to deal with both aspects of this. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
> > I am not inclined to add commercial offerings. If people wanted > commercial database offerings, they can get them from companies that > advertize. People are coming to PostgreSQL for open source solutions, > and I think mentioning commercial ones doesn't make sense. > > If we are to add them, I need to hear that from people who haven't > worked in PostgreSQL commerical replication companies. > You did, Josh Berkus. Secondly, as many people have stated in the past not one replication suits everyone's needs and as PostgreSQL has many replication solutions, it only makes sense to list the more prominent ones, commercial or not. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
>> A big part of the value of Postgresql is the applications and extensions >> that support it. Hiding the existence of some subset of those just >> because of the way they're licensed is both underselling postgresql >> and doing something of a disservice to the user of the document. > > OK, does that mean we mention EnterpriseDB in the section about Oracle > functions? Way to compare apples to houses there, Bruce. We are talking about *PostgreSQL* replication solutions, not *Oracle* compatibility functions. However, *if* we had an Oracle compatibility section, I would say, "Yes, it does make sense to list EnterpriseDB as a Proprietary Commercial solution to migrating from Oracle." > Why not mention MS SQL if they have a better solution? Because we aren't talking about MS SQL, we are talking about PostgreSQL. > I > just don't see where that line can clearly be drawn on what to include. > Do we mention Netiza, which is loosely based on PostgreSQL? It just > seems very arbitrary to include commercial software. It is no more arbitrary than including *any* information on PostgreSQL replication solutions, because PostgreSQL doesn't have any. PostgreSQL doesn't do replication, except for PITR (and that is pushing it as a replication solution). Now.. there are *projects* that enable PostgreSQL to do replication. Some of them are Open Source, some of them are commercial products. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Hi, Cesar, Cesar Suga wrote: > If people (who read the documentation) professionally work with > PostgreSQL, they may already have been briefed by those commercial > offerings in some way. > > I think only the source and its tightly coupled (read: can compile along > with, free as PostgreSQL) components should be packaged into the tarball. > > However, I find Bruce's unofficial wiki idea a good one for comparisons. My suggestion is that the docs should mention only the mere existence of important third-party packages and projects in those places where it talks about the deficits that are supposedly fixed by those. E.g. "There are some third-party packages and projects that aim to provide multi-master replication, you can search for more information at http://[unofficial wiki page url] or your favourite search engine." This way, the docs stay neutral, but point the user to possible solutions to his problem. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Cesar Suga wrote: > Hi, > > I also wrote Bruce about that. > > It happens that, if you 'freely advertise' commercial solutions (rather > than they doing so by other vehicles) you will always happen to be an > 'updater' to the docs if they change their product lines, if they change > their business model, if and if. That is no different than the open source offerings. We have had several open source offerings that have died over the years. Replicator, for example, has always been Replicator and has been around longer than any of the current replication solutions. > > If you cite a commercial solution, as a fair game you should cite *all* > of them. No. That doesn't make any sense either. I assume we aren't going to list all PostgreSQL OSS replication solutions (there are at least a dozen or more). You list the ones that are stable in their existence (commercial or not). > If one enterprise has the right to be listed in the > documentation, all of them might, as you will never be favouring one of > them. You are looking at this the wrong way. This isn't about *any* enterprise. It is about a PostgreSQL Solution. There happen to be two or three known working open source solutions, and two or three known working commercial solutions. > > That's the main motivation to write this. Moreover, if there are also > commercial solutions for high-end installs and they are cited as > providers to those solutions, it (to a point) disencourages those of > gathering themselves and writing open source extensions to PostgreSQL. > No, it doesn't, because there is always the "It wants to be free!" crowd. > If people (who read the documentation) professionally work with > PostgreSQL, they may already have been briefed by those commercial > offerings in some way. Maybe, maybe not. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. 
=== Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
I would think that companies that sell closed-source solutions for PostgreSQL would be modest enough not to push their own agenda for the documentation. I think they should just sit back and hope others suggest it. [ Josh Berkus recently left Green Plum for Sun. ] --------------------------------------------------------------------------- Joshua D. Drake wrote: > > >> A big part of the value of Postgresql is the applications and extensions > >> that support it. Hiding the existence of some subset of those just > >> because of the way they're licensed is both underselling postgresql > >> and doing something of a disservice to the user of the document. > > > > OK, does that mean we mention EnterpriseDB in the section about Oracle > > functions? > > Way to compare apples to houses there, Bruce. We are talking about > *PostgreSQL* replication solutions, not *Oracle* compatibility > functions. However, *if* we had an Oracle compatibility section, I would > say, "Yes, it does make sense to list EnterpriseDB as a Proprietary > Commercial solution to migrating from Oracle." > > > Why not mention MS SQL if they have a better solution? > > Because we aren't talking about MS SQL, we are talking about PostgreSQL. > > > I > > just don't see where that line can clearly be drawn on what to include. > > Do we mention Netiza, which is loosely based on PostgreSQL? It just > > seems very arbitrary to include commercial software. > > It is no more arbitrary than including *any* information on PostgreSQL > replication solutions, because PostgreSQL doesn't have any. > > PostgreSQL doesn't do replication, except for PITR (and that is pushing > it as a replication solution). > > Now.. there are *projects* that enable PostgreSQL to do replication. > Some of them are Open Source, some of them are commercial products. > > Sincerely, > > Joshua D. Drake -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I have added this text: Commercial Solutions -------------------- Because PostgreSQL is open source and easily extended, a number of companies have taken PostgreSQL and created commercial closed-source solutions with unique failover, replication, and load balancing capabilities. --------------------------------------------------------------------------- Hannu Krosing wrote: > On a fine day, Tue, 2006-10-24 at 22:57, Bruce Momjian wrote: > > I don't think the PostgreSQL documentation should be mentioning > > commercial solutions. > > IMNSHO, having commercial solutions based on postgresql which extend > postgres in directions not (yet?) done by core postgres is nothing to be > ashamed of. > > And we should at least mention the OSS version of Bizgres as a place > where quite a lot of initial development is done on performance > improvements considered too risky for mainline postgresql. > > And if you need a more technical reason, you can use free libpq and psql > to connect to even Bizgres MPP ;) > > > > --------------------------------------------------------------------------- > > > > Luke Lonergan wrote: > > > Bruce, > > > > > > > -----Original Message----- > > > > From: pgsql-hackers-owner@postgresql.org > > > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian > > > > Sent: Tuesday, October 24, 2006 5:16 PM > > > > To: Hannu Krosing > > > > Cc: PostgreSQL-documentation; PostgreSQL-development > > > > Subject: Re: [HACKERS] Replication documentation addition > > > > > > > > > > > > OK, I have updated the URL. Please let me know how you like it. > > > > > > There's a typo on line 8, first paragraph: > > > > > > "perhaps with only one server allowing write rwork together at the same > > > time." > > > > > > Also, consider this wording of the last description: > > > > > > "Single-Query Clustering..." 
> > > > > > Replaced by: > > > > > > "Shared Nothing Clustering > > > ----------------------- > > > > > > This allows multiple servers with separate disks to work together on > > > each query. > > > In shared nothing clusters, the work of answering each query is > > > distributed among > > > the servers to increase the performance through parallelism. These > > > systems will > > > typically feature high availability by using other forms of replication > > > internally. > > > > > > While there are no open source options for this type of clustering, > > > there are several > > > commercial products available that implement this approach, making > > > PostgreSQL achieve > > > very high performance for multi-Terabyte business intelligence > > > databases." > > > > > > - Luke > > > -- > ---------------- > Hannu Krosing > Database Architect > Skype Technologies OÜ > Akadeemia tee 21 F, Tallinn, 12618, Estonia > > Skype me: callto:hkrosing > Get Skype for free: http://www.skype.com -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
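[Editorial illustration] The "Shared Nothing Clustering" wording quoted above says that the work of answering each query is distributed among servers with separate disks. A minimal sketch of that idea follows, assuming a toy in-memory model: the names `Node`, `distribute` and `scatter_gather_sum` are hypothetical and do not correspond to any PostgreSQL or third-party API, and round-robin placement is just one possible way to spread the rows.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical server that stores only its own slice of the rows."""
    name: str
    rows: list = field(default_factory=list)

    def partial_sum(self, min_amount):
        # Each node scans only its local rows; nothing is shared between nodes.
        return sum(r for r in self.rows if r >= min_amount)


def distribute(rows, nodes):
    """Spread rows across nodes round-robin (an assumed placement rule)."""
    for i, row in enumerate(rows):
        nodes[i % len(nodes)].rows.append(row)


def scatter_gather_sum(nodes, min_amount):
    """Run the same sub-query on every node in parallel, then combine results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: n.partial_sum(min_amount), nodes))
    return sum(partials)


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
amounts = [5, 12, 7, 30, 18, 2, 25]
distribute(amounts, nodes)
total = scatter_gather_sum(nodes, min_amount=10)
print(total)
```

The coordinator only adds up the partial results, so the scan itself is what gets parallelised. The high-availability aspect mentioned in the quoted text is deliberately left out of this sketch.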
Bruce Momjian wrote: > I would think that companies that sell closed-source solutions for > PostgreSQL would be modest enough not to push their own agenda for the > documentation. I think they should just sit back and hope others > suggest it. > > [ Josh Berkus recently left Green Plum for Sun. ] Bruce, you are making an idiot of yourself. With this statement you have implied that Josh Berkus, a core member, somehow has his own agenda that is not in the interests of the PostgreSQL community. Further, you are suggesting that I, as a member of Command Prompt, have an agenda that is not in the interests of the PostgreSQL community. It was rude, uncalled for, inaccurate, and frankly disgusting. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> > I also wrote Bruce about that. > > > > It happens that, if you 'freely advertise' commercial solutions > > (rather than they doing so by other vehicles) you will > always happen > > to be an 'updater' to the docs if they change their product > lines, if > > they change their business model, if and if. > > That is no different than the open source offerings. We have > had several open source offerings that have died over the > years. Replicator, for example has always been Replicator and > has been around longer than any of the current replication solutions. I think this is a good reason not to list *any* of the products by name in the documentation, but instead refer to a page on say techdocs that can be more easily updated. And that can contain both free and non-free projects, under clear headlines showing the difference. The documentation is about PostgreSQL, not about third-party products, be they free or commercial. Our *website*, however, should give guidance on which specific products we (as a community) know are stable and usable along with PostgreSQL (as we do today under downloads, but could very well do based on specific uses like replication as well) //Magnus
>>> they change their business model, if and if. >> That is no different than the open source offerings. We have >> had several open source offerings that have died over the >> years. Replicator, for example has always been Replicator and >> has been around longer than any of the current replication solutions. > > I think this is a good reason not to list *any* of the products by name > in the documentation, but instead refer to a page on say techdocs that > can be more easily updated. And that can contain both free and non-free > projects, under clear headlines showing the difference. > > The documentation is about PostgreSQL, not about third-party products, > be they free or commercial. Our *website*, however, should give guidance > on which specific products we (as a community) know are stable and > usable along with PostgreSQL (as we do today under downloads, but could > very well do based on specific uses like replication as well) > I can agree with this :) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
"Magnus Hagander" <mha@sollentuna.net> writes: > I think this is a good reason not to list *any* of the products by name > in the documentation, but instead refer to a page on say techdocs that > can be more easily updated. I agree with that. If we have statements about other projects in our docs, we will have a problem with not being able to update those statements in a timely fashion when the other projects change. regards, tom lane
Tom Lane wrote: > "Magnus Hagander" <mha@sollentuna.net> writes: >> I think this is a good reason not to list *any* of the products by name >> in the documentation, but instead refer to a page on say techdocs that >> can be more easily updated. > > I agree with that. If we have statements about other projects in our > docs, we will have a problem with not being able to update those > statements in a timely fashion when the other projects change. This being said, I would say that the replication documentation needs to be on Techdocs or some place similar and that we should have a link in the PostgreSQL docs that points to the techdocs article and possibly: http://www.postgresql.org/download/ . Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Markus Schiltknecht wrote: > Hi, > > Bruce Momjian wrote: > > I have updated the text. Please let me know what else I should change. > > I am unsure if I should be mentioning commercial PostgreSQL products in > > our documentation. > > I support your POV and vote for not including any pointers to commercial > extensions in the official documentation. If at all, they should go to > 'external-projects.sgml', where PostGIS, PgAdmin and other projects are > mentioned. > > I can't really get excited about the exclusion of the term > 'replication', because it's what most people are looking for. It's a > well known term. Sorry if it sounded that way, but I've not meant to > avoid that term. OK, I have re-added the term "replication" as appropriate. > The newly created terms 'Query Broadcast Load Balancing' or even worse > 'Multi-Master Load Balancing' are more confusing than helpful, because > these terms do not exist. (See the googlefight in [1]) OK, renamed. > Can we name the chapter "Fail-over, Load-Balancing and Replication > Options"? That would fit everything and contain the necessary buzz words. Yes. Done, "cluster" added too. > Also, I'm still missing Multi- vs Single-Master, which are also commonly > used terms. Yea, not sure how to get those in because it somewhat confuses the "purpose" of the solution. > IMHO, it does not make sense to speak of a synchronous replication for a > 'Shared Disk Fail Over'. It's not replication, because there's no replica. Agreed. Modified. > The Data Partitioning paragraph should probably mention its close > relation with data partitioning across table spaces (and make the > differences clear). Uh, so you I/O load with table spaces. Uh, that seems too far a reach to mention here. > What you call 'Query Broadcast Load Balancing' is also a multi-master > replication, thus naming only the latter 'Multi-Master Load Balancing' > is misleading. Renamed. 
> I'd propose to add a subsection 'Synchronous, Multi-Master Replication' > and explain the different possibilities on how to do that: > > * Query-Based > * with 2PC > * Distributed SHMEM > * (perhaps mention the optimized Postgres-R algorithm ;-) > > What you called 'Single-Query Clustering' is probably better known as > 'Parallel Query Execution'. It can be combined with all types of > replication (every combination of async / sync and Single- / > Multi-Master). It's maybe load balancing, but it depends on some form of > replication to distribute the data first. Good term. Added. > I liked Chris Browns documentation in [2] which was clearer regarding > replication (which can be used to do fail-over, load-balancing, > data-partitioning or parallel query execution). I'd like to keep all > those things a little more separate to get them clear. Please let me know how you like the new version at the ftp URL. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Jim C. Nasby wrote: > On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote: > > I can't really get excited about the exclusion of the term > > 'replication', because it's what most people are looking for. It's a > > well known term. Sorry if it sounded that way, but I've not meant to > > avoid that term. > <snip> > > IMHO, it does not make sense to speak of a synchronous replication for a > > 'Shared Disk Fail Over'. It's not replication, because there's no replica. > > Those two statements are at odds with each other, at least based on > everyone I've ever talked to in a commercial setting. People will use > terms like 'replication', 'HA' or 'clustering' fairly interchangeably. > Usually what these folks want is some kind of high-availability > solution. A few are more concerned with scalability. Sometimes it's a > combination of both. That's why I think it's good for the chapter to > deal with both aspects of this. OK, I did break it out somewhat for clarity. Let me know how it looks now. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hi, Jim C. Nasby wrote: > Those two statements are at odds with each other, at least based on > everyone I've ever talked to in a commercial setting. People will use > terms like 'replication', 'HA' or 'clustering' fairly interchangeably. > Usually what these folks want is some kind of high-availability > solution. A few are more concerned with scalability. Sometimes it's a > combination of both. That's why I think it's good for the chapter to > deal with both aspects of this. Yabut... at least the PostgreSQL manual should use the terms correctly. And while I do perfectly agree that it's a fail-over solution and it should be mentioned in that section, I'm arguing that it's not replication. Regards Markus
Tom Lane wrote: > "Magnus Hagander" <mha@sollentuna.net> writes: > > I think this is a good reason not to list *any* of the products by name > > in the documentation, but instead refer to a page on say techdocs that > > can be more easily updated. > > I agree with that. If we have statements about other projects in our > docs, we will have a problem with not being able to update those > statements in a timely fashion when the other projects change. I mention only Slony and pgpool as examples of replication types. They seem to have risen to high enough visibility to do that. I have not mentioned any other solutions. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Tom Lane wrote: >> "Magnus Hagander" <mha@sollentuna.net> writes: >>> I think this is a good reason not to list *any* of the products by name >>> in the documentation, but instead refer to a page on say techdocs that >>> can be more easily updated. >> I agree with that. If we have statements about other projects in our >> docs, we will have a problem with not being able to update those >> statements in a timely fashion when the other projects change. > > I mention only Slony and pgpool as examples of replication types. They > seem to have risen to high enough visibility to do that. I have not > mentioned any other solutions. What about Slony-II or pgpool2? Which are fundamentally different from their v1 counterparts (o.k. slony-ii isn't out yet but still). I +1 that we move to have all of the replication documentation pushed to techdocs or other facility and just have a link from the docs. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake wrote: > Bruce Momjian wrote: > > Tom Lane wrote: > >> "Magnus Hagander" <mha@sollentuna.net> writes: > >>> I think this is a good reason not to list *any* of the products by name > >>> in the documentation, but instead refer to a page on say techdocs that > >>> can be more easily updated. > >> I agree with that. If we have statements about other projects in our > >> docs, we will have a problem with not being able to update those > >> statements in a timely fashion when the other projects change. > > > > I mention only Slony and pgpool as examples of replication types. They > > seem to have risen to high enough visibility to do that. I have not > > mentioned any other solutions. > > What about Slony-II or pgpool2? Which are fundamentally different from > their v1 counterparts (o.k. slony-ii isn't out yet but still). > > I +1 that we move to have all of the replication documentation pushed to > techdocs or other facility and just have a link from the docs. What I did was to mention Slony and pgpool as "examples", so people realize there are many other solutions. It would be good to have a companion web site that could list them all, both open source and commercial. That is going to take a lot more work, but I think would have great value, especially since our documentation will clearly outline the terms. What you don't want to do is to throw up a list and have people try to figure out what solutions they cover. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hi Hannu, everyone, I apologize for not having read the document in question - will do shortly. My comments are brought about by the dialogue I read on list this morning... > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > > Data Partitioning > > ----------------- > > > > Data partitioning splits the database into data sets. To achieve > > replication, each data set can only be modified by one server. For > > example, data can be partitioned by offices, e.g. London and Paris. > > While London and Paris servers have all data records, only London can > > modify London records, and Paris can only modify Paris records. Such > > partitioning is usually accomplished in application code, though rules > > and triggers can help enforce partitioning and keep the read-only data > > sets current. Slony can also be used in such a setup. While Slony > > replicates only entire tables, London and Paris can be placed in > > separate tables, and inheritance can be used to access from both tables > > using a single table name. > > Maybe another use of partitioning should also be mentioned. That is , > when partitioning is used to overcome limitations of single servers > (especially IO and memory, but also CPU), and only a subset of data is > stored and processed on each server. > > I think the "official" term for this kind of "replication" is > > Shared-Nothing Clustering. "Data partitioning" has two fundamental flavors, "horizontal" and "vertical", quite a handful of implementations, and even more motivations behind why one uses either strategy and whatever implementation. The same is true for "clustering" - a few fundamental strategies, with a larger number of implementations and yet more motivations. Replication, meanwhile, is yet another beast altogether, sharing the same fundamentals of multiple flavors, implementations and motivations. 
I strongly urge keeping any documentation on these (and related) topics strictly distinct and separate. In my view, one should define the terms first, separately, distinctly, and as succinctly as possible, and, following this, a dialogue on how these may be combined can be entertained. The definitions of each should be both complete and academic in flavor and may include implementation and motivational information, but never "muddy the water" by mixing with other concepts - not yet, not until after all the fundamentals have been introduced. I don't know much about what PostgreSql has been doing in these areas of late - nothing, I gather from someone's post this morning - but I'll try to help out as I can with a paragraph or two - whatever you want, whatever's welcome - as "I was there" when Randy Eash created the first commercial RDBMS replicator - for Ingres - and since I created the first commercial RDBMS front-end failover technology, also for Ingres, so I have a pretty good handle on all the issues. Also, I liked what Markus Schiltknecht wrote, but will have to read the original before I can comment on his specific points. >> I am not inclined to add commercial offerings. If people wanted >> commercial database offerings, they can get them from companies that >> advertize. People are coming to PostgreSQL for open source solutions, >> and I think mentioning commercial ones doesn't make sense. >> >> If we are to add them, I need to hear that from people who haven't >> worked in PostgreSQL commerical replication companies. > > I'm not coming to PostgreSQL for open source solutions. I'm coming > to PostgreSQL for _good_ solutions. > > I want to see what solutions might be available for a problem I have. > I certainly want to know whether they're freely available, commercial > or some flavour of open source, but I'd like to know about all of them. > > A big part of the value of Postgresql is the applications and extensions > that support it. 
Hiding the existence of some subset of those just > because of the way they're licensed is both underselling PostgreSQL > and doing something of a disservice to the user of the document. > If potential new users look through the docs and it says no options > available for what they want or consider they will need in the future > then they go elsewhere, if they know that some options are available > then they will look further if they want that feature. I agree that people look through the materials on the web site, documentation especially, and make choices based upon what they see. Many of us don't have time to spend a day searching the web for things we don't even know exist. By including more information, more users will be attracted to PostgreSQL, whether it be in the documentation or web site. I have been SURE that certain things must exist in the PG world, but haven't known about them with certainty due to time constraints, but would gladly point our customers at Postgres solutions if only I knew about them. Count this paragraph as praise for doing _something_more_ to help get more information to (prospective) users. Consider someone like me; my company supports five RDBMSes, one of them being Postgres. We are probably not unique in that we've written an SQL dialect translator so we could write our own code in one code line to run anywhere, against any RDBMS (it can learn new dialects) - or perhaps others keep multiple code lines containing variant dialects. Either way, we "don't care" whether our customer has Oracle, or PostgreSQL, so long as they buy our stuff. But when our customers - or prospects - come to us with a given scenario, the more we know about Postgres - and its community - the more likely we can steer them to a PG solution, which we would prefer anyway, for lots of reasons, historical, personal, and technical - not to mention cost. The trouble is, Oracle, for example, has already told them (sold them?) 
on whatever, and we need a rebuttal ready at hand or they'll go with Oracle. We just don't have the time to fight that battle, nor do we wish to risk the sale when we can work with Oracle just fine. In sum, I agree with Tom Lane and the others who chimed in with "keep the docs clean, use the web site for mentioning other projects/products." And again I applaud this new effort. Regards, Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote: > Can we name the chapter "Fail-over, Load-Balancing and Replication > Options"? That would fit everything and contain the necessary buzz words. ... > IMHO, it does not make sense to speak of a synchronous replication for a > 'Shared Disk Fail Over'. It's not replication, because there's no replica. As you point out, there is no replica of the data, but there is some protection against machine failure, which puts it firmly in the "Fail-over" part above. Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
David Fetter wrote: > On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote: > > > Can we name the chapter "Fail-over, Load-Balancing and Replication > > Options"? That would fit everything and contain the necessary buzz words. > ... > > > IMHO, it does not make sense to speak of a synchronous replication for a > > 'Shared Disk Fail Over'. It's not replication, because there's no replica. > > As you point out, there is no replica of the data, but there is some > protection against machine failure, which puts it firmly in the > "Fail-over" part above. Right, but his point was not to call it synchronous. I have fixed that in the current version. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Totally agree. The docs will tend to outlive whatever projects or websites they mention. Best to not bake that into stone. -Casey On Oct 25, 2006, at 3:36 AM, Magnus Hagander wrote: >> I don't think the PostgreSQL documentation should be >> mentioning commercial solutions. > > I think maybe the PostgreSQL documentation should be careful about > trying to list a "complete list" of commercial *or* free solutions. > Instead linking to something on the main website or on techdocs > that can > more easily be updated. > > //Magnus
> Here is a new replication documentation section I want to add for 8.2: > > ftp://momjian.us/pub/postgresql/mypatches/replication > ...Read the document, as promised... First paragraph, "(fail over)" is inconsistent with title, "failover", as are other spots throughout the document. The whole document should be consistent and I vote for "failover" and not "fail over." Fourth paragraph, "This "sync problem" is the fundamental difficulty for servers working together"; "Sync problem" hasn't been defined. Actually, you're talking about the consistency attribute of the "ACID" properties of all competent databases: Atomicity, Consistency, Isolation, and Durability. At least define the term you are using - probably most easily done in the preceding paragraph. The fifth paragraph needs a lot more help, I think. How about this alternative: So-called "two-phase commit" was developed as a strategy in which two or more databases are updated simultaneously and none of the data is committed until all are committed. This guarantees consistency between the databases with all propagation delay being absorbed by the writer at write time. There are times when this propagation delay is large, so sometimes alternatives are worked out which we'll call here "asynchronous updates," however, in these cases, there is always a window of time in which some transaction can be lost should a failure occur. For this reason, asynchronous updates are only used when the possibility of such losses is acceptable. Paragraphs six through to "shared disk failover" seem very awkward to me. I don't like them at all. "Shared disk failover" has nothing to do with "the sync problem" as it's not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue. Further, it also has nothing to do with disk arrays, though it is often used with RAID to help avoid disk based corruption problems. 
The point about Warm Standby needs to include a warning about WAL that it MUST be sensitive to the semantics of the database design or else it's fatally flawed. I'm talking about "referential integrity". That is to say, it's inappropriate to capture updates on a table by table basis, as some such systems do, (I have no idea what's done by anyone in the PG world on this right now) because updates to one table (esp. inserts) very often go hand in glove with updates in other tables and to get one without the other can corrupt a database. The description of "Continuously running replication server" should include the critical caveat - repeated if you think it's already said elsewhere - that it is ONLY suitable for applications in which a loss of (missing) update data doesn't matter. For example, an airline reservation system would be an inappropriate application for such a "solution" because what seats are available cannot be guaranteed to be correct. Regarding data partitioning, I strongly disagree with the opening sentence in that it doesn't split a database into sets, it splits tables into sets. Data partitioning is often done within a single database on a single server and therefore, as a concept, has nothing whatsoever to do with different servers. Similarly, the second paragraph of this section is problematic. Please define your term first, then talk about some implementations - this is muddying the water. Further, there are both vertical and horizontal partitioning - you mention neither - and each has its own distinct uses. If partitioning is mentioned, it should be more complete. Next, Query Broadcast Load Balancing... also needs a lot of work. 
First, it's foremost in my memory that sending read queries everywhere and returning the first result set back is a key way to improve application performance at the cost of additional load on other systems - I guess that's not at all what the document is after here, but it's a worthy part of a dialogue on broadcasting queries. In other words, this has more parts to it than just what the document now entertains. Secondly, the document doesn't address _at_all_ whether this is a two-phase-commit environment or not. If not, how are updates managed? If each server operates independently and one of them fails, what do you do then? How do you know _any_ server got an insert/update? ... Each server _can't_ operate independently unless the application does its own insert/update commits to every one of them - and that can't be fast, nor does it load balance, though it may contribute to superior uptime performance by the application. Next up; I'm not aware of any current products or projects that provide parallel query execution, though Informix might - I can ask a colleague or two. Either way, it's probably best to simply define the term (perhaps in a little more detail), and not mention solutions - they change with time anyway. While I've never used Oracle's clustering tools, I've read up on them and have customers who use them, and I think this description of Oracle clustering is a misreading of what the Oracle system actually does. A check with a true Oracle clustering expert is in order here. Hope this helps. If asked, I'm willing to (re)write some of the bits discussed above. Regards, Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
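To make the two-phase commit idea from the message above concrete, here is a minimal sketch in Python. It is only an in-memory illustration, not how PostgreSQL or any replication product implements it: the `Participant` class, its `fail_on_prepare` flag, and the `two_phase_commit` function are all hypothetical names invented for this example. It shows the property described above, that nothing is committed anywhere until every server has agreed, and it ignores the hard parts (coordinator crashes between phases, network timeouts, durable logging).

```python
class Participant:
    """Toy stand-in for one database server (hypothetical, in-memory only)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a server that cannot prepare
        self.data = {}       # committed state
        self.pending = None  # change staged by prepare(), not yet visible

    def prepare(self, key, value):
        # Phase 1: stage the change and vote yes (True) or no (False).
        if self.fail_on_prepare:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2, success path: make the staged change visible.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def rollback(self):
        # Phase 2, failure path: discard the staged change, if any.
        self.pending = None


def two_phase_commit(participants, key, value):
    """Return True if every participant committed, False if all rolled back."""
    prepared = []
    for p in participants:
        if p.prepare(key, value):
            prepared.append(p)
        else:
            # One "no" vote aborts the whole transaction everywhere.
            for q in prepared:
                q.rollback()
            return False
    # Every participant voted yes, so the change is applied on all of them.
    for p in participants:
        p.commit()
    return True
```

Calling `two_phase_commit([Participant("a"), Participant("b")], "seat 12A", "booked")` applies the change on both toy servers; if either one is constructed with `fail_on_prepare=True`, neither server ends up with it. The writer waits for every server in both phases, which is the propagation delay "absorbed by the writer at write time" mentioned above.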
Alexey Klyukin wrote: > Hi, > > A typo: > ("a write to any server has to be _propogated_") > s/propogated/propagated Thanks, fixed. --------------------------------------------------------------------------- > > Bruce Momjian wrote: > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > > > Comments welcomed. > > > > > -- > Regards, > > Alexey Klyukin alexk(at)vollmond.org.ua > Simferopol, Crimea, Ukraine. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, > > > ftp://momjian.us/pub/postgresql/mypatches/replication I'm still not seeing anything in this patch that tells users where they can get replication solutions for PostgreSQL, either OSS or commercial. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Richard Troy wrote: > > > Here is a new replication documentation section I want to add for 8.2: > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > > > ...Read the document, as promised... > > First paragraph, "(fail over)" is inconsistent with title, "failover", as > are other spots throughout the document. The whole document should be > consistent and I vote for "failover" and not "fail over." OK. Fixed to "failover". > Fourth paragraph, "This "sync problem" is the fundamental difficulty for > servers working together"; "Sync problem" hasn't been defined. Actually, > you're talking about the consistency attribute of the "ACID" properties of > all competent databases: Atomicity, Consistency, Isolation, and Durability. > At least define the term you are using - probably most easily done in the > preceding paragraph. OK, "sync problem" term removed, and spelled out fully. > The fifth paragraph needs a lot more help, I think. How about this > alternative: > > So-called "two-phase commit" was developed as a strategy in which two or > more databases are updated simultaneously and none of the data is > committed until all are committed. This guarantees consistency between the > databases with all propagation delay being absorbed by the writer at write > time. There are times when this propagation delay is large, so sometimes > alternatives are worked out which we'll call here "asynchronous updates," > however, in these cases, there is always a window of time in which some > transaction can be lost should a failure occur. For this reason, > asynchronous updates are only used when the possibility of such losses is > acceptable. I have modified the paragraph to use some of your terms. > Paragraphs six through to "shared disk failover" seem very awkward to me. > I don't like them at all. > > "Shared disk failover" has nothing to do with "the sync problem" as it's > not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue. 
> Further, it also has nothing to do with disk arrays, though it is often > used with RAID to help avoid disk based corruption problems. Yes, please see updated version. I removed the sync problem term from there. > The point about Warm Standby needs to include a warning about WAL that it > MUST be sensitive to the semantics of the database design or else it's > fatally flawed. I'm talking about "referential integrity". That is to say, > it's inappropriate to capture updates on a table by table basis, as some > such systems do, (I have no idea what's done by anyone in the PG world on > this right now) because updates to one table (esp. inserts) very often > go hand in glove with updates in other tables and to get one without the > other can corrupt a database. We don't have that problem. We recover only full transactions. > The description of "Continuously running replication server" should > include the critical caveat - repeated if you think it's already said > elsewhere - that it is ONLY suitable for applications in which a loss of > (missing) update data doesn't matter. For example, an airline reservation > system would be an inappropriate application for such a "solution" because > what seats are available cannot be guaranteed to be correct. I have added a note about data loss for the Slony item. > Regarding data partitioning, I strongly disagree with the opening sentence > in that it doesn't split a database into sets, it splits tables into sets. OK, changed. > Data partitioning is often done within a single database on a single > server and therefore, as a concept, has nothing whatsoever to do with > different servers. Similarly, the second paragraph of this section is Uh, why would someone split things up like that on a single server? > problematic. Please define your term first, then talk about some > implementations - this is muddying the water. 
Further, there are both > vertical and horizontal partitioning - you mention neither - and each has > its own distinct uses. If partitioning is mentioned, it should be more > complete. Uh, what exactly needs to be defined? > Next, Query Broadcast Load Balancing... also needs a lot of work. First, > it's foremost in my memory that sending read queries everywhere and > returning the first result set back is a key way to improve application > performance at the cost of additional load on other systems - I guess > that's not at all what the document is after here, but it's a worthy part > of a dialogue on broadcasting queries. In other words, this has more parts > to it than just what the document now entertains. Secondly, the document Uh, do we want to go into that here? I guess I could. > doesn't address _at_all_ whether this is a two-phase-commit environment > or not. If not, how are updates managed? If each server operates > independently and one of them fails, what do you do then? How do you know > _any_ server got an insert/update? ... Each server _can't_ operate > independently unless the application does its own insert/update commits to > every one of them - and that can't be fast, nor does it load balance, > though it may contribute to superior uptime performance by the > application. I think having the application middle layer do the commits is how it works now. Can someone explain how pgpool works, or should we mention how two-phase commit has to be done here? pgpool2 has additional features. > Next up; I'm not aware of any current products or projects that provide > parallel query execution, though Informix might - I can ask a colleague or > two. Either way, it's probably best to simply define the term (perhaps in > a little more detail), and not mention solutions - they change with time > anyway. Actually, Bizgres MPP, based on PostgreSQL, does this, but mostly for read-only queries. 
> While I've never used Oracle's clustering tools, I've read up on them and > have customers who use them, and I think this description of Oracle > clustering is a mis-read on what the Oracle system actually does. A check > with a true Oracle clustering expert is in order here. OK, would someone please comment? > Hope this helps. If asked, I'm willing to (re)write some of the bits > discussed above. Yes, please review the URL and let me know what else to change. Thanks. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
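Since the exchange above asks what exactly needs defining for data partitioning, here is a minimal sketch in Python of the multi-server flavor described in the draft's London/Paris example: every server holds all rows, but each row can be modified only by the server that owns its office. The `OfficeServer` class and its methods are hypothetical names made up for this illustration; in practice the draft says this is done in application code, with rules, triggers, or Slony keeping the read-only copies current.

```python
class OfficeServer:
    """Toy server that owns one office's rows and holds read-only copies of the rest."""

    def __init__(self, office):
        self.office = office
        self.rows = {}  # (office, record_id) -> value; every server stores all rows

    def write(self, office, record_id, value, peers):
        # Only the owning server may modify a row; all others must refuse.
        if office != self.office:
            raise PermissionError(
                f"{self.office} server cannot modify {office} records"
            )
        self.rows[(office, record_id)] = value
        # Push the change out so the peers' read-only copies stay current.
        for peer in peers:
            peer.apply_replicated(office, record_id, value)

    def apply_replicated(self, office, record_id, value):
        # Accept a change that originated on the owning server.
        self.rows[(office, record_id)] = value


london = OfficeServer("London")
paris = OfficeServer("Paris")
london.write("London", 1, "invoice #1", peers=[paris])
```

After the last line both servers can read the London row, but `paris.write("London", 1, ...)` raises `PermissionError`. Because each row has exactly one writer, two servers never produce conflicting updates to the same row, which is why this scheme avoids the need for a cross-server commit protocol. It is horizontal partitioning in the sense raised above: rows of one logical table are split by a key, here the office.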
Josh Berkus wrote: > Bruce, > > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > I'm still not seeing anything in this patch that tells users where they can > get replication solutions for PostgreSQL, either OSS or commercial. It isn't designed for that. It is designed for people to understand what they want, and then they can look around for solutions. I think most agree we don't want a list of solutions in the documentation, though I have a few as examples. Also, some of the solutions don't require software, but just configuration or special hardware. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 10/25/06, Bruce Momjian <bruce@momjian.us> wrote: > Joshua D. Drake wrote: > > Bruce Momjian wrote: > > > Tom Lane wrote: > > >> "Magnus Hagander" <mha@sollentuna.net> writes: > > >>> I think this is a good reason not to list *any* of the products by name > > >>> in the documentation, but instead refer to a page on say techdocs that > > >>> can be more easily updated. > > >> I agree with that. If we have statements about other projects in our > > >> docs, we will have a problem with not being able to update those > > >> statements in a timely fashion when the other projects change. > > > > > > I mention only Slony and pgpool as examples of replication types. They > > > seem to have risen to high enough visibility to do that. I have not > > > mentioned any other solutions. > > > > What about Slony-II or pgpool2? Which are fundamentally different from > > their v1 counterparts (o.k. slony-ii isn't out yet but still). > > > > I +1 that we move to have all of the replication documentation pushed to > > techdocs or other facility and just have a link from the docs. > > What I did was to mention Slony and pgpool as "examples", so people > realize there are many other solutions. It would be good to have a > companion web site that could list them all, both open source and > commercial. That is going to take a lot more work, but I think would > have great value, especially since our documentation will clearly > outline the terms. What you don't want to do is to throw up a list and > have people try to figure out what solutions they cover. I'm in quite a unique situation right now, working with a few DBAs who have deep knowledge but no PostgreSQL background, so I have a good view of how PostgreSQL is perceived by people with fair knowledge of other databases. What I have noticed is a deep respect for community. If they ask about a replication solution, and I tell them about Slony, they ask if Slony is provided with the postgresql-contrib. Well... no, and it won't be. 
Then they look back, think a while and say something along the lines of: well, $SOME_OTHER_DATABASE was using external replication solutions so it is all right. But then, before I talked with them, they did some quick research on PostgreSQL and their perception was that there's no replication / replication is shady in PostgreSQL. It would be quite convenient to tell them: "No replication? Did you actually read the manual? <here goes URL>" Well, pointing them to the Slony page is a solution but of a lesser caliber (how should they know about Slony anyway? They are newbies). Pointing them at The Documentation is a Good Argument (and it may cause them to look for some other information, like SQL syntax or PostgreSQL-specific catalog views there, which is Good). Enough background. Bruce, I've read your documentation and I was left with a feeling that it's a bit too generic. It's almost as if it could be about just about any major database, not PostgreSQL specific. I feel that, when I'm reading PostgreSQL docs I would like to know how to set up multi-master replication with PostgreSQL, not an explanation of what multi-master replication is. It's not about the actual documentation content, but rather on accents distribution. Now it is something like: "These are the types of replication solutions possible, some of them can be done with PostgreSQL", I think it should be rather: "With PostgreSQL and some third-party tools you can achieve such and such replication solutions, oh and by the way, research is done on such and such replication method, but it's not production quality yet". And I try to think as my DBA-mates would do if they read the documentation, I'm not sure they would end up enlightened after reading the docs -- they would probably say: "hey, I knew that, it's well structured there, but I still don't know what I should use", or maybe "where can I read something about this slony thing anyway?". It may be my "closed thinking schema" though. 
What I feel is that such an outsider, after reading these docs, should end with "Aha! I should be using Slony for my purposes". Or pgpool, if it's what she needs. I believe Tom's remark that it does NOT belong in the PostgreSQL documentation is quite right (though I wish there IS some reference to external replication packages, mainly because over and over again I need to prove PostgreSQL CAN be replicated, and it's not uncommon). However I'm still unconvinced about TechDocs -- TechDocs are good but still they are a bit scattered and unorganised. I am a PostgreSQL enthusiast, but it took me a while to learn about them, and for newbies not biased towards PostgreSQL it may take even more time. If it is linked from within the documentation, random DBAs might read it, and I wish they do. Right now I am more and more biased towards an additional "documentation book" for PostgreSQL, something like "DBA guide" or handbook. In format similar to the PostgreSQL documentation, but inside oriented around configuring other tools around and together with PostgreSQL. I shall send here some drafts within 10 days' time to seed a discussion. After all, PostgreSQL is too big for just one documentation book. [1] Regards, Dawid [1]: Then, later, a programmer's handbook? Deeper knowledge about fancy stuff with Python, Perl and PgSQL? ;-)
Bruce, > It isn't designed for that. It is designed for people to understand > what they want, and then they can look around for solutions. I think > most agree we don't want a list of solutions in the documentation, > though I have a few as examples. Do they? I've seen no discussion of the matter. I think we should have them. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote: > Bruce, > > > It isn't designed for that. It is designed for people to understand > > what they want, and then they can look around for solutions. I think > > most agree we don't want a list of solutions in the documentation, > > though I have a few as examples. > > Do they? I've seen no discussion of the matter. I think we should have > them. Most people didn't want a list because there is no way to keep it current in the docs, and a secondary web site was suggested for the list. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Dawid Kuroczko wrote: > Bruce, I've read your documentation and I was left with a feeling > that it's a bit too generic. It's almost as if it could be about just about > any major database, not PostgreSQL specific. I feel that, when I'm > reading PostgreSQL docs I would like to know how to set up multi-master > replication with PostgreSQL, not an explanation of what multi-master > replication is. It's not about the actual documentation content, but rather > on accents distribution. Now it is something like: "These are the types > of replication solutions possible, some of them can be done with PostgreSQL", > I think it should be rather: "With PostgreSQL and some third-party tools you > can achieve such and such replication solutions, oh and by the way, research > is done on such and such replication method, but it's not production quality > yet". > > And I try to think as my DBA-mates would do if they read the documentation, > I'm not sure they would end up enlightened after reading the docs -- they would > probably say: "hey, I knew that, it's well structured there, but I > still don't know > what I should use", or maybe "where can I read something about this slony > thing anyway?". Well, the idea is to have a web site that lists all the solutions that can be updated regularly, perhaps using the categories from the documentation. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, > Most people didn't want a list because there is no way to keep it > current in the docs, and a secondary web site was suggested for the > list. So, like www.postgresql.org/docs/techdocs/replication? That would work. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote: > Bruce, > > > Most people didn't want a list because there is no way to keep it > > current in the docs, and a secondary web site was suggested for the > > list. > > So, like www.postgresql.org/docs/techdocs/replication? That would work. Yes. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, Oct 25, 2006 at 04:42:17PM -0400, Bruce Momjian wrote: > Dawid Kuroczko wrote: > > Bruce, I've read your documentation and I was left with a feeling > > that it's a bit too generic. It's almost as if it could be about just about > > any major database, not PostgreSQL specific. I feel that, when I'm > > reading PostgreSQL docs I would like to know how to set up multi-master > > replication with PostgreSQL, not an explanation of what multi-master > > replication is. It's not about the actual documentation content, but rather > > on accents distribution. Now it is something like: "These are the types > > of replication solutions possible, some of them can be done with PostgreSQL", > > I think it should be rather: "With PostgreSQL and some third-party tools you > > can achieve such and such replication solutions, oh and by the way, research > > is done on such and such replication method, but it's not production quality > > yet". > > > > And I try to think as my DBA-mates would do if they read the documentation, > > I'm not sure they would end up enlightened after reading the docs -- they would > > probably say: "hey, I knew that, it's well structured there, but I > > still don't know > > what I should use", or maybe "where can I read something about this slony > > thing anyway?". > > Well, the idea is to have a web site that lists all the solutions that > can be updated regularly, perhaps using the categories from the > documentation. And the docs should point to that page, prominently (presumably that will happen after the page actually exists). Something else worth doing though is to have a paragraph explaining why there's no built-in replication. I don't have time to write something right now, but I can do it later tonight if no one beats me to it. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote: > Something else worth doing though is to have a paragraph explaining why > there's no built-in replication. I don't have time to write something > right now, but I can do it later tonight if no one beats me to it. I thought that was implied in the early paragraph about why there are many solutions. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Joshua D. Drake wrote: > Cesar Suga wrote: > >> Hi, >> >> I also wrote Bruce about that. >> >> It happens that, if you 'freely advertise' commercial solutions (rather >> than they doing so by other vehicles) you will always happen to be an >> 'updater' to the docs if they change their product lines, if they change >> their business model, if and if. >> > > That is no different than the open source offerings. We have had several > open source offerings that have died over the years. Replicator, for > example has always been Replicator and has been around longer than any > of the current replication solutions. > The documentation comes with the open source tarball. I would welcome it if the docs pointed to an unofficial wiki (maintained externally from authoritative PostgreSQL developers) or a website listing them and giving a brief description of each solution. postgresql.org already does this for events (commercial training!) and news. Point to postgresql.org/download/commercial as there *already* are brief descriptions, pricing and website links. >> If you cite a commercial solution, as a fair game you should cite *all* >> of them. >> > > No. That doesn't make any sense either. I assume we aren't going to list > all PostgreSQL OSS replication solutions (there are at least a dozen or > more). > > You list the ones that are stable in their existence (commercial or not). > And how would you determine it? Years of existence? Contribution to PostgreSQL's source code? It is not easy and wouldn't be fair. There are ones that certainly will be listed, and other doubtful ones (which would perhaps complain, that's why I said 'all' - if they are not stable, either they stay out of the market or fix their problems). >> If one enterprise has the right to be listed in the >> documentation, all of them might, as you will never be favouring one of >> them. >> > > You are looking at this the wrong way. This isn't about *any* > enterprise. It is about a PostgreSQL Solution. 
There happen to be two > or three known working open source solutions, and two or three known > working commercial solutions. > (see first three paragraphs) >> That's the main motivation to write this. Moreover, if there are also >> commercial solutions for high-end installs and they are cited as >> providers to those solutions, it (to a point) discourages those from >> gathering themselves and writing open source extensions to PostgreSQL. >> > > No it doesn't. Because there is always the, "It wants to be free!" crowd. > Yes, I agree there are. But also development at *that* cutting edge is scarce. Listing some commercial solution makes it feel as though something has filled the gap, mainly to people in the trenches (DBAs). They would, obviously, first seek the commercial solutions as they are interested. So they click 'commercial products' in the main website. >> If people (who read the documentation) professionally work with >> PostgreSQL, they may already have been briefed by those commercial >> offerings in some way. >> > > Maybe, maybe not. > > Sincerely, > > Joshua D. Drake > And I agree with your point, still. However, that would set a precedent for people to have to maintain lists of stable software in every documentation area. Regards, Cesar
On Wed, Oct 25, 2006 at 05:46:33PM -0400, Bruce Momjian wrote: > Josh Berkus wrote: > > So, like www.postgresql.org/docs/techdocs/replication? That would work. > > Yes. I like that idea, but I think that the URL needs to be decided upon, needs to be stable, and needs to be put into the docs. (I don't see it ATM, I guess because the URL isn't chosen yet?) We get so many questions about "what replication system" that I'm sure people are looking for outlines. A -- Andrew Sullivan | ajs@crankycanuck.ca In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland
With no new additions submitted today, I have moved my text into our SGML documentation: http://momjian.us/main/writings/pgsql/sgml/failover.html Please let me know what additional changes are needed. --------------------------------------------------------------------------- bruce wrote: > Richard Troy wrote: > > > > > Here is a new replication documentation section I want to add for 8.2: > > > > > > ftp://momjian.us/pub/postgresql/mypatches/replication > > > > > > > ...Read the document, as promissed... > > > > First paragraph, "(fail over)" is inconsistent with title, "failover", as > > are other spots throughout the document. The whole document should be > > consistent and I vote for "failover" and not "fail over." > > OK. Fixed to "failover" > > > Fourth paragraph, "This "sync problem" is the fundamental difficulty for > > servers working together"; "Sync problem" hasn't been defined. Actually, > > you're talking about the consistent attribute of the "acid" properties of > > all competent databases: Atomic, Consistency, Isolation, and Durability. > > At least define the term you are using - probably most easily done in the > > preceeding paragraph. > > OK, "sync problem" term removed, and spelled out fully. > > > The fifth paragraph needs a lot more help, I think. Howabout this > > alternative: > > > > So called "two phaised commit" was developed as a strategy in which two or > > more databases are updated simultaneously and none of the data is > > committed until all are committed. This guarantees consistency between the > > databases with all propagation delay being absorbed by the writer at write > > time. There are times when this propagation delay is large, so sometimes > > alternatives are worked out which we'll call here "asynchronous updates," > > however, in these cases, there is always a window of time in which some > > transaction can be lost should a failure occurr. 
For this reason, > > asynchronous updates are only used when the possibility of such losses is > > acceptible. > > I have modified the paragraph to use some of your terms. > > > Paragraphs six through to "shared disk failover" seem very awkward to me. > > I don't like them at all. > > > > "Shared disk failover" has nothing to do with "the sync problem" as it's > > not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue. > > Further, it also has nothing to do with disk arrays, though it is often > > used with RAID to help avoid disk based corruption problems. > > Yes, please see updated version. I removed the sync problem term from > there. > > > The point about Warm Standby needs to include a warning about WAL that it > > MUST be sensitive to the semantics of the database design or else it's > > fatally flawed. I'm talking about "referential integrety". That is to say, > > it's inappropriate to capture updates on a table by table basis, as some > > such systems do, (I have no idea what's done by anyone in the PG world on > > this right now) because an update to one table (esp. inserts) very often > > go hand in glove with updates in other tables and to get one without the > > other can corrupt a database. > > We don't have that problem. We recover only full transactions. > > > The description of "Continuously running replication server" should > > include the critical caveat - repeated if you think it's already said > > elsewhere - that it is ONLY suitable for applications in which a loss of > > (missing) update data doesn't matter. For example, an airline reservation > > system would be an inappropriate application for such a "solution" because > > what seats are available cannot be guaranteed to be correct. > > I have added note about data loss for the Slony item. > > > Regarding data partitioning, I strongly disagree with the opening sentence > > in that it doesn't split a database into sets, it splits tables into sets. > > OK, changed. 
> > > Data partitioning is often done within a single database on a single > > server and therefore, as a concept, has nothing whatsoever to do with > > different servers. Similarly, the second paragraph of this section is > > Uh, why would someone split things up like that on a single server? > > > problematic. Please define your term first, then talk about some > > implementations - this is muddying the water. Further, there are both > > vertical and horizontal partitioning - you mention neither - and each has > > its own distinct uses. If partitioning is mentioned, it should be more > > complete. > > Uh, what exactly needs to be defined. > > > Next, Query Broadcast Load Balancing... also needs a lot of work. First, > > it's foremost in my memory that sending read queries everywhere and > > returning the first result set back is a key way to improve application > > performance at the cost of additional load on other systems - I guess > > that's not at all what the document is after here, but it's a worthy part > > of a dialogue on broadcasting queries. In other words, this has more parts > > to it than just what the document now entertains. Secondly, the document > > Uh, do we want to go into that here? I guess I could. > > > doesn't address _at_all_ whether this is a two-phaise-commit environment > > or not. If not, how are updates managed? If each server operates > > independently and one of them fails, what do you do then? How do you know > > _any_ server got an insert/update? ... Each server _can't_ operate > > independently unless the application does its own insert/update commits to > > every one of them - and that can't be fast, nor does it load balance, > > though it may contribute to superior uptime performance by the > > application. > > I think having the application middle layer do the commits is how it > works now. Can someone explain how pgpool works, or should we mention > how two-phase commit has to be done here? pgpool2 has additional > features. 
> > > Next up; I'm not aware of any current products or projects that provide > > parallel query execution, though Informix might - I can ask a colleague or > > two. Either way, it's probably best to simply define the term (perhaps in > > a little more detail), and not mention solutions - they change with time > > anyway. > > Actually, Bizgres MPP, based on PostgreSQL, does this, but mostly for > read-only queries. > > > While I've never used Oracle's clustering tools, I've read up on them and > > have customers who use them, and I think this description of Oracle > > clustering is a mis-read on what the Oracle system actually does. A check > > with a true Oracle clustering expert is in order here. > > OK, would someone please comment? > > > Hope this helps. If asked, I'm willing to (re)write some of the bits > > discussed above. > > Yes, please review the URL and let me know what else to change. Thanks. > > -- > Bruce Momjian bruce@momjian.us > EnterpriseDB http://www.enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
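The two-phase commit strategy described in the review above (no server commits until every server has agreed to) can be sketched as a toy, in-memory coordinator. This is only an illustration under stated assumptions: the `Participant` class and `two_phase_commit` function are hypothetical names, nothing here talks to a real database, and the sketch ignores the hard case of the coordinator failing between the two phases.

```python
class Participant:
    """Stand-in for one database server (hypothetical, in-memory only)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a server that cannot prepare
        self.committed = {}   # data that has been committed
        self.pending = None   # data staged by prepare(), not yet committed

    def prepare(self, changes):
        # Phase 1: stage the changes and vote yes (True) or no (False).
        if self.fail_on_prepare:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Phase 2, success path: make the staged changes permanent.
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2, failure path: discard whatever was staged.
        self.pending = None


def two_phase_commit(participants, changes):
    """Commit on all participants or on none; return True if committed."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The writer pays for the round trip to every server before the commit is acknowledged, which is the propagation delay the proposed paragraph mentions. An asynchronous scheme would acknowledge after the first server commits and ship the change later, leaving a window in which a failure loses the transaction.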
On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote: > Jim C. Nasby wrote: > > Something else worth doing though is to have a paragraph explaining why > > there's no built-in replication. I don't have time to write something > > right now, but I can do it later tonight if no one beats me to it. > > I thought that was implied in the early paragraph about why there are > many solutions. I think we should explicitly spell it out, especially considering how many times people ask about it. How about... This multitude of choices is why PostgreSQL does not ship with a replication solution by default; any bundled solution would only satisfy a subset of replication needs. (sorry for the non-standard patch, but anoncvs isn't sync'd up yet). -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Attachment
Jim C. Nasby wrote: > On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote: > > Jim C. Nasby wrote: > > > Something else worth doing though is to have a paragraph explaining why > > > there's no built-in replication. I don't have time to write something > > > right now, but I can do it later tonight if no one beats me to it. > > > > I thought that was implied in the early paragraph about why there are > > many solutions. > > I think we should explicitely spell it out, especially considering how > many times people ask about it. How about... > > This multitude of choices is why PostgreSQL does not ship with a > replication solution by default; any bundled solution would only > satisfy a subset of replication needs. The problem is that we do have some solutions in our code, like doing data partitioning in the application, warm standby, or using a shared disk for failover, so how do we spell that out? I say there are multiple solutions, but I don't see how I can say that all are external and not included. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Jim C. Nasby wrote: >> On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote: >>> Jim C. Nasby wrote: >>>> Something else worth doing though is to have a paragraph explaining why >>>> there's no built-in replication. I don't have time to write something >>>> right now, but I can do it later tonight if no one beats me to it. >>> I thought that was implied in the early paragraph about why there are >>> many solutions. >> I think we should explicitely spell it out, especially considering how >> many times people ask about it. How about... >> >> This multitude of choices is why PostgreSQL does not ship with a >> replication solution by default; any bundled solution would only >> satisfy a subset of replication needs. > > The problem is that we do have some solutions in our code, like doing > data partitioning in the application, warm standby, or using a shared > disk for failover, so how do we spell that out? I say there are > multiple solutions, but I don't see how I can say that all are external > and not included. None of those are replication solutions. So I would have to agree with Jim here. This isn't about what people do with their app, so that is not relevant. Warm standby is PITR which is a backup and recovery solution. It does not include a failover solution and is *not* replication. It technically does not provide an HA solution either as it will be almost always farther behind than a replication solution. Shared disk for failover could be used by anything it isn't special to a replication scenario it is standard for many HA. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake wrote: > Bruce Momjian wrote: > > Jim C. Nasby wrote: > >> On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote: > >>> Jim C. Nasby wrote: > >>>> Something else worth doing though is to have a paragraph explaining why > >>>> there's no built-in replication. I don't have time to write something > >>>> right now, but I can do it later tonight if no one beats me to it. > >>> I thought that was implied in the early paragraph about why there are > >>> many solutions. > >> I think we should explicitely spell it out, especially considering how > >> many times people ask about it. How about... > >> > >> This multitude of choices is why PostgreSQL does not ship with a > >> replication solution by default; any bundled solution would only > >> satisfy a subset of replication needs. > > > > The problem is that we do have some solutions in our code, like doing > > data partitioning in the application, warm standby, or using a shared > > disk for failover, so how do we spell that out? I say there are > > multiple solutions, but I don't see how I can say that all are external > > and not included. > > None of those are replication solutions. So I would have to agree with > Jim here. > > This isn't about what people do with their app, so that is not relevant. > > Warm standby is PITR which is a backup and recovery solution. It does > not include a failover solution and is *not* replication. It technically > does not provide an HA solution either as it will be almost always > farther behind than a replication solution. > > Shared disk for failover could be used by anything it isn't special to a > replication scenario it is standard for many HA. The section is no longer titled only "replication", but is now "Failover, Replication, Load Balancing, and Clustering Options", so it is more a catch-all, and hence saying nothing is included doesn't make sense. 
You could say no "replication" is included, but replication is only one part of the section, so where do you put that, and why is it worth it? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hi, A typo: ("a write to any server has to be _propogated_") s/propogated/propagated Bruce Momjian wrote: > Here is a new replication documentation section I want to add for 8.2: > > ftp://momjian.us/pub/postgresql/mypatches/replication > > Comments welcomed. > > -- Regards, Alexey Klyukin alexk(at)vollmond.org.ua Simferopol, Crimea, Ukraine.
On Wed, 25 Oct 2006, Josh Berkus wrote: > > Bruce, > > > It isn't designed for that. It is designed for people to understand > > what they want, and then they can look around for solutions. I think > > most agree we don't want a list of solutions in the documentation, > > though I have a few as examples. > > Do they? I've seen no discussion of the matter. I think we should have > them. > > I completely agree; If you want to attract competent people from the business world, one thing you have to do is respect their time by helping them find information, especially about things they don't know exist. All that's needed are pointers, but the pointers need to be to solid documents/resources, not just the top of a heap - if you'll forgive the pun. Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
> The documentation comes with the open source tarball. Yuck. > > I would welcome if the docs point to an unofficial wiki (maintained > externally from authoritative PostgreSQL developers) or a website > listing them and giving a brief of each solution. > > postgresql.org already does this for events (commercial training!) and > news. Point to postgresql.org/download/commercial as there *already* are > brief descriptions, pricing and website links. I wouldn't have looked in "download" for such a thing. Nor would I expect everyone with a Postgres related solution to want to post it on PostgreSql.org for download. However I agree that a simple web page listing such things is needed. It's easy to manage - way easier to manage than the development of a competent relational database engine! It's just a bunch of text, after all, and errors aren't that critical and will tend to self-correct through user attention. > > > > You list the ones that are stable in their existence (commercial or not). > > > And how would you determine it? Years of existance? Contribution to > PostgreSQL's source code? It is not easy and wouldn't be fair. There are > ones that certainly will be listed, and other doubtful ones (which would > perhaps complain, that's why I said 'all' - if they are not stable, > either they stay out of the market or fix their problems). You have to just trust people. If it's clear that "this isn't PostgreSql.org", stuff can be unstable, etc - it isn't the group's problem. > > No it doesn't. Because there is always the, "It want's to be free!" crowd. > > > Yes, I agree there are. But also development in *that* cutting-edge is > scarce. It feels that something had filled the gap if you list some > commercial solution, mainly people in the trenches (DBAs). They would, > obviously, firstly seek the commercial solutions as they are interested. > So they click 'commercial products' in the main website. Not necessarily.
Most times, I'll seek the better solution, which may or may not be commercial. Sometimes I'll avoid a commercial version because I don't like the company! ... But getting genuine donations of time - without direct $$ self-interest attached, is a whole nother kettle o fish. For example, there are a lot of students out there that are excellent and would love to have a mechanism to gain something for their resumes before entering the business world. ...There might be some residual interest at UCB, for example. Attracting this kind of support is a completely different dialogue, but on _this_ topic, surely seeking the "it wants to be free!" crowd can't (or shouldn't, in my view) be used as an excuse for not publishing pointers to commercial solutions that involve PostgreSql. Do it already! > >> If people (who read the documentation) professionally work with > >> PostgreSQL, they may already have been briefed by those commercial > >> offerings in some way. > >> > > > > Maybe, maybe not. The "may" is a wiggler; sounds like an excuse with a back door. The real answer is "probably not!" I'm in that world. I haven't been briefed. Ever. > And I agree with your point, still. However, that would open a precedent > for people to have to maintain lists of stable software in every > documentation area. All that's needed is ONE list, with clear disclaimer. It'll be all text and links, and maybe the odd small .gif logo, if permitted, so it won't be a huge thing. Come on now, are there thousands of such products? Tens sounds more plausible. Regards, Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
On Thu, Oct 26, 2006 at 11:59:57AM -0400, Bruce Momjian wrote: > Jim C. Nasby wrote: > > On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote: > > > Jim C. Nasby wrote: > > > > Something else worth doing though is to have a paragraph explaining why > > > > there's no built-in replication. I don't have time to write something > > > > right now, but I can do it later tonight if no one beats me to it. > > > > > > I thought that was implied in the early paragraph about why there are > > > many solutions. > > > > I think we should explicitely spell it out, especially considering how > > many times people ask about it. How about... > > > > This multitude of choices is why PostgreSQL does not ship with a > > replication solution by default; any bundled solution would only > > satisfy a subset of replication needs. > > The problem is that we do have some solutions in our code, like doing > data partitioning in the application, warm standby, or using a shared > disk for failover, so how do we spell that out? I say there are > multiple solutions, but I don't see how I can say that all are external > and not included. Good point... how about this? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Attachment
On Thursday 26 October 2006 10:45, Andrew Sullivan wrote: > On Wed, Oct 25, 2006 at 05:46:33PM -0400, Bruce Momjian wrote: > > Josh Berkus wrote: > > > So, like www.postgresql.org/docs/techdocs/replication? That would > > > work. > > > > Yes. > > I like that idea, but I think that the URL needs to be decided upon, > needs to be stable, and needs to be put into the docs. (I don't see > it ATM, I guess because the URL isn't chosen yet?) We get so many > questions about "what replication system" that I'm sure people are > looking for outlines. > > A Unfortunately the techdocs system won't support a url like the one above, rather you'll end up with something more like the following http://www.postgresql.org/docs/techdocs.54 which is the "GUI Tools Guide" (which is linked in the FAQ fwiw). Once it is in place, it will be stable though. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Jim C. Nasby wrote: > On Thu, Oct 26, 2006 at 11:59:57AM -0400, Bruce Momjian wrote: > > Jim C. Nasby wrote: > > > On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote: > > > > Jim C. Nasby wrote: > > > > > Something else worth doing though is to have a paragraph explaining why > > > > > there's no built-in replication. I don't have time to write something > > > > > right now, but I can do it later tonight if no one beats me to it. > > > > > > > > I thought that was implied in the early paragraph about why there are > > > > many solutions. > > > > > > I think we should explicitely spell it out, especially considering how > > > many times people ask about it. How about... > > > > > > This multitude of choices is why PostgreSQL does not ship with a > > > replication solution by default; any bundled solution would only > > > satisfy a subset of replication needs. > > > > The problem is that we do have some solutions in our code, like doing > > data partitioning in the application, warm standby, or using a shared > > disk for failover, so how do we spell that out? I say there are > > multiple solutions, but I don't see how I can say that all are external > > and not included. > > Good point... how about this? Sorry, that is too preachy, and I have the extensibility issue addressed in the commercial solutions section. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, Oct 26, 2006 at 03:06:13PM -0400, Robert Treat wrote: > > Unfortunately the techdocs system won't support a url like the one above, > rather you'll end up with something more like the following > http://www.postgresql.org/docs/techdocs.54 which is the "GUI Tools Guide" > (which is linked in the FAQ fwiw). Once it is in place, it will be stable > though. Surely this is what redirects were invented for, no? http://www.postgresql.org/replication redirects to [stable magic URL] Put the former in the docs. A -- Andrew Sullivan | ajs@crankycanuck.ca Users never remark, "Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath." --Damien Katz
On Wed, 25 Oct 2006, Bruce Momjian wrote: ...snip... > > > Data partitioning is often done within a single database on a single > > server and therefore, as a concept, has nothing whatsoever to do with > > different servers. Similarly, the second paragraph of this section is > > Uh, why would someone split things up like that on a single server? > > > problematic. Please define your term first, then talk about some > > implementations - this is muddying the water. Further, there are both > > vertical and horizontal partitioning - you mention neither - and each has > > its own distinct uses. If partitioning is mentioned, it should be more > > complete. > > Uh, what exactly needs to be defined. OK, "Data partitioning"; data partitioning begins in the RDB world with the very notion of tables, and we partition our data during schema development with the goal of "normalizing" the design - "third normal form" being the one most Professors talk about as a target. "Data partitioning", then, is the intentional denormalization of the design to accomplish some goal(s) - not all of which are listed in this document's title. In this context, data partitioning takes two forms based upon which axis of a two-dimensional table is to be divided, with the vertical partition dividing attributes (as in a master/detail relationship with one-to-one mapping), and the horizontal partition dividing rows based on the domain, or value, of one or more attributes (as in your example of London records being kept in a database in London, while Paris records are kept in Paris). The point I was making was that that section of the document was in error because it presumed there was only one form of data partitioning and that it was horizontal. (The document is now missing, so I can't look at the current content - it was here: ftp://momjian.us/pub/postgresql/mypatches/replication.)
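The two axes of partitioning described above can be sketched with plain Python rows. This is a minimal illustration, not anything from the documentation under review: the sample table, its column names, and the helper functions `vertical_partition` and `horizontal_partition` are all hypothetical.

```python
# Hypothetical sample table; column names are made up for illustration.
customers = [
    {"id": 1, "city": "London", "name": "Ann", "notes": "prefers email"},
    {"id": 2, "city": "Paris", "name": "Luc", "notes": "new account"},
    {"id": 3, "city": "London", "name": "Raj", "notes": "prefers phone"},
]


def vertical_partition(rows, key, core_columns):
    """Divide attributes: a narrow core table plus a one-to-one detail table.

    Both halves keep the key column so they can be joined back together.
    """
    core = [{c: r[c] for c in (key, *core_columns)} for r in rows]
    detail = [
        {c: v for c, v in r.items() if c == key or c not in core_columns}
        for r in rows
    ]
    return core, detail


def horizontal_partition(rows, column):
    """Divide rows: one set of rows per distinct value of `column`."""
    parts = {}
    for r in rows:
        parts.setdefault(r[column], []).append(r)
    return parts


# Vertical: the hot, frequently joined table carries only id and city.
core, detail = vertical_partition(customers, "id", ["city"])

# Horizontal: London rows and Paris rows end up in separate sets,
# which could live on separate servers.
by_city = horizontal_partition(customers, "city")
```

Only the horizontal split maps naturally onto different servers; the vertical split is typically done inside one database to keep a hot table narrow.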
In answer to your query about why someone would use such partitioning, the nearly universal answer is performance, and the distant second answer is security. In one example that comes immediately to mind, there is a table which is a central core of an application, and, as such, there's a lot to say about the items in this table. The table holds tens to hundreds of millions of rows, and needs to be joined with something else in a huge fraction of queries. For performance reasons, the table's size was therefore kept as tiny as possible and detail table(s) is(are) used for the remaining attributes that logically belong in the table - it's a vertical partition. It's an exceptionally common technique - so common, it probably didn't occur to you that you were even talking about it when you spoke of "data partitioning." > > Next, Query Broadcast Load Balancing... also needs a lot of work. First, > > it's foremost in my memory that sending read queries everywhere and > > returning the first result set back is a key way to improve application > > performance at the cost of additional load on other systems - I guess > > that's not at all what the document is after here, but it's a worthy part > > of a dialogue on broadcasting queries. In other words, this has more parts > > to it than just what the document now entertains. Secondly, the document > > Uh, do we want to go into that here? I guess I could. > > > doesn't address _at_all_ whether this is a two-phaise-commit environment > > or not. If not, how are updates managed? If each server operates > > independently and one of them fails, what do you do then? How do you know > > _any_ server got an insert/update? ... Each server _can't_ operate > > independently unless the application does its own insert/update commits to > > every one of them - and that can't be fast, nor does it load balance, > > though it may contribute to superior uptime performance by the > > application.
> > I think having the application middle layer do the commits is how it > works now. Can someone explain how pgpool works, or should we mention > how two-phase commit has to be done here? pgpool2 has additional > features. Well, you hadn't mentioned two-phase commit at all and it surely belongs somewhere in this document - it's a core PG feature and enables a lot of alternative solutions which the document discusses. What it needs to say but doesn't (didn't?) is that the load from read queries can be distributed for load balancing purposes but that there's no benefit possible for writes, and that replication overhead costs could possibly overwhelm the benefits in high-update scenarios. The point that each server operates independently is only true if you ignore the necessary replication - which, to my mind, links the systems and they are not independent. ...I suppose that in a completely read-only environment - or updated nightly by dumping tarwads or something like that, they could be considered independent, but it's hardly worth the sentence. Regards, Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
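The read-versus-write asymmetry raised in this message can be shown with a toy dispatcher: each read goes to a single server, while every write has to be applied on all of them. This is a sketch under stated assumptions, not a description of how pgpool or any other product works; `Server` and `Dispatcher` are hypothetical names, and the sketch does not handle a server failing partway through a write.

```python
import itertools


class Server:
    """In-memory stand-in for one database server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reads = 0    # read queries served by this server
        self.writes = 0   # write queries applied on this server


class Dispatcher:
    """Send each read to one server in turn; send each write to all servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._round_robin = itertools.cycle(self.servers)

    def read(self, key):
        # Read load is spread out: only one server does the work.
        server = next(self._round_robin)
        server.reads += 1
        return server.data.get(key)

    def write(self, key, value):
        # Write load is not spread out: every server does the same work.
        for server in self.servers:
            server.data[key] = value
            server.writes += 1
```

With N servers, each one handles roughly 1/N of the reads but all of the writes, so adding servers helps read-heavy workloads and does nothing for update-heavy ones.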