Thread: Re: [HACKERS] Replication documentation addition
I have updated the text. Please let me know what else I should change. I am unsure if I should be mentioning commercial PostgreSQL products in our documentation.

---------------------------------------------------------------------------

Hannu Krosing wrote:
> On a fine day, Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> >
> > ftp://momjian.us/pub/postgresql/mypatches/replication
>
> This is how data partitioning is currently described there:
>
> > Data Partitioning
> > -----------------
> >
> > Data partitioning splits the database into data sets. To achieve
> > replication, each data set can only be modified by one server. For
> > example, data can be partitioned by offices, e.g. London and Paris.
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records. Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current. Slony can also be used in such a setup. While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access both tables
> > using a single table name.
>
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially I/O and memory, but also CPU), and only a subset of data is
> stored and processed on each server.
>
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use the I/O and CPU of several DB servers for
> processing complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ), which does the same for OLTP
> loads.
>
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
>
> --
> ----------------
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
>
> Skype me: callto:hkrosing

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
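To make the write-ownership rule in the quoted Data Partitioning paragraph concrete (every server holds all records, but only London may modify London records and only Paris may modify Paris records), here is a minimal Python sketch. It is an in-memory simulation under stated assumptions, not PostgreSQL or Slony code: the `Server` and `PartitionedCluster` names are hypothetical, and the plain dictionary copy in `write()` stands in for whatever actually keeps the read-only data sets current (rules, triggers, or Slony, as the quoted text suggests).

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory stand-in for one database server."""
    name: str
    owned_office: str
    rows: dict = field(default_factory=dict)  # primary key -> (office, payload)

    def write(self, key, office, payload):
        # Partitioning rule: a server may only modify its own office's rows.
        if office != self.owned_office:
            raise PermissionError(f"{self.name} cannot modify {office} records")
        self.rows[key] = (office, payload)


class PartitionedCluster:
    """Routes each write to the server that owns the record's data set,
    then copies the change to every other server as a read-only replica.
    (Assumption: the copy step models trigger- or Slony-based propagation.)"""

    def __init__(self, servers):
        self.servers = {s.owned_office: s for s in servers}

    def write(self, key, office, payload):
        owner = self.servers[office]
        owner.write(key, office, payload)  # ownership is checked here
        # Replica apply: non-owners receive the row without the check,
        # so every server ends up holding all records.
        for s in self.servers.values():
            if s is not owner:
                s.rows[key] = (office, payload)

    def read(self, server_office, key):
        # Any server can answer reads from its local copy.
        return self.servers[server_office].rows.get(key)


if __name__ == "__main__":
    cluster = PartitionedCluster(
        [Server("london-db", "London"), Server("paris-db", "Paris")]
    )
    cluster.write(1, "London", "order A")
    print(cluster.read("Paris", 1))  # ('London', 'order A')
    try:
        cluster.servers["Paris"].write(3, "London", "order C")
    except PermissionError as err:
        print(err)  # paris-db cannot modify London records
```

The ownership check sits on the server itself rather than only in the routing layer, mirroring the quoted point that rules and triggers can help enforce partitioning even when the application does the routing.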
Bruce,

> I have updated the text. Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I think you should mention the postgresql-only ones, but just briefly with a link: Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus wrote:
> Bruce,
>
>> I have updated the text. Please let me know what else I should change.
>> I am unsure if I should be mentioning commercial PostgreSQL products in
>> our documentation.
>
> I think you should mention the postgresql-only ones, but just briefly with a
> link: Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

And to further this, I would expect that it would be a subsection, e.g., a <sect2> or <sect3>. I think the open source version should absolutely get top billing, though.

Sincerely,

Joshua D. Drake

--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake wrote:
> Josh Berkus wrote:
> > Bruce,
> >
> >> I have updated the text. Please let me know what else I should change.
> >> I am unsure if I should be mentioning commercial PostgreSQL products in
> >> our documentation.
> >
> > I think you should mention the postgresql-only ones, but just briefly with a
> > link: Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.
>
> And to further this, I would expect that it would be a subsection, e.g.,
> a <sect2> or <sect3>. I think the open source version should absolutely
> get top billing, though.

I am not inclined to add commercial offerings. If people wanted commercial database offerings, they can get them from companies that advertise. People are coming to PostgreSQL for open source solutions, and I think mentioning commercial ones doesn't make sense.

If we are to add them, I need to hear that from people who haven't worked in PostgreSQL commercial replication companies.

--
  Bruce Momjian
Hi,

Bruce Momjian wrote:
> I have updated the text. Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I support your POV and vote for not including any pointers to commercial extensions in the official documentation. If at all, they should go to 'external-projects.sgml', where PostGIS, PgAdmin and other projects are mentioned.

I can't really get excited about the exclusion of the term 'replication', because it's what most people are looking for. It's a well-known term. Sorry if it sounded that way, but I did not mean to avoid that term.

The newly created terms 'Query Broadcast Load Balancing' or, even worse, 'Multi-Master Load Balancing' are more confusing than helpful, because these terms do not exist. (See the googlefight in [1].)

Can we name the chapter "Fail-over, Load-Balancing and Replication Options"? That would fit everything and contain the necessary buzz words.

Also, I'm still missing Multi- vs Single-Master, which are also commonly used terms.

IMHO, it does not make sense to speak of synchronous replication for a 'Shared Disk Fail Over'. It's not replication, because there's no replica.

The Data Partitioning paragraph should probably mention its close relation with data partitioning across table spaces (and make the differences clear).

What you call 'Query Broadcast Load Balancing' is also multi-master replication, so naming only the latter 'Multi-Master Load Balancing' is misleading.

I'd propose to add a subsection 'Synchronous, Multi-Master Replication' and explain the different ways to do that:

 * Query-Based
 * with 2PC
 * Distributed SHMEM
 * (perhaps mention the optimized Postgres-R algorithm ;-)

What you called 'Single-Query Clustering' is probably better known as 'Parallel Query Execution'. It can be combined with all types of replication (every combination of async / sync and Single- / Multi-Master).
It may count as load balancing, but it depends on some form of replication to distribute the data first.

I liked Chris Brown's documentation in [2], which was clearer regarding replication (which can be used to do fail-over, load-balancing, data-partitioning or parallel query execution). I'd like to keep all those things a little more separate to make them clear.

Regards

Markus

[1]: Googlefight: "Multi-Master Load Balancing" vs "Multi-Master Replication": http://tinyurl.com/y3k76r
[2]: Chris Brown's proposal for replication documentation: http://archives.postgresql.org/pgsql-patches/2006-08/msg00026.php
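Of the synchronous multi-master approaches Markus lists, the "with 2PC" variant is perhaps the easiest to picture. The following is a minimal Python sketch of two-phase commit across full replicas, assuming an in-memory model; the `Replica` class and `two_phase_commit()` function are hypothetical illustrations, and the sketch deliberately omits the durable logging, timeouts, and coordinator crash recovery a real implementation needs. In PostgreSQL (8.1 and later) the two phases roughly correspond to `PREPARE TRANSACTION` on each node, followed by `COMMIT PREPARED` or `ROLLBACK PREPARED`.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Replica:
    """Hypothetical participant holding a full copy of the data."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.data = {}
        self.pending = None
        self.fail_prepare = fail_prepare  # simulates a node that cannot prepare

    def prepare(self, changes):
        # Phase 1: stage the changes and promise to commit them if asked.
        if self.fail_prepare:
            return Vote.ABORT
        self.pending = dict(changes)
        return Vote.COMMIT

    def commit(self):
        # Phase 2 (success): make the staged changes visible.
        self.data.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2 (failure): discard whatever was staged.
        self.pending = None


def two_phase_commit(replicas, changes):
    """Apply `changes` on every replica or on none of them."""
    votes = [r.prepare(changes) for r in replicas]
    if all(v is Vote.COMMIT for v in votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.rollback()
    return False


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    print(two_phase_commit([a, b], {"price": 10}))  # True
    c = Replica("c", fail_prepare=True)
    print(two_phase_commit([a, b, c], {"price": 12}))  # False
    print(a.data, b.data, c.data)  # {'price': 10} {'price': 10} {}
```

Because every replica must vote to commit before any of them does, a single failing node aborts the write everywhere. That is what keeps the copies identical, and also why each write has to wait on every node.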
On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> I can't really get excited about the exclusion of the term
> 'replication', because it's what most people are looking for. It's a
> well-known term. Sorry if it sounded that way, but I did not mean to
> avoid that term.
<snip>
> IMHO, it does not make sense to speak of synchronous replication for a
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

Those two statements are at odds with each other, at least based on everyone I've ever talked to in a commercial setting. People will use terms like 'replication', 'HA' or 'clustering' fairly interchangeably. Usually what these folks want is some kind of high-availability solution. A few are more concerned with scalability. Sometimes it's a combination of both. That's why I think it's good for the chapter to deal with both aspects of this.

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>
> I am not inclined to add commercial offerings. If people wanted
> commercial database offerings, they can get them from companies that
> advertise. People are coming to PostgreSQL for open source solutions,
> and I think mentioning commercial ones doesn't make sense.
>
> If we are to add them, I need to hear that from people who haven't
> worked in PostgreSQL commercial replication companies.
>

You did: Josh Berkus.

Secondly, as many people have stated in the past, no single replication solution suits everyone's needs, and as PostgreSQL has many replication solutions, it only makes sense to list the more prominent ones, commercial or not.

Joshua D. Drake
Markus Schiltknecht wrote:
> Hi,
>
> Bruce Momjian wrote:
> > I have updated the text. Please let me know what else I should change.
> > I am unsure if I should be mentioning commercial PostgreSQL products in
> > our documentation.
>
> I support your POV and vote for not including any pointers to commercial
> extensions in the official documentation. If at all, they should go to
> 'external-projects.sgml', where PostGIS, PgAdmin and other projects are
> mentioned.
>
> I can't really get excited about the exclusion of the term
> 'replication', because it's what most people are looking for. It's a
> well-known term. Sorry if it sounded that way, but I did not mean to
> avoid that term.

OK, I have re-added the term "replication" as appropriate.

> The newly created terms 'Query Broadcast Load Balancing' or, even worse,
> 'Multi-Master Load Balancing' are more confusing than helpful, because
> these terms do not exist. (See the googlefight in [1].)

OK, renamed.

> Can we name the chapter "Fail-over, Load-Balancing and Replication
> Options"? That would fit everything and contain the necessary buzz words.

Yes. Done, "cluster" added too.

> Also, I'm still missing Multi- vs Single-Master, which are also commonly
> used terms.

Yeah, not sure how to get those in, because it somewhat confuses the "purpose" of the solution.

> IMHO, it does not make sense to speak of synchronous replication for a
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

Agreed. Modified.

> The Data Partitioning paragraph should probably mention its close
> relation with data partitioning across table spaces (and make the
> differences clear).

Uh, so you spread I/O load with table spaces? That seems too far a reach to mention here.

> What you call 'Query Broadcast Load Balancing' is also multi-master
> replication, so naming only the latter 'Multi-Master Load Balancing'
> is misleading.

Renamed.
> I'd propose to add a subsection 'Synchronous, Multi-Master Replication'
> and explain the different ways to do that:
>
>  * Query-Based
>  * with 2PC
>  * Distributed SHMEM
>  * (perhaps mention the optimized Postgres-R algorithm ;-)
>
> What you called 'Single-Query Clustering' is probably better known as
> 'Parallel Query Execution'. It can be combined with all types of
> replication (every combination of async / sync and Single- /
> Multi-Master). It may count as load balancing, but it depends on some
> form of replication to distribute the data first.

Good term. Added.

> I liked Chris Brown's documentation in [2], which was clearer regarding
> replication (which can be used to do fail-over, load-balancing,
> data-partitioning or parallel query execution). I'd like to keep all
> those things a little more separate to make them clear.

Please let me know how you like the new version at the FTP URL.

--
  Bruce Momjian
Jim C. Nasby wrote:
> On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> > I can't really get excited about the exclusion of the term
> > 'replication', because it's what most people are looking for. It's a
> > well-known term. Sorry if it sounded that way, but I did not mean to
> > avoid that term.
> <snip>
> > IMHO, it does not make sense to speak of synchronous replication for a
> > 'Shared Disk Fail Over'. It's not replication, because there's no replica.
>
> Those two statements are at odds with each other, at least based on
> everyone I've ever talked to in a commercial setting. People will use
> terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
> Usually what these folks want is some kind of high-availability
> solution. A few are more concerned with scalability. Sometimes it's a
> combination of both. That's why I think it's good for the chapter to
> deal with both aspects of this.

OK, I did break it out somewhat for clarity. Let me know how it looks now.

--
  Bruce Momjian
Hi,

Jim C. Nasby wrote:
> Those two statements are at odds with each other, at least based on
> everyone I've ever talked to in a commercial setting. People will use
> terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
> Usually what these folks want is some kind of high-availability
> solution. A few are more concerned with scalability. Sometimes it's a
> combination of both. That's why I think it's good for the chapter to
> deal with both aspects of this.

Yabut... at least the PostgreSQL manual should use the terms correctly.

And while I agree completely that it's a fail-over solution and that it should be mentioned in that section, I'm arguing that it's not replication.

Regards

Markus
On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:

> Can we name the chapter "Fail-over, Load-Balancing and Replication
> Options"? That would fit everything and contain the necessary buzz words.
...
> IMHO, it does not make sense to speak of synchronous replication for a
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

As you point out, there is no replica of the data, but there is some protection against machine failure, which puts it firmly in the "Fail-over" part above.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
David Fetter wrote:
> On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
>
> > Can we name the chapter "Fail-over, Load-Balancing and Replication
> > Options"? That would fit everything and contain the necessary buzz words.
> ...
>
> > IMHO, it does not make sense to speak of synchronous replication for a
> > 'Shared Disk Fail Over'. It's not replication, because there's no replica.
>
> As you point out, there is no replica of the data, but there is some
> protection against machine failure, which puts it firmly in the
> "Fail-over" part above.

Right, but his point was not to call it synchronous. I have fixed that in the current version.

--
  Bruce Momjian