Thread: Re: [HACKERS] Replication documentation addition

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
I have updated the text.  Please let me know what else I should change.
I am unsure if I should be mentioning commercial PostgreSQL products in
our documentation.

---------------------------------------------------------------------------

Hannu Krosing wrote:
> One fine day, Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> >
> >     ftp://momjian.us/pub/postgresql/mypatches/replication
>
> This is how data partitioning is currently described there:
>
> > Data Partitioning
> > -----------------
> >
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris.
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access data from both
> > tables using a single table name.
>
> Maybe another use of partitioning should also be mentioned: partitioning
> used to overcome the limitations of a single server (especially I/O and
> memory, but also CPU), where only a subset of the data is stored and
> processed on each server.
>
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
>
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
>
> --
> ----------------
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
>
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
>
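
For what it's worth, the two uses of partitioning discussed above can be
sketched in a few lines of Python.  This is only an illustration under
stated assumptions, not how Slony, Bizgres MPP or Pl_Proxy are actually
implemented: OfficeServer, ShardedStore and shard_for are hypothetical
names, and routing by CRC32 is just one possible choice of hash.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class OfficeServer:
    """Holds every record, but may modify only records owned by its office."""
    office: str
    rows: dict = field(default_factory=dict)  # key -> (owner_office, value)

    def write(self, owner, key, value, peers=()):
        # Enforce the partitioning rule: exactly one writer per data set.
        if owner != self.office:
            raise PermissionError(f"{self.office} cannot modify {owner} records")
        self.rows[key] = (owner, value)
        # Push the change so the peers' read-only copies stay current.
        for peer in peers:
            peer.rows[key] = (owner, value)


def shard_for(key, n_servers):
    """Shared-nothing routing (assumed scheme): map a key to one server."""
    return zlib.crc32(key.encode("utf-8")) % n_servers


class ShardedStore:
    """Each shard stores and serves only its own subset of the data."""

    def __init__(self, n_servers):
        self.shards = [dict() for _ in range(n_servers)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    london, paris = OfficeServer("London"), OfficeServer("Paris")
    london.write("London", "order-1", "shipped", peers=[paris])
    print(paris.rows["order-1"])   # Paris can read the London record

    store = ShardedStore(3)
    store.put("order-1", "shipped")
    print(store.get("order-1"))    # served by the single shard that owns it
```

The first class matches the London/Paris case, where every server has all
the data but only one may change a given data set.  The second matches the
shared-nothing case, where no server has all the data.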

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Josh Berkus
Date:
Bruce,

> I have updated the text.  Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I think you should mention the PostgreSQL-only ones, but just briefly with a
link: Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: [HACKERS] Replication documentation addition

From
"Joshua D. Drake"
Date:
Josh Berkus wrote:
> Bruce,
>
>> I have updated the text.  Please let me know what else I should change.
>> I am unsure if I should be mentioning commercial PostgreSQL products in
>> our documentation.
>
> I think you should mention the PostgreSQL-only ones, but just briefly with a
> link: Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

To take this further, I would expect it to be a subsection, e.g. a
<sect2> or <sect3>. I think the open source version should absolutely
get top billing though.

Sincerely,

Joshua D. Drake




--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Joshua D. Drake wrote:
> Josh Berkus wrote:
> > Bruce,
> >
> >> I have updated the text.  Please let me know what else I should change.
> >> I am unsure if I should be mentioning commercial PostgreSQL products in
> >> our documentation.
> >
> > I think you should mention the PostgreSQL-only ones, but just briefly with a
> > link: Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.
>
> To take this further, I would expect it to be a subsection, e.g. a
> <sect2> or <sect3>. I think the open source version should absolutely
> get top billing though.

I am not inclined to add commercial offerings.  If people want
commercial database offerings, they can get them from companies that
advertise.  People are coming to PostgreSQL for open source solutions,
and I think mentioning commercial ones doesn't make sense.

If we are to add them, I need to hear that from people who haven't
worked in PostgreSQL commercial replication companies.

Re: [HACKERS] Replication documentation addition

From
Markus Schiltknecht
Date:
Hi,

Bruce Momjian wrote:
> I have updated the text.  Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I support your POV and vote for not including any pointers to commercial
extensions in the official documentation. If at all, they should go to
'external-projects.sgml', where PostGIS, PgAdmin and other projects are
mentioned.

I can't really get excited about the exclusion of the term
'replication', because it's what most people are looking for. It's a
well known term. Sorry if it sounded that way, but I've not meant to
avoid that term.

The newly created terms 'Query Broadcast Load Balancing' or even worse
'Multi-Master Load Balancing' are more confusing than helpful, because
these terms do not exist. (See the googlefight in [1])

Can we name the chapter "Fail-over, Load-Balancing and Replication
Options"? That would fit everything and contain the necessary buzz words.

Also, I'm still missing Multi- vs Single-Master, which are also commonly
used terms.

IMHO, it does not make sense to speak of a synchronous replication for a
'Shared Disk Fail Over'. It's not replication, because there's no replica.

The Data Partitioning paragraph should probably mention its close
relation to data partitioning across table spaces (and make the
differences clear).

What you call 'Query Broadcast Load Balancing' is also multi-master
replication, so naming only the latter 'Multi-Master Load Balancing' is
misleading.

I'd propose to add a subsection 'Synchronous, Multi-Master Replication'
and explain the different possibilities on how to do that:

* Query-Based
* with 2PC
* Distributed SHMEM
* (perhaps mention the optimized Postgres-R algorithm ;-)

What you called 'Single-Query Clustering' is probably better known as
'Parallel Query Execution'. It can be combined with all types of
replication (every combination of async / sync and Single- /
Multi-Master). It's maybe load balancing, but it depends on some form of
replication to distribute the data first.

I liked Chris Brown's documentation in [2] which was clearer regarding
replication (which can be used to do fail-over, load-balancing,
data-partitioning or parallel query execution). I'd like to keep all
those things a little more separate to get them clear.

Regards

Markus

[1]: Googlefight: "Multi-Master Load Balancing" vs "Multi-Master
Replication": http://tinyurl.com/y3k76r

[2]: Chris Brown's proposal for a replication documentation:
http://archives.postgresql.org/pgsql-patches/2006-08/msg00026.php

Re: [HACKERS] Replication documentation addition

From
"Jim C. Nasby"
Date:
On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> I can't really get excited about the exclusion of the term
> 'replication', because it's what most people are looking for. It's a
> well known term. Sorry if it sounded that way, but I've not meant to
> avoid that term.
<snip>
> IMHO, it does not make sense to speak of a synchronous replication for a
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

Those two statements are at odds with each other, at least based on
everyone I've ever talked to in a commercial setting. People will use
terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
Usually what these folks want is some kind of high-availability
solution. A few are more concerned with scalability. Sometimes it's a
combination of both. That's why I think it's good for the chapter to
deal with both aspects of this.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: [HACKERS] Replication documentation addition

From
"Joshua D. Drake"
Date:
>
> I am not inclined to add commercial offerings.  If people want
> commercial database offerings, they can get them from companies that
> advertise.  People are coming to PostgreSQL for open source solutions,
> and I think mentioning commercial ones doesn't make sense.
>
> If we are to add them, I need to hear that from people who haven't
> worked in PostgreSQL commercial replication companies.
>

You did: Josh Berkus. Secondly, as many people have stated in the past,
no single replication solution suits everyone's needs, and as PostgreSQL
has many replication solutions, it only makes sense to list the more
prominent ones, commercial or not.

Joshua D. Drake


Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Markus Schiltknecht wrote:
> Hi,
>
> Bruce Momjian wrote:
> > I have updated the text.  Please let me know what else I should change.
> > I am unsure if I should be mentioning commercial PostgreSQL products in
> > our documentation.
>
> I support your POV and vote for not including any pointers to commercial
> extensions in the official documentation. If at all, they should go to
> 'external-projects.sgml', where PostGIS, PgAdmin and other projects are
> mentioned.
>
> I can't really get excited about the exclusion of the term
> 'replication', because it's what most people are looking for. It's a
> well known term. Sorry if it sounded that way, but I've not meant to
> avoid that term.

OK, I have re-added the term "replication" as appropriate.

> The newly created terms 'Query Broadcast Load Balancing' or even worse
> 'Multi-Master Load Balancing' are more confusing than helpful, because
> these terms do not exist. (See the googlefight in [1])

OK, renamed.

> Can we name the chapter "Fail-over, Load-Balancing and Replication
> Options"? That would fit everything and contain the necessary buzz words.

Yes. Done, "cluster" added too.

> Also, I'm still missing Multi- vs Single-Master, which are also commonly
> used terms.

Yea, not sure how to get those in because it somewhat confuses the
"purpose" of the solution.

> IMHO, it does not make sense to speak of a synchronous replication for a
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

Agreed.  Modified.

> The Data Partitioning paragraph should probably mention its close
> relation to data partitioning across table spaces (and make the
> differences clear).

Uh, so you split I/O load with table spaces.  Uh, that seems too far a
reach to mention here.

> What you call 'Query Broadcast Load Balancing' is also multi-master
> replication, so naming only the latter 'Multi-Master Load Balancing' is
> misleading.

Renamed.

> I'd propose to add a subsection 'Synchronous, Multi-Master Replication'
> and explain the different possibilities on how to do that:
>
> * Query-Based
> * with 2PC
> * Distributed SHMEM
> * (perhaps mention the optimized Postgres-R algorithm ;-)
>
> What you called 'Single-Query Clustering' is probably better known as
> 'Parallel Query Execution'. It can be combined with all types of
> replication (every combination of async / sync and Single- /
> Multi-Master). It's maybe load balancing, but it depends on some form of
> replication to distribute the data first.

Good term.  Added.

> I liked Chris Brown's documentation in [2] which was clearer regarding
> replication (which can be used to do fail-over, load-balancing,
> data-partitioning or parallel query execution). I'd like to keep all
> those things a little more separate to get them clear.

Please let me know how you like the new version at the ftp URL.

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> > I can't really get excited about the exclusion of the term
> > 'replication', because it's what most people are looking for. It's a
> > well known term. Sorry if it sounded that way, but I've not meant to
> > avoid that term.
> <snip>
> > IMHO, it does not make sense to speak of a synchronous replication for a
> > 'Shared Disk Fail Over'. It's not replication, because there's no replica.
>
> Those two statements are at odds with each other, at least based on
> everyone I've ever talked to in a commercial setting. People will use
> terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
> Usually what these folks want is some kind of high-availability
> solution. A few are more concerned with scalability. Sometimes it's a
> combination of both. That's why I think it's good for the chapter to
> deal with both aspects of this.

OK, I did break it out somewhat for clarity.  Let me know how it looks
now.

Re: [HACKERS] Replication documentation addition

From
Markus Schiltknecht
Date:
Hi,

Jim C. Nasby wrote:
> Those two statements are at odds with each other, at least based on
> everyone I've ever talked to in a commercial setting. People will use
> terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
> Usually what these folks want is some kind of high-availability
> solution. A few are more concerned with scalability. Sometimes it's a
> combination of both. That's why I think it's good for the chapter to
> deal with both aspects of this.

Yabut... at least the PostgreSQL manual should use the terms correctly.

And while I do perfectly agree that it's a fail-over solution and it
should be mentioned in that section, I'm arguing that it's not replication.

Regards

Markus

Re: [HACKERS] Replication documentation addition

From
David Fetter
Date:
On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:

> Can we name the chapter "Fail-over, Load-Balancing and Replication
> Options"? That would fit everything and contain the necessary buzz words.
...

> IMHO, it does not make sense to speak of a synchronous replication for a
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

As you point out, there is no replica of the data, but there is some
protection against machine failure, which puts it firmly in the
"Fail-over" part above.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
David Fetter wrote:
> On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
>
> > Can we name the chapter "Fail-over, Load-Balancing and Replication
> > Options"? That would fit everything and contain the necessary buzz words.
> ...
>
> > IMHO, it does not make sense to speak of a synchronous replication for a
> > 'Shared Disk Fail Over'. It's not replication, because there's no replica.
>
> As you point out, there is no replica of the data, but there is some
> protection against machine failure, which puts it firmly in the
> "Fail-over" part above.

Right, but his point was not to call it synchronous.  I have fixed that
in the current version.
