replication docs: split single vs. multi-master - Mailing list pgsql-patches

From Markus Schiltknecht
Subject replication docs: split single vs. multi-master
Msg-id 455AEF4F.3010304@bluegap.ch
List pgsql-patches
Hi,

as promised on -docs, here is my proposal for improving the
replication documentation. The patches are split as follows and must
be applied in order:

replication_doku_1.diff:

   The smallest possible one-word change, to warm up...


replication_doku_2.diff:

   Moves "Clustering For Parallel Query Execution" down, because
   it's not a replication type but a feature; see the explanation below.


replication_doku_3.diff:

   This is the most important part, splitting all replication types
   into single- and multi-master replication.  I'm new to SGML, so
   please bear with me if this is not the right way to do it...

   "Shared-Disk Failover" does not, IMO, fall into a replication
   category.  Should we mention there that 'sharing' a disk via NFS or
   the like is not recommended?  (And, more importantly, does not work
   as a multi-master replication solution.)

   I've added a general paragraph describing Single-Master Replication.
   I'm stating that 'Single-Master Replication is always asynchronous'.
   Can anybody think of a counterexample? Or a use case for synchronous
   Single-Master Replication? The argument made there is: if you go
   synchronous, why not do Multi-Master right away?

   Most of the "Clustering for Load Balancing" text applies to all
   synchronous Multi-Master Replication algorithms, even to
   "Query Broadcasting". Thus it became the general description
   of Multi-Master Replication. The section "Clustering for
   Load Balancing" has been removed.
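
   As an aside, the query-broadcasting behavior (and the caveat about
   non-deterministic functions that the patch text discusses) can be
   sketched with in-memory stand-ins. Everything below is invented for
   illustration; it is not Pgpool code:

```python
import random

class ToyServer:
    """In-memory stand-in for one database server in the cluster."""
    def __init__(self, seed):
        self.rng = random.Random(seed)   # this server's own random() state
        self.data = {}

    def execute(self, key, value):
        # A callable stands in for a non-deterministic SQL function such
        # as random(); each server evaluates it locally and independently.
        self.data[key] = value(self.rng) if callable(value) else value

class Broadcaster:
    """Middleware that sends every data-modifying statement to all servers."""
    def __init__(self, servers):
        self.servers = servers

    def write(self, key, value):
        for s in self.servers:            # broadcast the statement everywhere
            s.execute(key, value)

    def read(self, key):
        return self.servers[0].data[key]  # any one replica can answer reads

servers = [ToyServer(seed) for seed in (1, 2, 3)]
proxy = Broadcaster(servers)

proxy.write("balance", 100)               # deterministic: all replicas agree
assert all(s.data["balance"] == 100 for s in servers)

proxy.write("token", lambda rng: rng.random())  # non-deterministic: diverge
assert len({s.data["token"] for s in servers}) == 3
```

   The last assertion is exactly the divergence problem: broadcasting the
   SQL text, rather than the resulting rows, lets each server compute its
   own value.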


replication_doku_4.diff:

   These are the text modifications I made to adjust to the new structure.
   I've adjusted the Multi-Master Replication text to really be
   appropriate for all existing solutions.

   "Query Broadcasting" has some corrections, mainly so that it
   describes only that algorithm and none of the general properties of
   Multi-Master Replication.

   I've added two sections to describe the 2PC and Distributed SHMEM
   algorithms, which belong in that category and cover all of the
   previous text.  Except that I've removed the mention of Oracle RAC
   in favor of Pgpool-II.


IMO this makes it clearer what replication types exist and how to
categorize them. I'm tempted to mention the Postgres-R algorithm as a
fourth sub-section of Multi-Master Replication, as it's quite different
from all the others in many respects. But I urgently need to go to
work now... Besides, I'm heavily biased regarding Postgres-R, so
someone else should probably write that paragraph. :-)

The only downside of the structure I'm proposing here is that the
non-replication algorithms fall by the wayside somewhat. Namely: "Shared-Disk
Failover", "Data Partitioning", "Parallel Query Execution" and
"Commercial Solutions".

For me, "Data Partitioning" as well as "Parallel Query Execution" are
possible optimizations which can be run on top of replicated data. They
don't replicate data and are thus not replication solutions. But
grouping those two together would make sense.
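
For what it's worth, the Data-Partitioning scheme (every server holds
all rows, but each partition is writable on exactly one server) can be
sketched like this; the class and server names are invented for
illustration, not taken from any existing solution:

```python
class PartitionedCluster:
    """Every server holds all rows; each partition has one writable owner."""

    def __init__(self, owners):
        # owners maps a partition key to the only server allowed to write
        # it, e.g. {"london": "srv1", "paris": "srv2"}  (names invented).
        self.owners = owners
        self.servers = {name: {} for name in set(owners.values())}

    def write(self, server, partition, key, value):
        if self.owners[partition] != server:
            raise PermissionError(f"{server} may not modify {partition}")
        # The owning server applies the write; replication then pushes the
        # new row to every other server so all of them hold all records.
        for store in self.servers.values():
            store[(partition, key)] = value

    def read(self, server, partition, key):
        # Reads can go to any server, since each one has the full data set.
        return self.servers[server][(partition, key)]

cluster = PartitionedCluster({"london": "srv1", "paris": "srv2"})
cluster.write("srv1", "london", "acct", 42)          # London's owner writes
assert cluster.read("srv2", "london", "acct") == 42  # replicated everywhere
```

The PermissionError path is what makes this partitioning rather than
replication proper: the replication of rows happens underneath, while the
write-ownership rule is enforced on top of it.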

So. I really have to go to work now!

Regards

Markus

*** doc/src/sgml/failover.sgml    2006-11-15 08:52:25.000000000 +0100
--- doc/src/sgml/failover.sgml    2006-11-15 08:58:34.000000000 +0100
***************
*** 126,132 ****
    </para>

    <para>
!    Such partitioning implements both failover and load balancing.  Failover
     is achieved because the data resides on both servers, and this is an
     ideal way to enable failover if the servers share a slow communication
     channel. Load balancing is possible because read requests can go to any
--- 126,132 ----
    </para>

    <para>
!    Such partitioning provides both failover and load balancing.  Failover
     is achieved because the data resides on both servers, and this is an
     ideal way to enable failover if the servers share a slow communication
     channel. Load balancing is possible because read requests can go to any
*** doc/src/sgml/failover.sgml    2006-11-15 08:58:34.000000000 +0100
--- doc/src/sgml/failover.sgml    2006-11-15 09:10:53.000000000 +0100
***************
*** 114,150 ****
    </para>
   </sect1>

-  <sect1 id="data-partitioning">
-   <title>Data Partitioning</title>
-
-   <para>
-    Data partitioning splits tables into data sets.  Each set can only be
-    modified by one server.  For example, data can be partitioned by
-    offices, e.g. London and Paris.  While London and Paris servers have all
-    data records, only London can modify London records, and Paris can only
-    modify Paris records.
-   </para>
-
-   <para>
-    Such partitioning provides both failover and load balancing.  Failover
-    is achieved because the data resides on both servers, and this is an
-    ideal way to enable failover if the servers share a slow communication
-    channel. Load balancing is possible because read requests can go to any
-    of the servers, and write requests are split among the servers.  Of
-    course, the communication to keep all the servers up-to-date adds
-    overhead, so ideally the write load should be low, or localized as in
-    the London/Paris example above.
-   </para>
-
-   <para>
-    Data partitioning is usually handled by application code, though rules
-    and triggers can be used to keep the read-only data sets current.  Slony-I
-    can also be used in such a setup.  While Slony-I replicates only entire
-    tables, London and Paris can be placed in separate tables, and
-    inheritance can be used to access both tables using a single table name.
-   </para>
-  </sect1>
-
   <sect1 id="query-broadcast-load-balancing">
    <title>Query Broadcast Load Balancing</title>

--- 114,119 ----
***************
*** 198,203 ****
--- 167,203 ----
    </para>
   </sect1>

+  <sect1 id="data-partitioning">
+   <title>Data Partitioning</title>
+
+   <para>
+    Data partitioning splits tables into data sets.  Each set can only be
+    modified by one server.  For example, data can be partitioned by
+    offices, e.g. London and Paris.  While London and Paris servers have all
+    data records, only London can modify London records, and Paris can only
+    modify Paris records.
+   </para>
+
+   <para>
+    Such partitioning provides both failover and load balancing.  Failover
+    is achieved because the data resides on both servers, and this is an
+    ideal way to enable failover if the servers share a slow communication
+    channel. Load balancing is possible because read requests can go to any
+    of the servers, and write requests are split among the servers.  Of
+    course, the communication to keep all the servers up-to-date adds
+    overhead, so ideally the write load should be low, or localized as in
+    the London/Paris example above.
+   </para>
+
+   <para>
+    Data partitioning is usually handled by application code, though rules
+    and triggers can be used to keep the read-only data sets current.  Slony-I
+    can also be used in such a setup.  While Slony-I replicates only entire
+    tables, London and Paris can be placed in separate tables, and
+    inheritance can be used to access both tables using a single table name.
+   </para>
+  </sect1>
+
   <sect1 id="clustering-for-parallel-query-execution">
    <title>Clustering For Parallel Query Execution</title>

*** doc/src/sgml/failover.sgml    2006-11-15 09:10:53.000000000 +0100
--- doc/src/sgml/failover.sgml    2006-11-15 11:11:12.000000000 +0100
***************
*** 83,89 ****
    </para>
   </sect1>

!  <sect1 id="warm-standby-using-point-in-time-recovery">
    <title>Warm Standby Using Point-In-Time Recovery</title>

    <para>
--- 83,107 ----
    </para>
   </sect1>

!  <sect1 id="single-master-replication">
!   <indexterm><primary>single-master</></>
!
!   <title>Single-Master Replication</title>
!
!    <para>
!     When only one server processes data-modifying transactions, we
!     speak of a master server and of Single-Master Replication.  It is
!     used to keep one or more slave servers current and therefore to
!     provide quick failover for high availability.
!    </para>
!
!    <para>
!     Single-Master Replication is always asynchronous to allow the master
!     to process data-modifying transactions at full speed, but with the
!     downside of possible data loss during failover.
!    </para>
!
!  <sect2 id="warm-standby-using-point-in-time-recovery">
    <title>Warm Standby Using Point-In-Time Recovery</title>

    <para>
***************
*** 94,102 ****
     made the new master database server.  This is asynchronous and
     can only be done for the entire database server.
    </para>
!  </sect1>

!  <sect1 id="continuously-running-replication-server">
    <title>Continuously Running Replication Server</title>

    <para>
--- 112,120 ----
     made the new master database server.  This is asynchronous and
     can only be done for the entire database server.
    </para>
!  </sect2>

!  <sect2 id="continuously-running-replication-server">
    <title>Continuously Running Replication Server</title>

    <para>
***************
*** 112,121 ****
     granularity.  It updates the backup server in batches, so the replication
     is asynchronous and might lose data during a fail over.
    </para>
   </sect1>

!  <sect1 id="query-broadcast-load-balancing">
!   <title>Query Broadcast Load Balancing</title>

    <para>
     Query broadcast load balancing is accomplished by having a
--- 130,160 ----
     granularity.  It updates the backup server in batches, so the replication
     is asynchronous and might lose data during a fail over.
    </para>
+  </sect2>
+
   </sect1>

!
!  <sect1 id="multi-master-replication">
!
!   <indexterm><primary>multi-master</></>
!
!   <title>Multi-Master Replication</title>
!
!   <para>
!    In clustering, each server can accept write requests, and modified
!    data is transmitted from the original server to every other
!    server before each transaction commits.  Heavy write activity
!    can cause excessive locking, leading to poor performance.  In
!    fact, write performance is often worse than that of a single
!    server.  Read requests can be sent to any server.  Clustering
!    is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests — there is no need
!    to partition workloads between read/write and read-only servers.
!   </para>
!
!  <sect2 id="query-broadcast">
!   <title>Query Broadcasting</title>

    <para>
     Query broadcast load balancing is accomplished by having a
***************
*** 138,170 ****
     care must be taken that all transactions either commit or abort
     on all servers  Pgpool is an example of this type of replication.
    </para>
!  </sect1>
!
!  <sect1 id="clustering-for-load-balancing">
!   <title>Clustering For Load Balancing</title>
!
!   <para>
!    In clustering, each server can accept write requests, and modified
!    data is transmitted from the original server to every other
!    server before each transaction commits.  Heavy write activity
!    can cause excessive locking, leading to poor performance.  In
!    fact, write performance is often worse than that of a single
!    server.  Read requests can be sent to any server.  Clustering
!    is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests — there is no need
!    to partition workloads between read/write and read-only servers.
!   </para>

-   <para>
-    Clustering is implemented by <productname>Oracle</> in their
-    <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
-    does not offer this type of load balancing, though
-    <productname>PostgreSQL</> two-phase commit (<xref
-    linkend="sql-prepare-transaction"
-    endterm="sql-prepare-transaction-title"> and <xref
-    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
-    can be used to implement this in application code or middleware.
-   </para>
   </sect1>

   <sect1 id="data-partitioning">
--- 177,184 ----
     care must be taken that all transactions either commit or abort
     on all servers  Pgpool is an example of this type of replication.
    </para>
!  </sect2>

   </sect1>

   <sect1 id="data-partitioning">
*** doc/src/sgml/failover.sgml    2006-11-15 11:11:12.000000000 +0100
--- doc/src/sgml/failover.sgml    2006-11-15 11:05:58.000000000 +0100
***************
*** 141,181 ****

    <title>Multi-Master Replication</title>

!   <para>
!    In clustering, each server can accept write requests, and modified
!    data is transmitted from the original server to every other
!    server before each transaction commits.  Heavy write activity
!    can cause excessive locking, leading to poor performance.  In
!    fact, write performance is often worse than that of a single
!    server.  Read requests can be sent to any server.  Clustering
!    is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests — there is no need
!    to partition workloads between read/write and read-only servers.
!   </para>

   <sect2 id="query-broadcast">
    <title>Query Broadcasting</title>

    <para>
!    Query broadcast load balancing is accomplished by having a
!    program intercept every SQL query and send it to all servers.
!    This is unique because most replication solutions have the write
!    server propagate its changes to the other servers.  With query
!    broadcasting, each server operates independently.  Read-only
!    queries can be sent to a single server because there is no need
!    for all servers to process it.
    </para>

    <para>
!    One limitation of this solution is that functions like
!    <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
!    sequences can have different values on different servers.  This
!    is because each server operates independently, and because SQL
!    queries are broadcast (and not actual modified rows).  If this
!    is unacceptable, applications must query such values from a
!    single server and then use those values in write queries.  Also,
!    care must be taken that all transactions either commit or abort
!    on all servers  Pgpool is an example of this type of replication.
    </para>
   </sect2>

--- 141,215 ----

    <title>Multi-Master Replication</title>

!    <para>
!     With Multi-Master Replication, each server can accept write requests,
!     and modified data is transmitted to all servers.  Heavy write activity
!     causes network traffic and excessive locking, leading to poor
!     performance, especially for synchronous Multi-Master Replication.  The write
!     performance can therefore be worse than that of a single server. Thus
!     Multi-Master Replication works best for mostly read workloads, though
!     its big advantage is that any server can accept write requests —
!     there is no need to partition workloads between read/write and read-only
!     servers.
!    </para>
!
!    <para>
!     Because every server has a consistent copy (replica) of the data, read
!     requests can be sent to any server.  In asynchronous Multi-Master
!     Replication the servers' replicas diverge until re-synchronized.
!     Likewise, conflicts between committed transactions can only be detected
!     during re-synchronization.  So either the application has to ensure
!     that it does not send conflicting transactions, or the conflicts have
!     to be resolved somehow during re-synchronization of the servers.
!    </para>

   <sect2 id="query-broadcast">
    <title>Query Broadcasting</title>

    <para>
!    Query broadcasting is often accomplished by having a program intercept
!    data-modifying SQL queries and send them to all servers.  With query
!    broadcasting, each server operates independently, thus every
!    data-modifying transaction has to be processed on every server.  Pgpool
!    is an example of this type of replication.
!   </para>
!
!   <para>
!    One challenge in implementing this solution is that functions like
!    <function>random()</> or <function>CURRENT_TIMESTAMP</> can return
!    different values on different servers.  So if only plain SQL gets
!    transmitted, the application has to cope with these differences.
!    Some solutions try to correctly handle these situations by executing
!    the non-deterministic functions on only one server and propagating
!    the result together with the rest of the SQL to the others.
!   </para>
!  </sect2>
!
!  <sect2 id="two-phase-commit-replication">
!   <title>Using Two Phase Commit</title>
!
!   <para>
!    <productname>PostgreSQL</productname> offers two-phase commit
!    (<xref linkend="sql-prepare-transaction"
!    endterm="sql-prepare-transaction-title"> and <xref
!    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
!    which can be used to implement synchronous Multi-Master Replication
!    in application code or middleware.  As with Query Broadcasting, care
!    has to be taken with non-deterministic functions.
    </para>
+  </sect2>
+
+  <sect2 id="distributed-shared-memory-replication">
+   <title>Using Distributed Shared Memory</title>

    <para>
!    A different approach is to use Distributed Shared Memory to propagate
!    the modified data and all the locks of data-modifying transactions.
!    This can be done at the level of the operating system, e.g. with
!    OpenMosix, or within the database itself.  However, both methods
!    generate a lot of network traffic and do not scale very well for that
!    reason.  Pgpool-II is an effort to implement Distributed Shared Memory
!    Replication for PostgreSQL.
    </para>
   </sect2>

