Thread: Re: [HACKERS] Replication documentation addition

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
With no new additions submitted today, I have moved my text into our
SGML documentation:

    http://momjian.us/main/writings/pgsql/sgml/failover.html

Please let me know what additional changes are needed.

---------------------------------------------------------------------------

bruce wrote:
> Richard Troy wrote:
> >
> > > Here is a new replication documentation section I want to add for 8.2:
> > >
> > >     ftp://momjian.us/pub/postgresql/mypatches/replication
> > >
> >
> > ...Read the document, as promissed...
> >
> > First paragraph, "(fail over)" is inconsistent with title, "failover", as
> > are other spots throughout the document. The whole document should be
> > consistent and I vote for "failover" and not "fail over."
>
> OK.  Fixed to "failover"
>
> > Fourth paragraph, "This "sync problem" is the fundamental difficulty for
> > servers working together"; "Sync problem" hasn't been defined. Actually,
> > you're talking about the consistent attribute of the "acid" properties of
> > all competent databases: Atomic, Consistency, Isolation, and Durability.
> > At least define the term you are using - probably most easily done in the
> > preceeding paragraph.
>
> OK, "sync problem" term removed, and spelled out fully.
>
> > The fifth paragraph needs a lot more help, I think. Howabout this
> > alternative:
> >
> > So called "two phaised commit" was developed as a strategy in which two or
> > more databases are updated simultaneously and none of the data is
> > committed until all are committed. This guarantees consistency between the
> > databases with all propagation delay being absorbed by the writer at write
> > time. There are times when this propagation delay is large, so sometimes
> > alternatives are worked out which we'll call here "asynchronous updates,"
> > however, in these cases, there is always a window of time in which some
> > transaction can be lost should a failure occurr. For this reason,
> > asynchronous updates are only used when the possibility of such losses is
> > acceptible.
>
> I have modified the paragraph to use some of your terms.
>
> > Paragraphs six through to "shared disk failover" seem very awkward to me.
> > I don't like them at all.
> >
> > "Shared disk failover" has nothing to do with "the sync problem" as it's
> > not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
> > Further, it also has nothing to do with disk arrays, though it is often
> > used with RAID to help avoid disk based corruption problems.
>
> Yes, please see updated version.  I removed the sync problem term from
> there.
>
> > The point about Warm Standby needs to include a warning about WAL that it
> > MUST be sensitive to the semantics of the database design or else it's
> > fatally flawed. I'm talking about "referential integrety". That is to say,
> > it's inappropriate to capture updates on a table by table basis, as some
> > such systems do, (I have no idea what's done by anyone in the PG world on
> > this right now) because an update to one table (esp. inserts) very often
> > go hand in glove with updates in other tables and to get one without the
> > other can corrupt a database.
>
> We don't have that problem.  We recover only full transactions.
>
> > The description of "Continuously running replication server" should
> > include the critical caveat - repeated if you think it's already said
> > elsewhere - that it is ONLY suitable for applications in which a loss of
> > (missing) update data doesn't matter. For example, an airline reservation
> > system would be an inappropriate application for such a "solution" because
> > what seats are available cannot be guaranteed to be correct.
>
> I have added note about data loss for the Slony item.
>
> > Regarding data partitioning, I strongly disagree with the opening sentence
> > in that it doesn't split a database into sets, it splits tables into sets.
>
> OK, changed.
>
> > Data partitioning is often done within a single database on a single
> > server and therefore, as a concept, has nothing whatsoever to do with
> > different servers. Similarly, the second paragraph of this section is
>
> Uh, why would someone split things up like that on a single server?
>
> > problematic. Please define your term first, then talk about some
> > implementations - this is muddying the water. Further, there are both
> > vertical and horizontal partitioning - you mention neither - and each has
> > its own distinct uses. If partitioning is mentioned, it should be more
> > complete.
>
> Uh, what exactly needs to be defined.
>
> > Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> > it's foremost in my memory that sending read queries everywhere and
> > returning the first result set back is a key way to improve application
> > performance at the cost of additional load on other systems - I guess
> > that's not at all what the document is after here, but it's a worthy part
> > of a dialogue on broadcasting queries. In other words, this has more parts
> > to it than just what the document now entertains. Secondly, the document
>
> Uh, do we want to go into that here?  I guess I could.
>
> > doesn't address _at_all_ whether this is a two-phaise-commit environment
> > or not. If not, how are updates managed? If each server operates
> > independently and one of them fails, what do you do then? How do you know
> > _any_ server got an insert/update? ...  Each server _can't_ operate
> > independently unless the application does its own insert/update commits to
> > every one of them - and that can't be fast, nor does it load balance,
> > though it may contribute to superior uptime performance by the
> > application.
>
> I think having the application middle layer do the commits is how it
> works now.  Can someone explain how pgpool works, or should we mention
> how two-phase commit has to be done here?  pgpool2 has additional
> features.
>
> > Next up; I'm not aware of any current products or projects that provide
> > parallel query execution, though Informix might - I can ask a colleague or
> > two. Either way, it's probably best to simply define the term (perhaps in
> > a little more detail), and not mention solutions - they change with time
> > anyway.
>
> Actually, Bizgres MPP, based on PostgreSQL, does this, but mostly for
> read-only queries.
>
> > While I've never used Oracle's clustering tools, I've read up on them and
> > have customers who use them, and I think this description of Oracle
> > clustering is a mis-read on what the Oracle system actually does. A check
> > with a true Oracle clustering expert is in order here.
>
> OK, would someone please comment?
>
> > Hope this helps. If asked, I'm willing to (re)write some of the bits
> > discussed above.
>
> Yes, please review the URL and let me know what else to change.  Thanks.
>
> --
>   Bruce Momjian   bruce@momjian.us
>   EnterpriseDB    http://www.enterprisedb.com
>
>   + If your life is a hard drive, Christ can be your backup. +

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Chris Browne
Date:
bruce@momjian.us (Bruce Momjian) writes:
> With no new additions submitted today, I have moved my text into our
> SGML documentation:
>
>     http://momjian.us/main/writings/pgsql/sgml/failover.html
>
> Please let me know what additional changes are needed.

It's looking a lot improved to me...

There are still numerous places where it needs s/Slony/Slony-I/g
because there is more than one thing out there called "Slony," only
one of which is the single-master-to-multiple-subscribers-asynchronous
replication system...

<http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>

"This can be complex to set up because functions like random() and
CURRENT_TIMESTAMP will have different values on different servers, and
sequences should be consistent across servers."

It doesn't make sense to call this "complex to set up."  This problem
isn't about complexity of setup; it is about whether updates are
processed identically on different hosts.

Perhaps better:

"Query broadcasting can break down such that servers fall out of sync
if the queries have nondeterministic behavior.  For instance,
functions like random(), CURRENT_TIMESTAMP, and
nextval('some_sequence') will take on different values on different
servers.  Care must be taken at the application level to make sure
that queries are all fully deterministic and that they either COMMIT
or ABORT on all servers."

<http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
"24.6. Clustering For Load Balancing

In clustering, each server can accept write requests, and these write
requests are broadcast from the original server to all other servers
before each transaction commits. Under heavy load, this can cause
excessive locking and performance degradation. It is implemented by
Oracle in their RAC product. PostgreSQL does not offer this type of
load balancing, though PostgreSQL two-phase commit can be used to
implement this in application code or middleware."

Something doesn't feel entirely right here...

How about...

"24.6. Multimaster Replication For Load Balancing

In this scenario, each server can accept write requests, which are
broadcast from the original server to all other servers before each
transaction commits in order to ensure consistency.  Unfortunately,
under heavy load, the cost of distributing locks across servers can
lead to substantial performance degradation. It is implemented by
Oracle in their RAC product. PostgreSQL does not offer this type of
load balancing, though PostgreSQL two-phase commit using <xref
linkend="sql-prepare-transaction-title"> and <xref linkend=
"sql-commit-prepared-title"> may be used to implement this in
application code or middleware.

The communications costs involved in distributing locks and writes
have the result that write operations are considerably more expensive
than they would be on a single server.  In general, the cost of
distributed locking means that this clustering approach is only usable
across a cluster of servers at a local site.

There will only be a performance "win" if the cluster mostly processes
read-only traffic that the cluster can distribute across a larger
number of database servers.  Write performance generally degrades a
fair bit as compared to using a single database server.  Reliability
should be enhanced since the cluster should be able to continue work
even if some of the members of the cluster should fail."

<http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>

"24.7. Clustering For Parallel Query Execution

This allows multiple servers to work on a single query. One possible
way this could work is for the data to be split among servers and for
each server to execute its part of the query and results sent to a
central server to be combined and returned to the user. There
currently is no PostgreSQL open source solution for this."

This seems a bit thin.

"24.7. Clustering For Parallel Query Execution

This allows multiple servers to work concurrently on a single query,
analagous to the way RAID permits multiple disk drives to respond
concurrently to disk I/O requests.

One way this could work is for the data to be partitioned across the
servers, where each server executes its part of the query, submitting
results to a central server to be combined and returned to the user.
There currently is no PostgreSQL open source solution for this."
--
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/advocacy.html
Why do we put suits in a garment bag, and put garments in a suitcase?

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Chris Browne wrote:
> bruce@momjian.us (Bruce Momjian) writes:
> > With no new additions submitted today, I have moved my text into our
> > SGML documentation:
> >
> >     http://momjian.us/main/writings/pgsql/sgml/failover.html
> >
> > Please let me know what additional changes are needed.
>
> It's looking a lot improved to me...
>
> There are still numerous places where it needs s/Slony/Slony-I/g
> because there is more than one thing out there called "Slony," only
> one of which is the single-master-to-multiple-subscribers-asynchronous
> replication system...

Fixed.

> <http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>
>
> "This can be complex to set up because functions like random() and
> CURRENT_TIMESTAMP will have different values on different servers, and
> sequences should be consistent across servers."
>
> It doesn't make sense to call this "complex to set up."  This problem
> isn't about complexity of setup; it is about whether updates are
> processed identically on different hosts.
>
> Perhaps better:
>
> "Query broadcasting can break down such that servers fall out of sync
> if the queries have nondeterministic behavior.  For instance,
> functions like random(), CURRENT_TIMESTAMP, and
> nextval('some_sequence') will take on different values on different
> servers.  Care must be taken at the application level to make sure
> that queries are all fully deterministic and that they either COMMIT
> or ABORT on all servers."

I redid the section with:

   Because each server operates independently, functions like
   <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
   sequences can have different values on different servers.  If
   this is unacceptable, applications must query such values from
   a single server and then use those values in write queries.
   Also, care must also be taken that all transactions either commit
   or abort on all servers  Pgpool is an example of this type of
   replication.

> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
> "24.6. Clustering For Load Balancing
>
> In clustering, each server can accept write requests, and these write
> requests are broadcast from the original server to all other servers
> before each transaction commits. Under heavy load, this can cause
> excessive locking and performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit can be used to
> implement this in application code or middleware."
>
> Something doesn't feel entirely right here...
>
> How about...
>
> "24.6. Multimaster Replication For Load Balancing
>
> In this scenario, each server can accept write requests, which are
> broadcast from the original server to all other servers before each
> transaction commits in order to ensure consistency.  Unfortunately,
> under heavy load, the cost of distributing locks across servers can
> lead to substantial performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit using <xref
> linkend="sql-prepare-transaction-title"> and <xref linkend=
> "sql-commit-prepared-title"> may be used to implement this in
> application code or middleware.
>
> The communications costs involved in distributing locks and writes
> have the result that write operations are considerably more expensive
> than they would be on a single server.  In general, the cost of
> distributed locking means that this clustering approach is only usable
> across a cluster of servers at a local site.
>
> There will only be a performance "win" if the cluster mostly processes
> read-only traffic that the cluster can distribute across a larger
> number of database servers.  Write performance generally degrades a
> fair bit as compared to using a single database server.  Reliability
> should be enhanced since the cluster should be able to continue work
> even if some of the members of the cluster should fail."

Your description was too detailed, but I took some of your concepts:

  <para>
   In clustering, each server can accept write requests, and these
   write requests are broadcast from the original server to all
   other servers before each transaction commits.  Heavy write
   activity can cause excessive locking, leading to poor performance.
   In fact, write performance is often worse than that of a single
   server.  Read requests can be sent to any server.  Clustering
   is best for mostly read workloads, though its big advantage is
   that any server can accept write requests --- there is no need
   to partition workloads between read/write and read-only servers.
  </para>

  <para>
   Clustering is implemented by <productname>Oracle</> in their
   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
   does not offer this type of load balancing, though
   <productname>PostgreSQL</> two-phase commit (<xref
   linkend="sql-prepare-transaction-title"> and <xref linkend=
   "sql-commit-prepared-title">) can be used to implement this in
   application code or middleware.
  </para>

>
> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>
>
> "24.7. Clustering For Parallel Query Execution
>
> This allows multiple servers to work on a single query. One possible
> way this could work is for the data to be split among servers and for
> each server to execute its part of the query and results sent to a
> central server to be combined and returned to the user. There
> currently is no PostgreSQL open source solution for this."
>
> This seems a bit thin.
>
> "24.7. Clustering For Parallel Query Execution
>
> This allows multiple servers to work concurrently on a single query,
> analagous to the way RAID permits multiple disk drives to respond
> concurrently to disk I/O requests.
>
> One way this could work is for the data to be partitioned across the
> servers, where each server executes its part of the query, submitting
> results to a central server to be combined and returned to the user.
> There currently is no PostgreSQL open source solution for this."

I took some of your wording:

   This allows multiple servers to work concurrently on a single
   query.  One possible way this could work is for the data to be
   split among servers and for each server to execute its part of
   the query and results sent to a central server to be combined
   and returned to the user.  There currently is no
   <productname>PostgreSQL</> open source solution for this.

Because RAID is often used for high availability, I thought mentioning
it in this context was too complicated.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Jeff Frost
Date:
On Tue, 14 Nov 2006, Bruce Momjian wrote:

> Your description was too detailed, but I took some of your concepts:
>
>  <para>
>   In clustering, each server can accept write requests, and these
>   write requests are broadcast from the original server to all
>   other servers before each transaction commits.  Heavy write
>   activity can cause excessive locking, leading to poor performance.
>   In fact, write performance is often worse than that of a single
>   server.  Read requests can be sent to any server.  Clustering
>   is best for mostly read workloads, though its big advantage is
>   that any server can accept write requests --- there is no need
>   to partition workloads between read/write and read-only servers.
>  </para>
>
>  <para>
>   Clustering is implemented by <productname>Oracle</> in their
>   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
>   does not offer this type of load balancing, though
>   <productname>PostgreSQL</> two-phase commit (<xref
>   linkend="sql-prepare-transaction-title"> and <xref linkend=
>   "sql-commit-prepared-title">) can be used to implement this in
>   application code or middleware.
>  </para>

Bruce,

Continuent's uni/cluster middleware product implements this type of
clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far
we want to get into listing external products.


--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
>
> > Your description was too detailed, but I took some of your concepts:
> >
> >  <para>
> >   In clustering, each server can accept write requests, and these
> >   write requests are broadcast from the original server to all
> >   other servers before each transaction commits.  Heavy write
> >   activity can cause excessive locking, leading to poor performance.
> >   In fact, write performance is often worse than that of a single
> >   server.  Read requests can be sent to any server.  Clustering
> >   is best for mostly read workloads, though its big advantage is
> >   that any server can accept write requests --- there is no need
> >   to partition workloads between read/write and read-only servers.
> >  </para>
> >
> >  <para>
> >   Clustering is implemented by <productname>Oracle</> in their
> >   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
> >   does not offer this type of load balancing, though
> >   <productname>PostgreSQL</> two-phase commit (<xref
> >   linkend="sql-prepare-transaction-title"> and <xref linkend=
> >   "sql-commit-prepared-title">) can be used to implement this in
> >   application code or middleware.
> >  </para>
>
> Bruce,
>
> Continuent's uni/cluster middleware product implements this type of
> clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far
> we want to get into listing external products.

We had a long discussion about that and felt that recommending
commercial products or even every open source project was too much.  The
idea was that we should reference a web page that has them all mentioned,
but no one has set one up yet.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Jeff Frost
Date:
On Tue, 14 Nov 2006, Bruce Momjian wrote:

> Jeff Frost wrote:
>> On Tue, 14 Nov 2006, Bruce Momjian wrote:
>>
>>> Your description was too detailed, but I took some of your concepts:
>>>
>>>  <para>
>>>   In clustering, each server can accept write requests, and these
>>>   write requests are broadcast from the original server to all
>>>   other servers before each transaction commits.  Heavy write
>>>   activity can cause excessive locking, leading to poor performance.
>>>   In fact, write performance is often worse than that of a single
>>>   server.  Read requests can be sent to any server.  Clustering
>>>   is best for mostly read workloads, though its big advantage is
>>>   that any server can accept write requests --- there is no need
>>>   to partition workloads between read/write and read-only servers.
>>>  </para>
>>>
>>>  <para>
>>>   Clustering is implemented by <productname>Oracle</> in their
>>>   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
>>>   does not offer this type of load balancing, though
>>>   <productname>PostgreSQL</> two-phase commit (<xref
>>>   linkend="sql-prepare-transaction-title"> and <xref linkend=
>>>   "sql-commit-prepared-title">) can be used to implement this in
>>>   application code or middleware.
>>>  </para>
>>
>> Bruce,
>>
>> Continuent's uni/cluster middleware product implements this type of
>> clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far
>> we want to get into listing external products.
>
> We had a long discussion about that and felt that recommending
> commercial products or even every open source project was too much.  The
> idea was that we should reference a web page that has them all mentioned,
> but no one has set one up yet.

That makes sense, I just hate to see us say something like "Oracle can do
this with RAC but PostgreSQL cannot."

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jeff Frost wrote:
> >>>  <para>
> >>>   Clustering is implemented by <productname>Oracle</> in their
> >>>   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
> >>>   does not offer this type of load balancing, though
> >>>   <productname>PostgreSQL</> two-phase commit (<xref
> >>>   linkend="sql-prepare-transaction-title"> and <xref linkend=
> >>>   "sql-commit-prepared-title">) can be used to implement this in
> >>>   application code or middleware.
> >>>  </para>
> >>
> >> Bruce,
> >>
> >> Continuent's uni/cluster middleware product implements this type of
> >> clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far
> >> we want to get into listing external products.
> >
> > We had a long discussion about that and felt that recommending
> > commercial products or even every open source project was too much.  The
> > idea was that we should reference a web page that has them all mentioned,
> > but no one has set one up yet.
>
> That makes sense, I just hate to see us say something like "Oracle can do
> this with RAC but PostgreSQL cannot."

Agreed.  I think we would mention any PostgreSQL solution for this, even
if it is not open source.  We mention solutions as examples in this part
of the documentation.

FYI, as far as I know, Continuent's solution is "Query Broadcast Load
Balancing", not clustering.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Jeff Frost
Date:
On Tue, 14 Nov 2006, Bruce Momjian wrote:

>>> We had a long discussion about that and felt that recommending
>>> commercial products or even every open source project was too much.  The
>>> idea was that we should reference a web page that has them all mentioned,
>>> but no one has set one up yet.
>>
>> That makes sense, I just hate to see us say something like "Oracle can do
>> this with RAC but PostgreSQL cannot."
>
> Agreed.  I think we would mention any PostgreSQL solution for this, even
> if it is not open source.  We mention solutions as examples in this part
> of the documentation.
>
> FYI, as far as I know, Continuent's solution is "Query Broadcast Load
> Balancing", not clustering.

I would speculate that your terminology is slightly more accurate than mine.
The do query broadcast, but they also do a bit more with it than that as they
evaluate many of the non deterministic write queries on a particular server
and update the broadcast query so each db gets the same value.

I guess middleware of this sort automatically ends up in the query broadcast
category.  It just sounds awfully similar to the description of cluster for
load balancing:

In clustering, each server can accept write requests, and these write requests
are broadcast from the original server to all other servers before each
transaction commits.

I guess it's kind of a fine line how it gets defined?

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jeff Frost wrote:
> > FYI, as far as I know, Continuent's solution is "Query Broadcast Load
> > Balancing", not clustering.
>
> I would speculate that your terminology is slightly more accurate than mine.
> The do query broadcast, but they also do a bit more with it than that as they
> evaluate many of the non deterministic write queries on a particular server
> and update the broadcast query so each db gets the same value.
>
> I guess middleware of this sort automatically ends up in the query broadcast
> category.  It just sounds awfully similar to the description of cluster for
> load balancing:
>
> In clustering, each server can accept write requests, and these write requests
> are broadcast from the original server to all other servers before each
> transaction commits.
>
> I guess it's kind of a fine line how it gets defined?

Hmmm.  Interesting.  Does anyone else have details or an opinion on
this?  The fact that there is something sitting above the servers seems
to be the defining issue of calling it query broadcast.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Jeff Frost
Date:
On Tue, 14 Nov 2006, Bruce Momjian wrote:

>> In clustering, each server can accept write requests, and these write requests
>> are broadcast from the original server to all other servers before each
>> transaction commits.
>>
>> I guess it's kind of a fine line how it gets defined?
>
> Hmmm.  Interesting.  Does anyone else have details or an opinion on
> this?  The fact that there is something sitting above the servers seems
> to be the defining issue of calling it query broadcast.

My thinking on the definition of clustering was that there is some smarts for
graceful failover and automated or semi-automated ways of bringing failed DB
servers back up to date and online with the rest of the servers in the
cluster.  All servers need to be able to accept writes, but do we
differentiate on where the writes originated (i.e. middleware or another
postgresql server) or on functionality?

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
>
> >> In clustering, each server can accept write requests, and these write requests
> >> are broadcast from the original server to all other servers before each
> >> transaction commits.
> >>
> >> I guess it's kind of a fine line how it gets defined?
> >
> > Hmmm.  Interesting.  Does anyone else have details or an opinion on
> > this?  The fact that there is something sitting above the servers seems
> > to be the defining issue of calling it query broadcast.
>
> My thinking on the definition of clustering was that there is some smarts for
> graceful failover and automated or semi-automated ways of bringing failed DB
> servers back up to date and online with the rest of the servers in the
> cluster.  All servers need to be able to accept writes, but do we

No, even replication servers can have that.

> differentiate on where the writes originated (i.e. middleware or another
> postgresql server) or on functionality?

Fundamentally, broadcast means the queries are being propogated outside
the server, with the benefits and limitations inherent in that.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Jeff Frost
Date:
On Tue, 14 Nov 2006, Bruce Momjian wrote:

>> My thinking on the definition of clustering was that there is some smarts for
>> graceful failover and automated or semi-automated ways of bringing failed DB
>> servers back up to date and online with the rest of the servers in the
>> cluster.  All servers need to be able to accept writes, but do we
>
> No, even replication servers can have that.
>
>> differentiate on where the writes originated (i.e. middleware or another
>> postgresql server) or on functionality?
>
> Fundamentally, broadcast means the queries are being propogated outside
> the server, with the benefits and limitations inherent in that.

I'd definitely have to agree with you on that.  I guess I'm trying to decide
what differentiates clustering for load balancing from query broadcast based
on your text.  Maybe just don't use the word broadcast here:

"In clustering, each server can accept write requests, and these write
requests are broadcast from the original server to all other servers before
each transaction commits."

Unfortunately, I can't seem to come up with anything more clever.

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
>
> >> My thinking on the definition of clustering was that there is some smarts for
> >> graceful failover and automated or semi-automated ways of bringing failed DB
> >> servers back up to date and online with the rest of the servers in the
> >> cluster.  All servers need to be able to accept writes, but do we
> >
> > No, even replication servers can have that.
> >
> >> differentiate on where the writes originated (i.e. middleware or another
> >> postgresql server) or on functionality?
> >
> > Fundamentally, broadcast means the queries are being propogated outside
> > the server, with the benefits and limitations inherent in that.
>
> I'd definitely have to agree with you on that.  I guess I'm trying to decide
> what differentiates clustering for load balancing from query broadcast based
> on your text.  Maybe just don't use the word broadcast here:
>
> "In clustering, each server can accept write requests, and these write
> requests are broadcast from the original server to all other servers before
> each transaction commits."
>
> Unfortunately, I can't seem to come up with anything more clever.

Basically, when you are broadcasting outside the server, you are
broadcasting SQL queries, and those queries do not have information
about non-deterministic functions and have issues with universal commits
on all node.

I think I now see your point about using the word "broadcast" for both
clustering and middle-ware broadcast.  Let me find some new wording and
repost.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Jeff Frost
Date:
On Tue, 14 Nov 2006, Bruce Momjian wrote:

>> "In clustering, each server can accept write requests, and these write
>> requests are broadcast from the original server to all other servers before
>> each transaction commits."
>>
>> Unfortunately, I can't seem to come up with anything more clever.
>
> Basically, when you are broadcasting outside the server, you are
> broadcasting SQL queries, and those queries do not have information
> about non-deterministic functions and have issues with universal commits
> on all node.

Ahh..I like this explanation, because the inter-server communication in
clustering is not necessarily SQL queries.


--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
>
> >> "In clustering, each server can accept write requests, and these write
> >> requests are broadcast from the original server to all other servers before
> >> each transaction commits."
> >>
> >> Unfortunately, I can't seem to come up with anything more clever.
> >
> > Basically, when you are broadcasting outside the server, you are
> > broadcasting SQL queries, and those queries do not have information
> > about non-deterministic functions and have issues with universal commits
> > on all node.
>
> Ahh..I like this explanation, because the inter-server communication in
> clustering is not necessarily SQL queries.

OK, I have updated the documentation with the attached patch, which
clarifies SQL broadcast vs. modified row propogation.  Current version
is at:

    http://momjian.us/main/writings/pgsql/sgml/failover.html

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/failover.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/failover.sgml,v
retrieving revision 1.5
diff -c -c -r1.5 failover.sgml
*** doc/src/sgml/failover.sgml    14 Nov 2006 22:25:15 -0000    1.5
--- doc/src/sgml/failover.sgml    15 Nov 2006 01:06:42 -0000
***************
*** 149,171 ****
    <title>Query Broadcast Load Balancing</title>

    <para>
!    Query broadcast load balancing is accomplished by having a program
!    intercept every query and send it to all servers.  Read-only queries can
!    be sent to a single server because there is no need for all servers to
!    process it.  This is unusual because most replication solutions have
!    each write server propagate its changes to the other servers.  With
!    query broadcasting, each server operates independently.
    </para>

    <para>
!    Because each server operates independently, functions like
     <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
!    sequences can have different values on different servers.  If
!    this is unacceptable, applications must query such values from
!    a single server and then use those values in write queries.
!    Also, care must also be taken that all transactions either commit
!    or abort on all servers  Pgpool is an example of this type of
!    replication.
    </para>
   </sect1>

--- 149,173 ----
    <title>Query Broadcast Load Balancing</title>

    <para>
!    Query broadcast load balancing is accomplished by having a
!    program intercept every SQL query and send it to all servers.
!    This is unique because most replication solutions have the write
!    server propagate its changes to the other servers.  With query
!    broadcasting, each server operates independently.  Read-only
!    queries can be sent to a single server because there is no need
!    for all servers to process it.
    </para>

    <para>
!    One limitation of this solution is that functions like
     <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
!    sequences can have different values on different servers.  This
!    is because each server operates independently, and because SQL
!    queries are broadcast (and not actual modified rows).  If this
!    is unacceptable, applications must query such values from a
!    single server and then use those values in write queries.  Also,
!    care must be taken that all transactions either commit or abort
!    on all servers  Pgpool is an example of this type of replication.
    </para>
   </sect1>

***************
*** 173,186 ****
    <title>Clustering For Load Balancing</title>

    <para>
!    In clustering, each server can accept write requests, and these
!    write requests are broadcast from the original server to all
!    other servers before each transaction commits.  Heavy write
!    activity can cause excessive locking, leading to poor performance.
!    In fact, write performance is often worse than that of a single
     server.  Read requests can be sent to any server.  Clustering
     is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests --- there is no need
     to partition workloads between read/write and read-only servers.
    </para>

--- 175,188 ----
    <title>Clustering For Load Balancing</title>

    <para>
!    In clustering, each server can accept write requests, and modified
!    data is transmitted from the original server to every other
!    server before each transaction commits.  Heavy write activity
!    can cause excessive locking, leading to poor performance.  In
!    fact, write performance is often worse than that of a single
     server.  Read requests can be sent to any server.  Clustering
     is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests — there is no need
     to partition workloads between read/write and read-only servers.
    </para>


Re: [HACKERS] Replication documentation addition

From
Markus Schiltknecht
Date:
Hi,

> Jeff Frost wrote:
>> I would speculate that your terminology is slightly more accurate than mine.

I can't help it, but I'm still thinking the terminology in the
replication documentation is somewhat made up.

Bruce Momjian wrote:
> Hmmm.  Interesting.  Does anyone else have details or an opinion on
> this?  The fact that there is something sitting above the servers seems
> to be the defining issue of calling it query broadcast.

I'd argue that "Query Broadcast Load Balancing" and "Clustering For Load
Balancing" are both the same replication type: sync, multi-master. And
the problem they try to solve is the same (Load Balancing).

Listing them as two different types of replication... I don't know. But
we should at least clearly state that both are sync, multi-master
replication algorithms.

Anyway, instead of mocking around any longer I'm trying to come up with
a better proposal... patch will follow.

Regards

Markus



Re: [HACKERS] Replication documentation addition

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> Hmmm.  Interesting.  Does anyone else have details or an opinion on
> this?  The fact that there is something sitting above the servers
> seems to be the defining issue of calling it query broadcast.

Well, clustering is just a general term for putting several things, say,
computers, together to a common cause.  If you cluster a database
system, you need some way to distribute the incoming requests across
the machines, which you can either do on the network layer or on the
application layer.  Sequoia does the latter.  But I don't see
any "broadcasting" in there as a defining quality.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [HACKERS] Replication documentation addition

From
Jim Nasby
Date:
On Nov 14, 2006, at 5:11 PM, Bruce Momjian wrote:
> Basically, when you are broadcasting outside the server, you are
> broadcasting SQL queries, and those queries do not have information
> about non-deterministic functions and have issues with universal
> commits
> on all node.

That's true of simple query broadcasting (ie: pgpool), but not true
of Continuent/Sequoia. Continuent's software adds a lot of additional
features on top of simple query broadcasting, making it far more
robust than simply spewing queries out to every node in the cluster.
You still have to be very careful with how you use it, but not nearly
as much as with simpler solutions.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: [HACKERS] Replication documentation addition

From
Josh Berkus
Date:
Jim,

> That's true of simple query broadcasting (ie: pgpool), but not true  
> of Continuent/Sequoia. Continuent's software adds a lot of additional  
> features on top of simple query broadcasting, making it far more  
> robust than simply spewing queries out to every node in the cluster.  
> You still have to be very careful with how you use it, but not nearly  
> as much as with simpler solutions.

I think the general term is "statement-based replication", not "broadcasting".

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: [HACKERS] Replication documentation addition

From
Markus Schiltknecht
Date:
Hi,

Josh Berkus wrote:
> I think the general term is "statement-based replication", not "broadcasting".

I agree that this is a better description.

Markus



Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Jim Nasby wrote:
> On Nov 14, 2006, at 5:11 PM, Bruce Momjian wrote:
> > Basically, when you are broadcasting outside the server, you are
> > broadcasting SQL queries, and those queries do not have information
> > about non-deterministic functions and have issues with universal
> > commits
> > on all node.
>
> That's true of simple query broadcasting (ie: pgpool), but not true
> of Continuent/Sequoia. Continuent's software adds a lot of additional
> features on top of simple query broadcasting, making it far more
> robust than simply spewing queries out to every node in the cluster.
> You still have to be very careful with how you use it, but not nearly
> as much as with simpler solutions.

Yes, I have heard that Continuent/Sequoia has a process running on each
server that deals with many of the problems with broadcasting.  Not sure
how I should work that into the documentation.  In fact, based on our
description, the improvements Continuent/Sequoia made are probably
clearer.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Jim,
>
> > That's true of simple query broadcasting (ie: pgpool), but not true ?
> > of Continuent/Sequoia. Continuent's software adds a lot of additional ?
> > features on top of simple query broadcasting, making it far more ?
> > robust than simply spewing queries out to every node in the cluster. ?
> > You still have to be very careful with how you use it, but not nearly ?
> > as much as with simpler solutions.
>
> I think the general term is "statement-based replication", not "broadcasting".

Well, the problem is that you can use a statement-based method to
replication from a master to a slave.  I think MySQL used to use this
method, or still does, so I don't think the term "statement-based" is
clear enough, though I am open to other terms than "broadcast".

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Josh Berkus wrote:
> > Jim,
> >
> > > That's true of simple query broadcasting (ie: pgpool), but not true ?
> > > of Continuent/Sequoia. Continuent's software adds a lot of additional ?
> > > features on top of simple query broadcasting, making it far more ?
> > > robust than simply spewing queries out to every node in the cluster. ?
> > > You still have to be very careful with how you use it, but not nearly ?
> > > as much as with simpler solutions.
> >
> > I think the general term is "statement-based replication", not "broadcasting".
>
> Well, the problem is that you can use a statement-based method to
> replication from a master to a slave.  I think MySQL used to use this
> method, or still does, so I don't think the term "statement-based" is
> clear enough, though I am open to other terms than "broadcast".

Oops, I see Markus Schiltknech likes the term "statement-based
replication" better too.  Certainly master-slave communication using
"statement-based replication" has the same drawbacks as the broadcast
method, but I wanted to highlight that the broadcast was happening
outside the server.  Do we need a master/slave "statement-based
replication" item and a middleware broadcast item?

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Josh Berkus wrote:
> > > Jim,
> > >
> > > > That's true of simple query broadcasting (ie: pgpool), but not true ?
> > > > of Continuent/Sequoia. Continuent's software adds a lot of additional ?
> > > > features on top of simple query broadcasting, making it far more ?
> > > > robust than simply spewing queries out to every node in the cluster. ?
> > > > You still have to be very careful with how you use it, but not nearly ?
> > > > as much as with simpler solutions.
> > >
> > > I think the general term is "statement-based replication", not "broadcasting".
> >
> > Well, the problem is that you can use a statement-based method to
> > replication from a master to a slave.  I think MySQL used to use this
> > method, or still does, so I don't think the term "statement-based" is
> > clear enough, though I am open to other terms than "broadcast".
>
> Oops, I see Markus Schiltknech likes the term "statement-based
> replication" better too.  Certainly master-slave communication using
> "statement-based replication" has the same drawbacks as the broadcast
> method, but I wanted to highlight that the broadcast was happening
> outside the server.  Do we need a master/slave "statement-based
> replication" item and a middleware broadcast item?

OK, new text:

 <varlistentry>
  <term>Statement-Based Replication</term>
  <listitem>

   <para>
    In statement-based replication, a program intercepts every SQL
    query and sends it to all servers.  Each server operates
    independently.  Read-only queries can be sent to a single server
    because there is no need for all servers to process it.
   </para>

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Markus Schiltknecht
Date:
Good morning Bruce,

Bruce Momjian wrote:
>> Oops, I see Markus Schiltknech likes the term "statement-based
>> replication" better too.  Certainly master-slave communication using
>> "statement-based replication" has the same drawbacks as the broadcast
>> method, but I wanted to highlight that the broadcast was happening
>> outside the server.  Do we need a master/slave "statement-based
>> replication" item and a middleware broadcast item?

Ah, I see you had a much narrower definition of statement-based
replication in mind. As I've pointed out, there are different
implementations of 'statement-based replication'. I don't know about
sequoia, but Postgres-R fails back to statement based replication in
certain situations. Thus having an external 'program intercept every SQL
query' is absolutely no necessity of this algorithm, it can very well be
done inside the db backend, where you can better catch non-deterministic
functions... but again, that's an implementation detail.

So, do you want to describe pgpool here or do you want to give a more
general description?

>  <varlistentry>
>   <term>Statement-Based Replication</term>
>   <listitem>
>
>    <para>
>     In statement-based replication, a program intercepts every SQL
>     query and sends it to all servers.  Each server operates
>     independently.  Read-only queries can be sent to a single server
>     because there is no need for all servers to process it.
>    </para>

If you want to go for the general description, I think the 'each server
operates independently' is somewhere between confusing and false. And
again, the last sentence applies to all multi-master replication solutions.

Regards

Markus


Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Markus Schiltknecht wrote:
> Good morning Bruce,
>
> Bruce Momjian wrote:
> >> Oops, I see Markus Schiltknech likes the term "statement-based
> >> replication" better too.  Certainly master-slave communication using
> >> "statement-based replication" has the same drawbacks as the broadcast
> >> method, but I wanted to highlight that the broadcast was happening
> >> outside the server.  Do we need a master/slave "statement-based
> >> replication" item and a middleware broadcast item?
>
> Ah, I see you had a much narrower definition of statement-based
> replication in mind. As I've pointed out, there are different
> implementations of 'statement-based replication'. I don't know about
> sequoia, but Postgres-R fails back to statement based replication in
> certain situations. Thus having an external 'program intercept every SQL
> query' is absolutely no necessity of this algorithm, it can very well be
> done inside the db backend, where you can better catch non-deterministic
> functions... but again, that's an implementation detail.
>
> So, do you want to describe pgpool here or do you want to give a more
> general description?

OK, I have updated the title to be "Statement-Based Replication Using
Middleware".  I personally think statement-based replication only makes
sense in middleware because when you are in the backend, you have more
information and can do things better, either by modifying the statement
or passing actual data rows, like Slony does, so I want to restrict this
to middleware like pgpool, and Usogres, which was an early
implementation of this idea.

> >  <varlistentry>
> >   <term>Statement-Based Replication</term>
> >   <listitem>
> >
> >    <para>
> >     In statement-based replication, a program intercepts every SQL
> >     query and sends it to all servers.  Each server operates
> >     independently.  Read-only queries can be sent to a single server
> >     because there is no need for all servers to process it.
> >    </para>
>
> If you want to go for the general description, I think the 'each server
> operates independently' is somewhere between confusing and false. And
> again, the last sentence applies to all multi-master replication solutions.

Am I OK now?

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Replication documentation addition

From
Markus Schiltknecht
Date:
Bruce Momjian wrote:
> OK, I have updated the title to be "Statement-Based Replication Using
> Middleware".  I personally think statement-based replication only makes
> sense in middleware because when you are in the backend,

I completely agree.

> you have more
> information and can do things better, either by modifying the statement
> or passing actual data rows, like Slony does, so I want to restrict this
> to middleware like pgpool, and Usogres, which was an early
> implementation of this idea.

That's fine and reasonable.

> Am I OK now?

The title and first paragraph are fine.

I'd still say that the second paragraph, about limitations is too pgpool
specific. How's that for sequoia?

And I'm unsure what you mean by mentioning 2PC there. Do you have to
'make sure every transaction commits or aborts' yourself with pgpool? Or
did you just want to mention that pgpool does (and has to do) that for you?

Regards

Markus


Re: [HACKERS] Replication documentation addition

From
Bruce Momjian
Date:
Markus Schiltknecht wrote:
> Bruce Momjian wrote:
> > OK, I have updated the title to be "Statement-Based Replication Using
> > Middleware".  I personally think statement-based replication only makes
> > sense in middleware because when you are in the backend,
>
> I completely agree.
>
> > you have more
> > information and can do things better, either by modifying the statement
> > or passing actual data rows, like Slony does, so I want to restrict this
> > to middleware like pgpool, and Usogres, which was an early
> > implementation of this idea.
>
> That's fine and reasonable.
>
> > Am I OK now?
>
> The title and first paragraph are fine.
>
> I'd still say that the second paragraph, about limitations is too pgpool
> specific. How's that for sequoia?

OK, I made it more open-ended:

    If queries are simply broadcast unmodified, functions like
    <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
    sequences would have different values on different servers.
    This is because each server operates independently, and because
    SQL queries are broadcast (and not actual modified rows).  If
    this is unacceptable, either the middleware or the application
    must query such values from a single server and then use those
    values in write queries.  Also, care must be taken that all
    transactions either commit or abort on all servers, perhaps
    using two-phase commit (<xref linkend="sql-prepare-transaction"
    endterm="sql-prepare-transaction-title"> and <xref
    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">.
    Pgpool is an example of this type of replication.

> And I'm unsure what you mean by mentioning 2PC there. Do you have to
> 'make sure every transaction commits or aborts' yourself with pgpool? Or
> did you just want to mention that pgpool does (and has to do) that for you?

I am not sure pgpool does that, but perhaps it should.  Looking at the
pgpool web site, it seems it does not use 2PC (see replication_strict):

    http://pgpool.projects.postgresql.org/

    replication_mode

        set this true if you are going to use replication functionality.
        Default is false.

    replication_strict

        If true, pgpool will wait for the completion of the master query
        before sending a query to the secondary server. This is the safest and
        default operating mode for pgpool. Default is true.

The HA docs merely say that 2PC might be a good way to keep the servers
consistent.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +