Re: PostgreSQL HA config recommendations - Mailing list pgsql-general

From Fabio Ugo Venchiarutti
Subject Re: PostgreSQL HA config recommendations
Date
Msg-id 5542D0C7.2020003@vuole.me
Whole thread Raw
In response to Re: PostgreSQL HA config recommendations  (William Dunn <dunnwjr@gmail.com>)
Responses Re: PostgreSQL HA config recommendations  (Alex Gregory <alex@c2company.com>)
List pgsql-general
Point taken. William is right.


My recommendations were unusually pessimistic as I didn't take enough
time to assess global+instantaneous data changes visibility requirements
on your part.


If the cluster is ONLY HA and you don't need to read fresh data off
secondary nodes (E.G.: HA+read load balancing), asynchronous is good
enough in most cases.



On 01/05/15 06:37, William Dunn wrote:
> Alex,
> Note that you should be weary of suggestions to make your replication
> synchronous. Synchronous replication is rarely used for this kind of use
> case (Cisco Jabber) where the most complete durability of the standby is
> not of the utmost concern (as it would be in a banking application). Not
> only will it decrease performance, but since you expect to have only one
> local standby it could actually decrease your availability because if
> your standby went down no transactions would be able to commit on the
> master. See the Synchronous Replication section of the docs for more
> details (http://www.postgresql.org/docs/devel/static/warm-standby.html).
>
> Also note that the suggestion provided by Fabio that you should not have
> your application commit more than one transaction per user operation is
> only applicable in synchronous replication (though since this is for a
> Cisco Jabber, where you neither have control over nor much information
> regarding the number of commits sent by the transaction per user
> operation, that suggestion is not applicable anyway...). In the case of
> asynchronous master-slave replication the typical issue with streaming
> replication latency is that you have transactions going to the master
> and then the application sends a read only transaction to the slave
> before the slave receives the transaction. So long as you don't have the
> application consider the user operation completed before all the
> transactions are committed I don't think having multiple transactions
> would make your replication latency issue any less.
>
>     For example, if you had a calendar application where a user enters
>     event details and creates an event for the calendar. The application
>     may be set up to execute 2 transactions, 1) Add the event and
>     details to the calendar events table and 2) once the event creation
>     transaction returns add the current user as an attendee for that
>     event. In this case both transactions would be going against the
>     master, so how far the slave is behind wouldn't be a factor. Of
>     course it would be faster overall to send the inserts as a single
>     database procedure, but that all goes against the master database so
>     the streaming replication is not a factor in that consideration.
>
>
> *William J. Dunn*
> _willjdunn.com <http://willjdunn.com>_
>
> *William J. Dunn*
> *P* 978-844-4427 | _dunnwjr@gmail.com <mailto:dunnwjr@gmail.com>_
> _dunnw@bu.edu <mailto:dunnw@bu.edu>_
>
> On Thu, Apr 30, 2015 at 9:02 AM, Fabio Ugo Venchiarutti <fabio@vuole.me
> <mailto:fabio@vuole.me>> wrote:
>
>     > WAN delays can cause problems for any replication system; you just have
>     > to be aware of that and not push things too hard (or try and violate the
>     > laws of physics). For example, streaming replication set to be
>     > synchronous crossing the planet is something you'd probably be rather
>     > unhappy with. :)
>
>
>     In my experience streaming replication fits most use cases due to
>     inherent its simplicity and robustness, but you might need to adjust
>     your software design to get the best out of it.
>
>
>     More specifically, latency issues can be heavily mitigated by having
>     application software commit no more than one transaction per user
>     operation, provided 1 x "master<->sync_slave round trip time" is
>     acceptable delay when they submit forms or the like.
>
>     It can get much worse if the application server is on a different
>     geographical node than the DB master. In such case it is
>     realistically beneficial to batch multiple write operations in a
>     single STATEMENT instead.
>     If the replication synchronous slave is on yet another node, the
>     best case (single statement) scenario would be 2 x round trip time.
>     This configuration is more common than you might think as some
>     setups feature remote app servers reading off synchronous slaves at
>     their own physical location but committing against a master that is
>     somewhere else.
>
>
>     Cheers
>
>
>
>
>
>
>
>     On 30/04/15 11:06, Jim Nasby wrote:
>
>         On 4/29/15 1:13 PM, Alex Gregory wrote:
>
>             I was thinking that I could use Slony but then I read that
>             it does not
>             like WAN replication.  I have also read about streaming
>             replication
>             native to Postgres but was not sure how that would work over
>             the WAN.
>             Bucardo seems better for Data Warehousing or multimaster
>             situations
>             which this is not.  That leaves pgpool ii which seems like
>             it would
>             add an extra layer of complexity.
>
>
>         WAN delays can cause problems for any replication system; you
>         just have
>         to be aware of that and not push things too hard (or try and
>         violate the
>         laws of physics). For example, streaming replication set to be
>         synchronous crossing the planet is something you'd probably be
>         rather
>         unhappy with. :)
>
>         I haven't played with Slony in forever, but when I did it loved
>         to lock
>         things. That would not play well with high latency.
>
>         I have run londiste between sites within the same city, and that
>         worked
>         well.
>
>         Bucardo and pg_pool are both based on the idea of replaying SQL
>         statements instead of replicating actual data. They have their
>         uses, but
>         I personally distrust that idea, especially for DR.
>
>             When it comes down to to there are so many choices I am not
>             sure if I
>             need one or a combination of two.    Any help you could
>             provide could
>             be greatly appreciated.
>
>
>         If you want to replicate within a data center then streaming
>         replication
>         is pretty nice, and as a bonus you might be able to do
>         synchronous as
>         well. The downside to streaming rep is that it's binary, so if
>         you ever
>         suffer data corruption you're practically guaranteed that corruption
>         will end up on the replica. Logical replication like londiste or
>         Slony
>         are much more robust against that. You also can't use temporary
>         tables
>         with streaming rep, and you have to replicate the details of ALL
>         activity, including maintenance like VACUUM. In some
>         environments that
>         might be slower than logical replication.
>
>
>
>     --
>     Sent via pgsql-general mailing list (pgsql-general@postgresql.org
>     <mailto:pgsql-general@postgresql.org>)
>     To make changes to your subscription:
>     http://www.postgresql.org/mailpref/pgsql-general
>
>


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: psql 8 warm standby strong start, weak finish
Next
From: Mathew Moon
Date:
Subject: Removing and readding bdr nodes