Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date
Msg-id CAASwCXf_VZGG4mOd6Cw92R4LS0Pc7W7SrpD+1XNsigTfQ0sV7A@mail.gmail.com
In response to Proposal: "Causal reads" mode for load balancing reads without stale data  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
+1 to both the feature and the concept of how it's implemented.
Haven't looked at the code though.

This feature would be very useful for us at Trustly.
It would let us get rid of an entire system component in our
architecture (memcached), which we use only to write data that must
be immediately readable on the sync slave after the master commits.
The only such data we currently have is the backoffice session ID, which
must be readable on the slave; otherwise the read-only calls that we
route to the slave might fail because the session is missing.

On Wed, Nov 11, 2015 at 6:37 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>
> Hi hackers,
>
> Many sites use hot standby servers to spread read-heavy workloads over
> more hardware, or at least would like to. This works well today if your
> application can tolerate some time lag on standbys. The problem is that
> there is no guarantee of when a particular commit will become visible for
> clients connected to standbys. The existing synchronous commit feature is
> no help here because it guarantees only that the WAL has been flushed on
> another server before commit returns. It says nothing about whether it has
> been applied or whether it has been applied on the standby that you happen
> to be talking to.
>
> A while ago I posted a small patch[1] to allow synchronous_commit to wait
> for remote apply on the current synchronous standby, but (as Simon Riggs
> rightly pointed out in that thread) that part isn't the main problem. It
> seems to me that the main problem for a practical 'writer waits' system is
> how to support a dynamic set of servers, gracefully tolerating failures
> and timing out laggards, while also providing a strong guarantee during
> any such transitions. Now I would like to propose something to do that,
> and share a proof-of-concept patch.
>
>
> === PROPOSAL ===
>
> The working name for the proposed feature is "causal reads", because it
> provides a form of "causal consistency"[2] (and "read-your-writes"
> consistency) no matter which server the client is connected to. There is
> a similar feature by the same name in another product (albeit implemented
> differently -- 'reader waits'; more about that later). I'm not wedded to
> the name.
>
> The feature allows arbitrary read-only transactions to be run on any hot
> standby, with a specific guarantee about the visibility of preceding
> transactions. The guarantee is that if you set a new GUC
> "causal_reads = on" in any pair of consecutive transactions (tx1, tx2)
> where tx2 begins after tx1 successfully returns, then tx2 will either see
> tx1 or fail with a new error "standby is not available for causal reads",
> no matter which server it runs on. A discovery mechanism is also provided,
> giving an instantaneous snapshot of the set of standbys that are currently
> available for causal reads (i.e. won't raise the error), in the form of a
> new column in pg_stat_replication.
>
> For example, a web server might run tx1 to insert a new row representing
> a message in a discussion forum on the primary server, and then send the
> user to another web page that runs tx2 to load all messages in the forum
> on an arbitrary hot standby server. If causal_reads = on in both tx1 and
> tx2 (for example, because it's on globally), then tx2 is guaranteed to see
> the new post, or get a (hopefully rare) error telling the client to retry
> on another server.
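
The "retry on another server" behaviour in the example above might look
roughly like the sketch below on the client side. It is only an
illustration, not part of the patch: CausalReadsUnavailable,
run_read_with_retry and fake_read are hypothetical names, and a real client
would have to recognise the proposed "standby is not available for causal
reads" error as reported by its database driver.

```python
class CausalReadsUnavailable(Exception):
    """Stands in for the proposed "standby is not available for causal reads" error."""


def run_read_with_retry(run_on, servers):
    """Try the read on each server in order; return the first successful result.

    run_on(server) is a caller-supplied (hypothetical) function that runs the
    read-only transaction with causal_reads = on and raises
    CausalReadsUnavailable when that server refuses to serve it.
    """
    last_error = None
    for server in servers:
        try:
            return run_on(server)
        except CausalReadsUnavailable as exc:
            # This standby cannot give the guarantee right now; try the next one.
            last_error = exc
    raise last_error if last_error else RuntimeError("no servers given")


# Toy stand-in for a real query: standby1 refuses, any other server answers.
def fake_read(server):
    if server == "standby1":
        raise CausalReadsUnavailable(server)
    return ["new forum post"]
```

Calling run_read_with_retry(fake_read, ["standby1", "standby2"]) skips the
refusing standby and returns the rows from standby2. If every server
refuses, the last error is re-raised so the caller can fall back to the
primary.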
>
> Very briefly, the approach is:
> 1.  The primary tracks apply lag on each standby (including between commits).
> 2.  The primary deems standbys 'available' for causal reads if they are
>     applying WAL and replying to keepalives fast enough, and periodically
>     sends the standby an authorization to consider itself available for
>     causal reads until a time in the near future.
> 3.  Commit on the primary with "causal_reads = on" waits for all
>     'available' standbys either to apply the commit record, or to cease to
>     be 'available' and begin raising the error if they are still alive
>     (because their authorizations have expired).
> 4.  Standbys can start causal reads transactions only while they have an
>     authorization with an expiry time in the future; otherwise they raise
>     an error when an initial snapshot is taken.
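
The four steps above can be modelled as a toy simulation. This is a sketch
under strong assumptions, not the patch's implementation: the Standby and
Primary classes, the grant_leases and commit_may_return methods, and the
`healthy` argument are all hypothetical, a single shared clock is assumed
(so clock skew is ignored), and WAL positions are plain integers.

```python
class Standby:
    def __init__(self, name):
        self.name = name
        self.applied_lsn = 0     # last WAL position this standby has replayed
        self.lease_expiry = 0.0  # authorization is valid until this time

    def begin_causal_read(self, now):
        # Step 4: refuse unless an unexpired authorization is held.
        if now >= self.lease_expiry:
            raise RuntimeError("standby is not available for causal reads")
        return self.applied_lsn


class Primary:
    def __init__(self, standbys, lease_length):
        self.standbys = standbys
        self.lease_length = lease_length
        self.granted_until = {}  # standby name -> expiry of the last lease sent

    def grant_leases(self, now, healthy):
        # Steps 1-2: `healthy` names the standbys currently keeping up.
        # Laggards simply get no renewal; their old lease runs out by itself.
        for s in self.standbys:
            if s.name in healthy:
                s.lease_expiry = now + self.lease_length
                self.granted_until[s.name] = s.lease_expiry

    def commit_may_return(self, commit_lsn, now):
        # Step 3: each standby must have applied the commit, or its last
        # lease must have expired (so it is already raising the error).
        for s in self.standbys:
            applied = s.applied_lsn >= commit_lsn
            lease_expired = now >= self.granted_until.get(s.name, 0.0)
            if not (applied or lease_expired):
                return False
        return True


# Two standbys get a 4-second lease at t=0; only "a" replays the commit.
a, b = Standby("a"), Standby("b")
primary = Primary([a, b], lease_length=4.0)
primary.grant_leases(now=0.0, healthy={"a", "b"})
a.applied_lsn = 100
```

In this model the commit at position 100 cannot return at t=1 because "b"
still holds a valid lease without having applied it. Once that lease has
expired at t=4, the commit may return, and "b" refuses causal reads from
then on, so no client can observe the stale state.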
>
> In a follow-up email I can write about the design trade-offs considered
> (mainly 'writer waits' vs 'reader waits'), comparison with some other
> products, method of estimating replay lag, wait and timeout logic and how
> it maintains the guarantee in various failure scenarios, logic for
> standbys joining and leaving, implications of system clock skew between
> servers, or any other questions you may have, depending on
> feedback/interest (but see comments in the attached patch for some of
> those subjects). For now I didn't want to clog up the intertubes with too
> large a wall of text.
>
>
> === PROOF-OF-CONCEPT ===
>
> Please see the POC patch attached. It adds two new GUCs. After setting up
> one or more hot standbys as per usual, simply add
> "causal_reads_timeout = 4s" to the primary's postgresql.conf and restart.
> Now, you can set "causal_reads = on" in some/all sessions to get
> guaranteed causal consistency. Expected behaviour: the causal reads
> guarantee is maintained at all times, even when you overwhelm, kill,
> crash, disconnect, restart, pause, add and remove standbys, and the
> primary drops them from the set it waits for in a timely fashion. You can
> monitor the system with the replay_lag and causal_reads_status in
> pg_stat_replication and some state transition LOG messages on the primary.
> (The patch also supports "synchronous_commit = apply", but it's not clear
> how useful that is in practice, as already discussed.)
>
> Lastly, a few notes about how this feature relates to some other work:
>
> The current version of this patch has causal_reads as a feature separate
> from synchronous_commit, from a user's point of view. The thinking behind
> this is that load balancing and data loss avoidance are separate concerns:
> synchronous_commit deals with the latter, and causal_reads with the
> former. That said, existing SyncRep machinery is obviously used
> (specifically SyncRep queues, with a small modification, as a way to wait
> for apply messages to arrive from standbys). (An earlier prototype had
> causal reads as a new level for synchronous_commit and associated states
> as new walsender states above 'streaming'. When contemplating how to
> combine this proposal with the multiple-synchronous-standby patch, some
> colleagues and I came around to the view that the concerns are separate.
> The reason for wanting to configure complicated quorum definitions is to
> control data loss risks and has nothing to do with load balancing
> requirements, so we thought the features should probably be separate.)
>
> The multiple-synchronous-servers patch[3] could be applied or not
> independently of this feature as a result of that separation, as it
> doesn't use synchronous_standby_names or indeed any kind of statically
> defined quorum.
>
> The standby WAL writer patch[4] would significantly improve walreceiver
> performance and smoothness which would work very well with this proposal.
>
> Please let me know what you think!
>
> Thanks,
>
>
> [1] http://www.postgresql.org/message-id/flat/CAEepm=1fqkivL4V-OTPHwSgw4aF9HcoGiMrCW-yBtjipX9gsag@mail.gmail.com
>
> [2] From http://queue.acm.org/detail.cfm?id=1466448
>
> "Causal consistency. If process A has communicated to process B that it
> has updated a data item, a subsequent access by process B will return the
> updated value, and a write is guaranteed to supersede the earlier write.
> Access by process C that has no causal relationship to process A is
> subject to the normal eventual consistency rules.
>
> Read-your-writes consistency. This is an important model where process A,
> after it has updated a data item, always accesses the updated value and
> will never see an older value. This is a special case of the causal
> consistency model."
>
> [3] http://www.postgresql.org/message-id/flat/CAOG9ApHYCPmTypAAwfD3_V7sVOkbnECFivmRc1AxhB40ZBSwNQ@mail.gmail.com
>
> [4] http://www.postgresql.org/message-id/flat/CA+U5nMJifauXvVbx=v3UbYbHO3Jw2rdT4haL6CCooEDM5=4ASQ@mail.gmail.com
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter


