Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers
From | Joel Jacobson
Subject | Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Msg-id | CAASwCXf_VZGG4mOd6Cw92R4LS0Pc7W7SrpD+1XNsigTfQ0sV7A@mail.gmail.com
In response to | Proposal: "Causal reads" mode for load balancing reads without stale data (Thomas Munro <thomas.munro@enterprisedb.com>)
List | pgsql-hackers
+1 to both the feature and the concept of how it's implemented. I haven't looked at the code, though. This feature would be very useful for us at Trustly. It would mean we could get rid of an entire system component in our architecture (memcached), which we only use to write data that must be immediately readable on the sync slave after the master commits. The only such data we currently have is the backoffice sessionid, which must be readable on the slave; otherwise the read-only calls that we route to the slave might fail because it's missing.

On Wed, Nov 11, 2015 at 6:37 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> Hi hackers,
>
> Many sites use hot standby servers to spread read-heavy workloads over more hardware, or at least would like to. This works well today if your application can tolerate some time lag on standbys. The problem is that there is no guarantee of when a particular commit will become visible for clients connected to standbys. The existing synchronous commit feature is no help here because it guarantees only that the WAL has been flushed on another server before commit returns. It says nothing about whether it has been applied or whether it has been applied on the standby that you happen to be talking to.
>
> A while ago I posted a small patch[1] to allow synchronous_commit to wait for remote apply on the current synchronous standby, but (as Simon Riggs rightly pointed out in that thread) that part isn't the main problem. It seems to me that the main problem for a practical 'writer waits' system is how to support a dynamic set of servers, gracefully tolerating failures and timing out laggards, while also providing a strong guarantee during any such transitions. Now I would like to propose something to do that, and share a proof-of-concept patch.
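The sessionid use case described above could be sketched like this with the proposed feature; a non-authoritative sketch, assuming the causal_reads GUC from the patch and a hypothetical backoffice_sessions table:

```sql
-- tx1, on the primary: store the session with causal_reads on.
-- (backoffice_sessions is a hypothetical table for illustration.)
SET causal_reads = on;
BEGIN;
INSERT INTO backoffice_sessions (sessionid, userid)
VALUES ('d3adb33f', 42);
COMMIT;

-- tx2, begun on any standby after tx1 returns: per the proposal, either
-- the new row is visible, or the standby raises
-- "standby is not available for causal reads" and the client retries
-- on another server -- no memcached needed.
SET causal_reads = on;
SELECT userid FROM backoffice_sessions WHERE sessionid = 'd3adb33f';
```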
>
> === PROPOSAL ===
>
> The working name for the proposed feature is "causal reads", because it provides a form of "causal consistency"[2] (and "read-your-writes" consistency) no matter which server the client is connected to. There is a similar feature by the same name in another product (albeit implemented differently -- 'reader waits'; more about that later). I'm not wedded to the name.
>
> The feature allows arbitrary read-only transactions to be run on any hot standby, with a specific guarantee about the visibility of preceding transactions. The guarantee is that if you set a new GUC "causal_reads = on" in any pair of consecutive transactions (tx1, tx2) where tx2 begins after tx1 successfully returns, then tx2 will either see tx1 or fail with a new error "standby is not available for causal reads", no matter which server it runs on. A discovery mechanism is also provided, giving an instantaneous snapshot of the set of standbys that are currently available for causal reads (ie won't raise the error), in the form of a new column in pg_stat_replication.
>
> For example, a web server might run tx1 to insert a new row representing a message in a discussion forum on the primary server, and then send the user to another web page that runs tx2 to load all messages in the forum on an arbitrary hot standby server. If causal_reads = on in both tx1 and tx2 (for example, because it's on globally), then tx2 is guaranteed to see the new post, or get a (hopefully rare) error telling the client to retry on another server.
>
> Very briefly, the approach is:
> 1. The primary tracks apply lag on each standby (including between commits).
> 2. The primary deems standbys 'available' for causal reads if they are applying WAL and replying to keepalives fast enough, and periodically sends the standby an authorization to consider itself available for causal reads until a time in the near future.
> 3. Commit on the primary with "causal_reads = on" waits for all 'available' standbys either to apply the commit record, or to cease to be 'available' and begin raising the error if they are still alive (because their authorizations have expired).
> 4. Standbys can start causal reads transactions only while they have an authorization with an expiry time in the future; otherwise they raise an error when an initial snapshot is taken.
>
> In a follow-up email I can write about the design trade-offs considered (mainly 'writer waits' vs 'reader waits'), comparison with some other products, method of estimating replay lag, wait and timeout logic and how it maintains the guarantee in various failure scenarios, logic for standbys joining and leaving, implications of system clock skew between servers, or any other questions you may have, depending on feedback/interest (but see comments in the attached patch for some of those subjects). For now I didn't want to clog up the intertubes with too large a wall of text.
>
> === PROOF-OF-CONCEPT ===
>
> Please see the POC patch attached. It adds two new GUCs. After setting up one or more hot standbys as per usual, simply add "causal_reads_timeout = 4s" to the primary's postgresql.conf and restart. Now, you can set "causal_reads = on" in some/all sessions to get guaranteed causal consistency. Expected behaviour: the causal reads guarantee is maintained at all times, even when you overwhelm, kill, crash, disconnect, restart, pause, add and remove standbys, and the primary drops them from the set it waits for in a timely fashion. You can monitor the system with the replay_lag and causal_reads_status columns in pg_stat_replication and some state transition LOG messages on the primary. (The patch also supports "synchronous_commit = apply", but it's not clear how useful that is in practice, as already discussed.)
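Based only on the GUCs and columns named in the mail, the POC setup and monitoring described above might look roughly like this (a sketch, not verified against the patch):

```sql
-- In the primary's postgresql.conf, then restart (per the POC instructions):
--   causal_reads_timeout = 4s

-- Per session (or globally), on the primary and when reading on standbys:
SET causal_reads = on;

-- Discovery: check from the primary which standbys are currently
-- 'available' for causal reads, using the new columns the patch adds
-- to pg_stat_replication.
SELECT application_name, replay_lag, causal_reads_status
FROM pg_stat_replication;
```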
>
> Lastly, a few notes about how this feature relates to some other work:
>
> The current version of this patch has causal_reads as a feature separate from synchronous_commit, from a user's point of view. The thinking behind this is that load balancing and data loss avoidance are separate concerns: synchronous_commit deals with the latter, and causal_reads with the former. That said, existing SyncRep machinery is obviously used (specifically SyncRep queues, with a small modification, as a way to wait for apply messages to arrive from standbys). (An earlier prototype had causal reads as a new level for synchronous_commit and associated states as new walsender states above 'streaming'. When contemplating how to combine this proposal with the multiple-synchronous-standby patch, some colleagues and I came around to the view that the concerns are separate. The reason for wanting to configure complicated quorum definitions is to control data loss risks and has nothing to do with load balancing requirements, so we thought the features should probably be separate.)
>
> The multiple-synchronous-servers patch[3] could be applied or not independently of this feature as a result of that separation, as it doesn't use synchronous_standby_names or indeed any kind of statically defined quorum.
>
> The standby WAL writer patch[4] would significantly improve walreceiver performance and smoothness, which would work very well with this proposal.
>
> Please let me know what you think!
>
> Thanks,
>
> [1] http://www.postgresql.org/message-id/flat/CAEepm=1fqkivL4V-OTPHwSgw4aF9HcoGiMrCW-yBtjipX9gsag@mail.gmail.com
>
> [2] From http://queue.acm.org/detail.cfm?id=1466448
>
> "Causal consistency. If process A has communicated to process B that it has updated a data item, a subsequent access by process B will return the updated value, and a write is guaranteed to supersede the earlier write. Access by process C that has no causal relationship to process A is subject to the normal eventual consistency rules.
>
> Read-your-writes consistency. This is an important model where process A, after it has updated a data item, always accesses the updated value and will never see an older value. This is a special case of the causal consistency model."
>
> [3] http://www.postgresql.org/message-id/flat/CAOG9ApHYCPmTypAAwfD3_V7sVOkbnECFivmRc1AxhB40ZBSwNQ@mail.gmail.com
>
> [4] http://www.postgresql.org/message-id/flat/CA+U5nMJifauXvVbx=v3UbYbHO3Jw2rdT4haL6CCooEDM5=4ASQ@mail.gmail.com
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Joel Jacobson
Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter