Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Msg-id CAEepm=3NUTR1nZ08P31KtY3cUGDbDZTUpt75C3ZsA5ZWzBg2mg@mail.gmail.com
In response to Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
Hi,

Here is a new version of the patch with a few small improvements:

1.  Adopted the term '[read] lease', replacing various hand-wavy language in the comments and code.  That seems to be the established term for this approach[1].

2.  Reduced the stalling time on failure.  When things go wrong with a standby (such as losing contact with it), instead of stalling for a conservative amount of time longer than any lease that might have been granted, the primary now stalls only until the expiry of the last lease that actually was granted to a given dropped standby, which should be sooner.

3.  Fixed a couple of bugs that showed up in testing and review (some bad flow control in the signal handling, and a bug in a circular buffer), and changed the recovery->walreceiver wakeup signal handling to block the signal except while waiting in walrcv_receive.  It didn't seem a good idea to interrupt arbitrary syscalls in walreceiver, so I thought that would be an improvement; but of course that area's going to be reworked by Simon's patch anyway, as discussed elsewhere.
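
To illustrate point 2, here is a minimal sketch in Python (not code from the patch) comparing the old and new stall deadlines.  The names LeaseTracker, max_lease_duration, grant and stall_until_on_drop are hypothetical, the conservative bound is assumed to be "now plus the longest lease that could have been granted", and clock skew between nodes is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LeaseTracker:
    """Hypothetical primary-side bookkeeping of lease expiry times."""
    max_lease_duration: float  # assumed upper bound on any single lease
    last_expiry: Dict[str, float] = field(default_factory=dict)

    def grant(self, standby: str, now: float, duration: float) -> float:
        """Record a lease granted to `standby` and return its expiry time."""
        if duration > self.max_lease_duration:
            raise ValueError("lease longer than the configured maximum")
        expiry = now + duration
        # Remember only the latest expiry actually granted to this standby.
        previous = self.last_expiry.get(standby, expiry)
        self.last_expiry[standby] = max(previous, expiry)
        return expiry

    def conservative_stall_until(self, now: float) -> float:
        """Old behaviour (as assumed here): outlast any lease that might exist."""
        return now + self.max_lease_duration

    def stall_until_on_drop(self, standby: str, now: float) -> float:
        """New behaviour: wait only until the last lease actually granted expires."""
        expiry = self.last_expiry.pop(standby, now)
        return max(now, expiry)


tracker = LeaseTracker(max_lease_duration=10.0)
tracker.grant("standby1", now=100.0, duration=4.0)  # lease expires at 104.0
conservative = tracker.conservative_stall_until(now=101.0)
actual = tracker.stall_until_on_drop("standby1", now=101.0)
print(conservative, actual)
```

In this example contact is lost at time 101.0, so the primary would stall until 104.0 rather than 111.0.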

Restating the central idea using the new terminology:  So long as standbys are replaying fast enough, the primary grants them a series of causal reads leases, allowing them to handle causal reads queries locally without any inter-node communication for a limited time.  A lease is a promise that, before returning from commit, the primary will wait for the standby either to apply the commit record or to be dropped from the set of available causal reads standbys and to know that it has been dropped; this is what upholds the causal reads guarantee.  In the worst case the primary can do that by waiting for the most recently granted lease to expire.
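
The rule above can be sketched roughly as follows.  This is an illustrative Python model only, not the patch's implementation: the Standby fields, commit_may_return and standby_may_serve are hypothetical names, LSNs are modelled as plain integers, and all nodes are assumed to share one clock.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Standby:
    """Hypothetical primary-side view of one causal reads standby."""
    name: str
    applied_lsn: int             # last commit record the standby has applied
    lease_expiry: float          # expiry of the most recently granted lease
    available: bool = True       # still in the set of causal reads standbys
    knows_dropped: bool = False  # has acknowledged being dropped


def standby_may_serve(lease_expiry: float, now: float) -> bool:
    """A standby may answer causal reads queries locally only while leased."""
    return now < lease_expiry


def commit_may_return(commit_lsn: int, standbys: Iterable[Standby], now: float) -> bool:
    """True once every standby has applied the commit or safely lost its lease."""
    for s in standbys:
        if s.available:
            # Still holds leases: it must have applied the commit record.
            if s.applied_lsn < commit_lsn:
                return False
        else:
            # Dropped: it must know it was dropped, or (worst case)
            # its last lease must have expired.
            if not s.knows_dropped and now < s.lease_expiry:
                return False
    return True


caught_up = Standby("a", applied_lsn=200, lease_expiry=110.0)
lost = Standby("b", applied_lsn=150, lease_expiry=104.0, available=False)
print(commit_may_return(200, [caught_up, lost], now=101.0))
print(commit_may_return(200, [caught_up, lost], now=104.0))
```

Here standby "b" has been dropped without acknowledging it, so a commit at LSN 200 cannot return at time 101.0 but can at 104.0, when b's last lease has expired and b can no longer serve causal reads queries.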

I've also attached a couple of things which might be useful when trying the patch out: test-causal-reads.c which can be used to test performance and causality under various conditions, and test-causal-reads.sh which can be used to bring up a primary and a bunch of local hot standbys to talk to.  (In the hope of encouraging people to take the patch for a spin...)

[1] Originally from a well-known 1989 paper on caching, but in the context of databases and synchronous replication see for example the recent papers on "Niobe" and "Paxos Quorum Leases" (especially the reference to Google Megastore).  Of course a *lot* more is going on in those very different algorithms, but at some level "read leases" are being used in some of those consensus database systems to allow local-node-only reads for a limited time while upholding some kind of global consistency guarantee.  I spent a bit of time talking about consistency levels with database guru and former colleague Alex Scotti, who works on a Paxos-based system, and he gave me the initial idea to try out a lease-based consistency system for Postgres streaming replication.  It seems like a very useful point in the space of trade-offs to me.

