Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date
Msg-id CAEepm=24grE9werAXQTtOFuO6YNKVFVqkOzmnntZfLNzdzqeBQ@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Ants Aasma <ants.aasma@eesti.ee>)
List pgsql-hackers
On Thu, Nov 12, 2015 at 12:10 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
On Wed, Nov 11, 2015 at 11:22 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Nov 11, 2015 at 9:42 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 11/11/2015 10:23 AM, Simon Riggs wrote:
>>> Thanks for working on this issue.
>>
>> +1.

+1. I have seen a lot of interest for something along these lines.

>> I'm thinking the client should get some kind of a token back from the
>> commit, and it could use the token on the standby, to wait for that commit
>> to be applied. The token could be just the XID, or the LSN of the commit
>> record. Or the application could generate the token and pass it to the
>> server in the commit, similar to how 2PC works. So the interaction would be
>> something like:
>>
>> In master:
>> BEGIN;
>> INSERT INTO FOO ...;
>> COMMIT;
>> Server returns: COMMITted with token 1234
>>
>> Later, in standby:
>> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
>> SELECT * FROM foo;
>> ...

To avoid read anomalies (backwards timetravel) it should also be
possible to receive a token from read-only transactions based on the
latest snapshot used.

> My thinking was that the reason for wanting to load balance over a set of
> hot standbys is because you have a very read-heavy workload, so it makes
> sense to tax the writers and leave the many dominant readers unburdened, so
> (3) should be better than (2) for the majority of users who want such a
> configuration.  (Note also that it's not a requirement to tax every write;
> with this proposal you can set causal_reads to off for those transactions
> where you know there is no possibility of a causally dependent read).
>
> As for (1), my thinking was that most application developers would probably
> prefer not to have to deal with that type of interface.  For users who do
> want to do that, it would be comparatively simple to make that possible, and
> would not conflict with this proposal.  This proposal could be used by
> people retrofitting load balancing to an existing applications with relative
> ease, or simply not wanting to have to deal with LSNs and complexity.  (I
> have considered proposing pg_wait_for_xlog_replay_location(lsn, timeout)
> separately, which could be called on a standby with the lsn obtained from
> pg_current_xlog_location() on the primary any time after a COMMIT completes,
> but I was thinking of that as a different feature addressing a different
> user base: people prepared to do more work to squeeze out some extra
> performance.)

Although I still think that 1) is the correct long term solution I
must say that I agree with the reasoning presented. I think we should
review the API in the light that in the future we might have a mix of
clients, some clients that are able to keep track of causality tokens
and either want to wait when a read request arrives, or pick a host to
use based on the token, and then there are "dumb" clients that want to
use write side waits.

Exactly!

I see the causality tokens approach (thank you for that terminology) not so much as a "long term" solution, but rather as an expert feature likely to interest a small number of sophisticated users willing to take on more responsibility in exchange for greater control.  We should definitely add support for that, and I expect the patch would be fairly simple and short.

But I believe the vast majority of users would like to be able to run new and existing plain SQL on any node and see the data they just wrote, with graceful failure modes, and without extra conceptual load or invasive code changes.  So I think we should cater for that mode of usage that too.

Also, it should be possible to configure which standbys are considered
for waiting on. Otherwise a reporting slave will occasionally catch up
enough to be considered "available" and then cause a latency peak when
a long query blocks apply again.

Good point.  Here's a new version which adds the GUC causal_reads_standby_names, defaulting to '*' (but as before, the feature is not activated until you set causal_reads_timeout).  Now you can list standby names explicitly if you want a way to exclude certain standbys.  Also, I noticed that cascaded standbys shouldn't be available for causal reads, so I added a check for that.

--
Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: WIP: Make timestamptz_out less slow.
Next
From: Greg Stark
Date:
Subject: Re: can we add SKIP LOCKED to UPDATE?