Re: [HACKERS] Causal reads take II - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] Causal reads take II
Msg-id CAEepm=1k0VyP8s1yA56_VBmfoXFrsfFHjFOtQjVO_MbxDukyLA@mail.gmail.com
In response to Re: [HACKERS] Causal reads take II  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] Causal reads take II
List pgsql-hackers
On Fri, Jun 23, 2017 at 11:48 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Apply the patch after first applying a small bug fix for replication
> lag tracking[4].  Then:

That bug fix was committed, so now causal-reads-v17.patch can be
applied directly on top of master.

> 1.  Set up some streaming replicas.
> 2.  Stick causal_reads_max_replay_lag = 2s (or any time you like) in
> the primary's postgresql.conf.
> 3.  Set causal_reads = on in some transactions on various nodes.
> 4.  Try to break it!

Someone asked me off-list how to set this up quickly and easily for
testing.  Here is a shell script that will start up a primary server
(port 5432) and 3 replicas (ports 5441 to 5443).  Set the two paths at
the top of the file before running it.  Log in with psql postgres [-p
<port>], then SET causal_reads = on to test its effect.
causal_reads_max_replay_lag is set to 2s and depending on your
hardware you might find that stuff like CREATE TABLE big_table AS
SELECT generate_series(1, 10000000) or a large COPY data load causes
replicas to be kicked out of the set after a while; you can also pause
replay on the replicas with SELECT pg_wal_replay_pause() and
pg_wal_replay_resume(), kill -STOP/-CONT or -9 the walreceiver
processes to simulate various failure modes, or run the replicas
remotely and unplug the network.  SELECT application_name, replay_lag,
causal_reads_state FROM pg_stat_replication to see the current
situation, and also monitor the primary's LOG messages about
transitions.  You should find that the
"read-your-writes-or-fail-explicitly" guarantee is upheld, no matter
what you do, and furthermore that failing or lagging replicas don't
hold the primary up very long: in the worst case
causal_reads_lease_time for lost contact, and in the best case the
time to exchange a couple of messages with the standby to tell it its
lease is revoked and it should start raising an error.  You might find
test-causal-reads.c[1] useful for testing.
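
As a rough illustration of the timing described above, here is a toy
Python model.  It is not taken from the patch: Standby, LeaseExpired,
causal_read and time_to_drop are hypothetical names, clocks are assumed
to be perfectly synchronized, and revocation is assumed to take effect
the moment the message is sent.  It only shows why the primary waits
about one message round trip for a reachable standby and at most the
remaining lease time for one it has lost contact with.

```python
from dataclasses import dataclass


class LeaseExpired(Exception):
    """Raised by a standby that can no longer promise read-your-writes."""


@dataclass
class Standby:
    name: str
    lease_until: float = 0.0  # causal reads are allowed strictly before this time
    reachable: bool = True

    def causal_read(self, now: float) -> str:
        # Fail explicitly rather than risk returning stale data.
        if now >= self.lease_until:
            raise LeaseExpired(f"{self.name}: not available for causal reads")
        return "fresh data"


def time_to_drop(standby: Standby, now: float, round_trip: float) -> float:
    """Return how long the primary waits before it may stop waiting for `standby`."""
    if standby.reachable:
        # Best case: tell the standby its lease is revoked and wait for the reply.
        standby.lease_until = now
        return round_trip
    # Worst case: contact is lost, so wait for the lease to run out on its own.
    return max(0.0, standby.lease_until - now)


if __name__ == "__main__":
    lagging = Standby("replica1", lease_until=10.0)
    lost = Standby("replica2", lease_until=10.0, reachable=False)
    print(time_to_drop(lagging, now=4.0, round_trip=0.01))
    print(time_to_drop(lost, now=4.0, round_trip=0.01))
```

In this model a dropped standby raises LeaseExpired for every read
issued once the primary has stopped waiting for it, which is the
"fail explicitly" half of the guarantee.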

> Maybe it needs a better name.

Ok, how about this: the feature could be called "synchronous replay".
The new column in pg_stat_replication could be called sync_replay
(like the other sync_XXX columns).  The GUCs could be called
synchronous_replay, synchronous_replay_max_lag and
synchronous_replay_lease_time.  The language in log messages could
refer to standbys "joining the synchronous replay set".

Restating the purpose of the feature with that terminology:  If
synchronous_replay is set to on, then you see the effects of all
synchronous_replay = on transactions that committed before your
transaction began, or an error is raised if that is not possible on
the current node.  This allows applications to direct read-only
queries to read-only replicas for load balancing without seeing stale
data.  Is that clearer?

Restating the relationship with synchronous replication with that
terminology: while synchronous_commit and synchronous_standby_names
are concerned with distributed durability, synchronous_replay is
concerned with distributed visibility.  While the former prevents
commits from returning if the configured level of durability isn't met
(for example "must be flushed on master + any 2 standbys"), the latter
will simply drop any standbys from the synchronous replay set if they
fail or lag more than synchronous_replay_max_lag.  It is reasonable to
want to use both features at once:  my policy on distributed
durability might be that I want all transactions to be flushed to disk
on master + any of three servers before I report information to users,
and my policy on distributed visibility might be that I want to be
able to run read-only queries on any of my six read-only replicas, but
don't want to wait for any that lag by more than 1 second.
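
To make the contrast concrete, here is a small Python sketch of the two
policies as pure functions.  The function names, replica names and lag
values are invented for illustration and are not part of the patch; the
sketch only encodes the rules stated above (a commit waits until enough
standbys have flushed, while a standby is dropped from the synchronous
replay set once it lags by more than the configured maximum).

```python
from typing import Dict, Set


def durability_met(flushed_standbys: Set[str], required: int) -> bool:
    """Durability: the commit may return only once enough standbys have flushed."""
    return len(flushed_standbys) >= required


def sync_replay_set(replay_lag: Dict[str, float], max_lag: float) -> Set[str]:
    """Visibility: keep only the standbys that lag by no more than max_lag seconds."""
    return {name for name, lag in replay_lag.items() if lag <= max_lag}


if __name__ == "__main__":
    # Hypothetical replay lag, in seconds, for six read-only replicas.
    lags = {"r1": 0.2, "r2": 0.4, "r3": 1.5, "r4": 0.9, "r5": 3.0, "r6": 0.1}
    print(sorted(sync_replay_set(lags, max_lag=1.0)))
    # "Must be flushed on master + any 2 standbys".
    print(durability_met({"r1", "r3"}, required=2))
```

Note that a standby can count toward durability (it has flushed the WAL)
while being outside the synchronous replay set (it has not replayed it
recently enough), which is why the two settings are independent.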

Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D3NF%3D7eLkVR2fefVF9bg6RxpZXoQFmOP3RWE4r4iuO7vg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

