Re: [HACKERS] Causal reads take II - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: [HACKERS] Causal reads take II
Msg-id: CAEepm=1k0VyP8s1yA56_VBmfoXFrsfFHjFOtQjVO_MbxDukyLA@mail.gmail.com
In response to: Re: [HACKERS] Causal reads take II (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] Causal reads take II
List: pgsql-hackers
On Fri, Jun 23, 2017 at 11:48 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Apply the patch after first applying a small bug fix for replication
> lag tracking[4]. Then:

That bug fix was committed, so now causal-reads-v17.patch can be applied directly on top of master.

> 1. Set up some streaming replicas.
> 2. Stick causal_reads_max_replay_lag = 2s (or any time you like) in
> the primary's postgresql.conf.
> 3. Set causal_reads = on in some transactions on various nodes.
> 4. Try to break it!

Someone asked me off-list how to set this up quickly and easily for testing. Here is a shell script that will start up a primary server (port 5432) and 3 replicas (ports 5441 to 5443); a sketch of it appears below. Set the two paths at the top of the file before running it. Log in with psql postgres [-p <port>], then SET causal_reads = on to test its effect.

causal_reads_max_replay_lag is set to 2s, and depending on your hardware you might find that stuff like CREATE TABLE big_table AS SELECT generate_series(1, 10000000) or a large COPY data load causes replicas to be kicked out of the set after a while. You can also pause replay on the replicas with SELECT pg_wal_replay_pause() and pg_wal_replay_resume(), kill -STOP/-CONT or -9 the walreceiver processes to simulate various failure modes, or run the replicas remotely and unplug the network. Run SELECT application_name, replay_lag, causal_reads_state FROM pg_stat_replication to see the current situation, and monitor the primary's LOG messages about transitions. (Example commands for all of this are also sketched below.)

You should find that the "read-your-writes-or-fail-explicitly" guarantee is upheld no matter what you do, and furthermore that failing or lagging replicas don't hold the primary up for very long: in the worst case causal_reads_lease_time for lost contact, and in the best case the time it takes to exchange a couple of messages with the standby to tell it that its lease is revoked and it should start raising an error. You might find test-causal-reads.c[1] useful for testing.

> Maybe it needs a better name.

OK, how about this: the feature could be called "synchronous replay". The new column in pg_stat_replication could be called sync_replay (like the other sync_XXX columns). The GUCs could be called synchronous_replay, synchronous_replay_max_lag and synchronous_replay_lease_time. The language in log messages could refer to standbys "joining the synchronous replay set".

Restating the purpose of the feature with that terminology: if synchronous_replay is set to on, then you see the effects of all synchronous_replay = on transactions that committed before your transaction began, or an error is raised if that is not possible on the current node. This allows applications to direct read-only queries to read-only replicas for load balancing, without seeing stale data. Is that clearer?

Restating the relationship with synchronous replication with that terminology: while synchronous_commit and synchronous_standby_names are concerned with distributed durability, synchronous_replay is concerned with distributed visibility. While the former prevents commits from returning if the configured level of durability isn't met (for example "must be flushed on master + any 2 standbys"), the latter will simply drop any standbys from the synchronous replay set if they fail or lag by more than synchronous_replay_max_lag.
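Here is the promised sketch of that setup script. It is a minimal sketch of the general shape, not the exact script: PGBIN and ROOT are placeholders to adjust, it assumes a patched build, and the pg_hba/trust settings are only suitable for local testing.

#!/bin/sh
# Minimal local test cluster: one primary, three hot standbys.
PGBIN=$HOME/install/bin         # bin directory of a patched build
ROOT=/tmp/causal_reads_test     # scratch directory for the clusters

rm -rf "$ROOT"
mkdir -p "$ROOT"

# Primary on port 5432, with a 2 second replay lag limit.
"$PGBIN"/initdb -D "$ROOT"/primary
cat >> "$ROOT"/primary/postgresql.conf <<EOF
port = 5432
wal_level = replica
max_wal_senders = 10
causal_reads_max_replay_lag = 2s
EOF
echo "host replication all 127.0.0.1/32 trust" >> "$ROOT"/primary/pg_hba.conf
"$PGBIN"/pg_ctl -D "$ROOT"/primary -l "$ROOT"/primary.log -w start

# Three replicas on ports 5441 to 5443.
for PORT in 5441 5442 5443; do
  "$PGBIN"/pg_basebackup -h 127.0.0.1 -p 5432 -D "$ROOT"/replica$PORT
  cat >> "$ROOT"/replica$PORT/postgresql.conf <<EOF
port = $PORT
hot_standby = on
EOF
  cat > "$ROOT"/replica$PORT/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5432 application_name=replica$PORT'
EOF
  "$PGBIN"/pg_ctl -D "$ROOT"/replica$PORT -l "$ROOT"/replica$PORT.log -w start
done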
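And some example commands of the kind described above, using the ports from that script. The table name and row count are just the examples from earlier; what you actually see in causal_reads_state will depend on timing and hardware.

# Generate a burst of WAL on the primary; replicas that can't replay it
# within causal_reads_max_replay_lag should eventually be dropped.
psql postgres -c "CREATE TABLE big_table AS SELECT generate_series(1, 10000000) AS i"

# On a replica, a causal reads transaction either sees the new table or
# raises an error; it never silently returns stale answers.
psql postgres -p 5441 -c "SET causal_reads = on; SELECT count(*) FROM big_table"

# Make one replica lag artificially...
psql postgres -p 5442 -c "SELECT pg_wal_replay_pause()"

# ...watch the primary's view of it change...
psql postgres -c "SELECT application_name, replay_lag, causal_reads_state FROM pg_stat_replication"

# ...then let it catch up and rejoin the set.
psql postgres -p 5442 -c "SELECT pg_wal_replay_resume()"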
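To put that contrast in configuration terms, here is a hypothetical snippet for the primary, reusing ROOT from the script above. The synchronous_replay_max_lag name is only the proposal above (no such GUC exists today), and the standby names and numbers are illustrative:

cat >> "$ROOT"/primary/postgresql.conf <<EOF
# Distributed durability: commits don't return until flushed on the
# master plus any 2 of these standbys.
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
# Distributed visibility: standbys that fail or lag by more than 1s are
# dropped from the synchronous replay set instead of stalling commits.
synchronous_replay_max_lag = 1s
EOF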
It is reasonable to want to use both features at once, as in that last sketch: my policy on distributed durability might be that I want all transactions to be flushed to disk on the master + any of three servers before I report information to users, and my policy on distributed visibility might be that I want to be able to run read-only queries on any of my six read-only replicas, but don't want to wait for any that lag by more than 1 second.

Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D3NF%3D7eLkVR2fefVF9bg6RxpZXoQFmOP3RWE4r4iuO7vg%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com