Re: Synchronous replay take III - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Synchronous replay take III
Msg-id CA+q6zcV6463xcLXhN9F1p0vLie8JRs=2jZ27XDW5x_62SUesBg@mail.gmail.com
In response to Re: Synchronous replay take III  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Synchronous replay take III  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
> On Thu, Nov 15, 2018 at 6:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Thu, Mar 1, 2018 at 10:40 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> >
> > In previous threads[1][2][3] I called this feature proposal "causal
> > reads".  That was a terrible name, borrowed from MySQL.  While it is
> > probably a useful term of art, for one thing people kept reading it as
> > "casual"

Yeah, it was rather annoying that I couldn't get rid of this while playing
with the "take II" version :)

> To be clear, what did you mean by read-mostly workloads?
>
> I think there are two kinds of reads on standbys: a read that happens
> after writes and a direct read (e.g. reporting). The former usually
> requires causal reads, as you mentioned, in order to read its own
> writes, but the latter might be different: it often wants to read the
> latest data on the master at the time. IIUC even if we send a
> read-only query directly to a synchronous replay server we could get a
> stale result if the standby is delayed by less than
> synchronous_replay_max_lag. So this synchronous replay feature would
> be helpful for the former case (i.e. a few writes and many reads that
> want to see them), whereas for the latter case perhaps keeping the
> reads waiting on the standby seems a reasonable solution.
>
> Also I think it's worth considering the cost of both causal reads
> *and* non-causal reads.
>
> I've considered a mixed workload (transactions requiring causal reads
> and transactions not requiring them) on the current design. IIUC the
> current design is such that we create something like a
> consistent-reads group by specifying servers. For example, if a
> transaction doesn't need causal reads it can send its query to any
> server with synchronous_replay = off, but if it does, it should select
> a synchronous replay server. It also means that client applications or
> routing middleware such as pgpool are required to be aware of the
> available synchronous replay standbys. That is, this design would cost
> the read-only transactions requiring causal reads. On the other hand,
> with token-based causal reads we can send a read-only query to any
> standby if we can wait for the change to be replayed. Of course, if we
> don't want to wait forever we can time out and switch to either
> another standby or the master to execute the query, but we don't need
> to choose a particular server among the standby servers.
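
To make the token-based alternative a bit more concrete, here is a rough
sketch of the routing as I understand it from the description above. It is
only a toy simulation, not anything from the patch: the Standby class, the
integer "LSN" values, replay_rate, and route_causal_read are all hypothetical
names made up for illustration, and a real router would wait on the standby's
actual replay position instead of computing it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Standby:
    """Toy model of a standby (hypothetical, not part of the patch)."""
    name: str
    replayed_lsn: int  # WAL position already applied, as a plain integer
    replay_rate: int   # simulated replay speed, in LSN units per second

    def wait_for_replay(self, token_lsn: int, timeout_s: float) -> bool:
        """Return True if token_lsn is (or would be) replayed within timeout_s."""
        if self.replayed_lsn >= token_lsn:
            return True  # already caught up, no waiting needed
        if self.replay_rate <= 0:
            return False  # replay is stalled, waiting cannot help
        needed_s = (token_lsn - self.replayed_lsn) / self.replay_rate
        if needed_s <= timeout_s:
            self.replayed_lsn = token_lsn  # simulate having waited
            return True
        return False  # would exceed the timeout


def route_causal_read(token_lsn: int, standbys: List[Standby],
                      timeout_s: float, primary: str = "primary") -> str:
    """Pick a server for a read that must see changes up to token_lsn.

    Any standby is acceptable as long as it replays the token in time;
    otherwise try the next standby and finally fall back to the primary.
    """
    for standby in standbys:
        if standby.wait_for_replay(token_lsn, timeout_s):
            return standby.name
    return primary


standbys = [Standby("s1", 100, 10), Standby("s2", 180, 100)]
print(route_causal_read(200, standbys, timeout_s=1.0))
```

The point of the sketch is that the client only carries the token (the
position of its last write) and does not need to know in advance which
standbys belong to a synchronous replay set.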

Unfortunately, cfbot says that the patch can't be applied without conflicts.
Could you please post a rebased version and address the comments from Masahiko?

