Re: Master-slave visibility order - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Master-slave visibility order
Date
Msg-id CA+CSw_uh0jJ9d+7GXT=Nva2KhkLMxp=ACi1qg1XCx9iMhmHEtQ@mail.gmail.com
In response to Re: Master-slave visibility order  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Master-slave visibility order  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Hi, thanks for your reply.

On Thu, Aug 29, 2013 at 6:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think approach #2 is dead on arrival, at least as a default policy.
> It essentially amounts to requiring two commit records per transaction
> rather than one, and I think that has no chance of being acceptable.
> It's not just or even primarily the *volume* of WAL that I'm concerned
> about so much as the feeling that hitting WAL twice rather than once
> at the end of a transaction that may have only written one or two WAL
> records to begin with is going to slow things down pretty
> substantially, especially in high-concurrency scenarios.

Heikki's excellent work on WAL insert scaling improves this, so the hit
might not be all that big: the visibility record only needs to be
inserted, not flushed, which is relatively cheap compared to a WAL sync.
But it's still not likely to be free. I guess the only way to know for
sure would be to build it and bench it.

> I wouldn't entirely dismiss the idea of changing the user-visible
> semantics.  In addition to a WAL insertion pointer and a WAL flush
> pointer, you'd have a WAL snapshot pointer, which could run ahead of
> the flush pointer if the transactions were all asynchronous, but which
> for synchronous transactions could not advance faster than the flush
> pointer.  Only users running a mix of synchronous_commit=on and
> synchronous_commit=off would be harmed, and maybe we could convince
> ourselves that's OK.

Do you mean that mixed durability workloads with replication would
make async transactions wait or delay the visibility? We have the
additional complication of different synchronous_commit levels, so
this decision also affects different levels of synchronous commits.
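
To make the mixed-durability question concrete, here is a toy Python
model of the three pointers Robert describes. It is only a sketch of one
possible reading of the proposal: the class and method names are
hypothetical, LSNs are plain integers, and it assumes visibility
advances strictly in commit-record order.

```python
from dataclasses import dataclass, field


@dataclass
class WalPointers:
    """Toy model of insert, flush and snapshot pointers (hypothetical names)."""
    insert_lsn: int = 0
    flush_lsn: int = 0
    snapshot_lsn: int = 0
    # (commit LSN, is_sync) for commits that are not yet visible, in WAL order
    pending: list = field(default_factory=list)

    def commit(self, sync: bool) -> int:
        """Insert a commit record and return its LSN."""
        self.insert_lsn += 1
        self.pending.append((self.insert_lsn, sync))
        self._advance()
        return self.insert_lsn

    def flush(self, upto: int) -> None:
        """Mark WAL as durable up to `upto` (never past the insert pointer)."""
        self.flush_lsn = max(self.flush_lsn, min(upto, self.insert_lsn))
        self._advance()

    def _advance(self) -> None:
        # Assumption: visibility follows WAL order. An async commit may become
        # visible before it is flushed; a sync commit may not.
        while self.pending:
            lsn, sync = self.pending[0]
            if sync and lsn > self.flush_lsn:
                break
            self.snapshot_lsn = lsn
            self.pending.pop(0)
```

In this model an async commit that lands behind an unflushed sync commit
stays invisible until the flush completes, which is the delay asked
about above.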

> Still, there's no doubt that there is a downside there.  Therefore,
> I'm inclined to suggest that you implement #1.  If, at a later time,
> we want to make progress on the issue of cluster-wide snapshot
> consistency, you could implement #2 or #3 as an optional feature that
> can be turned on via some flag.  However, I would recommend against
> trying to do that in the initial patch; I think that doing either #2
> or #3 is really a separate feature, and I think if you try to
> incorporate all of that code into the main CSN patch it's just going
> to be a distraction from what figures to be a very complicated patch
> even in minimal form.

I'll go with #1. I agree that snapshot consistency is a separate feature
that is mostly orthogonal to CSN snapshots. I wanted to get this
decision out of the way, so when it's time to discuss the actual patch
we don't have the distraction of discussing why LSNs are not workable
for determining visibility order.

> If you did choose to implement #2 as an option at some point, it would
> probably be worth optimizing for the case where commit ordering and
> visibility ordering match, and try to find a design where you only
> need the extra WAL record when the orderings don't match.  I'm not
> sure exactly how to do that, but it might be worth investigating.  I
> don't think that's enough to save #2 as a default behavior, but it
> might make it more palatable as an option.

Without a side channel the extra WAL record is necessary. Suppose that
we want to determine the ordering with a single commit record. The
slave must be able to deduce from that single record whether it can make
the commit visible immediately or must wait for additional
information. If it waits, that information may never
come, as the master could have committed and then gone idle. If it
doesn't wait, then an async transaction could arrive on the master, commit
and want to become visible, but the master can't make it visible
without either violating the visibility order or making the async
transaction wait behind the sync one. In other words, without an oracle
(in the computer science sense :) ) the master can't determine at the time
of commit record generation whether the orderings will differ, and as WAL is
the only communication channel, neither can the slave. Timeouts won't
help either as that would need clock synchronization between servers,
similarly to Google's F1 system.
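
The argument above can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a description of any actual
code: the function names are hypothetical, async commits are assumed to
become visible on the master immediately, and the sync commit's
acknowledgement is assumed to arrive only after every listed record has
been inserted.

```python
def wal_order(commits):
    """Order in which a slave sees commit records: plain WAL order."""
    return [xid for xid, _sync in commits]


def master_visibility_order(commits):
    """Order in which commits become visible on the master.

    Assumption: async commits become visible as soon as they are inserted,
    while sync commits wait for an acknowledgement that arrives after all
    records in `commits` have been written.
    """
    async_first = [xid for xid, sync in commits if not sync]
    sync_later = [xid for xid, sync in commits if sync]
    return async_first + sync_later


# Two possible histories that begin with the same commit record.
history_idle = [("S", True)]                # master commits S, then goes idle
history_busy = [("S", True), ("A", False)]  # async A commits while S waits
```

Both histories start with the identical record for S, so a slave looking
only at that record cannot tell whether applying it immediately will
match the master's visibility order (`history_idle`) or contradict it
(`history_busy`).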

Speaking of F1, they solve the same problem by having clients be aware
of how fresh they want their snapshot to be. If we add this capability,
clients that use it could shift the visibility wait from commit time to
the start of the next transaction that needs to see the changes.
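
As a rough sketch of what client-driven freshness could look like: the
client remembers the LSN of its last commit and passes it as a lower
bound when it starts a transaction on a standby. Everything here is
hypothetical (the `Standby` class, `replay_to` and `begin_snapshot` are
made-up names, and LSNs are plain integers); it only illustrates moving
the wait from commit to snapshot time.

```python
import threading


class Standby:
    """Toy standby that tracks how far WAL replay has progressed."""

    def __init__(self):
        self._replayed_lsn = 0
        self._cond = threading.Condition()

    def replay_to(self, lsn):
        """Called by the (imaginary) replay process as WAL is applied."""
        with self._cond:
            self._replayed_lsn = max(self._replayed_lsn, lsn)
            self._cond.notify_all()

    def begin_snapshot(self, min_lsn, timeout=None):
        """Wait until replay has reached min_lsn, then return the snapshot LSN.

        Returns None if the timeout expires first.
        """
        with self._cond:
            reached = self._cond.wait_for(
                lambda: self._replayed_lsn >= min_lsn, timeout
            )
            return self._replayed_lsn if reached else None
```

A client that does not care about freshness would pass `min_lsn=0` and
never wait; a client that must see its own earlier commit passes that
commit's LSN and only blocks if the standby is still behind it.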

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


