Re: Master-slave visibility order - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Master-slave visibility order
Date
Msg-id CA+TgmoZS8ir-k5aNB3XMkMmQ0==8HC1pw9SHJnv4eY8v9=LsnA@mail.gmail.com
In response to Master-slave visibility order  (Ants Aasma <ants@cybertec.at>)
Responses Re: Master-slave visibility order  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
On Wed, Aug 28, 2013 at 10:58 AM, Ants Aasma <ants@cybertec.at> wrote:
> I currently see the following courses of action:
>
> 1. Do nothing about the inconsistency, use a transient global counter
> for master commit order and commit record LSN for slaves.
>    Pro: doesn't change any semantics
>    Con: we are not making any progress towards cluster wide snapshots
> or even serializable transactions on slaves.
>
> 2. Create a new WAL record type that is inserted when a transaction
> becomes visible. LSN of this record determines transaction visibility
> order. Async transactions can be optimized to skip this record. This
> record does not need to be flushed.
>    Pro: cluster wide consistency, replication method agnostic
>    Con: one extra WAL record insertion per writing transaction. (32
> bytes of WAL per tx)
>
> 3. Use a transient global counter on master, send xid-csn pairs to
> slave via a side channel on the replication connection.
>    Pro: Less overhead than WAL records
>    Con: replication protocol needs (possibly invasive) changes, WAL
> shipping based replication can't use this mechanism, lots of extra
> code required.
>
> 4. Make the choice between 1 and 2 user configurable (it seems to me
> that it could even be changed without a restart).
>
> Thoughts?

I think approach #2 is dead on arrival, at least as a default policy.
It essentially amounts to requiring two commit records per transaction
rather than one, and I think that has no chance of being acceptable.
My concern is not primarily the *volume* of WAL.  It is that hitting
WAL twice rather than once at the end of a transaction, which may have
written only one or two WAL records to begin with, is going to slow
things down pretty substantially, especially in high-concurrency
scenarios.

I wouldn't entirely dismiss the idea of changing the user-visible
semantics.  In addition to a WAL insertion pointer and a WAL flush
pointer, you'd have a WAL snapshot pointer, which could run ahead of
the flush pointer if the transactions were all asynchronous, but which
for synchronous transactions could not advance faster than the flush
pointer.  Only users running a mix of synchronous_commit=on and
synchronous_commit=off would be harmed, and maybe we could convince
ourselves that's OK.
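
To make the relationship between the pointers concrete, here is a
minimal sketch of how such a snapshot pointer might advance.  It is a
toy model, not PostgreSQL code: the Commit type, the integer LSNs, and
the snapshot_pointer function are all hypothetical names chosen for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    lsn: int           # hypothetical: WAL position of the commit record
    synchronous: bool  # True if committed with synchronous_commit=on


def snapshot_pointer(commits, flush_ptr):
    """Return the furthest LSN up to which commits could be made visible.

    Assumed rule: the pointer walks commit records in LSN order, may pass
    asynchronous commits that are not yet flushed, and stops in front of
    the first synchronous commit that lies beyond the flush pointer.
    """
    pointer = flush_ptr  # everything already flushed can be shown
    for commit in sorted(commits, key=lambda c: c.lsn):
        if commit.lsn <= pointer:
            continue  # already covered by the pointer
        if commit.synchronous:
            return pointer  # unflushed synchronous commit blocks progress
        pointer = commit.lsn  # asynchronous commit: run ahead of the flush
    return pointer


mixed = [Commit(40, False), Commit(50, True), Commit(60, False)]
before_flush = snapshot_pointer(mixed, flush_ptr=30)
after_flush = snapshot_pointer(mixed, flush_ptr=50)
```

In this model the asynchronous commit at LSN 60 stays invisible until
the synchronous commit at LSN 50 has been flushed, which is the cost a
mixed workload would pay.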

Still, there's no doubt that there is a downside there.  Therefore,
I'm inclined to suggest that you implement #1.  If, at a later time,
we want to make progress on the issue of cluster-wide snapshot
consistency, you could implement #2 or #3 as an optional feature that
can be turned on via some flag.  However, I would recommend against
trying to do that in the initial patch; I think that doing either #2
or #3 is really a separate feature, and I think if you try to
incorporate all of that code into the main CSN patch it's just going
to be a distraction from what figures to be a very complicated patch
even in minimal form.

If you did choose to implement #2 as an option at some point, it would
probably be worth optimizing for the case where commit ordering and
visibility ordering match, and trying to find a design where you only
need the extra WAL record when the orderings don't match.  I'm not
sure exactly how to do that, but it might be worth investigating.  I
don't think that's enough to save #2 as a default behavior, but it
might make it more palatable as an option.
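
As a rough illustration of what "the orderings don't match" could mean,
the sketch below flags transactions that became visible after some
transaction with a later commit record.  The function name and the
(xid, commit_lsn) tuples are assumptions made for this example.  It
only detects divergence; it says nothing about what the extra record
would contain or how a standby would consume it, which remains open.

```python
def out_of_order_commits(visibility_order):
    """Return xids whose visibility lagged behind a later commit record.

    visibility_order: list of (xid, commit_lsn) pairs, in the order the
    transactions became visible on the master (hypothetical input).
    """
    flagged = []
    highest_lsn = -1  # highest commit LSN made visible so far
    for xid, commit_lsn in visibility_order:
        if commit_lsn < highest_lsn:
            # A transaction with a later commit record is already
            # visible, so visibility order differs from commit order.
            flagged.append(xid)
        else:
            highest_lsn = commit_lsn
    return flagged


matching = out_of_order_commits([(100, 10), (101, 20), (102, 30)])
diverging = out_of_order_commits([(100, 10), (102, 30), (101, 20)])
```

When the two orderings agree, the list is empty, which corresponds to
the case where no extra WAL record would be needed.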

I agree with what others have said insofar as it would be nifty if we
could use the commit LSN as the commit sequence number.  But I think
you've put your finger on why that's not likely to work out well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


