Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Proposal for CSN based snapshots
Date
Msg-id CA+CSw_tcPTmYbLvdyKSdQdWDeC7kHXVDXLXVRnrm_rdTRb1=Xg@mail.gmail.com
In response to Re: Proposal for CSN based snapshots  (Markus Wanner <markus@bluegap.ch>)
List pgsql-hackers
On Wed, Aug 10, 2016 at 6:09 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Hmm. There's one more possible way this could all work. Let's have CSN ==
> LSN, also for asynchronous commits. A snapshot is the current insert
> position, but also make note of the current flush position, when you take a
> snapshot. Now, when you use the snapshot, if you ever see an XID that
> committed between the snapshot's insert position and the flush position,
> wait for the WAL to be flushed up to the snapshot's insert position at that
> point. With that scheme, an asynchronous commit could return to the
> application without waiting for a flush, but if someone actually looks at
> the changes the transaction made, then that transaction would have to wait.
> Furthermore, we could probably skip that waiting too, if the reading
> transaction is also using synchronous_commit=off.
>
> That's slightly different from the current behaviour. A transaction that
> runs with synchronous_commit=on, and reads data that was modified by an
> asynchronous transaction, would take a hit. But I think that would be
> acceptable.

My proposal of vector clocks would allow for pretty much exactly
current behavior.

To simplify, there would be lastSyncCommitSeqNo and
lastAsyncCommitSeqNo variables in ShmemVariableCache. Transaction
commit would choose which one to update based on the
synchronous_commit setting and embed the value of the setting into the
CSN log. Snapshots would contain both values. When checking CSN
visibility, the synchronous_commit value looked up from the CSN log
decides which of the two to compare against. Standbys replaying commit
records would just update both values, resulting in transactions
becoming visible in xlog order, as they do today. The scheme would
also allow for inventing a new xlog record/replication message
communicating visibility ordering.
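
As a rough illustration of the bookkeeping described above, here is a
minimal Python sketch (not PostgreSQL code). The class and method names
(CsnState, Snapshot, commit, replay_commit, is_visible) are made up for
this example; only the two counters and the idea of storing the
synchronous_commit flag next to the CSN come from the description. It
assumes each counter is simply incremented at commit and ignores
locking, subtransactions and in-progress transactions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Snapshot:
    # A snapshot carries both counters, as described in the text.
    sync_csn: int
    async_csn: int


@dataclass
class CsnState:
    # Stand-ins for lastSyncCommitSeqNo / lastAsyncCommitSeqNo.
    last_sync_csn: int = 0
    last_async_csn: int = 0
    # Hypothetical CSN log: xid -> (csn, committed with synchronous_commit on?)
    csn_log: Dict[int, Tuple[int, bool]] = field(default_factory=dict)

    def commit(self, xid: int, synchronous: bool) -> int:
        """Advance the counter selected by the commit's setting and log it."""
        if synchronous:
            self.last_sync_csn += 1
            csn = self.last_sync_csn
        else:
            self.last_async_csn += 1
            csn = self.last_async_csn
        self.csn_log[xid] = (csn, synchronous)
        return csn

    def replay_commit(self, xid: int) -> int:
        """Standby replay: move both counters together, giving xlog order."""
        csn = max(self.last_sync_csn, self.last_async_csn) + 1
        self.last_sync_csn = self.last_async_csn = csn
        # Both counters are equal here, so the stored flag does not matter.
        self.csn_log[xid] = (csn, True)
        return csn

    def take_snapshot(self) -> Snapshot:
        return Snapshot(self.last_sync_csn, self.last_async_csn)

    def is_visible(self, xid: int, snap: Snapshot) -> bool:
        """Compare against the counter matching the logged setting."""
        entry = self.csn_log.get(xid)
        if entry is None:
            return False  # no commit recorded for this xid
        csn, synchronous = entry
        horizon = snap.sync_csn if synchronous else snap.async_csn
        return csn <= horizon
```

The point of the sketch is only that the two classes of commits are
ordered independently on the master, while replay_commit collapses them
into a single order on a standby.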

However, I don't see why inventing a separate CSN concept is a large
problem. Quite the opposite: unless there is a good reason that I'm
missing, it seems better not to unnecessarily conflate commit record
durability and transaction visibility ordering. Not having them tied
together allows for an external source to provide CSN values, allowing
for interesting distributed transaction implementations, e.g. using a
timestamp as the CSN à la Google Spanner and the TrueTime API.
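
To make the decoupling concrete, here is a small Python sketch in which
the CSN log only asks a pluggable source for the next value. All names
(CsnSource, counter_source, timestamp_source, CsnLog) are hypothetical.
The timestamp source is a single-node stand-in that forces values to be
strictly increasing; it does not model the clock uncertainty handling
that a TrueTime-style design needs across nodes.

```python
import time
from typing import Callable, Dict

# A CSN source is anything that hands out increasing integers.
CsnSource = Callable[[], int]


def counter_source() -> CsnSource:
    """Local sequence counter, independent of any WAL position."""
    last = 0

    def next_csn() -> int:
        nonlocal last
        last += 1
        return last

    return next_csn


def timestamp_source(clock: Callable[[], int] = time.time_ns) -> CsnSource:
    """Timestamp-based CSNs; bumps by one if the clock did not advance."""
    last = 0

    def next_csn() -> int:
        nonlocal last
        last = max(last + 1, clock())
        return last

    return next_csn


class CsnLog:
    """Maps xid -> CSN without knowing where the CSN values come from."""

    def __init__(self, next_csn: CsnSource) -> None:
        self._next_csn = next_csn
        self._csn_by_xid: Dict[int, int] = {}

    def commit(self, xid: int) -> int:
        csn = self._next_csn()
        self._csn_by_xid[xid] = csn
        return csn

    def is_visible(self, xid: int, snapshot_csn: int) -> bool:
        csn = self._csn_by_xid.get(xid)
        return csn is not None and csn <= snapshot_csn
```

Swapping counter_source() for timestamp_source() changes nothing in
CsnLog, which is the property that tying CSN to LSN would give up.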

Regards,
Ants Aasma


