Re: Parallell Optimizer - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Parallell Optimizer
Date
Msg-id 20130613.100925.212865444421606387.t-ishii@sraoss.co.jp
In response to Re: Parallell Optimizer  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
> On Thu, Jun 13, 2013 at 3:22 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> Parallel query execution doesn't require commits to synchronize all
>>> nodes. Parallel execution needs consistent snapshots across all nodes.
>>> In effect this means that nodes need to agree on commit ordering,
>>> either total order or a partial order that accounts for causality.
>>> Most applications also want the guarantee that once they receive
>>> commit confirmation, next snapshot they take will consider their
>>> transaction as committed.
>>>
>>> Coincidentally getting cluster wide consistent snapshots and delaying
>>> until some specific point in commit ordering is almost trivial to
>>> solve with Commit Sequence Number based snapshot scheme that I
>>> proposed.
>>
>> Can you elaborate more on this? Suppose a streaming replication
>> primary commits xid = X at time Y. Later on, a standby receives WAL
>> including transaction X and commits it at time Y + 3 seconds. How can
>> a parallel query execution (which uses a snapshot including X) on the
>> standby be delayed until Y + 3 seconds?
> 
> All commits are tagged with a monotonically increasing CSN in the
> order that they are committed, and snapshots read the latest CSN
> value to take note of what has been committed. When determining
> visibility for a tuple with xmin xid X, you just look up the CSN value
> that X committed with and compare it with the snapshot CSN. If the
> value is lower, you know it was committed before the point in time
> the snapshot was taken; if it is higher, or the transaction has not
> committed, you know that the transaction was concurrent with or later
> than the snapshot and consequently not visible. This is the core idea;
> everything else in the proposal deals with the technical details of
> how looking up a CSN value for an xid works.
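[Editor's sketch: the visibility rule described above, in illustrative Python. The names (`commit_csn_of`, `is_visible`) are hypothetical, not PostgreSQL internals; the mapping from xid to CSN stands in for the lookup structure the proposal describes.]

```python
# Hypothetical sketch of CSN-based visibility. Each committed xid maps
# to the CSN it committed with; a snapshot is simply the CSN current at
# the time the snapshot was taken.

def is_visible(snapshot_csn, commit_csn_of, xmin_xid):
    """Return True if a tuple with xmin == xmin_xid is visible to a
    snapshot taken at snapshot_csn."""
    csn = commit_csn_of.get(xmin_xid)
    if csn is None:
        # Not committed (or aborted): concurrent with or later than
        # the snapshot, hence not visible.
        return False
    # Committed before (or at) the point the snapshot was taken iff
    # its CSN does not exceed the snapshot CSN.
    return csn <= snapshot_csn
```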
> 
> In a cluster setting you take the CSN value on the master, then
> before starting execution on a standby you wait until the standby
> has replayed enough WAL to reach the CSN point read from the master;
> after that, you know that everything the snapshot can see has also
> been replayed on the standby.
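[Editor's sketch: the standby-side wait described above, as an illustrative Python condition variable. `StandbyReplayState` and its methods are hypothetical names; in a real implementation the replay position would be advanced by WAL apply.]

```python
import threading

# Illustrative sketch, not PostgreSQL code: the standby tracks the
# highest CSN it has replayed; a query on the standby waits until
# replay reaches the snapshot CSN taken on the master before starting.

class StandbyReplayState:
    def __init__(self):
        self._cond = threading.Condition()
        self._replayed_csn = 0

    def advance(self, csn):
        # Called by WAL replay as commit records are applied in order.
        with self._cond:
            self._replayed_csn = max(self._replayed_csn, csn)
            self._cond.notify_all()

    def wait_for(self, snapshot_csn, timeout=None):
        # Block until every commit visible at snapshot_csn is replayed.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._replayed_csn >= snapshot_csn, timeout)
```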
> 
> The wait for replication can be optimized if the client takes note of
> the CSN that its last transaction committed with and negotiates a new
> snapshot across the cluster that is the same or larger, so you only
> need to wait until your specific transaction has been replicated.
> This allows the replication time to overlap with client think time
> between receiving commit confirmation and taking a new snapshot.
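[Editor's sketch: the negotiation step above, as a hypothetical helper. The function name and arguments are illustrative; the point is only that the chosen snapshot CSN must cover the client's own last commit, and that no wait is needed when the standby has already replayed past it.]

```python
def negotiate_snapshot_csn(standby_replayed_csn, last_commit_csn):
    """Choose a snapshot CSN for a read on a standby.

    If the standby has already replayed past the client's last commit,
    use its current replay position and start immediately; otherwise
    the client waits only until its own commit CSN is replayed, not
    until the master's newest CSN.
    """
    return max(standby_replayed_csn, last_commit_csn)
```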
> 
> This scheme could almost work today for streaming replication if you
> replaced the CSN with the WAL LSN of the commit record. The issue
> prohibiting it is that the visibility order of commits on the master
> is determined by the order in which committers acquire ProcArrayLock,
> and that can differ from the WALInsertLock order that determines the
> ordering of LSNs, whereas visibility on the standby is determined
> purely by WAL LSN order.

Thanks for the detailed explanation. The idea of CSN is quite impressive.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


