Re: Parallell Optimizer - Mailing list pgsql-hackers

From: Ants Aasma
Subject: Re: Parallell Optimizer
Date:
Msg-id: CA+CSw_ugdgJ3E12a_0RCWXzJXPP6HUBPjP-c7U391eNUZjjvLg@mail.gmail.com
In response to: Re: Parallell Optimizer (Tatsuo Ishii <ishii@postgresql.org>)
List: pgsql-hackers
On Thu, Jun 13, 2013 at 3:22 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Parallel query execution doesn't require commits to synchronize all
>> nodes. Parallel execution needs consistent snapshots across all nodes.
>> In effect this means that nodes need to agree on commit ordering,
>> either a total order or a partial order that accounts for causality.
>> Most applications also want the guarantee that once they receive
>> commit confirmation, the next snapshot they take will consider their
>> transaction as committed.
>>
>> Coincidentally, getting cluster-wide consistent snapshots and delaying
>> until some specific point in the commit ordering is almost trivial to
>> solve with the Commit Sequence Number based snapshot scheme that I
>> proposed.
>
> Can you elaborate more on this? Suppose a streaming replication primary
> commits xid = X at time Y. Later on a standby receives WAL including
> transaction X and commits it at time Y + 3 seconds. How can a parallel
> query execution (which uses a snapshot including X) on the standby be
> delayed until Y + 3 seconds?

All commits are tagged with a monotonically increasing commit sequence
number (CSN) in the order that they commit, and snapshots read the
latest CSN value to take note of what has been committed. When
determining visibility for a tuple whose xmin is xid X, you just look
up the CSN value that X committed with and compare it with the
snapshot CSN. If the value is lower, you know the transaction was
committed before the point in time the snapshot was taken; if it is
higher, or the transaction has not committed, you know that the
transaction was concurrent with or later than the snapshot and
consequently not visible. This is the core idea; everything else in
the proposal deals with the technical details of how looking up a CSN
value for an xid works.
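
To make that concrete, here is a minimal toy sketch in C. This is my
illustration only, not the actual proposal; the fixed-size array and
all names are simplifications:

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t CSN;
typedef uint32_t TransactionId;

#define CSN_IN_PROGRESS 0        /* zero-initialized: xid not committed */

static CSN next_csn = 1;         /* global monotonic commit counter */
static CSN csn_log[1024];        /* toy xid -> commit CSN map */

/* Commit: stamp the xid with the next CSN (atomically in real code). */
static void
commit_transaction(TransactionId xid)
{
    csn_log[xid] = next_csn++;
}

/* Snapshot: just read the current counter value. */
static CSN
take_snapshot(void)
{
    return next_csn;
}

/* A tuple with xmin X is visible iff X committed with a CSN below the
 * snapshot's CSN, i.e. before the snapshot was taken. */
static bool
xid_visible_in_snapshot(TransactionId xid, CSN snapshot_csn)
{
    CSN commit_csn = csn_log[xid];
    return commit_csn != CSN_IN_PROGRESS && commit_csn < snapshot_csn;
}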

In a cluster setting you take the CSN value on the master; then,
before starting execution on a standby, you wait until the standby
has replayed enough WAL to reach the CSN point read from the master.
After that you know that everything the snapshot can see has also
been replayed on the standby.
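
Continuing the toy above, the standby-side wait might look like this.
get_last_replayed_csn() is a hypothetical stand-in for "how far WAL
replay has progressed"; pg_usleep() does exist in the backend, though
real code would sleep on a latch instead of polling:

extern CSN get_last_replayed_csn(void);   /* hypothetical */
extern void pg_usleep(long microsec);     /* declared for the sketch */

static CSN
take_cluster_snapshot_on_standby(CSN master_snapshot_csn)
{
    /* Wait until replay catches up to the CSN read on the master;
     * after that, everything the snapshot can see is replayed here. */
    while (get_last_replayed_csn() < master_snapshot_csn)
        pg_usleep(1000);    /* 1ms poll; a latch would be nicer */

    return master_snapshot_csn;
}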

The wait for replication can be optimized if the client takes note of
the CSN that its last transaction committed with and negotiates a new
snapshot across the cluster that is the same or larger, so you only
need to wait until your specific transaction has been replicated.
This allows the replication time to overlap with client think time
between receiving commit confirmation and taking a new snapshot.
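
Sketched from the client's point of view, continuing the toy types
above (the API here is entirely hypothetical):

typedef struct ClusterSession
{
    CSN last_commit_csn;   /* CSN our previous commit was stamped with */
} ClusterSession;

extern CSN commit_and_get_csn(ClusterSession *s);          /* hypothetical */
extern CSN negotiate_snapshot(ClusterSession *s, CSN min); /* hypothetical */

static void
example_flow(ClusterSession *s)
{
    /* Commit on the master and remember the CSN it was stamped with. */
    s->last_commit_csn = commit_and_get_csn(s);

    /* ... client think time; standbys replay WAL in the background ... */

    /* Ask for any snapshot CSN >= our commit's CSN; a standby only
     * has to wait if it has not yet replayed our own commit. */
    (void) negotiate_snapshot(s, s->last_commit_csn);
}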

This scheme could almost work today for streaming replication if you
replace the CSN with the WAL LSN of the commit record. The issue
prohibiting it is that the visibility order of commits on the master
is determined by the order in which committers acquire ProcArrayLock,
and that can differ from the WALInsertLock ordering that determines
the order of LSNs, whereas visibility on the standby is determined
purely by WAL LSN order. For example, if backend A writes its commit
record at a lower LSN than backend B but B acquires ProcArrayLock
first, then B becomes visible before A on the master, while a standby
replaying the WAL makes A visible first.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


