Re: Parallell Optimizer - Mailing list pgsql-hackers
| From | Ants Aasma |
| --- | --- |
| Subject | Re: Parallell Optimizer |
| Date | |
| Msg-id | CA+CSw_ugdgJ3E12a_0RCWXzJXPP6HUBPjP-c7U391eNUZjjvLg@mail.gmail.com |
| In response to | Re: Parallell Optimizer (Tatsuo Ishii <ishii@postgresql.org>) |
| Responses | Re: Parallell Optimizer, Re: Parallell Optimizer |
| List | pgsql-hackers |
On Thu, Jun 13, 2013 at 3:22 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Parallel query execution doesn't require commits to synchronize all
>> nodes. Parallel execution needs consistent snapshots across all nodes.
>> In effect this means that nodes need to agree on commit ordering,
>> either a total order or a partial order that accounts for causality.
>> Most applications also want the guarantee that once they receive
>> commit confirmation, the next snapshot they take will consider their
>> transaction as committed.
>>
>> Coincidentally, getting cluster-wide consistent snapshots and delaying
>> until some specific point in commit ordering is almost trivial to
>> solve with the Commit Sequence Number based snapshot scheme that I
>> proposed.
>
> Can you elaborate more on this? Suppose a streaming replication primary
> commits xid = X at time Y. Later on a standby receives WAL including tx
> X and commits it at time Y + 3 seconds. How can a parallel query
> execution (which uses a snapshot including X) on the standby be delayed
> until Y + 3 seconds?

All commits are tagged with a monotonically increasing CSN in the order that they are committed, and snapshots read the latest CSN value to take note of what has been committed. When determining visibility for a tuple with xmin xid X, you just look up the CSN value that X committed with and compare it with the snapshot CSN. If the value is lower, you know it was committed at the point in time the snapshot was taken; if it is higher, or the transaction has not committed, you know that the transaction was concurrent with or later than the snapshot and consequently not visible. This is the core idea; everything else in the proposal deals with the technical detail of how looking up a CSN value for an xid works.

In a cluster setting you take the CSN value on the master, then before starting execution on a standby you wait until the standby has replayed enough WAL to reach the CSN point read from the master. After that you know that everything the snapshot can see has also been replayed on the standby.

The wait for replication can be optimized if the client takes note of the CSN that its last transaction committed with and negotiates a new snapshot across the cluster that is the same or larger, so you only need to wait until the point where your specific transaction has been replicated. This allows the replication time to overlap with client think time between receiving commit confirmation and taking a new snapshot.

This scheme can almost work now for streaming replication if you replace the CSN with the WAL LSN of the commit record. The issue prohibiting it is that the visibility order of commits on the master is determined by the order in which committers acquire ProcArrayLock, and that can be different from the order of WALInsertLock acquisition that determines the ordering of LSNs, whereas visibility on the slave instance is determined purely by WAL LSN order.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
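To make the visibility rule described in the reply concrete, here is a minimal, self-contained sketch in C. It is not PostgreSQL code: the array-based csn_log, CommitTransaction, GetSnapshot and TupleVisible are hypothetical simplifications for a single node with a toy xid-to-CSN lookup; the actual proposal has to make that lookup efficient and add the cluster-wide replay wait discussed above.

```c
/*
 * Minimal single-node sketch of CSN-based snapshot visibility.
 * Assumptions (not from the original mail): a fixed-size in-memory
 * csn_log indexed by xid, and a snapshot that records the last
 * assigned CSN, so "committed at or before the snapshot" is <=.
 */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define CSN_IN_PROGRESS 0   /* xid has not committed (or aborted) */
#define MAX_XID 1024        /* toy bound, real code needs a proper structure */

static CommitSeqNo csn_log[MAX_XID];   /* xid -> CSN it committed with */
static CommitSeqNo next_csn = 1;       /* monotonically increasing counter */

typedef struct
{
    CommitSeqNo snapshot_csn;          /* latest CSN at snapshot time */
} Snapshot;

/* Commit tags the xid with the next CSN, in commit order. */
static void
CommitTransaction(TransactionId xid)
{
    csn_log[xid] = next_csn++;
}

/* Taking a snapshot just reads the current CSN high-water mark. */
static Snapshot
GetSnapshot(void)
{
    Snapshot snap = { next_csn - 1 };
    return snap;
}

/*
 * Core rule from the mail: a tuple whose xmin committed with a CSN at or
 * below the snapshot's CSN is visible; anything that committed later, or
 * has not committed at all, was concurrent with or later than the
 * snapshot and is therefore not visible.
 */
static bool
TupleVisible(TransactionId xmin, Snapshot snap)
{
    CommitSeqNo xmin_csn = csn_log[xmin];

    if (xmin_csn == CSN_IN_PROGRESS)
        return false;
    return xmin_csn <= snap.snapshot_csn;
}

int
main(void)
{
    CommitTransaction(100);            /* commits with CSN 1 */
    Snapshot snap = GetSnapshot();     /* snapshot sees CSN 1 */
    CommitTransaction(101);            /* commits with CSN 2, after the snapshot */

    printf("xid 100 visible: %d\n", TupleVisible(100, snap));  /* 1 */
    printf("xid 101 visible: %d\n", TupleVisible(101, snap));  /* 0 */
    return 0;
}
```

In the cluster case the same snapshot CSN would additionally serve as the wait point: a standby would delay query start until WAL replay has reached that CSN, which is the delay Tatsuo asked about.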