Re: Parallell Optimizer - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Parallell Optimizer |
Date | |
Msg-id | 20130613.100925.212865444421606387.t-ishii@sraoss.co.jp |
In response to | Re: Parallell Optimizer (Ants Aasma <ants@cybertec.at>) |
List | pgsql-hackers |
> On Thu, Jun 13, 2013 at 3:22 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> Parallel query execution doesn't require commits to synchronize all
>>> nodes. Parallel execution needs consistent snapshots across all
>>> nodes. In effect this means that nodes need to agree on commit
>>> ordering, either a total order or a partial order that accounts for
>>> causality. Most applications also want the guarantee that once they
>>> receive commit confirmation, the next snapshot they take will
>>> consider their transaction as committed.
>>>
>>> Coincidentally, getting cluster-wide consistent snapshots and
>>> delaying until some specific point in commit ordering is almost
>>> trivial to solve with the Commit Sequence Number based snapshot
>>> scheme that I proposed.
>>
>> Can you elaborate more on this? Suppose a streaming replication
>> primary commits xid = X at time Y. Later on, a standby receives WAL
>> including tx X and commits it at time Y + 3 seconds. How can a
>> parallel query execution (which uses a snapshot including X) on the
>> standby be delayed until Y + 3 seconds?
>
> All commits are tagged with a monotonically increasing CSN in the
> order that they are committed, and snapshots read the latest CSN
> value to take notice of what has been committed. When determining
> visibility for a tuple with xmin xid X, you just look up the CSN
> value that X committed with and compare it with the snapshot CSN. If
> the value is lower, you know it was already committed at the point in
> time the snapshot was taken; if it is higher, or the transaction has
> not committed, you know that the transaction was concurrent with or
> later than the snapshot and consequently not visible. This is the
> core idea; everything else in the proposal deals with the technical
> detail of how looking up a CSN value for an xid works.
>
> In a cluster setting you take the CSN value on the master, then
> before starting execution on a standby you wait until the standby has
> replayed enough WAL to reach the CSN point read from the master, and
> you know that after that everything the snapshot can see is also
> replayed on the standby.
>
> The wait for replication can be optimized if the client takes note of
> the CSN that its last transaction committed with and negotiates a new
> snapshot across the cluster that is the same or larger, so you only
> need to wait until your specific transaction has been replicated.
> This allows the replication time to overlap with client think time
> between receiving commit confirmation and taking a new snapshot.
>
> This scheme can almost work now for streaming replication if you
> replace the CSN with the WAL LSN of the commit record. The issue
> prohibiting it is the fact that the visibility order of commits on
> the master is determined by the order in which committers acquire
> ProcArrayLock, and that can be different from the order of
> WALInsertLock that determines the ordering of LSNs, whereas
> visibility on the standby instance is determined purely by WAL LSN
> order.

Thanks for the detailed explanation. The idea of CSN is quite
impressive.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
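To make the quoted scheme concrete, here is a minimal sketch in Python of
the two ideas described above: the CSN-based visibility rule and the wait on
a standby until WAL replay has caught up to a snapshot's CSN. It is only an
illustration under stated assumptions: CSNLog, Standby, xid_visible, the
in-memory xid-to-CSN map, and all method names are hypothetical, not
PostgreSQL internals or anything from the actual proposal, which is largely
about making the xid-to-CSN lookup efficient.

    import threading

    class CSNLog:
        """Toy model mapping committed xids to the CSN they committed with."""

        def __init__(self):
            self._csn_of_xid = {}      # xid -> CSN assigned at commit
            self._next_csn = 1
            self._lock = threading.Lock()

        def commit(self, xid):
            # Tag each commit with the next monotonically increasing CSN.
            with self._lock:
                csn = self._next_csn
                self._next_csn += 1
                self._csn_of_xid[xid] = csn
                return csn

        def take_snapshot(self):
            # A snapshot is just the latest CSN value at the time it is taken.
            with self._lock:
                return self._next_csn - 1

        def csn_of(self, xid):
            # Return the commit CSN of xid, or None if it has not committed.
            with self._lock:
                return self._csn_of_xid.get(xid)

    def xid_visible(csnlog, xid, snapshot_csn):
        # Core visibility rule: xid is visible iff it committed with a CSN at
        # or below the snapshot CSN; anything later, or still uncommitted, was
        # concurrent with or newer than the snapshot and is not visible.
        commit_csn = csnlog.csn_of(xid)
        return commit_csn is not None and commit_csn <= snapshot_csn

    class Standby:
        """Toy standby tracking the highest commit CSN it has replayed."""

        def __init__(self):
            self._replayed_csn = 0
            self._cond = threading.Condition()

        def replay_commit(self, csn):
            # Called as WAL replay applies each commit record.
            with self._cond:
                self._replayed_csn = max(self._replayed_csn, csn)
                self._cond.notify_all()

        def wait_for_snapshot(self, snapshot_csn, timeout=None):
            # Before running a parallel worker here, wait until everything the
            # snapshot can see (commits up to snapshot_csn) has been replayed.
            with self._cond:
                return self._cond.wait_for(
                    lambda: self._replayed_csn >= snapshot_csn, timeout)

    # Example: the master commits xid 100, a snapshot taken afterwards on the
    # master sees it, and a worker on the standby blocks until that commit has
    # been replayed before executing with the same snapshot.
    csnlog = CSNLog()
    standby = Standby()
    csn_x = csnlog.commit(100)                     # commit on the master
    snapshot = csnlog.take_snapshot()              # cluster-wide snapshot CSN
    assert xid_visible(csnlog, 100, snapshot)      # visible on the master
    standby.replay_commit(csn_x)                   # WAL replay reaches commit X
    assert standby.wait_for_snapshot(snapshot, 1)  # safe to run on the standby

The same structure also covers the optimization mentioned above: a client
that remembers the CSN returned by its last commit only has to wait for that
value (or any larger CSN it negotiates for the new snapshot) to be replayed,
rather than for the latest CSN on the master.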