Re: Parallell Optimizer - Mailing list pgsql-hackers
From | Ants Aasma |
---|---|
Subject | Re: Parallell Optimizer |
Date | |
Msg-id | CA+CSw_vCBYbzzbz9LuGscGvkU8ZPbKVZ6Ou3jd7S78aOyRqH6g@mail.gmail.com |
In response to | Re: Parallell Optimizer (Hannu Krosing <hannu@2ndQuadrant.com>) |
List | pgsql-hackers |
On Thu, Jun 13, 2013 at 11:39 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> Coincidentally, getting cluster-wide consistent snapshots and delaying
>>> until some specific point in commit ordering is almost trivial to
>>> solve with the Commit Sequence Number based snapshot scheme that I
>>> proposed.
>> Can you elaborate more on this? Suppose a streaming replication primary
>> commits xid = X at time Y. Later on, a standby receives WAL including
>> tx X and commits it at time Y + 3 seconds. How can a parallel query
>> execution (which uses a snapshot including X) on the standby be delayed
>> until Y + 3 seconds?
> I do not think that CSNs change anything basic here, as CSNs
> are still local to each node.

I was mainly talking about what would be needed to support parallel
queries in a single-master configuration.

> What you need is the ability to ask each node to wait until XID
> is replicated to it.
>
> Unless you have some central XID/snapshot source, there is
> no global absolute XID order. That is, there may be a transaction
> which is committed on node A and not yet on node B, and at the
> same time a transaction which is committed on node B and not
> yet on node A.
>
> So to get a consistent snapshot "after X is committed" in multi-master,
> you need some coordination and possibly compromises w.r.t. a "single
> point in time".
>
> Time in multi-master replication is relativistic, that is, the order
> of events may depend on where the observer is :)

You can get total commit ordering and a non-relativistic database with
reasonably low synchronization overhead. You will need a central
coordinator that keeps track of the latest commit sequence number
assigned and the largest commit sequence number guaranteed to have
finished committing. Snapshots are assigned from the latter number; the
value can be cached by nodes, as any number less than the actual value
is guaranteed to yield a consistent snapshot. Check out the concurrency
control of Google's Spanner database [1] for ideas on how this can be
done with less consistency while avoiding the single point of failure.

A central coordinator won't work for multi-master scenarios where
individual masters need to be able to receive commits even with
communication failures. In that case a relativistic view is
unavoidable.

No replication solution is a silver bullet. Some people want simple
scale-out for performance without having to deal with the complexity of
an inconsistent view of the database, while others need geographic
distribution and resilience to network problems. It's fundamentally
impossible to provide both with the same solution.

[1] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/spanner-osdi2012.pdf
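To make the coordinator idea concrete, here is a minimal sketch of what
such a CSN coordinator could look like. All names are invented for
illustration (this is not PostgreSQL code), it is single-threaded for
clarity, and a real coordinator would additionally need locking,
durable state, and handling of commits that finish out of order:

```c
/*
 * Hypothetical sketch of a central CSN coordinator. Tracks the latest
 * CSN handed out and the largest CSN guaranteed to have finished
 * committing; snapshots are taken from the latter.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t CommitSeqNo;

typedef struct CsnCoordinator
{
    CommitSeqNo last_assigned;  /* latest CSN handed out to a committer */
    CommitSeqNo max_committed;  /* largest CSN known to be committed */
} CsnCoordinator;

/* A committing transaction asks for the next CSN in the global order. */
static CommitSeqNo
csn_assign(CsnCoordinator *c)
{
    return ++c->last_assigned;
}

/*
 * Once the commit with the given CSN is known durable, advance the
 * "guaranteed committed" horizon. Commits finish in CSN order here for
 * simplicity; out-of-order completion would need a queue of pending CSNs.
 */
static void
csn_report_committed(CsnCoordinator *c, CommitSeqNo csn)
{
    if (csn == c->max_committed + 1)
        c->max_committed = csn;
}

/*
 * Snapshots are taken from max_committed, never last_assigned: every
 * commit at or below this number is guaranteed visible, so a stale
 * (smaller) cached value still yields a consistent, merely older,
 * snapshot.
 */
static CommitSeqNo
csn_get_snapshot(const CsnCoordinator *c)
{
    return c->max_committed;
}

int
main(void)
{
    CsnCoordinator c = {0, 0};

    CommitSeqNo a = csn_assign(&c);  /* tx A starts committing: CSN 1 */
    CommitSeqNo b = csn_assign(&c);  /* tx B starts committing: CSN 2 */

    csn_report_committed(&c, a);     /* A is durable, B still in flight */

    /* Snapshot sees A but not B: a consistent prefix of the commit order. */
    printf("snapshot csn = %llu\n",
           (unsigned long long) csn_get_snapshot(&c));
    (void) b;
    return 0;
}
```

The point of the two counters is that csn_get_snapshot() always returns
a prefix-consistent point in the commit order, which is why nodes can
serve snapshots from a cached copy of the value without contacting the
coordinator on every query.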
Regards,
Ants Aasma

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de