Re: Parallell Optimizer - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Parallell Optimizer
Date
Msg-id CA+CSw_vCBYbzzbz9LuGscGvkU8ZPbKVZ6Ou3jd7S78aOyRqH6g@mail.gmail.com
In response to Re: Parallell Optimizer  (Hannu Krosing <hannu@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Jun 13, 2013 at 11:39 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> Coincidentally getting cluster wide consistent snapshots and delaying
>>> until some specific point in commit ordering is almost trivial to
>>> solve with Commit Sequence Number based snapshot scheme that I
>>> proposed.
>> Can you elaborate more on this? Suppose a streaming replication primary
>> commits xid = X at time Y. Later on, a standby receives WAL including tx
>> X and commits it at time Y + 3 seconds. How can a parallel query
>> execution (which uses a snapshot including X) on the standby be delayed
>> until Y + 3 seconds?
> I do not think that CSN's change anything basic here, as CSN's
> are still local to each node.

I was mainly talking about what would be needed to support parallel
queries in a single master configuration.

> What you need is ability to ask for each node to wait until XID
> is replicated to it.
>
> Unless you have some central XID/Snapshot source, there is
> no global absolute XID order. That is there may be a transaction
> which is committed on node A and not yet on node B and at the
> same time a transaction which is committed on node B and not
> yet on node A.
>
> So to get consistent snapshot "after X is committed" in multimaster
> you need some coordination and possibly compromises w.r.t. "single
> point in time"
>
> Time in multimaster replication is relativistic, that is the order
> of events may depend on where the observer is :)

You can get a total commit ordering and a non-relativistic database with
reasonably low synchronization overhead. You need a central coordinator
that keeps track of the latest commit sequence number assigned and the
largest commit sequence number guaranteed to have finished committing.
Snapshots are assigned from the latter number; nodes can cache that
value, since any number less than the actual value still yields a
consistent (if slightly stale) snapshot. Check out the concurrency
control of Google's Spanner database[1] for ideas on how this can be
done with less strict consistency while avoiding the single point of
failure.
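To make the two-counter scheme concrete, here is a minimal sketch of such a coordinator. All names here (CsnCoordinator and its methods) are hypothetical illustrations, not PostgreSQL APIs; it also simplifies by tracking finished commits in a set to advance the durable horizon past any contiguous run, since commits may finish out of CSN order:

```python
import threading

class CsnCoordinator:
    """Hypothetical central coordinator tracking the two counters described
    above: the latest CSN assigned, and the largest CSN up to which *all*
    commits are guaranteed to have finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_assigned = 0   # latest commit sequence number handed out
        self._last_durable = 0    # every CSN <= this has finished committing
        self._finished = set()    # CSNs finished out of order, above the horizon

    def begin_commit(self):
        """Assign the next commit sequence number to a committing transaction."""
        with self._lock:
            self._last_assigned += 1
            return self._last_assigned

    def finish_commit(self, csn):
        """Record that csn has durably committed, then advance the
        guaranteed-committed horizon across any contiguous finished run."""
        with self._lock:
            self._finished.add(csn)
            while self._last_durable + 1 in self._finished:
                self._finished.remove(self._last_durable + 1)
                self._last_durable += 1

    def snapshot_csn(self):
        """Snapshots read from the guaranteed-committed horizon. A node may
        cache this value: any number <= the true horizon still yields a
        consistent, merely slightly stale, snapshot."""
        with self._lock:
            return self._last_durable
```

Note how a commit that finishes ahead of an earlier one does not advance the snapshot horizon until the gap closes, which is exactly why snapshots taken from the second counter never observe a "hole" in the commit order.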

A central coordinator won't work for multi-master scenarios where
individual masters need to be able to accept commits even during
communication failures. In that case a relativistic view is
unavoidable. No replication solution is a silver bullet. Some people
want simple scale-out for performance without having to deal with the
complexity of an inconsistent view of the database, while others need
geographic distribution and resilience to network problems. It's
fundamentally impossible to provide both with the same solution.

[1]
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/spanner-osdi2012.pdf

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


