Re: [HACKERS] Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Stas Kelvich
Subject Re: [HACKERS] Transactions involving multiple postgres foreign servers
Msg-id 2BBA6115-4CA6-471E-88D5-03FAA96A0BD1@postgrespro.ru
In response to Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> On 31 Jul 2017, at 20:03, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Regardless of whether we share XIDs or DXIDs, we need a more complex
> concept of transaction state than we have now.

It seems the discussion has shifted from 2PC itself to the general issues with distributed
transactions, so it is probably appropriate to share a summary of what we have
done in the area of distributed visibility. During the last two years we tried three quite different
approaches and finally settled on Clock-SI.

At first, to test different approaches, we wrote a small patch that wraps calls to visibility-related
functions (SetTransactionStatus, GetSnapshot, etc.; described in detail on the wiki [1]) so that
an extension can override them. This approach makes it possible to implement almost anything
related to distributed visibility, since you have full control over how local visibility is done.
That API isn’t a hard prerequisite: if one wants to create a concrete implementation,
it can be done in place. However, I think it is good to have such an API in some form.
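
To illustrate the wrapping idea, here is a minimal Python sketch. It is not the actual patch or the XTM API from [1]: the class name, the hook names and their signatures are hypothetical, and the snapshot is reduced to a list of in-progress xids. It only shows the pattern of routing calls through a table so that an extension can override them and chain to the previous implementation.

```python
from typing import Callable, Dict, List


class VisibilityHooks:
    """Table of overridable visibility functions (hypothetical names)."""

    def __init__(self) -> None:
        self._status: Dict[int, str] = {}
        # Defaults: purely local behaviour.
        self.set_transaction_status: Callable[[int, str], None] = self._local_set_status
        self.get_snapshot: Callable[[], List[int]] = self._local_get_snapshot

    def _local_set_status(self, xid: int, status: str) -> None:
        self._status[xid] = status

    def _local_get_snapshot(self) -> List[int]:
        # A snapshot here is just the sorted list of xids still in progress.
        return sorted(x for x, s in self._status.items() if s == "in_progress")


def install_logging_extension(hooks: VisibilityHooks, log: List[str]) -> None:
    """Example 'extension': wrap one hook and delegate to the previous one."""
    previous = hooks.set_transaction_status

    def set_status(xid: int, status: str) -> None:
        # A distributed transaction manager would talk to other nodes here.
        log.append(f"xid {xid} -> {status}")
        previous(xid, status)

    hooks.set_transaction_status = set_status
```

Callers always go through hooks.set_transaction_status and hooks.get_snapshot, so they do not need to know whether an extension is installed.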

So, the three approaches that we tried:

1) Postgres-XL-like:

That is the most straightforward way. Basically, we need a separate network service (GTM/DTM) that is
responsible for xid generation and for managing the running list of transactions. So acquiring
an xid and a snapshot is done by network calls. Because of the shared xid space it is possible
to compare xids in the ordinary way and get the right order. The gap between non-simultaneous
commits by 2PC is covered by the fact that we get our snapshots from the GTM, and
it removes an xid from the running list only when the transaction has committed on both nodes.
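
A toy model of that bookkeeping, as a hedged sketch rather than how Postgres-XL actually implements it: ToyGTM, Snapshot and their methods are made-up names, the network calls are plain method calls, the participating nodes are assumed to be known when the transaction begins, and aborts are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Snapshot:
    xmax: int                # first xid not yet assigned when the snapshot was taken
    running: FrozenSet[int]  # xids still on the running list at that moment

    def sees(self, xid: int) -> bool:
        # Visible if the xid was assigned before the snapshot and had
        # already left the running list (aborts are ignored in this sketch).
        return xid < self.xmax and xid not in self.running


@dataclass
class ToyGTM:
    next_xid: int = 1
    # xid -> nodes that have not yet confirmed the commit
    pending: Dict[int, Set[str]] = field(default_factory=dict)

    def begin(self, nodes: Set[str]) -> int:
        xid = self.next_xid
        self.next_xid += 1
        self.pending[xid] = set(nodes)
        return xid

    def snapshot(self) -> Snapshot:
        return Snapshot(self.next_xid, frozenset(self.pending))

    def node_committed(self, xid: int, node: str) -> None:
        # The xid leaves the running list only after the last node commits.
        self.pending[xid].discard(node)
        if not self.pending[xid]:
            del self.pending[xid]
```

A snapshot taken between the two node commits still lists the xid as running, so no reader can see the transaction on one node and miss it on the other.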

Such an approach is okay for OLAP-style transactions where tps isn’t high. But for OLTP with
a high transaction rate the GTM will immediately become a bottleneck, since even write transactions
need to get a snapshot from the GTM, even if they access only one node.


2) Incremental SI [2]

An approach with a central coordinator that allows local reads without network
communication by slightly altering the visibility rules.

Apart from the fact that it is kind of patented, we also failed to achieve proper visibility
by implementing the algorithms from that paper. It always showed some inconsistencies,
maybe because of bugs in our implementation, maybe because of some
typos/mistakes in the algorithm description itself. The reasoning in the paper wasn’t very
clear to us, nor were the patent issues, so we just dropped it.


3) Clock-SI [3]

It is an MS Research paper that describes an algorithm similar to the ones used in Spanner and
CockroachDB, without a central GTM and with reads that do not require a network roundtrip.

There are two ideas behind it:

* Assuming snapshot isolation and visibility on a node are based on CSN, use local time as the CSN.
Then, when you are doing 2PC, collect the prepare time from all participating nodes and
commit the transaction everywhere with the maximum of those times. If a node during a read encounters tuples
committed by a tx with a CSN greater than its snapshot CSN (which can happen due to
time desynchronisation on the node), then it just waits until that time comes. So time desynchronisation
can affect performance, but can’t affect correctness.

* During a distributed commit the transaction is neither running (if it commits, then the tuple
should already be visible) nor committed/aborted (it can still be aborted, so it is illegal to read).
So here an IN-DOUBT transaction state appears, in which readers should wait for writers.
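
The two ideas above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our extension code: Node, TxState, Read and commit_distributed are hypothetical names, clocks are plain integers, and "waiting" is returned as a decision instead of actually blocking. The clock wait is modelled as "the snapshot CSN is ahead of this node's clock", which is one way to read the rule; a reader also waits on any in-doubt writer, without checking its prepare time.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable


class TxState(Enum):
    RUNNING = "running"
    IN_DOUBT = "in-doubt"  # prepared, final outcome not yet known on this node
    COMMITTED = "committed"
    ABORTED = "aborted"


class Read(Enum):
    VISIBLE = "visible"
    INVISIBLE = "invisible"
    WAIT_FOR_CLOCK = "wait for local clock"
    WAIT_FOR_WRITER = "wait for writer"


@dataclass
class Node:
    clock: int  # local time, used as the source of CSNs
    state: Dict[int, TxState] = field(default_factory=dict)
    csn: Dict[int, int] = field(default_factory=dict)  # xid -> commit CSN

    def prepare(self, xid: int) -> int:
        """First phase of 2PC: mark the tx in-doubt, report local prepare time."""
        self.state[xid] = TxState.IN_DOUBT
        return self.clock

    def commit(self, xid: int, commit_csn: int) -> None:
        self.csn[xid] = commit_csn
        self.state[xid] = TxState.COMMITTED

    def read(self, writer_xid: int, snapshot_csn: int) -> Read:
        """What a reader with snapshot_csn does with a tuple written by writer_xid."""
        if snapshot_csn > self.clock:
            return Read.WAIT_FOR_CLOCK  # this node's clock lags behind the snapshot
        st = self.state.get(writer_xid, TxState.RUNNING)
        if st is TxState.IN_DOUBT:
            return Read.WAIT_FOR_WRITER
        if st is TxState.COMMITTED and self.csn[writer_xid] <= snapshot_csn:
            return Read.VISIBLE
        return Read.INVISIBLE


def commit_distributed(xid: int, nodes: Iterable[Node]) -> int:
    """Commit everywhere with the maximum of the collected prepare times."""
    nodes = list(nodes)
    commit_csn = max(n.prepare(xid) for n in nodes)
    for n in nodes:
        n.commit(xid, commit_csn)
    return commit_csn
```

With two nodes whose clocks read 100 and 105, the transaction commits with CSN 105 on both. A reader with snapshot CSN 105 that arrives at the slower node gets WAIT_FOR_CLOCK until that node's clock catches up, which costs latency but never yields a wrong answer.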

We managed to implement that using the XTM API mentioned above. The XID<->CSN mapping is
maintained by the extension itself. Speed and scalability are also good.

I want to resubmit an implementation of that algorithm for FDW later in August, along with some
isolation tests based on the set of queries in [4].


[1] https://wiki.postgresql.org/wiki/DTM#eXtensible_Transaction_Manager_API
[2] http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf
[3] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/samehe-clocksi.srds2013.pdf
[4] https://github.com/ept/hermitage


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




