Re: eXtensible Transaction Manager API - Mailing list pgsql-hackers

From Robert Haas
Subject Re: eXtensible Transaction Manager API
Date
Msg-id CA+TgmoZz-Ce318XEP4y_mro67fJ2q04a3G=Qrv2-mj7s2XLbPg@mail.gmail.com
Whole thread Raw
In response to Re: eXtensible Transaction Manager API  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: eXtensible Transaction Manager API  (Simon Riggs <simon@2ndQuadrant.com>)
Re: eXtensible Transaction Manager API  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Sun, Nov 8, 2015 at 6:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Sure. Now imagine that the pg_twophase entry is corrupted for this
> transaction on one node. This would trigger a PANIC on it, and
> transaction would not be committed everywhere.

If the database is corrupted, there's no way to guarantee that
anything works as planned.  This is like saying that criticizing
somebody's disaster recovery plan on the basis that it will be
inadequate if the entire planet earth is destroyed.

> I am aware of the fact
> that by definition PREPARE TRANSACTION ensures that a transaction will
> be committed with COMMIT PREPARED, just trying to see any corner cases
> with the approach proposed. The DTM approach is actually rather close
> to what a GTM in Postgres-XC does :)

Yes.  I think that we should try to learn as much as possible from the
XC experience, but that doesn't mean we should incorporate XC's fuzzy
thinking about 2PC into PG.  We should not.

One point I'd like to mention is that it's absolutely critical to
design this in a way that minimizes network roundtrips without
compromising correctness.  XC's GTM proxy suggests that they failed to
do that.  I think we really need to look at what's going to be on the
other sides of the proposed APIs and think about whether it's going to
be possible to have a strong local caching layer that keeps network
roundtrips to a minimum.  We should consider whether the need for such
a caching layer has any impact on what the APIs should look like.

For example, consider a 10-node cluster where each node has 32 cores
and 32 clients, and each client is running lots of short-running SQL
statements.  The demand for snapshots will be intense.  If every
backend separately requests a snapshot for every SQL statement from
the coordinator, that's probably going to be terrible.  We can make it
the problem of the stuff behind the DTM API to figure out a way to
avoid that, but maybe that's going to result in every DTM needing to
solve the same problems.

Another thing that I think we need to consider is fault-tolerance.
For example, suppose that transaction commit and snapshot coordination
services are being provided by a central server which keeps track of
the global commit ordering.  When that server gets hit by a freak bold
of lightning and melted into a heap of slag, somebody else needs to
take over.  Again, this would live below the API proposed here, but I
think it really deserves some thought before we get too far down the
path.  XC didn't start thinking about how to add fault-tolerance until
quite late in the project, I think, and the same could be said of
PostgreSQL itself: some newer systems have easier-to-use fault
tolerance mechanisms because it was designed in from the beginning.
Distributed systems by nature need high availability to a far greater
degree than single systems, because when there are more nodes, node
failures are more frequent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Multixid hindsight design
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: PL/Pythonu - function ereport