Re: [HACKERS] Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date
Msg-id CAD21AoAjOMH8fAhtmhuOSrFKoFC_Eu=coQbBfnYEU3KtGZ9GKQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Aug 1, 2017 at 1:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 27, 2017 at 8:25 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> The remote transaction can be committed/aborted only after the fate of
>> the local transaction is decided. If we commit remote transaction and
>> abort local transaction, that's not good. AtEOXact* functions are
>> called immediately after that decision in post-commit/abort phase. So,
>> if we want to commit/abort the remote transaction immediately it has
>> to be done in post-commit/abort processing. Instead if we delegate
>> that to the remote transaction resolved backend (introduced by the
>> patches) the delay between local commit and remote commits depends
>> upon when the resolve gets a chance to run and process those
>> transactions. One could argue that that delay would anyway exist when
>> post-commit/abort processing fails to resolve remote transaction. But
>> given the real high availability these days, in most of the cases
>> remote transaction will be resolved in the post-commit/abort phase. I
>> think we should optimize for most common case. Your concern is still
>> valid, that we shouldn't raise an error or do anything critical in
>> post-commit/abort phase. So we should device a way to send
>> COMMIT/ABORT prepared messages to the remote server in asynchronous
>> fashion carefully avoiding errors. Recent changes to 2PC have improved
>> performance in that area to a great extent. Relying on resolver
>> backend to resolve remote transactions would erode that performance
>> gain.
>
> I think there are two separate but interconnected issues here.  One is
> that if we give the user a new command prompt without resolving the
> remote transaction, then they might run a new query that sees their
> own work as committed, which would be bad.  Or, they might commit,
> wait for the acknowledgement, and then tell some other session to go
> look at the data, and find it not there.  That would also be bad.  I
> think the solution is likely to do something like what we did for
> synchronous replication in commit
> 9a56dc3389b9470031e9ef8e45c95a680982e01a -- wait for the remove
> transaction to be resolved (by the background process) but allow an
> interrupt to escape the wait-loop.
>
> The second issue is that having the resolver resolve transactions
> might be slower than doing it in the foreground.  I don't necessarily
> see a reason why that should be a big problem.  I mean, the resolver
> might need to establish a separate connection, but if it keeps that
> connection open for a while (say, 5 minutes) in case further
> transactions arrive then it won't be an issue except on really
> low-volume system which isn't really a case I think we need to worry
> about very much.  Also, the hand-off to the resolver might take some
> time, but that's equally true for sync rep and we're living with it
> there.  Anything else is presumably just the resolver itself being
> inefficient which seems like something that can simply be fixed.

I think using the solution similar to sync rep to wait for the
transaction to be resolved is a good way. One concern I have is that
if we have one resolver process per one backend process the switching
connection between participant nodes would be overhead. In current
implementation the backend process uses connection caches to the
remote server. On the other hand if we have one resolver process per
one database on remote server the backend process have to communicate
with multiple resolver processes.

> FWIW, I don't think the present resolver implementation is likely to
> be what we want.  IIRC, it's just calling an SQL function which
> doesn't seem like a good approach.  Ideally we should stick an entry
> into a shared memory queue and then ping the resolver via SetLatch,
> and it can directly invoke an FDW method on the data from the shared
> memory queue.  It should be possible to set things up so that a user
> who wishes to do so can run multiple copies of the resolver thread at
> the same time, which would be a good way to keep latency down if the
> system is very busy with distributed transactions.
>

In current implementation the resolver process exists for resolving
in-doubt transactions. That process periodically checks if there is
unresolved transaction on shared memory and tries to resolve it
according commit log. If we change it so that the backend process can
communicate with the resolver process via SetLatch the resolver
process is better to be implemented into core rather than as a contrib
module.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: [HACKERS] Red-Black tree traversal tests
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU