Re: [HACKERS] Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date
Msg-id CAD21AoB0M2Zo7aXcJVJQ_MuM6CmrZJGvaGikjhMHMzR7HeSPGg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: [HACKERS] Transactions involving multiple postgres foreignservers
List pgsql-hackers
On Wed, Dec 13, 2017 at 10:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> The question I have is how would we deal with a foreign server that is
>>>> not available for longer duration due to crash, longer network outage
>>>> etc. Example is the foreign server crashed/got disconnected after
>>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>>> blocked for much longer duration without user having an idea of what's
>>>> going on. May be we should add some timeout.
>>>
>>> After more thought, I agree with adding some timeout. I can image
>>> there are users who want the timeout, for example, who cannot accept
>>> even a few seconds latency. If the timeout occurs backend unlocks the
>>> foreign transactions and breaks the loop. The resolver process will
>>> keep to continue to resolve foreign transactions at certain interval.
>>
>> I don't think a timeout is a very good idea.  There is no timeout for
>> synchronous replication and the issues here are similar.  I will not
>> try to block a patch adding a timeout, but I think it had better be
>> disabled by default and have very clear documentation explaining why
>> it's really dangerous.  And this is why: with no timeout, you can
>> count on being able to see the effects of your own previous
>> transactions, unless at some point you sent a query cancel or got
>> disconnected.  With a timeout, you may or may not see the effects of
>> your own previous transactions depending on whether or not you hit the
>> timeout, which you have no sure way of knowing.
>>
>>>>> transactions after the coordinator server recovered. On the other
>>>>> hand, for the reading a consistent result on such situation by
>>>>> subsequent reads, for example, we can disallow backends to inquiry SQL
>>>>> to the foreign server if a foreign transaction of the foreign server
>>>>> is remained.
>>>>
>>>> +1 for the last sentence. If we do that, we don't need the backend to
>>>> be blocked by resolver since a subsequent read accessing that foreign
>>>> server would get an error and not inconsistent data.
>>>
>>> Yeah, however the disadvantage of this is that we manage foreign
>>> transactions per foreign servers. If a transaction that modified even
>>> one table is remained as a in-doubt transaction, we cannot issue any
>>> SQL that touches that foreign server. Can we occur an error at
>>> ExecInitForeignScan()?
>>
>> I really feel strongly we shouldn't complicate the initial patch with
>> this kind of thing.  Let's make it enough for this patch to guarantee
>> that either all parts of the transaction commit eventually or they all
>> abort eventually.  Ensuring consistent visibility is a different and
>> hard project, and if we try to do that now, this patch is not going to
>> be done any time soon.
>>
>
> Thank you for the suggestion.
>
> I was really wondering if we should add a timeout to this feature.
> It's a common concern that we want to put a timeout at critical
> section. But currently we don't have such timeout to neither
> synchronous replication or writing WAL. I can image there will be
> users who want to a timeout for such cases but obviously it makes this
> feature more complex. Anyway, even if we add a timeout to this feature
> we can make it as a separated patch and feature. So I'd like to keep
> it simple as first step. This patch guarantees that the transaction
> commit or rollback on all foreign servers or not unless users doesn't
> cancel.
>
> Regards,
>

I've updated documentation of patches, and fixed some bugs. I did some
failure tests of this feature using a fault simulation tool[1] for
PostgreSQL that I created.

0001 patch adds a mechanism to track of writes on local server. This
is required to determine whether we should use 2pc at commit. 0002
patch is the main part. It adds a distributed transaction manager
(currently only for atomic commit), APIs for 2pc and foreign
transaction manager resolver process. 0003 patch makes postgres_fdw
support atomic commit using 2pc.

Please review patches.

[1] https://github.com/MasahikoSawada/pg_simula

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

pgsql-hackers by date:

Previous
From: Murtuza Zabuawala
Date:
Subject: Re: pgadmin4 server error message processing
Next
From: Robert Haas
Date:
Subject: Re: Should we nonblocking open FIFO files in COPY?