Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id 20201009.145514.78253792462097980.horikyota.ntt@gmail.com
In response to RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: Transactions involving multiple postgres foreign servers, take 2
Re: Transactions involving multiple postgres foreign servers, take 2
List pgsql-hackers
At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > What about temporary network failures? I think there are users who
> > don't want to give up resolving foreign transactions failed due to a
> > temporary network failure. Or even they might want to wait for
> > transaction completion until they send a cancel request. If we want to
> > call the commit routine only once and therefore want FDW to retry
> > connecting the foreign server within the call, it means we require all
> > FDW implementors to write a retry loop code that is interruptible and
> > ensures not to raise an error, which increases difficulty.
> >
> > Yes, but if we don’t retry to resolve foreign transactions at all on
> > an unreliable network environment, the user might end up requiring
> > every transaction to check the status of foreign transactions of the
> > previous distributed transaction before starts. If we allow to do
> > retry, I guess we ease that somewhat.
> 
> OK.  As I said, I'm not against trying to cope with temporary network failure.  I just don't think it's mandatory.  If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I must be missing something, though...

I don't understand why we hate ERRORs from the fdw-2pc-commit routine so
much. I think remote commits should be performed before the local commit
passes the point of no return, and v26-0002 actually places
AtEOXact_FdwXact() before the critical section.
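
A toy sketch of that ordering, in Python rather than PostgreSQL's C. Every
name here is hypothetical and nothing is taken from the v26 patch; it only
shows that an error raised by a remote commit is still an ordinary error as
long as it happens before the local point of no return. It does not address
what to do about remotes that have already committed when a later one fails.

```python
class RemoteCommitError(Exception):
    """Raised by a (hypothetical) FDW commit callback on failure."""


def commit_distributed(remote_commits, local_commit):
    """Run every remote commit first, then the local commit.

    remote_commits: list of callables, one per prepared foreign transaction.
    local_commit:   callable standing in for the local critical section.

    Returns "committed" if everything ran, or "error-before-local-commit"
    if a remote commit raised before the local commit was attempted.
    """
    try:
        for commit_remote in remote_commits:
            # Still outside the local critical section, so raising is allowed.
            commit_remote()
    except RemoteCommitError:
        # The local commit was never started. Remotes committed earlier in
        # the loop stay committed; this sketch does not resolve that.
        return "error-before-local-commit"

    # Point of no return: reached only after all remote commits succeeded.
    local_commit()
    return "committed"
```

The local commit callable is reached only if no remote commit raised, which
is the property the paragraph above relies on.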

(FWIW, I think remote commits should be performed by backends, not by
another process, because backends have to wait for all remote commits
to end anyway and it is simpler. If we want to run multiple remote commits
in parallel, we could do that by adding some async-waiting interface.)

> Then, we can have a commit retry timeout or retry count like the following WebLogic manual says.  (I couldn't quickly find the English manual, so below is in Japanese.  I quoted some text that got through machine translation, which appears a bit strange.)
> 
> https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
> --------------------------------------------------
> Abandon timeout
> Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction.
> 
> In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.
> --------------------------------------------------

That's not a retry timeout but a timeout on the total time of all
2nd-phase commits.  But I think it would be sufficient.  Even if an
fdw can retry 2pc-commit, that is a matter for that fdw, and the core
has nothing to do with it.
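
The difference between a per-retry timeout and a total-time budget can be
sketched as follows. This is a toy model in Python with hypothetical names,
not WebLogic's or PostgreSQL's behaviour: the caller sets one deadline for
the whole second phase, and whether a callback retries internally is left
to that callback.

```python
import time


def commit_all_with_deadline(remote_commits, total_timeout,
                             clock=time.monotonic):
    """Attempt each remote commit until one shared deadline passes.

    remote_commits: list of callables; each may retry internally.
    total_timeout:  budget in seconds for the whole second phase.
    clock:          time source, injectable so the sketch can be tested.

    Returns the indexes of commits that were not attempted because the
    budget had already run out.
    """
    deadline = clock() + total_timeout
    not_attempted = []
    for index, commit_remote in enumerate(remote_commits):
        if clock() >= deadline:
            # Budget exhausted: give up on the rest instead of retrying.
            not_attempted.append(index)
            continue
        commit_remote()
    return not_attempted
```

Note that the deadline is checked only between commits, so a single slow
callback can still overrun the budget unless it is itself interruptible.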

> > Also, what if the user sets the statement timeout to 60 sec and they
> > want to cancel the waits after 5 sec by pressing ctl-C? You mentioned
> > that client libraries of other DBMSs don't have asynchronous execution
> > functionality. If the SQL execution function is not interruptible, the
> > user will end up waiting for 60 sec, which seems not good.

I think fdw-2pc-commit can safely be interruptible as long as we run
the remote commits before entering the critical section of the local commit.

> FDW functions can be uninterruptible in general, aren't they?  We experienced that odbc_fdw didn't allow cancellation of SQL execution.

At least postgres_fdw is interruptible while waiting for the remote.

create view lt as select 1 as slp from (select pg_sleep(10)) t;
create foreign table ft(slp int) server sv1 options (table_name 'lt');
select * from ft;
^CCancel request sent
ERROR:  canceling statement due to user request

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
