Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CA+fd4k5aFADyeNe4_Rk-W+Y2a1J7LWikj5zzUk7+5XYepmYO5g@mail.gmail.com
In response to RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: Transactions involving multiple postgres foreign servers, take 2
List pgsql-hackers
On Fri, 9 Oct 2020 at 11:33, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > What about temporary network failures? I think there are users who
> > don't want to give up resolving foreign transactions failed due to a
> > temporary network failure. Or even they might want to wait for
> > transaction completion until they send a cancel request. If we want to
> > call the commit routine only once and therefore want FDW to retry
> > connecting the foreign server within the call, it means we require all
> > FDW implementors to write a retry loop code that is interruptible and
> > ensures not to raise an error, which increases difficulty.
> >
> > Yes, but if we don’t retry to resolve foreign transactions at all on
> > an unreliable network environment, the user might end up requiring
> > every transaction to check the status of foreign transactions of the
> > previous distributed transaction before starts. If we allow to do
> > retry, I guess we ease that somewhat.
>
> OK.  As I said, I'm not against trying to cope with temporary network failure.  I just don't think it's mandatory.
> If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the
> transaction soon, too.

Well, I agree that it's not mandatory. I think it's better if the user
can choose.

I also doubt how useful the per-foreign-server timeout setting you
mentioned before would be. For example, suppose a transaction involves
three foreign servers that have different timeout settings, and the
backend fails to commit on the first of those servers due to a
timeout. Does it attempt to commit on the other two servers, or does
it give up and return control to the client? In the former case, what
if the backend fails again on one of the other two servers due to a
timeout? The backend might end up waiting through all of the timeouts,
and in practice the user is not aware of how many servers are involved
in the transaction (for example, with sharding), so it seems hard to
predict the total timeout. In the latter case, the backend might have
been able to commit on the other two servers, and the timeout setting
of the first foreign server is effectively used as the timeout for the
whole foreign transaction resolution. However, the user cannot control
the order of resolution, so again it seems hard for the user to
predict the timeout. So if we have a timeout mechanism, I think it's
better if the user can control the timeout for each transaction.
Probably the same is true for retries.

>
> Then, we can have a commit retry timeout or retry count like the following WebLogic manual says.  (I couldn't quickly
> find the English manual, so below is in Japanese.  I quoted some text that got through machine translation, which
> appears a bit strange.)
>
> https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
> --------------------------------------------------
> Abandon timeout
> Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a
> two-phase commit transaction.
>
> In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction
> until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no
> attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the
> transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.
> --------------------------------------------------

Yeah, a per-transaction timeout for the second phase of 2PC seems like
a good idea.
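
To illustrate what I have in mind, here is a minimal sketch of how
such a per-transaction timeout could be checked while resolving the
remaining foreign prepared transactions. The names here (FdwXactEntry,
ResolveForeignTransactionEntry, resolution_timeout_ms) are made up for
illustration, not the patch's actual API:

/*
 * Illustrative sketch only: FdwXactEntry and
 * ResolveForeignTransactionEntry are hypothetical names, not the
 * patch's actual API.
 */
static void
resolve_with_deadline(List *fdwxact_entries, int resolution_timeout_ms)
{
    TimestampTz deadline;
    ListCell   *lc;

    deadline = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
                                           resolution_timeout_ms);

    foreach(lc, fdwxact_entries)
    {
        FdwXactEntry *entry = (FdwXactEntry *) lfirst(lc);

        /* Give up once the per-transaction deadline has passed. */
        if (GetCurrentTimestamp() >= deadline)
            ereport(ERROR,
                    (errmsg("gave up resolving foreign transactions after %d ms",
                            resolution_timeout_ms)));

        /* Try to commit (or roll back) the prepared foreign transaction. */
        ResolveForeignTransactionEntry(entry);
    }
}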

>
>
>
> > Also, what if the user sets the statement timeout to 60 sec and they
> > want to cancel the waits after 5 sec by pressing ctl-C? You mentioned
> > that client libraries of other DBMSs don't have asynchronous execution
> > functionality. If the SQL execution function is not interruptible, the
> > user will end up waiting for 60 sec, which seems not good.
>
> FDW functions can be uninterruptible in general, aren't they?  We experienced that odbc_fdw didn't allow cancellation
> of SQL execution.

For example, postgres_fdw executes SQL in an asynchronous manner using
PQsendQuery(), PQconsumeInput(), PQgetResult() and so on (see
do_sql_command() and pgfdw_get_result()). Therefore, if the user
pressed ctrl-C, the remote query would be canceled and an ERROR raised.
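
To be concrete, the wait is interruptible roughly as in the following
simplified sketch (in the spirit of pgfdw_get_result(); error handling
omitted, not the actual code):

/*
 * Simplified sketch of an interruptible wait for a result after
 * PQsendQuery(). Not the actual postgres_fdw code.
 */
static PGresult *
wait_for_result(PGconn *conn)
{
    while (PQisBusy(conn))
    {
        int         wc;

        /* Sleep until the remote socket is readable or the latch is set. */
        wc = WaitLatchOrSocket(MyLatch,
                               WL_LATCH_SET | WL_SOCKET_READABLE |
                               WL_EXIT_ON_PM_DEATH,
                               PQsocket(conn),
                               -1L, PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);

        /* This is what lets a ctrl-C (query cancel) interrupt the wait. */
        CHECK_FOR_INTERRUPTS();

        if (wc & WL_SOCKET_READABLE)
        {
            if (!PQconsumeInput(conn))
                elog(ERROR, "could not get result from remote server");
        }
    }

    return PQgetResult(conn);
}

So the wait itself is interruptible, unlike a blocking PQexec() call.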


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


