Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | CA+fd4k5aFADyeNe4_Rk-W+Y2a1J7LWikj5zzUk7+5XYepmYO5g@mail.gmail.com Whole thread Raw |
In response to | RE: Transactions involving multiple postgres foreign servers, take 2 ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Responses |
RE: Transactions involving multiple postgres foreign servers, take 2
|
List | pgsql-hackers |
On Fri, 9 Oct 2020 at 11:33, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > What about temporary network failures? I think there are users who > > don't want to give up resolving foreign transactions failed due to a > > temporary network failure. Or even they might want to wait for > > transaction completion until they send a cancel request. If we want to > > call the commit routine only once and therefore want FDW to retry > > connecting the foreign server within the call, it means we require all > > FDW implementors to write a retry loop code that is interruptible and > > ensures not to raise an error, which increases difficulty. > > > > Yes, but if we don’t retry to resolve foreign transactions at all on > > an unreliable network environment, the user might end up requiring > > every transaction to check the status of foreign transactions of the > > previous distributed transaction before starts. If we allow to do > > retry, I guess we ease that somewhat. > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. Ifthe network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transactionsoon, too. Well, I agree that it's not mandatory. I think it's better if the user can choose. I also doubt how useful the per-foreign-server timeout setting you mentioned before. For example, suppose the transaction involves with three foreign servers that have different timeout setting, what if the backend failed to commit on the first one of the server due to timeout? Does it attempt to commit on the other two servers? Or does it give up and return the control to the client? In the former case, what if the backend failed again on one of the other two servers due to timeout? The backend might end up waiting for all timeouts and in practice the user is not aware of how many servers are involved with the transaction, for example in a sharding. So It seems to be hard to predict the total timeout. In the latter case, the backend might succeed to commit on the other two nodes. Also, the timeout setting of the first foreign server virtually is used as the whole foreign transaction resolution timeout. However, the user cannot control the order of resolution. So again it seems to be hard for the user to predict the timeout. So If we have a timeout mechanism, I think it's better if the user can control the timeout for each transaction. Probably the same is true for the retry. > > Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly findthe English manual, so below is in Japanese. I quoted some text that got through machine translation, which appearsa bit strange.) > > https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm > -------------------------------------------------- > Abandon timeout > Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phasecommit transaction. > > In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction untilall resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attemptis made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transactionmanager rolls back the transaction and releases the held lock on behalf of the destroyed transaction. > -------------------------------------------------- Yeah per-transaction timeout for 2nd phase of 2PC seems a good idea. > > > > > Also, what if the user sets the statement timeout to 60 sec and they > > want to cancel the waits after 5 sec by pressing ctl-C? You mentioned > > that client libraries of other DBMSs don't have asynchronous execution > > functionality. If the SQL execution function is not interruptible, the > > user will end up waiting for 60 sec, which seems not good. > > FDW functions can be uninterruptible in general, aren't they? We experienced that odbc_fdw didn't allow cancellation ofSQL execution. For example in postgres_fdw, it executes a SQL in asynchronous manner using by PQsendQuery(), PQconsumeInput() and PQgetResult() and so on (see do_sql_command() and pgfdw_get_result()). Therefore it the user pressed ctl-C, the remote query would be canceled and raise an ERROR. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: