Re: [HACKERS] Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Transactions involving multiple postgres foreign servers |
Date | |
Msg-id | CAD21AoB0M2Zo7aXcJVJQ_MuM6CmrZJGvaGikjhMHMzR7HeSPGg@mail.gmail.com |
In response to | Re: [HACKERS] Transactions involving multiple postgres foreign servers (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | RE: [HACKERS] Transactions involving multiple postgres foreign servers |
List | pgsql-hackers |
On Wed, Dec 13, 2017 at 10:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> The question I have is how would we deal with a foreign server that is
>>>> not available for longer duration due to crash, longer network outage
>>>> etc. Example is the foreign server crashed/got disconnected after
>>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>>> blocked for much longer duration without user having an idea of what's
>>>> going on. May be we should add some timeout.
>>>
>>> After more thought, I agree with adding some timeout. I can image
>>> there are users who want the timeout, for example, who cannot accept
>>> even a few seconds latency. If the timeout occurs backend unlocks the
>>> foreign transactions and breaks the loop. The resolver process will
>>> keep to continue to resolve foreign transactions at certain interval.
>>
>> I don't think a timeout is a very good idea. There is no timeout for
>> synchronous replication and the issues here are similar. I will not
>> try to block a patch adding a timeout, but I think it had better be
>> disabled by default and have very clear documentation explaining why
>> it's really dangerous. And this is why: with no timeout, you can
>> count on being able to see the effects of your own previous
>> transactions, unless at some point you sent a query cancel or got
>> disconnected. With a timeout, you may or may not see the effects of
>> your own previous transactions depending on whether or not you hit the
>> timeout, which you have no sure way of knowing.
>>
>>>>> transactions after the coordinator server recovered. On the other
>>>>> hand, for the reading a consistent result on such situation by
>>>>> subsequent reads, for example, we can disallow backends to inquiry SQL
>>>>> to the foreign server if a foreign transaction of the foreign server
>>>>> is remained.
>>>>
>>>> +1 for the last sentence. If we do that, we don't need the backend to
>>>> be blocked by resolver since a subsequent read accessing that foreign
>>>> server would get an error and not inconsistent data.
>>>
>>> Yeah, however the disadvantage of this is that we manage foreign
>>> transactions per foreign servers. If a transaction that modified even
>>> one table is remained as a in-doubt transaction, we cannot issue any
>>> SQL that touches that foreign server. Can we occur an error at
>>> ExecInitForeignScan()?
>>
>> I really feel strongly we shouldn't complicate the initial patch with
>> this kind of thing. Let's make it enough for this patch to guarantee
>> that either all parts of the transaction commit eventually or they all
>> abort eventually. Ensuring consistent visibility is a different and
>> hard project, and if we try to do that now, this patch is not going to
>> be done any time soon.
>>
>
> Thank you for the suggestion.
>
> I was really wondering if we should add a timeout to this feature.
> It's a common concern that we want to put a timeout at critical
> section. But currently we don't have such timeout to neither
> synchronous replication or writing WAL. I can image there will be
> users who want to a timeout for such cases but obviously it makes this
> feature more complex. Anyway, even if we add a timeout to this feature
> we can make it as a separated patch and feature. So I'd like to keep
> it simple as first step. This patch guarantees that the transaction
> commit or rollback on all foreign servers or not unless users doesn't
> cancel.
>
> Regards,
>

I've updated documentation of patches, and fixed some bugs.
I did some failure tests of this feature using a fault simulation tool[1] for PostgreSQL that I created.

The 0001 patch adds a mechanism to keep track of writes on the local server. This is required to determine whether we should use 2PC at commit. The 0002 patch is the main part. It adds a distributed transaction manager (currently only for atomic commit), APIs for 2PC, and the foreign transaction manager resolver process. The 0003 patch makes postgres_fdw support atomic commit using 2PC.

Please review the patches.

[1] https://github.com/MasahikoSawada/pg_simula

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
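
For readers following the thread, the atomic-commit flow described above can be sketched roughly as below. This is a toy Python simulation, not code from the patches: `ForeignServer`, `needs_two_phase`, `commit_distributed`, and the `fx_1` identifier are all hypothetical names, and the rule "use 2PC only when two or more participants were written to" is an assumption about why the local-write tracking is needed. Only the SQL command names (`PREPARE TRANSACTION`, `COMMIT PREPARED`, `ROLLBACK PREPARED`) are real PostgreSQL commands.

```python
from dataclasses import dataclass, field


@dataclass
class ForeignServer:
    """Hypothetical stand-in for one foreign server's open transaction."""
    name: str
    modified: bool = False          # did the transaction write on this server?
    fail_on_prepare: bool = False   # simulate a crash or outage at PREPARE
    log: list = field(default_factory=list)

    def prepare(self, gid):
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name}: PREPARE failed")
        self.log.append(f"PREPARE TRANSACTION '{gid}'")

    def commit_prepared(self, gid):
        self.log.append(f"COMMIT PREPARED '{gid}'")

    def rollback_prepared(self, gid):
        self.log.append(f"ROLLBACK PREPARED '{gid}'")

    def commit(self):
        self.log.append("COMMIT")

    def rollback(self):
        self.log.append("ROLLBACK")


def needs_two_phase(local_modified, servers):
    """Assumed rule: 2PC is needed only if at least two participants wrote."""
    writers = sum(1 for s in servers if s.modified)
    if local_modified:
        writers += 1
    return writers >= 2


def commit_distributed(local_modified, servers, gid="fx_1"):
    """Commit on all servers or on none; returns 'committed' or 'aborted'."""
    if not needs_two_phase(local_modified, servers):
        # A single writer cannot produce a partial commit, so plain COMMIT is enough.
        for s in servers:
            s.commit()
        return "committed"

    # Phase 1: prepare everywhere; any failure aborts the whole transaction.
    prepared = 0
    try:
        for s in servers:
            s.prepare(gid)
            prepared += 1
    except RuntimeError:
        for s in servers[:prepared]:
            s.rollback_prepared(gid)
        for s in servers[prepared:]:
            s.rollback()
        return "aborted"

    # Phase 2: the outcome is now fixed as commit. In the real design a
    # resolver process would keep retrying servers that are unreachable here.
    for s in servers:
        s.commit_prepared(gid)
    return "committed"
```

The sketch leaves out the timeout question discussed in the quoted text: phase 2 here never fails, whereas the open issue in the thread is what the backend should do while a prepared foreign transaction cannot yet be resolved.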