Thread: PG wire protocol question
Hi,

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it treats the client AND the server as one system: if the network is cut between sending COMMIT and receiving the answer to it, the client has no way to know whether the transaction was actually committed.

The client connection may simply time out, and a reconnect gives it a new connection; it cannot pick up its old connection where it left off. So it cannot really know whether the old transaction was committed or not, at least not without doing expensive queries first.

Has anything changed on that front?

There is a 10.0 debate on -hackers. If the problem described in the above article is not fixed yet and needs a new wire protocol to get fixed, 10.0 would be justified.

Thanks in advance,
Zoltán Böszörményi
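The ambiguity Zoltán describes can be shown with a toy model of a single COMMIT round trip. This is a sketch under stated assumptions, not PostgreSQL code: the function name `commit_round_trip` is hypothetical, and the model assumes the server commits exactly when the COMMIT request reaches it (ignoring serialization failures and similar errors).

```python
from enum import Enum

class ClientView(Enum):
    ACKNOWLEDGED = "acknowledged"
    TIMED_OUT = "timed out"

def commit_round_trip(request_delivered: bool, reply_delivered: bool):
    """Toy model of one COMMIT round trip over an unreliable link.

    Assumptions: the server commits if and only if the COMMIT request
    reaches it, and the reply can only be lost after that.
    Returns (server_committed, client_view).
    """
    server_committed = request_delivered
    if request_delivered and reply_delivered:
        return server_committed, ClientView.ACKNOWLEDGED
    # Whether the request or the reply was lost, the client just sees silence.
    return server_committed, ClientView.TIMED_OUT

for request_ok, reply_ok in [(True, True), (True, False), (False, False)]:
    committed, view = commit_round_trip(request_ok, reply_ok)
    print(f"request delivered={request_ok}, reply delivered={reply_ok}: "
          f"server committed={committed}, client saw {view.value}")
```

The last two cases look identical to the client, yet the server's state differs; opening a fresh connection does not by itself tell them apart.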
Boszormenyi Zoltan wrote:
> Now, I have a question. This blog post was written about 3 years ago:
> https://aphyr.com/posts/282-jepsen-postgres
>
> Basically, it talks about the client AND the server as a system
> and if the network is cut between sending COMMIT and
> receiving the answer for it, the client has no way to know
> whether the transaction was actually committed.
> [...]
> Has anything changed on that front?

That blog post seems ill-informed - that has nothing to do with two-phase commit.

The problem - that the server may commit a transaction, but the client never receives the server's response - is independent of whether two-phase commit is used or not.

This is not a PostgreSQL problem; it is a generic communication problem.

What would be the alternative? That the server has to wait for the client to receive the commit response? But what if the client received the message and the server or the network goes down before the server learns of the fact? You see that this would lead to an infinite regress.

Yours,
Laurenz Albe
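Laurenz's infinite-regress argument can be checked mechanically. The sketch below models a hypothetical chain of acknowledgements (COMMIT, reply, ack of the reply, ack of that ack, ...) alternating direction, and verifies that for any chain length the sender of the final message sees exactly the same thing whether that message arrived or was lost. The functions `received_sets` and `last_sender_cannot_tell` are illustrative names, not part of any real protocol.

```python
def received_sets(n_messages: int, delivered: int):
    """Hypothetical ack chain: message 1 is COMMIT (client -> server),
    message 2 is the server's reply, message 3 the client's ack of the
    reply, and so on. Messages 1..delivered arrive; the next one (if any)
    is lost, and nothing is sent after a loss.
    Returns (client_received, server_received) as tuples of message numbers."""
    if not 0 <= delivered <= n_messages:
        raise ValueError("delivered must be between 0 and n_messages")
    client, server = [], []
    for i in range(1, delivered + 1):
        # Odd-numbered messages travel client -> server, even ones server -> client.
        (server if i % 2 == 1 else client).append(i)
    return tuple(client), tuple(server)

def last_sender_cannot_tell(n_messages: int) -> bool:
    """True if the sender of the final message has the same view whether
    that final message arrived or was lost."""
    sender = 0 if n_messages % 2 == 1 else 1  # index into (client, server)
    arrived = received_sets(n_messages, n_messages)
    lost = received_sets(n_messages, n_messages - 1)
    return arrived[sender] == lost[sender]

print(all(last_sender_cannot_tell(n) for n in range(1, 10)))  # True
```

Adding another acknowledgement round only moves the uncertainty to the other party; this is the same argument as the Two Generals' Problem.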
On Tue, May 17, 2016 at 9:29 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> That blog post seems ill-informed - that has nothing to do with
> two-phase commit.
>
> The problem - that the server may commit a transaction, but the client
> never receives the server's response - is independent of whether
> two-phase commit is used or not.

The author addresses this in a comment on the linked page:

«The database may be consistent, but the system isn’t. A concurrent request to the db will get the answer “yes, the transaction has committed”, but the same request of the remote client gets “no, the transaction has not yet committed.” The system may eventually become consistent, if the partition is healed and the acknowledgement reaches the client. But it isn’t consistent until that point. And the client can’t just wait indefinitely for acknowledgement–the commit request may not have reached the server, in which case the client would deadlock forever. Not to mention practical concerns (a customer and clerk aren’t going to wait very long for a credit card transaction to complete). Introducing timeouts then causes the temporary inconsistency to become permanent.»
On Sat, 14 May 2016 21:58:48 +0200, Boszormenyi Zoltan <zboszor@pr.hu> wrote:
>Has anything changed on that front?
>
>There is a 10.0 debate on -hackers. If this problem posed by
>the above article is not fixed yet and needs a new wire protocol
>to get it fixed, 10.0 would be justified.

It isn't going to be fixed ... it is a basic *unsolvable* problem in communication theory that affects coordination in any distributed system.

For a simple explanation, see https://en.wikipedia.org/wiki/Two_Generals'_Problem

George
On 2016-05-17 15:29, Albe Laurenz wrote:
> Boszormenyi Zoltan wrote:
>> Now, I have a question. This blog post was written about 3 years ago:
>> https://aphyr.com/posts/282-jepsen-postgres
>> [...]
>> Has anything changed on that front?
> That blog post seems ill-informed - that has nothing to do with
> two-phase commit.

In the blog post, 2PC was mentioned in relation to the communication, not as transaction control inside the database. I wouldn't call it misinformed. After all, terminology can mean different things in different contexts.

> The problem - that the server may commit a transaction, but the client
> never receives the server's response - is independent of whether
> two-phase commit is used or not.
>
> This is not a problem of PostgreSQL, it is a generic problem of communication.

Indeed.

> What would be the alternative?
> That the server has to wait for the client to receive the commit response?

Not quite. That would mean constantly sending an ack that the other side received the last ack, which would be silly.

If the network connection is cut, the client should be able to reconnect to the old backend, query its last state and continue where it left off, perhaps confirming via some key or UUID that it is indeed the client that was connected previously.

> But what if the client received the message and the server or the network
> go down before the server learns of the fact?
> You see that this would lead to an infinite regress.
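Zoltán's reconnect proposal can be sketched as a toy server that remembers, per session token, the outcome of the last COMMIT it processed. This is a hypothetical model, not a PostgreSQL feature or wire-protocol message: the class `HypotheticalResumableServer` and its methods are invented for illustration, and a real design would also need the table to survive a server crash and to expire old tokens.

```python
import secrets

class HypotheticalResumableServer:
    """Toy model of the reconnect-and-query idea; NOT a PostgreSQL feature."""

    def __init__(self):
        self._last_outcome = {}  # session token -> "committed" or None

    def open_session(self) -> str:
        # An unguessable random token, so another client cannot claim the session.
        token = secrets.token_hex(16)
        self._last_outcome[token] = None
        return token

    def commit(self, token: str, lose_reply: bool = False) -> str:
        self._last_outcome[token] = "committed"
        if lose_reply:
            # The commit happened on the server, but the client never hears so.
            raise TimeoutError("simulated: reply to COMMIT lost")
        return "committed"

    def resume(self, token: str):
        # Read-only, so the client may safely retry it after another failure.
        if token not in self._last_outcome:
            raise KeyError("unknown or expired session token")
        return self._last_outcome[token]

server = HypotheticalResumableServer()
token = server.open_session()
try:
    server.commit(token, lose_reply=True)
except TimeoutError:
    # Reconnect and ask about the old session instead of guessing.
    print(server.resume(token))  # committed
```

This does not contradict the Two Generals argument: while the network is down the client still cannot know the outcome. Because `resume` is read-only, the client can keep retrying it until it gets an answer, so it learns the outcome once the server is reachable again.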
On Wed, May 18, 2016 at 5:05 AM, Boszormenyi Zoltan <zboszor@pr.hu> wrote:
> On 2016-05-17 15:29, Albe Laurenz wrote:
>>
>> That blog post seems ill-informed - that has nothing to do with
>> two-phase commit.
> [...]
> Not quite. That would mean constantly sending an ack that the other
> received the last ack, which would be silly.
>
> If the network connection is cut, the client should be able to
> reconnect to the old backend and query the last state and continue
> where it left, maybe confirming via some key or UUID that it was
> indeed the client that connected previously.

I agree. It's the server's job to make sure it is itself consistent. If the client suspects it may have lost the ack for whatever reason, it needs to verify against the database that the transaction succeeded. This is an application problem, not a protocol problem.

merlin