Thread: PG wire protocol question

PG wire protocol question

From: Boszormenyi Zoltan
Date: 14 May 2016, 19:59:03

Hi,

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it talks about the client AND the server as a system
and if the network is cut between sending COMMIT and
receiving the answer for it, the client has no way to know
whether the transaction was actually committed.

The client connection may just time out, and a reconnect would
give it a new connection, but it cannot pick up its old connection
where it left off. So it cannot really know whether the old transaction
was committed or not, at least not without running possibly expensive
queries first.

Has anything changed on that front?

There is a 10.0 debate on -hackers. If this problem posed by
the above article is not fixed yet and needs a new wire protocol
to get it fixed, 10.0 would be justified.

Thanks in advance,
Zoltán Böszörményi

Re: PG wire protocol question

From: Albe Laurenz
Date: 17 May 2016, 13:29:10

Boszormenyi Zoltan wrote:
> it has been a long time since I last read this list or wrote to it.
> 
> Now, I have a question. This blog post was written about 3 years ago:
> https://aphyr.com/posts/282-jepsen-postgres
> 
> Basically, it talks about the client AND the server as a system
> and if the network is cut between sending COMMIT and
> receiving the answer for it, the client has no way to know
> whether the transaction was actually committed.
> 
> The client connection may just time out, and a reconnect would
> give it a new connection, but it cannot pick up its old connection
> where it left off. So it cannot really know whether the old transaction
> was committed or not, at least not without running possibly expensive
> queries first.
> 
> Has anything changed on that front?

That blog post seems ill-informed - that has nothing to do with
two-phase commit.

The problem - that the server may commit a transaction, but the client
never receives the server's response - is independent of whether
two-phase commit is used or not.

This is not a problem of PostgreSQL, it is a generic problem of communication.

What would be the alternative?
That the server has to wait for the client to receive the commit response?
But what if the client received the message and the server or the network
goes down before the server learns of the fact?
You see that this would lead to an infinite regress.

Yours,
Laurenz Albe
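
The regress can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: client and server alternate messages (COMMIT, ack, ack of the ack, and so on), and the network is cut exactly at the final message. The names `run`, `last_sender` and `last_sender_cannot_tell` are hypothetical and exist only for this illustration. It shows that, however many acknowledgements are added, whoever sends the last one observes exactly the same events whether or not it arrived.

```python
def run(n_messages, drop_last):
    """Exchange n_messages alternating messages; optionally lose the final one.

    Returns what each side observed, as a list of (event, message_number).
    """
    views = {"client": [], "server": []}
    for i in range(1, n_messages + 1):
        # Odd messages go client -> server (message 1 is the COMMIT),
        # even messages go server -> client (message 2 is the commit reply).
        sender, receiver = ("client", "server") if i % 2 else ("server", "client")
        views[sender].append(("sent", i))
        delivered = not (drop_last and i == n_messages)
        if delivered:
            views[receiver].append(("received", i))
    return views


def last_sender(n_messages):
    """Which side sends the final message of an n-message exchange."""
    return "client" if n_messages % 2 else "server"


def last_sender_cannot_tell(n_messages):
    """True if the final sender sees the same events with and without the cut."""
    delivered = run(n_messages, drop_last=False)
    cut = run(n_messages, drop_last=True)
    who = last_sender(n_messages)
    return delivered[who] == cut[who]


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, last_sender(n), last_sender_cannot_tell(n))
```

With two messages (COMMIT and its reply) the server is the side left in doubt; adding a third message only moves the doubt to the client, and so on for any finite number of messages.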

Re: PG wire protocol question

From: Manuel Gómez
Date: 17 May 2016, 13:55:05

On Tue, May 17, 2016 at 9:29 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> That blog post seems ill-informed - that has nothing to do with
> two-phase commit.
>
> The problem - that the server may commit a transaction, but the client
> never receives the server's response - is independent of whether
> two-phase commit is used or not.

The author addresses this in a comment within the linked page:

«The database may be consistent, but the system isn’t. A concurrent
request to the db will get the answer “yes, the transaction has
committed”, but the same request of the remote client gets “no, the
transaction has not yet committed.” The system may eventually become
consistent, if the partition is healed and the acknowledgement reaches
the client. But it isn’t consistent until that point.

And the client can’t just wait indefinitely for acknowledgement–the
commit request may not have reached the server, in which case the
client would deadlock forever. Not to mention practical concerns (a
customer and clerk aren’t going to wait very long for a credit card
transaction to complete). Introducing timeouts then causes the
temporary inconsistency to become permanent.»

Re: PG wire protocol question

From: George Neuner
Date: 17 May 2016, 16:32:00

On Sat, 14 May 2016 21:58:48 +0200, Boszormenyi Zoltan <zboszor@pr.hu>
wrote:

>Hi,
>
>it has been a long time since I last read this list or wrote to it.
>
>Now, I have a question. This blog post was written about 3 years ago:
>https://aphyr.com/posts/282-jepsen-postgres
>
>Basically, it talks about the client AND the server as a system
>and if the network is cut between sending COMMIT and
>receiving the answer for it, the client has no way to know
>whether the transaction was actually committed.
>
>The client connection may just time out, and a reconnect would
>give it a new connection, but it cannot pick up its old connection
>where it left off. So it cannot really know whether the old transaction
>was committed or not, at least not without running possibly expensive
>queries first.
>
>Has anything changed on that front?
>
>There is a 10.0 debate on -hackers. If this problem posed by
>the above article is not fixed yet and needs a new wire protocol
>to get it fixed, 10.0 would be justified.

It isn't going to be fixed ... it is a basic *unsolvable* problem in
communication theory that affects coordination in any distributed
system.  For a simple explanation, see

https://en.wikipedia.org/wiki/Two_Generals'_Problem


>Thanks in advance,
>Zoltán Böszörményi

George

Re: PG wire protocol question

From: Boszormenyi Zoltan
Date: 18 May 2016, 10:05:34

On 2016-05-17 15:29, Albe Laurenz wrote:
> Boszormenyi Zoltan wrote:
>> it has been a long time since I last read this list or wrote to it.
>>
>> Now, I have a question. This blog post was written about 3 years ago:
>> https://aphyr.com/posts/282-jepsen-postgres
>>
>> Basically, it talks about the client AND the server as a system
>> and if the network is cut between sending COMMIT and
>> receiving the answer for it, the client has no way to know
>> whether the transaction was actually committed.
>>
>> The client connection may just time out, and a reconnect would
>> give it a new connection, but it cannot pick up its old connection
>> where it left off. So it cannot really know whether the old transaction
>> was committed or not, at least not without running possibly expensive
>> queries first.
>>
>> Has anything changed on that front?
> That blog post seems ill-informed - that has nothing to do with
> two-phase commit.

In the blog post, 2PC was mentioned in relation to the client-server
communication, not as transaction control inside the database. I wouldn't
call it ill-informed. After all, terminology can mean different things
in different contexts.

> The problem - that the server may commit a transaction, but the client
> never receives the server's response - is independent of whether
> two-phase commit is used or not.
>
> This is not a problem of PostgreSQL, it is a generic problem of communication.

Indeed.

> What would be the alternative?
> That the server has to wait for the client to receive the commit response?

Not quite. That would mean constantly sending an ack that the other
side received the last ack, which would be silly.

If the network connection is cut, the client should be able to
reconnect to the old backend, query the last state, and continue
where it left off, maybe confirming via some key or UUID that it was
indeed the client that connected previously.

> But what if the client received the message and the server or the network
> goes down before the server learns of the fact?
> You see that this would lead to an infinite regress.
>
> Yours,
> Laurenz Albe
>

Re: PG wire protocol question

From: Merlin Moncure
Date: 19 May 2016, 14:19:21

On Wed, May 18, 2016 at 5:05 AM, Boszormenyi Zoltan <zboszor@pr.hu> wrote:
> On 2016-05-17 15:29, Albe Laurenz wrote:
>>
>> Boszormenyi Zoltan wrote:
>>>
>>> it has been a long time since I last read this list or wrote to it.
>>>
>>> Now, I have a question. This blog post was written about 3 years ago:
>>> https://aphyr.com/posts/282-jepsen-postgres
>>>
>>> Basically, it talks about the client AND the server as a system
>>> and if the network is cut between sending COMMIT and
>>> receiving the answer for it, the client has no way to know
>>> whether the transaction was actually committed.
>>>
>>> The client connection may just time out, and a reconnect would
>>> give it a new connection, but it cannot pick up its old connection
>>> where it left off. So it cannot really know whether the old transaction
>>> was committed or not, at least not without running possibly expensive
>>> queries first.
>>>
>>> Has anything changed on that front?
>>
>> That blog post seems ill-informed - that has nothing to do with
>> two-phase commit.
>
> Not quite. That would mean constantly sending an ack that the other
> side received the last ack, which would be silly.
>
> If the network connection is cut, the client should be able to
> reconnect to the old backend, query the last state, and continue
> where it left off, maybe confirming via some key or UUID that it was
> indeed the client that connected previously.

I agree. It's the server's job to make sure that it is itself
consistent.  If the client suspects it may have lost the ack for
whatever reason, it needs to verify against the database that the
transaction succeeded.  This is an application problem, not a protocol
problem.

merlin