Thread: Mixed up protocol packets in server response?

Mixed up protocol packets in server response?

From
Michal Politowski
Date:
I have an instance of what looks like a weird mixup of data returned by the server.
This is observed in a Java application using the 9.0-801.jdbc4 JDBC driver
connecting to an 8.3.4 server (yes, it's old) running on Solaris.

The application hung waiting for the result of one
select * from a_table where a_field = parameter;

Looking at it with a debugger I see that:
1. it waits in:
    SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method]
    SocketInputStream.read(byte[], int, int) line: 129
    VisibleBufferedInputStream.read(byte[], int, int) line: 219
    PGStream.Receive(byte[], int, int) line: 460
    PGStream.ReceiveTupleV3() line: 365
    QueryExecutorImpl.processResults(ResultHandler, int) line: 1814
    QueryExecutorImpl.execute(Query, ParameterList, ResultHandler, int, int, int) line: 257
    Jdbc4PreparedStatement(AbstractJdbc2Statement).execute(Query, ParameterList, int) line: 500
    Jdbc4PreparedStatement(AbstractJdbc2Statement).executeWithFlags(int) line: 388
    Jdbc4PreparedStatement(AbstractJdbc2Statement).executeQuery() line: 273
because it expects a field of size = 825767394 (l_size in ReceiveTupleV3)
which is much more than the whole result should be.

2. then looking at the answer array in ReceiveTupleV3:
A few first fields have the expected values,
then answer[3] has the expected
size but somewhere in the middle it changes from field 3 of the first row
of the result to the middle of field 3 of the last row of the result.
Then of course there are 4 bytes of this field that as a 32-bit integer
have the value of 825767394.
Then answer[4] obviously has 825767394 elements. And it continues with the
field 3 of the last row and then it contains the 4-byte length and then
value of the field 4 (last) of the last row of the result - so the end of a
DataRow message.
Then a CommandComplete message with the tag "SELECT",
then a ReadyForQuery message with the status indicator "T".
And then there is the rest of the field 3 of the first row, then length and
value for the field 4 of the first row and then the DataRow messages for
the following rows of the result. The last one up until the place where
it got mixed up before.

So it looks like the server wanted to send
D row 1, D row 2, D row 3, D row 4, C SELECT, Z T
but the application sees
D ro, ow 4, C SELECT, Z T, w 1, D row 2, D row 3, D r

What may be the cause of this weird problem? Is it some known or unknown bug in
8.3.4 or is the application/Java side more suspected?

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

Re: Mixed up protocol packets in server response?

From
Tom Lane
Date:
Michal Politowski <mpol+pg@meep.pl> writes:
> 2. then looking at the answer array in ReceiveTupleV3:
> ...
> So it looks like the server wanted to send
> D row 1, D row 2, D row 3, D row 4, C SELECT, Z T
> but the application sees
> D ro, ow 4, C SELECT, Z T, w 1, D row 2, D row 3, D r

> What may be the cause of this weird problem? Is it some known or unknown bug in
> 8.3.4 or is the application/Java side more suspected?

It's just about impossible to believe that the server sent that, because
(1) it generates and sends those messages one at a time, and (2) nothing
of the sort has ever been reported before.  My money is on some foulup
in buffer-wrangling on the client side.

            regards, tom lane

Re: Mixed up protocol packets in server response?

From
Craig Ringer
Date:
On 1/06/2011 9:06 PM, Michal Politowski wrote:

> What may be the cause of this weird problem? Is it some known or unknown bug in
> 8.3.4 or is the application/Java side more suspected?

It'd be really helpful if you could collect and examine a trace of the
client/server communication using WireShark. That way you can confirm
whether it is (as Tom suspects) the client side mangling its buffers, or
whether the server really did send the nonsensical sequence.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Mixed up protocol packets in server response?

From
Michal Politowski
Date:
On Thu,  2 Jun 2011 08:50:30 +0800, Craig Ringer wrote:
> On 1/06/2011 9:06 PM, Michal Politowski wrote:
>
> >What may be the cause of this weird problem? Is it some known or unknown bug in
> >8.3.4 or is the application/Java side more suspected?
>
> It'd be really helpful if you could collect and examine a trace of
> the client/server communication using WireShark. That way you can
> confirm whether it is (as Tom suspects) the client side mangling its
> buffers, or whether the server really did send the nonsensical
> sequence.

Actually my own money is with Tom's. It's very hard to believe Postgres
would do something like this, unless it were some obvious and long fixed bug in 8.3.4.

Still, trying to trace the communication makes sense, if I can convince the
owners of the system to let me do it. Unfortunately this is an
"one in a million of successful queries" (actually two in much more than a million)
problem. And the next run of the application seems not to have hit it, yet.

Thinking aloud: If this is, as it is to be suspected, an application-side problem,
there is at first sight not much space in the application where it could hide. The data is
mixed up in a driver buffer, two method calls from the standard library
socket code. There is the VisibleBufferedInputStream there. Could it do
something like this? Maybe if the connection was erroneously used from two threads
concurrently? The connections are pooled via commons-dbcp BasicDataSource
and queries are executed via Spring JdbcTemplate within Spring-configured
transaction. No passing connections by hand anywhere, everything should be
nicely thread-bound. Still, if not here, where could it go wrong?

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

Re: Mixed up protocol packets in server response?

From
Tom Lane
Date:
Michal Politowski <mpol+pg@meep.pl> writes:
> Thinking aloud: If this is, as it is to be suspected, an application-side problem,
> there is at first sight not much space in the application where it could hide. The data is
> mixed up in a driver buffer, two method calls from the standard library
> socket code. There is the VisibleBufferedInputStream there. Could it do
> something like this? Maybe if the connection was erroneously used from two threads
> concurrently? The connections are pooled via commons-dbcp BasicDataSource
> and queries are executed via Spring JdbcTemplate within Spring-configured
> transaction. No passing connections by hand anywhere, everything should be
> nicely thread-bound. Still, if not here, where could it go wrong?

You'd probably be better off asking these questions in pgsql-jdbc ...
a lot of us here don't even speak Java.

            regards, tom lane

Re: Mixed up protocol packets in server response?

From
Michal Politowski
Date:
On Fri,  3 Jun 2011 09:53:59 -0400, Tom Lane wrote:
[...]
> You'd probably be better off asking these questions in pgsql-jdbc ...
> a lot of us here don't even speak Java.

Good point. Will try the other list. Thank you for your help.

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

Re: Mixed up protocol packets in server response?

From
David Boreham
Date:
> transaction. No passing connections by hand anywhere, everything should be
> nicely thread-bound. Still, if not here, where could it go wrong?
>
I have seen two cases in my career where there was an "evil box" on the
network that corrupted the traffic.
The first was a very long time ago (in the late '80s) but the second was
only a couple of years ago and presented
with very similar symptoms to your report. This happened at a consulting
client's site (actually between two sites).
Weird "franken-packets" showed up once in a while, leading to a protocol
decode failure. Luckily we had been
involved in writing both the client and the server, and therefore had a
high degree of confidence that they were
correct. The network administrators denied strongly that they had any
equipment deployed that touched the
payload of any packet. They denied this several times. Eventually we
were able to take packet traces
on both client and server machines, correlate the traffic (not
necessarily an easy task), and prove
conclusively that what had been sent from one end did not show up intact
at the other end.
A few days later the network people revealed that they had some sort of
firewall/traffic management box
that was mangling the traffic.

Having said that, bugs in buffer management code are also not uncommon,
and can manifest intermittently
since they may be triggered by specific boundary conditions, specific
received data buffer size, and so on.

I have also seen once case of data leaking between threads in an
unpleasant and intermittent way
in a Java application, in buffer management code that attempted to avoid
GC overhead by re-using
buffers across sessions. So that's definitely a non-zero possibility too.