Thread: Strange error (Socket command option unknown)

Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

22 February 2003, 12:41:11

Hello,

I hope this is the right list to post this question.

I'm at a loss trying to figure out a strange error message
I'm getting  (at an excessively high rate -- around 200 times
per day, out of a total of several hundred thousands statements
executed per day).

The error message reads:

"FATAL 1: Socket command option e unknown"


And the server closes the connection.  Actually, the
complete error message contains several lines, and reads:

FATAL 1: Socket command option e unknown
server closed the connection unexpectedly
    This probably means that the server terminated abnormally
    before or while processing the request.


Except that the server does not shutdown  (more statements
continue to arrive, and they're executed).

I checked the docs, but I can't figure out what may be
happening, so I'm at a loss at trying to determine what to
do.  (the one thing I did notice is that it is related to
one specific statement -- however, it's not that the
statement is unconditionally wrong:  it does execute
several thousand times per day, and it succeeds most of
the time)

Any ideas?  Pointers where to look for more information
would also be highly appreciated!

Thanks,

Carlos
--

Re: Strange error (Socket command option unknown)

From

Justin Clift

Date:

22 February 2003, 18:35:54

Hi Carlos,

Just to give everyone a bit more background information, which version
of PostgreSQL are you having the troubles with, and which operating
system is it running on?

:-)

Regards and best wishes,

Justin Clift


Carlos Moreno wrote:
>
> Hello,
>
> I hope this is the right list to post this question.
>
> I'm at a loss trying to figure out a strange error message
> I'm getting  (at an excessively high rate -- around 200 times
> per day, out of a total of several hundred thousands statements
> executed per day).
>
> The error message reads:
>
> "FATAL 1: Socket command option e unknown"
>
>
> And the server closes the connection.  Actually, the
> complete error message contains several lines, and reads:
>
> FATAL 1: Socket command option e unknown
> server closed the connection unexpectedly
>    This probably means that the server terminated abnormally
>    before or while processing the request.
>
>
> Except that the server does not shutdown  (more statements
> continue to arrive, and they're executed).
>
> I checked the docs, but I can't figure out what may be
> happening, so I'm at a loss at trying to determine what to
> do.  (the one thing I did notice is that it is related to
> one specific statement -- however, it's not that the
> statement is unconditionally wrong:  it does execute
> several thousand times per day, and it succeeds most of
> the time)
>
> Any ideas?  Pointers where to look for more information
> would also be highly appreciated!
>
> Thanks,
>
> Carlos
> --
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

22 February 2003, 19:49:17

Oops!  That was pretty irresponsible from me, not providing
that information!  :-(

Anyway, it's postgres 7.2.3 running on a RedHat 7.3 Linux
system, on a dual Athlon CPU (presumably installed
properly -- i.e., with the SMP kernel), with 1GB of
memory.

I installed postgres from the source, and changed the
parameters related to the shared buffers; I put something
around 250MB dedicated to shared buffers.

At peak times, we estimate that the system may do
something around 10 to 20 SQL statements per second;
maybe 70% to 80% of those are insert statements.  The
largest table has around a million records (we clean it
up weekly, since it would otherwise keep growing to
infinity).

Sorry for the lack of information in my initial post!  I
hope this will increase the chances that someone will
be familiar with the problem and can guide me to find
a solution!

Thanks,

Carlos
--

Justin Clift wrote:

> Hi Carlos,
>
> Just to give everyone a bit more background information, which version
> of PostgreSQL are you having the troubles with, and which operating
> system is it running on?
>
>
> [...]
>
>>
>> I'm at a loss trying to figure out a strange error message
>> I'm getting  (at an excessively high rate -- around 200 times
>> per day, out of a total of several hundred thousands statements
>> executed per day).
>>
>> The error message reads:
>>
>> "FATAL 1: Socket command option e unknown"
>>
>>
>> And the server closes the connection.  Actually, the
>> complete error message contains several lines, and reads:
>>
>> FATAL 1: Socket command option e unknown
>> server closed the connection unexpectedly
>>    This probably means that the server terminated abnormally
>>    before or while processing the request.
>>
>>
>> Except that the server does not shutdown  (more statements
>> continue to arrive, and they're executed).
>>
>> I checked the docs, but I can't figure out what may be
>> happening, so I'm at a loss at trying to determine what to
>> do.  (the one thing I did notice is that it is related to
>> one specific statement -- however, it's not that the
>> statement is unconditionally wrong:  it does execute
>> several thousand times per day, and it succeeds most of
>> the time)
>

Re: Strange error (Socket command option unknown)

From

Tom Lane

Date:

23 February 2003, 00:14:35

Carlos Moreno <moreno@mochima.com> writes:
> "FATAL 1: Socket command option e unknown"

You sure it doesn't say
    Socket command type e unknown
?  Because that's how it's spelled in tcop/postgres.c, and I don't see
any other messages like this.

This message is basically saying that the backend got an unexpected
start-of-message character from the frontend.  Usually this happens
because the frontend and backend have gotten out of sync, and the most
common cause of that is a failed COPY IN operation.  COPY IN has no
error-recovery ability worthy of the name :-(.

Does your app use COPY IN a lot?  What shows up in the postmaster log
right before these messages?

            regards, tom lane

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

23 February 2003, 10:04:17

Tom Lane wrote:

>Carlos Moreno <moreno@mochima.com> writes:
>
>>"FATAL 1: Socket command option e unknown"
>>
>
>You sure it doesn't say
>    Socket command type e unknown
>

No, I'm not sure  :-)   In fact, I *am sure* that it is indeed
the way you say  :-)

BTW, some times the type is p, instead of e.

>
>?  Because that's how it's spelled in tcop/postgres.c, and I don't see
>any other messages like this.
>
>This message is basically saying that the backend got an unexpected
>start-of-message character from the frontend.  Usually this happens
>because the frontend and backend have gotten out of sync, and the most
>common cause of that is a failed COPY IN operation.  COPY IN has no
>error-recovery ability worthy of the name :-(.
>

What is COPY IN??  I didn't find it in the documentation
(it talks about copy, copy from, copy to, copy binary, but
I didn't find copy in).

If COPY IN is an explicit SQL command, then I can tell
you for sure that my app. does not use a single one of
those  (the fact that I still have no idea what it is may be
offered as evidence of that :-)).  If it is something that
may be one of the parts involved in some other operation(s),
then maybe I'm using them  (it's just that I wouldn't know)

Now, what I understand from your explanation is that
the *communication* between server and client get
out of sync??  Is that possible even when both are
running on the same machine, and not even using
TCP/IP to communicate??   (the server does have
TCP/IP enabled, since other applications, running on
different machines use it, but this application, the one
logging these errors, is running locally, and connect
to the server with a connection string that specifies
only dbname and user)

I wonder if the fact that my app. is multi-threaded
(logging to database is something that has kind of
low-priority, and I don't want it to interfere with the
"main loop" of my app. -- the main loop is basically
the communications loop, which is critical for the
responsiveness of my server to the multiple clients).

So, I wonder if I (or postgres??) is doing something
wrong with the multiple requests??  There are only
two threads that make any use of the database:
one that logs to db (i.e., exclusively insert, update,
and delete commands), and one to process the
login authentications (users' names and passwords
are stored in the database, along with related
information that has to be retrieved at login-time).

So, is it possible that these two threads are "stepping
on each other's tails"?  The curious thing is that it
is always one specific insert statement the one
that causes the problem.  I'm pretty intrigued,
since I don't see a single difference in the way
that statement is produced or processed...  :-(

>Does your app use COPY IN a lot?  What shows up in the postmaster log
>right before these messages?
>

Errr....  Where would I find that??  Should I explicitly
redirect the output of postmaster when I run it??
(I didn't  :-( ).   So I guess I know the first thing I
have to do...??  :-)

Thanks for your message!

Carlos
--

Re: Strange error (Socket command option unknown)

From

Doug McNaught

Date:

23 February 2003, 10:16:12

Carlos Moreno <moreno@mochima.com> writes:

> I wonder if the fact that my app. is multi-threaded
> (logging to database is something that has kind of
> low-priority, and I don't want it to interfere with the
> "main loop" of my app. -- the main loop is basically
> the communications loop, which is critical for the
> responsiveness of my server to the multiple clients).
>
> So, I wonder if I (or postgres??) is doing something
> wrong with the multiple requests??  There are only
> two threads that make any use of the database:
> one that logs to db (i.e., exclusively insert, update,
> and delete commands), and one to process the
> login authentications (users' names and passwords
> are stored in the database, along with related
> information that has to be retrieved at login-time).

If you're using the database from two threads, you should either be
using one DB connection per thread, or protecting the single DB
connection with a mutex lock for the duration of each query.  One
connection per thread is probably simplest.

> So, is it possible that these two threads are "stepping
> on each other's tails"?

Quite possible, unless you're taking steps to prevent it.  Sharing a
single connection between independent threads with no locking is
probably not going to work.

-Doug

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

23 February 2003, 10:55:14

Doug McNaught wrote:

>Carlos Moreno <moreno@mochima.com> writes:
>
>
>>I wonder if the fact that my app. is multi-threaded
>>(logging to database is something that has kind of
>>low-priority, and I don't want it to interfere with the
>>"main loop" of my app. -- the main loop is basically
>>the communications loop, which is critical for the
>>responsiveness of my server to the multiple clients).
>>
>>So, I wonder if I (or postgres??) is doing something
>>wrong with the multiple requests??  There are only
>>two threads that make any use of the database:
>>one that logs to db (i.e., exclusively insert, update,
>>and delete commands), and one to process the
>>login authentications (users' names and passwords
>>are stored in the database, along with related
>>information that has to be retrieved at login-time).
>>
>
>If you're using the database from two threads, you should either be
>using one DB connection per thread, or protecting the single DB
>connection with a mutex lock for the duration of each query.  One
>connection per thread is probably simplest.
>

Hmm, are you referring to the PgDatabase object that I
use to connect to the database?  (I'm using libpq++)

If that is what you're referring to, then yes, I'm perfectly
fine there -- the PgDatabase object is a local variable in
each of the two threads.  Each thread loops checking
a "jobs queue" (properly synchronized -- honest! :-)),
and whenever it finds one or several SQL's to be
executed (or a login to be processed), *then* it creates
the PgDatabase object, which lives for the duration of
that loop -- i.e., the same PgDatabase object will be
used to execute the one or several jobs in the queue.

>>So, is it possible that these two threads are "stepping
>>on each other's tails"?
>>
>
>Quite possible, unless you're taking steps to prevent it.  Sharing a
>single connection between independent threads with no locking is
>probably not going to work.
>

No, that's not what is happening.  But I was wondering
if there is some information that indirectly is being shared
by the two threads (that it is because of my mistake or
postgres' mistake -- far less likely, of course)

BTW, after reading Tom's message, I restarted the system
to change the startup script and send all output from
postmaster to a log file.

Nothing relevant;  there are about a dozen logged messages
"Socket command type e unknown", and the line before is
either another identical message, or a failed insert (a
foreign key referential integrity violation).  These failed
inserts are not supposed to happen, but right now, it is
happening because of a minor bug recently fixed, that
was causing certain things not to be inserted to the DB,
and so now the subsequent inserts are making reference
to something that is not present.  Still, this should be
normal operation from the point of view of the server;
it is simply an insert that failed and was properly
reported by the backend, right?

Thanks,

Carlos
--

Re: Strange error (Socket command option unknown)

From

Tom Lane

Date:

23 February 2003, 11:58:03

Carlos Moreno <moreno@mochima.com> writes:
> Nothing relevant;  there are about a dozen logged messages
> "Socket command type e unknown", and the line before is
> either another identical message, or a failed insert (a
> foreign key referential integrity violation).

You should look very hard at the part of your code that deals with
error reports from the server.

It might be worth logging the whole session between client and
server with tcpdump --- that'd be tedious to wade through, but
it might give some evidence of what's happening.  It seems pretty
clear that the client is sending an incomplete message, but knowing
only one character of the broken message isn't much to help you
debug it.  With a tcpdump trace you could see the whole packet ...

            regards, tom lane

Re: Strange error (Socket command option unknown)

From

Doug McNaught

Date:

23 February 2003, 12:36:13

Carlos Moreno <moreno@mochima.com> writes:

> Doug McNaught wrote:
>
> >If you're using the database from two threads, you should either be
> >using one DB connection per thread, or protecting the single DB
> >connection with a mutex lock for the duration of each query.  One
> >connection per thread is probably simplest.
> >
>
> Hmm, are you referring to the PgDatabase object that I
> use to connect to the database?  (I'm using libpq++)

I don't know libpq++ but I would assume that each PgDatabase object
has its own underlying database connection.  If this is true than you
shouldn't see any thread-related problems.  You should read the
libpq++ docs and/or the source code to make sure of this, though.

> If that is what you're referring to, then yes, I'm perfectly
> fine there -- the PgDatabase object is a local variable in
> each of the two threads.  Each thread loops checking
> a "jobs queue" (properly synchronized -- honest! :-)),
> and whenever it finds one or several SQL's to be
> executed (or a login to be processed), *then* it creates
> the PgDatabase object, which lives for the duration of
> that loop -- i.e., the same PgDatabase object will be
> used to execute the one or several jobs in the queue.

Creating and destroying database connections on the fly is kind of
inefficient, but shouldn't cause the problems you're seeing.

I'm wondering if it might not be a libpq++ bug, just because you don't
seem to be doing anything nonstandard otherwise, and there haven't
been any reports that I've seen of people running into this using
regular libpq, especially since you're not using COPY...

> No, that's not what is happening.  But I was wondering
> if there is some information that indirectly is being shared
> by the two threads (that it is because of my mistake or
> postgres' mistake -- far less likely, of course)

As I said, it *might* be a libpq++ bug.  What you say below kind of
smells like it:

> BTW, after reading Tom's message, I restarted the system
> to change the startup script and send all output from
> postmaster to a log file.
>
> Nothing relevant;  there are about a dozen logged messages
> "Socket command type e unknown", and the line before is
> either another identical message, or a failed insert (a
> foreign key referential integrity violation).  These failed
> inserts are not supposed to happen, but right now, it is
> happening because of a minor bug recently fixed, that
> was causing certain things not to be inserted to the DB,
> and so now the subsequent inserts are making reference
> to something that is not present.  Still, this should be
> normal operation from the point of view of the server;
> it is simply an insert that failed and was properly
> reported by the backend, right?

Right.  Perhaps the client-side library isn't handling it properly.

Might be time to dig in with a debugger.  If you can trigger the
referential integrity violation at will, do so and step through the
libpq++ code with gdb (or whatever)--see if it handles the error from
the server properly (you'll need to read up on the wire protocol).

Good luck...

-Doug

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

23 February 2003, 13:15:11

Hi Doug,

Doug McNaught wrote:

>
>Creating and destroying database connections on the fly is kind of
>inefficient, but shouldn't cause the problems you're seeing.
>

But doing otherwise has the problem that if for some reason
the connection fails (or drops at some point), then it's gone
forever  (in libpq++, there is a way to check if the connection
is alive, but I see no method to reconnect;  only when creating
the PgDatabase or PgConnection object does it connect to
the database -- in fact, this was precisely the bug I was
referring to, the one that led to the referential integrity
violations right now).

>
>Right.  Perhaps the client-side library isn't handling it properly.
>
>Might be time to dig in with a debugger.  If you can trigger the
>referential integrity violation at will, do so and step through the
>libpq++ code with gdb (or whatever)--see if it handles the error from
>the server properly (you'll need to read up on the wire protocol).
>

Ouff!!  This sounds sooo scary!!  :-)

But I guess if that is the problem, then I might completely
eliminate it with a mutex for access to the database.  I.e.,
any time I execute *any* SQL statement, then I lock the
mutex.

Hmmm, however, that doesn't fit too well the fact that
it is one specific insert statement that *always* causes
the error.  Also, before the most recent changes, I had
very few of these errors (about 20 in a month and a half
of activity, versus a couple hundred in just two days),
and before, I was doing the login in the main thread,
and the logs to database in a separate thread...  Hmm,
intriguing...

I guess I'm going to try using a mutex for any access
to the database as my first step, to *determine* if the
problem is related to concurrent database accesses
from the two threads.  (hmmm, at this point, I wonder
if it is about time to upgrade to 7.3)

Thanks a lot -- and thanks to Tom -- for your comments
and suggestions!

Cheers,

Carlos
--

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

24 February 2003, 20:23:03

Tom Lane wrote:

>
>You should look very hard at the part of your code that deals with
>error reports from the server.
>
>It might be worth logging the whole session between client and
>server with tcpdump --- that'd be tedious to wade through, but
>it might give some evidence of what's happening.  It seems pretty
>clear that the client is sending an incomplete message, but knowing
>only one character of the broken message isn't much to help you
>debug it.  With a tcpdump trace you could see the whole packet ...
>

Ok, it would look like I am going to have to face the
terror of doing this!  :-)

However, I have zero experience on this; if I may,
I'd like to ask for some pointers on this tcpdump;

checking the man pages, I came up with:

tcpdump -i local port 5432

I then connected using psql -h localhost, as a test;
it seems to be showing only timing and length information.
What could I do?

The other thing is that this is only a quick test to first
see that I can figure out the information that tcpdump
gives me...  But the connection is not being done through
TCP/IP, since the application and postmaster are running
on the same physical machine.

So, how can I listen on local (i.e., AF_UNIX) sockets?
(which I assume is the way my application and the postgres
server communicate)

Thanks!

Carlos
--

Re: Strange error (Socket command option unknown)

From

Doug McNaught

Date:

25 February 2003, 00:39:01

Carlos Moreno <moreno@mochima.com> writes:

> Tom Lane wrote:
>
> >It might be worth logging the whole session between client and
> >server with tcpdump --- that'd be tedious to wade through, but
> >it might give some evidence of what's happening.  It seems pretty
> >clear that the client is sending an incomplete message, but knowing
> >only one character of the broken message isn't much to help you
> >debug it.  With a tcpdump trace you could see the whole packet ...
> >
>
> Ok, it would look like I am going to have to face the
> terror of doing this!  :-)
>
> However, I have zero experience on this; if I may,
> I'd like to ask for some pointers on this tcpdump;
>
> checking the man pages, I came up with:
>
> tcpdump -i local port 5432
>
> I then connected using psql -h localhost, as a test;
> it seems to be showing only timing and length information.
> What could I do?

You'll need to use '-s 1500' (or whatever the MTU is) to get all of
each packet, and '-xX' to dump the whole packet in hex/ascii.

> The other thing is that this is only a quick test to first
> see that I can figure out the information that tcpdump
> gives me...  But the connection is not being done through
> TCP/IP, since the application and postmaster are running
> on the same physical machine.
>
> So, how can I listen on local (i.e., AF_UNIX) sockets?
> (which I assume is the way my application and the postgres
> server communicate)

No can do with tcpdump.  Either switch your app (temporarily) to use
TCP/IP or look into using strace(1) to log the read/write/send/rcv
calls made by the app (this may be tricky, and will slow down your
app).

The other thing you could do (which would work very well but take some
savvy) is reconfigure PG to use a different port (5433, say), then
write a little daemon that connects to
'/var/run/postgresql/.s.PGSQL.5433' and listens on 's.PGSQL.5432',
passing data both ways and logging it.  This would be a great
diagnostic tool but takes some sockets knowledge to write (not
anything super-arcane, though).  Switching to TCP temporarily is
probably less work.  :)

Hope this helps!

-Doug

Re: Strange error (Socket command option unknown)

From

Tom Lane

Date:

25 February 2003, 02:36:38

Carlos Moreno <moreno@mochima.com> writes:
> checking the man pages, I came up with:
> tcpdump -i local port 5432
> I then connected using psql -h localhost, as a test;
> it seems to be showing only timing and length information.

I get useful results with

/usr/sbin/tcpdump -x -X -s 0 -i lo port 5432

You do need to use "-h localhost" when connecting with psql.

            regards, tom lane

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

25 February 2003, 04:42:36

Doug McNaught wrote:

>
>No can do with tcpdump.  Either switch your app (temporarily) to use
>TCP/IP or look into using strace(1)
>

D'oh!!  Of course!!  How could I forget about the good old strace!!!
I used it soo much during the debugging period of my application,
and now I'm so ungrateful that I forgot about it?  :-)

Anyway, thanks for the advice!  I guess I'll try first with strace
(because I can do that right now -- the app is running as we
speak), and if it is too complicated, I would temporarily change
my app so that it connects through TCP/IP (to localhost) to be
able to tcpdump it.

Thanks!

Carlos
--

Re: Strange error (Socket command option unknown)

From

Carlos Moreno

Date:

25 February 2003, 04:43:26

Tom Lane wrote:

>
>I get useful results with
>
>/usr/sbin/tcpdump -x -X -s 0 -i lo port 5432
>
>You do need to use "-h localhost" when connecting with psql.
>

Yes, the -h localhost would be necessary;  notice that I want to
eavesdrop on my application  (the test with psql was just to
try and see the protocol just to make sure that I understand
the basics before jumping to some serious debugging on my
application).  But yes, the above translates to the fact that
I'll have to add host=localhost to the connection string in my
code.

Thanks,

Carlos
--

Re: Strange error (Socket command option unknown)

From

Steve Crawford

Date:

25 February 2003, 12:11:24

May I suggest looking at tcpflow? The syntax is similar to tcpdump but it
will create a file for each connection (actually two files, one for inbound
and one for outbound data). The files contain the data stream for that
connection - extremely useful for debugging.

The only thing I've found frustrating is that since one file is everything
from the client and the other is everything from the server you are left to
"synchronize" which parts of the inbound stream correspond to which parts of
the outbound.

If you want a few more bells/whistles and a GUI, try Ethereal.

Cheers,
Steve


On Monday 24 February 2003 2:31 pm, Carlos Moreno wrote:
> Tom Lane wrote:
> >You should look very hard at the part of your code that deals with
> >error reports from the server.
> >
> >It might be worth logging the whole session between client and
> >server with tcpdump --- that'd be tedious to wade through, but
> >it might give some evidence of what's happening.  It seems pretty
> >clear that the client is sending an incomplete message, but knowing
> >only one character of the broken message isn't much to help you
> >debug it.  With a tcpdump trace you could see the whole packet ...
>
> Ok, it would look like I am going to have to face the
> terror of doing this!  :-)
>
> However, I have zero experience on this; if I may,
> I'd like to ask for some pointers on this tcpdump;
>
> checking the man pages, I came up with:
>
> tcpdump -i local port 5432
>
> I then connected using psql -h localhost, as a test;
> it seems to be showing only timing and length information.
> What could I do?
>
> The other thing is that this is only a quick test to first
> see that I can figure out the information that tcpdump
> gives me...  But the connection is not being done through
> TCP/IP, since the application and postmaster are running
> on the same physical machine.
>
> So, how can I listen on local (i.e., AF_UNIX) sockets?
> (which I assume is the way my application and the postgres
> server communicate)
>
> Thanks!
>
> Carlos