Thread: Postgres connection errors

Postgres connection errors

From: Tim Uckun

Hello.

I have lots of Ruby daemons running, each connected to Postgres. Some of
them start getting connection errors after a day or two of running.
The odd thing is that they don't all get the same error.

Some get this error:
    PGError: lost synchronization with server: got message type "T"
Others get this:
    PGError: lost synchronization with server: got message type "e"
And sometimes this:
    PGError: lost synchronization with server: got message type ""


What is Postgres trying to tell me here?  I would guess this error is
coming out of libpq.

Thanks.

Re: Postgres connection errors

From: Tom Lane

Tim Uckun <timuckun@gmail.com> writes:
> I have lots of ruby daemons running connected to postgres. Some of
> them start getting connection errors after about a day or two of
> running. The odd thing is that they don't all get the same error.

> Some get this error:  PGError: lost synchronization with server: got
> message type "T"
> Others get this          PGError: lost synchronization with server:
> got message type "e"
> And sometimes this   PGError: lost synchronization with server: got
> message type ""

> What is postgres trying to tell me here?

Most of the cases we've seen like that have been because multiple
threads in the client application were trying to use the same PGconn
connection object concurrently.  There's no cross-thread synchronization
built into libpq, so you have to provide the interlocks yourself if
there's any possibility of multiple threads touching the same PGconn
concurrently.  And it will not support more than one query at a time
in any case.

But having said that ... usually apps that have made this type of
mistake start falling over almost immediately.  Maybe you have a case
where it's mostly interlocked correctly, and you just missed one
infrequent code path?
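
For illustration, here is a minimal sketch of that kind of interlock,
written in Ruby since the daemons are Ruby.  Everything named here is
hypothetical: LockedConnection, FakeConnection and
demo_shared_connection are made up for the sketch, and FakeConnection
only stands in for a real connection object so the example runs without
a database.  The one assumption about the real driver is that the
connection responds to exec(sql).

```ruby
# Hypothetical wrapper: lets only one thread at a time use the shared
# connection object, which is the interlock libpq does not provide.
class LockedConnection
  def initialize(conn)
    @conn = conn
    @lock = Mutex.new
  end

  # Other threads block here until the current query has finished.
  def exec(sql)
    @lock.synchronize { @conn.exec(sql) }
  end
end

# Stand-in for a real connection (assumption: the real one responds to
# exec).  It records how many callers were ever inside exec at once.
class FakeConnection
  attr_reader :max_concurrent

  def initialize
    @active = 0
    @max_concurrent = 0
    @guard = Mutex.new
  end

  def exec(sql)
    @guard.synchronize do
      @active += 1
      @max_concurrent = [@max_concurrent, @active].max
    end
    sleep 0.001 # simulate a round trip to the server
    @guard.synchronize { @active -= 1 }
    sql
  end
end

# Several threads share one wrapped connection.  Returns the per-thread
# results and the peak number of simultaneous callers seen by the fake.
def demo_shared_connection(thread_count = 8)
  fake = FakeConnection.new
  conn = LockedConnection.new(fake)
  threads = thread_count.times.map do |i|
    Thread.new { conn.exec("SELECT #{i}") }
  end
  [threads.map(&:value), fake.max_concurrent]
end
```

With the wrapper in place the fake should never see two callers inside
exec at the same time.  Giving each thread its own connection is often
the simpler alternative, since a single connection handles only one
query at a time anyway.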

            regards, tom lane

Re: Postgres connection errors

From: Tim Uckun

>
> Most of the cases we've seen like that have been because multiple
> threads in the client application were trying to use the same PGconn
> connection object concurrently.  There's no cross-thread synchronization
> built into libpq, so you have to provide the interlocks yourself if
> there's any possibility of multiple threads touching the same PGconn
> concurrently.  And it will not support more than one query at a time
> in any case.


These are not threaded daemons, but this does give me a clue to work
on. I noticed that there is a call to clear stale connections, which
might be the culprit because each of these workers has only one
connection.