Thread: libpq maligning postgres stability

libpq maligning postgres stability

From
Andres Freund
Date:
Hi,

We have several places in libpq where libpq says that a connection closing is
probably due to a server crash with a message like:

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.


I think this is rather unhelpful, at least these days. There are a lot of
reasons the connection could have failed; the server having terminated
abnormally is just one of them.

It's common to see this due to network issues, for example.  I've fielded
worried questions from postgres users about this message quite a few times.

The reason I was looking at this message just now was a discussion of CI
failures on Windows [1], which were likely caused by the known issue of
Windows occasionally swallowing the server's last messages before the backend
exits (see e.g. [2] for more detail).  Because of the message, it's easy to
wrongly conclude that the failure was caused by a postgres crash, rather than
by the expected FATAL never having been received.


And we don't add this message only when the connection was actually closed
unexpectedly; we often add it even when we *did* get a FATAL, as in this
example:

psql -c 'select pg_terminate_backend(pg_backend_pid())'
FATAL:  57P01: terminating connection due to administrator command
LOCATION:  ProcessInterrupts, postgres.c:3351
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
connection to server was lost


I think this one is mostly a weakness in how libpq tracks connection state,
but it kind of shows the silliness of claiming postgres probably crashed.
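The state-tracking weakness can be illustrated with a toy model. This is a
hypothetical sketch, not libpq's actual code: `ConnectionModel`, `on_error`
and `on_eof` are invented names, and the sketch assumes that once a FATAL
error has been received, the subsequent end-of-file is expected and does not
warrant the crash hint.

```python
from dataclasses import dataclass, field

UNEXPECTED_CLOSE = (
    "server closed the connection unexpectedly\n"
    "\tThis probably means the server terminated abnormally\n"
    "\tbefore or while processing the request."
)


@dataclass
class ConnectionModel:
    """Hypothetical model of client-side connection state (not libpq code)."""

    messages: list = field(default_factory=list)
    got_fatal: bool = False

    def on_error(self, severity: str, text: str) -> None:
        # Record every server error and remember whether any was FATAL.
        self.messages.append(f"{severity}:  {text}")
        if severity == "FATAL":
            self.got_fatal = True

    def on_eof(self) -> None:
        # Only claim an unexpected close if the server never told us why.
        if not self.got_fatal:
            self.messages.append(UNEXPECTED_CLOSE)
        self.messages.append("connection to server was lost")


# Replays the psql session above: a FATAL arrives, then the socket closes.
conn = ConnectionModel()
conn.on_error("FATAL", "terminating connection due to administrator command")
conn.on_eof()
print("\n".join(conn.messages))
```

With the FATAL recorded, the model prints only the FATAL line followed by
"connection to server was lost", without the crash hint.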


Greetings,

Andres Freund


[1] Via Bilal:

    4 of the failures on the front page are related to Windows:
    https://cirrus-ci.com/build/4878370632105984
    https://cirrus-ci.com/build/5063665856020480
    https://cirrus-ci.com/build/4636858312818688
    https://cirrus-ci.com/build/6385762419081216

[2] https://postgr.es/m/CA%2BhUKGLR10ZqRCvdoRrkQusq75wF5%3DvEetRSs2_u1s%2BFAUosFQ%40mail.gmail.com



Re: libpq maligning postgres stability

From
Robert Haas
Date:
On Thu, Mar 27, 2025 at 11:19 AM Andres Freund <andres@anarazel.de> wrote:
> We have several places in libpq where libpq says that a connection closing is
> probably due to a server crash with a message like:
>
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
>
> I think this is rather unhelpful, at least these days. There are a lot of
> reasons the connection could have failed; the server having terminated
> abnormally is just one of them.
>
> It's common to see this due to network issues, for example.  I've fielded
> worried questions from postgres users about this message quite a few times.

Yeah, I agree. I used to think this hint was helpful, but it's gotten
less helpful as the years have passed, because the server is more
stable these days. Another thing that can cause this (as discussed on
Discord) is that an individual backend process may have died while the
server as a whole did not. In that case, the hint is only accurate if
you read "server" to mean your individual server process.

I wonder if, in addition to removing the hint, we could also consider
rewording the message. For example, a slight rewording to "server
connection closed unexpectedly" would avoid implying that it was the
server that took action. That would be more accurate, because it could
be a firewall between the machines, or even security software on the
client side, that closed the connection.  Maybe there is some more
dramatic rewording that is even better, but there's probably some value
in keeping it similar to what people are used to seeing.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: libpq maligning postgres stability

From
Christoph Berg
Date:
Re: Robert Haas
> I wonder if, in addition to removing the hint, we could also consider
> rewording the message. For example, a slight rewording to "server
> connection closed unexpectedly" would avoid implying that it was the

There is a lot of software that does string parsing of this part of the
message, so it might be advisable to leave the first line alone.


https://sources.debian.org/src/php-laravel-framework/10.48.25+dfsg-2/src/Illuminate/Database/DetectsLostConnections.php/?hl=28#L28
https://sources.debian.org/src/python-taskflow/5.9.1-4/taskflow/persistence/backends/impl_sqlalchemy.py/?hl=87#L87
https://sources.debian.org/src/gnucash/1:5.10-0.1/libgnucash/backend/dbi/gnc-backend-dbi.cpp/?hl=798#L798
https://sources.debian.org/src/pgbouncer/1.24.0-3/test/test_misc.py/?hl=301#L301
https://sources.debian.org/src/icingaweb2-module-reporting/1.0.2-2/library/Reporting/RetryConnection.php/?hl=23#L23
https://sources.debian.org/src/storm/1.0-1/storm/databases/postgres.py/?hl=353#L353
https://sources.debian.org/src/timescaledb/2.19.0+dfsg-1/test/expected/loader-tsl.out/?hl=473#L473
https://sources.debian.org/src/odoo/18.0.0+dfsg-2/addons/web/tests/test_db_manager.py/?hl=277#L277

https://codesearch.debian.net/search?q=server+closed+the+connection+unexpectedly&literal=1
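To see why rewording the first line is risky, here is a minimal sketch of the
kind of substring matching such projects do. `LOST_CONNECTION_MARKERS` and
`looks_like_lost_connection` are hypothetical names that only mirror the
general pattern; they are not copied from any of the projects linked above.

```python
# Hypothetical marker list in the style of "detect lost connection" helpers.
LOST_CONNECTION_MARKERS = (
    "server closed the connection unexpectedly",
    "SSL connection has been closed unexpectedly",
)


def looks_like_lost_connection(error_text: str) -> bool:
    """Return True if any known marker appears in the error text."""
    return any(marker in error_text for marker in LOST_CONNECTION_MARKERS)


current = (
    "server closed the connection unexpectedly\n"
    "\tThis probably means the server terminated abnormally\n"
    "\tbefore or while processing the request."
)
# Reworded first line, as suggested earlier in the thread.
reworded = "server connection closed unexpectedly"

print(looks_like_lost_connection(current))
print(looks_like_lost_connection(reworded))
```

The current text matches, but the reworded line no longer contains the exact
marker substring, so a matcher like this would silently stop recognizing
lost connections.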

(It might be worth asking why this string parsing is being done at all:
is libpq missing a way to distinguish "connection lost" from other errors?)
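One alternative to matching text is to branch on structured fields first and
fall back to text only when nothing else is available. The sketch below is
hypothetical: `classify_disconnect` is an invented helper, and it assumes the
caller can obtain the error's SQLSTATE (for example via libpq's
`PQresultErrorField(res, PG_DIAG_SQLSTATE)`), which may be absent for errors
that libpq generates on the client side. SQLSTATE class `08` is PostgreSQL's
connection-exception class; `57P01` (admin_shutdown, as in the psql example
earlier in the thread) and `57P02` (crash_shutdown) mean the server ended the
session itself.

```python
from typing import Optional

# SQLSTATEs meaning the server itself ended the session.
SERVER_ENDED_SESSION = {"57P01", "57P02"}  # admin_shutdown, crash_shutdown


def classify_disconnect(sqlstate: Optional[str], message: str) -> str:
    """Hypothetical classifier: prefer SQLSTATE, fall back to message text."""
    if sqlstate is not None:
        if sqlstate.startswith("08"):
            return "connection_exception"
        if sqlstate in SERVER_ENDED_SESSION:
            return "server_terminated_session"
        return "other_error"
    # No SQLSTATE: assume a client-side error and fall back to text matching.
    if "closed the connection unexpectedly" in message:
        return "connection_lost"
    return "unknown"
```

The text fallback is kept deliberately narrow; the point is that it is only
consulted when no SQLSTATE is available.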

The remaining message lines are admittedly very pessimistic about
PostgreSQL's stability and should mention networking issues first.
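Christoph's compromise (leave the first line alone, rework only the hint
lines) can be sketched as follows. The `PROPOSED` hint wording is invented
for illustration and is not an actual patch; the check shows that a
first-line substring match keeps working while the hint now mentions a
network problem before a server-side termination.

```python
CURRENT = (
    "server closed the connection unexpectedly\n"
    "\tThis probably means the server terminated abnormally\n"
    "\tbefore or while processing the request."
)

# Hypothetical hint text: first line unchanged, network causes listed first.
PROPOSED = (
    "server closed the connection unexpectedly\n"
    "\tThis may be caused by a network problem, or by the server\n"
    "\tprocess terminating before or while processing the request."
)

MARKER = "server closed the connection unexpectedly"

for text in (CURRENT, PROPOSED):
    # Both versions keep the first line that string parsers rely on.
    first_line = text.splitlines()[0]
    print(first_line == MARKER, MARKER in text)
```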

Christoph