Thread: libpq maligning postgres stability
Hi,

We have several places in libpq where libpq says that a connection closing is
probably due to a server crash with a message like:

    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing

I think this is rather unhelpful, at least these days. There are a lot of
reasons the connection could have failed, the server having terminated
abnormally is just one of them.

It's common to see this due to network issues, for example. I've quite a few
times fielded worried questions of postgres users due to the message.

The reason I was looking at this message just now was a discussion of CI
failures on windows [1], which were likely caused by the known issue of
windows occasionally swallowing the server's last messages before the backend
exits (more detail e.g. in [2]). It's easy to think that the failure was
wrongly caused by a postgres crash, due to the message, rather than due to not
receiving the expected FATAL.

And we don't even just add this message when the connection was actually
closed unexpectedly, we often do it even when we *did* get a FATAL, as in this
example:

    psql -c 'select pg_terminate_backend(pg_backend_pid())'
    FATAL: 57P01: terminating connection due to administrator command
    LOCATION: ProcessInterrupts, postgres.c:3351
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    connection to server was lost

I think this one is mostly a weakness in how libpq tracks connection state,
but it kind of shows the silliness of claiming postgres probably crashed.
Greetings,

Andres Freund

[1] Via Bilal: 4 of the failures on the front page are related to Windows:
    https://cirrus-ci.com/build/4878370632105984
    https://cirrus-ci.com/build/5063665856020480
    https://cirrus-ci.com/build/4636858312818688
    https://cirrus-ci.com/build/6385762419081216
[2] https://postgr.es/m/CA%2BhUKGLR10ZqRCvdoRrkQusq75wF5%3DvEetRSs2_u1s%2BFAUosFQ%40mail.gmail.com
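
The psql output in the pg_terminate_backend() example can be split mechanically into what the server reported and what libpq appended. A minimal sketch, assuming the verbose "SEVERITY: SQLSTATE: message" layout shown in that output; the function name split_report and the dictionary keys are hypothetical, chosen only for illustration:

```python
import re

# Matches a server-sent line in the verbose layout shown in the example,
# e.g. "FATAL: 57P01: terminating connection due to administrator command".
SERVER_LINE = re.compile(r"^(FATAL|PANIC|ERROR):\s+([0-9A-Z]{5}):\s+(.*)$")

# First line of the libpq-generated text quoted in this thread.
LIBPQ_CLOSED = "server closed the connection unexpectedly"


def split_report(text):
    """Separate the server's own report from libpq's appended guess.

    Hypothetical helper: it only understands the layout shown in the thread.
    """
    result = {
        "severity": None,
        "sqlstate": None,
        "message": None,
        "libpq_closed_hint": False,
    }
    for raw in text.splitlines():
        line = raw.strip()
        match = SERVER_LINE.match(line)
        if match and result["sqlstate"] is None:
            # Keep the first server-sent report only.
            result["severity"], result["sqlstate"], result["message"] = match.groups()
        elif line == LIBPQ_CLOSED:
            result["libpq_closed_hint"] = True
    return result


EXAMPLE = """\
FATAL: 57P01: terminating connection due to administrator command
LOCATION: ProcessInterrupts, postgres.c:3351
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
"""

report = split_report(EXAMPLE)
```

For the example text, report["sqlstate"] is 57P01 (admin_shutdown) while report["libpq_closed_hint"] is also true: the crash hint is printed even though the server had already said why it was closing the connection.
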

On Thu, Mar 27, 2025 at 11:19 AM Andres Freund <andres@anarazel.de> wrote:
> We have several places in libpq where libpq says that a connection closing is
> probably due to a server crash with a message like:
>
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing
>
> I think this is rather unhelpful, at least these days. There are a lot of
> reasons the connection could have failed, the server having terminated
> abnormally is just one of them.
>
> It's common to see this due to network issues, for example. I've quite a few
> times fielded worried questions of postgres users due to the message.

Yeah, I agree. I used to think this hint was helpful, but it's gotten less
helpful as the years have passed, because the server is more stable these
days.

Another thing that can cause this (as discussed in Discord) is that the
individual backend process can have died, but not the server as a whole. In
that case, the hint is only accurate if you take "server" to mean your
individual server process.

I wonder if, in addition to removing the hint, we could also consider
rewording the message. For example, a slight rewording to "server connection
closed unexpectedly" would avoid implying that it was the server that took
action, which is correct, because it could be a firewall in between the
machines or even security software on the client side. Maybe there is some
more dramatic rewording that is even better, but there's probably some value
in keeping it similar to what people are used to seeing.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Robert Haas
> I wonder if, in addition to removing the hint, we could also consider
> rewording the message. For example, a slight rewording to "server
> connection closed unexpectedly" would avoid implying that it was the

There is a lot of software doing string-parsing of this part of the message,
so it might be advisable to leave the first line alone.

https://sources.debian.org/src/php-laravel-framework/10.48.25+dfsg-2/src/Illuminate/Database/DetectsLostConnections.php/?hl=28#L28
https://sources.debian.org/src/python-taskflow/5.9.1-4/taskflow/persistence/backends/impl_sqlalchemy.py/?hl=87#L87
https://sources.debian.org/src/gnucash/1:5.10-0.1/libgnucash/backend/dbi/gnc-backend-dbi.cpp/?hl=798#L798
https://sources.debian.org/src/pgbouncer/1.24.0-3/test/test_misc.py/?hl=301#L301
https://sources.debian.org/src/icingaweb2-module-reporting/1.0.2-2/library/Reporting/RetryConnection.php/?hl=23#L23
https://sources.debian.org/src/storm/1.0-1/storm/databases/postgres.py/?hl=353#L353
https://sources.debian.org/src/timescaledb/2.19.0+dfsg-1/test/expected/loader-tsl.out/?hl=473#L473
https://sources.debian.org/src/odoo/18.0.0+dfsg-2/addons/web/tests/test_db_manager.py/?hl=277#L277

https://codesearch.debian.net/search?q=server+closed+the+connection+unexpectedly&literal=1

(There might be room for asking why this string parsing is being done, is
libpq missing "connection lost" detection vs. other errors?)

The remaining message lines are admittedly very pessimistic about PostgreSQL's
stability and should mention networking issues first.

Christoph
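
A minimal sketch of what such client-side string parsing can look like, not taken from any of the projects linked above; the names is_lost_connection and LOST_CONNECTION_MARKERS are hypothetical, and the reworded hint text in the second sample is invented purely for illustration. It matches only the first line of the libpq text, so it keeps working if the hint lines change, but stops matching under the "server connection closed unexpectedly" rewording:

```python
# Hypothetical marker list; real projects keep their own, often longer, lists.
LOST_CONNECTION_MARKERS = ("server closed the connection unexpectedly",)


def is_lost_connection(error_text):
    """Return True if the error text contains a known lost-connection marker."""
    lowered = error_text.lower()
    return any(marker in lowered for marker in LOST_CONNECTION_MARKERS)


# Current libpq wording, as quoted in this thread.
current = (
    "server closed the connection unexpectedly\n"
    "\tThis probably means the server terminated abnormally\n"
    "\tbefore or while processing the request.\n"
)

# First line kept, hint reworded (the hint wording here is made up).
hint_reworded = (
    "server closed the connection unexpectedly\n"
    "\tThis may be caused by a network problem.\n"
)

# First line reworded as suggested upthread.
first_line_reworded = "server connection closed unexpectedly\n"

results = {
    "current": is_lost_connection(current),
    "hint_reworded": is_lost_connection(hint_reworded),
    "first_line_reworded": is_lost_connection(first_line_reworded),
}
```

Only the third sample fails the check, which is the compatibility risk of changing the first line; changing just the hint lines leaves this kind of matching intact.
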