Thread: BUG #16264: Server closed the connection unexpectedly

BUG #16264: Server closed the connection unexpectedly

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16264
Logged by:          Robin Duquette
Email address:      research@pyxidr.com
PostgreSQL version: 12.2
Operating system:   macOS version 10.15.3
Description:

I just installed postgresql 12.2 (from 12.1) and I get many "server closed
the connection unexpectedly" when executing queries in pgAdmin or using
psycopg2 (in Python). I didn't get this issue with 12.1.


Re: BUG #16264: Server closed the connection unexpectedly

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> I just installed postgresql 12.2 (from 12.1) and I get many "server closed
> the connection unexpectedly" when executing queries in pgAdmin or using
> psycopg2 (in Python). I didn't get this issue with 12.1.

This might well be a bug, but there's nothing we can do about it with
no details.  There's some advice about submitting useful bug reports
here:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

            regards, tom lane



Re: BUG #16264: Server closed the connection unexpectedly

From
Robin Duquette
Date:
Thanks Tom for your prompt response and my apologies for the lack of details (indeed, I didn't give you much to help you!). These are what I can tell you, let me know if you have any questions.
  1. I'm running macOS 10.15.3
  2. Everything was working fine with PostgreSQL version 12.1 and since I have installed 12.2 on two machines (with identical os), many queries induce server process to get terminated by signal 9 (according to log). This is the case on the two machines that I'm running 12.2 Unfortunately, the log doesn't say more than that (see below an extract).
  3. I have used the installer certified by EnterpriseDB to install PostgreSQL
  4. I didn't make any changes to the postgresql.conf file
  5. The issue shows up with pgAdmin 4 and psycopg2 (Python 3.7).
This an extract of the log file:
2020-02-19 18:24:57.449 EST [175] LOG:  server process (PID 3032) was terminated by signal 9: Killed: 9
2020-02-19 18:24:57.449 EST [175] DETAIL:  Failed process was running: set timezone to 'Europe/Paris';
        select
                extract(year from fd.datetime_starting) as year,
                rfd.market,
                rfd.fundamental_subtype as prod_type,
                sum(fd.value) as installed_capacity
        from
                edb.tbl_ref_fundamental_data rfd,
                cwe_uk.tbl_hist_datetime_fundamental_data fd
        where
                rfd.id = fd.data_id and
                rfd.value_type = 'Actual' and
                rfd.fundamental_type = 'Installed capacity' and
                rfd.commodity = 'Power' and
                rfd.market in ('AT', 'BE', 'FR', 'DE', 'NL', 'CH', 'UK') and
                rfd.term = 'Annual' and
                fd.datetime_starting::date >= '2000-01-01' and
                fd.datetime_starting::date <= '2020-02-18'
        group by
                extract(year from fd.datetime_starting),
                rfd.market,
                rfd.fundamental_subtype
        order by
                extract(year from fd.datetime_starting),
                rfd.market,
                rfd.fundamental_subtype;
2020-02-19 18:24:57.449 EST [175] LOG:  terminating any other active server processes
2020-02-19 18:24:57.449 EST [2986] WARNING:  terminating connection because of crash of another server process
2020-02-19 18:24:57.449 EST [2986] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Best regards,
Pyxidr

ROBIN DUQUETTE
CEO
E: robin.duquette@pyxidr.com
C: +1 (514) 262-0418
Pyxidr Website Pyxidr Telegram Robin Duquette LinkedIn Robin Duquette Medium



On Wed, Feb 19, 2020 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> I just installed postgresql 12.2 (from 12.1) and I get many "server closed
> the connection unexpectedly" when executing queries in pgAdmin or using
> psycopg2 (in Python). I didn't get this issue with 12.1.

This might well be a bug, but there's nothing we can do about it with
no details.  There's some advice about submitting useful bug reports
here:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

                        regards, tom lane

Re: BUG #16264: Server closed the connection unexpectedly

From
Tom Lane
Date:
Robin Duquette <robin.duquette@pyxidr.com> writes:
>    1. I'm running macOS 10.15.3
>    2. Everything was working fine with PostgreSQL version 12.1 and since
>    I have installed 12.2 on two machines (with identical os), many queries
>    induce server process to get terminated by signal 9 (according to log).
>    This is the case on the two machines that I'm running 12.2 Unfortunately,
>    the log doesn't say more than that (see below an extract).

Huh, interesting.  Signal 9 (SIGKILL) is an externally-imposed process
termination, rather than an internal failure.  We've seen one similar
report recently:

https://www.postgresql.org/message-id/flat/CEF2C288-13E6-4727-81D0-0775F40F313B%40arcict.com

and as mentioned there, the most likely theory is that the backend process
is consuming an unreasonable amount of memory and the SIGKILL is coming
from a system-level out-of-memory defense mechanism.  I hadn't thought
that macOS did that, but it looks like I'm finding out differently.

That does not get us a whole lot closer to identifying the cause, though.
It's certainly believable that we introduced some kind of memory leak
between 12.1 and 12.2, but that's not enough info to find it.

First things first though.  Can you watch the system with "top" or
Activity Monitor and confirm or disprove that there's a memory
consumption issue before the SIGKILL?  We ought to be sure about
that before we go spending a lot of time.

If that does seem to be the case, launching the postmaster under a
restrictive ulimit (maybe "ulimit -v 1000000" or so) could be a
second step.  That ought to help reduce the problem from a SIGKILL
to a normal out-of-memory error, which not only would make things
a bit more stable for you, but it should allow the failing query
to dump a memory map to the postmaster's stderr, which would give
us a little more to go on about where the leak is.

In the end, though, I'm afraid we might have to ask you to produce
a reproducible test case of a query that consumes excessive memory.
These things can be very hard to identify without digging into it
with a debugger.

            regards, tom lane