Re: backends stuck in "startup" - Mailing list pgsql-general

From Tom Lane
Subject Re: backends stuck in "startup"
Date
Msg-id 5668.1511649959@sss.pgh.pa.us
In response to Re: backends stuck in "startup"  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: backends stuck in "startup"  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-general
Justin Pryzby <pryzby@telsasoft.com> writes:
> We never had any issue during the ~2 years running PG96 on this VM, until
> upgrading Monday to PG10.1, and we've now hit it 5+ times.

> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012

> Linux TS-DB 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Actually ... I was focusing on the wrong part of that.  It's not
your hypervisor, it's your kernel.  Running four-year-old kernels
is seldom a great idea, and in this case, the one you're using
contains the well-reported missed-futex-wakeups bug:

https://bugs.centos.org/view.php?id=8371

While rebuilding PG so it doesn't use POSIX semaphores will dodge
that bug, I think a kernel update would be a far better idea.
There are lots of other known bugs in that version.

Relevant to our discussion, the fix involves inserting a memory
barrier into the kernel's futex call handling:

https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0
        regards, tom lane
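
As an illustration of the kernel-age point above, here is a minimal sketch that pulls the release string and build date out of the uname banner quoted in this message. The helper names (parse_uname, kernel_age_years) and the reference date are hypothetical, chosen only for this example; the sketch says nothing about whether a given kernel carries the futex fix, which still has to be checked against the distribution's changelog.

```python
import re
from datetime import datetime, timezone

# The uname banner quoted in the message above.
UNAME_LINE = ("Linux TS-DB 2.6.32-431.el6.x86_64 #1 SMP "
              "Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux")

# Assumes the common "Linux <host> <release> #<n> [SMP] <build date> ..." layout
# with a UTC build timestamp; other layouts will raise ValueError.
_UNAME_RE = re.compile(
    r"Linux \S+ (?P<release>\S+) #\d+ (?:SMP )?"
    r"(?P<built>\w{3} \w{3} +\d+ [\d:]+ UTC \d{4})"
)


def parse_uname(line):
    """Return (release, build_datetime_utc) parsed from a `uname -a` line."""
    m = _UNAME_RE.match(line)
    if m is None:
        raise ValueError("unrecognized uname format")
    built = datetime.strptime(m.group("built"), "%a %b %d %H:%M:%S UTC %Y")
    return m.group("release"), built.replace(tzinfo=timezone.utc)


def kernel_age_years(built, now):
    """Approximate age of the kernel build in years at time `now`."""
    return (now - built).days / 365.25


if __name__ == "__main__":
    release, built = parse_uname(UNAME_LINE)
    # Hypothetical reference date, picked for illustration only.
    now = datetime(2017, 11, 25, tzinfo=timezone.utc)
    print(release, built.date(), round(kernel_age_years(built, now), 1))
```

On a live system the same function could be fed the output of `uname -a` instead of the hard-coded string.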

