Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date
Msg-id 20230121232922.juo7t3fhaso7qh3s@awork3.anarazel.de
In response to Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Hi,

On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> On 1/20/23 23:48, PG Bug reporting form wrote:
> > In these cases, the initdb phase will attempt to allocate huge pages that
> > are available in the OS, but it will be denied access by Kubernetes and
> > fail.
> 
> Well, so how exactly this fails? Does that mean Kubernetes broke mmap()
> with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> not available, or what? Because that's the only explanation I can see,
> looking at the code.

Yea, that's what I was wondering about as well.


> Or it just does not realize there are no hugepages, returns something
> and then crashes with SIGBUS later when trying to access it?

I assume that that's the case. There are references to bus errors in several
of the linked issues, e.g.
https://github.com/CrunchyData/postgres-operator/issues/413

selecting default max_connections ... sh: line 1:    60 Bus error               (core dumped)
"/usr/pgsql-10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
 

It's possible that the problem would go away if we used MAP_POPULATE for the
allocation.
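
To make the MAP_POPULATE idea concrete, here is a minimal sketch of a
"try huge pages, then fall back" allocation. It is not the PostgreSQL code;
the function name alloc_shared_try_huge and its used_huge out-parameter are
made up for illustration. Note that the mmap(2) man page suggests mmap() does
not fail when a mapping cannot be populated, so this is an experiment to try
rather than a known fix.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not PostgreSQL code): map `size` bytes of anonymous
 * shared memory, trying huge pages first and falling back to normal pages.
 * The caller is assumed to pass a size that is a multiple of the huge page
 * size. *used_huge is set to 1 if the huge page mapping succeeded, else 0.
 * Returns NULL if both attempts fail.
 */
static void *
alloc_shared_try_huge(size_t size, int *used_huge)
{
    void *ptr;

    *used_huge = 0;

#if defined(MAP_HUGETLB) && defined(MAP_POPULATE)
    /*
     * MAP_POPULATE asks the kernel to prefault the pages at mmap() time.
     * Assumption: a refused huge page may then surface here instead of as
     * SIGBUS on first access. The kernel does not promise that.
     */
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
               -1, 0);
    if (ptr != MAP_FAILED)
    {
        *used_huge = 1;
        return ptr;
    }
#endif

    /* Fallback: the same mapping with regular pages. */
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (ptr == MAP_FAILED) ? NULL : ptr;
}
```

A caller would check the return value for NULL and could log used_huge to see
which path was taken inside the container.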

I'd guess that this is annoying cgroups stuff :(


> I doubt we want to just go straight to changing the default value for
> everyone. IMHO if the "try" logic is somehow broken, we should fix the
> try logic, not mess with the defaults.

Agreed. But we could disable huge pages explicitly inside initdb - there's
really no point in using them there...
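
As a rough sketch of what that could look like, the test command that initdb
runs (like the one in the log above) could simply carry -c huge_pages=off.
The helper below is hypothetical and not initdb's actual code; the name
build_bootstrap_test_command and its parameters are assumptions, and only the
command-line options are taken from the log output quoted earlier.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not initdb's real code): build the shell command used
 * to probe a max_connections/shared_buffers combination, with huge pages
 * explicitly disabled. Returns 0 on success, -1 if the path contains a
 * double quote (this sketch quotes naively) or the buffer is too small.
 */
static int
build_bootstrap_test_command(char *buf, size_t buflen,
                             const char *backend_exec,
                             int max_connections, int shared_buffers)
{
    int n;

    /* Refuse paths that would break the simple "..." quoting below. */
    if (strchr(backend_exec, '"') != NULL)
        return -1;

    n = snprintf(buf, buflen,
                 "\"%s\" --boot -x0 -F -c huge_pages=off "
                 "-c max_connections=%d -c shared_buffers=%d "
                 "-c dynamic_shared_memory_type=none "
                 "< \"/dev/null\" > \"/dev/null\" 2>&1",
                 backend_exec, max_connections, shared_buffers);

    /* snprintf reports truncation by returning a length >= buflen. */
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

With huge_pages=off on the command line, the backend started by initdb would
not attempt a MAP_HUGETLB mapping at all, so the "try" fallback logic never
comes into play during initialization.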

Greetings,

Andres Freund


