RE: [Bus error] huge_pages default value (try) not fall back - Mailing list pgsql-bugs

From Fan Liu
Subject RE: [Bus error] huge_pages default value (try) not fall back
Msg-id HE1PR0701MB25698FF3F18D794F7C79457B9E110@HE1PR0701MB2569.eurprd07.prod.outlook.com
In response to Re: [Bus error] huge_pages default value (try) not fall back  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: [Bus error] huge_pages default value (try) not fall back
List pgsql-bugs
-----Original Message-----
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: February 18, 2020 17:33
To: Fan Liu <fan.liu@ericsson.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: [Bus error] huge_pages default value (try) not fall back

> On Tue, Feb 18, 2020 at 07:52:50AM +0000, Fan Liu wrote:
> Hi,
>
> We have seen a bus error when running PostgreSQL in a container (on K8s). According to our current findings, there is a bug
> in K8s, and they are working on it.
> But we also want to know why the huge_pages default value (try) didn't fall back.
>
> K8s BUG
> https://protect2.fireeye.com/v1/url?k=dbfabaf1-872eb600-dbfafa6a-864685b2085c-354ea5332684eaef&q=1&e=4521865a-6ad9-42a9-b74a-2b5462a7c73b&u=https%3A%2F%2Fgithub.com%2Fkubernetes%2Fkubernetes%2Fissues%2F71233
>
> Problem quick summary:
> When huge pages are not working, initdb produces a bus error.

Thanks for reporting!

This one is fun. If I understand everything correctly, Postgres will fall back to non-huge pages if it fails to
allocate some. But in this case the kernel actually allocates everything without problems (there are some available huge
pages on the node, after all), and returns SIGBUS only when the first page fault within this cgroup happens; see the docs
[1]:

    The HugeTLB controller allows to limit the HugeTLB usage per control
    group and enforces the controller limit during page fault. Since
    HugeTLB doesn't support page reclaim, enforcing the limit at page
    fault time implies that, the application will get SIGBUS signal if
    it tries to access HugeTLB pages beyond its limit. This requires the
    application to know beforehand how much HugeTLB pages it would
    require for its use.

Unfortunately I'm not sure what would be the best solution in this situation.

[1]: https://www.kernel.org/doc/html/latest/_sources/admin-guide/cgroup-v1/hugetlb.rst.txt
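The failure mode described above can be reproduced in spirit with a small C sketch. It is not PostgreSQL's code, and it assumes Linux, a 2 MB default huge page size, and a hypothetical helper `map_shared_memory()`. The fallback under huge_pages = try only helps when mmap() itself fails; here the sketch maps anonymous shared memory with `MAP_HUGETLB` and then touches one byte per huge page under a temporary `SIGBUS` handler, so a hugetlb cgroup limit enforced at page-fault time shows up immediately and can trigger a fallback to normal pages instead of crashing later. Jumping out of a `SIGBUS` handler with `siglongjmp()` is a common trick but not strictly portable, and the global jump buffer and process-wide handler make this helper unsafe to call from several threads at once.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: the default huge page size is 2 MB (typical on x86-64). */
#define HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

static sigjmp_buf probe_env;

/* Temporary SIGBUS handler: jump back to the probe loop. */
static void on_sigbus(int signo)
{
    (void) signo;
    siglongjmp(probe_env, 1);
}

/*
 * Hypothetical helper (not a PostgreSQL function): map `size` bytes of
 * anonymous shared memory, preferring huge pages. Sets *used_huge to 1 if
 * the huge page mapping was kept, 0 if it fell back to normal pages.
 * Returns NULL only if the normal-page mapping fails as well.
 */
static void *map_shared_memory(size_t size, int *used_huge)
{
    struct sigaction sa, old_sa;
    void *ptr;

    assert(size > 0 && size % HUGE_PAGE_SIZE == 0);
    *used_huge = 0;

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (ptr != MAP_FAILED)
    {
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_sigbus;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, &old_sa);

        /* savemask = 1 so SIGBUS, blocked inside the handler, is unblocked after the jump. */
        if (sigsetjmp(probe_env, 1) == 0)
        {
            /* Touch one byte per huge page: the cgroup limit is enforced here. */
            for (size_t off = 0; off < size; off += HUGE_PAGE_SIZE)
                ((volatile char *) ptr)[off] = 0;
            *used_huge = 1;
        }

        sigaction(SIGBUS, &old_sa, NULL);  /* restore the previous handler */
        if (*used_huge)
            return ptr;
        munmap(ptr, size);  /* got SIGBUS: drop the huge page mapping */
    }

    /* Fallback to ordinary pages, which is what huge_pages = try intends. */
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}
```

The design choice is to move the failure from an arbitrary later page fault to one controlled point right after mmap(); the cost is that every huge page is faulted in up front.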


---------------------------------------------------------

Hi Dmitry,
Thank you for the explanation.

In the K8s bug https://protect2.fireeye.com/v1/url?k=dbfabaf1-872eb600-dbfafa6a-86468, someone proposed a
workaround:

"Modify the docker image to be able to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before
initdb was ran (this is what I did)."

I am working on this workaround, but have not really tested it yet. Do you think it would avoid this issue? Or do
you see any side effects of this workaround?

BRs,
Fan Liu
ADP Document Database PG


