Thread: [Bus error] huge_pages default value (try) not fall back

[Bus error] huge_pages default value (try) not fall back

From

Fan Liu

Date:

18 February 2020, 07:52:50

Hi,

We have seen a bus error when running PostgreSQL in a container (on K8s). According to current findings, there is a bug in K8s, and they are working on it.

But we also want to know why the huge_pages default value (try) didn't fall back.

K8s BUG https://github.com/kubernetes/kubernetes/issues/71233

Problem quick summary:

When huge pages are not working, initdb produces a bus error.

Logs:

2020-02-17 06:33:21,606 INFO: trying to bootstrap a new cluster

2020-02-17 06:33:21,610 INFO: pg_ctl args: ('-o', '--auth-host=md5 --auth-local=trust --encoding=UTF8 --locale=en_US.UTF-8 --data-checksums --username=postgres --pwfile=/tmp/tmpcdHEH3'), {}

The files belonging to this database system will be owned by user "postgres".

This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".

The default text search configuration will be set to "english".

Data page checksums are enabled.

fixing permissions on existing directory /var/lib/postgresql/data/pgdata ... ok

creating subdirectories ... ok

sh: line 1: 100 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 102 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 104 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 106 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 108 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

selecting default max_connections ... 20

sh: line 1: 110 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=16384 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 112 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=8192 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 114 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=4096 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 116 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3584 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 118 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3072 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 120 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2560 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 122 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2048 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 124 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1536 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 126 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 128 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=900 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 130 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=800 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 132 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=700 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 134 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=600 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 136 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 138 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 140 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 142 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 144 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

sh: line 1: 146 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=50 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

selecting default shared_buffers ... 400kB

selecting default timezone ... UTC

selecting dynamic shared memory implementation ... posix

creating configuration files ... ok

child process was terminated by signal 7: Bus error

initdb: removing contents of data directory "/var/lib/postgresql/data/pgdata"

pg_ctl: database system initialization failed

running bootstrap script ... 2020-02-17 06:33:22,254 INFO: removing initialize key after failed attempt to bootstrap the cluster

------------ end of log -------------

Hugepage:

Output from "kubectl describe node"
========================
Capacity:
cpu: 56
ephemeral-storage: 365912640Ki
hugepages-1Gi: 16Gi
hugepages-2Mi: 0
memory: 131922340Ki
pods: 110
Allocatable:
cpu: 55900m
ephemeral-storage: 337225088466
hugepages-1Gi: 16Gi
hugepages-2Mi: 0
memory: 114792724Ki
pods: 110

========================
Grub command line:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset vga=normal default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=0"

========================
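
As a cross-check of the two outputs above, here is a small sketch that reads the huge page pools out of the kernel command line. parse_hugepage_pools is a hypothetical helper, and it implements only the simple rule that each hugepages= value applies to the most recent hugepagesz= (or to default_hugepagesz if none has been seen yet).

```python
def parse_hugepage_pools(cmdline):
    """Map huge page size (e.g. "1G") to the number of pages requested at boot."""
    pools = {}
    default_size = None
    current_size = None
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "default_hugepagesz":
            default_size = value
        elif key == "hugepagesz":
            current_size = value
        elif key == "hugepages":
            # Attribute the count to the size named most recently.
            size = current_size or default_size or "default"
            pools[size] = int(value)
    return pools


cmdline = ("console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset "
           "vga=normal default_hugepagesz=1G hugepagesz=1G hugepages=16 "
           "hugepagesz=2M hugepages=0")
pools = parse_hugepage_pools(cmdline)
```

For the command line above this gives 16 pages of 1G and 0 pages of 2M, which is consistent with the hugepages-1Gi: 16Gi and hugepages-2Mi: 0 capacity reported by kubectl.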

BRs,

Fan Liu

ADP Document Database PG

Re: [Bus error] huge_pages default value (try) not fall back

From

Dmitry Dolgov

Date:

18 February 2020, 09:32:40

> On Tue, Feb 18, 2020 at 07:52:50AM +0000, Fan Liu wrote:
> Hi,
>
> We have seen a bus error when running postgresql in container (where on K8s). According current finding, there is bug on K8s, they are working on it.
> But we also want to know why huge_pages default value(try) didn't fall back.
>
> K8s BUG https://github.com/kubernetes/kubernetes/issues/71233
>
> Problem quick summary:
> When hugepage not working, initdb produce bus error.

Thanks for reporting!

This one is fun. If I understand everything correctly, Postgres will
fall back to non-huge pages if it fails to allocate some. But in this
case the kernel actually allocates everything without problems (there
are some available huge pages on the node after all), and delivers
SIGBUS only when the first page fault within this cgroup happens, see
the docs [1]:

    The HugeTLB controller allows to limit the HugeTLB usage per control
    group and enforces the controller limit during page fault. Since
    HugeTLB doesn't support page reclaim, enforcing the limit at page
    fault time implies that, the application will get SIGBUS signal if
    it tries to access HugeTLB pages beyond its limit. This requires the
    application to know beforehand how much HugeTLB pages it would
    require for its use.

Unfortunately I'm not sure what would be the best solution in this
situation.

[1]: https://www.kernel.org/doc/html/latest/_sources/admin-guide/cgroup-v1/hugetlb.rst.txt
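
The behaviour described above can be sketched in Python. This is a simplified illustration, not PostgreSQL's actual code: create_anonymous_segment and its mmap_fn parameter are hypothetical names, and MAP_HUGETLB is hard-coded to its Linux x86-64 value on the assumption that Python's mmap module does not export it. The point is that the fallback decision is made only when mmap itself fails; a mapping that succeeds and later faults with SIGBUS is never retried without huge pages.

```python
import mmap

# Linux x86-64 value of MAP_HUGETLB (assumption: not exported by the mmap module).
MAP_HUGETLB = 0x40000


def create_anonymous_segment(size, huge_pages="try", mmap_fn=mmap.mmap):
    """Return (mapping, used_huge_pages) with a try-then-fall-back policy."""
    flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    if huge_pages in ("on", "try"):
        try:
            # Success here only means the mapping was set up. The pages are
            # faulted in later, which is when a hugetlb cgroup limit applies.
            return mmap_fn(-1, size, flags=flags | MAP_HUGETLB), True
        except OSError:
            if huge_pages == "on":
                raise  # "on" means: fail rather than fall back
    # "off", or "try" after a failed huge page mapping
    return mmap_fn(-1, size, flags=flags), False
```

The mmap_fn parameter exists only so that the policy can be exercised with a stand-in for mmap, without touching real huge pages.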

RE: [Bus error] huge_pages default value (try) not fall back

From

Fan Liu

Date:

18 February 2020, 12:31:51


Hi Dmitry,
Thank you for the explanation.

In the K8s bug https://github.com/kubernetes/kubernetes/issues/71233, someone proposed a
workaround:

"Modify the docker image to be able to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before
initdb was ran (this is what I did)."

I am working on this workaround, but have not really tested it yet. Do you think this could avoid the issue? Or do
you see any side effects of this workaround?

BRs,
Fan Liu
ADP Document Database PG

Re: [Bus error] huge_pages default value (try) not fall back

From

Dmitry Dolgov

Date:

19 February 2020, 09:35:50

> On Tue, Feb 18, 2020 at 12:31:51PM +0000, Fan Liu wrote:
>
> "Modify the docker image to be able to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before
> initdb was ran (this is what I did)."
>
> I am working on this workaround , but has not really tested yet.  So, do you think this could avoid this issue?  Or
> do you see any side impact for this workaround?

If you don't necessarily need to use huge pages, then yes, I guess it
should work. In case initdb tries to read the config from some other
location, you can always point it to whatever you need via the -L option.
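
A minimal sketch of the workaround discussed above, assuming the sample file contains a (possibly commented-out) huge_pages line, as stock postgresql.conf.sample files do. The helper name set_huge_pages_off is hypothetical; it only rewrites text and leaves reading and writing the file to the caller.

```python
import re

# Matches an active or commented-out "huge_pages = <value>" setting at line start.
_HUGE_PAGES_LINE = re.compile(r"^[ \t]*#?[ \t]*huge_pages[ \t]*=[ \t]*\S+",
                              re.MULTILINE)


def set_huge_pages_off(conf_text):
    """Return conf_text with huge_pages forced to off."""
    new_text, count = _HUGE_PAGES_LINE.subn("huge_pages = off", conf_text, count=1)
    if count == 0:
        # No such line in the file: append the setting instead.
        new_text = conf_text.rstrip("\n") + "\nhuge_pages = off\n"
    return new_text
```

Any trailing comment on the original line is kept, so the rest of the sample file stays as shipped.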

RE: [Bus error] huge_pages default value (try) not fall back

From

Fan Liu

Date:

21 February 2020, 02:19:55


Hi Dmitry,

I tried the workaround. The result is that there is still a bus error, but PostgreSQL did come up.

I do not quite understand why this could happen.

Attached core dump file, could you take a look?

$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/lib/postgresql10/bin/postgres --boot -x0 -F -c max_connections=20 -c share', real uid: 26, effective uid: 26, real gid: 26, effective gid: 26, execfn: '/usr/lib/postgresql10/bin/postgres', platform: 'x86_64'

BRs,
Fan Liu

Attachment

core

Re: [Bus error] huge_pages default value (try) not fall back

From

Dmitry Dolgov

Date:

21 February 2020, 18:58:03

> On Fri, Feb 21, 2020, 3:19 AM Fan Liu <fan.liu@ericsson.com> wrote:

> Attached core dump file, could you take a look?

I can take a look on Monday, but in the meantime, if you have issues at the initdb stage, you can try running it with the -d option and check the debugging output; it should be helpful.

RE: [Bus error] huge_pages default value (try) not fall back

From

Fan Liu

Date:

24 February 2020, 01:32:26

>From: Dmitry Dolgov <9erthalion6@gmail.com>
>Sent: 22 February 2020 2:58
>To: Fan Liu <fan.liu@ericsson.com>
>Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
>Subject: Re: [Bus error] huge_pages default value (try) not fall back

>> On Fri, Feb 21, 2020, 3:19 AM Fan Liu <fan.liu@ericsson.com> wrote:

>> Attached core dump file, could you take a look?

>I can take a look on Monday, but at the same time if you have issues on initdb stage you try to run it with -d option and check out debugging output, should be helpful.

Hi Dmitry,

I appreciate your support.

I will work on a new package and ask my customer to validate it and collect logs.

BRs,

Fan Liu

Re: [Bus error] huge_pages default value (try) not fall back

From

Dmitry Dolgov

Date:

24 February 2020, 10:38:37

>> On Fri, Feb 21, 2020, 3:19 AM Fan Liu <fan.liu@ericsson.com> wrote:
>>
>> Attached core dump file, could you take a look?
>
> I can take a look on Monday, but at the same time if you have issues
> on initdb stage you try to run it with -d option and check out
> debugging output, should be helpful.

Unfortunately, I wasn't able to get a meaningful stack trace from this
dump, most likely due to different versions (I hoped that the latest
pgdg package with 10 for bionic would fit). But you can also try to
generate and post one yourself by following these instructions [1].

[1]: https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

RE: [Bus error] huge_pages default value (try) not fall back

From

Fan Liu

Date:

25 February 2020, 01:57:25

I created a new package for my customer with -d for initdb, but they said they can accept the current workaround.
As I don't have a node with huge pages enabled, I will not be able to collect the logs.

I'd like to thank you again for the support and troubleshooting. I think we may close this ticket.

BRs,
Fan Liu
ADP Document Database PG


Re: [Bus error] huge_pages default value (try) not fall back

From

Odin Ugedal

Date:

09 June 2020, 15:22:58

Hi,

I stumbled upon this issue when working with the related issue in
Kubernetes that was referenced a few mails back. From what I
understand, this issue is (or may be) a result of how the hugetlb
cgroup enforces the "limit_in_bytes" limit for huge pages. Under
normal circumstances, a process should not crash like this when using
memory received from a successful mmap. The value set in
"limit_in_bytes" is only enforced during page allocation, and _not_
when mapping pages using mmap. This results in a successful mmap for
-n- huge pages as long as the system has -n- free huge pages, even
though the size is bigger than "limit_in_bytes". The process then
reserves the huge page memory, making it inaccessible to other
processes.

The real issue arises when postgres tries to write to the memory it
received from mmap, and the kernel tries to allocate the reserved huge
page memory. Since the cgroup does not allow this, the process is
killed with SIGBUS (the "Bus error" seen in the initdb log).
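
The arithmetic behind this can be sketched as follows. The function names and the example figures (a 1 GiB huge page size as on the node in this thread, a 140 MiB request, and a cgroup limit of 0) are illustrative assumptions, not values taken from the failing container.

```python
def huge_pages_needed(request_bytes, huge_page_bytes):
    """Number of huge pages needed to back a request (rounded up)."""
    return -(-request_bytes // huge_page_bytes)


def exceeds_cgroup_limit(request_bytes, huge_page_bytes, limit_in_bytes,
                         usage_in_bytes=0):
    """True if faulting in the whole request would go past limit_in_bytes.

    mmap does not perform this check; the kernel enforces the limit only
    when the pages are actually faulted in.
    """
    needed = huge_pages_needed(request_bytes, huge_page_bytes) * huge_page_bytes
    return usage_in_bytes + needed > limit_in_bytes


GIB = 1024 ** 3
MIB = 1024 ** 2
# Hypothetical container: 1 GiB huge pages, 140 MiB of shared memory, limit 0.
would_sigbus = exceeds_cgroup_limit(140 * MIB, GIB, limit_in_bytes=0)
```

With these assumed figures a single 1 GiB page is needed, which is already over a limit of 0, so the first write to the mapping would fail even though the node has free huge pages.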

This issue has been fixed in Linux by this patch
https://lkml.org/lkml/2020/2/3/1153, which adds a new element of
control to the cgroup that will fix this issue. There are however no
container runtimes that use it yet, and only 5.7+ (afaik) kernels
support it, but the progress can be tracked here:
https://github.com/opencontainers/runtime-spec/issues/1050. The fix
for the upstream Kubernetes issue
(https://github.com/opencontainers/runtime-spec/issues/1050), an issue
that made Kubernetes set the wrong value for the top-level
"limit_in_bytes" when the pre-allocated page count increased after
Kubernetes (kubelet) startup, will hopefully land in Kubernetes 1.19
(or 1.20). Fingers crossed!

Hopefully this makes some sense, and gives some insights into the issue...

Best regards,
Odin Ugedal

RE: [Bus error] huge_pages default value (try) not fall back

From

Fan Liu

Date:

10 June 2020, 01:28:57

Thank you so much for the information. 


BRs,
Fan Liu
ADP Document Database PG
