Thread: [Bus error] huge_pages default value (try) not fall back
Hi,
We have seen a bus error when running PostgreSQL in a container (on K8s). According to our current findings, there is a bug in K8s, and they are working on it.
But we would also like to know why the huge_pages default value (try) didn't fall back.
K8s bug: https://github.com/kubernetes/kubernetes/issues/71233
Problem quick summary:
When huge pages are not working, initdb produces a bus error.
Logs:
2020-02-17 06:33:21,606 INFO: trying to bootstrap a new cluster
2020-02-17 06:33:21,610 INFO: pg_ctl args: ('-o', '--auth-host=md5 --auth-local=trust --encoding=UTF8 --locale=en_US.UTF-8 --data-checksums --username=postgres --pwfile=/tmp/tmpcdHEH3'), {}
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".
Data page checksums are enabled.
fixing permissions on existing directory /var/lib/postgresql/data/pgdata ... ok
creating subdirectories ... ok
sh: line 1: 100 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 102 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 104 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 106 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 108 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
selecting default max_connections ... 20
sh: line 1: 110 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=16384 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 112 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=8192 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 114 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=4096 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 116 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3584 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 118 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3072 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 120 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2560 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 122 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2048 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 124 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1536 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 126 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 128 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=900 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 130 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=800 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 132 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=700 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 134 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=600 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 136 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 138 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 140 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 142 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 144 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 146 Bus error (core dumped) "/usr/lib/postgresql10/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=50 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
selecting default shared_buffers ... 400kB
selecting default timezone ... UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
child process was terminated by signal 7: Bus error
initdb: removing contents of data directory "/var/lib/postgresql/data/pgdata"
pg_ctl: database system initialization failed
running bootstrap script ... 2020-02-17 06:33:22,254 INFO: removing initialize key after failed attempt to bootstrap the cluster
------------ end of log -------------
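One way to read the probing above: initdb tried each candidate max_connections value, then each shared_buffers value, and every trial run died with a bus error, so it kept the last (smallest) candidate of each list. The Python sketch below is a simplified model of that behaviour, not initdb's source. The candidate lists are copied from the log lines, the pairing of N connections with 10*N buffers is read off the first five lines, and an 8 kB block size is assumed. With every probe failing it selects 20 connections and 50 buffers, and 50 × 8 kB = 400 kB, which matches the "selecting default" lines.

```python
from typing import Callable, Sequence, Tuple

# Candidate values in the order the log shows initdb trying them.
# shared_buffers values are counted in blocks (8 kB each is assumed here).
TRIAL_CONNS: Sequence[int] = (100, 50, 40, 30, 20)
TRIAL_BUFS: Sequence[int] = (
    16384, 8192, 4096, 3584, 3072, 2560, 2048, 1536,
    1000, 900, 800, 700, 600, 500, 400, 300, 200, 100, 50,
)
BLOCK_SIZE_KB = 8

# A probe stands in for running "postgres --boot ..." and returns True on success.
Probe = Callable[[int, int], bool]


def select_defaults(probe: Probe) -> Tuple[int, int]:
    """Pick (max_connections, shared_buffers) as the log suggests:
    the first candidate that boots, or the last one if none does."""
    conns = TRIAL_CONNS[-1]
    for c in TRIAL_CONNS:
        if probe(c, c * 10):  # the log pairs N connections with 10*N buffers
            conns = c
            break
    bufs = TRIAL_BUFS[-1]
    for b in TRIAL_BUFS:
        if probe(conns, b):
            bufs = b
            break
    return conns, bufs


def always_bus_error(max_connections: int, shared_buffers: int) -> bool:
    """Every trial run in the log was killed by SIGBUS."""
    return False


if __name__ == "__main__":
    conns, bufs = select_defaults(always_bus_error)
    print(f"selecting default max_connections ... {conns}")
    print(f"selecting default shared_buffers ... {bufs * BLOCK_SIZE_KB}kB")
```

Because no probe succeeded, the "selected" values say nothing about the node's real capacity; they are simply the floor of each list.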
Hugepage:
Output from "kubectl describe node"
========================
Capacity:
cpu: 56
ephemeral-storage: 365912640Ki
hugepages-1Gi: 16Gi
hugepages-2Mi: 0
memory: 131922340Ki
pods: 110
Allocatable:
cpu: 55900m
ephemeral-storage: 337225088466
hugepages-1Gi: 16Gi
hugepages-2Mi: 0
memory: 114792724Ki
pods: 110
========================
Grub command line:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset vga=normal default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=0"
========================
BRs,
Fan Liu
ADP Document Database PG
> On Tue, Feb 18, 2020 at 07:52:50AM +0000, Fan Liu wrote:
> Hi,
>
> We have seen a bus error when running postgresql in container (where on K8s). According current finding, there is bug on K8s, they are working on it.
> But we also want to know why huge_pages default value(try) didn't fall back.
>
> K8s BUG https://github.com/kubernetes/kubernetes/issues/71233
>
> Problem quick summary:
> When hugepage not working, initdb produce bus error.

Thanks for reporting! This one is fun. If I understand everything correctly, Postgres will fall back to regular (non-huge) pages if it fails to allocate huge pages. But in this case the kernel actually allocates everything without problems (there are some available huge pages on the node, after all), and returns SIGBUS only when the first page fault within this cgroup happens; see the docs [1]:

"The HugeTLB controller allows to limit the HugeTLB usage per control group and enforces the controller limit during page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time implies that, the application will get SIGBUS signal if it tries to access HugeTLB pages beyond its limit. This requires the application to know beforehand how much HugeTLB pages it would require for its use."

Unfortunately I'm not sure what would be the best solution in this situation.

[1]: https://www.kernel.org/doc/html/latest/_sources/admin-guide/cgroup-v1/hugetlb.rst.txt
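To make the explanation above concrete, here is a toy Python model of a "try, then fall back" allocation. It is an illustration under stated assumptions, not PostgreSQL's code (the real logic is C, in src/backend/port/sysv_shmem.c): HugetlbNode, mmap_hugetlb, touch and create_shared_memory are hypothetical names, and the numbers (16 free huge pages, a cgroup limit of 0 pages) are assumed to resemble a node whose pod cgroup received a wrong limit. What it shows is that the fallback branch only runs when the mapping call fails, while the cgroup limit is checked later, when the pages are first touched.

```python
from dataclasses import dataclass


class SimulatedSigbus(Exception):
    """Stands in for the SIGBUS the kernel sends at page-fault time."""


@dataclass
class HugetlbNode:
    free_pages: int          # free huge pages on the node (checked at mmap time)
    cgroup_limit_pages: int  # hugetlb cgroup limit (checked only at fault time)
    faulted_pages: int = 0

    def mmap_hugetlb(self, pages: int) -> None:
        # The mapping succeeds if the node can reserve the pages;
        # the cgroup limit is not consulted here.
        if pages > self.free_pages:
            raise OSError("ENOMEM: not enough free huge pages")
        self.free_pages -= pages

    def touch(self, pages: int) -> None:
        # The first write faults the pages in and charges the cgroup.
        # Going over the limit can no longer be reported as an error return.
        if self.faulted_pages + pages > self.cgroup_limit_pages:
            raise SimulatedSigbus("SIGBUS: hugetlb cgroup limit exceeded")
        self.faulted_pages += pages


def create_shared_memory(node: HugetlbNode, pages: int, huge_pages: str) -> str:
    """Return "huge" or "regular"; fall back only if the mapping call fails."""
    if huge_pages in ("on", "try"):
        try:
            node.mmap_hugetlb(pages)
            return "huge"
        except OSError:
            if huge_pages == "on":
                raise  # "on" means huge pages or nothing
    return "regular"


if __name__ == "__main__":
    node = HugetlbNode(free_pages=16, cgroup_limit_pages=0)
    print(create_shared_memory(node, pages=1, huge_pages="try"))  # huge
    try:
        node.touch(1)  # first write to the shared memory
    except SimulatedSigbus as exc:
        print(exc)
```

With free_pages=0 the same call returns "regular", which is the only situation in which "try" gets a chance to help.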
-----Original Message-----
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: February 18, 2020 17:33
To: Fan Liu <fan.liu@ericsson.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: [Bus error] huge_pages default value (try) not fall back
---------------------------------------------------------
Hi Dmitry,
Thank you for the explanation.
In the K8s bug https://github.com/kubernetes/kubernetes/issues/71233, someone proposed a workaround:
"Modify the docker image to be able to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before initdb was ran (this is what I did)."
I am working on this workaround, but have not really tested it yet. So, do you think this could avoid the issue? Or do you see any side effects of this workaround?
BRs,
Fan Liu
ADP Document Database PG
> On Tue, Feb 18, 2020 at 12:31:51PM +0000, Fan Liu wrote:
>
> "Modify the docker image to be able to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before initdb was ran (this is what I did)."
>
> I am working on this workaround , but has not really tested yet. So, do you think this could avoid this issue? Or do you see any side impact for this workaround?

If you don't necessarily need to use huge pages, then yes, I guess it should work. In case initdb tries to read the config from some other location, you can always point it to whatever you need via the -L option.
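A minimal Python sketch of the workaround discussed above: rewrite the sample configuration so that huge_pages = off ends up in the generated postgresql.conf. The function name and regular expression are illustrative only; the sample file path is the one quoted in the workaround and may differ between packages, and the sample line used below is only assumed to resemble PostgreSQL's commented-out default.

```python
import re

# An active or commented-out huge_pages setting on a line of its own,
# e.g. "#huge_pages = try   # on, off, or try". [ \t] keeps matches on one line.
HUGE_PAGES_LINE = re.compile(r"^[ \t]*#?[ \t]*huge_pages[ \t]*=.*$", re.MULTILINE)


def force_huge_pages_off(sample_text: str) -> str:
    """Return the config text with huge_pages set to off.

    Existing huge_pages lines (commented or not) are replaced; if there is
    none, the setting is appended at the end.
    """
    new_text, count = HUGE_PAGES_LINE.subn("huge_pages = off", sample_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += "huge_pages = off\n"
    return new_text


if __name__ == "__main__":
    sample = "#dynamic_shared_memory_type = posix\n#huge_pages = try\t\t\t# on, off, or try\n"
    print(force_huge_pages_off(sample), end="")
```

The patched text would be written back to postgresql.conf.sample in the image, or into a copy of the share directory that initdb is pointed at with -L. One caveat, inferred from the log order rather than stated in the thread: the trial runs happen before initdb writes the configuration files, so they would presumably still use the default huge_pages = try, which may explain why bus errors can still appear with this workaround in place.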
-----Original Message-----
From: Fan Liu
Sent: February 21, 2020 10:15
To: Dmitry Dolgov <9erthalion6@gmail.com>; pgsql-bugs@lists.postgresql.org
Subject: RE: [Bus error] huge_pages default value (try) not fall back

-----Original Message-----
>From: Dmitry Dolgov <9erthalion6@gmail.com>
>Sent: February 19, 2020 17:36
>To: Fan Liu <fan.liu@ericsson.com>
>Cc: pgsql-bugs@lists.postgresql.org
>Subject: Re: [Bus error] huge_pages default value (try) not fall back
-----------------------------------
Hi Dmitry,
I tried the workaround. The result is that there is still a bus error, but PostgreSQL did come up. I don't quite understand why this happens. I have attached the core dump file; could you take a look?

$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/lib/postgresql10/bin/postgres --boot -x0 -F -c max_connections=20 -c share', real uid: 26, effective uid: 26, real gid: 26, effective gid: 26, execfn: '/usr/lib/postgresql10/bin/postgres', platform: 'x86_64'

BRs,
Fan Liu
>From: Dmitry Dolgov <9erthalion6@gmail.com>
>Sent: February 22, 2020 2:58
>To: Fan Liu <fan.liu@ericsson.com>
>Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
>Subject: Re: [Bus error] huge_pages default value (try) not fall back
>
>> On Fri, Feb 21, 2020, 3:19 AM Fan Liu <fan.liu@ericsson.com> wrote:
>>
>> Attached core dump file, could you take a look?
>I can take a look on Monday, but at the same time, if you have issues at the initdb stage, you can try running it with the -d option and check the debugging output; that should be helpful.
Hi Dmitry,
I appreciate your support.
I will work on a new package, ask my collector to validate it, and collect the logs.
BRs,
Fan Liu
>> On Fri, Feb 21, 2020, 3:19 AM Fan Liu <fan.liu@ericsson.com> wrote:
>>
>> Attached core dump file, could you take a look?
>
> I can take a look on Monday, but at the same time if you have issues
> on initdb stage you try to run it with -d option and check out
> debugging output, should be helpful.

Unfortunately, I wasn't able to get a meaningful stack trace from this dump, most likely due to different versions (I hoped that the latest pgdg package with 10 for bionic would fit). But you can also try to get a stack trace yourself by following these instructions [1] and post it.

[1]: https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
I created a new package for my customer with -d for initdb, but they said they can accept the current workaround. As I don't have a node with huge pages enabled, I will not be able to collect the logs. I'd like to thank you again for your support and troubleshooting. I think we can close this ticket.
BRs,
Fan Liu
ADP Document Database PG

-----Original Message-----
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: February 24, 2020 18:39
To: Fan Liu <fan.liu@ericsson.com>
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: [Bus error] huge_pages default value (try) not fall back
Hi,

I stumbled upon this issue while working on the related Kubernetes issue that was referenced a few mails back. From what I understand, this issue is (or may be) a result of how the hugetlb cgroup enforces the "limit_in_bytes" limit for huge pages. In theory, a process shouldn't crash like this under normal circumstances when using memory received from a successful mmap. The value set in "limit_in_bytes" is only enforced during page allocation (at page fault time), and _not_ when mapping pages using mmap. This results in a successful mmap for -n- huge pages as long as the system has -n- free huge pages, even if the size is bigger than "limit_in_bytes". The process then reserves the huge page memory and makes it inaccessible to other processes.

The real issue appears when postgres tries to write to the memory it received from mmap and the kernel tries to allocate the reserved huge page memory. Since the cgroup does not allow it to do so, the process is killed with SIGBUS.

This issue has been fixed in Linux by this patch https://lkml.org/lkml/2020/2/3/1153, which adds a new element of control to the cgroup. However, no container runtimes use it yet, and (AFAIK) only 5.7+ kernels support it; the progress can be tracked here: https://github.com/opencontainers/runtime-spec/issues/1050. The fix for the upstream Kubernetes issue (https://github.com/opencontainers/runtime-spec/issues/1050), which made Kubernetes set a wrong value for the top-level "limit_in_bytes" when the pre-allocated page count increased after Kubernetes (kubelet) startup, will hopefully land in Kubernetes 1.19 (or 1.20). Fingers crossed!

Hopefully this makes some sense and gives some insight into the issue...

Best regards,
Odin Ugedal
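The difference described above comes down to where the limit is checked. A small Python model of the two enforcement points, under these assumptions: the function and enum names are hypothetical, sizes are counted in whole huge pages, and the reservation limit added by the linked patch is taken to be exposed in cgroup v1 as hugetlb.<hugepagesize>.rsvd.limit_in_bytes, next to the existing hugetlb.<hugepagesize>.limit_in_bytes.

```python
from enum import Enum


class Enforcement(Enum):
    AT_FAULT = "fault"              # hugetlb.<size>.limit_in_bytes
    AT_RESERVATION = "reservation"  # hugetlb.<size>.rsvd.limit_in_bytes (5.7+, assumed)


def first_failure(pages: int, free_pages: int, limit_pages: int,
                  mode: Enforcement) -> str:
    """Say where a request for `pages` huge pages first fails.

    "mmap"  -> mmap() returns an error the caller can handle (e.g. fall back)
    "fault" -> mmap() succeeded, but the first write gets SIGBUS
    "none"  -> the request fits both the node and the cgroup limit
    """
    if pages > free_pages:
        return "mmap"  # the node cannot reserve that many huge pages at all
    if pages > limit_pages:
        if mode is Enforcement.AT_RESERVATION:
            return "mmap"  # charged when the mapping is reserved
        return "fault"     # charged only when the pages are faulted in
    return "none"


if __name__ == "__main__":
    # 16 free huge pages on the node, but a cgroup limit of 0 pages (assumed).
    for mode in Enforcement:
        print(mode.name, first_failure(1, free_pages=16, limit_pages=0, mode=mode))
```

Only the "mmap" outcome is something huge_pages = try can react to, so, as far as we can tell, reservation-time accounting would let the existing fallback work without any change on the PostgreSQL side.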
Thank you so much for the information.
BRs,
Fan Liu
ADP Document Database PG

>>-----Original Message-----
>>From: Odin Ugedal <odin@ugedal.com>
>>Sent: June 9, 2020 23:23
>>To: Fan Liu <fan.liu@ericsson.com>
>>Cc: Dmitry Dolgov <9erthalion6@gmail.com>; PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
>>Subject: Re: [Bus error] huge_pages default value (try) not fall back