Thread: 9.4 beta1 crash on Debian sid/i386
Building 9.4 beta1 on Debian sid/i386 fails during the regression
tests. amd64 works fine, as does i386 on the released distributions.

parallel group (11 tests): create_cast create_aggregate drop_if_exists typed_table create_function_3 vacuum constraints create_table_like triggers inherit updatable_views
     create_aggregate          ... ok
     create_function_3         ... ok
     create_cast               ... ok
     constraints               ... ok
     triggers                  ... ok
     inherit                   ... ok
     create_table_like         ... ok
     typed_table               ... ok
     vacuum                    ... ok
     drop_if_exists            ... ok
     updatable_views           ... ok
test sanity_check             ... ok
test errors                   ... FAILED (test process exited with exit code 2)
test select                   ... FAILED (test process exited with exit code 2)
[...]

LOG:  server process (PID 8403) was terminated by signal 7: Bus error
DETAIL:  Failed process was running: select infinite_recurse();
LOG:  terminating any other active server processes

*** /home/cbe/projects/postgresql/9.4/trunk/build/../src/test/regress/expected/errors.out	2014-05-11 23:16:48.000000000 +0200
--- /srv/projects/postgresql/9.4/trunk/build/src/test/regress/results/errors.out	2014-05-13 22:16:05.337798163 +0200
***************
*** 444,447 ****
  'select infinite_recurse()' language sql;
  \set VERBOSITY terse
  select infinite_recurse();
! ERROR:  stack depth limit exceeded
--- 444,447 ----
  'select infinite_recurse()' language sql;
  \set VERBOSITY terse
  select infinite_recurse();
! connection to server was lost

Reading symbols from /srv/projects/postgresql/9.4/trunk/build/src/backend/postgres...done.
[New LWP 8403]

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: cbe regression [local] SELECT '.
Program terminated with signal 7, Bus error.
#0  0xf72abbc3 in ScanKeywordLookup (text=0xed2ec330 "select", keywords=0xf773ca60 <ScanKeywords>, num_keywords=405)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/kwlookup.c:42
42	{
(gdb) bt
#0  0xf72abbc3 in ScanKeywordLookup (text=0xed2ec330 "select", keywords=0xf773ca60 <ScanKeywords>, num_keywords=405)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/kwlookup.c:42
#1  0xf72aa41f in core_yylex (yylval_param=yylval_param@entry=0xffdcb104, yylloc_param=yylloc_param@entry=0xffdcb108, yyscanner=yyscanner@entry=0xed2ec2a8) at scan.l:950
#2  0xf72abe30 in base_yylex (lvalp=lvalp@entry=0xffdcb104, llocp=llocp@entry=0xffdcb108, yyscanner=yyscanner@entry=0xed2ec2a8)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/parser.c:99
#3  0xf72949c8 in base_yyparse (yyscanner=yyscanner@entry=0xed2ec2a8) at gram.c:21442
#4  0xf72abd02 in raw_parser (str=str@entry=0xed2ec1e8 "select infinite_recurse()")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/parser.c:52
#5  0xf7441ac7 in pg_parse_query (query_string=query_string@entry=0xed2ec1e8 "select infinite_recurse()")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/postgres.c:563
#6  0xf73c77a8 in inline_function (context=0xffdcbc1c, func_tuple=0xedad9b98, funcvariadic=<optimized out>, args=0x0, input_collid=0, result_collid=0, result_type=23, funcid=29299)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/optimizer/util/clauses.c:4268
#7  simplify_function (funcid=29299, result_type=23, result_typmod=-1, result_collid=result_collid@entry=0, input_collid=input_collid@entry=0, args_p=args_p@entry=0xffdcba9c, funcvariadic=funcvariadic@entry=0 '\000', process_args=process_args@entry=1 '\001', allow_non_const=allow_non_const@entry=1 '\001', context=context@entry=0xffdcbc1c)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/optimizer/util/clauses.c:3796
[...]
Ah, I can also ask it to print the outermost stack frames separately...

(gdb) bt -10
#35639 0xf733927c in ExecutorRun (queryDesc=queryDesc@entry=0xf945d188, direction=direction@entry=ForwardScanDirection, count=count@entry=0)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/executor/execMain.c:256
#35640 0xf7445463 in PortalRunSelect (portal=portal@entry=0xf945b180, forward=forward@entry=1 '\001', count=0, count@entry=2147483647, dest=dest@entry=0xf94304c8)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/pquery.c:946
#35641 0xf744694e in PortalRun (portal=portal@entry=0xf945b180, count=count@entry=2147483647, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0xf94304c8, altdest=altdest@entry=0xf94304c8, completionTag=completionTag@entry=0xfff9740c "")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/pquery.c:790
#35642 0xf744421e in exec_simple_query (query_string=0xf942f1a0 "select infinite_recurse();")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/postgres.c:1045
#35643 PostgresMain (argc=1, argv=argv@entry=0xf93cff50, dbname=0xf93cfde0 "regression", username=0xf93cfdd0 "cbe")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/postgres.c:4004
#35644 0xf71e068e in BackendRun (port=0xf93ed768)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:4117
#35645 BackendStartup (port=0xf93ed768)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:3791
#35646 ServerLoop ()
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:1570
#35647 0xf73e738a in PostmasterMain (argc=argc@entry=6, argv=argv@entry=0xf93cf510)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:1223
#35648 0xf71e1976 in main (argc=6, argv=0xf93cf510)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/main/main.c:225

../../../src/bin/pg_config/pg_config:
BINDIR = /usr/lib/postgresql/9.4/bin
DOCDIR = /usr/share/doc/postgresql-doc-9.4
HTMLDIR = /usr/share/doc/postgresql-doc-9.4
INCLUDEDIR = /usr/include/postgresql
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/9.4/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql/9.4/lib
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/postgresql/9.4/man
SHAREDIR = /usr/share/postgresql/9.4
SYSCONFDIR = /etc/postgresql-common
PGXS = /usr/lib/postgresql/9.4/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-tclconfig=/usr/lib/i386-linux-gnu/tcl8.6' '--with-includes=/usr/include/tcl8.6' 'PYTHON=/usr/bin/python' '--mandir=/usr/share/postgresql/9.4/man' '--docdir=/usr/share/doc/postgresql-doc-9.4' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/9.4' '--bindir=/usr/lib/postgresql/9.4/bin' '--libdir=/usr/lib/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-debug' '--disable-rpath' '--with-ossp-uuid' '--with-gnu-ld' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -DLINUX_OOM_SCORE_ADJ=0' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/i386-linux-gnu/mit-krb5' '--with-krb5' '--with-gssapi' '--with-ldap' 'CPPFLAGS=-D_FORTIFY_SOURCE=2'
CC = gcc
CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.6
CFLAGS = -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -DLINUX_OOM_SCORE_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/i386-linux-gnu/mit-krb5 -Wl,--as-needed
LDFLAGS_EX = 
LDFLAGS_SL = 
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -ledit -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.4beta1

gcc (Debian 4.8.2-21) 4.8.2
libc6:i386 2.18-5

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> Building 9.4 beta1 on Debian sid/i386 fails during the regression
> tests. amd64 works fine, as does i386 on the released distributions.

It would appear that something is wrong with check_stack_depth(),
and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
None of that logic has changed in a while. You might try checking what
the max_stack_depth GUC gets set to by default in each build, and how
that compares to what "ulimit -s" has to say. If it looks sane, try
tracing through check_stack_depth.

			regards, tom lane
Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
> Christoph Berg <cb@df7cb.de> writes:
> > Building 9.4 beta1 on Debian sid/i386 fails during the regression
> > tests. amd64 works fine, as does i386 on the released distributions.
>
> It would appear that something is wrong with check_stack_depth(),
> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
> None of that logic has changed in a while. You might try checking what
> the max_stack_depth GUC gets set to by default in each build, and how that
> compares to what "ulimit -s" has to say. If it looks sane, try tracing
> through check_stack_depth.

ulimit -s is 8192 (kB); max_stack_depth is 2MB.

check_stack_depth looks right, max_stack_depth_bytes there is 2097152,
and I can see stack_base_ptr - &stack_top_loc grow over repeated
invocations of the function (stack_depth itself is optimized out).
Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".

Using "b check_stack_depth if (stack_base_ptr - &stack_top_loc) > 1000000",
I could see the stack size at 1000264, though when I then tried with
> 1900000, it caught SIGBUS again.

In the meantime, the problem has also manifested itself on other
architectures: armel, armhf, and mipsel (the build logs are at [1],
though they don't contain anything except a "FATAL: the database system
is in recovery mode").

[1] https://buildd.debian.org/status/logs.php?pkg=postgresql-9.4&ver=9.4~beta1-1

Interestingly, the Debian buildd managed to run the testsuite for
i386, while I could reproduce the problem on the pgapt build machine
and on my notebook, so there must be some system difference. Possibly
the reason is these two machines are running a 64bit kernel and I'm
building in a 32bit chroot, though that hasn't been a problem before.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
>> It would appear that something is wrong with check_stack_depth(),
>> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.

> ulimit -s is 8192 (kB); max_stack_depth is 2MB.

> check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> and I can see stack_base_ptr - &stack_top_loc grow over repeated
> invocations of the function (stack_depth itself is optimized out).
> Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".

Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
not getting set, that would disable the error report. But on most
architectures that would also result in silly values for the pointer
difference, so I doubt this is the issue.

> Interestingly, the Debian buildd managed to run the testsuite for
> i386, while I could reproduce the problem on the pgapt build machine
> and on my notebook, so there must be some system difference. Possibly
> the reason is these two machines are running a 64bit kernel and I'm
> building in a 32bit chroot, though that hasn't been a problem before.

I'm suspicious that something has changed in your build environment,
because that stack-checking logic hasn't changed since these commits:

Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Branch: master Release: REL9_2_BR [ef3883d13] 2012-04-08 19:07:55 +0300
Branch: REL9_1_STABLE Release: REL9_1_4 [ef29bb1f7] 2012-04-08 19:08:13 +0300
Branch: REL9_0_STABLE Release: REL9_0_8 [77dc2b0a4] 2012-04-08 19:09:12 +0300
Branch: REL8_4_STABLE Release: REL8_4_12 [89da5dc6d] 2012-04-08 19:09:26 +0300
Branch: REL8_3_STABLE Release: REL8_3_19 [ddeac5dec] 2012-04-08 19:09:37 +0300

    Do stack-depth checking in all postmaster children.

    We used to only initialize the stack base pointer when starting up a
    regular backend, not in other processes. In particular, autovacuum
    workers can run arbitrary user code, and without stack-depth checking,
    infinite recursion in e.g. an index expression will bring down the
    whole cluster.

The lack of reports from the buildfarm or other users is also evidence
against there being a widespread issue here.

A different thought: I have heard of environments in which the available
stack depth is much less than what ulimit would suggest, because the ulimit
space gets split up for multiple per-thread stacks. That should not be
happening in a Postgres backend, since we don't do threading, but I'm
running out of ideas to investigate ...

			regards, tom lane
Re: Tom Lane 2014-05-18 <9058.1400385611@sss.pgh.pa.us>
> Christoph Berg <cb@df7cb.de> writes:
> > Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
> >> It would appear that something is wrong with check_stack_depth(),
> >> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
>
> > ulimit -s is 8192 (kB); max_stack_depth is 2MB.
>
> > check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> > and I can see stack_base_ptr - &stack_top_loc grow over repeated
> > invocations of the function (stack_depth itself is optimized out).
> > Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".
>
> Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
> not getting set, that would disable the error report. But on most
> architectures that would also result in silly values for the pointer
> difference, so I doubt this is the issue.

stack_base_ptr was non-NULL. The stack size started at around 3 or 5 kB
(I don't remember exactly) and grew by a few hundred bytes in each
iteration, so this looked sane.

> > Interestingly, the Debian buildd managed to run the testsuite for
> > i386, while I could reproduce the problem on the pgapt build machine
> > and on my notebook, so there must be some system difference. Possibly
> > the reason is these two machines are running a 64bit kernel and I'm
> > building in a 32bit chroot, though that hasn't been a problem before.
>
> I'm suspicious that something has changed in your build environment,
> because that stack-checking logic hasn't changed since these commits:

It's something in the combination of build and runtime environment. I
can reproduce the problem with the package that the Debian
i386/experimental buildd compiled, even though that package passed the
regression tests there. Possibly a change in libc. I'll try to ask some
kernel/libc people if they have an idea. My current bet is on the gcc
hardening flags we are using.

> The lack of reports from the buildfarm or other users is also evidence
> against there being a widespread issue here.

The only animal running Debian testing/unstable I can see is dugong,
which is ia64 - an architecture that was removed from Debian some months
ago. I guess I should look into setting up a new animal for this.

> A different thought: I have heard of environments in which the available
> stack depth is much less than what ulimit would suggest because the ulimit
> space gets split up for multiple per-thread stacks. That should not be
> happening in a Postgres backend, since we don't do threading, but I'm
> running out of ideas to investigate ...

I've done some builds now, and there's no clear picture yet of when the
problem occurs. Still trying...

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
On 2014-05-18 11:08:34 +0200, Christoph Berg wrote:
> > > Interestingly, the Debian buildd managed to run the testsuite for
> > > i386, while I could reproduce the problem on the pgapt build machine
> > > and on my notebook, so there must be some system difference. Possibly
> > > the reason is these two machines are running a 64bit kernel and I'm
> > > building in a 32bit chroot, though that hasn't been a problem before.
> >
> > I'm suspicious that something has changed in your build environment,
> > because that stack-checking logic hasn't changed since these commits:
>
> It's something in the combination of build and runtime environment. I
> can reproduce the problem with the package that the Debian
> i386/experimental buildd compiled, even though that package passed the
> regression tests there. Possibly a change in libc. I'll try to
> ask some kernel/libc people if they have an idea. My current bet is on
> the gcc hardening flags we are using.

As another datapoint: I don't see the problem with 32bit packages built
with a 64bit gcc with -m32 on Debian unstable on my laptop. I didn't
try hard, though.

Did you measure how large the stack actually was when you got the
SIGBUS? It should be possible to determine that by computing the offset
using some local stack variable in one of the deepest stack frames.

Greetings,

Andres Freund
Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
> Did you measure how large the stack actually was when you got the
> SIGBUS? Should be possible to determine that by computing the offset
> using some local stack variable in one of the deepest stack frames.

Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
process that just got SIGBUS. This seems to be in line with
stack_base_ptr = 0xffd1c317 and the fcinfo address in

#0  hashname (fcinfo=fcinfo@entry=0xffb38024)
    at /build/postgresql-9.4-4lNBaG/postgresql-9.4-9.4~beta1/build/../src/backend/access/hash/hashfunc.c:143

(Things work fine when I set max_stack_depth = '1900kB'.)

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
>> Did you measure how large the stack actually was when you got the
>> SIGBUS? Should be possible to determine that by computing the offset
>> using some local stack variable in one of the deepest stack frames.

> Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
> process that just got SIGBUS. This seems to be in line with
> stack_base_ptr = 0xffd1c317 and the fcinfo address in

OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
the available stack depth. I'd classify that as a kernel bug. I wonder
if it's a different manifestation of this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=952946

A different line of thought is that if ulimit -s is 8192, why are we
not getting 8MB of stack? But in any case, if we're only going to
get 1944kB, getrlimit ought to tell us that.

			regards, tom lane
On 2014-05-18 17:41:17 -0400, Tom Lane wrote:
> Christoph Berg <cb@df7cb.de> writes:
> > Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
> >> Did you measure how large the stack actually was when you got the
> >> SIGBUS? Should be possible to determine that by computing the offset
> >> using some local stack variable in one of the deepest stack frames.
>
> > Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
> > process that just got SIGBUS. This seems to be in line with
> > stack_base_ptr = 0xffd1c317 and the fcinfo address in
>
> OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> the available stack depth. I'd classify that as a kernel bug. I wonder
> if it's a different manifestation of this issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=952946

That'd explain why I couldn't reproduce it. And I seem to recall some
messages about the hardening stuff in Debian accidentally being lost
some time ago. So if that got re-introduced into 9.4... The CFLAGS
certainly indicate that -pie is getting used.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-18 17:41:17 -0400, Tom Lane wrote:
>> OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
>> the available stack depth. I'd classify that as a kernel bug. I wonder
>> if it's a different manifestation of this issue:
>> https://bugzilla.redhat.com/show_bug.cgi?id=952946

> That'd explain why I couldn't reproduce it. And I seem to recall some
> messages about the hardening stuff in Debian accidentally being lost
> some time ago. So if that got re-introduced into 9.4... The CFLAGS
> certainly indicate that -pie is getting used.

Yeah. Re-reading the Red Hat bug, it seems like an exact match for this
issue. The dependency on ASLR means that the identical run might
sometimes work and sometimes crash, which would explain why Christoph
was getting less-than-consistent results.

The bad news is that the kernel guys have been ignoring the issue
for over a year. Dunno if some pressure from the Debian camp would
help raise their priority for this.

			regards, tom lane
On 2014-05-18 23:52:32 +0200, Andres Freund wrote:
> On 2014-05-18 17:41:17 -0400, Tom Lane wrote:
> > Christoph Berg <cb@df7cb.de> writes:
> > > Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
> > >> Did you measure how large the stack actually was when you got the
> > >> SIGBUS? Should be possible to determine that by computing the offset
> > >> using some local stack variable in one of the deepest stack frames.
> >
> > > Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
> > > process that just got SIGBUS. This seems to be in line with
> > > stack_base_ptr = 0xffd1c317 and the fcinfo address in
> >
> > OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> > the available stack depth. I'd classify that as a kernel bug. I wonder
> > if it's a different manifestation of this issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=952946
>
> That'd explain why I couldn't reproduce it. And I seem to recall some
> messages about the hardening stuff in Debian accidentally being lost
> some time ago. So if that got re-introduced into 9.4... The CFLAGS
> certainly indicate that -pie is getting used.

Indeed. If I add -pie to my 32bit VPATH build's configure invocation it
crashes, too. Not that that helps much to resolve the bug, given that
it's been dormant for a long while :(.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-05-18 17:56:48 -0400, Tom Lane wrote:
> The bad news is that the kernel guys have been ignoring the issue
> for over a year. Dunno if some pressure from the Debian camp would
> help raise their priority for this.

I guess we should forward the bug to the lkml/linux-mm lists. I think a
fair number of people involved in those areas won't read the RH bugzilla
without being pointed towards it, err, pointedly.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Tom Lane 2014-05-18 <26862.1400449277@sss.pgh.pa.us>
> OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> the available stack depth. I'd classify that as a kernel bug. I wonder
> if it's a different manifestation of this issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=952946
>
> A different line of thought is that if ulimit -s is 8192, why are we
> not getting 8MB of stack? But in any case, if we're only going to
> get 1944kB, getrlimit ought to tell us that.

The issue looks exactly like what you describe in that bugzilla bug,
including the fact that [stack] in /proc/*/maps gets replaced by
[heap] once the bus error happens (comment 11).

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Re: To Tom Lane 2014-05-19 <20140519091808.GA7296@msgid.df7cb.de>
> Re: Tom Lane 2014-05-18 <26862.1400449277@sss.pgh.pa.us>
> > OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> > the available stack depth. I'd classify that as a kernel bug. I wonder
> > if it's a different manifestation of this issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=952946
> >
> > A different line of thought is that if ulimit -s is 8192, why are we
> > not getting 8MB of stack? But in any case, if we're only going to
> > get 1944kB, getrlimit ought to tell us that.
>
> The issue looks exactly like what you describe in that bugzilla
> bug, including the fact that [stack] in /proc/*/maps gets replaced by
> [heap] once the bus error happens (comment 11).

I've done some more digging. The problem also exists on plain 32bit
kernels, not only on a 64bit kernel running a 32bit userland. (Tested
with Debian Wheezy's 3.2.57 kernel.)

The problem seems to be that the address layout puts heap and stack too
close together - there's only about 125MB between the start of the heap
and the end of the stack. Apparently 9.4 is a bit more memory-hungry on
the heap side when running infinite_recurse(), so it gets SIGBUS before
it reaches the 2MB max_stack_depth. With 9.3 I can easily trigger the
same problem using max_stack_depth = '7MB': at the time of the crash the
stack is 2797568 bytes as reported by /proc/*/maps, and with 9.1 the
crash happens at 3084288. (Both do catch the problem properly with
max_stack_depth = '2MB', at which point 2105344 bytes of stack are
allocated.)

Debian/Ubuntu have been using hardened PostgreSQL builds for years now,
including running the regression tests - apparently we were always close
to a crash, it just had not happened yet.

So there are a few points to consider:

* ASLR leaves only 125MB for brk()-style heap plus stack
* RLIMIT_STACK is treated as an upper limit, not a reservation
* PostgreSQL thinks max_stack_depth=2MB plus check_stack_depth() is
  safe, instead of having a SIGBUS handler
* PostgreSQL allocates lots of heap using brk() instead of mmap()

If any of that didn't hold, the problem wouldn't appear.

I'm not sure where to go from here. Getting the kernel (or the libc)
changed seems hard, and that would probably only affect future
distributions anyway. A short-term fix might be to reduce
max_stack_depth for the regression tests, which tests the
functionality but leaves the problem open for production.

Implementing a SIGBUS/SIGSEGV handler would probably mean that the
whole ouch-lets-restart-on-error logic would become ineffective,
unless we check which address caused the error and decide if it was
part of the stack or not.

One hack would be to touch some address deep in the stack early at
backend start, so the address space would be reserved for the stack.
Though it seems ugly to do that for all backends, not only those that
actually use much stack. (The cost would be one memory page, which
isn't too much, otoh.)

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
On 2014-05-19 13:53:18 +0200, Christoph Berg wrote:
> I've done some more digging. The problem also exists on plain 32bit
> kernels, not only on a 64bit kernel running a 32bit userland. (Tested
> with Debian Wheezy's 3.2.57 kernel.)

Too bad.

> Debian/Ubuntu have been using hardened PostgreSQL builds for years
> now, including running the regression tests - apparently we were
> always close to a crash, it just had not happened yet.

There might be some user-defined workloads triggering it as well...

> So there are a few points to consider:
> * ASLR leaves only 125MB for brk()-style heap plus stack
> * RLIMIT_STACK is treated as an upper limit, not a reservation
> * PostgreSQL thinks max_stack_depth=2MB plus check_stack_depth() is
>   safe, instead of having a SIGBUS handler
> * PostgreSQL allocates lots of heap using brk() instead of mmap()

* postgres on Debian is built with -pie.

> If any of that didn't hold, the problem wouldn't appear.
>
> I'm not sure where to go from here. Getting the kernel (or the libc)
> changed seems hard, and that would probably only affect future
> distributions anyway.

Hm, this certainly looks like the kind of bug that should get
backported to -stable et al.

> A short-term fix might be to reduce
> max_stack_depth for the regression tests, which tests the
> functionality but leaves the problem open for production.
> Implementing a SIGBUS/SIGSEGV handler would probably mean that the
> whole ouch-lets-restart-on-error logic would become ineffective,
> unless we check which address caused the error and decide if it was
> part of the stack or not.

Meh. I am pretty staunchly set against trying this. It would just be
putting complicated tape over the problem, and we'd have significant
problems discerning the different kinds of SIGBUS errors and such.

Isn't the far more obvious thing to just not build postgres with -pie
on 32bit? It's hardly a security benefit if it allows a plain user to
crash the server.

Besides the stack problem, have you measured whether it's viable to use
-pie on 32bit performance-wise? That stuff's not that cheap, especially
on 32bit.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2014-05-19 13:53:18 +0200, Christoph Berg wrote:
> * PostgreSQL allocates lots of heap using brk() instead of mmap()

It doesn't really do that, btw. It's the libc's malloc that makes those
decisions, not postgres.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> Isn't the far more obvious thing to just not build postgres with -pie
> on 32bit? It's hardly a security benefit if it allows a plain user to
> crash the server.

Yeah, that's what I was doing when I was at Red Hat --- PIE mode would
be nice, but not when it breaks basic functionality.

I think throwing an error out of a SIGBUS handler is right out. There
would be no way to know exactly what code we were interrupting. It's
the same reason we don't let, e.g., the SIGALRM handler throw a timeout
error directly (in most places, anyway).

>> * PostgreSQL allocates lots of heap using brk() instead of mmap()

> It doesn't really do that, btw. It's the libc's malloc that makes those
> decisions, not postgres.

It occurs to me that maybe this is a glibc bug, not a kernel bug?

			regards, tom lane
On 2014-05-19 09:53:11 -0400, Tom Lane wrote:
> I think throwing an error out of a SIGBUS handler is right out. There
> would be no way to know exactly what code we were interrupting. It's
> the same reason we don't let, e.g., the SIGALRM handler throw a timeout
> error directly (in most places, anyway).

Agreed. I think if we really, really feel the need to do something about
this - which I don't - we could allocate a separate stack very early on
and use that.

> >> * PostgreSQL allocates lots of heap using brk() instead of mmap()
>
> > It doesn't really do that, btw. It's the libc's malloc that makes those
> > decisions, not postgres.
>
> It occurs to me that maybe this is a glibc bug, not a kernel bug?

You think malloc() should try to be careful when calling brk() and check
beforehand whether it'll conflict with stack_base + RLIMIT_STACK? That's
not a bad argument, but it still seems a really bad choice to leave that
little space for the heap. Especially when it's dependent on -pie being
used.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Andres Freund 2014-05-19 <20140519141221.GC5098@alap3.anarazel.de>
> On 2014-05-19 09:53:11 -0400, Tom Lane wrote:
> > I think throwing an error out of a SIGBUS handler is right out. There
> > would be no way to know exactly what code we were interrupting. It's
> > the same reason we don't let, e.g., the SIGALRM handler throw a timeout
> > error directly (in most places, anyway).

Right. I just mentioned that for completeness.

> Agreed. I think if we really, really feel the need to do something about
> this - which I don't - we could allocate a separate stack very early on
> and use that.

Hmm, that'd be an extension of the other idea, "write something deep in
the stack on startup". This is probably less evil, though I agree it's a
big hammer for solving something that should probably be fixed
elsewhere.

> > >> * PostgreSQL allocates lots of heap using brk() instead of mmap()
> >
> > > It doesn't really do that, btw. It's the libc's malloc that makes
> > > those decisions, not postgres.
> >
> > It occurs to me that maybe this is a glibc bug, not a kernel bug?
>
> You think malloc() should try to be careful when calling brk() and check
> beforehand whether it'll conflict with stack_base + RLIMIT_STACK? That's
> not a bad argument, but it still seems a really bad choice to leave that
> little space for the heap. Especially when it's dependent on -pie being
> used.

It's probably both: the default ASLR layout provides too little heap,
plus malloc() runs into the stack area - I'm not sure if the former is
the kernel's fault or libc/ld.so's; probably they need to work together
on that anyway.

Disabling -pie for all 32bit archs seems to be the way to go for us
now. Does this topic warrant being mentioned in the docs?

Christoph
Re: To Tom Lane 2014-05-19 <20140519144717.GG7296@msgid.df7cb.de>
> Disabling -pie for all 32bit archs seems to be the way to go for us
> now.

FTR, I've just had a look at armhf (arm-linux-gnueabihf); the address
layout looks exactly the same there, and 9.3 crashes easily, so it's
really a problem on all Linux 32bit archs. I'm puzzled that the
regression tests passed there [1], but anyway, we'll probably get rid
of -pie.

[1] https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.3&arch=armhf&ver=9.3.4-1&stamp=1395348483

> Does this topic warrant being mentioned in the docs?

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> FTR, I've just had a look at armhf (arm-linux-gnueabihf); the address
> layout looks exactly the same there, and 9.3 crashes easily, so it's
> really a problem on all Linux 32bit archs. I'm puzzled that the
> regression tests passed there [1], but anyway, we'll probably get rid
> of -pie.

Well, the failure is probabilistic --- it might sometimes pass because
the random address layout is not so tight.

			regards, tom lane