Thread: 9.4 beta1 crash on Debian sid/i386
Building 9.4 beta1 on Debian sid/i386 fails during the regression
tests. amd64 works fine, as does i386 on the released distributions.

parallel group (11 tests): create_cast create_aggregate drop_if_exists typed_table create_function_3 vacuum constraints create_table_like triggers inherit updatable_views
     create_aggregate          ... ok
     create_function_3         ... ok
     create_cast               ... ok
     constraints               ... ok
     triggers                  ... ok
     inherit                   ... ok
     create_table_like         ... ok
     typed_table               ... ok
     vacuum                    ... ok
     drop_if_exists            ... ok
     updatable_views           ... ok
test sanity_check             ... ok
test errors                   ... FAILED (test process exited with exit code 2)
test select                   ... FAILED (test process exited with exit code 2)
[...]

LOG:  server process (PID 8403) was terminated by signal 7: Bus error
DETAIL:  Failed process was running: select infinite_recurse();
LOG:  terminating any other active server processes

*** /home/cbe/projects/postgresql/9.4/trunk/build/../src/test/regress/expected/errors.out	2014-05-11 23:16:48.000000000 +0200
--- /srv/projects/postgresql/9.4/trunk/build/src/test/regress/results/errors.out	2014-05-13 22:16:05.337798163 +0200
***************
*** 444,447 ****
  'select infinite_recurse()' language sql;
  \set VERBOSITY terse
  select infinite_recurse();
! ERROR:  stack depth limit exceeded
--- 444,447 ----
  'select infinite_recurse()' language sql;
  \set VERBOSITY terse
  select infinite_recurse();
! connection to server was lost

Reading symbols from /srv/projects/postgresql/9.4/trunk/build/src/backend/postgres...done.
[New LWP 8403]

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: cbe regression [local] SELECT '.
Program terminated with signal 7, Bus error.
#0  0xf72abbc3 in ScanKeywordLookup (text=0xed2ec330 "select", keywords=0xf773ca60 <ScanKeywords>, num_keywords=405)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/kwlookup.c:42
42	{
(gdb) bt
#0  0xf72abbc3 in ScanKeywordLookup (text=0xed2ec330 "select", keywords=0xf773ca60 <ScanKeywords>, num_keywords=405)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/kwlookup.c:42
#1  0xf72aa41f in core_yylex (yylval_param=yylval_param@entry=0xffdcb104, yylloc_param=yylloc_param@entry=0xffdcb108, yyscanner=yyscanner@entry=0xed2ec2a8) at scan.l:950
#2  0xf72abe30 in base_yylex (lvalp=lvalp@entry=0xffdcb104, llocp=llocp@entry=0xffdcb108, yyscanner=yyscanner@entry=0xed2ec2a8)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/parser.c:99
#3  0xf72949c8 in base_yyparse (yyscanner=yyscanner@entry=0xed2ec2a8) at gram.c:21442
#4  0xf72abd02 in raw_parser (str=str@entry=0xed2ec1e8 "select infinite_recurse()")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/parser/parser.c:52
#5  0xf7441ac7 in pg_parse_query (query_string=query_string@entry=0xed2ec1e8 "select infinite_recurse()")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/postgres.c:563
#6  0xf73c77a8 in inline_function (context=0xffdcbc1c, func_tuple=0xedad9b98, funcvariadic=<optimized out>, args=0x0, input_collid=0, result_collid=0, result_type=23, funcid=29299)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/optimizer/util/clauses.c:4268
#7  simplify_function (funcid=29299, result_type=23, result_typmod=-1, result_collid=result_collid@entry=0, input_collid=input_collid@entry=0, args_p=args_p@entry=0xffdcba9c, funcvariadic=funcvariadic@entry=0 '\000', process_args=process_args@entry=1 '\001', allow_non_const=allow_non_const@entry=1 '\001', context=context@entry=0xffdcbc1c)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/optimizer/util/clauses.c:3796
[...]
Ah, I can also ask it to print the outermost stack frames separately...

(gdb) bt -10
#35639 0xf733927c in ExecutorRun (queryDesc=queryDesc@entry=0xf945d188, direction=direction@entry=ForwardScanDirection, count=count@entry=0)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/executor/execMain.c:256
#35640 0xf7445463 in PortalRunSelect (portal=portal@entry=0xf945b180, forward=forward@entry=1 '\001', count=0, count@entry=2147483647, dest=dest@entry=0xf94304c8)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/pquery.c:946
#35641 0xf744694e in PortalRun (portal=portal@entry=0xf945b180, count=count@entry=2147483647, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0xf94304c8, altdest=altdest@entry=0xf94304c8, completionTag=completionTag@entry=0xfff9740c "")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/pquery.c:790
#35642 0xf744421e in exec_simple_query (query_string=0xf942f1a0 "select infinite_recurse();")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/postgres.c:1045
#35643 PostgresMain (argc=1, argv=argv@entry=0xf93cff50, dbname=0xf93cfde0 "regression", username=0xf93cfdd0 "cbe")
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/tcop/postgres.c:4004
#35644 0xf71e068e in BackendRun (port=0xf93ed768)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:4117
#35645 BackendStartup (port=0xf93ed768)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:3791
#35646 ServerLoop ()
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:1570
#35647 0xf73e738a in PostmasterMain (argc=argc@entry=6, argv=argv@entry=0xf93cf510)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/postmaster/postmaster.c:1223
#35648 0xf71e1976 in main (argc=6, argv=0xf93cf510)
    at /home/cbe/projects/postgresql/9.4/trunk/build/../src/backend/main/main.c:225

../../../src/bin/pg_config/pg_config:
BINDIR = /usr/lib/postgresql/9.4/bin
DOCDIR = /usr/share/doc/postgresql-doc-9.4
HTMLDIR = /usr/share/doc/postgresql-doc-9.4
INCLUDEDIR = /usr/include/postgresql
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/9.4/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql/9.4/lib
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/postgresql/9.4/man
SHAREDIR = /usr/share/postgresql/9.4
SYSCONFDIR = /etc/postgresql-common
PGXS = /usr/lib/postgresql/9.4/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-tclconfig=/usr/lib/i386-linux-gnu/tcl8.6' '--with-includes=/usr/include/tcl8.6' 'PYTHON=/usr/bin/python' '--mandir=/usr/share/postgresql/9.4/man' '--docdir=/usr/share/doc/postgresql-doc-9.4' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/9.4' '--bindir=/usr/lib/postgresql/9.4/bin' '--libdir=/usr/lib/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-debug' '--disable-rpath' '--with-ossp-uuid' '--with-gnu-ld' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -DLINUX_OOM_SCORE_ADJ=0' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/i386-linux-gnu/mit-krb5' '--with-krb5' '--with-gssapi' '--with-ldap' 'CPPFLAGS=-D_FORTIFY_SOURCE=2'
CC = gcc
CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.6
CFLAGS = -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -DLINUX_OOM_SCORE_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/i386-linux-gnu/mit-krb5 -Wl,--as-needed
LDFLAGS_EX = 
LDFLAGS_SL = 
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -ledit -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.4beta1

gcc (Debian 4.8.2-21) 4.8.2
libc6:i386 2.18-5

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> Building 9.4 beta1 on Debian sid/i386 fails during the regression
> tests. amd64 works fine, as does i386 on the released distributions.

It would appear that something is wrong with check_stack_depth(),
and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
None of that logic has changed in a while. You might try checking what
the max_stack_depth GUC gets set to by default in each build, and how
that compares to what "ulimit -s" has to say. If it looks sane, try
tracing through check_stack_depth.

			regards, tom lane
Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
> Christoph Berg <cb@df7cb.de> writes:
> > Building 9.4 beta1 on Debian sid/i386 fails during the regression
> > tests. amd64 works fine, as does i386 on the released distributions.
>
> It would appear that something is wrong with check_stack_depth(),
> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
> None of that logic has changed in a while. You might try checking what
> the max_stack_depth GUC gets set to by default in each build, and how that
> compares to what "ulimit -s" has to say. If it looks sane, try tracing
> through check_stack_depth.

ulimit -s is 8192 (kB); max_stack_depth is 2MB.

check_stack_depth looks right, max_stack_depth_bytes there is 2097152,
and I can see stack_base_ptr - &stack_top_loc grow over repeated
invocations of the function (stack_depth itself is optimized out).
Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".

Using "b check_stack_depth if (stack_base_ptr - &stack_top_loc) > 1000000",
I could see the stack size at 1000264, though when I then tried with
> 1900000, it caught SIGBUS again.

In the meantime, the problem has also manifested itself on other
architectures: armel, armhf, and mipsel (the build logs are at [1],
though they don't contain anything except a "FATAL: the database system
is in recovery mode").

[1] https://buildd.debian.org/status/logs.php?pkg=postgresql-9.4&ver=9.4~beta1-1

Interestingly, the Debian buildd managed to run the testsuite for
i386, while I could reproduce the problem on the pgapt build machine
and on my notebook, so there must be some system difference. Possibly
the reason is these two machines are running a 64bit kernel and I'm
building in a 32bit chroot, though that hasn't been a problem before.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
>> It would appear that something is wrong with check_stack_depth(),
>> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.

> ulimit -s is 8192 (kB); max_stack_depth is 2MB.

> check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> and I can see stack_base_ptr - &stack_top_loc grow over repeated
> invocations of the function (stack_depth itself is optimized out).
> Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".

Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
not getting set, that would disable the error report. But on most
architectures that would also result in silly values for the pointer
difference, so I doubt this is the issue.

> Interestingly, the Debian buildd managed to run the testsuite for
> i386, while I could reproduce the problem on the pgapt build machine
> and on my notebook, so there must be some system difference. Possibly
> the reason is these two machines are running a 64bit kernel and I'm
> building in a 32bit chroot, though that hasn't been a problem before.

I'm suspicious that something has changed in your build environment,
because that stack-checking logic hasn't changed since these commits:

Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Branch: master Release: REL9_2_BR [ef3883d13] 2012-04-08 19:07:55 +0300
Branch: REL9_1_STABLE Release: REL9_1_4 [ef29bb1f7] 2012-04-08 19:08:13 +0300
Branch: REL9_0_STABLE Release: REL9_0_8 [77dc2b0a4] 2012-04-08 19:09:12 +0300
Branch: REL8_4_STABLE Release: REL8_4_12 [89da5dc6d] 2012-04-08 19:09:26 +0300
Branch: REL8_3_STABLE Release: REL8_3_19 [ddeac5dec] 2012-04-08 19:09:37 +0300

    Do stack-depth checking in all postmaster children.

    We used to only initialize the stack base pointer when starting up a
    regular backend, not in other processes. In particular, autovacuum
    workers can run arbitrary user code, and without stack-depth checking,
    infinite recursion in e.g. an index expression will bring down the
    whole cluster.

The lack of reports from the buildfarm or other users is also evidence
against there being a widespread issue here.

A different thought: I have heard of environments in which the available
stack depth is much less than what ulimit would suggest, because the ulimit
space gets split up for multiple per-thread stacks. That should not be
happening in a Postgres backend, since we don't do threading, but I'm
running out of ideas to investigate ...

			regards, tom lane
Re: Tom Lane 2014-05-18 <9058.1400385611@sss.pgh.pa.us>
> Christoph Berg <cb@df7cb.de> writes:
> > Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
> >> It would appear that something is wrong with check_stack_depth(),
> >> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
>
> > ulimit -s is 8192 (kB); max_stack_depth is 2MB.
>
> > check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> > and I can see stack_base_ptr - &stack_top_loc grow over repeated
> > invocations of the function (stack_depth itself is optimized out).
> > Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".
>
> Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
> not getting set, that would disable the error report. But on most
> architectures that would also result in silly values for the pointer
> difference, so I doubt this is the issue.

stack_base_ptr was non-NULL. The stack size started at around 3 or 5 kB
(I don't remember exactly) and grew by a few hundred bytes in each
iteration, so this looked sane.

> > Interestingly, the Debian buildd managed to run the testsuite for
> > i386, while I could reproduce the problem on the pgapt build machine
> > and on my notebook, so there must be some system difference. Possibly
> > the reason is these two machines are running a 64bit kernel and I'm
> > building in a 32bit chroot, though that hasn't been a problem before.
>
> I'm suspicious that something has changed in your build environment,
> because that stack-checking logic hasn't changed since these commits:

It's something in the combination of build and runtime environment. I
can reproduce the problem with the package that the Debian
i386/experimental buildd compiled, even though that package passed the
regression tests there. Possibly a change in libc. I'll try to ask some
kernel/libc people if they have an idea. My current bet is on the gcc
hardening flags we are using.

> The lack of reports from the buildfarm or other users is also evidence
> against there being a widespread issue here.

The only animal running Debian testing/unstable I can see is dugong,
which is ia64 - an architecture that was removed from Debian some months
ago. I guess I should look into setting up a new animal for this.

> A different thought: I have heard of environments in which the available
> stack depth is much less than what ulimit would suggest because the ulimit
> space gets split up for multiple per-thread stacks. That should not be
> happening in a Postgres backend, since we don't do threading, but I'm
> running out of ideas to investigate ...

I've done some builds now, and there's no clear picture yet of when the
problem occurs. Still trying...

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
On 2014-05-18 11:08:34 +0200, Christoph Berg wrote:
> > > Interestingly, the Debian buildd managed to run the testsuite for
> > > i386, while I could reproduce the problem on the pgapt build machine
> > > and on my notebook, so there must be some system difference. Possibly
> > > the reason is these two machines are running a 64bit kernel and I'm
> > > building in a 32bit chroot, though that hasn't been a problem before.
> >
> > I'm suspicious that something has changed in your build environment,
> > because that stack-checking logic hasn't changed since these commits:
>
> It's something in the combination of build and runtime environment. I
> can reproduce the problem with the package that the Debian
> i386/experimental buildd compiled, even though that package passed the
> regression tests there. Possibly a change in libc. I'll try to
> ask some kernel/libc people if they have an idea. My current bet is on
> the gcc hardening flags we are using.

As another datapoint: I don't see the problem with 32bit packages built
with a 64bit gcc with -m32 on Debian unstable on my laptop. I didn't
try hard, though.

Did you measure how large the stack actually was when you got the
SIGBUS? It should be possible to determine that by computing the offset
using some local stack variable in one of the deepest stack frames.

Greetings,

Andres Freund
Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
> Did you measure how large the stack actually was when you got the
> SIGBUS? Should be possible to determine that by computing the offset
> using some local stack variable in one of the deepest stack frames.

Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
process that just got SIGBUS. This seems to be in line with
stack_base_ptr = 0xffd1c317 and the fcinfo address in

#0  hashname (fcinfo=fcinfo@entry=0xffb38024)
    at /build/postgresql-9.4-4lNBaG/postgresql-9.4-9.4~beta1/build/../src/backend/access/hash/hashfunc.c:143

(Things work fine when I set max_stack_depth = '1900kB'.)

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
>> Did you measure how large the stack actually was when you got the
>> SIGBUS? Should be possible to determine that by computing the offset
>> using some local stack variable in one of the deepest stack frames.

> Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
> process that just got SIGBUS. This seems to be in line with
> stack_base_ptr = 0xffd1c317 and the fcinfo address in

OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
the available stack depth. I'd classify that as a kernel bug. I wonder
if it's a different manifestation of this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=952946

A different line of thought is that if ulimit -s is 8192, why are we
not getting 8MB of stack? But in any case, if we're only going to
get 1944kB, getrlimit ought to tell us that.

			regards, tom lane
On 2014-05-18 17:41:17 -0400, Tom Lane wrote:
> Christoph Berg <cb@df7cb.de> writes:
> > Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
> >> Did you measure how large the stack actually was when you got the
> >> SIGBUS? Should be possible to determine that by computing the offset
> >> using some local stack variable in one of the deepest stack frames.
>
> > Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
> > process that just got SIGBUS. This seems to be in line with
> > stack_base_ptr = 0xffd1c317 and the fcinfo address in
>
> OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> the available stack depth. I'd classify that as a kernel bug. I wonder
> if it's a different manifestation of this issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=952946

That'd explain why I couldn't reproduce it. And I seem to recall some
messages about the hardening stuff in Debian accidentally being lost
some time ago. So if that got re-introduced into 9.4... The CFLAGS
certainly indicate that -pie is getting used.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-18 17:41:17 -0400, Tom Lane wrote:
>> OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
>> the available stack depth. I'd classify that as a kernel bug. I wonder
>> if it's a different manifestation of this issue:
>> https://bugzilla.redhat.com/show_bug.cgi?id=952946

> That'd explain why I couldn't reproduce it. And I seem to recall some
> messages about the hardening stuff in Debian accidentally being lost
> some time ago. So if that got re-introduced into 9.4... The CFLAGS
> certainly indicate that -pie is getting used.

Yeah. Re-reading the Red Hat bug, it seems like an exact match for this
issue. The dependency on ASLR means that the identical run might
sometimes work and sometimes crash, which would explain why Christoph
was getting less-than-consistent results.

The bad news is that the kernel guys have been ignoring the issue
for over a year. Dunno if some pressure from the Debian camp would
help raise their priority for this.

			regards, tom lane
On 2014-05-18 23:52:32 +0200, Andres Freund wrote:
> On 2014-05-18 17:41:17 -0400, Tom Lane wrote:
> > Christoph Berg <cb@df7cb.de> writes:
> > > Re: Andres Freund 2014-05-18 <20140518091445.GU23662@alap3.anarazel.de>
> > >> Did you measure how large the stack actually was when you got the
> > >> SIGBUS? Should be possible to determine that by computing the offset
> > >> using some local stack variable in one of the deepest stack frames.
> >
> > > Looking at /proc/*/maps, the stack is ffb38000-ffd1e000 = 1944kB for a
> > > process that just got SIGBUS. This seems to be in line with
> > > stack_base_ptr = 0xffd1c317 and the fcinfo address in
> >
> > OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> > the available stack depth. I'd classify that as a kernel bug. I wonder
> > if it's a different manifestation of this issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=952946
>
> That'd explain why I couldn't reproduce it. And I seem to recall some
> messages about the hardening stuff in Debian accidentally being lost
> some time ago. So if that got re-introduced into 9.4... The CFLAGS
> certainly indicate that -pie is getting used.

Indeed. If I add -pie to my 32bit VPATH build's configure invocation it
crashes, too. Not that that helps much to resolve the bug, given that
it's been dormant for a long while :(.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-05-18 17:56:48 -0400, Tom Lane wrote:
> The bad news is that the kernel guys have been ignoring the issue
> for over a year. Dunno if some pressure from the Debian camp would
> help raise their priority for this.

I guess we should forward the bug to the lkml/linux-mm lists. I think a
fair number of people involved in those areas won't read the RH bugzilla
without being pointed towards it, err, pointedly.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Tom Lane 2014-05-18 <26862.1400449277@sss.pgh.pa.us>
> OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> the available stack depth. I'd classify that as a kernel bug. I wonder
> if it's a different manifestation of this issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=952946
>
> A different line of thought is that if ulimit -s is 8192, why are we
> not getting 8MB of stack? But in any case, if we're only going to
> get 1944kB, getrlimit ought to tell us that.

The issue looks exactly like what you describe in that bugzilla bug,
including the fact that [stack] in /proc/*/maps gets replaced by
[heap] once the bus error happens (comment 11).

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Re: To Tom Lane 2014-05-19 <20140519091808.GA7296@msgid.df7cb.de>
> Re: Tom Lane 2014-05-18 <26862.1400449277@sss.pgh.pa.us>
> > OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about
> > the available stack depth. I'd classify that as a kernel bug. I wonder
> > if it's a different manifestation of this issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=952946
> >
> > A different line of thought is that if ulimit -s is 8192, why are we
> > not getting 8MB of stack? But in any case, if we're only going to
> > get 1944kB, getrlimit ought to tell us that.
>
> The issue looks exactly like what you describe in that bugzilla
> bug, including the fact that [stack] in /proc/*/maps gets replaced by
> [heap] once the bus error happens (comment 11).

I've done some more digging. The problem also exists on plain 32bit
kernels, not only on a 64bit kernel running a 32bit userland. (Tested
with Debian Wheezy's 3.2.57 kernel.)

The problem seems to be that the address layout puts heap and stack too
close together - there's only about 125MB between the start of the heap
and the end of the stack. Apparently 9.4 is a bit more memory-hungry on
the heap side when running infinite_recurse(), so it gets SIGBUS before
it reaches the 2MB max_stack_depth. With 9.3 I can easily trigger the
same problem using max_stack_depth = '7MB': at the time of the crash the
stack is 2797568 bytes as reported by /proc/*/maps, and with 9.1 the
crash happens at 3084288. (Both do catch the problem properly with
max_stack_depth = '2MB', at which point 2105344 bytes of stack are
allocated.)

Debian/Ubuntu have been using hardened PostgreSQL builds for years now,
including running the regression tests - apparently we were always close
to a crash, it just had not happened yet.

So there are a few points to consider:

* ASLR leaves only 125MB for brk()-style heap plus stack
* RLIMIT_STACK is treated as an upper limit, not a reservation
* PostgreSQL thinks max_stack_depth=2MB plus check_stack_depth() is
  safe, instead of having a SIGBUS handler
* PostgreSQL allocates lots of heap using brk() instead of mmap()

If any of that didn't hold, the problem wouldn't appear.

I'm not sure where to go from here. Getting the kernel (or the libc)
changed seems hard, and that would probably only affect future
distributions anyway. A short-term fix might be to reduce
max_stack_depth for the regression tests, which tests the
functionality but leaves the problem open for production.

Implementing a SIGBUS/SIGSEGV handler would probably mean that the
whole ouch-lets-restart-on-error logic would become ineffective,
unless we check which address caused the error and decide if it was
part of the stack or not.

One hack would be to touch some address deep in the stack early at
backend start, so the address space would be reserved for the stack.
Though it seems ugly to do that for all backends, not only those that
actually use much stack. (The cost would be one memory page, which
isn't too much, otoh.)

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
On 2014-05-19 13:53:18 +0200, Christoph Berg wrote:
> I've done some more digging. The problem also exists on plain 32bit
> kernels, not only on a 64bit kernel running a 32bit userland. (Tested
> with Debian Wheezy's 3.2.57 kernel.)

Too bad.

> Debian/Ubuntu have been using hardened PostgreSQL builds for years
> now, including running the regression tests - apparently we were
> always close to a crash, it just had not happened yet.

There might be some user-defined workloads triggering it as well...

> So there are a few points to consider:
> * ASLR leaves only 125MB for brk()-style heap plus stack
> * RLIMIT_STACK is treated as an upper limit, not a reservation
> * PostgreSQL thinks max_stack_depth=2MB plus check_stack_depth() is
>   safe, instead of having a SIGBUS handler
> * PostgreSQL allocates lots of heap using brk() instead of mmap()

* postgres on Debian is built with -pie.

> If any of that didn't hold, the problem wouldn't appear.
>
> I'm not sure where to go from here. Getting the kernel (or the libc)
> changed seems hard, and that would probably only affect future
> distributions anyway.

Hm, this certainly looks like the kind of bug that should get
backported to -stable et al.

> A short-term fix might be to reduce
> max_stack_depth for the regression tests, which tests the
> functionality but leaves the problem open for production.
> Implementing a SIGBUS/SIGSEGV handler would probably mean that the
> whole ouch-lets-restart-on-error logic would become ineffective,
> unless we check which address caused the error and decide if it was
> part of the stack or not.

Meh. I am pretty staunchly set against trying this. It would just be
putting complicated tape over the problem, and we'd have significant
problems discerning the different kinds of SIGBUS errors and such.

Isn't the far more obvious thing to just not build postgres with -pie
on 32bit? It's hardly a security benefit if it allows a plain user to
crash the server.

Besides the stack problem, have you measured whether it's viable to use
-pie on 32bit performance-wise? That stuff's not that cheap, especially
on 32bit.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2014-05-19 13:53:18 +0200, Christoph Berg wrote:
> * PostgreSQL allocates lots of heap using brk() instead of mmap()

It doesn't really do that, btw. It's the libc's malloc that makes those
decisions, not postgres.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> Isn't the far more obvious thing to just not build postgres with -pie
> on 32bit? It's hardly a security benefit if it allows a plain user to
> crash the server.

Yeah, that's what I was doing when I was at Red Hat --- PIE mode would
be nice, but not when it breaks basic functionality.

I think throwing an error out of a SIGBUS handler is right out. There
would be no way to know exactly what code we were interrupting. It's
the same reason we don't let, e.g., the SIGALRM handler throw a timeout
error directly (in most places, anyway).

>> * PostgreSQL allocates lots of heap using brk() instead of mmap()

> It doesn't really do that, btw. It's the libc's malloc that makes those
> decisions, not postgres.

It occurs to me that maybe this is a glibc bug, not a kernel bug?

			regards, tom lane
On 2014-05-19 09:53:11 -0400, Tom Lane wrote:
> I think throwing an error out of a SIGBUS handler is right out. There
> would be no way to know exactly what code we were interrupting. It's
> the same reason we don't let, e.g., the SIGALRM handler throw a timeout
> error directly (in most places, anyway).

Agreed. I think if we really, really feel the need to do something about
this - which I don't - we could allocate a separate stack very early on
and use that.

> >> * PostgreSQL allocates lots of heap using brk() instead of mmap()
>
> > It doesn't really do that, btw. It's the libc's malloc that makes those
> > decisions, not postgres.
>
> It occurs to me that maybe this is a glibc bug, not a kernel bug?

You think malloc() should try to be careful when calling brk() and check
beforehand whether it'll conflict with stack_base + RLIMIT_STACK? That's
not a bad argument, but it still seems a really bad choice to leave that
little space for the heap. Especially when it's dependent on -pie being
used.

Greetings,

Andres Freund
-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Andres Freund 2014-05-19 <20140519141221.GC5098@alap3.anarazel.de>
> On 2014-05-19 09:53:11 -0400, Tom Lane wrote:
> > I think throwing an error out of a SIGBUS handler is right out. There
> > would be no way to know exactly what code we were interrupting. It's
> > the same reason we don't let, e.g., the SIGALRM handler throw a timeout
> > error directly (in most places, anyway).

Right. I just mentioned that for completeness.

> Agreed. I think if we really, really feel the need to do something about
> this - which I don't - we could allocate a separate stack very early on
> and use that.

Hmm, that'd be an extension of the other idea, "write something deep in
the stack on startup". This is probably less evil, though I agree it's a
big hammer for solving something that should probably be fixed
elsewhere.

> > >> * PostgreSQL allocates lots of heap using brk() instead of mmap()
> >
> > > It doesn't really do that, btw. It's the libc's malloc that makes
> > > those decisions, not postgres.
> >
> > It occurs to me that maybe this is a glibc bug, not a kernel bug?
>
> You think malloc() should try to be careful when calling brk() and check
> beforehand whether it'll conflict with stack_base + RLIMIT_STACK? That's
> not a bad argument, but it still seems a really bad choice to leave that
> little space for the heap. Especially when it's dependent on -pie being
> used.

It's probably both: the default ASLR layout provides too little heap,
plus malloc() runs into the stack area - I'm not sure if the former is
the kernel's fault or libc/ld.so's; probably they need to work together
on that anyway.

Disabling -pie for all 32bit archs seems to be the way to go for us
now. Does this topic warrant being mentioned in the docs?

Christoph
Re: To Tom Lane 2014-05-19 <20140519144717.GG7296@msgid.df7cb.de>
> Disabling -pie for all 32bit archs seems to be the way to go for us
> now.

FTR, I've just had a look at armhf (arm-linux-gnueabihf); the address
layout looks exactly the same there, and 9.3 crashes easily, so it's
really a problem on all Linux 32bit archs. I'm puzzled that the
regression tests passed there [1], but anyway, we'll probably get rid
of -pie.

[1] https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.3&arch=armhf&ver=9.3.4-1&stamp=1395348483

> Does this topic warrant being mentioned in the docs?

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
Christoph Berg <cb@df7cb.de> writes:
> FTR, I've just had a look at armhf (arm-linux-gnueabihf); the address
> layout looks exactly the same there, and 9.3 crashes easily, so it's
> really a problem on all Linux 32bit archs. I'm puzzled that the
> regression tests passed there [1], but anyway, we'll probably get rid
> of -pie.

Well, the failure is probabilistic --- it might sometimes pass because
the random address layout is not so tight.

			regards, tom lane