"Make" versus effective stack limit in regression tests - Mailing list pgsql-hackers

From Tom Lane
Subject "Make" versus effective stack limit in regression tests
Msg-id 2549.1288993553@sss.pgh.pa.us
I wondered why some of the buildfarm machines were showing
max_stack_depth = 100kB, and Andrew Dunstan kindly lent me the
use of "dungbeetle" to check it out.  What I found out:

1. max_stack_depth has the expected value (equal to ulimit -s)
in any manually started postmaster.  It only drops to 100kB
in the "make check" environment.

2. postgres.c's getrlimit(RLIMIT_STACK) call returns the expected
values in a manual start:
	rlim_cur = 10485760 rlim_max = -1
but in a "make check" run:
	rlim_cur = -1 rlim_max = -1
ie, the soft limit has been reset to RLIM_INFINITY.
get_stack_depth_rlimit chooses to treat this as "unknown", resulting
in setting max_stack_depth to the minimal value.

3. Further experimentation proves that "make" is resetting the limit
for any program it invokes.
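
For anyone wanting to repeat the check in points 2 and 3, here is a
minimal sketch of a probe that prints the stack limits in the same
form as above.  The function names format_limit and report_stack_rlimit
are made up for illustration and are not anything in postgres.c; the
"-1" rendering of RLIM_INFINITY is only an assumption chosen to match
the output quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Format one limit value into buf.  RLIM_INFINITY is rendered as "-1"
 * (an illustrative choice, matching the output quoted in the message).
 * Returns the number of characters written, as snprintf does.
 */
int
format_limit(rlim_t value, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	if (value == RLIM_INFINITY)
		return snprintf(buf, buflen, "-1");
	return snprintf(buf, buflen, "%llu", (unsigned long long) value);
}

/*
 * Print the soft and hard stack limits of the current process.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
int
report_stack_rlimit(FILE *out)
{
	struct rlimit rlim;
	char		cur[32];
	char		max[32];

	if (getrlimit(RLIMIT_STACK, &rlim) < 0)
		return -1;

	format_limit(rlim.rlim_cur, cur, sizeof(cur));
	format_limit(rlim.rlim_max, max, sizeof(max));
	fprintf(out, "rlim_cur = %s rlim_max = %s\n", cur, max);
	return 0;
}
```

Calling report_stack_rlimit(stdout) from a small main(), once from a
shell prompt and once from a makefile rule, should show whether the
local "make" changes the soft limit for the programs it runs.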

I couldn't reproduce this on my Fedora 13 machine, even though it
is nominally running the same gmake 3.81 as dungbeetle's Fedora 6.
So I took a look into Fedora git, and sure enough, there's a relevant
patch there.  It seems that gmake 3.81 tries to force up the
RLIMIT_STACK rlim_cur to rlim_max, because it relies heavily on alloca()
and so needs lots of stack space.  Fedora 7 and up have patched it to
restore the caller's setting before actually invoking any programs:
https://bugzilla.redhat.com/show_bug.cgi?id=214033
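
To make the two behaviors concrete, here is a rough sketch of the
pattern as I understand it: raise the soft limit to the hard limit for
the parent's own use, and put the caller's setting back before running
anything else.  This is not gmake's code or the Fedora patch; the
function and variable names are hypothetical.

```c
#include <assert.h>
#include <sys/resource.h>

/* Caller's original stack limit, remembered by raise_stack_limit(). */
static struct rlimit saved_stack_limit;
static int	have_saved_limit = 0;

/*
 * Raise the soft stack limit to the hard limit, saving the original.
 * Returns 0 on success, -1 on failure.
 */
int
raise_stack_limit(void)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_STACK, &rlim) < 0)
		return -1;

	saved_stack_limit = rlim;
	have_saved_limit = 1;

	if (rlim.rlim_cur == rlim.rlim_max)
		return 0;				/* already at the hard limit */

	rlim.rlim_cur = rlim.rlim_max;
	return setrlimit(RLIMIT_STACK, &rlim);
}

/*
 * Put back the limit saved by raise_stack_limit().  In a program that
 * spawns children, this would be called in the child between fork()
 * and exec(), so the child sees the caller's setting.
 */
int
restore_stack_limit(void)
{
	assert(have_saved_limit);	/* must follow raise_stack_limit() */
	return setrlimit(RLIMIT_STACK, &saved_stack_limit);
}
```

A parent that skips the restore step hands every child a soft limit
equal to the hard limit, which is RLIM_INFINITY when the hard limit is
unlimited.  That matches the "make check" numbers reported above.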

I haven't done the research to find out which gmake versions have this
behavior or which other Linux distros are carrying similar patches,
but I'm sure this explains why some of the buildfarm members report
max_stack_depth = 100kB when most others don't.

Anyway, what this points up is that we are making a very conservative
assumption about what to do when getrlimit() returns RLIM_INFINITY.
It does not seem real reasonable to interpret that as 100kB on any
modern platform.  I'm inclined to interpret it as 4MB, which is the
same default stack limit that we use on Windows.
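
A minimal sketch of the proposed interpretation, for discussion only.
It is not the existing get_stack_depth_rlimit code; the names
ASSUMED_STACK_LIMIT, stack_limit_from_soft and probe_stack_limit are
invented here, and the clamp to LONG_MAX is my own addition to keep
the conversion safe.

```c
#include <limits.h>
#include <sys/resource.h>

/* Value assumed when the soft limit is reported as unlimited: 4MB. */
#define ASSUMED_STACK_LIMIT (4L * 1024 * 1024)

/*
 * Map a soft RLIMIT_STACK value to a usable byte count.
 * RLIM_INFINITY is treated as 4MB rather than as "unknown".
 */
long
stack_limit_from_soft(rlim_t soft)
{
	if (soft == RLIM_INFINITY)
		return ASSUMED_STACK_LIMIT;
	if (soft > (rlim_t) LONG_MAX)
		return LONG_MAX;		/* clamp values that do not fit in long */
	return (long) soft;
}

/*
 * Query the current process's limit.  Returns -1 only if getrlimit()
 * itself fails, which remains the "unknown" case.
 */
long
probe_stack_limit(void)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_STACK, &rlim) < 0)
		return -1;
	return stack_limit_from_soft(rlim.rlim_cur);
}
```

With this mapping, a process started under an unpatched "make" would
see 4MB instead of falling back to the minimal setting, while an
explicit ulimit -s value would be passed through unchanged.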

Thoughts?
        regards, tom lane

