Re: Getting server crash after running sqlsmith - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Getting server crash after running sqlsmith
Date
Msg-id CA+Tgmobz6AHtQVqHQa-CCm4_yWygZ8HC0KUMBUw63583KiyypA@mail.gmail.com
In response to Getting server crash after running sqlsmith  (tushar <tushar.ahuja@enterprisedb.com>)
List pgsql-hackers
On Tue, Mar 28, 2017 at 9:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>>> set that would cause this query to be run as a parallel query?  Because
>>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>>> probably insane.
>
>> /me blinks
>
>> Uh, what's insane about that?  All it does is test a GUC (which is
>> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
>> too).
>
> Well, if you don't like that theory, what's yours?

Gremlins?

The stack trace seems to show that the process is receiving SIGUSR1 at
a very high rate.  Every time sigusr1_handler() reaches
PG_SETMASK(&UnBlockSig), it immediately gets a SIGUSR1 and jumps back
into sigusr1_handler().  Now, this seems like a design flaw in
sigusr1_handler().  Likely the operating system blocks SIGUSR1 on
entry to the signal handler so that it's not possible for a high rate
of signal delivery to blow out the stack, but we forcibly unblock it
before returning, thus exposing ourselves to blowing out the stack.
And we have, apparently, no stack depth check here nor any other way
of preventing the infinite recursion.
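
To make the mechanism concrete, here is a minimal standalone sketch. It is not
PostgreSQL code: usr1_handler and max_handler_depth are hypothetical names, and
a plain sigprocmask(SIG_UNBLOCK) stands in for PG_SETMASK(&UnBlockSig). The
handler queues one more SIGUSR1 for itself to imitate a signal arriving while
the handler is still running, and the burst is capped so the sketch measures
nesting depth instead of actually running out of stack.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t depth;       /* handler frames currently active */
static volatile sig_atomic_t max_depth;   /* deepest nesting observed */
static volatile sig_atomic_t remaining;   /* further signals still to send */
static volatile sig_atomic_t unblock_in_handler;

/* Hypothetical handler illustrating the pattern described above. */
static void usr1_handler(int signo)
{
    (void) signo;
    depth++;
    if (depth > max_depth)
        max_depth = depth;

    if (remaining > 0)
    {
        remaining--;
        /* SIGUSR1 is blocked while the handler runs, so this stays pending. */
        raise(SIGUSR1);
    }

    if (unblock_in_handler)
    {
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        /* The pending signal is delivered here, on top of this frame. */
        sigprocmask(SIG_UNBLOCK, &set, NULL);
    }

    depth--;
}

/* Deliver 1 + burst signals and report the deepest handler nesting seen. */
int max_handler_depth(int unblock, int burst)
{
    struct sigaction sa, old;
    sigset_t usr1, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_NODEFER: blocked on handler entry */
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    /* Make sure the first signal is delivered immediately. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &usr1, &saved);

    depth = 0;
    max_depth = 0;
    remaining = burst;
    unblock_in_handler = unblock;

    raise(SIGUSR1);

    sigprocmask(SIG_SETMASK, &saved, NULL);
    sigaction(SIGUSR1, &old, NULL);
    return (int) max_depth;
}
```

On a typical POSIX system I would expect max_handler_depth(0, 50) to report a
depth of 1, because each pending signal is delivered only after the previous
handler has returned, and max_handler_depth(1, 50) to report 51, one stack
frame per signal. With an unbounded signal rate and no depth check, the second
case is the one that eventually exhausts the stack.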

I imagine the behavior here is platform-dependent, but I'd guess that
select pg_rotate_logfile() from generate_series(1,1000000) g might
reproduce this on affected platforms with or without parallel query in
the mix.  It looks like we've conveniently provided both a function
that can be used to SIGUSR1 the heck out of the postmaster and a
postmaster that is, at least on such platforms, vulnerable to crashing
if you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


