Re: gharial segfaulting on REL_12_STABLE only - Mailing list pgsql-hackers

From Tom Lane
Subject Re: gharial segfaulting on REL_12_STABLE only
Msg-id 3067.1566870481@sss.pgh.pa.us
In response to gharial segfaulting on REL_12_STABLE only  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: gharial segfaulting on REL_12_STABLE only
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> This is apparently an EDB-owned machine but I have no access to it
> currently (I could ask if necessary).  For some reason it's been
> failing for a week, but only on REL_12_STABLE, with this in the log:

Yeah, I've been puzzling over that to little avail.

> It's hard to see how cdc8d371e2, the only non-doc commit listed on the
> first failure, could have anything to do with that.

Exactly :-(.  It seems completely reproducible since then, but how
could that have triggered a failure over here?  And why only in this
branch?  The identical patch went into HEAD.

> 2019-08-20 04:31:48.886 MDT [13421:4] LOG:  server process (PID 13871)
> was terminated by signal 11: unrecognized signal
> 2019-08-20 04:31:48.886 MDT [13421:5] DETAIL:  Failed process was
> running: SET default_table_access_method = '';

> Apparently HPUX's sys_siglist doesn't recognise that most popular of
> signals, 11, but by googling I see that it has its traditional meaning
> there.

HPUX hasn't *got* sys_siglist, nor strsignal() which is what we're
actually relying on these days (cf. pgstrsignal.c).  I was puzzled
by that too to start with, though.  I wonder if we shouldn't rearrange
pg_strsignal so that the message in the !HAVE_STRSIGNAL case is
something like "signal names not available on this platform" rather
than something that looks like we should've recognized it and didn't.
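
To make that suggestion concrete, the rearrangement might look roughly
like the sketch below.  This is a standalone illustration, not the code
in pgstrsignal.c: the name sketch_strsignal and the exact wording of the
fallback message are assumptions, and HAVE_STRSIGNAL stands in for the
configure-time symbol mentioned above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_strsignal(), for illustration only.
 * Build with -DHAVE_STRSIGNAL on platforms that provide strsignal().
 */
const char *
sketch_strsignal(int signum)
{
#ifdef HAVE_STRSIGNAL
    const char *result = strsignal(signum);

    /* The platform has a lookup facility, but it may not know this number. */
    if (result == NULL)
        result = "unrecognized signal";
    return result;
#else
    (void) signum;              /* unused when no lookup is possible */

    /* No lookup facility at all: say so, instead of implying a failed lookup. */
    return "(signal names not available on this platform)";
#endif
}
```

With a message like that, a log line on a platform lacking strsignal()
would no longer read as though signal 11 had been looked up and not found.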

> 2019-08-20 04:31:22.422 MDT [13871:34] pg_regress/create_am LOG:
> statement: SET default_table_access_method = '';

> Perhaps it was really running the next statement.

Hard to see how, because this should have reported

ERROR:  invalid value for parameter "default_table_access_method": ""
DETAIL:  default_table_access_method cannot be empty.

but it didn't get that far.  It seems like it must have died either
in the (utterly trivial) check that leads to the above-quoted
complaint, or somewhere in the ereport mechanism.  Neither theory
seems very credible.
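
For reference, the kind of check I mean amounts to something like the
following.  This is a simplified, self-contained sketch, not the actual
GUC check hook: sketch_check_table_am and sketch_errdetail are
hypothetical names, and only the empty-string branch that produces the
DETAIL message quoted above is modeled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical buffer standing in for the error-detail machinery. */
static char sketch_errdetail[128];

/*
 * Hypothetical, simplified validity check for a proposed
 * default_table_access_method value.  Returns false and records a
 * detail message if the value is empty; accepts anything else.
 */
bool
sketch_check_table_am(const char *newval)
{
    if (newval[0] == '\0')
    {
        snprintf(sketch_errdetail, sizeof(sketch_errdetail),
                 "%s cannot be empty.", "default_table_access_method");
        return false;
    }

    /* Any further validation of the name is omitted from this sketch. */
    return true;
}
```

There is not much in a test of that shape that could plausibly
dereference a bad pointer, which is why the first theory looks so weak.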

The seeming action-at-a-distance nature of the failure has me
speculating about compiler or linker bugs, but I dislike
jumping to that type of conclusion without hard evidence.

A stack trace would likely be really useful right about now.

            regards, tom lane


