Thread: fatal flex error in guc-file.l kills the postmaster

fatal flex error in guc-file.l kills the postmaster

From
Noah Misch
Date:
While testing the configuration include directory patch, I decided to try
the regular postgresql.conf "include" on a directory:

  include 'base'

Reloading the configuration then rather rudely kills the postmaster:

  FATAL:  input in flex scanner failed

Setting aside whether we should offer a better diagnostic when a user points
"include" at a directory, the yy_fatal_error hack that made sense for scan.l
doesn't cut it for guc-file.l.  Like other guc-file.l errors, we need to
choose the elevel based on which process is running the code.  ERROR is never
the right choice.  We should instead stash the message, longjmp out of the
flex parser, and proceed to handle the error mostly like a regular syntax
error.  See attached small patch.
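
For illustration only, here is a minimal sketch of the stash-and-longjmp idea described above. It is not the attached patch. Every name in it (stashed_errmsg, fatal_jmp, scanner_fatal, fake_scan, scan_file_sketch) is a hypothetical stand-in, and the "scanner" is a stub that fails on request rather than real flex output.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for sigsetjmp/siglongjmp */
#endif
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* All names here are hypothetical stand-ins, not the attached patch. */
static const char *stashed_errmsg = NULL;   /* message saved by the handler */
static sigjmp_buf *fatal_jmp = NULL;        /* where a fatal error jumps to */

/* Stand-in for the scanner's fatal-error hook: stash, then jump. */
static void
scanner_fatal(const char *msg)
{
    assert(fatal_jmp != NULL);      /* a jump target must be installed */
    stashed_errmsg = msg;
    siglongjmp(*fatal_jmp, 1);
}

/* Stand-in for yylex() hitting an unreadable input such as a directory. */
static void
fake_scan(int input_is_directory)
{
    if (input_is_directory)
        scanner_fatal("input in flex scanner failed");
}

/*
 * Returns NULL on success or the stashed message on failure.  The caller
 * can then report it at an elevel suited to the current process, the way
 * it would for an ordinary syntax error.
 */
const char *
scan_file_sketch(int input_is_directory)
{
    sigjmp_buf  jmpbuf;

    stashed_errmsg = NULL;
    fatal_jmp = &jmpbuf;
    if (sigsetjmp(jmpbuf, 1) != 0)
    {
        fatal_jmp = NULL;
        return stashed_errmsg;      /* arrived here via siglongjmp() */
    }

    fake_scan(input_is_directory);
    fatal_jmp = NULL;
    return NULL;
}
```

Calling scan_file_sketch(0) returns NULL, while scan_file_sketch(1) returns the stashed message instead of terminating the process. The point of the design is that the decision about severity moves from the scanner to the caller.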

It seems plainly hackish to mutate the body of yy_fatal_error() by #define'ing
fprintf().  Why didn't we instead #define YY_FATAL_ERROR and provide our own
complete replacement?  (Both strategies are undocumented.)  I haven't changed
that decision in this patch, though.

On a related note, the out-of-memory handling during config file reload is
inconsistent.  guc-file.l uses palloc/pstrdup in various places.  An OOM at
those sites during a reload would kill the postmaster.  guc.c has functions
like guc_malloc that handle variable elevels.  However, various check_* and
assign_* functions, such as check_log_destination(), use pstrdup() and
guc_malloc(ERROR).  Failing there during a reload would also kill the
postmaster.  Is it worth doing better here?

Incidentally, the wal writer process no longer exits promptly on postmaster
death.  I didn't look into that further.

Thanks,
nm

Attachment

Re: fatal flex error in guc-file.l kills the postmaster

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> Setting aside whether we should offer a better diagnostic when a user points
> "include" at a directory, the yy_fatal_error hack that made sense for scan.l
> doesn't cut it for guc-file.l.  Like other guc-file.l errors, we need to
> choose the elevel based on which process is running the code.  ERROR is never
> the right choice.  We should instead stash the message, longjmp out of the
> flex parser, and proceed to handle the error mostly like a regular syntax
> error.  See attached small patch.

Well, if you're going to criticize the original method as being hackish,
you really need to offer a cleaner substitute than this one ;-).  In
particular I'm not happy with adding a sigsetjmp() cycle for every
single token parsed.

> It seems plainly hackish to mutate the body of yy_fatal_error() by #define'ing
> fprintf().

I agree, but the flex boys haven't provided any nicer API ...
at least not unless we'd like to convert to C++.

> Why didn't we instead #define YY_FATAL_ERROR and provide our own
> complete replacement?

For one reason, yy_fatal_error would still be there and would draw an
"unused static function" warning.

> On a related note, the out-of-memory handling during config file reload is
> inconsistent.  guc-file.l uses palloc/pstrdup in various places.  An OOM at
> those sites during a reload would kill the postmaster.

If you've got OOM in the postmaster, you're dead anyway.  I feel no
compulsion to worry about this.

            regards, tom lane

Re: fatal flex error in guc-file.l kills the postmaster

From
Noah Misch
Date:
On Sat, Dec 17, 2011 at 11:04:43PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Setting aside whether we should offer a better diagnostic when a user points
> > "include" at a directory, the yy_fatal_error hack that made sense for scan.l
> > doesn't cut it for guc-file.l.  Like other guc-file.l errors, we need to
> > choose the elevel based on which process is running the code.  ERROR is never
> > the right choice.  We should instead stash the message, longjmp out of the
> > flex parser, and proceed to handle the error mostly like a regular syntax
> > error.  See attached small patch.
>
> Well, if you're going to criticize the original method as being hackish,
> you really need to offer a cleaner substitute than this one ;-).  In
> particular I'm not happy with adding a sigsetjmp() cycle for every
> single token parsed.

On the contrary, I want to make it even more of a hack to get better behavior!

Here's a version that calls sigsetjmp() once per file.  While postgresql.conf
scanning hardly seems like the place to be shaving cycles, this does catch
fatal errors in functions other than yylex(), notably yy_create_buffer().

> > On a related note, the out-of-memory handling during config file reload is
> > inconsistent.  guc-file.l uses palloc/pstrdup in various places.  An OOM at
> > those sites during a reload would kill the postmaster.
>
> If you've got OOM in the postmaster, you're dead anyway.  I feel no
> compulsion to worry about this.

Works for me.

Thanks,
nm

Attachment

Re: fatal flex error in guc-file.l kills the postmaster

From
Robert Haas
Date:
On Sun, Dec 18, 2011 at 11:53 AM, Noah Misch <noah@leadboat.com> wrote:
> Here's a version that calls sigsetjmp() once per file.  While postgresql.conf
> scanning hardly seems like the place to be shaving cycles, this does catch
> fatal errors in functions other than yylex(), notably yy_create_buffer().

This strikes me as more clever than necessary:

#define fprintf(file, fmt, msg) \
    0; /* eat cast to void */ \
    GUC_flex_fatal_errmsg = msg; \
    siglongjmp(*GUC_flex_fatal_jmp, 1)

Can't we just define a function jumpoutoftheparser() here and call
that function rather than doing that /* eat cast to void */ hack?
That would also involve fewer assumptions about the coding style
emitted by flex.  For example, if flex chose to do something like:

  if (bumpity) fprintf(__FILE__, "%s", "dinglewump");

...the proposed definition would be a disaster.  I doubt that inlining
is a material performance benefit here; siglongjmp() can't be all that
cheap, and the error path shouldn't be all that frequent.
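
A sketch of the function-based alternative, under stated assumptions: the function name follows the jumpoutoftheparser() suggestion above, the variable names follow the quoted macro, and braceless_if_sketch() is a hypothetical test harness rather than anything from the patch. With the quoted macro, a brace-less "if (bumpity) fprintf(...);" would expand so that only the leading "0;" is conditional and the siglongjmp() runs regardless. A function call is a single statement, so that hazard does not arise.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for sigsetjmp/siglongjmp */
#endif
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static const char *GUC_flex_fatal_errmsg = NULL;
static sigjmp_buf *GUC_flex_fatal_jmp = NULL;

/* One call is one statement, so it nests safely under a brace-less if. */
static void
jumpoutoftheparser(const char *msg)
{
    assert(GUC_flex_fatal_jmp != NULL);     /* target must be installed */
    GUC_flex_fatal_errmsg = msg;
    siglongjmp(*GUC_flex_fatal_jmp, 1);
}

/*
 * Hypothetical harness: returns 1 if the brace-less "if" jumped out,
 * 0 if control fell through normally.
 */
int
braceless_if_sketch(int bumpity)
{
    sigjmp_buf  jmpbuf;

    GUC_flex_fatal_errmsg = NULL;
    GUC_flex_fatal_jmp = &jmpbuf;
    if (sigsetjmp(jmpbuf, 1) != 0)
    {
        GUC_flex_fatal_jmp = NULL;
        return 1;                           /* jumped */
    }

    /* Same shape as the hypothetical generated code in the example. */
    if (bumpity) jumpoutoftheparser("dinglewump");

    GUC_flex_fatal_jmp = NULL;
    return 0;                               /* fell through */
}
```

Here braceless_if_sketch(0) falls through and braceless_if_sketch(1) jumps, which is the behavior the generated code would intend.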

Instead of making ParseConfigFp responsible for restoring
GUC_flex_fatal_jmp after calling anything that might recursively call
ParseConfigFp, I think it would make more sense to define it to stash
away the previous value and restore it on exit.  That way it wouldn't
need to know which of the things that it calls might in turn
recursively call it, which seems likely to reduce the chances of
present or future bugs.  A few comments about whichever way we go with
it seem like a good idea, too.
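
The save-on-entry, restore-on-exit idea might look roughly like the following. This is a sketch, not the committed code: parse_config_sketch() is a hypothetical stand-in for ParseConfigFp, the depth and fail_depth parameters exist only to simulate nested "include" processing and a scanner failure, and scanner_fatal() is an assumed helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for sigsetjmp/siglongjmp */
#endif
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

const char *GUC_flex_fatal_errmsg = NULL;
sigjmp_buf *GUC_flex_fatal_jmp = NULL;

/* Assumed helper: stash the message and jump to the innermost target. */
static void
scanner_fatal(const char *msg)
{
    assert(GUC_flex_fatal_jmp != NULL);
    GUC_flex_fatal_errmsg = msg;
    siglongjmp(*GUC_flex_fatal_jmp, 1);
}

/*
 * Pretend to parse a file that includes (depth) further nested files.
 * The file at nesting level fail_depth hits a fatal scanner error after
 * its include has been processed.  Returns the number of failed files.
 */
int
parse_config_sketch(int depth, int fail_depth)
{
    sigjmp_buf *save_jmp = GUC_flex_fatal_jmp;  /* caller's target, if any */
    sigjmp_buf  jmpbuf;
    volatile int errors = 0;    /* modified between sigsetjmp and the jump */

    GUC_flex_fatal_jmp = &jmpbuf;
    if (sigsetjmp(jmpbuf, 1) == 0)
    {
        if (depth > 0)          /* simulated "include" directive */
            errors += parse_config_sketch(depth - 1, fail_depth);
        if (depth == fail_depth)
            scanner_fatal("input in flex scanner failed");
    }
    else
        errors++;               /* handled like an ordinary syntax error */

    /* Restore unconditionally, so callers need not know who recursed. */
    GUC_flex_fatal_jmp = save_jmp;
    return errors;
}
```

The interesting case is parse_config_sketch(1, 1): the outer file fails after its include has returned. Because the nested call restored the outer target on exit, the jump lands in the outer frame rather than in a stack frame that no longer exists.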

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: fatal flex error in guc-file.l kills the postmaster

From
Noah Misch
Date:
On Mon, Jan 16, 2012 at 08:54:53PM -0500, Robert Haas wrote:
> On Sun, Dec 18, 2011 at 11:53 AM, Noah Misch <noah@leadboat.com> wrote:
> > Here's a version that calls sigsetjmp() once per file.  While postgresql.conf
> > scanning hardly seems like the place to be shaving cycles, this does catch
> > fatal errors in functions other than yylex(), notably yy_create_buffer().
>
> This strikes me as more clever than necessary:
>
> #define fprintf(file, fmt, msg) \
>     0; /* eat cast to void */ \
>     GUC_flex_fatal_errmsg = msg; \
>     siglongjmp(*GUC_flex_fatal_jmp, 1)
>
> Can't we just define a function jumpoutoftheparser() here and call
> that function rather than doing that /* eat cast to void */ hack?
> That would also involve fewer assumptions about the coding style
> emitted by flex.  For example, if flex chose to do something like:
>
>   if (bumpity) fprintf(__FILE__, "%s", "dinglewump");
>
> ...the proposed definition would be a disaster.  I doubt that inlining
> is a material performance benefit here; siglongjmp() can't be all that
> cheap, and the error path shouldn't be all that frequent.
>
> Instead of making ParseConfigFp responsible for restoring
> GUC_flex_fatal_jmp after calling anything that might recursively call
> ParseConfigFp, I think it would make more sense to define it to stash
> away the previous value and restore it on exit.  That way it wouldn't
> need to know which of the things that it calls might in turn
> recursively call it, which seems likely to reduce the chances of
> present or future bugs.  A few comments about whichever way we go with
> it seem like a good idea, too.

Agreed on all points.  I also changed the save/restore of ConfigFileLineno to
work the same way.  New version attached.

Thanks,
nm

Attachment

Re: fatal flex error in guc-file.l kills the postmaster

From
Robert Haas
Date:
On Tue, Jan 17, 2012 at 1:23 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Jan 16, 2012 at 08:54:53PM -0500, Robert Haas wrote:
>> On Sun, Dec 18, 2011 at 11:53 AM, Noah Misch <noah@leadboat.com> wrote:
>> > Here's a version that calls sigsetjmp() once per file.  While postgresql.conf
>> > scanning hardly seems like the place to be shaving cycles, this does catch
>> > fatal errors in functions other than yylex(), notably yy_create_buffer().
>>
>> This strikes me as more clever than necessary:
>>
>> #define fprintf(file, fmt, msg) \
>>     0; /* eat cast to void */ \
>>     GUC_flex_fatal_errmsg = msg; \
>>     siglongjmp(*GUC_flex_fatal_jmp, 1)
>>
>> Can't we just define a function jumpoutoftheparser() here and call
>> that function rather than doing that /* eat cast to void */ hack?
>> That would also involve fewer assumptions about the coding style
>> emitted by flex.  For example, if flex chose to do something like:
>>
>>   if (bumpity) fprintf(__FILE__, "%s", "dinglewump");
>>
>> ...the proposed definition would be a disaster.  I doubt that inlining
>> is a material performance benefit here; siglongjmp() can't be all that
>> cheap, and the error path shouldn't be all that frequent.
>>
>> Instead of making ParseConfigFp responsible for restoring
>> GUC_flex_fatal_jmp after calling anything that might recursively call
>> ParseConfigFp, I think it would make more sense to define it to stash
>> away the previous value and restore it on exit.  That way it wouldn't
>> need to know which of the things that it calls might in turn
>> recursively call it, which seems likely to reduce the chances of
>> present or future bugs.  A few comments about whichever way we go with
>> it seem like a good idea, too.
>
> Agreed on all points.  I also changed the save/restore of ConfigFileLineno to
> work the same way.  New version attached.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company