Thread: RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
"Mikheev, Vadim"
Date:
> > I've tried to move "dangerous" ops with non-zero probability of
> > elog(ERROR) (e.g. new file block allocation) out of crit sections.
> > Anyway, we need ERROR-->STOP for safety when changes
> > aren't logged.
> 
> Why is that safer than just treating an ERROR as an ERROR?  
> It seems to me there's a real risk of a crash/restart loop if we
> force a restart whenever we see an xlog-related problem.

Why do we abort on a failed assertion rather than elog(ERROR)?
Think of elog(STOP) on any error inside a critical section
as assert checking. The rule is simple - validate an operation before
applying it to permanent storage - and it's better to force
all future development to follow this rule by any means.
It's very easy not to notice an ERROR - it's just a transaction
abort, and transaction abort is a normal thing - but errors inside
critical sections are *unexpected* things which mean that something
is totally wrong in the code.
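
The rule above (validate first, then apply inside the critical section) can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the backend's actual code: Severity, report_error(), begin_crit_section(), end_crit_section(), Page and page_append() are all hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical severity levels, loosely modelled on ERROR and STOP. */
typedef enum { SEV_NONE, SEV_ERROR, SEV_STOP } Severity;

/* Nesting depth of critical sections (hypothetical counter). */
static int crit_section_count = 0;

static void begin_crit_section(void) { crit_section_count++; }

static void end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/*
 * Report a failure.  Outside a critical section it is an ordinary ERROR
 * (transaction abort); inside one it is promoted to STOP, the same way a
 * failed assertion aborts instead of raising an ERROR.
 */
static Severity report_error(void)
{
    return (crit_section_count > 0) ? SEV_STOP : SEV_ERROR;
}

#define DEMO_PAGE_SIZE 64

/* Hypothetical stand-in for a block of permanent storage. */
typedef struct
{
    char   data[DEMO_PAGE_SIZE];
    size_t used;
} Page;

/* Append an item to a page: validate first, then apply. */
static Severity page_append(Page *page, const char *item, size_t len)
{
    /* Step 1: validation, outside the critical section.  Failure here
     * is a plain ERROR and leaves the page untouched. */
    if (len > DEMO_PAGE_SIZE - page->used)
        return report_error();

    /* Step 2: apply the change.  Nothing in here is expected to fail;
     * if something did call report_error(), it would yield STOP. */
    begin_crit_section();
    memcpy(page->data + page->used, item, len);
    page->used += len;
    end_crit_section();

    return SEV_NONE;
}
```

With this shape, an oversized item is rejected with an ordinary ERROR before the page is modified, while any failure reported between begin_crit_section() and end_crit_section() comes out as STOP.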

As for the crash/restart loop, Hiroshi raised this issue about a month
ago, and I was going to avoid elog(STOP) in AM-specific redo functions
and do elog(LOG) instead wherever possible, but I was busy with the
CRC/backup stuff - ok, I'll look there soon.

Vadim


Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> It's very easy not to notice an ERROR - it's just a transaction
> abort, and transaction abort is a normal thing - but errors inside
> critical sections are *unexpected* things which mean that something
> is totally wrong in the code.

Okay.  That means we do need two kinds of critical sections, then,
because the crit sections I've just sprinkled everywhere are not that
critical ;-).  They just want to hold off cancel/die interrupts.
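
A rough C sketch of the distinction, assuming two independent nesting counters: one kind of section only defers cancel/die interrupts, the other also turns any error into a forced stop. All names here (hold_interrupts(), resume_interrupts(), start_crit_section(), end_crit_section(), check_for_interrupts(), error_is_fatal()) are placeholders for illustration, not proposed names or existing code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: one counter per kind of section. */
static int  interrupt_holdoff_count = 0;  /* "just hold off cancel/die" */
static int  crit_section_count = 0;       /* "errors here are fatal"    */
static bool interrupt_pending = false;    /* set by a signal handler    */
static int  interrupts_serviced = 0;      /* for demonstration only     */

/* Service a pending cancel/die request only when neither kind of
 * section is active. */
static void check_for_interrupts(void)
{
    if (interrupt_pending &&
        interrupt_holdoff_count == 0 &&
        crit_section_count == 0)
    {
        interrupt_pending = false;
        interrupts_serviced++;
    }
}

/* Weaker concept: defer interrupts, errors stay ordinary errors. */
static void hold_interrupts(void) { interrupt_holdoff_count++; }

static void resume_interrupts(void)
{
    assert(interrupt_holdoff_count > 0);
    interrupt_holdoff_count--;
    check_for_interrupts();     /* act on anything that was deferred */
}

/* Stronger concept: defer interrupts AND treat any error as fatal. */
static void start_crit_section(void) { crit_section_count++; }

static void end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* Error reporting would consult this to decide between ERROR and STOP. */
static bool error_is_fatal(void)
{
    return crit_section_count > 0;
}
```

Code that merely must not be interrupted would use the hold/resume pair, so an ERROR raised there is still just a transaction abort; only code that modifies permanent storage would use the start/end pair.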

I'll take care of fixing what I broke, but does anyone have suggestions
for good names for the two concepts?  The best I could come up with
offhand is BEGIN/END_CRIT_SECTION and BEGIN/END_SUPER_CRIT_SECTION,
but I'm not pleased with that... Ideas?
        regards, tom lane


Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
Thomas Swan
Date:
>I'll take care of fixing what I broke, but does anyone have suggestions
>for good names for the two concepts?  The best I could come up with
>offhand is BEGIN/END_CRIT_SECTION and BEGIN/END_SUPER_CRIT_SECTION,
>but I'm not pleased with that... Ideas?

Let CRITICAL be critical.  If the other sections are there just to be
cautious, then the name should reflect that.  While I like the
BEGIN/END_OH_MY_GOD_IF_THIS_GETS_INTERRUPTED_YOU_DONT_WANT_TO_KNOW
markers, they are a little hard to work with.

Possibly try demoting the NON_CRITICAL_SECTIONS to something like one
of the following:

BEGIN/END_CAUTION_SECTION
BEGIN/END_WATCH_SECTION