Re: ERROR during end-of-xact/FATAL - Mailing list pgsql-hackers

From Robert Haas
Subject Re: ERROR during end-of-xact/FATAL
Date
Msg-id CA+TgmoZYMstaWAjH2AfNH5NHNWeBgcBQgeZVWu=Z2outyjx4FA@mail.gmail.com
In response to Re: ERROR during end-of-xact/FATAL  (Noah Misch <noah@leadboat.com>)
Responses Re: ERROR during end-of-xact/FATAL  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Wed, Nov 13, 2013 at 11:04 AM, Noah Misch <noah@leadboat.com> wrote:
>> So, in short, ERROR + ERROR*10 = PANIC, but FATAL + ERROR*10 = FATAL.
>> That's bizarre.
>
> Quite so.
>
>> Given that that's where we are, promoting an ERROR during FATAL
>> processing to PANIC doesn't seem like it's losing much; we're
>> essentially already doing that in the (probably more likely) case of a
>> persistent ERROR during ERROR processing.  But since PANIC sucks, I'd
>> rather go the other direction: let's make an ERROR during ERROR
>> processing promote to FATAL.  And then let's do what you write above:
>> make sure that there's a separate on-shmem-exit callback for each
>> critical shared memory resource and that we call all of those during FATAL
>> processing.
>
> Many of the factors that can cause AbortTransaction() to fail can also cause
> CommitTransaction() to fail, and those would still PANIC if the transaction
> had an xid.  How practical might it be to also escape from an error during
> CommitTransaction() with a FATAL instead of PANIC?  There's more to fix up in
> that case (sinval, NOTIFY), but it may be within reach.  If such a technique
> can only reasonably fix abort, though, I have doubts it buys us enough.

The critical stuff that's got to happen after
RecordTransactionCommit() appears to be ProcArrayEndTransaction() and
AtEOXact_Inval(). Unfortunately, the latter is well after the point
when we're supposed to only be doing "non-critical resource cleanup",
notwithstanding which it appears to be critical.

So here's a sketch.  Hoist the preparatory logic in
RecordTransactionCommit() - smgrGetPendingDeletes,
xactGetCommittedChildren, and xactGetCommittedInvalidationMessages - up
into the caller and do it before setting TRANS_COMMIT.  If any of that
stuff fails, we'll land in AbortTransaction() which must cope.  As
soon as we exit the commit critical section, set a flag somewhere
(where?) indicating that we have written our commit record; when that
flag is set, (a) promote any ERROR after that point through the end of
commit cleanup to FATAL and (b) if we enter AbortTransaction(), don't
try to RecordTransactionAbort().
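
To make the flag idea concrete, here is a minimal standalone C sketch.
It is not PostgreSQL code: the SketchLevel enum, the
commit_record_written flag, and both function names are hypothetical
stand-ins, and where the real flag would live is still an open
question.  It only models points (a) and (b) above.

```c
#include <stdbool.h>

/* Hypothetical severity levels, ordered from least to most severe. */
typedef enum
{
    LVL_ERROR = 1,
    LVL_FATAL = 2,
    LVL_PANIC = 3
} SketchLevel;

/*
 * Hypothetical flag: set as soon as we leave the commit critical
 * section, i.e. once the commit record has been written.
 */
bool commit_record_written = false;

/*
 * (a) After the commit record is written, promote any ERROR raised
 * during the remaining commit cleanup to FATAL.  Other levels pass
 * through unchanged.
 */
SketchLevel
sketch_promote_level(SketchLevel requested)
{
    if (commit_record_written && requested == LVL_ERROR)
        return LVL_FATAL;
    return requested;
}

/*
 * (b) If we end up in abort processing after the commit record is
 * written, we must not also write an abort record.
 */
bool
sketch_should_record_abort(void)
{
    return !commit_record_written;
}
```

With the flag clear, an ERROR stays an ERROR and abort processing
would write its abort record as usual; once the flag is set, the same
ERROR comes back as FATAL and the abort record is skipped.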

I can't see that the notification stuff requires fixup in this case;
AFAICS, it is just adjusting backend-local state, and it's OK to
disregard any problems there during a FATAL exit.  Do you see
something to the contrary?  But invalidation messages are a problem:
if we commit and exit without sending our queued-up invalidation
messages, Bad Things Will Happen.  Perhaps we could arrange things so
that in that case only, we just PANIC.  That would allow most write
transactions to get by with FATAL, promoting to PANIC only in the case
of transactions that have modified system catalogs and only until the
invalidations have actually been sent.  Avoiding the PANIC in that
case seems to require some additional wizardry which is not entirely
clear to me at this time.
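
The decision described above can be written down as a small table in
C.  Again this is only an illustrative sketch under my own
assumptions: the InvalSketchLevel enum, the two boolean inputs, and
the function name are hypothetical, not existing backend symbols.

```c
#include <stdbool.h>

/* Hypothetical severity levels, ordered from least to most severe. */
typedef enum
{
    SK_ERROR = 1,
    SK_FATAL = 2,
    SK_PANIC = 3
} InvalSketchLevel;

/*
 * Decide what an ERROR should become during post-commit cleanup.
 *
 * commit_written:  the commit record is already on disk.
 * inval_unsent:    we still hold queued invalidation messages
 *                  (e.g. the transaction modified system catalogs)
 *                  that have not been sent yet.
 */
InvalSketchLevel
sketch_post_commit_level(InvalSketchLevel requested,
                         bool commit_written,
                         bool inval_unsent)
{
    /* Only ERRORs raised after the commit record are promoted. */
    if (requested != SK_ERROR || !commit_written)
        return requested;

    /* Exiting without sending invalidations is unsafe: PANIC. */
    if (inval_unsent)
        return SK_PANIC;

    /* Otherwise a FATAL exit of this backend is enough. */
    return SK_FATAL;
}
```

So a write transaction that touched no catalogs never takes the PANIC
branch, and one that did touch catalogs takes it only in the window
between writing the commit record and sending its invalidations.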

I think we'll have to approach the various problems in this area
stepwise, or we'll never make any progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


