Thread: We've broken something in error recovery

We've broken something in error recovery

From
Tom Lane
Date:
In a somewhat misguided attempt to test something else, I did this in
CVS HEAD:
do $$
begin
  for i in 1 .. 10000 loop
    execute 'create table t' || i::text || ' (f1 int primary key)';
  end loop;
end$$;

This ran for a while and then ran out of lock table space, which was
not surprising in hindsight:

ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

But what was surprising was what happened next: the autovac launcher
immediately crashed.

TRAP: FailedAssertion("!(nestLevel > 0 && nestLevel <= GUCNestLevel)", File: "guc.c", Line: 3907)
LOG:  autovacuum launcher process (PID 25220) was terminated by signal 6

Stack trace looks like

#4  0x4e85b4 in ExceptionalCondition (
    conditionName=0x1ac4ac "!(nestLevel > 0 && nestLevel <= GUCNestLevel)",
    errorType=0x1abf44 "FailedAssertion", fileName=0x1abee4 "guc.c",
    lineNumber=3907) at assert.c:57
#5  0x501f48 in AtEOXact_GUC (isCommit=-86 'ª', nestLevel=84) at guc.c:3907
#6  0x20618c in AbortTransaction () at xact.c:2194
#7  0x20688c in AbortCurrentTransaction () at xact.c:2568
#8  0x3b0f84 in AutoVacLauncherMain (argc=2063670312, argv=0x7b03b94c)
    at autovacuum.c:491
#9  0x3b0bd8 in StartAutoVacLauncher () at autovacuum.c:371

Haven't dug any deeper yet --- who's touched this code lately?
        regards, tom lane


Re: We've broken something in error recovery

From
Andrew Dunstan
Date:

Tom Lane wrote:
> #4  0x4e85b4 in ExceptionalCondition (
>     conditionName=0x1ac4ac "!(nestLevel > 0 && nestLevel <= GUCNestLevel)", 
>     errorType=0x1abf44 "FailedAssertion", fileName=0x1abee4 "guc.c", 
>     lineNumber=3907) at assert.c:57
> #5  0x501f48 in AtEOXact_GUC (isCommit=-86 'ª', nestLevel=84) at guc.c:3907
> #6  0x20618c in AbortTransaction () at xact.c:2194
>
>   

This looks like maybe a corrupted stack - the args to AtEOXact_GUC at 
that location in xact.c are hardwired.

cheers

andrew


Re: We've broken something in error recovery

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> #5  0x501f48 in AtEOXact_GUC (isCommit=-86 'ª', nestLevel=84) at guc.c:3907

> This looks like maybe a corrupted stack - the args to AtEOXact_GUC at 
> that location in xact.c are hardwired.

No, that's just a fairly typical behavior of debugging with -O greater
than zero --- the registers holding those parameter values got recycled
for something else.  This is a rather old version of gdb and it doesn't
always print <<value optimized away>> when it should.
        regards, tom lane
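
One way to see that the printed arguments are leftover register contents rather than real values: a boolean argument should be 0 or 1, yet the debugger shows -86 together with the character 'ª'. The sketch below, which assumes isCommit is stored as a one-byte char-like value (the thread does not say), shows that -86 and 'ª' (byte 0xAA in Latin-1) are the same byte printed two ways. The names raw_byte and show_bogus_iscommit are hypothetical helpers, not anything from PostgreSQL or gdb.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the raw byte behind a signed one-byte value.
 * Converting to unsigned char is well-defined (value modulo 256). */
unsigned char raw_byte(signed char value)
{
    return (unsigned char) value;
}

/* Hypothetical demo: print the bogus argument value both ways. */
void show_bogus_iscommit(void)
{
    signed char is_commit = -86;    /* value shown in the backtrace */

    /* -86 + 256 = 170 = 0xAA, the Latin-1 code for 'ª' */
    assert(raw_byte(is_commit) == 0xAA);
    printf("isCommit=%d (byte 0x%02X)\n", is_commit, raw_byte(is_commit));
}
```

Neither -86 nor nestLevel=84 is a plausible argument here, which fits the explanation that the registers were reused after the call.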


Re: We've broken something in error recovery

From
Tom Lane
Date:
I wrote:
> #4  0x4e85b4 in ExceptionalCondition (
>     conditionName=0x1ac4ac "!(nestLevel > 0 && nestLevel <= GUCNestLevel)", 
>     errorType=0x1abf44 "FailedAssertion", fileName=0x1abee4 "guc.c", 
>     lineNumber=3907) at assert.c:57
> #5  0x501f48 in AtEOXact_GUC (isCommit=-86 'ª', nestLevel=84) at guc.c:3907
> #6  0x20618c in AbortTransaction () at xact.c:2194
> #7  0x20688c in AbortCurrentTransaction () at xact.c:2568
> #8  0x3b0f84 in AutoVacLauncherMain (argc=2063670312, argv=0x7b03b94c)
>     at autovacuum.c:491

On investigation I think that Assert may just be overenthusiastic.
The problem is that StartTransaction is failing at
VirtualXactLockTableInsert, for lack of any shared memory to acquire
the lock with; and then we try to do AbortTransaction and GUC is
unhappy because it's not been initialized yet.  So this isn't a
new bug at all, it's been there awhile ...
        regards, tom lane
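
The failure sequence described above can be sketched as a toy model. This is not PostgreSQL source: every function and variable name below is hypothetical, and two details are assumptions the thread does not state, namely that the abort path passes a hardwired nest level of 1 and that the GUC nest counter is still 0 when transaction start fails early. Only the checked condition, nestLevel > 0 && nestLevel <= GUCNestLevel, is taken from the assertion message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for GUCNestLevel; 0 means "GUC not set up yet" (assumption). */
static int guc_nest_level = 0;

/* The condition from the failed assertion, returned instead of trapping. */
static bool nest_level_ok(int nestLevel)
{
    return nestLevel > 0 && nestLevel <= guc_nest_level;
}

/* Toy transaction start: the lock is taken before GUC state is set up. */
static bool toy_start_transaction(bool lock_space_available)
{
    if (!lock_space_available)
        return false;       /* error raised here; GUC setup never runs */
    guc_nest_level = 1;     /* GUC setup for the new transaction */
    return true;
}

/* Toy abort: checks a hardwired nest level of 1 (assumption). */
static bool toy_abort_transaction(void)
{
    bool ok = nest_level_ok(1);

    guc_nest_level = 0;     /* reset for the next transaction */
    return ok;
}

/* Would the abort-time check pass for a transaction started this way? */
bool abort_check_passes(bool lock_space_available)
{
    guc_nest_level = 0;
    toy_start_transaction(lock_space_available);
    return toy_abort_transaction();
}

/* Usage: the check holds after a normal start, fails after an early error. */
void self_check(void)
{
    assert(abort_check_passes(true));
    assert(!abort_check_passes(false));
}
```

In this model the check fails only when the start path errors out before GUC setup, which matches the reading that the assertion is too strict for that case rather than a sign of new breakage.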