Ah-hah, I see the problem: EndPortalAllocMode() - Mailing list pgsql-hackers

From Tom Lane
Subject Ah-hah, I see the problem: EndPortalAllocMode()
Date
Msg-id 6303.934582673@sss.pgh.pa.us
List pgsql-hackers
I discovered that I could reproduce the coredump Oliver and Tony were
talking about by the simple expedient of removing pg_vlock manually
while vacuum is running.  Armed with a debugger it didn't take long to
find out what's going wrong:

(a) vacuum.c does a CommitTransaction to commit its final table's
    worth of fixes.

(b) CommitTransaction calls EndPortalAllocMode.

(c) vacuum calls vc_shutdown, which tries to remove pg_vlock,
    and reports an error when the unlink() call fails.

(d) during error cleanup, AbortTransaction is called.

(e) AbortTransaction calls EndPortalAllocMode.  There has been no
    intervening StartPortalAllocMode, so the portal's context stack
    is empty.  EndPortalAllocMode tries to free a nonexistent memory
    context (or, if you have asserts turned on, dies with an assert
    failure).  Ka-boom.

It seems to me that EndPortalAllocMode ought to be a little more
forgiving of being called when the portal's context stack is empty.
Otherwise, it is unsafe to call elog() from anywhere except within
a transaction, because any attempt to abort a non-existent transaction
*will* coredump in this code.
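
To make the "more forgiving" idea concrete, here is a minimal sketch of a
context stack whose pop operation tolerates being called on an empty stack.
Everything in it (ToyContext, ToyPortal, toy_start_alloc_mode,
toy_end_alloc_mode) is a hypothetical stand-in invented for illustration;
it is not the actual portalmem.c code or its data structures, and the real
fix may well need to look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real portalmem.c structures. */
typedef struct ToyContext {
    struct ToyContext *prev;    /* next-older context on the stack */
} ToyContext;

typedef struct ToyPortal {
    ToyContext *top;            /* NULL when the context stack is empty */
} ToyPortal;

/* Push a fresh context, roughly what a "start alloc mode" call implies. */
void toy_start_alloc_mode(ToyPortal *portal)
{
    ToyContext *ctx = malloc(sizeof *ctx);

    assert(ctx != NULL);        /* toy code: treat out-of-memory as fatal */
    ctx->prev = portal->top;
    portal->top = ctx;
}

/*
 * Pop and free the top context.  If the stack is already empty, do
 * nothing and return false instead of freeing a nonexistent context.
 */
bool toy_end_alloc_mode(ToyPortal *portal)
{
    ToyContext *ctx = portal->top;

    if (ctx == NULL)
        return false;           /* forgiving path: nothing to end */

    portal->top = ctx->prev;
    free(ctx);
    return true;
}
```

In this toy model, the commit in step (b) corresponds to the first
toy_end_alloc_mode call, which pops the context and returns true.  The
abort in step (e) corresponds to a second call with no intervening
toy_start_alloc_mode; it finds the stack empty and returns false rather
than crashing.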

However, I'd like confirmation from someone who knows portalmem.c
a little better that this is a good change to make.  Is there a
better way?
        regards, tom lane

