Thread: Ah-hah, I see the problem: EndPortalAllocMode()

From: Tom Lane
Date:
I discovered that I could reproduce the coredump Oliver and Tony were
talking about by the simple expedient of removing pg_vlock manually
while vacuum is running.  Armed with a debugger it didn't take long to
find out what's going wrong:

(a) vacuum.c does a CommitTransaction to commit its final table's
    worth of fixes.

(b) CommitTransaction calls EndPortalAllocMode.

(c) vacuum calls vc_shutdown, which tries to remove pg_vlock,
    and reports an error when the unlink() call fails.

(d) during error cleanup, AbortTransaction is called.

(e) AbortTransaction calls EndPortalAllocMode.  There has been no
    intervening StartPortalAllocMode, so the portal's context stack
    is empty.  EndPortalAllocMode tries to free a nonexistent memory
    context (or, if you have asserts turned on, dies with an assert
    failure).  Ka-boom.
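
For anyone who wants to see the bookkeeping without firing up a debugger,
here is a toy model of steps (a) through (e).  It is only a sketch: the
ToyPortal type and the toy_* functions are made up for illustration and
are not the real portalmem.c code, and I am assuming that transaction
start is what pushed the one context that the commit then pops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a portal's context stack (hypothetical, not portalmem.c). */
#define TOY_MAX_DEPTH 8

typedef struct ToyPortal {
    int    contexts[TOY_MAX_DEPTH];  /* placeholder context ids */
    size_t depth;                    /* number of contexts currently pushed */
} ToyPortal;

/* Models StartPortalAllocMode: push one context. */
void toy_start_alloc_mode(ToyPortal *p, int ctx)
{
    assert(p != NULL);
    if (p->depth < TOY_MAX_DEPTH)
        p->contexts[p->depth++] = ctx;
}

/*
 * Models EndPortalAllocMode: pop one context.  Returns false when the
 * stack is already empty, which is the point where the real code would
 * try to free a nonexistent context.
 */
bool toy_end_alloc_mode(ToyPortal *p)
{
    assert(p != NULL);
    if (p->depth == 0)
        return false;
    p->depth--;
    return true;
}

/*
 * Replays the sequence from the list above.  Returns the ordinal of the
 * first End call that finds the stack empty, or 0 if none does.
 */
int toy_replay_vacuum_failure(void)
{
    ToyPortal p = { {0}, 0 };

    toy_start_alloc_mode(&p, 1);     /* assumed: pushed at transaction start */

    if (!toy_end_alloc_mode(&p))     /* (b) CommitTransaction */
        return 1;

    /* (c), (d): unlink() fails, error cleanup runs, no Start in between */

    if (!toy_end_alloc_mode(&p))     /* (e) AbortTransaction */
        return 2;

    return 0;
}
```

In this model the second End call is the one that finds nothing to pop,
which matches what the debugger showed.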
 

It seems to me that EndPortalAllocMode ought to be a little more
forgiving of being called when the portal's context stack is empty.
Otherwise, it is unsafe to call elog() from anywhere except within
a transaction, because any attempt to abort a non-existent transaction
*will* coredump in this code.
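
Roughly what I have in mind by "more forgiving" is sketched below.  This
is not a patch against portalmem.c: SketchPortal, SketchContext and the
sketch_* functions are hypothetical stand-ins, and the real context stack
is certainly more involved.  The only point is the early return when
there is nothing to pop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the portal and its stacked memory contexts. */
typedef struct SketchContext {
    struct SketchContext *prev;   /* next context down the stack */
} SketchContext;

typedef struct SketchPortal {
    SketchContext *top;           /* NULL when the stack is empty */
} SketchPortal;

/* Push a fresh context; returns false if allocation fails. */
bool sketch_start_alloc_mode(SketchPortal *portal)
{
    SketchContext *ctx;

    assert(portal != NULL);
    ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return false;
    ctx->prev = portal->top;
    portal->top = ctx;
    return true;
}

/*
 * Forgiving pop: if the stack is empty, do nothing and report that,
 * instead of freeing a context that does not exist.  Returns true only
 * when a context was actually released.
 */
bool sketch_end_alloc_mode(SketchPortal *portal)
{
    SketchContext *ctx;

    assert(portal != NULL);
    ctx = portal->top;
    if (ctx == NULL)
        return false;             /* empty stack: treat as a no-op */
    portal->top = ctx->prev;
    free(ctx);
    return true;
}
```

With a guard of this shape, an abort that arrives after the commit has
already popped the context becomes a harmless no-op rather than a crash.
Whether silently ignoring the call is right, or whether it ought to at
least log something, is part of what I am asking about.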

However, I'd like confirmation from someone who knows portalmem.c
a little better that this is a good change to make.  Is there a
better way?
        regards, tom lane