Thread: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)

On Fri, 2009-01-16 at 18:43 -0500, Tom Lane wrote:

> What is happening is that autovacuum_do_vac_analyze contains
> 
>         old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
>         ...
>         vacuum(vacstmt, relids);
>         ...
>         MemoryContextSwitchTo(old_cxt);
> 
> and at the time it is called by process_whole_db, CurrentMemoryContext
> points at TopTransactionContext.  Which gets destroyed because
> vacuum()
> internally finishes that transaction and starts a new one.  When we
> come out of vacuum(), CurrentMemoryContext again points at
> TopTransactionContext, but *its not the same one*.  The closing
> MemoryContextSwitchTo is installing a stale pointer, which then
> remains active into CommitTransaction.  It's a wonder this code ever
> works.

Can we add something to memory contexts to make this fail every time?

Seems like we should be able to detect this class of error. If we can't
it seems likely that a number of similar cases exist with other
contexts.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Simon Riggs <simon@2ndQuadrant.com> writes:
> Can we add something to memory contexts to make this fail every time?

No, not really.  AFAICS the reason for Alvaro not seeing it must be that
on his machine the new transaction happens to allocate its
TopTransactionContext control block right in the same place where the
old one was.

We could have a debugging mode in which pfree'd space is never recycled
for reuse (just filled with 0xdeadbeef or whatever and left to sit).
But it would not be practical for anything except short debugging
sessions.  In fact, for a case like this, only *very* short debugging
sessions, because you couldn't even recycle after transaction commit.
So the chance of finding problems in seldom-exercised code paths, as
this one is, would be pretty small anyway.
        regards, tom lane


On Sat, 2009-01-17 at 11:35 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Can we add something to memory contexts to make this fail every time?
> 
> No, not really.  AFAICS the reason for Alvaro not seeing it must be that
> on his machine the new transaction happens to allocate its
> TopTransactionContext control block right in the same place where the
> old one was.

Can we put a identifier into header of each control block, an ascending
value that is unlikely duplicate between calls? That way we'd be able to
tell immediately it wasn't the same block, even if it is in the same
place. (In assert mode only, of course).

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Can we add something to memory contexts to make this fail every time?
>
> No, not really.  AFAICS the reason for Alvaro not seeing it must be that
> on his machine the new transaction happens to allocate its
> TopTransactionContext control block right in the same place where the
> old one was.

But freed memory is clobbered, so if we were to have an assert that
checks the node tag, it should show up.  In fact, we do have such an
assert, but only for compilers other than GCC, because the inline
version of palloc() cannot have it for lack of infrastructure.

Maybe we should patch palloc.h so that it only uses the inline version
when not in assert mode.  Something like the attached, except that for
some reason the test case still fails to report any error for me :-(
Maybe the cpp boolean logic is fooling me.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Can we add something to memory contexts to make this fail every time?
>
> No, not really.  AFAICS the reason for Alvaro not seeing it must be that
> on his machine the new transaction happens to allocate its
> TopTransactionContext control block right in the same place where the
> old one was.
>
> We could have a debugging mode in which pfree'd space is never recycled
> for reuse (just filled with 0xdeadbeef or whatever and left to sit).

Hm, I wonder how much more practical it would be if we recycled it but offset
the pointer by maxalign. We would waste 4/8 bytes per palloc/free cycle
instead of the whole chunk.

(Whether palloc could actually do this would be another question, there are
lots of allocator algorithms that wouldn't be able to, I think.)

If we had a more formalized save_memory_context()/restore_memory_context()
which gave you more than just a pointer we could do better for this particular
case. save_memory_context() could hand you a struct with a pointer as well as
some sanity check values about the context you're saving.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning


Simon Riggs <simon@2ndQuadrant.com> writes:
> Can we put a identifier into header of each control block, an ascending
> value that is unlikely duplicate between calls? That way we'd be able to
> tell immediately it wasn't the same block,

Same block than what?  Unless you can somehow hide that ID number in
a MemoryContext pointer, the staleness of the pointer won't be evident.
        regards, tom lane


Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> No, not really.  AFAICS the reason for Alvaro not seeing it must be that
>> on his machine the new transaction happens to allocate its
>> TopTransactionContext control block right in the same place where the
>> old one was.

> But freed memory is clobbered, so if we were to have an assert that
> checks the node tag, it should show up.  In fact, we do have such an
> assert, but only for compilers other than GCC, because the inline
> version of palloc() cannot have it for lack of infrastructure.

Well, but production installations don't have either memory clobbering
or Asserts, so fooling with that wouldn't have helped anyway.  I suspect
what really happened here is that the bug was created by some late
change during 8.1 development, and nobody ever exercised the
anti-wraparound code path after that in an assert-enabled build :-(
In a non-assert build there's a fairly good chance that it'd still
work because the context header would still be there undamaged.

One thing we could try that would cost a lot less than de-inlining
palloc is to have MemoryContextDelete intentionally zero the methods
pointer.  That still does nothing for the create-new-context-in-same-
spot issue, but at least it would catch palloc in a context that had
been deleted and not yet recycled.
        regards, tom lane


Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:

> > But freed memory is clobbered, so if we were to have an assert that
> > checks the node tag, it should show up.  In fact, we do have such an
> > assert, but only for compilers other than GCC, because the inline
> > version of palloc() cannot have it for lack of infrastructure.
> 
> Well, but production installations don't have either memory clobbering
> or Asserts, so fooling with that wouldn't have helped anyway.  I suspect
> what really happened here is that the bug was created by some late
> change during 8.1 development, and nobody ever exercised the
> anti-wraparound code path after that in an assert-enabled build :-(
> In a non-assert build there's a fairly good chance that it'd still
> work because the context header would still be there undamaged.

Well, my builds are all assert-enabled, and I still wasn't able to make
it crash in any way (the new context being allocated in the same
position as the old one is the only explanation I have, but I did not
investigate whether that's what happening).  Maybe Greg Stark's idea of
offsetting pointers returned by palloc could have helped to find the
problem from the outset.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support