Thread: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Simon Riggs
Date:
On Fri, 2009-01-16 at 18:43 -0500, Tom Lane wrote: > What is happening is that autovacuum_do_vac_analyze contains > > old_cxt = MemoryContextSwitchTo(AutovacMemCxt); > ... > vacuum(vacstmt, relids); > ... > MemoryContextSwitchTo(old_cxt); > > and at the time it is called by process_whole_db, CurrentMemoryContext > points at TopTransactionContext. Which gets destroyed because > vacuum() > internally finishes that transaction and starts a new one. When we > come out of vacuum(), CurrentMemoryContext again points at > TopTransactionContext, but *its not the same one*. The closing > MemoryContextSwitchTo is installing a stale pointer, which then > remains active into CommitTransaction. It's a wonder this code ever > works. Can we add something to memory contexts to make this fail every time? Seems like we should be able to detect this class of error. If we can't it seems likely that a number of similar cases exist with other contexts. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > Can we add something to memory contexts to make this fail every time? No, not really. AFAICS the reason for Alvaro not seeing it must be that on his machine the new transaction happens to allocate its TopTransactionContext control block right in the same place where the old one was. We could have a debugging mode in which pfree'd space is never recycled for reuse (just filled with 0xdeadbeef or whatever and left to sit). But it would not be practical for anything except short debugging sessions. In fact, for a case like this, only *very* short debugging sessions, because you couldn't even recycle after transaction commit. So the chance of finding problems in seldom-exercised code paths, as this one is, would be pretty small anyway. regards, tom lane
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Simon Riggs
Date:
On Sat, 2009-01-17 at 11:35 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > Can we add something to memory contexts to make this fail every time? > > No, not really. AFAICS the reason for Alvaro not seeing it must be that > on his machine the new transaction happens to allocate its > TopTransactionContext control block right in the same place where the > old one was. Can we put a identifier into header of each control block, an ascending value that is unlikely duplicate between calls? That way we'd be able to tell immediately it wasn't the same block, even if it is in the same place. (In assert mode only, of course). -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Alvaro Herrera
Date:
Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > Can we add something to memory contexts to make this fail every time? > > No, not really. AFAICS the reason for Alvaro not seeing it must be that > on his machine the new transaction happens to allocate its > TopTransactionContext control block right in the same place where the > old one was. But freed memory is clobbered, so if we were to have an assert that checks the node tag, it should show up. In fact, we do have such an assert, but only for compilers other than GCC, because the inline version of palloc() cannot have it for lack of infrastructure. Maybe we should patch palloc.h so that it only uses the inline version when not in assert mode. Something like the attached, except that for some reason the test case still fails to report any error for me :-( Maybe the cpp boolean logic is fooling me. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Simon Riggs <simon@2ndQuadrant.com> writes: >> Can we add something to memory contexts to make this fail every time? > > No, not really. AFAICS the reason for Alvaro not seeing it must be that > on his machine the new transaction happens to allocate its > TopTransactionContext control block right in the same place where the > old one was. > > We could have a debugging mode in which pfree'd space is never recycled > for reuse (just filled with 0xdeadbeef or whatever and left to sit). Hm, I wonder how much more practical it would be if we recycled it but offset the pointer by maxalign. We would waste 4/8 bytes per palloc/free cycle instead of the whole chunk. (Whether palloc could actually do this would be another question, there are lots of allocator algorithms that wouldn't be able to, I think.) If we had a more formalized save_memory_context()/restore_memory_context() which gave you more than just a pointer we could do better for this particular case. save_memory_context() could hand you a struct with a pointer as well as some sanity check values about the context you're saving. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > Can we put a identifier into header of each control block, an ascending > value that is unlikely duplicate between calls? That way we'd be able to > tell immediately it wasn't the same block, Same block than what? Unless you can somehow hide that ID number in a MemoryContext pointer, the staleness of the pointer won't be evident. regards, tom lane
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> No, not really. AFAICS the reason for Alvaro not seeing it must be that >> on his machine the new transaction happens to allocate its >> TopTransactionContext control block right in the same place where the >> old one was. > But freed memory is clobbered, so if we were to have an assert that > checks the node tag, it should show up. In fact, we do have such an > assert, but only for compilers other than GCC, because the inline > version of palloc() cannot have it for lack of infrastructure. Well, but production installations don't have either memory clobbering or Asserts, so fooling with that wouldn't have helped anyway. I suspect what really happened here is that the bug was created by some late change during 8.1 development, and nobody ever exercised the anti-wraparound code path after that in an assert-enabled build :-( In a non-assert build there's a fairly good chance that it'd still work because the context header would still be there undamaged. One thing we could try that would cost a lot less than de-inlining palloc is to have MemoryContextDelete intentionally zero the methods pointer. That still does nothing for the create-new-context-in-same- spot issue, but at least it would catch palloc in a context that had been deleted and not yet recycled. regards, tom lane
Re: MemoryContextSwitchTo (Re: [GENERAL] Autovacuum daemon terminated by signal 11)
From
Alvaro Herrera
Date:
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > But freed memory is clobbered, so if we were to have an assert that > > checks the node tag, it should show up. In fact, we do have such an > > assert, but only for compilers other than GCC, because the inline > > version of palloc() cannot have it for lack of infrastructure. > > Well, but production installations don't have either memory clobbering > or Asserts, so fooling with that wouldn't have helped anyway. I suspect > what really happened here is that the bug was created by some late > change during 8.1 development, and nobody ever exercised the > anti-wraparound code path after that in an assert-enabled build :-( > In a non-assert build there's a fairly good chance that it'd still > work because the context header would still be there undamaged. Well, my builds are all assert-enabled, and I still wasn't able to make it crash in any way (the new context being allocated in the same position as the old one is the only explanation I have, but I did not investigate whether that's what happening). Maybe Greg Stark's idea of offsetting pointers returned by palloc could have helped to find the problem from the outset. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support