On 06/24/2014 05:47 PM, Alvaro Herrera wrote:
>>> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
>>> index 3c1c81a..4264373 100644
>>> --- a/src/backend/storage/smgr/md.c
>>> +++ b/src/backend/storage/smgr/md.c
>>> @@ -219,6 +219,16 @@ mdinit(void)
>>> &hash_ctl,
>>> HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
>>> pendingUnlinks = NIL;
>>> +
>>> + /*
>>> + * XXX: The checkpointer needs to add entries to the pending ops
>>> + * table when absorbing fsync requests. That is done within a critical
>>> + * section. It means that there's a theoretical possibility that you
>>> + * run out of memory while absorbing fsync requests, which leads to
>>> + * a PANIC. Fortunately the hash table is small so that's unlikely to
>>> + * happen in practice.
>>> + */
>>> + MemoryContextAllowInCriticalSection(MdCxt, true);
>>> }
>>> }
>>
>> Isn't that allowing a bit too much? We e.g. shouldn't allow
>> _fdvec_alloc() within a crritical section. Might make sense to create a
>> child context for it.
>
> I agree.
Ok. That leaves nothing but _fdvec_alloc() in MdCxt.
> Rahila Syed wrote:
>
>> The patch on compilation gives following error,
>>
>> mcxt.c: In function ‘MemoryContextAllowInCriticalSection’:
>> mcxt.c:322: error: ‘struct MemoryContextData’ has no member named
>> ‘allowInCriticalSection’
>>
>> The member in MemoryContextData is defined as 'allowInCritSection' while
>> the MemoryContextAllowInCriticalSection accesses the field as
>> 'context->allowInCriticalSection'.
>
> It appears Heikki did a search'n replace for "->allowInCritSection"
> before submitting, which failed to match the struct declaration.
Uh, looks like I failed to test this at all with assertions enabled.
Once you fix that error, you get a lot of assertion failures :-(.
Attached is a new attempt:
* walDebugCxt is now created at backend startup (in XLOGShmemInit()).
This fixes the issue that Andres pointed out, that the context creation
would PANIC if the first XLogInsert in a backend happens within a
critical section. LWLOCK_STATS had the same issue, it would fail if the
first LWLock acquisition in a process happens within a critical section.
I fixed that by adding an explicit InitLWLockAccess() function and moved
the initialization there.
* hash_create also creates a new memory context within the given
context. That new context needs to inherit the allowInCritSection flag,
otherwise allocating entries in the hash will still fail. Both
LWLOCK_STATS and the pending ops hash table in the checkpointer had this
issue.
* Checkpointer does one palloc to allocate private space to copy the
requests to, in addition to using the hash table. I moved that palloc to
before entering the critical section.
- Heikki