Re: Allocation in critical section after node exits archive recovery - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Allocation in critical section after node exits archive recovery
Date
Msg-id 547827A1.1040501@vmware.com
Whole thread Raw
In response to Allocation in critical section after node exits archive recovery  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 11/28/2014 08:35 AM, Michael Paquier wrote:
> Hi all,
>
> While performing some archive recovery tests I found the following
> assertion failure once standby gets out of archive recovery (HEAD at
> a5eb85e):
> (lldb)
> * thread #1: tid = 0x0000, 0x00007fff8faaf282
> libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal
> SIGSTOP
>    * frame #0: 0x00007fff8faaf282 libsystem_kernel.dylib`__pthread_kill + 10
>      frame #1: 0x00007fff888774c3 libsystem_pthread.dylib`pthread_kill + 90
>      frame #2: 0x00007fff898acb73 libsystem_c.dylib`abort + 129
>      frame #3: 0x0000000107106029
> postgres`ExceptionalCondition(conditionName=0x000000010725770b,
> errorType=0x0000000107219b83, fileName=0x00000001072c4c84,
> lineNumber=555) + 137 at assert.c:54
>      frame #4: 0x0000000107140ec5
> postgres`MemoryContextCreate(tag=T_AllocSetContext, size=192,
> methods=0x00000001072fc7b0, parent=0x00007fca6a405710,
> name=0x0000000107257e1b) + 117 at mcxt.c:555
>      frame #5: 0x000000010713e6c7
> postgres`AllocSetContextCreate(parent=0x00007fca6a405710,
> name=0x0000000107257e1b, minContextSize=0, initBlockSize=8192,
> maxBlockSize=8388608) + 71 at aset.c:444
>      frame #6: 0x0000000106c51449 postgres`InitXLogInsert + 73 at
> xloginsert.c:869
>      frame #7: 0x0000000106c450ee postgres`InitXLOGAccess + 190 at xlog.c:7281
>      frame #8: 0x0000000106c43e7c postgres`LocalSetXLogInsertAllowed +
> 76 at xlog.c:7133
>      frame #9: 0x0000000106c445dd postgres`CreateCheckPoint(flags=38) +
> 1165 at xlog.c:7693
>      frame #10: 0x0000000106ebf935 postgres`CheckpointerMain + 1989 at
> checkpointer.c:502
>      frame #11: 0x0000000106c5a0c3
> postgres`AuxiliaryProcessMain(argc=2, argv=0x00007fff5909bb30) + 1907
> at bootstrap.c:426
>      frame #12: 0x0000000106ecf696
> postgres`StartChildProcess(type=CheckpointerProcess) + 358 at
> postmaster.c:5109
>      frame #13: 0x0000000106ece229
> postgres`sigusr1_handler(postgres_signal_arg=30) + 265 at
> postmaster.c:4795
>      frame #14: 0x00007fff91468f1a libsystem_platform.dylib`_sigtramp + 26
>      frame #15: 0x00007fff8faaf3f7 libsystem_kernel.dylib`__select + 11
>      frame #16: 0x0000000106ecd717 postgres`PostmasterMain(argc=3,
> argv=0x00007fca6a4056a0) + 5479 at postmaster.c:1220
>      frame #17: 0x0000000106e11fc5 postgres`main(argc=3,
> argv=0x00007fca6a4056a0) + 773 at main.c:220
>
> Logs:
> 2014-11-28 15:28:20.733 JST: LOG:  selected new timeline ID: 2
> cp: /Users/ioltas/archive/5432/00000001.history: No such file or
> directory
> 2014-11-28 15:28:20.776 JST: LOG:  archive recovery complete
> TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 555)
>
> This area of the code has been changed recently by the WAL API patch.
> Note that I haven't looked yet at implementing a fix.

Fixed, thanks. I had tested PITR, but apparently not with hot_standby=off.

It might be more robust to call InitXLogInsert() directly at process 
startup, somewhere in postinit.c. Currently, it's called by 
InitXLOGAccess(), which has the advantage that processes that have no 
intention of creating WAL records don't need to allocate the XLogInsert 
working buffers, notably hot standby backends. But it means that it's 
not safe to call RecoveryInProgress() in a critical section, when 
InitXLOGAccess() hasn't been called yet. All the critical sections that 
cover WAL-logging page modifications are safe, because you wouldn't get 
to the critical section without having called InitXLOGAccess() earlier, 
but it's a bit fiddly.

I think I'll leave it as it is for now, but if any more issues like this 
crop up, I'll move the call to InitXLogInsert to backend startup.

- Heikki




pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Using pg_rewind for differential backup
Next
From: Heikki Linnakangas
Date:
Subject: Re: Using pg_rewind for differential backup