Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease - Mailing list pgsql-hackers

From: "Andres Freund" <andres@2ndquadrant.com>
> It's x86, right? Then it's unlikely to be actual unordered memory
> accesses, but if the compiler reordered:
>    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
>    proc = head;
>    head = proc->lwWaitLink;
>    proc->lwWaitLink = NULL;
>    proc->lwWaiting = false;
>    PGSemaphoreUnlock(&proc->sem);
> to
>    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
>    proc = head;
>    proc->lwWaiting = false;
>    head = proc->lwWaitLink;
>    proc->lwWaitLink = NULL;
>    PGSemaphoreUnlock(&proc->sem);
> which it is permitted to do, yes, that could cause symptoms like you
> describe.

Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for
x86_64.  PostgreSQL was built with GCC.
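
To make the reordering concern concrete, here is a minimal sketch of how a
compiler barrier would pin the store order in the wakeup step.  This is not
the PostgreSQL source: the Waiter struct, the compiler_barrier() macro and
release_one() are hypothetical names of mine, the barrier uses GCC/Clang
inline-asm syntax, and the sketch assumes x86, where the CPU itself does not
reorder stores with other stores.  On weaker memory models a real write
barrier would be needed instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two PGPROC fields discussed above. */
typedef struct Waiter
{
    struct Waiter *lwWaitLink;  /* next waiter in the lock's wait queue */
    volatile bool  lwWaiting;   /* true while this process is queued */
} Waiter;

/*
 * Compiler-only barrier (GCC/Clang syntax, an assumption of this sketch).
 * It emits no instruction; it only forbids the compiler from moving memory
 * accesses across this point.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Detach the first waiter from the queue and mark it as no longer waiting.
 * Returns the new queue head.  The semaphore wakeup is omitted here.
 */
static Waiter *
release_one(Waiter *head)
{
    Waiter *proc;

    assert(head != NULL);
    proc = head;

    /* Finish every access to the link field first ... */
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;

    /* ... and keep the compiler from hoisting the flag store above them. */
    compiler_barrier();

    /* Only now may the waiter observe that it has been released. */
    proc->lwWaiting = false;

    return head;
}
```

With two queued waiters a -> b, release_one(&a) returns &b, leaves
a.lwWaitLink NULL and a.lwWaiting false, and does not touch b.  The point of
the barrier is only that the store to lwWaiting cannot be emitted before the
accesses to lwWaitLink.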

> Any chance you have the binaries the customer ran back then around?
> Disassembling that piece of code might give you a hint whether that's a
> possible cause.

I'm sorry I can't provide the module, but I attached the disassembled
code for LWLockRelease and LWLockAcquire in the executable.  I'm not sure
whether this proves anything.

FYI, the following stack traces were obtained during two instances of the
hang.

#0  0x00000036102eaf77 in semop () from /lib64/libc.so.6
#1  0x0000000000614707 in PGSemaphoreLock ()
#2  0x0000000000659d5b in LWLockAcquire ()
#3  0x000000000047983d in RelationGetBufferForTuple ()
#4  0x0000000000477f86 in heap_insert ()
#5  0x00000000005a4a12 in ExecModifyTable ()
#6  0x000000000058d928 in ExecProcNode ()
#7  0x000000000058c762 in standard_ExecutorRun ()
#8  0x00007f0cb37f99cb in pgss_ExecutorRun () from /opt/symfoserver64/lib/pg_stat_statements.so
#9  0x00007f0cb357f545 in explain_ExecutorRun () from /opt/symfoserver64/lib/auto_explain.so
#10 0x000000000066a59e in ProcessQuery ()
#11 0x000000000066a7ef in PortalRunMulti ()
#12 0x000000000066afd2 in PortalRun ()
#13 0x0000000000666fcb in exec_simple_query ()
#14 0x0000000000668058 in PostgresMain ()
#15 0x0000000000622ef1 in PostmasterMain ()
#16 0x00000000005c0723 in main ()

#0  0x00000036102eaf77 in semop () from /lib64/libc.so.6
#1  0x0000000000614707 in PGSemaphoreLock ()
#2  0x0000000000659d5b in LWLockAcquire ()
#3  0x000000000047983d in RelationGetBufferForTuple ()
#4  0x0000000000477f86 in heap_insert ()
#5  0x00000000005a4a12 in ExecModifyTable ()
#6  0x000000000058d928 in ExecProcNode ()
#7  0x000000000058c762 in standard_ExecutorRun ()
#8  0x00007f0cb37f99cb in pgss_ExecutorRun () from /opt/symfoserver64/lib/pg_stat_statements.so
#9  0x00007f0cb357f545 in explain_ExecutorRun () from /opt/symfoserver64/lib/auto_explain.so
#10 0x000000000066a59e in ProcessQuery ()
#11 0x000000000066a7ef in PortalRunMulti ()
#12 0x000000000066afd2 in PortalRun ()
#13 0x0000000000666fcb in exec_simple_query ()
#14 0x0000000000668058 in PostgresMain ()
#15 0x0000000000622ef1 in PostmasterMain ()
#16 0x00000000005c0723 in main ()


#0  0x00000036102eaf77 in semop () from /lib64/libc.so.6
#1  0x0000000000614707 in PGSemaphoreLock ()
#2  0x0000000000659d5b in LWLockAcquire ()
#3  0x000000000064bb8c in ProcArrayEndTransaction ()
#4  0x0000000000491216 in CommitTransaction ()
#5  0x00000000004925a5 in CommitTransactionCommand ()
#6  0x0000000000664cf7 in finish_xact_command ()
#7  0x0000000000667145 in exec_simple_query ()
#8  0x0000000000668058 in PostgresMain ()
#9  0x0000000000622ef1 in PostmasterMain ()
#10 0x00000000005c0723 in main ()


Regards
MauMau

