Re: [HACKERS] Deadlock in XLogInsert at AIX - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [HACKERS] Deadlock in XLogInsert at AIX
Date
Msg-id 20191009063900.GA4066266@rfd.leadboat.com
Whole thread Raw
In response to Re: [HACKERS] Deadlock in XLogInsert at AIX  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Deadlock in XLogInsert at AIX
List pgsql-hackers
On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > [ fetch-add-gcc-xlc-unify-v2.patch ]
> 
> This still fails on Apple's compilers.  The first failure I get is
> 
> ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include
-I/usr/local/include-isysroot /Developer/SDKs/MacOSX10.5.sdk    -c -o nodeHashjoin.o nodeHashjoin.c
 
> /var/tmp//ccXUM8ep.s:449:Parameter error: r0 not allowed for parameter 2 (code as 0 not r0)
> 
> Line 449 of the assembly file is the addi in
> 
> LM87:
>                 sync
>         lwarx   r0,0,r2
>         addi    r11,r0,1
>         stwcx.  r11,0,r2
>         bne             $-12
>         isync
> 
> which I suppose comes out of PG_PPC_FETCH_ADD.  I find this idea of
> constructing assembly code by string-pasting pretty unreadable and am not
> tempted to try to debug it, but I don't immediately see why this doesn't
> work when the existing s_lock.h code does.  I think that the assembler
> error message is probably misleading: while it seems to be saying to
> s/r0/0/ in the addi, gcc consistently uses "rN" syntax for the second
> parameter elsewhere.  I do note that gcc never generates r0 as addi's
> second parameter in several files I checked through, so maybe what it
> means is "you need to use some other register"?  (Which would imply that
> the constraint for this asm argument is too loose.)

Thanks for testing.  That error boils down to "need to use some other
register".  The second operand of addi is one of the ppc instruction operands
that can hold a constant zero or a register number[1], so the proper
constraint is "b".  I've made it so and added a comment.  I should probably
update s_lock.h, too, in a later patch.  I don't know how it has
mostly-avoided this failure mode, but its choice of constraint could explain
https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com

> I'm also wondering why this isn't following s_lock.h's lead as to
> USE_PPC_LWARX_MUTEX_HINT and USE_PPC_LWSYNC.

Most or all of today's pg_atomic_compare_exchange_u32() usage does not have
the property that the mutex hint would signal.

pg_atomic_compare_exchange_u32() specifies "Full barrier semantics", which
lwsync does not provide.  We might want to relax the specification to make
lwsync acceptable, but that would be a separate, architecture-independent
project.  (generic-gcc.h:pg_atomic_compare_exchange_u32_impl() speculates
along those lines, writing "FIXME: we can probably use a lower consistency
model".)


[1] "Notice that addi and addis use the value 0, not the contents of GPR 0, if
RA=0." -- https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Safeguards against incorrect fd flags for fsync()
Next
From: Antonin Houska
Date:
Subject: Re: Transparent Data Encryption (TDE) and encrypted files