Thread: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

[HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

30 December 2016, 08:44:33

In pursuit of knowledge about clock_gettime(), I installed FreeBSD 10.3
on a PowerBook I had laying about.  I was disappointed to find out that
HEAD fails regression tests on that platform:

*** /usr/home/tgl/pgsql/src/test/regress/expected/lock.out      Thu Dec 29 19:37:50 2016
--- /usr/home/tgl/pgsql/src/test/regress/results/lock.out       Fri Dec 30 00:31:01 2016
***************
*** 63,70 **** -- atomic ops tests RESET search_path; SELECT test_atomic_ops();
!  test_atomic_ops
! -----------------
!  t
! (1 row)
!
--- 63,66 ---- -- atomic ops tests RESET search_path; SELECT test_atomic_ops();
! ERROR:  flag: set spuriously #2

======================================================================

After some digging I found out that the atomics code appears to believe
that the PPC platform has byte-wide atomics, which of course is nonsense.
That causes it to define pg_atomic_flag with the "char" variant, and
what we end up with is word-wide operations (lwarx and friends) operating
on a byte-wide struct.  Failure is not exactly astonishing; what's
surprising is that we don't get stack-clobber core dumps or such.
Undef'ing HAVE_GCC__SYNC_CHAR_TAS makes it work.

Perhaps it could be argued that there's a FreeBSD compiler bug here,
but what I'm wondering is why configure believes that this:
 [char lock = 0;  __sync_lock_test_and_set(&lock, 1);  __sync_lock_release(&lock);])],

is going to draw a hard error on platforms without byte-wide atomics.
My sense of C semantics is that the best you could hope for is a warning
--- and a look at the config.log on this platform says we're not
even getting that.

In short, I'd put the blame here on a ridiculously optimistic configure
check.
        regards, tom lane

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Andres Freund

Date:

30 December 2016, 14:10:38

Hi,

On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
> *** /usr/home/tgl/pgsql/src/test/regress/expected/lock.out      Thu Dec 29 19:37:50 2016
> --- /usr/home/tgl/pgsql/src/test/regress/results/lock.out       Fri Dec 30 00:31:01 2016
> ***************
> *** 63,70 ****
>   -- atomic ops tests
>   RESET search_path;
>   SELECT test_atomic_ops();
> !  test_atomic_ops
> ! -----------------
> !  t
> ! (1 row)
> !
> --- 63,66 ----
>   -- atomic ops tests
>   RESET search_path;
>   SELECT test_atomic_ops();
> ! ERROR:  flag: set spuriously #2
>
> ======================================================================

Hm, not good.

> After some digging I found out that the atomics code appears to believe
> that the PPC platform has byte-wide atomics, which of course is nonsense.
> That causes it to define pg_atomic_flag with the "char" variant, and
> what we end up with is word-wide operations (lwarx and friends) operating
> on a byte-wide struct.  Failure is not exactly astonishing; what's
> surprising is that we don't get stack-clobber core dumps or such.
> Undef'ing HAVE_GCC__SYNC_CHAR_TAS makes it work.
>
> Perhaps it could be argued that there's a FreeBSD compiler bug here,
> but what I'm wondering is why configure believes that this:
>
>   [char lock = 0;
>    __sync_lock_test_and_set(&lock, 1);
>    __sync_lock_release(&lock);])],
>
> is going to draw a hard error on platforms without byte-wide atomics.
> My sense of C semantics is that the best you could hope for is a
> warning

Well, in theory these aren't actual functions but intrinsics with
special behaviour ("by being overloaded so that they work on multiple
types."). The compiler is supposed to warn and link to an external
function if a certain operation is not available:
   Not all operations are supported by all target processors. If a   particular operation cannot be implemented on the
targetprocessor, a   warning is generated and a call to an external function is   generated. The external function
carriesthe same name as the built-in   version, with an additional suffix ‘_n’ where n is the size of the data   type.

So that assumption made in that configure test doesn't seem that
unreasonable.  What assembler do byte-wide atomics generate on that
platform?  Which compiler/version is this?

Regards,

Andres

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

30 December 2016, 18:48:22

Andres Freund <andres@anarazel.de> writes:
> On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
>> Perhaps it could be argued that there's a FreeBSD compiler bug here,
>> but what I'm wondering is why configure believes that this:
>> 
>> [char lock = 0;
>> __sync_lock_test_and_set(&lock, 1);
>> __sync_lock_release(&lock);])],
>> 
>> is going to draw a hard error on platforms without byte-wide atomics.
>> My sense of C semantics is that the best you could hope for is a
>> warning

> Well, in theory these aren't actual functions but intrinsics with
> special behaviour ("by being overloaded so that they work on multiple
> types."). The compiler is supposed to warn and link to an external
> function if a certain operation is not available:

Yeah, I read that.  But configure tests don't normally notice warnings.
Therefore, even if the compiler is behaving fully per spec (and I think
FreeBSD's might not be), we are going to think that char and word-wide
operations are interchangeable even when one is an efficient, in-line
operation and the other is inefficiently emulated by a function or
kernel trap.  This doesn't really seem good enough, especially for
standard architectures where we know darn well what ought to happen.

There's a reason why we've not converted s_lock.h to rely on compiler
intrinsics everywhere, and this is exactly it: the quality of the
intrinsics implementations are horridly variable.

> So that assumption made in that configure test doesn't seem that
> unreasonable.  What assembler do byte-wide atomics generate on that
> platform?  Which compiler/version is this?

$ gcc -v
Using built-in specs.
Target: powerpc-undermydesk-freebsd
Configured with: FreeBSD/powerpc system compiler
Thread model: posix
gcc version 4.2.1 20070831 patched [FreeBSD]

I tried "gcc -Wall -O0 -g -S bogus.c" on this input:

int
test_tas_char(volatile char *ptr)
{
  return __sync_lock_test_and_set(ptr, 1) == 0;
}

int
test_tas_int(volatile int *ptr)
{
  return __sync_lock_test_and_set(ptr, 1) == 0;
}

and got no warnings and the attached output.  I'm not very good at reading
PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop.  Even if that works (and apparently it does not),
we would not want to use it since it implies contention between adjacent
atomic flags.

BTW, I looked around to see why the PPC critters in the buildfarm aren't
failing, and the answer is that they all have compilers that are too old
to have __sync_lock_test_and_set at all, so that the configure probes
just fail outright.  But the probes can't tell the difference between
implementations we want to use and implementations we don't.

            regards, tom lane

    .file    "bogus.c"
    .section    .debug_abbrev,"",@progbits
.Ldebug_abbrev0:
    .section    .debug_info,"",@progbits
.Ldebug_info0:
    .section    .debug_line,"",@progbits
.Ldebug_line0:
    .section    ".text"
.Ltext0:
    .align 2
    .globl test_tas_char
    .type    test_tas_char, @function
test_tas_char:
.LFB2:
    .file 1 "bogus.c"
    .loc 1 3 0
    stwu 1,-32(1)
.LCFI0:
    stw 31,28(1)
.LCFI1:
    mr 31,1
.LCFI2:
    stw 3,8(31)
    .loc 1 4 0
    lwz 7,8(31)
    stw 7,12(31)
    lwz 8,12(31)
    lbz 8,0(8)
    stb 8,17(31)
.L2:
    lbz 9,17(31)
    stb 9,16(31)
    li 11,1
    lwz 10,12(31)
    rlwinm 9,10,3,27,28
    xori 9,9,24
    lbz 0,16(31)
    rlwinm 10,0,0,24,31
    slw 10,10,9
    rlwinm 0,11,0,0xff
    rlwinm 11,0,0,24,31
    slw 11,11,9
    li 0,255
    slw 0,0,9
    lwz 7,12(31)
    rlwinm 9,7,0,0,29
    sync
.L4:
    lwarx 8,0,9
    and 7,8,0
    cmpw 0,7,10
    bne- 0,.L5
    andc 8,8,0
    or 8,8,11
    stwcx. 8,0,9
    bne- 0,.L4
    isync
.L5:
    mr 0,7
    stb 0,17(31)
    lbz 9,17(31)
    lbz 0,16(31)
    cmpw 7,9,0
    bne 7,.L2
    lbz 0,16(31)
    cmpwi 7,0,0
    mfcr 0
    rlwinm 0,0,31,1
    .loc 1 5 0
    mr 3,0
    lwz 11,0(1)
    lwz 31,-4(11)
    mr 1,11
    blr
.LFE2:
    .size    test_tas_char,.-test_tas_char
    .align 2
    .globl test_tas_int
    .type    test_tas_int, @function
test_tas_int:
.LFB3:
    .loc 1 9 0
    stwu 1,-32(1)
.LCFI3:
    stw 31,28(1)
.LCFI4:
    mr 31,1
.LCFI5:
    stw 3,8(31)
    .loc 1 10 0
    lwz 9,8(31)
    li 10,1
    sync
.L8:
    lwarx 0,0,9
    mr 11,10
    stwcx. 11,0,9
    bne- 0,.L8
    isync
    cmpwi 7,0,0
    mfcr 0
    rlwinm 0,0,31,1
    .loc 1 11 0
    mr 3,0
    lwz 11,0(1)
    lwz 31,-4(11)
    mr 1,11
    blr
.LFE3:
    .size    test_tas_int,.-test_tas_int
    .section    .debug_frame,"",@progbits
.Lframe0:
    .4byte    .LECIE0-.LSCIE0
.LSCIE0:
    .4byte    0xffffffff
    .byte    0x1
    .string    ""
    .uleb128 0x1
    .sleb128 -4
    .byte    0x6c
    .byte    0xc
    .uleb128 0x1
    .uleb128 0x0
    .align 2
.LECIE0:
.LSFDE0:
    .4byte    .LEFDE0-.LASFDE0
.LASFDE0:
    .4byte    .Lframe0
    .4byte    .LFB2
    .4byte    .LFE2-.LFB2
    .byte    0x4
    .4byte    .LCFI0-.LFB2
    .byte    0xe
    .uleb128 0x20
    .byte    0x4
    .4byte    .LCFI1-.LCFI0
    .byte    0x9f
    .uleb128 0x1
    .byte    0x4
    .4byte    .LCFI2-.LCFI1
    .byte    0xd
    .uleb128 0x1f
    .align 2
.LEFDE0:
.LSFDE2:
    .4byte    .LEFDE2-.LASFDE2
.LASFDE2:
    .4byte    .Lframe0
    .4byte    .LFB3
    .4byte    .LFE3-.LFB3
    .byte    0x4
    .4byte    .LCFI3-.LFB3
    .byte    0xe
    .uleb128 0x20
    .byte    0x4
    .4byte    .LCFI4-.LCFI3
    .byte    0x9f
    .uleb128 0x1
    .byte    0x4
    .4byte    .LCFI5-.LCFI4
    .byte    0xd
    .uleb128 0x1f
    .align 2
.LEFDE2:
    .section    ".text"
.Letext0:
    .section    .debug_loc,"",@progbits
.Ldebug_loc0:
.LLST0:
    .4byte    .LFB2-.Ltext0
    .4byte    .LCFI0-.Ltext0
    .2byte    0x1
    .byte    0x51
    .4byte    .LCFI0-.Ltext0
    .4byte    .LCFI2-.Ltext0
    .2byte    0x2
    .byte    0x71
    .sleb128 32
    .4byte    .LCFI2-.Ltext0
    .4byte    .LFE2-.Ltext0
    .2byte    0x2
    .byte    0x8f
    .sleb128 32
    .4byte    0x0
    .4byte    0x0
.LLST1:
    .4byte    .LFB3-.Ltext0
    .4byte    .LCFI3-.Ltext0
    .2byte    0x1
    .byte    0x51
    .4byte    .LCFI3-.Ltext0
    .4byte    .LCFI5-.Ltext0
    .2byte    0x2
    .byte    0x71
    .sleb128 32
    .4byte    .LCFI5-.Ltext0
    .4byte    .LFE3-.Ltext0
    .2byte    0x2
    .byte    0x8f
    .sleb128 32
    .4byte    0x0
    .4byte    0x0
    .section    .debug_info
    .4byte    0xdf
    .2byte    0x2
    .4byte    .Ldebug_abbrev0
    .byte    0x4
    .uleb128 0x1
    .string    "GNU C 4.2.1 20070831 patched [FreeBSD]"
    .byte    0x1
    .string    "bogus.c"
    .string    "/home/tgl"
    .4byte    .Ltext0
    .4byte    .Letext0
    .4byte    .Ldebug_line0
    .uleb128 0x2
    .byte    0x1
    .string    "test_tas_char"
    .byte    0x1
    .byte    0x3
    .byte    0x1
    .4byte    0x88
    .4byte    .LFB2
    .4byte    .LFE2
    .4byte    .LLST0
    .4byte    0x88
    .uleb128 0x3
    .string    "ptr"
    .byte    0x1
    .byte    0x2
    .4byte    0x8f
    .byte    0x2
    .byte    0x91
    .sleb128 -24
    .byte    0x0
    .uleb128 0x4
    .byte    0x4
    .byte    0x5
    .string    "int"
    .uleb128 0x5
    .byte    0x4
    .4byte    0x95
    .uleb128 0x6
    .4byte    0x9a
    .uleb128 0x4
    .byte    0x1
    .byte    0x8
    .string    "char"
    .uleb128 0x2
    .byte    0x1
    .string    "test_tas_int"
    .byte    0x1
    .byte    0x9
    .byte    0x1
    .4byte    0x88
    .4byte    .LFB3
    .4byte    .LFE3
    .4byte    .LLST1
    .4byte    0xd7
    .uleb128 0x3
    .string    "ptr"
    .byte    0x1
    .byte    0x8
    .4byte    0xd7
    .byte    0x2
    .byte    0x91
    .sleb128 -24
    .byte    0x0
    .uleb128 0x5
    .byte    0x4
    .4byte    0xdd
    .uleb128 0x6
    .4byte    0x88
    .byte    0x0
    .section    .debug_abbrev
    .uleb128 0x1
    .uleb128 0x11
    .byte    0x1
    .uleb128 0x25
    .uleb128 0x8
    .uleb128 0x13
    .uleb128 0xb
    .uleb128 0x3
    .uleb128 0x8
    .uleb128 0x1b
    .uleb128 0x8
    .uleb128 0x11
    .uleb128 0x1
    .uleb128 0x12
    .uleb128 0x1
    .uleb128 0x10
    .uleb128 0x6
    .byte    0x0
    .byte    0x0
    .uleb128 0x2
    .uleb128 0x2e
    .byte    0x1
    .uleb128 0x3f
    .uleb128 0xc
    .uleb128 0x3
    .uleb128 0x8
    .uleb128 0x3a
    .uleb128 0xb
    .uleb128 0x3b
    .uleb128 0xb
    .uleb128 0x27
    .uleb128 0xc
    .uleb128 0x49
    .uleb128 0x13
    .uleb128 0x11
    .uleb128 0x1
    .uleb128 0x12
    .uleb128 0x1
    .uleb128 0x40
    .uleb128 0x6
    .uleb128 0x1
    .uleb128 0x13
    .byte    0x0
    .byte    0x0
    .uleb128 0x3
    .uleb128 0x5
    .byte    0x0
    .uleb128 0x3
    .uleb128 0x8
    .uleb128 0x3a
    .uleb128 0xb
    .uleb128 0x3b
    .uleb128 0xb
    .uleb128 0x49
    .uleb128 0x13
    .uleb128 0x2
    .uleb128 0xa
    .byte    0x0
    .byte    0x0
    .uleb128 0x4
    .uleb128 0x24
    .byte    0x0
    .uleb128 0xb
    .uleb128 0xb
    .uleb128 0x3e
    .uleb128 0xb
    .uleb128 0x3
    .uleb128 0x8
    .byte    0x0
    .byte    0x0
    .uleb128 0x5
    .uleb128 0xf
    .byte    0x0
    .uleb128 0xb
    .uleb128 0xb
    .uleb128 0x49
    .uleb128 0x13
    .byte    0x0
    .byte    0x0
    .uleb128 0x6
    .uleb128 0x35
    .byte    0x0
    .uleb128 0x49
    .uleb128 0x13
    .byte    0x0
    .byte    0x0
    .byte    0x0
    .section    .debug_pubnames,"",@progbits
    .4byte    0x31
    .2byte    0x2
    .4byte    .Ldebug_info0
    .4byte    0xe3
    .4byte    0x52
    .string    "test_tas_char"
    .4byte    0xa2
    .string    "test_tas_int"
    .4byte    0x0
    .section    .debug_aranges,"",@progbits
    .4byte    0x1c
    .2byte    0x2
    .4byte    .Ldebug_info0
    .byte    0x4
    .byte    0x0
    .2byte    0x0
    .2byte    0x0
    .4byte    .Ltext0
    .4byte    .Letext0-.Ltext0
    .4byte    0x0
    .4byte    0x0
    .ident    "GCC: (GNU) 4.2.1 20070831 patched [FreeBSD]"
    .section    .note.GNU-stack,"",@progbits

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Andres Freund

Date:

30 December 2016, 19:06:24

On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
>>> Perhaps it could be argued that there's a FreeBSD compiler bug here,
>>> but what I'm wondering is why configure believes that this:
>>>
>>> [char lock = 0;
>>> __sync_lock_test_and_set(&lock, 1);
>>> __sync_lock_release(&lock);])],
>>>
>>> is going to draw a hard error on platforms without byte-wide
>atomics.
>>> My sense of C semantics is that the best you could hope for is a
>>> warning
>
>> Well, in theory these aren't actual functions but intrinsics with
>> special behaviour ("by being overloaded so that they work on multiple
>> types."). The compiler is supposed to warn and link to an external
>> function if a certain operation is not available:
>
>Yeah, I read that. But configure tests don't normally notice warnings.

IIRC those functions are *not* supposed to be provided, I.e. they should result in a linker error.

BTW, it's unclear why there's no 1 byte atomics support in that compiler, isn't it? On a ll/sc arch like ppc it should (and IIRC is) be supported efficiently. As you suggest, using a lwarx should just work.

> I'm not very good at reading
PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop. Even if that works (and apparently it does not),
we would not want to use it since it implies contention between adjacent
atomic flags.

That's the case as long as they're on a single cache line. Whether it's the same word, double/half word shouldn't matter.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Andres Freund

Date:

30 December 2016, 19:14:24

On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>and got no warnings and the attached output.  I'm not very good at
>reading
>PPC assembler, but I think what is happening in the "char" case is that
>gcc is trying to emulate a byte-wide operation using a word-wide one,
>ie an lwarx/stwcx. loop. 

Hm.  This seems to suggest a straight out code generation bug in that compiler, not a failure in intrinsic detection.

I'll note that there's certainly ppc64 machine with that intrinsic working (tested that on the community hydra during
atomicsdevelopment).  So either it's a bug specific to some compiler version,  or 32bit ppc.

I assume there's no trivial way to get a newer compiler on that machine?
Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

30 December 2016, 20:18:48

Andres Freund <andres@anarazel.de> writes:
> On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> and got no warnings and the attached output.  I'm not very good at
>> reading PPC assembler, but I think what is happening in the "char" case is that
>> gcc is trying to emulate a byte-wide operation using a word-wide one,
>> ie an lwarx/stwcx. loop.

> Hm.  This seems to suggest a straight out code generation bug in that compiler, not a failure in intrinsic detection.

It does smell like a codegen bug; I'm planning to file a FreeBSD bug
report once I have more data (like whether 11.0 is busted similarly).

But that doesn't really detract from my point, which is that it's
totally silly for us to imagine that char and word-wide atomic ops are
interchangeable on all platforms and we can flip a coin to decide which
to use as long as the configure probes don't fail outright.  Even given
working code for the byte case, we ought to prefer int atomics on PPC.
On other platforms, maybe the preference goes the other way.  I'd be
inclined to follow the hard-won knowledge embedded in s_lock.h about
which to prefer.

Or in words of one syllable: this isn't the first bug we've seen in
compiler intrinsics, and it won't be the last.  We need a way to deal
with that, not just claim it isn't our problem.
        regards, tom lane

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

03 January 2017, 00:04:17

I wrote:
> But that doesn't really detract from my point, which is that it's
> totally silly for us to imagine that char and word-wide atomic ops are
> interchangeable on all platforms and we can flip a coin to decide which
> to use as long as the configure probes don't fail outright.  Even given
> working code for the byte case, we ought to prefer int atomics on PPC.
> On other platforms, maybe the preference goes the other way.  I'd be
> inclined to follow the hard-won knowledge embedded in s_lock.h about
> which to prefer.

After further study, I'm inclined to just propose that we flip the default
width of pg_atomic_flag in generic-gcc.h: use int not char if both are
available.  The only modern platform where that's the wrong thing is x86,
and that case is irrelevant here because we'll be using arch-x86.h not
generic-gcc.h.

A survey of s_lock.h shows that we prefer char-width slock_t only on
these architectures:

x86
sparc (but not sparcv9, there we use int)
m68k
vax

So basically, the existing coding is optimizing for obsolete hardware,
and not even very much of that.
        regards, tom lane

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Robert Haas

Date:

03 January 2017, 18:59:11

On Mon, Jan 2, 2017 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> But that doesn't really detract from my point, which is that it's
>> totally silly for us to imagine that char and word-wide atomic ops are
>> interchangeable on all platforms and we can flip a coin to decide which
>> to use as long as the configure probes don't fail outright.  Even given
>> working code for the byte case, we ought to prefer int atomics on PPC.
>> On other platforms, maybe the preference goes the other way.  I'd be
>> inclined to follow the hard-won knowledge embedded in s_lock.h about
>> which to prefer.
>
> After further study, I'm inclined to just propose that we flip the default
> width of pg_atomic_flag in generic-gcc.h: use int not char if both are
> available.  The only modern platform where that's the wrong thing is x86,
> and that case is irrelevant here because we'll be using arch-x86.h not
> generic-gcc.h.
>
> A survey of s_lock.h shows that we prefer char-width slock_t only on
> these architectures:
>
> x86
> sparc (but not sparcv9, there we use int)
> m68k
> vax

I don't think that's right, because on my MacBook Pro:

(gdb) p sizeof(slock_t)
$1 = 1
(gdb)

On Linux x86_64:

(gdb) p sizeof(slock_t)
$1 = 1

I think we would be well-advised to get the size of slock_t down to a
single byte on as many platforms as possible, because when it's any
wider than that it makes some critical structures that would otherwise
fit into a cache line start to not fit, and that can have a very big
impact on performance.  I'm disappointed to notice that it's 4 bytes
wide on hydra (ppc64).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

03 January 2017, 19:11:51

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 2, 2017 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After further study, I'm inclined to just propose that we flip the default
>> width of pg_atomic_flag in generic-gcc.h: use int not char if both are
>> available.  The only modern platform where that's the wrong thing is x86,
>> and that case is irrelevant here because we'll be using arch-x86.h not
>> generic-gcc.h.
>> 
>> A survey of s_lock.h shows that we prefer char-width slock_t only on
>> these architectures:
>> 
>> x86
>> sparc (but not sparcv9, there we use int)
>> m68k
>> vax

> I don't think that's right, because on my MacBook Pro:

... which is an x86, which won't be affected by the proposed change.

> I think we would be well-advised to get the size of slock_t down to a
> single byte on as many platforms as possible, because when it's any
> wider than that it makes some critical structures that would otherwise
> fit into a cache line start to not fit, and that can have a very big
> impact on performance.

I really doubt that that's a good argument for choosing a markedly less
efficient locking primitive, which is what's at stake for PPC.  I have
no info about the other architectures.

Also, since pg_atomic_flag is currently used in a grand total of zero
places (other than the test case in regress.c), arguing that making
it word-wide will bloat critical data structures is flat wrong.
        regards, tom lane

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Robert Haas

Date:

03 January 2017, 19:35:35

On Tue, Jan 3, 2017 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> A survey of s_lock.h shows that we prefer char-width slock_t only on
>>> these architectures:
>>>
>>> x86
>>> sparc (but not sparcv9, there we use int)
>>> m68k
>>> vax
>
>> I don't think that's right, because on my MacBook Pro:
>
> ... which is an x86, which won't be affected by the proposed change.

Uh, x86 means 32-bit to me, and this MacBook Pro is definitely x86_64.

>> I think we would be well-advised to get the size of slock_t down to a
>> single byte on as many platforms as possible, because when it's any
>> wider than that it makes some critical structures that would otherwise
>> fit into a cache line start to not fit, and that can have a very big
>> impact on performance.
>
> I really doubt that that's a good argument for choosing a markedly less
> efficient locking primitive, which is what's at stake for PPC.  I have
> no info about the other architectures.

I don't know for certain, but I would not be willing to take that on
faith.  I'm not sure if you've read all of the discussion threads on
this mailing list about fitting things into cache lines and/or
aligning things to cache lines to avoid major performance regressions
on large servers, but there have been quite a few of those over the
last few years and there will doubtless be more.

> Also, since pg_atomic_flag is currently used in a grand total of zero
> places (other than the test case in regress.c), arguing that making
> it word-wide will bloat critical data structures is flat wrong.

Well that just begs the question of whether we should rip it out.  If
it's unused, then, yes, the performance characteristics don't matter,
but if it's gonna get used for anything important, I maintain that
both the speed of the implementation and the number of bytes that it
consumes will be relevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

03 January 2017, 20:07:36

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 3, 2017 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> A survey of s_lock.h shows that we prefer char-width slock_t only on
>>>> these architectures:
>>>> 
>>>> x86
>>>> sparc (but not sparcv9, there we use int)
>>>> m68k
>>>> vax

>>> I don't think that's right, because on my MacBook Pro:

>> ... which is an x86, which won't be affected by the proposed change.

> Uh, x86 means 32-bit to me, and this MacBook Pro is definitely x86_64.

Sorry for the confusion.  x86 subsumes x86_64 so far as the atomics
code is concerned, and I was following that terminology.

>> Also, since pg_atomic_flag is currently used in a grand total of zero
>> places (other than the test case in regress.c), arguing that making
>> it word-wide will bloat critical data structures is flat wrong.

> Well that just begs the question of whether we should rip it out.  If
> it's unused, then, yes, the performance characteristics don't matter,
> but if it's gonna get used for anything important, I maintain that
> both the speed of the implementation and the number of bytes that it
> consumes will be relevant.

[ shrug... ]  I have not seen you putting any effort into optimizing
slock_t on non-x86 architectures, where it might make a difference today.
Why all the concern about shaving hypothetical future bytes for
pg_atomic_flag?

But to reduce this to the simplest possible terms: it really does not
matter whether the implementation saves bytes or cycles or anything else,
if it fails to *work*.  The existing coding for PPC fails on FreeBSD,
and likely on some other platforms using the same compiler.  I'd be the
first to say that that doesn't reflect well on the state of their PPC
port, but it's reality that we ought to be cognizant of.  And we've
worked around similar bugs nearby, eg see this bit in atomics.h:
* Given a gcc-compatible xlc compiler, prefer the xlc implementation.  The* ppc64le "IBM XL C/C++ for Linux, V13.1.2"
implementsboth interfaces, but* __sync_lock_test_and_set() of one-byte types elicits SIGSEGV.

From here it seems like you're arguing that we mustn't give FreeBSD
the identical pass that we already gave IBM, for what is nearly the
same bug.
        regards, tom lane

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Robert Haas

Date:

04 January 2017, 00:30:27

On Tue, Jan 3, 2017 at 12:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ shrug... ]  I have not seen you putting any effort into optimizing
> slock_t on non-x86 architectures, where it might make a difference today.
> Why all the concern about shaving hypothetical future bytes for
> pg_atomic_flag?

I don't know what you're getting all wrapped around the axle here
about.  I already explained why I think it's an issue.  You can
disagree if you like.  As to whether I've put any effort into
optimizing slock_t on non-x86 architectures, I commented in my first
post to this thread about my disappointment regarding the situation on
ppc64.  I didn't realize that we had that issue and I think it would
be worth fixing at some point, but I haven't quite gotten around to it
in the 4 hours since I had that realization.  I have previously done
work on non-x86 spinlocks, though
(c01c25fbe525869fa81237954727e1eb4b7d4a14).

> But to reduce this to the simplest possible terms: it really does not
> matter whether the implementation saves bytes or cycles or anything else,
> if it fails to *work*.  The existing coding for PPC fails on FreeBSD,
> and likely on some other platforms using the same compiler.  I'd be the
> first to say that that doesn't reflect well on the state of their PPC
> port, but it's reality that we ought to be cognizant of.  And we've
> worked around similar bugs nearby, eg see this bit in atomics.h:
>
>  * Given a gcc-compatible xlc compiler, prefer the xlc implementation.  The
>  * ppc64le "IBM XL C/C++ for Linux, V13.1.2" implements both interfaces, but
>  * __sync_lock_test_and_set() of one-byte types elicits SIGSEGV.
>
> From here it seems like you're arguing that we mustn't give FreeBSD
> the identical pass that we already gave IBM, for what is nearly the
> same bug.

The only point I'm making here is that the width of a spinlock is not
irrelevant.  Or at least, it didn't use to be. lwlock.h says:
* LWLOCK_MINIMAL_SIZE should be 32 on basically all common platforms, but* because slock_t is more than 2 bytes on some
obscureplatforms, we allow* for the possibility that it might be 64.

That comment is actually nonsense since
008608b9d51061b1f598c197477b3dc7be9c4a64 but before that it was
relevant.  Also, before 48354581a49c30f5757c203415aa8412d85b0f70, a
BufferDesc fit within a 64-byte cache line if slock_t was 1 or 2
bytes, a point commented on explicitly by
6150a1b08a9fe7ead2b25240be46dddeae9d98e1.   So we've clearly made
efforts in that direction in the past.  Looking a little more, though,
since both lwlock.c and bufmgr.c have been rewritten to use atomics
directly, there might not be any remaining places where the spinlocks
get hot enough to care about the extra bytes.  All things being equal
I still think a narrower one is significantly better than a wider one,
but we can always leave solving that problem to a day when the
difference can be proved out by benchmarks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3

From

Tom Lane

Date:

04 January 2017, 01:03:59

Robert Haas <robertmhaas@gmail.com> writes:
> The only point I'm making here is that the width of a spinlock is not
> irrelevant.

Sure, but it does not follow that we need to get all hot and bothered
about the width of pg_atomic_flag, which has yet to find its first
use-case.  When and if its width becomes a demonstrable issue,
we'll have some work to do in this area ... but that was true already.

> All things being equal
> I still think a narrower one is significantly better than a wider one,
> but we can always leave solving that problem to a day when the
> difference can be proved out by benchmarks.

I think we can agree on that conclusion.
        regards, tom lane