Thread: [HACKERS] Broken atomics code on PPC with FreeBSD 10.3
In pursuit of knowledge about clock_gettime(), I installed FreeBSD 10.3 on a PowerBook I had laying about. I was disappointed to find out that HEAD fails regression tests on that platform: *** /usr/home/tgl/pgsql/src/test/regress/expected/lock.out Thu Dec 29 19:37:50 2016 --- /usr/home/tgl/pgsql/src/test/regress/results/lock.out Fri Dec 30 00:31:01 2016 *************** *** 63,70 **** -- atomic ops tests RESET search_path; SELECT test_atomic_ops(); ! test_atomic_ops ! ----------------- ! t ! (1 row) ! --- 63,66 ---- -- atomic ops tests RESET search_path; SELECT test_atomic_ops(); ! ERROR: flag: set spuriously #2 ====================================================================== After some digging I found out that the atomics code appears to believe that the PPC platform has byte-wide atomics, which of course is nonsense. That causes it to define pg_atomic_flag with the "char" variant, and what we end up with is word-wide operations (lwarx and friends) operating on a byte-wide struct. Failure is not exactly astonishing; what's surprising is that we don't get stack-clobber core dumps or such. Undef'ing HAVE_GCC__SYNC_CHAR_TAS makes it work. Perhaps it could be argued that there's a FreeBSD compiler bug here, but what I'm wondering is why configure believes that this: [char lock = 0; __sync_lock_test_and_set(&lock, 1); __sync_lock_release(&lock);])], is going to draw a hard error on platforms without byte-wide atomics. My sense of C semantics is that the best you could hope for is a warning --- and a look at the config.log on this platform says we're not even getting that. In short, I'd put the blame here on a ridiculously optimistic configure check. regards, tom lane
Hi, On 2016-12-30 00:44:33 -0500, Tom Lane wrote: > *** /usr/home/tgl/pgsql/src/test/regress/expected/lock.out Thu Dec 29 19:37:50 2016 > --- /usr/home/tgl/pgsql/src/test/regress/results/lock.out Fri Dec 30 00:31:01 2016 > *************** > *** 63,70 **** > -- atomic ops tests > RESET search_path; > SELECT test_atomic_ops(); > ! test_atomic_ops > ! ----------------- > ! t > ! (1 row) > ! > --- 63,66 ---- > -- atomic ops tests > RESET search_path; > SELECT test_atomic_ops(); > ! ERROR: flag: set spuriously #2 > > ====================================================================== Hm, not good. > After some digging I found out that the atomics code appears to believe > that the PPC platform has byte-wide atomics, which of course is nonsense. > That causes it to define pg_atomic_flag with the "char" variant, and > what we end up with is word-wide operations (lwarx and friends) operating > on a byte-wide struct. Failure is not exactly astonishing; what's > surprising is that we don't get stack-clobber core dumps or such. > Undef'ing HAVE_GCC__SYNC_CHAR_TAS makes it work. > > Perhaps it could be argued that there's a FreeBSD compiler bug here, > but what I'm wondering is why configure believes that this: > > [char lock = 0; > __sync_lock_test_and_set(&lock, 1); > __sync_lock_release(&lock);])], > > is going to draw a hard error on platforms without byte-wide atomics. > My sense of C semantics is that the best you could hope for is a > warning Well, in theory these aren't actual functions but intrinsics with special behaviour ("by being overloaded so that they work on multiple types."). The compiler is supposed to warn and link to an external function if a certain operation is not available: Not all operations are supported by all target processors. If a particular operation cannot be implemented on the targetprocessor, a warning is generated and a call to an external function is generated. The external function carriesthe same name as the built-in version, with an additional suffix ‘_n’ where n is the size of the data type. So that assumption made in that configure test doesn't seem that unreasonable. What assembler do byte-wide atomics generate on that platform? Which compiler/version is this? Regards, Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-12-30 00:44:33 -0500, Tom Lane wrote: >> Perhaps it could be argued that there's a FreeBSD compiler bug here, >> but what I'm wondering is why configure believes that this: >> >> [char lock = 0; >> __sync_lock_test_and_set(&lock, 1); >> __sync_lock_release(&lock);])], >> >> is going to draw a hard error on platforms without byte-wide atomics. >> My sense of C semantics is that the best you could hope for is a >> warning > Well, in theory these aren't actual functions but intrinsics with > special behaviour ("by being overloaded so that they work on multiple > types."). The compiler is supposed to warn and link to an external > function if a certain operation is not available: Yeah, I read that. But configure tests don't normally notice warnings. Therefore, even if the compiler is behaving fully per spec (and I think FreeBSD's might not be), we are going to think that char and word-wide operations are interchangeable even when one is an efficient, in-line operation and the other is inefficiently emulated by a function or kernel trap. This doesn't really seem good enough, especially for standard architectures where we know darn well what ought to happen. There's a reason why we've not converted s_lock.h to rely on compiler intrinsics everywhere, and this is exactly it: the quality of the intrinsics implementations are horridly variable. > So that assumption made in that configure test doesn't seem that > unreasonable. What assembler do byte-wide atomics generate on that > platform? Which compiler/version is this? $ gcc -v Using built-in specs. Target: powerpc-undermydesk-freebsd Configured with: FreeBSD/powerpc system compiler Thread model: posix gcc version 4.2.1 20070831 patched [FreeBSD] I tried "gcc -Wall -O0 -g -S bogus.c" on this input: int test_tas_char(volatile char *ptr) { return __sync_lock_test_and_set(ptr, 1) == 0; } int test_tas_int(volatile int *ptr) { return __sync_lock_test_and_set(ptr, 1) == 0; } and got no warnings and the attached output. I'm not very good at reading PPC assembler, but I think what is happening in the "char" case is that gcc is trying to emulate a byte-wide operation using a word-wide one, ie an lwarx/stwcx. loop. Even if that works (and apparently it does not), we would not want to use it since it implies contention between adjacent atomic flags. BTW, I looked around to see why the PPC critters in the buildfarm aren't failing, and the answer is that they all have compilers that are too old to have __sync_lock_test_and_set at all, so that the configure probes just fail outright. But the probes can't tell the difference between implementations we want to use and implementations we don't. regards, tom lane .file "bogus.c" .section .debug_abbrev,"",@progbits .Ldebug_abbrev0: .section .debug_info,"",@progbits .Ldebug_info0: .section .debug_line,"",@progbits .Ldebug_line0: .section ".text" .Ltext0: .align 2 .globl test_tas_char .type test_tas_char, @function test_tas_char: .LFB2: .file 1 "bogus.c" .loc 1 3 0 stwu 1,-32(1) .LCFI0: stw 31,28(1) .LCFI1: mr 31,1 .LCFI2: stw 3,8(31) .loc 1 4 0 lwz 7,8(31) stw 7,12(31) lwz 8,12(31) lbz 8,0(8) stb 8,17(31) .L2: lbz 9,17(31) stb 9,16(31) li 11,1 lwz 10,12(31) rlwinm 9,10,3,27,28 xori 9,9,24 lbz 0,16(31) rlwinm 10,0,0,24,31 slw 10,10,9 rlwinm 0,11,0,0xff rlwinm 11,0,0,24,31 slw 11,11,9 li 0,255 slw 0,0,9 lwz 7,12(31) rlwinm 9,7,0,0,29 sync .L4: lwarx 8,0,9 and 7,8,0 cmpw 0,7,10 bne- 0,.L5 andc 8,8,0 or 8,8,11 stwcx. 8,0,9 bne- 0,.L4 isync .L5: mr 0,7 stb 0,17(31) lbz 9,17(31) lbz 0,16(31) cmpw 7,9,0 bne 7,.L2 lbz 0,16(31) cmpwi 7,0,0 mfcr 0 rlwinm 0,0,31,1 .loc 1 5 0 mr 3,0 lwz 11,0(1) lwz 31,-4(11) mr 1,11 blr .LFE2: .size test_tas_char,.-test_tas_char .align 2 .globl test_tas_int .type test_tas_int, @function test_tas_int: .LFB3: .loc 1 9 0 stwu 1,-32(1) .LCFI3: stw 31,28(1) .LCFI4: mr 31,1 .LCFI5: stw 3,8(31) .loc 1 10 0 lwz 9,8(31) li 10,1 sync .L8: lwarx 0,0,9 mr 11,10 stwcx. 11,0,9 bne- 0,.L8 isync cmpwi 7,0,0 mfcr 0 rlwinm 0,0,31,1 .loc 1 11 0 mr 3,0 lwz 11,0(1) lwz 31,-4(11) mr 1,11 blr .LFE3: .size test_tas_int,.-test_tas_int .section .debug_frame,"",@progbits .Lframe0: .4byte .LECIE0-.LSCIE0 .LSCIE0: .4byte 0xffffffff .byte 0x1 .string "" .uleb128 0x1 .sleb128 -4 .byte 0x6c .byte 0xc .uleb128 0x1 .uleb128 0x0 .align 2 .LECIE0: .LSFDE0: .4byte .LEFDE0-.LASFDE0 .LASFDE0: .4byte .Lframe0 .4byte .LFB2 .4byte .LFE2-.LFB2 .byte 0x4 .4byte .LCFI0-.LFB2 .byte 0xe .uleb128 0x20 .byte 0x4 .4byte .LCFI1-.LCFI0 .byte 0x9f .uleb128 0x1 .byte 0x4 .4byte .LCFI2-.LCFI1 .byte 0xd .uleb128 0x1f .align 2 .LEFDE0: .LSFDE2: .4byte .LEFDE2-.LASFDE2 .LASFDE2: .4byte .Lframe0 .4byte .LFB3 .4byte .LFE3-.LFB3 .byte 0x4 .4byte .LCFI3-.LFB3 .byte 0xe .uleb128 0x20 .byte 0x4 .4byte .LCFI4-.LCFI3 .byte 0x9f .uleb128 0x1 .byte 0x4 .4byte .LCFI5-.LCFI4 .byte 0xd .uleb128 0x1f .align 2 .LEFDE2: .section ".text" .Letext0: .section .debug_loc,"",@progbits .Ldebug_loc0: .LLST0: .4byte .LFB2-.Ltext0 .4byte .LCFI0-.Ltext0 .2byte 0x1 .byte 0x51 .4byte .LCFI0-.Ltext0 .4byte .LCFI2-.Ltext0 .2byte 0x2 .byte 0x71 .sleb128 32 .4byte .LCFI2-.Ltext0 .4byte .LFE2-.Ltext0 .2byte 0x2 .byte 0x8f .sleb128 32 .4byte 0x0 .4byte 0x0 .LLST1: .4byte .LFB3-.Ltext0 .4byte .LCFI3-.Ltext0 .2byte 0x1 .byte 0x51 .4byte .LCFI3-.Ltext0 .4byte .LCFI5-.Ltext0 .2byte 0x2 .byte 0x71 .sleb128 32 .4byte .LCFI5-.Ltext0 .4byte .LFE3-.Ltext0 .2byte 0x2 .byte 0x8f .sleb128 32 .4byte 0x0 .4byte 0x0 .section .debug_info .4byte 0xdf .2byte 0x2 .4byte .Ldebug_abbrev0 .byte 0x4 .uleb128 0x1 .string "GNU C 4.2.1 20070831 patched [FreeBSD]" .byte 0x1 .string "bogus.c" .string "/home/tgl" .4byte .Ltext0 .4byte .Letext0 .4byte .Ldebug_line0 .uleb128 0x2 .byte 0x1 .string "test_tas_char" .byte 0x1 .byte 0x3 .byte 0x1 .4byte 0x88 .4byte .LFB2 .4byte .LFE2 .4byte .LLST0 .4byte 0x88 .uleb128 0x3 .string "ptr" .byte 0x1 .byte 0x2 .4byte 0x8f .byte 0x2 .byte 0x91 .sleb128 -24 .byte 0x0 .uleb128 0x4 .byte 0x4 .byte 0x5 .string "int" .uleb128 0x5 .byte 0x4 .4byte 0x95 .uleb128 0x6 .4byte 0x9a .uleb128 0x4 .byte 0x1 .byte 0x8 .string "char" .uleb128 0x2 .byte 0x1 .string "test_tas_int" .byte 0x1 .byte 0x9 .byte 0x1 .4byte 0x88 .4byte .LFB3 .4byte .LFE3 .4byte .LLST1 .4byte 0xd7 .uleb128 0x3 .string "ptr" .byte 0x1 .byte 0x8 .4byte 0xd7 .byte 0x2 .byte 0x91 .sleb128 -24 .byte 0x0 .uleb128 0x5 .byte 0x4 .4byte 0xdd .uleb128 0x6 .4byte 0x88 .byte 0x0 .section .debug_abbrev .uleb128 0x1 .uleb128 0x11 .byte 0x1 .uleb128 0x25 .uleb128 0x8 .uleb128 0x13 .uleb128 0xb .uleb128 0x3 .uleb128 0x8 .uleb128 0x1b .uleb128 0x8 .uleb128 0x11 .uleb128 0x1 .uleb128 0x12 .uleb128 0x1 .uleb128 0x10 .uleb128 0x6 .byte 0x0 .byte 0x0 .uleb128 0x2 .uleb128 0x2e .byte 0x1 .uleb128 0x3f .uleb128 0xc .uleb128 0x3 .uleb128 0x8 .uleb128 0x3a .uleb128 0xb .uleb128 0x3b .uleb128 0xb .uleb128 0x27 .uleb128 0xc .uleb128 0x49 .uleb128 0x13 .uleb128 0x11 .uleb128 0x1 .uleb128 0x12 .uleb128 0x1 .uleb128 0x40 .uleb128 0x6 .uleb128 0x1 .uleb128 0x13 .byte 0x0 .byte 0x0 .uleb128 0x3 .uleb128 0x5 .byte 0x0 .uleb128 0x3 .uleb128 0x8 .uleb128 0x3a .uleb128 0xb .uleb128 0x3b .uleb128 0xb .uleb128 0x49 .uleb128 0x13 .uleb128 0x2 .uleb128 0xa .byte 0x0 .byte 0x0 .uleb128 0x4 .uleb128 0x24 .byte 0x0 .uleb128 0xb .uleb128 0xb .uleb128 0x3e .uleb128 0xb .uleb128 0x3 .uleb128 0x8 .byte 0x0 .byte 0x0 .uleb128 0x5 .uleb128 0xf .byte 0x0 .uleb128 0xb .uleb128 0xb .uleb128 0x49 .uleb128 0x13 .byte 0x0 .byte 0x0 .uleb128 0x6 .uleb128 0x35 .byte 0x0 .uleb128 0x49 .uleb128 0x13 .byte 0x0 .byte 0x0 .byte 0x0 .section .debug_pubnames,"",@progbits .4byte 0x31 .2byte 0x2 .4byte .Ldebug_info0 .4byte 0xe3 .4byte 0x52 .string "test_tas_char" .4byte 0xa2 .string "test_tas_int" .4byte 0x0 .section .debug_aranges,"",@progbits .4byte 0x1c .2byte 0x2 .4byte .Ldebug_info0 .byte 0x4 .byte 0x0 .2byte 0x0 .2byte 0x0 .4byte .Ltext0 .4byte .Letext0-.Ltext0 .4byte 0x0 .4byte 0x0 .ident "GCC: (GNU) 4.2.1 20070831 patched [FreeBSD]" .section .note.GNU-stack,"",@progbits -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
>>> Perhaps it could be argued that there's a FreeBSD compiler bug here,
>>> but what I'm wondering is why configure believes that this:
>>>
>>> [char lock = 0;
>>> __sync_lock_test_and_set(&lock, 1);
>>> __sync_lock_release(&lock);])],
>>>
>>> is going to draw a hard error on platforms without byte-wide
>atomics.
>>> My sense of C semantics is that the best you could hope for is a
>>> warning
>
>> Well, in theory these aren't actual functions but intrinsics with
>> special behaviour ("by being overloaded so that they work on multiple
>> types."). The compiler is supposed to warn and link to an external
>> function if a certain operation is not available:
>
>Yeah, I read that. But configure tests don't normally notice warnings.
IIRC those functions are *not* supposed to be provided, I.e. they should result in a linker error.
BTW, it's unclear why there's no 1 byte atomics support in that compiler, isn't it? On a ll/sc arch like ppc it should (and IIRC is) be supported efficiently. As you suggest, using a lwarx should just work.
> I'm not very good at reading
PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop. Even if that works (and apparently it does not),
we would not want to use it since it implies contention between adjacent
atomic flags.
That's the case as long as they're on a single cache line. Whether it's the same word, double/half word shouldn't matter.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote: >and got no warnings and the attached output. I'm not very good at >reading >PPC assembler, but I think what is happening in the "char" case is that >gcc is trying to emulate a byte-wide operation using a word-wide one, >ie an lwarx/stwcx. loop. Hm. This seems to suggest a straight out code generation bug in that compiler, not a failure in intrinsic detection. I'll note that there's certainly ppc64 machine with that intrinsic working (tested that on the community hydra during atomicsdevelopment). So either it's a bug specific to some compiler version, or 32bit ppc. I assume there's no trivial way to get a newer compiler on that machine? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> and got no warnings and the attached output. I'm not very good at >> reading PPC assembler, but I think what is happening in the "char" case is that >> gcc is trying to emulate a byte-wide operation using a word-wide one, >> ie an lwarx/stwcx. loop. > Hm. This seems to suggest a straight out code generation bug in that compiler, not a failure in intrinsic detection. It does smell like a codegen bug; I'm planning to file a FreeBSD bug report once I have more data (like whether 11.0 is busted similarly). But that doesn't really detract from my point, which is that it's totally silly for us to imagine that char and word-wide atomic ops are interchangeable on all platforms and we can flip a coin to decide which to use as long as the configure probes don't fail outright. Even given working code for the byte case, we ought to prefer int atomics on PPC. On other platforms, maybe the preference goes the other way. I'd be inclined to follow the hard-won knowledge embedded in s_lock.h about which to prefer. Or in words of one syllable: this isn't the first bug we've seen in compiler intrinsics, and it won't be the last. We need a way to deal with that, not just claim it isn't our problem. regards, tom lane
I wrote: > But that doesn't really detract from my point, which is that it's > totally silly for us to imagine that char and word-wide atomic ops are > interchangeable on all platforms and we can flip a coin to decide which > to use as long as the configure probes don't fail outright. Even given > working code for the byte case, we ought to prefer int atomics on PPC. > On other platforms, maybe the preference goes the other way. I'd be > inclined to follow the hard-won knowledge embedded in s_lock.h about > which to prefer. After further study, I'm inclined to just propose that we flip the default width of pg_atomic_flag in generic-gcc.h: use int not char if both are available. The only modern platform where that's the wrong thing is x86, and that case is irrelevant here because we'll be using arch-x86.h not generic-gcc.h. A survey of s_lock.h shows that we prefer char-width slock_t only on these architectures: x86 sparc (but not sparcv9, there we use int) m68k vax So basically, the existing coding is optimizing for obsolete hardware, and not even very much of that. regards, tom lane
On Mon, Jan 2, 2017 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> But that doesn't really detract from my point, which is that it's >> totally silly for us to imagine that char and word-wide atomic ops are >> interchangeable on all platforms and we can flip a coin to decide which >> to use as long as the configure probes don't fail outright. Even given >> working code for the byte case, we ought to prefer int atomics on PPC. >> On other platforms, maybe the preference goes the other way. I'd be >> inclined to follow the hard-won knowledge embedded in s_lock.h about >> which to prefer. > > After further study, I'm inclined to just propose that we flip the default > width of pg_atomic_flag in generic-gcc.h: use int not char if both are > available. The only modern platform where that's the wrong thing is x86, > and that case is irrelevant here because we'll be using arch-x86.h not > generic-gcc.h. > > A survey of s_lock.h shows that we prefer char-width slock_t only on > these architectures: > > x86 > sparc (but not sparcv9, there we use int) > m68k > vax I don't think that's right, because on my MacBook Pro: (gdb) p sizeof(slock_t) $1 = 1 (gdb) On Linux x86_64: (gdb) p sizeof(slock_t) $1 = 1 I think we would be well-advised to get the size of slock_t down to a single byte on as many platforms as possible, because when it's any wider than that it makes some critical structures that would otherwise fit into a cache line start to not fit, and that can have a very big impact on performance. I'm disappointed to notice that it's 4 bytes wide on hydra (ppc64). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 2, 2017 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> After further study, I'm inclined to just propose that we flip the default >> width of pg_atomic_flag in generic-gcc.h: use int not char if both are >> available. The only modern platform where that's the wrong thing is x86, >> and that case is irrelevant here because we'll be using arch-x86.h not >> generic-gcc.h. >> >> A survey of s_lock.h shows that we prefer char-width slock_t only on >> these architectures: >> >> x86 >> sparc (but not sparcv9, there we use int) >> m68k >> vax > I don't think that's right, because on my MacBook Pro: ... which is an x86, which won't be affected by the proposed change. > I think we would be well-advised to get the size of slock_t down to a > single byte on as many platforms as possible, because when it's any > wider than that it makes some critical structures that would otherwise > fit into a cache line start to not fit, and that can have a very big > impact on performance. I really doubt that that's a good argument for choosing a markedly less efficient locking primitive, which is what's at stake for PPC. I have no info about the other architectures. Also, since pg_atomic_flag is currently used in a grand total of zero places (other than the test case in regress.c), arguing that making it word-wide will bloat critical data structures is flat wrong. regards, tom lane
On Tue, Jan 3, 2017 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> A survey of s_lock.h shows that we prefer char-width slock_t only on >>> these architectures: >>> >>> x86 >>> sparc (but not sparcv9, there we use int) >>> m68k >>> vax > >> I don't think that's right, because on my MacBook Pro: > > ... which is an x86, which won't be affected by the proposed change. Uh, x86 means 32-bit to me, and this MacBook Pro is definitely x86_64. >> I think we would be well-advised to get the size of slock_t down to a >> single byte on as many platforms as possible, because when it's any >> wider than that it makes some critical structures that would otherwise >> fit into a cache line start to not fit, and that can have a very big >> impact on performance. > > I really doubt that that's a good argument for choosing a markedly less > efficient locking primitive, which is what's at stake for PPC. I have > no info about the other architectures. I don't know for certain, but I would not be willing to take that on faith. I'm not sure if you've read all of the discussion threads on this mailing list about fitting things into cache lines and/or aligning things to cache lines to avoid major performance regressions on large servers, but there have been quite a few of those over the last few years and there will doubtless be more. > Also, since pg_atomic_flag is currently used in a grand total of zero > places (other than the test case in regress.c), arguing that making > it word-wide will bloat critical data structures is flat wrong. Well that just begs the question of whether we should rip it out. If it's unused, then, yes, the performance characteristics don't matter, but if it's gonna get used for anything important, I maintain that both the speed of the implementation and the number of bytes that it consumes will be relevant. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 3, 2017 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> A survey of s_lock.h shows that we prefer char-width slock_t only on >>>> these architectures: >>>> >>>> x86 >>>> sparc (but not sparcv9, there we use int) >>>> m68k >>>> vax >>> I don't think that's right, because on my MacBook Pro: >> ... which is an x86, which won't be affected by the proposed change. > Uh, x86 means 32-bit to me, and this MacBook Pro is definitely x86_64. Sorry for the confusion. x86 subsumes x86_64 so far as the atomics code is concerned, and I was following that terminology. >> Also, since pg_atomic_flag is currently used in a grand total of zero >> places (other than the test case in regress.c), arguing that making >> it word-wide will bloat critical data structures is flat wrong. > Well that just begs the question of whether we should rip it out. If > it's unused, then, yes, the performance characteristics don't matter, > but if it's gonna get used for anything important, I maintain that > both the speed of the implementation and the number of bytes that it > consumes will be relevant. [ shrug... ] I have not seen you putting any effort into optimizing slock_t on non-x86 architectures, where it might make a difference today. Why all the concern about shaving hypothetical future bytes for pg_atomic_flag? But to reduce this to the simplest possible terms: it really does not matter whether the implementation saves bytes or cycles or anything else, if it fails to *work*. The existing coding for PPC fails on FreeBSD, and likely on some other platforms using the same compiler. I'd be the first to say that that doesn't reflect well on the state of their PPC port, but it's reality that we ought to be cognizant of. And we've worked around similar bugs nearby, eg see this bit in atomics.h: * Given a gcc-compatible xlc compiler, prefer the xlc implementation. The* ppc64le "IBM XL C/C++ for Linux, V13.1.2" implementsboth interfaces, but* __sync_lock_test_and_set() of one-byte types elicits SIGSEGV. From here it seems like you're arguing that we mustn't give FreeBSD the identical pass that we already gave IBM, for what is nearly the same bug. regards, tom lane
On Tue, Jan 3, 2017 at 12:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ shrug... ] I have not seen you putting any effort into optimizing > slock_t on non-x86 architectures, where it might make a difference today. > Why all the concern about shaving hypothetical future bytes for > pg_atomic_flag? I don't know what you're getting all wrapped around the axle here about. I already explained why I think it's an issue. You can disagree if you like. As to whether I've put any effort into optimizing slock_t on non-x86 architectures, I commented in my first post to this thread about my disappointment regarding the situation on ppc64. I didn't realize that we had that issue and I think it would be worth fixing at some point, but I haven't quite gotten around to it in the 4 hours since I had that realization. I have previously done work on non-x86 spinlocks, though (c01c25fbe525869fa81237954727e1eb4b7d4a14). > But to reduce this to the simplest possible terms: it really does not > matter whether the implementation saves bytes or cycles or anything else, > if it fails to *work*. The existing coding for PPC fails on FreeBSD, > and likely on some other platforms using the same compiler. I'd be the > first to say that that doesn't reflect well on the state of their PPC > port, but it's reality that we ought to be cognizant of. And we've > worked around similar bugs nearby, eg see this bit in atomics.h: > > * Given a gcc-compatible xlc compiler, prefer the xlc implementation. The > * ppc64le "IBM XL C/C++ for Linux, V13.1.2" implements both interfaces, but > * __sync_lock_test_and_set() of one-byte types elicits SIGSEGV. > > From here it seems like you're arguing that we mustn't give FreeBSD > the identical pass that we already gave IBM, for what is nearly the > same bug. The only point I'm making here is that the width of a spinlock is not irrelevant. Or at least, it didn't use to be. lwlock.h says: * LWLOCK_MINIMAL_SIZE should be 32 on basically all common platforms, but* because slock_t is more than 2 bytes on some obscureplatforms, we allow* for the possibility that it might be 64. That comment is actually nonsense since 008608b9d51061b1f598c197477b3dc7be9c4a64 but before that it was relevant. Also, before 48354581a49c30f5757c203415aa8412d85b0f70, a BufferDesc fit within a 64-byte cache line if slock_t was 1 or 2 bytes, a point commented on explicitly by 6150a1b08a9fe7ead2b25240be46dddeae9d98e1. So we've clearly made efforts in that direction in the past. Looking a little more, though, since both lwlock.c and bufmgr.c have been rewritten to use atomics directly, there might not be any remaining places where the spinlocks get hot enough to care about the extra bytes. All things being equal I still think a narrower one is significantly better than a wider one, but we can always leave solving that problem to a day when the difference can be proved out by benchmarks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > The only point I'm making here is that the width of a spinlock is not > irrelevant. Sure, but it does not follow that we need to get all hot and bothered about the width of pg_atomic_flag, which has yet to find its first use-case. When and if its width becomes a demonstrable issue, we'll have some work to do in this area ... but that was true already. > All things being equal > I still think a narrower one is significantly better than a wider one, > but we can always leave solving that problem to a day when the > difference can be proved out by benchmarks. I think we can agree on that conclusion. regards, tom lane